Development and evaluation of an automated annotation pipeline and cDNA annotation system.

Takeya Kasukawa, Masaaki Furuno, Itoshi Nikaido, Hidemasa Bono, David A Hume, Carol Bult, David P Hill, Richard Baldarelli, Julian Gough, Alexander Kanapin, Hideo Matsuda, Lynn M Schriml, Yoshihide Hayashizaki, Yasushi Okazaki, John Quackenbush.

Abstract

Manual curation has long been held to be the "gold standard" for functional annotation of DNA sequence. Our experience with the annotation of more than 20,000 full-length cDNA sequences revealed problems with this approach, including inaccurate and inconsistent assignment of gene names, as well as many good assignments that were difficult to reproduce using only computational methods. For the FANTOM2 annotation of more than 60,000 cDNA clones, we developed a number of methods and tools to circumvent some of these problems, including an automated annotation pipeline that provides high-quality preliminary annotation for each sequence by introducing an "uninformative filter" that eliminates uninformative annotations, controlled vocabularies to accurately reflect both the functional assignments and the evidence supporting them, and a highly refined, Web-based manual annotation tool that allows users to view a wide array of sequence analyses and to assign gene names and putative functions using a consistent nomenclature. The ultimate utility of our approach is reflected in the low rate of reassignment of automated assignments by manual curation. Based on these results, we propose a new standard for large-scale annotation, in which the initial automated annotations are manually investigated and then computational methods are iteratively modified and improved based on the results of manual curation.
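
The "uninformative filter" mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not give the actual filter rules, so everything here is an assumption for illustration: the pattern list, the function names `is_uninformative` and `first_informative_hit`, and the idea that candidate annotations arrive as similarity-search hits sorted best-first. The sketch skips hits whose descriptions carry no functional information and returns the first one that does.

```python
import re

# Hypothetical patterns: the abstract does not list the real filter rules.
UNINFORMATIVE_PATTERNS = [
    re.compile(r"\bhypothetical protein\b", re.IGNORECASE),
    re.compile(r"\bunnamed protein product\b", re.IGNORECASE),
    re.compile(r"\bunknown\b", re.IGNORECASE),
    re.compile(r"\bcDNA clone\b", re.IGNORECASE),
]


def is_uninformative(description):
    """Return True if the description matches any assumed uninformative pattern."""
    return any(p.search(description) for p in UNINFORMATIVE_PATTERNS)


def first_informative_hit(hits):
    """Pick the best-ranked hit with an informative description.

    `hits` is a list of (description, score) tuples, assumed to be
    sorted best-first. Returns the chosen tuple, or None if every
    hit is filtered out.
    """
    for description, score in hits:
        if not is_uninformative(description):
            return description, score
    return None


if __name__ == "__main__":
    example_hits = [
        ("hypothetical protein", 950.0),
        ("unnamed protein product", 900.0),
        ("protein kinase C, alpha", 870.0),
    ]
    print(first_informative_hit(example_hits))
```

In this sketch a lower-scoring but descriptive hit is preferred over a higher-scoring hit with an empty name. Whether the FANTOM2 pipeline ranks candidates this way is not stated in the abstract.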

Year:  2003        PMID: 12819153      PMCID: PMC403710          DOI: 10.1101/gr.992803

Source DB:  PubMed          Journal:  Genome Res        ISSN: 1088-9051            Impact factor:   9.043


  13 in total

1.  The mouse secretome: functional classification of the proteins secreted into the extracellular environment.

Authors:  Sean M Grimmond; Kevin C Miranda; Zheng Yuan; Melissa J Davis; David A Hume; Ken Yagi; Naoko Tominaga; Hidemasa Bono; Yoshihide Hayashizaki; Yasushi Okazaki; Rohan D Teasdale
Journal:  Genome Res       Date:  2003-06       Impact factor: 9.043

2.  CDS annotation in full-length cDNA sequence.

Authors:  Masaaki Furuno; Takeya Kasukawa; Rintaro Saito; Jun Adachi; Harukazu Suzuki; Richard Baldarelli; Yoshihide Hayashizaki; Yasushi Okazaki
Journal:  Genome Res       Date:  2003-06       Impact factor: 9.043

3.  EICO (Expression-based Imprint Candidate Organizer): finding disease-related imprinted genes.

Authors:  Itoshi Nikaido; Chika Saito; Akiko Wakamoto; Yasuhiro Tomaru; Takahiro Arakawa; Yoshihide Hayashizaki; Yasushi Okazaki
Journal:  Nucleic Acids Res       Date:  2004-01-01       Impact factor: 16.971

4.  The Mouse Genome Database (MGD): integrating biology with the genome.

Authors:  Carol J Bult; Judith A Blake; Joel E Richardson; James A Kadin; Janan T Eppig; R M Baldarelli; K Barsanti; M Baya; J S Beal; W J Boddy; D W Bradt; D L Burkart; N E Butler; J Campbell; R Corey; L E Corbani; S Cousins; H Dene; H J Drabkin; K Frazer; D M Garippa; L H Glass; C W Goldsmith; P L Grant; B L King; M Lennon-Pierce; J Lewis; I Lu; C M Lutz; L J Maltais; L M McKenzie; D Miers; D Modrusan; L Ni; J E Ormsby; D Qi; S Ramachandran; T B K Reddy; D J Reed; R Sinclair; D R Shaw; C L Smith; P Szauter; B Taylor; P Vanden Borre; M Walker; L Washburn; I Witham; J Winslow; Y Zhu
Journal:  Nucleic Acids Res       Date:  2004-01-01       Impact factor: 16.971

5.  SSHscreen and SSHdb, generic software for microarray based gene discovery: application to the stress response in cowpea.

Authors:  Nanette Coetzer; Inge Gazendam; Dean Oelofse; Dave K Berger
Journal:  Plant Methods       Date:  2010-04-01       Impact factor: 4.993

6.  G protein-coupled receptor genes in the FANTOM2 database.

Authors:  Yuka Kawasawa; Louise M McKenzie; David P Hill; Hidemasa Bono; Masashi Yanagisawa
Journal:  Genome Res       Date:  2003-06       Impact factor: 9.043

7.  Update on human genome completion and annotations: gene nomenclature.

Authors:  Daniel W Nebert; Hester M Wain
Journal:  Hum Genomics       Date:  2003-11       Impact factor: 4.639

8.  Transcript annotation in FANTOM3: mouse gene catalog based on physical cDNAs.

Authors:  Norihiro Maeda; Takeya Kasukawa; Rieko Oyama; Julian Gough; Martin Frith; Pär G Engström; Boris Lenhard; Rajith N Aturaliya; Serge Batalov; Kirk W Beisel; Carol J Bult; Colin F Fletcher; Alistair R R Forrest; Masaaki Furuno; David Hill; Masayoshi Itoh; Mutsumi Kanamori-Katayama; Shintaro Katayama; Masaru Katoh; Tsugumi Kawashima; John Quackenbush; Timothy Ravasi; Brian Z Ring; Kazuhiro Shibata; Koji Sugiura; Yoichi Takenaka; Rohan D Teasdale; Christine A Wells; Yunxia Zhu; Chikatoshi Kai; Jun Kawai; David A Hume; Piero Carninci; Yoshihide Hayashizaki
Journal:  PLoS Genet       Date:  2006-04       Impact factor: 5.917

9.  Social tagging in the life sciences: characterizing a new metadata resource for bioinformatics.

Authors:  Benjamin M Good; Joseph T Tennis; Mark D Wilkinson
Journal:  BMC Bioinformatics       Date:  2009-09-25       Impact factor: 3.169

10.  A semi-automated genome annotation comparison and integration scheme.

Authors:  Zhe Liu; Hongwu Ma; Igor Goryanin
Journal:  BMC Bioinformatics       Date:  2013-06-01       Impact factor: 3.169
