Automated genome sequence analysis and annotation.

M A Andrade, N P Brown, C Leroy, S Hoersch, A de Daruvar, C Reich, A Franchini, J Tamames, A Valencia, C Ouzounis, C Sander.

Abstract

MOTIVATION: Large-scale genome projects generate a rapidly increasing number of sequences, most of them biochemically uncharacterized. Research in bioinformatics contributes to the development of methods for the computational characterization of these sequences. However, the installation and application of these methods require experience and are time consuming.
RESULTS: We present here an automatic system for preliminary functional annotation of protein sequences that has been applied to the analysis of sets of sequences from complete genomes, both to refine overall performance and to make new discoveries comparable to those made by human experts. The GeneQuiz system includes a Web-based browser that allows examination of the evidence leading to an automatic annotation and offers additional information, views of the results, and links to biological databases that complement the automatic analysis. System structure and operating principles concerning the use of multiple sequence databases, underlying sequence analysis tools, lexical analyses of database annotations and decision criteria for functional assignments are detailed. The system makes automatic quality assessments of results based on prior experience with the underlying sequence analysis tools; overall error rates in functional assignment are estimated at 2.5-5% for cases annotated with highest reliability ('clear' cases). Sources of over-interpretation of results are discussed with proposals for improvement. A conservative definition for reporting 'new findings' that takes account of database maturity is presented along with examples of possible kinds of discoveries (new function, family and superfamily) made by the system. System performance in relation to sequence database coverage, database dynamics and database search methods is analysed, demonstrating the inherent advantages of an integrated automatic approach using multiple databases and search methods applied in an objective and repeatable manner.
AVAILABILITY: The GeneQuiz system is publicly available for analysis of protein sequences through a Web server at http://www.sander.ebi.ac.uk/gqsrv/submit

Year:  1999        PMID: 10366660     DOI: 10.1093/bioinformatics/15.5.391

Source DB:  PubMed          Journal:  Bioinformatics        ISSN: 1367-4803            Impact factor:   6.937

