Automated genome sequence analysis and annotation.

M A Andrade, N P Brown, C Leroy, S Hoersch, A de Daruvar, C Reich, A Franchini, J Tamames, A Valencia, C Ouzounis, C Sander.

Abstract

MOTIVATION: Large-scale genome projects generate a rapidly increasing number of sequences, most of them biochemically uncharacterized. Research in bioinformatics contributes to the development of methods for the computational characterization of these sequences. However, the installation and application of these methods require experience and are time consuming.
RESULTS: We present here an automatic system for preliminary functional annotation of protein sequences that has been applied to the analysis of sets of sequences from complete genomes, both to refine overall performance and to make new discoveries comparable to those made by human experts. The GeneQuiz system includes a Web-based browser that allows examination of the evidence leading to an automatic annotation and offers additional information, views of the results, and links to biological databases that complement the automatic analysis. System structure and operating principles concerning the use of multiple sequence databases, underlying sequence analysis tools, lexical analyses of database annotations and decision criteria for functional assignments are detailed. The system makes automatic quality assessments of results based on prior experience with the underlying sequence analysis tools; overall error rates in functional assignment are estimated at 2.5-5% for cases annotated with highest reliability ('clear' cases). Sources of over-interpretation of results are discussed with proposals for improvement. A conservative definition for reporting 'new findings' that takes account of database maturity is presented along with examples of possible kinds of discoveries (new function, family and superfamily) made by the system. System performance in relation to sequence database coverage, database dynamics and database search methods is analysed, demonstrating the inherent advantages of an integrated automatic approach using multiple databases and search methods applied in an objective and repeatable manner.
AVAILABILITY: The GeneQuiz system is publicly available for analysis of protein sequences through a Web server at http://www.sander.ebi.ac.uk/gqsrv/submit
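
The abstract describes combining hits from several databases and search methods, applying a lexical analysis to the database annotations, and grading each functional assignment by reliability. The Python sketch below illustrates that general idea only. It is not GeneQuiz's actual rule set: the `Hit` record, the E-value cut-offs, the list of uninformative words, and the `tentative` and `none` labels are all assumptions made for illustration. Only the `clear` label comes from the abstract.

```python
from collections import Counter
from dataclasses import dataclass

# Hypothetical cut-offs for illustration only; they are not taken from GeneQuiz.
CLEAR_EVALUE = 1e-20
TENTATIVE_EVALUE = 1e-5

# Words assumed to carry no functional information in a description line.
UNINFORMATIVE = {"hypothetical", "putative", "probable", "protein",
                 "precursor", "fragment"}


@dataclass
class Hit:
    method: str        # search method that produced the hit (hypothetical name)
    database: str      # sequence database the hit came from
    evalue: float      # expectation value reported by the search method
    description: str   # free-text annotation of the matched database entry


def keywords(description):
    """Crude lexical analysis: lower-case the text and drop uninformative words."""
    words = description.lower().replace(",", " ").split()
    return frozenset(w for w in words if w not in UNINFORMATIVE)


def annotate(hits):
    """Return (function, reliability) for one query sequence."""
    # Keep only hits that are both informative and statistically significant.
    significant = [h for h in hits
                   if keywords(h.description) and h.evalue <= TENTATIVE_EVALUE]
    if not significant:
        return None, "none"

    # Vote among the significant hits on their normalised keyword sets.
    votes = Counter(keywords(h.description) for h in significant)
    winner, _ = votes.most_common(1)[0]
    supporting = [h for h in significant if keywords(h.description) == winner]

    best = min(supporting, key=lambda h: h.evalue)
    methods = {h.method for h in supporting}

    # Assumed rule: 'clear' needs a strong hit confirmed by two search methods.
    if best.evalue <= CLEAR_EVALUE and len(methods) >= 2:
        return best.description, "clear"
    return best.description, "tentative"
```

Under these assumptions, a query whose two strongest hits come from different search methods and agree on "thymidylate kinase" would be graded `clear`, while a single moderate hit would be graded `tentative`. A query matched only by "hypothetical protein" entries would receive no assignment.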

Entities: Species

Substances: Proteins

Year: 1999 PMID: 10366660 DOI: 10.1093/bioinformatics/15.5.391

Source DB: PubMed Journal: Bioinformatics ISSN: 1367-4803 Impact factor: 6.937

Cited

56 in total

1. WIT: integrated system for high-throughput genome sequence analysis and metabolic reconstruction.

Authors: R Overbeek; N Larsen; G D Pusch; M D'Souza; E Selkov; N Kyrpides; M Fonstein; N Maltsev; E Selkov
Journal: Nucleic Acids Res Date: 2000-01-01 Impact factor: 16.971

2. Gene ontology: tool for the unification of biology. The Gene Ontology Consortium.

Authors: M Ashburner; C A Ball; J A Blake; D Botstein; H Butler; J M Cherry; A P Davis; K Dolinski; S S Dwight; J T Eppig; M A Harris; D P Hill; L Issel-Tarver; A Kasarskis; S Lewis; J C Matese; J E Richardson; M Ringwald; G M Rubin; G Sherlock
Journal: Nat Genet Date: 2000-05 Impact factor: 38.330

3. Analysis of the Thermotoga maritima genome combining a variety of sequence similarity and genome context tools.

Authors: N C Kyrpides; C A Ouzounis; I Iliopoulos; V Vonstein; R Overbeek
Journal: Nucleic Acids Res Date: 2000-11-15 Impact factor: 16.971

4. Motif-based fold assignment.

Authors: L Salwinski; D Eisenberg
Journal: Protein Sci Date: 2001-12 Impact factor: 6.725

5. Predictome: a database of putative functional links between proteins.

Authors: Joseph C Mellor; Itai Yanai; Karl H Clodfelter; Julian Mintseris; Charles DeLisi
Journal: Nucleic Acids Res Date: 2002-01-01 Impact factor: 16.971

6. GTOP: a database of protein structures predicted from genome sequences.

Authors: Takeshi Kawabata; Satoshi Fukuchi; Keiichi Homma; Motonori Ota; Jiro Araki; Takehiko Ito; Nobuyuki Ichiyoshi; Ken Nishikawa
Journal: Nucleic Acids Res Date: 2002-01-01 Impact factor: 16.971

7. Search and retrieve. Large-scale data generation is becoming increasingly important in biological research. But how good are the tools to make sense of the data?

Authors: Alfonso Valencia
Journal: EMBO Rep Date: 2002-05 Impact factor: 8.807

8. GenDB--an open source genome annotation system for prokaryote genomes.

Authors: Folker Meyer; Alexander Goesmann; Alice C McHardy; Daniela Bartels; Thomas Bekel; Jörn Clausen; Jörn Kalinowski; Burkhard Linke; Oliver Rupp; Robert Giegerich; Alfred Pühler
Journal: Nucleic Acids Res Date: 2003-04-15 Impact factor: 16.971

9. Analysis of protein sequence/structure similarity relationships.

Authors: Hin Hark Gan; Rebecca A Perlow; Sharmili Roy; Joy Ko; Min Wu; Jing Huang; Shixiang Yan; Angelo Nicoletta; Jonathan Vafai; Ding Sun; Lihua Wang; Joyce E Noah; Samuela Pasquali; Tamar Schlick
Journal: Biophys J Date: 2002-11 Impact factor: 4.033

10. PipeOnline 2.0: automated EST processing and functional data sorting.

Authors: Patricia Ayoubi; Xiaojing Jin; Saul Leite; Xianghui Liu; Jeson Martajaja; Abdurashid Abduraham; Qiaolan Wan; Wei Yan; Eduardo Misawa; Rolf A Prade
Journal: Nucleic Acids Res Date: 2002-11-01 Impact factor: 16.971