
Dictionary-driven protein annotation.

Isidore Rigoutsos, Tien Huynh, Aris Floratos, Laxmi Parida, Daniel Platt.

Abstract

Computational methods seeking to automatically determine the properties (functional, structural, physicochemical, etc.) of a protein directly from the sequence have long been the focus of numerous research groups. With the advent of advanced sequencing methods and systems, the number of amino acid sequences that are being deposited in the public databases has been increasing steadily. This has in turn generated a renewed demand for automated approaches that can annotate individual sequences and complete genomes quickly, exhaustively and objectively. In this paper, we present one such approach that is centered around and exploits the Bio-Dictionary, a collection of amino acid patterns that completely covers the natural sequence space and can capture functional and structural signals that have been reused during evolution, within and across protein families. Our annotation approach also makes use of a weighted, position-specific scoring scheme that is unaffected by the over-representation of well-conserved proteins and protein fragments in the databases used. For a given query sequence, the method permits one to determine, in a single pass, the following: local and global similarities between the query and any protein already present in a public database; the likeness of the query to all available archaeal/bacterial/eukaryotic/viral sequences in the database as a function of amino acid position within the query; the character of secondary structure of the query as a function of amino acid position within the query; the cytoplasmic, transmembrane or extracellular behavior of the query; the nature and position of binding domains, active sites, post-translationally modified sites, signal peptides, etc. In terms of performance, the proposed method is exhaustive, objective and allows for the rapid annotation of individual sequences and full genomes.
Annotation examples are presented and discussed in Results, including individual queries and complete genomes that were released publicly after we built the Bio-Dictionary that is used in our experiments. Finally, we have computed the annotations of more than 70 complete genomes and made them available on the World Wide Web at http://cbcsrv.watson.ibm.com/Annotations/.
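
As a rough illustration of the general idea described in the abstract (scanning a query against a dictionary of patterns and accumulating weighted, per-position evidence), the following is a minimal sketch in Python. It is not the authors' implementation and does not use the Bio-Dictionary: the two-entry `TOY_DICTIONARY`, its labels and weights, the query string, and the function names `annotate` and `best_labels` are all hypothetical, and the simple additive weighting stands in for the paper's scoring scheme, whose details are not given in the abstract.

```python
import re
from collections import defaultdict

# Hypothetical toy dictionary: regex-style pattern -> (label, weight).
# '.' matches any residue. These entries are illustrative assumptions only.
TOY_DICTIONARY = {
    "[AG]....GK[ST]": ("P-loop", 1.0),      # shaped like a nucleotide-binding motif
    "LL.AL": ("helix-like (toy)", 0.5),     # made-up pattern for demonstration
}


def annotate(query, dictionary):
    """Return, for each query position, a mapping label -> accumulated weight."""
    scores = [defaultdict(float) for _ in query]
    for pattern, (label, weight) in dictionary.items():
        # A lookahead wrapper lets finditer report overlapping matches too.
        for match in re.finditer(f"(?=({pattern}))", query):
            for pos in range(match.start(1), match.end(1)):
                scores[pos][label] += weight
    return scores


def best_labels(scores):
    """Pick the highest-scoring label per position, or None if nothing matched."""
    return [max(s, key=s.get) if s else None for s in scores]


if __name__ == "__main__":
    query = "MTAGPSGSGKSTLLRALL"  # made-up query sequence
    for residue, label in zip(query, best_labels(annotate(query, TOY_DICTIONARY))):
        print(residue, label)
```

Because every pattern is scanned once and its evidence is written to the positions it covers, all annotation types are collected in a single pass over the dictionary; a realistic system would need a far larger dictionary and a weighting that corrects for database redundancy.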


Year:  2002        PMID: 12202776      PMCID: PMC137405          DOI: 10.1093/nar/gkf464

Source DB:  PubMed          Journal:  Nucleic Acids Res        ISSN: 0305-1048            Impact factor:   16.971


  10 in total

1.  In silico pattern-based analysis of the human cytomegalovirus genome.

Authors:  Isidore Rigoutsos; Jiri Novotny; Tien Huynh; Stephen T Chin-Bow; Laxmi Parida; Daniel Platt; David Coleman; Thomas Shenk
Journal:  J Virol       Date:  2003-04       Impact factor: 5.103

2.  Re-evaluation and in silico annotation of the Tupaia herpesvirus proteins.

Authors:  Udo Bahr; Gholamreza Darai
Journal:  Virus Genes       Date:  2004-01       Impact factor: 2.332

3.  The web server of IBM's Bioinformatics and Pattern Discovery group.

Authors:  Tien Huynh; Isidore Rigoutsos; Laxmi Parida; Daniel Platt; Tetsuo Shibuya
Journal:  Nucleic Acids Res       Date:  2003-07-01       Impact factor: 16.971

4.  Structural details (kinks and non-alpha conformations) in transmembrane helices are intrahelically determined and can be predicted by sequence pattern descriptors.

Authors:  Isidore Rigoutsos; Peter Riek; Robert M Graham; Jiri Novotny
Journal:  Nucleic Acids Res       Date:  2003-08-01       Impact factor: 16.971

5.  The web server of IBM's Bioinformatics and Pattern Discovery group: 2004 update.

Authors:  Tien Huynh; Isidore Rigoutsos
Journal:  Nucleic Acids Res       Date:  2004-07-01       Impact factor: 16.971

6.  MACSIMS: multiple alignment of complete sequences information management system.

Authors:  Julie D Thompson; Arnaud Muller; Andrew Waterhouse; Jim Procter; Geoffrey J Barton; Frédéric Plewniak; Olivier Poch
Journal:  BMC Bioinformatics       Date:  2006-06-23       Impact factor: 3.169

7.  Identifying the missing proteins in human proteome by biological language model.

Authors:  Qiwen Dong; Kai Wang; Xuan Liu
Journal:  BMC Syst Biol       Date:  2016-12-23

8.  Minimotif miner 2nd release: a database and web system for motif search.

Authors:  Sanguthevar Rajasekaran; Sudha Balla; Patrick Gradie; Michael R Gryk; Krishna Kadaveru; Vamsi Kundeti; Mark W Maciejewski; Tian Mi; Nicholas Rubino; Jay Vyas; Martin R Schiller
Journal:  Nucleic Acids Res       Date:  2008-10-31       Impact factor: 16.971

9.  Automatic annotation of protein motif function with Gene Ontology terms.

Authors:  Xinghua Lu; Chengxiang Zhai; Vanathi Gopalakrishnan; Bruce G Buchanan
Journal:  BMC Bioinformatics       Date:  2004-09-02       Impact factor: 3.169

10.  Alignment-free inference of hierarchical and reticulate phylogenomic relationships. [Review]

Authors:  Guillaume Bernard; Cheong Xin Chan; Yao-Ban Chan; Xin-Yi Chua; Yingnan Cong; James M Hogan; Stefan R Maetschke; Mark A Ragan
Journal:  Brief Bioinform       Date:  2019-03-22       Impact factor: 11.622

