Accurate prediction of protein functional class from sequence in the Mycobacterium tuberculosis and Escherichia coli genomes using data mining.

R D King, A Karwath, A Clare, L Dehaspe.

Abstract

The analysis of genomics data needs to become as automated as its generation. Here we present a novel data-mining approach to predicting protein functional class from sequence. This method is based on a combination of inductive logic programming clustering and rule learning. We demonstrate the effectiveness of this approach on the M. tuberculosis and E. coli genomes, and identify biologically interpretable rules that predict protein functional class using only information available from the sequence. These rules predict 65% of the ORFs with no assigned function in M. tuberculosis and 24% of those in E. coli, with an estimated accuracy of 60-80% (depending on the level of functional assignment). The rules are founded on a combination of detection of remote homology, convergent evolution and horizontal gene transfer. We identify rules that predict protein functional class even in the absence of detectable sequence or structural homology. These rules give insight into the evolutionary history of M. tuberculosis and E. coli. Copyright 2000 John Wiley & Sons, Ltd.

Year:  2000        PMID: 11119305      PMCID: PMC2448385          DOI: 10.1002/1097-0061(200012)17:4<283::AID-YEA52>3.0.CO;2-F

Source DB:  PubMed          Journal:  Yeast        ISSN: 0749-503X            Impact factor:   3.239


  9 in total

1.  Predicting gene ontology functions from ProDom and CDD protein domains.

Authors:  Jonathan Schug; Sharon Diskin; Joan Mazzarelli; Brian P Brunk; Christian J Stoeckert
Journal:  Genome Res       Date:  2002-04       Impact factor: 9.043

2.  Accurate evaluation and analysis of functional genomics data and methods.

Authors:  Casey S Greene; Olga G Troyanskaya
Journal:  Ann N Y Acad Sci       Date:  2012-01-23       Impact factor: 5.691

3.  GOPred: GO molecular function prediction by combined classifiers.

Authors:  Omer Sinan Saraç; Volkan Atalay; Rengul Cetin-Atalay
Journal:  PLoS One       Date:  2010-08-31       Impact factor: 3.240

4.  Automatic discovery of cross-family sequence features associated with protein function.

Authors:  Markus Brameier; Josien Haan; Andrea Krings; Robert M MacCallum
Journal:  BMC Bioinformatics       Date:  2006-01-12       Impact factor: 3.169

5.  Predicting protein function by machine learning on amino acid sequences--a critical evaluation.

Authors:  Ali Al-Shahib; Rainer Breitling; David R Gilbert
Journal:  BMC Genomics       Date:  2007-03-20       Impact factor: 3.969

6.  Homology induction: the use of machine learning to improve sequence similarity searches.

Authors:  Andreas Karwath; Ross D King
Journal:  BMC Bioinformatics       Date:  2002-04-23       Impact factor: 3.169

7.  Deciphering the association between gene function and spatial gene-gene interactions in 3D human genome conformation.

Authors:  Renzhi Cao; Jianlin Cheng
Journal:  BMC Genomics       Date:  2015-10-28       Impact factor: 3.969

8.  A critical assessment of Mus musculus gene function prediction using integrated genomic evidence.

Authors:  Lourdes Peña-Castillo; Murat Tasan; Chad L Myers; Hyunju Lee; Trupti Joshi; Chao Zhang; Yuanfang Guan; Michele Leone; Andrea Pagnani; Wan Kyu Kim; Chase Krumpelman; Weidong Tian; Guillaume Obozinski; Yanjun Qi; Sara Mostafavi; Guan Ning Lin; Gabriel F Berriz; Francis D Gibbons; Gert Lanckriet; Jian Qiu; Charles Grant; Zafer Barutcuoglu; David P Hill; David Warde-Farley; Chris Grouios; Debajyoti Ray; Judith A Blake; Minghua Deng; Michael I Jordan; William S Noble; Quaid Morris; Judith Klein-Seetharaman; Ziv Bar-Joseph; Ting Chen; Fengzhu Sun; Olga G Troyanskaya; Edward M Marcotte; Dong Xu; Timothy R Hughes; Frederick P Roth
Journal:  Genome Biol       Date:  2008-06-27       Impact factor: 13.583

9.  Functional genomics via metabolic footprinting: monitoring metabolite secretion by Escherichia coli tryptophan metabolism mutants using FT-IR and direct injection electrospray mass spectrometry.

Authors:  Naheed N Kaderbhai; David I Broadhurst; David I Ellis; Royston Goodacre; Douglas B Kell
Journal:  Comp Funct Genomics       Date:  2003