Information theory applied to the sparse gene ontology annotation network to predict novel gene function.

Ying Tao, Lee Sam, Jianrong Li, Carol Friedman, Yves A Lussier.

Abstract

MOTIVATION: Despite advances in the gene annotation process, the functions of a large portion of gene products remain insufficiently characterized. In addition, the in silico prediction of novel Gene Ontology (GO) annotations for partially characterized gene functions or processes is highly dependent on reverse genetic or functional genomic approaches. To our knowledge, no prediction method has been shown to be highly accurate for sparsely annotated GO terms (those associated with fewer than 10 genes).
RESULTS: We propose a novel approach, information theory-based semantic similarity (ITSS), to automatically predict the molecular functions of genes from existing GO annotations. Using 10-fold cross-validation, we demonstrate that the ITSS algorithm achieves prediction accuracies (precision 97%, recall 77%) comparable to those of other machine learning algorithms when compared under similar conditions over densely annotated portions of the GO datasets. The method can also generate highly accurate predictions in sparsely annotated portions of GO, where previous algorithms have failed. As a result, our technique generates an order of magnitude more functional predictions than previous methods. Over sparsely annotated networks of recent GO annotations (about 1400 GO terms and 11,000 genes in Homo sapiens), 10-fold cross-validation showed a precision of 90% at a recall of 36%. To our knowledge, this article presents the first historical rollback validation of predicted GO annotations, which may represent more realistic conditions than the more widely used cross-validation approaches. By manually assessing a random sample of 100 predictions from a historical rollback evaluation, we estimate that a minimum precision of 51% (95% confidence interval: 43-58%) can be achieved for the human GO Annotation file dated 2003.
AVAILABILITY: The program is available on request. The 97,732 positive predictions of novel gene annotations from the 2005 GO Annotation dataset and other supplementary information are available at http://phenos.bsd.uchicago.edu/ITSS/.
SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
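The abstract does not spell out the ITSS scoring function. As a rough illustration of the general idea it names (information-theoretic semantic similarity over GO annotations, then transfer of annotations from similar genes), here is a minimal Python sketch. It is not the published ITSS algorithm: the Resnik-style information content, Lin's similarity measure, the best-match-average gene similarity, the k-nearest-neighbour voting, the parameters `k` and `threshold`, and the toy ontology and gene names are all assumptions chosen for illustration.

```python
import math
from collections import defaultdict

# Hypothetical toy ontology: term -> set of direct parent terms (a tiny DAG, not real GO).
PARENTS = {
    "a": {"root"}, "b": {"root"},
    "a1": {"a"}, "a2": {"a"}, "b1": {"b"},
}

# Hypothetical toy annotations: gene -> set of directly annotated terms.
ANNOTATIONS = {
    "g1": {"a1", "a2"},
    "g2": {"a1", "a2", "b1"},
    "g3": {"a1", "b1"},
    "g4": {"b1"},
}


def ancestors(term, parents):
    """Return the term together with all of its ancestors in the DAG."""
    seen, stack = {term}, [term]
    while stack:
        for p in parents.get(stack.pop(), ()):
            if p not in seen:
                seen.add(p)
                stack.append(p)
    return seen


def information_content(annotations, parents):
    """IC(t) = -ln p(t), with p(t) the fraction of genes annotated to t or a descendant."""
    counts = defaultdict(int)
    for terms in annotations.values():
        expanded = set().union(*(ancestors(t, parents) for t in terms))
        for t in expanded:
            counts[t] += 1
    n = len(annotations)
    return {t: -math.log(c / n) for t, c in counts.items()}


def lin_similarity(t1, t2, ic, parents):
    """Lin's measure: 2 * IC(most informative common ancestor) / (IC(t1) + IC(t2))."""
    common = ancestors(t1, parents) & ancestors(t2, parents)
    mica = max((ic.get(t, 0.0) for t in common), default=0.0)
    denom = ic.get(t1, 0.0) + ic.get(t2, 0.0)
    return 2 * mica / denom if denom > 0 else 0.0  # 0.0 when both terms carry no information


def gene_similarity(terms1, terms2, ic, parents):
    """Symmetric best-match average of term-term Lin similarities."""
    if not terms1 or not terms2:
        return 0.0

    def best_match(xs, ys):
        return sum(max(lin_similarity(x, y, ic, parents) for y in ys) for x in xs) / len(xs)

    return (best_match(terms1, terms2) + best_match(terms2, terms1)) / 2


def predict(gene, annotations, parents, k=3, threshold=0.5):
    """Score terms held by the k most similar genes but missing from `gene`."""
    ic = information_content(annotations, parents)
    query = annotations[gene]
    neighbours = sorted(
        ((gene_similarity(query, terms, ic, parents), other)
         for other, terms in annotations.items() if other != gene),
        reverse=True,
    )[:k]
    total = sum(sim for sim, _ in neighbours)
    if total == 0:
        return {}
    scores = defaultdict(float)
    for sim, other in neighbours:
        for t in annotations[other] - query:
            scores[t] += sim / total  # similarity-weighted vote share
    return {t: s for t, s in scores.items() if s >= threshold}


print(predict("g4", ANNOTATIONS, PARENTS))
```

On this toy data the sketch proposes `a1` for `g4` (vote share about 1.0) but not `a2` (about 0.47, below the assumed threshold). Because it computes information content from the same annotations it predicts on, a fair cross-validation or historical rollback evaluation would instead build the information content and neighbours from training annotations only (for rollback, an older annotation release) and score predictions against held-out or later annotations.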

Year:  2007        PMID: 17646340      PMCID: PMC2882681          DOI: 10.1093/bioinformatics/btm195

Source DB:  PubMed          Journal:  Bioinformatics        ISSN: 1367-4803            Impact factor:   6.937


