The use of gene ontology evidence codes in preventing classifier assessment bias.

Mark F Rogers, Asa Ben-Hur.

Abstract

MOTIVATION: The biological community's reliance on computational annotations of protein function makes correct assessment of function prediction methods an issue of great importance. The fact that a large fraction of the annotations in current biological databases are based on computational methods can lead to bias in estimating the accuracy of function prediction methods. This can happen since predicting an annotation that was derived computationally in the first place is likely easier than predicting annotations that were derived experimentally, leading to over-optimistic classifier performance estimates.
RESULTS: We illustrate this phenomenon in a set of controlled experiments using a nearest neighbor classifier that uses PSI-BLAST similarity scores. Our results demonstrate that the source of Gene Ontology (GO) annotations used to assess a protein function predictor can have a highly significant influence on classifier accuracy: the average accuracy over four species and over GO terms in the biological process namespace increased from 0.72 to 0.87 when the classifier was given access to annotations that are assigned evidence codes that indicate a possible computational source, instead of experimentally determined annotations. Slightly smaller increases were observed in the other namespaces. In these comparisons the total number of annotations and their distribution across GO terms were kept the same.
CONCLUSION: Taking GO evidence codes into account is required for reporting accuracy statistics that do not overestimate a model's performance, and is of particular importance for a fair comparison of classifiers that rely on different information sources.
SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
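
The comparison described in RESULTS can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not state which evidence codes were placed in each group, so the split below (the standard GO experimental codes versus everything else) is an assumption, and the protein names, GO term IDs, and similarity scores are toy values standing in for real annotations and PSI-BLAST scores. The function names `split_by_evidence` and `nearest_neighbor_terms` are hypothetical.

```python
from collections import defaultdict

# Assumption: the standard GO "experimental" evidence codes. The abstract
# does not list the exact codes used in the paper.
EXPERIMENTAL = {"EXP", "IDA", "IPI", "IMP", "IGI", "IEP"}


def split_by_evidence(annotations):
    """Split (protein, go_term, evidence_code) triples into two
    protein -> set-of-terms maps: experimental and all other codes."""
    experimental = defaultdict(set)
    other = defaultdict(set)
    for protein, term, code in annotations:
        target = experimental if code in EXPERIMENTAL else other
        target[protein].add(term)
    return experimental, other


def nearest_neighbor_terms(query, similarity, labels):
    """1-nearest-neighbor prediction: return the GO terms of the labeled
    protein with the highest similarity score to the query."""
    candidates = [
        (score, protein)
        for protein, score in similarity[query].items()
        if protein in labels and protein != query
    ]
    if not candidates:
        return set()
    _, best = max(candidates)
    return set(labels[best])


# Toy data (hypothetical): three annotated proteins and one query "Q".
annotations = [
    ("P1", "GO:0006810", "IDA"),
    ("P2", "GO:0006810", "IEA"),
    ("P2", "GO:0008152", "ISS"),
    ("P3", "GO:0008152", "IMP"),
]
# Stand-in for PSI-BLAST similarity scores between the query and each protein.
similarity = {"Q": {"P1": 0.4, "P2": 0.9, "P3": 0.7}}

exp, other = split_by_evidence(annotations)
pred_from_experimental = nearest_neighbor_terms("Q", similarity, exp)
pred_from_other = nearest_neighbor_terms("Q", similarity, other)
```

In this toy case the predicted terms differ depending on which annotation source supplies the labels, which is the dependence the abstract warns about. Reproducing the reported accuracy figures would also require holding the number of annotations and their distribution across GO terms fixed, which this sketch does not attempt.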

Year:  2009        PMID: 19254922     DOI: 10.1093/bioinformatics/btp122

Source DB:  PubMed          Journal:  Bioinformatics        ISSN: 1367-4803            Impact factor:   6.937


  22 in total

1.  Transcriptome sequencing and de novo characterization of Korean endemic land snail, Koreanohadra kurodana for functional transcripts and SSR markers.

Authors:  Se Won Kang; Bharat Bhusan Patnaik; Hee-Ju Hwang; So Young Park; Jong Min Chung; Dae Kwon Song; Hongray Howrelia Patnaik; Jae Bong Lee; Changmu Kim; Soonok Kim; Hong Seog Park; Yeon Soo Han; Jun Sang Lee; Yong Seok Lee
Journal:  Mol Genet Genomics       Date:  2016-08-09       Impact factor: 3.291

2.  Androgen-induced Rhox homeobox genes modulate the expression of AR-regulated genes.

Authors:  Zhiying Hu; Dineshkumar Dandekar; Peter J O'Shaughnessy; Karel De Gendt; Guido Verhoeven; Miles F Wilkinson
Journal:  Mol Endocrinol       Date:  2009-11-09

3.  Combining heterogeneous data sources for accurate functional annotation of proteins.

Authors:  Artem Sokolov; Christopher Funk; Kiley Graim; Karin Verspoor; Asa Ben-Hur
Journal:  BMC Bioinformatics       Date:  2013-02-28       Impact factor: 3.169

4.  Gene expression rate comparison for multiple high-throughput datasets.

Authors:  Chien-Ming Chen; Tsan-Huang Shih; Tun-Wen Pai; Zhen-Long Liu; Margaret Dah-Tsyr Chang; Chin-Hwa Hu
Journal:  IET Syst Biol       Date:  2013-10       Impact factor: 1.615

5.  FFPred 2.0: improved homology-independent prediction of gene ontology terms for eukaryotic protein sequences.

Authors:  Federico Minneci; Damiano Piovesan; Domenico Cozzetto; David T Jones
Journal:  PLoS One       Date:  2013-05-22       Impact factor: 3.240

6.  IntelliGO: a new vector-based semantic similarity measure including annotation origin.

Authors:  Sidahmed Benabderrahmane; Malika Smail-Tabbone; Olivier Poch; Amedeo Napoli; Marie-Dominique Devignes
Journal:  BMC Bioinformatics       Date:  2010-12-01       Impact factor: 3.169

7.  An approach to describing and analysing bulk biological annotation quality: a case study using UniProtKB.

Authors:  Michael J Bell; Colin S Gillespie; Daniel Swan; Phillip Lord
Journal:  Bioinformatics       Date:  2012-09-15       Impact factor: 6.937

8.  Protein function prediction by massive integration of evolutionary analyses and multiple data sources.

Authors:  Domenico Cozzetto; Daniel W A Buchan; Kevin Bryson; David T Jones
Journal:  BMC Bioinformatics       Date:  2013-02-28       Impact factor: 3.169

9.  The ortholog conjecture is untestable by the current gene ontology but is supported by RNA sequencing data.

Authors:  Xiaoshu Chen; Jianzhi Zhang
Journal:  PLoS Comput Biol       Date:  2012-11-29       Impact factor: 4.475

10.  A domain-centric solution to functional genomics via dcGO Predictor.

Authors:  Hai Fang; Julian Gough
Journal:  BMC Bioinformatics       Date:  2013-02-28       Impact factor: 3.169
