BACKGROUND: Ontologies and taxonomies are among the most important computational resources for molecular biology and bioinformatics. A series of recent papers has shown that the Gene Ontology (GO), the most prominent taxonomic resource in these fields, is marked by flaws of certain characteristic types, which stem from a failure to address basic ontological principles. As yet, no methods have been proposed that would allow ontology curators to pinpoint flawed terms or definitions in ontologies systematically.

RESULTS: We present computational methods that automatically identify terms whose definitions are circular or unintelligible. We demonstrate the potential of these methods by applying them to isolate a subset of 6001 problematic GO terms. By automatically aligning GO with other ontologies and taxonomies, we were able to propose alternative synonyms and definitions for some of these problematic terms. This alignment also shows that the other resources do not contain definitions superior to those supplied by GO.

CONCLUSION: Our methods provide reliable indications of the quality of terms and definitions in ontologies and taxonomies, and they are well suited to drawing curators' attention to ill-defined terms. We have also shown the limitations of ontology mapping and alignment in helping curators rectify these problems, which points to the need for manual curation.