BACKGROUND: Recent years have seen an explosion in the availability of data in the chemistry domain. With this information explosion, however, retrieving relevant results from the available information, and organising those results, become even harder problems. Computational processing is essential to filter and organise the available resources so as to better facilitate the work of scientists. Ontologies encode expert domain knowledge in a hierarchically organised machine-processable format. One such ontology for the chemical domain is ChEBI. ChEBI provides a classification of chemicals based on their structural features and a role or activity-based classification. An example of a structure-based class is 'pentacyclic compound' (compounds containing five-ring structures), while an example of a role-based class is 'analgesic', since many different chemicals can act as analgesics without sharing structural features. Structure-based classification in chemistry exploits elegant regularities and symmetries in the underlying chemical domain. As yet, there has been neither a systematic analysis of the types of structural classification in use in chemistry nor a comparison to the capabilities of available technologies. RESULTS: We analyze the different categories of structural classes in chemistry, presenting a list of patterns for features found in class definitions. We compare these patterns of class definition to tools which allow for automation of hierarchy construction within cheminformatics and within logic-based ontology technology, going into detail in the latter case with respect to the expressive capabilities of the Web Ontology Language and recent extensions for modelling structured objects. Finally we discuss the relationships and interactions between cheminformatics approaches and logic-based approaches. CONCLUSION: Systems that perform intelligent reasoning tasks on chemistry data require a diverse set of underlying computational utilities including algorithmic, statistical and logic-based tools. For the task of automatic structure-based classification of chemical entities, essential to managing the vast swathes of chemical data being brought online, systems which are capable of hybrid reasoning combining several different approaches are crucial. We provide a thorough review of the available tools and methodologies, and identify areas of open research.
BACKGROUND: Recent years have seen an explosion in the availability of data in the chemistry domain. With this information explosion, however, retrieving relevant results from the available information, and organising those results, become even harder problems. Computational processing is essential to filter and organise the available resources so as to better facilitate the work of scientists. Ontologies encode expert domain knowledge in a hierarchically organised machine-processable format. One such ontology for the chemical domain is ChEBI. ChEBI provides a classification of chemicals based on their structural features and a role or activity-based classification. An example of a structure-based class is 'pentacyclic compound' (compounds containing five-ring structures), while an example of a role-based class is 'analgesic', since many different chemicals can act as analgesics without sharing structural features. Structure-based classification in chemistry exploits elegant regularities and symmetries in the underlying chemical domain. As yet, there has been neither a systematic analysis of the types of structural classification in use in chemistry nor a comparison to the capabilities of available technologies. RESULTS: We analyze the different categories of structural classes in chemistry, presenting a list of patterns for features found in class definitions. We compare these patterns of class definition to tools which allow for automation of hierarchy construction within cheminformatics and within logic-based ontology technology, going into detail in the latter case with respect to the expressive capabilities of the Web Ontology Language and recent extensions for modelling structured objects. Finally we discuss the relationships and interactions between cheminformatics approaches and logic-based approaches. CONCLUSION: Systems that perform intelligent reasoning tasks on chemistry data require a diverse set of underlying computational utilities including algorithmic, statistical and logic-based tools. For the task of automatic structure-based classification of chemical entities, essential to managing the vast swathes of chemical data being brought online, systems which are capable of hybrid reasoning combining several different approaches are crucial. We provide a thorough review of the available tools and methodologies, and identify areas of open research.
Authors: M Ashburner; C A Ball; J A Blake; D Botstein; H Butler; J M Cherry; A P Davis; K Dolinski; S S Dwight; J T Eppig; M A Harris; D P Hill; L Issel-Tarver; A Kasarskis; S Lewis; J C Matese; J E Richardson; M Ringwald; G M Rubin; G Sherlock Journal: Nat Genet Date: 2000-05 Impact factor: 38.330
Authors: Gil Alterovitz; Michael Xiang; David P Hill; Jane Lomax; Jonathan Liu; Michael Cherkassky; Jonathan Dreyfuss; Chris Mungall; Midori A Harris; Mary E Dolan; Judith A Blake; Marco F Ramoni Journal: Nat Biotechnol Date: 2010-02 Impact factor: 54.908
Authors: Leonid L Chepelev; Janna Hastings; Marcus Ennis; Christoph Steinbeck; Michel Dumontier Journal: BMC Bioinformatics Date: 2012-01-06 Impact factor: 3.169
Authors: Lee Harland; Christopher Larminie; Susanna-Assunta Sansone; Sorana Popa; M Scott Marshall; Michael Braxenthaler; Michael Cantor; Wendy Filsell; Mark J Forster; Enoch Huang; Andreas Matern; Mark Musen; Jasmin Saric; Ted Slater; Jabe Wilson; Nick Lynch; John Wise; Ian Dix Journal: Drug Discov Today Date: 2011-09-23 Impact factor: 7.851
Authors: Mélanie Courtot; Nick Juty; Christian Knüpfer; Dagmar Waltemath; Anna Zhukova; Andreas Dräger; Michel Dumontier; Andrew Finney; Martin Golebiewski; Janna Hastings; Stefan Hoops; Sarah Keating; Douglas B Kell; Samuel Kerrien; James Lawson; Allyson Lister; James Lu; Rainer Machne; Pedro Mendes; Matthew Pocock; Nicolas Rodriguez; Alice Villeger; Darren J Wilkinson; Sarala Wimalaratne; Camille Laibe; Michael Hucka; Nicolas Le Novère Journal: Mol Syst Biol Date: 2011-10-25 Impact factor: 11.429
Authors: S Kerrien; Y Alam-Faruque; B Aranda; I Bancarz; A Bridge; C Derow; E Dimmer; M Feuermann; A Friedrichsen; R Huntley; C Kohler; J Khadake; C Leroy; A Liban; C Lieftink; L Montecchi-Palazzi; S Orchard; J Risse; K Robbe; B Roechert; D Thorneycroft; Y Zhang; R Apweiler; H Hermjakob Journal: Nucleic Acids Res Date: 2006-12-01 Impact factor: 16.971
Authors: Paula de Matos; Rafael Alcántara; Adriano Dekker; Marcus Ennis; Janna Hastings; Kenneth Haug; Inmaculada Spiteri; Steve Turner; Christoph Steinbeck Journal: Nucleic Acids Res Date: 2009-10-23 Impact factor: 16.971