Learning chemistry: exploring the suitability of machine learning for the task of structure-based chemical ontology classification.

Janna Hastings, Martin Glauer, Adel Memariani, Fabian Neuhaus, Till Mossakowski.

Abstract

Chemical data is increasingly openly available in databases such as PubChem, which contains approximately 110 million compound entries as of February 2021. With the availability of data at such scale, the burden has shifted to organisation, analysis and interpretation. Chemical ontologies provide structured classifications of chemical entities that can be used for navigation and filtering of the large chemical space. ChEBI is a prominent example of a chemical ontology, widely used in life science contexts. However, ChEBI is manually maintained and as such cannot easily scale to the full scope of public chemical data. There is a need for tools that are able to automatically classify chemical data into chemical ontologies, which can be framed as a hierarchical multi-class classification problem. In this paper we evaluate machine learning approaches for this task, comparing different learning frameworks including logistic regression, decision trees and long short-term memory artificial neural networks, and different encoding approaches for the chemical structures, including cheminformatics fingerprints and character-based encoding from chemical line notation representations. We find that classical learning approaches such as logistic regression perform well with sets of relatively specific, disjoint chemical classes, while the neural network is able to handle larger sets of overlapping classes but needs more examples per class to learn from, and is not able to make a class prediction for every molecule. Future work will explore hybrid and ensemble approaches, as well as alternative network architectures including neuro-symbolic approaches.
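The abstract mentions two ideas that a small example can make concrete: character-based encoding of a chemical line notation, and classification into a hierarchy of classes. The following is a minimal sketch only, not the authors' implementation. It assumes that the line notation is SMILES, that each character is treated as one token, and that a molecule assigned to a class also belongs to every superclass. The function names and the three-class toy hierarchy are hypothetical and chosen for illustration.

```python
from typing import Dict, List, Set


def build_vocab(smiles_list: List[str]) -> Dict[str, int]:
    """Map each distinct character to a positive integer; 0 is reserved for padding."""
    chars = sorted({ch for s in smiles_list for ch in s})
    return {ch: i + 1 for i, ch in enumerate(chars)}


def encode(smiles: str, vocab: Dict[str, int], max_len: int) -> List[int]:
    """Encode one SMILES string as a fixed-length list of integer indices.

    Strings longer than max_len are truncated; shorter ones are zero-padded.
    A character missing from the vocabulary raises KeyError.
    """
    ids = [vocab[ch] for ch in smiles[:max_len]]
    return ids + [0] * (max_len - len(ids))


def with_ancestors(label: str, parents: Dict[str, List[str]]) -> Set[str]:
    """Return the label together with all of its superclasses in the hierarchy."""
    seen: Set[str] = set()
    stack = [label]
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.add(node)
            stack.extend(parents.get(node, []))
    return seen


# Toy input: ethanol and acetic acid as SMILES strings.
smiles = ["CCO", "CC(=O)O"]
vocab = build_vocab(smiles)
encoded = [encode(s, vocab, max_len=8) for s in smiles]

# Hypothetical fragment of a class hierarchy (child -> list of parents).
parents = {
    "primary alcohol": ["alcohol"],
    "alcohol": ["organic hydroxy compound"],
}
labels = with_ancestors("primary alcohol", parents)

print(encoded)
print(sorted(labels))
```

Pure character-level encoding splits two-letter element symbols such as `Cl` into separate tokens; a tokenizer that is aware of SMILES atoms would avoid this at the cost of a larger vocabulary. The integer sequences are the kind of input a sequence model such as an LSTM could consume, while the expanded label sets give one target per class in the hierarchy.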

Keywords:  Automated classification; Chemical ontology; LSTM; Machine learning

Year: 2021   PMID: 33726837   PMCID: PMC7962259   DOI: 10.1186/s13321-021-00500-8

Source DB: PubMed   Journal: J Cheminform   ISSN: 1758-2946   Impact factor: 5.514


