
A machine learning-based method for prediction of macrocyclization patterns of polyketides and non-ribosomal peptides.

Priyesh Agrawal, Debasisa Mohanty.

Abstract

MOTIVATION: Even though genome mining tools have successfully identified large numbers of non-ribosomal peptide synthetase (NRPS) and polyketide synthase (PKS) biosynthetic gene clusters (BGCs) in bacterial genomes, currently no tool can predict the chemical structure of the secondary metabolites biosynthesized by these BGCs. Lack of algorithms for predicting complex macrocyclization patterns of linear PK/NRP biosynthetic intermediates has been the major bottleneck in deciphering the final bioactive chemical structures of PKs/NRPs by genome mining.
RESULTS: Using a large dataset of known chemical structures of macrocyclized PKs/NRPs, we have developed a machine learning (ML) algorithm for distinguishing the correct macrocyclization pattern of PKs/NRPs from the library of all theoretically possible cyclization patterns. Benchmarking of this ML classifier on completely independent datasets has revealed ROC-AUC and PR-AUC values of 0.82 and 0.81, respectively. This cyclization prediction algorithm has been used to develop SBSPKSv3, a genome mining tool for completely automated prediction of macrocyclized structures of NRPs/PKs. SBSPKSv3 has been extensively benchmarked on a dataset of over 100 BGCs with known PK/NRP products.
AVAILABILITY AND IMPLEMENTATION: The macrocyclization prediction pipeline and all the datasets used in this study are freely available at http://www.nii.ac.in/sbspks3.html.
SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
© The Author(s) 2020. Published by Oxford University Press. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com.
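
The two benchmark metrics quoted in RESULTS (ROC-AUC and PR-AUC) can be illustrated with a minimal sketch. This is not the authors' pipeline: the labels and scores below are made-up toy values, the function names are hypothetical, and estimating PR-AUC as average precision is an assumption, since the abstract does not state how PR-AUC was computed. Each candidate cyclization pattern is assumed to carry a classifier score and a label (1 for the known correct pattern, 0 otherwise).

```python
def roc_auc(labels, scores):
    """ROC-AUC as the fraction of (positive, negative) pairs ranked correctly.

    Tied scores count as half a correct pair.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def average_precision(labels, scores):
    """Average precision: mean of precision at the rank of each positive.

    Used here as a simple estimate of PR-AUC; tied scores are ordered
    arbitrarily by the sort, which is adequate for a toy example.
    """
    total_pos = sum(labels)
    if total_pos == 0:
        raise ValueError("need at least one positive example")
    ranked = sorted(zip(scores, labels), key=lambda t: -t[0])
    hits = 0
    ap = 0.0
    for rank, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            hits += 1
            ap += hits / rank
    return ap / total_pos


# Toy data (hypothetical): 1 = correct cyclization pattern, 0 = decoy pattern.
toy_labels = [1, 0, 1, 0, 0, 1]
toy_scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.2]

print(f"ROC-AUC: {roc_auc(toy_labels, toy_scores):.3f}")
print(f"PR-AUC (average precision): {average_precision(toy_labels, toy_scores):.3f}")
```

In this framing, each BGC typically yields one correct pattern among many theoretically possible ones. PR-AUC is informative alongside ROC-AUC in such cases because it is more sensitive to how highly the few positives are ranked.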


Year:  2021        PMID: 33010151     DOI: 10.1093/bioinformatics/btaa851

Source DB:  PubMed          Journal:  Bioinformatics        ISSN: 1367-4803            Impact factor:   6.937


  2 in total

Review 1.  Mining genomes to illuminate the specialized chemistry of life.

Authors:  Marnix H Medema; Tristan de Rond; Bradley S Moore
Journal:  Nat Rev Genet       Date:  2021-06-03       Impact factor: 53.242

Review 2.  Natural product drug discovery in the artificial intelligence era.

Authors:  F I Saldívar-González; V D Aldas-Bulos; J L Medina-Franco; F Plisson
Journal:  Chem Sci       Date:  2021-12-13       Impact factor: 9.825

