Tingyang Li (1), Ashootosh Tripathi (2,3), Fengan Yu (2), David H Sherman (2,3,4), Arvind Rao (1,5,6)

1. Department of Computational Medicine and Bioinformatics, MI, USA.
2. Natural Products Discovery Core, Life Sciences Institute, MI, USA.
3. Department of Medicinal Chemistry, MI, USA.
4. Department of Chemistry, Department of Microbiology and Immunology, MI, USA.
5. Department of Radiation Oncology, Michigan Institute for Data Science and Department of Biomedical Engineering, University of Michigan, Ann Arbor, MI, USA.
6. Department of Electrical and Computer Engineering, Rice University, Houston, TX, USA.
Abstract
SUMMARY: DDAP is a tool for predicting the biosynthetic pathways of type I modular polyketide synthase (PKS) products, with a focus on predicting the order of proteins and substrates in the pathway more accurately. On a hold-out test dataset, module docking domain (DD) affinity prediction reached an area under the receiver operating characteristic (ROC) curve (AUC) of 0.88, and pathway prediction reached a mean reciprocal rank (MRR) of 0.67. DDAP has several advantages over previous informatics tools: (i) it does not rely on large databases, which makes it efficient; (ii) the predicted DD affinity is expressed as a probability (0-1), which is more intuitive than a raw score; and (iii) its performance is competitive with the currently popular rule-based algorithm. DDAP is, to date, the first machine-learning-based algorithm for type I PKS DD affinity and pathway prediction. We also established the first database of type I modular PKSs, featuring comprehensive annotation of the available docking domain information in bacterial biosynthetic pathways. AVAILABILITY AND IMPLEMENTATION: The DDAP database is available at https://tylii.github.io/ddap. The DDAP prediction algorithm is freely available on GitHub (https://github.com/tylii/ddap) and released under the MIT license. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
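For readers unfamiliar with the two metrics quoted in the summary, the following is a minimal sketch of how AUC and MRR are commonly computed. It is not taken from the DDAP code base: the function names `roc_auc` and `mean_reciprocal_rank` and the toy inputs are hypothetical, and the example values are illustrative only, not results from the study.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a randomly chosen positive pair
    receives a higher score than a randomly chosen negative pair
    (ties count as one half)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def mean_reciprocal_rank(ranks: Sequence[int]) -> float:
    """MRR over queries, where ranks[i] is the 1-based position of the
    correct answer (here: the true pathway) in the i-th ranked list."""
    if not ranks:
        raise ValueError("ranks must not be empty")
    if any(r < 1 for r in ranks):
        raise ValueError("ranks are 1-based and must be >= 1")
    return sum(1.0 / r for r in ranks) / len(ranks)


# Hypothetical toy data: 1 = interacting DD pair, 0 = non-interacting pair,
# with predicted affinities expressed as probabilities in [0, 1].
toy_labels = [1, 1, 0, 0]
toy_scores = [0.9, 0.4, 0.6, 0.1]
toy_auc = roc_auc(toy_labels, toy_scores)

# Hypothetical ranks of the true pathway for three gene clusters.
toy_mrr = mean_reciprocal_rank([1, 2, 4])
```

Under this definition, an MRR of 1.0 would mean the true pathway is always ranked first, and an MRR of 0.5 would correspond to it being ranked second in every case.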