Priyesh Agrawal, Shradha Khater, Money Gupta, Neetu Sain, Debasisa Mohanty.
Abstract
Ribosomally synthesized and post-translationally modified peptides (RiPPs) constitute a rapidly growing class of natural products with diverse structures and bioactivities. We have developed RiPPMiner, a novel bioinformatics resource for deciphering chemical structures of RiPPs by genome mining. RiPPMiner derives its predictive power from machine-learning-based classifiers, trained using a well-curated database of more than 500 experimentally characterized RiPPs. RiPPMiner uses a Support Vector Machine to distinguish RiPP precursors from other small proteins and to classify the precursors into 12 sub-classes of RiPPs. For classes such as lanthipeptide, cyanobactin, lasso peptide and thiopeptide, RiPPMiner can predict the leader cleavage site and complex cross-links between post-translationally modified residues starting from genome sequences. RiPPMiner can identify the correct cross-link pattern in a core peptide from among a very large number of combinatorial possibilities. Benchmarking of the prediction accuracy of RiPPMiner on a large lanthipeptide dataset indicated high sensitivity, specificity, accuracy and precision. RiPPMiner also provides interfaces for visualizing the chemical structure, downloading simplified molecular-input line-entry system (SMILES) strings and searching for RiPPs having similar sequences or chemical structures. The backend database of RiPPMiner provides information about the modification system, precursor sequence, leader and core sequence, modified residues, cross-links and gene cluster for more than 500 experimentally characterized RiPPs. RiPPMiner is available at http://www.nii.ac.in/rippminer.html.
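The benchmarking metrics named in the abstract (sensitivity, specificity, accuracy, precision) and the MCC have standard definitions in terms of true/false positives and negatives. The following is a minimal Python sketch of those definitions; the function name `binary_metrics` and the example counts are hypothetical and are not taken from the paper.

```python
import math


def _ratio(num, den):
    """Return num/den, or 0.0 when the denominator is zero."""
    return num / den if den else 0.0


def binary_metrics(tp, fp, tn, fn):
    """Standard binary-classification metrics from confusion-matrix counts."""
    mcc_den = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "sensitivity": _ratio(tp, tp + fn),   # true positive rate (recall)
        "specificity": _ratio(tn, tn + fp),   # true negative rate
        "precision": _ratio(tp, tp + fp),     # positive predictive value
        "accuracy": _ratio(tp + tn, tp + fp + tn + fn),
        "mcc": _ratio(tp * tn - fp * fn, mcc_den),  # Matthews correlation coefficient
    }


# Hypothetical counts, chosen only for illustration (not results from the paper).
m = binary_metrics(tp=90, fp=10, tn=85, fn=15)
```

Unlike accuracy, the MCC stays informative when positives and negatives are strongly imbalanced, which is one common reason for reporting it alongside the other metrics.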
Year: 2017 PMID: 28499008 PMCID: PMC5570163 DOI: 10.1093/nar/gkx408
Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971
Figure 1. Schematic diagram depicting the overall organization of the RiPPMiner web server. The RiPPMiner web server consists of two major components: the backend database RiPPDB and the query interface RiPPMiner. The backend database catalogs information on experimentally characterized ribosomally synthesized and post-translationally modified peptides (RiPPs), while the query interface is built on machine-learning-based analysis of the sequence and chemical structure data of known RiPPs in RiPPDB.
Figure 2. Screenshots depicting the query interface of RiPPMiner and various features of known RiPPs present in RiPPDB. The textbox for providing the sequence of the RiPP for prediction of cleavage site and cross-links (Panel 1). Statistics on the number of RiPPs in each of the 11 RiPP classes (Panel 2). Screenshot depicting various features of experimentally characterized RiPPs cataloged in RiPPDB, using the lanthipeptide Ericin A as an example (Panel 3).
Figure 3. Typical output screen of RiPPMiner for cleavage and cross-link prediction for a lanthipeptide and subsequent analysis of the results. Screenshot depicting the predicted RiPP class, cleavage site, cross-links and chemical structures for a lanthipeptide (Panel 1). Search results for RiPPs in RiPPDB having a precursor sequence similar to the query RiPP (Panel 2). BLAST alignment of the query RiPP sequence with matching RiPPs in RiPPDB (Panel 3). SMILES code for the predicted cross-linked structure (Panel 4). Results of the search for known RiPPs having chemical structure similarity to the predicted cross-linked structure (Panel 5).
Summary of benchmarking results
| Prediction type | Classifier type | Cross validation | AUC-ROC | Sensitivity | Specificity | MCC | Precision |
|---|---|---|---|---|---|---|---|
| | SVM | 2-FOLD | 0.96 | 0.93 | 0.90 | 0.85 | 0.90 |
| | Multi Class SVM | LOO | | | | | |
| | SVM | LOO | 0.97 | 0.71 | 0.99 | 0.69 | 0.69 |
| | SVM | 2-FOLD | 0.97 | | | | |
| | RF | LOO | 0.90 | 0.72 | 0.95 | 0.73 | 0.68 |
| | RF | 2-FOLD# | 0.81 | | | | |
| | RF | 2-FOLD* | 0.92 | | | | |
| | SVM | LOO | 0.81 | 0.57 | 0.94 | 0.63 | 0.54 |
| | SVM | 2-FOLD# | 0.76 | | | | |
| | SVM | 2-FOLD* | 0.87 | | | | |
| Lasso peptide cleavage and cross-link | SVM | LOO | 0.99 | In 83% (50 out of 60) of the test cases correct prediction was in top rank, while in 92% of the test cases correct prediction was in top two ranks. | | | |
| Cyanobactin core peptide | SVM RSII@ | LOO | 0.96 | Correct prediction could be done in all cases in a dataset consisting of 21 fragments with heterocycle rings and 7 fragments without heterocycle rings. | | | |
| | SVM RSIII@ | LOO | 0.95 | | | | |
| Thiopeptide | Motif Based | | | Correct cross-links could be predicted in 28 out of 35 thiopeptides | | | |
For RiPP class prediction, the sensitivity, specificity and MCC values indicated by underline are averages over all 12 RiPP classes.
#For validation of lanthipeptide prediction, the dataset was divided into two halves at the cyclizable-fragment level (i.e. sub-sequences of the type Ser/Thr-(X)n-Cys or Cys-(X)n-Ser/Thr).
*For validation of lanthipeptide prediction, the dataset was divided into two halves at the lanthipeptide level.
@Each core sequence of a cyanobactin is flanked by an N-terminal recognition sequence (RSII) and a C-terminal recognition sequence (RSIII).
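The difference between the two 2-FOLD schemes marked # and * can be illustrated with a small Python sketch. It enumerates cyclizable fragments of the type Ser/Thr-(X)n-Cys or Cys-(X)n-Ser/Thr in a core sequence and then halves the data either at the peptide level or at the fragment level. Everything here is an assumption for illustration: the helper names, the toy sequences, the choice to allow n = 0 and overlapping fragments, and the random halving are not taken from the paper and do not describe RiPPMiner's actual implementation.

```python
import random

DONORS = {"S", "T"}  # Ser/Thr, one-letter amino acid codes


def cyclizable_fragments(core):
    """List (start, end, fragment) for every sub-sequence that starts with
    Ser/Thr and ends with Cys, or starts with Cys and ends with Ser/Thr.
    Assumption: any spacer length n >= 0 is allowed."""
    frags = []
    for i, a in enumerate(core):
        for j in range(i + 1, len(core)):
            b = core[j]
            if (a in DONORS and b == "C") or (a == "C" and b in DONORS):
                frags.append((i, j, core[i:j + 1]))
    return frags


def two_fold_split(items, seed=0):
    """Shuffle reproducibly and cut the items into two halves."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]


# Hypothetical toy core sequences, not real lanthipeptides.
peptides = {"pepA": "SACT", "pepB": "TCCS"}

# Peptide-level split (*): whole peptides are assigned to a half,
# so all fragments of one peptide stay together.
names_a, names_b = two_fold_split(sorted(peptides))

# Fragment-level split (#): fragments are pooled first, so fragments
# of the same peptide may end up in different halves.
all_frags = [(name, frag)
             for name, seq in peptides.items()
             for frag in cyclizable_fragments(seq)]
frags_a, frags_b = two_fold_split(all_frags)
```

Under these assumptions, a fragment-level split lets closely related fragments from one peptide appear in both halves, whereas a peptide-level split keeps them together.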