Kyungrin Noh, Sunyong Yoo, Doheon Lee.
Abstract
BACKGROUND: Natural products have been widely investigated in drug development. Their traditional use as medicinal agents and their resemblance to human endogenous compounds suggest potential for new drug development. Many researchers have focused on identifying the therapeutic effects of natural products, yet the resemblance between natural products and human metabolites has rarely been examined.
Keywords: Data mining; Human metabolite; Medicinal compound; Natural product; Similarity-based prediction
Year: 2018 PMID: 29897322 PMCID: PMC5998763 DOI: 10.1186/s12859-018-2196-0
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Fig. 1. A systematic overview of the score matrix generation. Three scoring features (structure, target, and phenotype) are used to generate a score matrix over all possible human metabolite and natural product pairs. a Structural similarity is measured by the Tanimoto coefficient. Molecular fingerprints of the human metabolite and the natural product are compared using the Tanimoto coefficient with the bond-sensitive, remove-hydrogen, and stereo filter settings in CDK. b Target similarity is measured by amino acid sequence similarity between two target proteins. The amino acid sequences of the target proteins of each human metabolite and natural product are compared using the Smith-Waterman algorithm. Among all possible target sequence similarity scores, those in the top 5% of the score distribution are selected and averaged to give the target similarity score of a human metabolite and natural product pair. c Phenotype similarity is measured by the random walk with restart algorithm on the CODA network. For a human metabolite and natural product pair, the vectors of all phenotype scores are compared by Pearson’s correlation, and the absolute value of the correlation is taken as the score for the pair. d The final score matrix is generated for all human metabolite and natural product pairs. The matrix is later used to train the SVM model.
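The three per-pair scores described in the Fig. 1 caption can be sketched in a few lines of plain Python. This is an illustrative sketch, not the authors' pipeline: fingerprints are assumed to be sets of on-bit indices (the caption uses CDK fingerprints), the Smith-Waterman scores and random-walk phenotype vectors are assumed to be precomputed, "top 5%" is read as the highest-scoring 5% of the target-pair scores for one metabolite and natural product pair, and all function names are hypothetical.

```python
import math


def tanimoto(fp_a, fp_b):
    """Tanimoto coefficient of two fingerprints given as sets of on-bit indices."""
    a, b = set(fp_a), set(fp_b)
    union = len(a | b)
    if union == 0:
        return 0.0  # assumption: two empty fingerprints score 0
    return len(a & b) / union


def top_fraction_mean(scores, fraction=0.05):
    """Mean of the highest `fraction` of scores (always at least one score).

    `scores` stands in for precomputed Smith-Waterman similarities between
    every target of the metabolite and every target of the natural product.
    """
    if not scores:
        raise ValueError("scores must not be empty")
    ranked = sorted(scores, reverse=True)
    k = max(1, math.ceil(len(ranked) * fraction))
    return sum(ranked[:k]) / k


def abs_pearson(x, y):
    """Absolute Pearson correlation of two equal-length phenotype score vectors."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("vectors must have equal length of at least 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # assumption: a constant vector carries no correlation signal
    return abs(cov / math.sqrt(var_x * var_y))


def score_row(fp_a, fp_b, target_scores, pheno_a, pheno_b):
    """One row of the score matrix: (structure, target, phenotype)."""
    return (
        tanimoto(fp_a, fp_b),
        top_fraction_mean(target_scores),
        abs_pearson(pheno_a, pheno_b),
    )
```

Each call to `score_row` yields one feature vector; stacking these rows over all pairs would give a matrix of the kind the caption says is used to train the SVM model.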
Fig. 2. AUROC values of models generated from different feature sets. SVM models trained with single features are compared with the model trained with the whole feature set. a For the random test set, the SVM model trained only with the structure feature shows an AUROC as high as that of the model trained with the whole feature set. b For the structurally similar test set (Tanimoto score ≥ 0.77), all features contribute to improving the overall performance.
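For readers unfamiliar with the metric in Fig. 2, AUROC can be computed directly as the probability that a randomly chosen positive pair receives a higher model score than a randomly chosen negative pair, with ties counted as one half. The sketch below assumes binary labels (1 for positive, 0 for negative) and uses made-up scores; it does not reproduce the paper's evaluation code or data.

```python
def auroc(labels, scores):
    """AUROC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    This is the Mann-Whitney formulation, quadratic in the number of samples.
    """
    pos = [s for label, s in zip(labels, scores) if label == 1]
    neg = [s for label, s in zip(labels, scores) if label == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy example with hypothetical decision scores from a classifier.
example = auroc([1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1])
```

In the toy example three of the four positive-negative pairs are ordered correctly, so the value is 0.75; a value of 0.5 corresponds to random ranking.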
Fig. 3. Performance of predicting natural product-derived drugs. For all natural product-derived drugs listed in DrugBank, we evaluated whether our model can predict at least one of their indications. The blue bars represent natural product and human metabolite pairs predicted by our method as being similar, and the grey bars represent randomly matched natural product and human metabolite pairs. The predicted pairs generally show four times higher performance than the random pairs.
Literature validation for natural products, human metabolites and phenotype association
| Measure | Natural product-human metabolite (our method) | Natural product-human metabolite (random) | Natural product-phenotype (our method) | Natural product-phenotype (random) |
|---|---|---|---|---|
| Co-occurrence | 45.09 | 3.77 | 3.78 | 1.47 |
| Jaccard index | 1.87 × 10^-3 | 8.21 × 10^-5 | 1.13 × 10^-3 | 1.16 × 10^-4 |
| Fisher’s exact test (a) | 266 | 38 | 52 | 6 |

(a) The number of significant associations satisfying the Fisher’s exact test p-value threshold (p-value < 0.001)
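The Jaccard index and Fisher’s exact test in the table can both be derived from literature co-occurrence counts. The sketch below assumes that each entity is represented by the number of abstracts mentioning it (`n_a`, `n_b`), the number mentioning both (`n_both`), and the corpus size (`n_total`). These names, and the choice of a one-sided test, are assumptions for illustration, since the source does not state how the counts were collected or which alternative was tested.

```python
from math import comb


def jaccard_index(n_a, n_b, n_both):
    """Jaccard index: shared abstracts divided by abstracts mentioning either entity."""
    union = n_a + n_b - n_both
    return n_both / union if union else 0.0


def fisher_exact_greater(n_both, n_a, n_b, n_total):
    """One-sided Fisher's exact test p-value for over-representation.

    Probability of observing at least `n_both` co-mentions when `n_b`
    abstracts are drawn from `n_total`, of which `n_a` mention entity A
    (upper tail of the hypergeometric distribution).
    """
    upper = min(n_a, n_b)
    tail = sum(
        comb(n_a, k) * comb(n_total - n_a, n_b - k)
        for k in range(n_both, upper + 1)
    )
    return tail / comb(n_total, n_b)


def is_significant(n_both, n_a, n_b, n_total, threshold=0.001):
    """Apply the p-value < 0.001 threshold quoted in the table footnote."""
    return fisher_exact_greater(n_both, n_a, n_b, n_total) < threshold
```

Counting the pairs for which `is_significant` returns true would give a tally comparable in kind to the last row of the table, under the stated assumptions.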