Florbela Pereira, Diogo A R S Latino, Susana P Gaudêncio.
Abstract
The comprehensive information on small molecules and their biological activities in the PubChem database gives chemoinformatics researchers access to large-scale biological activity data that can improve the precision of drug profiling. A Quantitative Structure-Activity Relationship (QSAR) classification approach was used to predict active/inactive compounds with respect to overall biological activity, antitumor activity and antibiotic activity, using a data set of 1804 compounds from PubChem. Using the best classification models for antibiotic and antitumor activities, a data set of marine and microbial natural products from the AntiMarin database was screened; 57 and 16 new lead compounds were proposed for antibiotic and antitumor drug design, respectively. All compounds proposed by our approach are classified as non-antibiotic and non-antitumor compounds in the AntiMarin database. Several of the lead-like compounds we proposed have recently been reported as active in the literature.
Year: 2014 PMID: 24473174 PMCID: PMC3944514 DOI: 10.3390/md12020757
Source DB: PubMed Journal: Mar Drugs ISSN: 1660-3397 Impact factor: 5.118
Comparison of different machine learning techniques for building QSAR classification models with CDK descriptors.
| ML | SVM a | RF b | CT | |
|---|---|---|---|---|
| Overall | YES f | YES | YES | YES |
| Antitumor | YES | YES | YES | YES |
| Antibiotic | YES | YES | YES | YES |
a Ten-fold cross-validation; b out-of-bag; c the ratio of true positives to the sum of true positives and false negatives; d the ratio of true negatives to the sum of true negatives and false positives; e the square root of the product of sensitivity and specificity; f active; g non-active.
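The definitions in footnotes c, d and e can be written out as a short calculation. The following is a minimal sketch, assuming the counts come from a binary confusion matrix; the function name and the example counts are illustrative and are not taken from the paper.

```python
import math


def classification_metrics(tp, fn, tn, fp):
    """Return (sensitivity, specificity, g_mean) from confusion-matrix counts."""
    # Sensitivity: true positives over all truly active compounds.
    sensitivity = tp / (tp + fn)
    # Specificity: true negatives over all truly non-active compounds.
    specificity = tn / (tn + fp)
    # G-mean: square root of the product of sensitivity and specificity.
    g_mean = math.sqrt(sensitivity * specificity)
    return sensitivity, specificity, g_mean


# Illustrative counts only (assumption), not results reported in the paper.
sens, spec, g = classification_metrics(tp=80, fn=20, tn=90, fp=10)
print(f"sensitivity={sens:.2f} specificity={spec:.2f} g-mean={g:.3f}")
```

Because the G-mean is a geometric mean, it drops towards zero when either class is predicted poorly, which makes it a useful summary when active and non-active classes are imbalanced.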
Figure 1. Comparison of the van der Waals volume of: (a) 2706 PubChem compounds, and (b) 1192 AntiMarin compounds containing different ring numbers.
Scheme 1. The selected 28 lead bioactive MNPs and MbNPs from the AntiMarin database using the RF activity model.
Scheme 2. The selected 16 lead antitumor MNPs and MbNPs from the AntiMarin database using the RF antitumor model.
Scheme 3. The reported compounds from training and test sets.
Scheme 4. The unreported 18 lead antibiotic MNPs and MbNPs from the AntiMarin database using the RF antitumor model with a Prob_antibiotic greater than or equal to 0.9.
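Selecting leads by a probability cut-off, as in the caption above, amounts to a simple filter over the model's predicted class probabilities. The sketch below assumes each compound already has a predicted probability of being antibiotic (for a random forest this is commonly the fraction of trees voting for the class); the compound labels and values are hypothetical placeholders, not data from the paper.

```python
def select_leads(probabilities, threshold=0.9):
    """Return the names of compounds whose predicted probability is >= threshold."""
    return sorted(name for name, p in probabilities.items() if p >= threshold)


# Hypothetical compound labels and probabilities, for illustration only.
predicted = {"compound_A": 0.95, "compound_B": 0.90, "compound_C": 0.72}
leads = select_leads(predicted)
print(leads)
```

The comparison is inclusive, matching the "greater than or equal to 0.9" wording of the caption.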
Comparison of the descriptors selected, and of descriptor importance, in building the QSAR models for the prediction of overall activity, antitumor and antibiotic activities.
| Activity | Model | CDK Descriptors |
|---|---|---|
| Overall biological activity | SVM a | 20D: ALogp2; BCUTc-1l; BCUTp-1h; PPSA-2; FPSA-3; TPSA; RHSA; Wlambda2.unity; ATSc3; ATSc4; C3SP2; SCH-5; SP-6; VP-7; khs.ssCH2; khs.dsCH; khs.sssCH; khs.sNH2; MDEC-33; TopoPSA |
| Overall biological activity | RF (MeanDecreaseAccuracy b) | Weta1.unity; FMF; BCUTw-1l; HybRatio; PNSA-2; ATSm1; FPSA-1; bpol; BCUTp-1l; TPSA |
| Overall biological activity | CT | 7D: SP-6; BCUTc-1h; Wnu1.unity; Weta1.unity; SC-5; ATSc4; PPSA-3 |
| Antitumor activity | SVM a | 43D: ALogp2; AMR; BCUTw-1h; BCUTp-1l; PNSA-3; FPSA-3; FNSA-2; WNSA-3; TPSA; naAromAtom; nAromBond; ATSc2; ATSc3; ATSc4; ATSc5; ATSm5; bpol; C1SP2; C2SP2; SCH-4; VCH-4; VCH-5; VCH-7; VC-6; SPC-4; FMF; HybRatio; khs.dsCH; khs.aaCH; khs.sssCH; khs.tsC; khs.sNH2; khs.dO; khs.ssO; khs.sF; LOBMIN; MDEC-12; MDEC-13; MDEC-22; MDEO-11; MDEO-12; MDEO-22; TopoPSA |
| Antitumor activity | RF (MeanDecreaseAccuracy b) | MDEO-12; XLogP; khs.sssCH; ATSc5; FMF; MDEC-33; TopoPSA; MDEO-11; VC-5; MDEC-22 |
| Antitumor activity | CT | 9D: MDEO-12; khs.sssCH; MDEO-11; VC-6; ALogp2; SCH-7; BCUTc-1h; C2SP2; BCUTp-1l |
| Antibiotic activity | SVM a | 38D: ALogP; BCUTw-1h; BCUTp-1l; DPSA-3; FPSA-3; FNSA-2; RPCG; RNCS; TPSA; Wnu1.unity; nAromBond; ATSc1; ATSc5; ATSm5; nBase; C2SP2; C3SP3; SCH-4; SCH-5; VCH-4; VCH-7; VPC-5; khs.sssCH; khs.tsC; khs.dssC; khs.ssO; khs.sF; nAtomLC; MDEC-13; MDEC-22; MDEC-24; MDEC-33; MDEO-11; MDEO-12; MDEO-22; MOMI-XZ; TopoPSA; XLogP |
| Antibiotic activity | RF (MeanDecreaseAccuracy b) | MDEO-12; MDEC-22; C2SP2; khs.dsCH; khs.sssCH; khs.dssC; VCH-5; MDEC-33; TopoPSA; XLogP |
| Antibiotic activity | CT | 16D: TopoPSA; C2SP2; VC-5; MDEC-22; XLogP; BCUTp-1h; VP-0; SCH-7; DPSA-1; khs.dssC; khs.ssCH2; khs.sssCH; khs.sssN; MDEO-12; THSA; VC-4 |
a The descriptors were selected with the CFS (correlation-based feature subset selection) filter from Weka; b the mean decrease in accuracy and the mean decrease in Gini are two measures of descriptor importance provided by the RF algorithm.
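The idea behind a mean-decrease-in-accuracy importance can be illustrated by permuting one descriptor at a time and measuring how much the accuracy falls. The sketch below is a simplified analogue, not the RF implementation used in the paper: a real random forest computes this per tree on out-of-bag samples and averages the result, whereas here a hand-written rule (`predict`) stands in for a trained model and the toy data are invented for illustration.

```python
import random


def accuracy(predict, rows, labels):
    """Fraction of rows for which the prediction matches the label."""
    return sum(predict(r) == y for r, y in zip(rows, labels)) / len(rows)


def permutation_importance(predict, rows, labels, column, seed=0):
    """Accuracy drop after shuffling the values of one descriptor column."""
    baseline = accuracy(predict, rows, labels)
    shuffled = [r[column] for r in rows]
    random.Random(seed).shuffle(shuffled)
    # Rebuild each row with the shuffled value in place of the original one.
    permuted = [r[:column] + [v] + r[column + 1:] for r, v in zip(rows, shuffled)]
    return baseline - accuracy(predict, permuted, labels)


# Toy data (assumption): descriptor 0 determines the label, descriptor 1 is noise.
rows = [[0.1, 5.0], [0.2, 1.0], [0.8, 4.0], [0.9, 2.0], [0.3, 3.0], [0.7, 6.0]]
labels = [0, 0, 1, 1, 0, 1]


def predict(row):
    """Stand-in for a trained classifier: uses descriptor 0 only."""
    return int(row[0] > 0.5)


importances = [permutation_importance(predict, rows, labels, c) for c in range(2)]
print(importances)
```

A descriptor the model never uses gets an importance of zero, while shuffling an informative descriptor can only keep or lower the accuracy of a model that was perfect on the unshuffled data.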
Figure 2. Representation of the first two rules of the antitumor classification tree derived with the CART algorithm for the training set.
Scheme 5. The mannosylparomomycin from the training set.
Figure 3. Representation of the first two rules of the antibiotic classification tree derived with the CART algorithm for the training set.