Yunierkis Perez-Castillo, Aminael Sánchez-Rodríguez, Eduardo Tejera, Maykel Cruz-Monteagudo, Fernanda Borges, M Natália D S Cordeiro, Huong Le-Thi-Thu, Hai Pham-The.
Abstract
Gastric cancer is the third leading cause of cancer-related mortality worldwide and, despite advances in prevention, diagnosis and therapy, remains a global health concern. The efficacy of therapies for gastric cancer is limited by a poor response to currently available therapeutic regimens. One reason that may explain these poor clinical outcomes is the highly heterogeneous nature of the disease. It is therefore essential to discover new molecular agents capable of targeting several gastric cancer subtypes simultaneously. Here, we present a multi-objective approach for the ligand-based virtual screening of chemical compounds simultaneously active against the gastric cancer cell lines AGS, NCI-N87 and SNU-1. The proposed approach relies on a novel methodology based on the development of ensemble models for predicting bioactivity against each individual gastric cancer cell line. The methodology aggregates one ensemble per cell line into virtual screening protocols using a desirability-based algorithm. Our research leads to the proposal of a multi-targeted virtual screening protocol able to achieve high enrichment of known chemicals with anti-gastric cancer activity. Specifically, our results indicate that, using the proposed protocol, it is possible to retrieve almost 20 times more multi-targeted compounds in the first 1% of the ranked list than would be expected from a uniform distribution of the active compounds in the virtual screening database. More importantly, the proposed protocol attains an outstanding initial enrichment of known multi-targeted anti-gastric cancer agents.
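The desirability-based aggregation mentioned in the abstract can be illustrated with a minimal sketch. It assumes a Derringer-style scheme in which each per-cell-line ensemble score is mapped to a desirability in [0, 1] and the overall desirability is their geometric mean; the abstract does not specify the exact functions used, so the linear mapping, the function names and the compound scores below are hypothetical.

```python
import math

def desirability(score, low=0.0, high=1.0):
    """Linear 'larger is better' desirability, clipped to [0, 1] (assumed form)."""
    d = (score - low) / (high - low)
    return min(1.0, max(0.0, d))

def overall_desirability(scores_by_cell_line):
    """Geometric mean of the per-endpoint desirabilities."""
    ds = [desirability(s) for s in scores_by_cell_line.values()]
    return math.prod(ds) ** (1.0 / len(ds))

# Hypothetical ensemble scores (e.g. predicted probability of activity).
compounds = {
    "cmpd_A": {"AGS": 0.90, "NCI-N87": 0.80, "SNU-1": 0.85},
    "cmpd_B": {"AGS": 0.95, "NCI-N87": 0.10, "SNU-1": 0.90},
    "cmpd_C": {"AGS": 0.50, "NCI-N87": 0.50, "SNU-1": 0.50},
}

# Rank compounds from most to least desirable across all three cell lines.
ranked = sorted(compounds,
                key=lambda c: overall_desirability(compounds[c]),
                reverse=True)
print(ranked)
```

Because a geometric mean is pulled down sharply by any low term, a compound predicted inactive against one cell line (cmpd_B here) ranks below a compound with moderate scores against all three, which is the behaviour a multi-target ranking is meant to have.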
Year: 2018 PMID: 29420638 PMCID: PMC5805264 DOI: 10.1371/journal.pone.0192176
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Fig 1. Summary of the data set curation workflow.
Results are summarized for the a) AGS, b) NCI-N87 and c) SNU-1 data sets.
Performance of the base models.
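| Data set / model | Classification performance: Train (a)(e) | Classification performance: Selection (b)(e) | Classification performance: External (c)(e) | Cross-validation accuracy (d) | BCR: Train (a) | BCR: Selection (b) | BCR: External (c) |
|---|---|---|---|---|---|---|---|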
| AGS | |||||||
| Mean | 84 (67/96) | 77 (60/89) | 73 (58/84) | 74.06 | 58.44 | 53.02 | 52.29 |
| Best Model | 97 (96/98) | 85 (83/87) | 72 (76/69) | 71.86 | 94.67 | 81.4 | 67.7 |
| NCI-N87 | |||||||
| Mean | 89 (88/89) | 74 (82/67) | 56 (57/55) | 73.93 | 82.57 | 58.88 | 46.97 |
| Best Model | 100 (100/100) | 82 (82/82) | 47 (53/41) | 78.07 | 100.00 | 81.82 | 41.52 |
| SNU-1 | |||||||
| Mean | 82 (62/96) | 68 (40/86) | 52 (24/72) | 70.66 | 54.43 | 33.97 | 24.56 |
| Best Model | 98 (97/100) | 80 (78/81) | 56 (64/50) | 72.84 | 95.50 | 76.75 | 49.07 |
(a) Training data set.
(b) Selection data set.
(c) External data set.
(d) Cross-validation accuracy
(e) Results are presented as Accuracy (Sensitivity/Specificity)
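As a reading aid for the "Accuracy (Sensitivity/Specificity)" cell format described in footnote (e), the sketch below computes the three values from confusion-matrix counts using their standard definitions. The counts and function names are hypothetical and are not taken from the study.

```python
def classification_summary(tp, fn, tn, fp):
    """Return accuracy, sensitivity and specificity as percentages."""
    total = tp + fn + tn + fp
    accuracy = 100.0 * (tp + tn) / total
    sensitivity = 100.0 * tp / (tp + fn)   # fraction of actives recovered
    specificity = 100.0 * tn / (tn + fp)   # fraction of inactives recovered
    return accuracy, sensitivity, specificity

def format_cell(tp, fn, tn, fp):
    """Format the values the way the table cells are written."""
    acc, se, sp = classification_summary(tp, fn, tn, fp)
    return f"{acc:.0f} ({se:.0f}/{sp:.0f})"

# Hypothetical counts: 50 actives and 50 inactives.
cell = format_cell(tp=45, fn=5, tn=40, fp=10)
print(cell)
```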
Summary of the performance of the developed ensemble models.
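| Data set / model | Classification performance: Train (a)(e) | Classification performance: Selection (b)(e) | Classification performance: External (c)(e) | BCR: Train (a) | BCR: Selection (b) | BCR: External (c) | Geometric mean of BCR, Train and Selection (d) |
|---|---|---|---|---|---|---|---|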
| AGS | |||||||
| Mean | 98 (95/100) | 86 (85/87) | 78 (78/78) | 93.02 | 83.21 | 77.58 | 87.96 |
| Best Model | 99 (98/99) | 87 (85/88) | 77 (77/78) | 96.88 | 84.59 | 76.77 | 90.53 |
| NCI-N87 | |||||||
| Mean | 99 (99/99) | 81 (85/76) | 71 (71/71) | 97.69 | 70.73 | 69.84 | 82.89 |
| Best Model | 100 (100/100) | 91 (91/91) | 71 (71/71) | 100.00 | 90.91 | 70.59 | 95.35 |
| SNU-1 | |||||||
| Mean | 98 (97/100) | 85 (80/88) | 64 (63/65) | 94.08 | 71.67 | 61.60 | 81.47 |
| Best Model | 99 (97/100) | 100 (100/100) | 63 (64/63) | 95.50 | 100.00 | 62.35 | 97.72 |
(a) Training data set.
(b) Selection data set.
(c) External data set.
(d) Geometric mean of the BCR metric across training and selection sets
(e) Results are presented as Accuracy (Sensitivity/Specificity)
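The last column of the table can be checked against footnote (d) with a short sketch. The geometric mean follows directly from the footnote. The `bcr` function, in contrast, is an assumption: this text does not define BCR, and the form used here (the mean of sensitivity and specificity, penalized by their absolute difference) is one definition that appears roughly consistent with the tabulated values when applied to the rounded Se/Sp figures.

```python
import math

def bcr(sensitivity, specificity):
    """Assumed BCR definition (inputs and result in percent):
    mean(Se, Sp) * (1 - |Se - Sp| / 100). Not confirmed by the source text."""
    mean = (sensitivity + specificity) / 2.0
    return mean * (1.0 - abs(sensitivity - specificity) / 100.0)

def bcr_train_selection(bcr_train, bcr_selection):
    """Geometric mean of the BCR across training and selection sets (footnote d)."""
    return math.sqrt(bcr_train * bcr_selection)

# BCR values of the 'Best Model' rows, copied from the table above.
best_models = {
    "AGS": (96.88, 84.59),
    "NCI-N87": (100.00, 90.91),
    "SNU-1": (95.50, 100.00),
}
for cell_line, (train, selection) in best_models.items():
    print(cell_line, round(bcr_train_selection(train, selection), 2))
```

Under these assumptions the three geometric means round to the values reported in the last column for the best models (90.53, 95.35 and 97.72).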
Fig 2. Comparison between the accuracy of the best ensemble for each endpoint and the mean accuracy of the base models of which it is composed.
The mean accuracy of the base models is shown using square-patterned bars.
VS performance of the multi-target protocols.
| VS Protocol | Model (a): AGS | Model (a): NCI-N87 | Model (a): SNU-1 | EF 1% (b) | BEDROC (α = 160.9) (c) | AUAC (d) | Cov. Dom. (%) (e) |
|---|---|---|---|---|---|---|---|
| 1 | MV-5-BCRTS | MV-5-BCRTS | MV-15-BCRTS | 19.60 | 0.35 | 0.54 | 100 |
| 2 | MV-5-BCRTS | MV-5-BCRTS | MV-15-AIC | 19.60 | 0.35 | 0.58 | 100 |
| 3 | MV-5-BCRTS | MV-5-BCRTS | SV-10-BCRTS | 19.60 | 0.29 | 0.60 | 100 |
| 4 | MV-10-BCRTS | SV-5-BCRTS | MV-15-BCRTS | 19.60 | 0.26 | 0.62 | 100 |
| 5 | SV-5-BCRTS | MV-5-BCRTS | MV-15-BCRTS | 19.60 | 0.25 | 0.54 | 100 |
| 6 | SV-5-BCRTS | MV-5-BCRTS | MV-15-AIC | 19.60 | 0.31 | 0.59 | 100 |
| 7 | SV-15-AIC | MV-5-BCRTS | MV-5-BCRTS | 19.60 | 0.27 | 0.59 | 100 |
| 8 | SV-15-AIC | MV-5-BCRTS | MV-15-BCRTS | 19.60 | 0.41 | 0.57 | 100 |
| 9 | SV-15-AIC | MV-5-BCRTS | MV-15-AIC | 19.60 | 0.32 | 0.61 | 100 |
| 10 | SV-15-AIC | MV-5-BCRTS | SV-10-BCRTS | 19.60 | 0.43 | 0.64 | 100 |
| 11 | SV-15-AIC | MV-5-BCRTS | SV-15-BCRTS | 19.60 | 0.38 | 0.57 | 100 |
| 12 | SV-15-AIC | MV-10-BCRTS | MV-15-BCRTS | 19.60 | 0.23 | 0.55 | 100 |
| 13 | SV-15-AIC | MV-15-BCRTS | MV-15-BCRTS | 19.60 | 0.35 | 0.57 | 100 |
| 14 | SV-15-AIC | MV-15-BCRTS | MV-15-AIC | 19.60 | 0.26 | 0.62 | 100 |
| 15 | SV-15-AIC | MV-15-BCRTS | SV-10-BCRTS | 19.60 | 0.35 | 0.64 | 100 |
| 16 | SV-15-AIC | SV-5-BCRTS | MV-15-BCRTS | 19.60 | 0.33 | 0.56 | 100 |
| 17 | SV-15-AIC | SV-5-BCRTS | SV-10-BCRTS | 19.60 | 0.35 | 0.65 | 100 |
| 18 | SV-15-AIC | SV-10-BCRTS | MV-15-BCRTS | 19.60 | 0.25 | 0.59 | 100 |
(a) Model of each endpoint aggregated for the VS protocol. The code of each model is based upon the combination of Aggregation Method (MV, SV), Number of Allowed Base Models in the Initial Population (5, 10, 15) and Minimized Metric (AIC, Classification Error)
(b) Enrichment factor (EF) at a selection size equal to 1% of the screened data
(c) BEDROC for α = 160.9
(d) Area Under the Accumulative Curve
(e) Percent of coverage of the VSVS by the multi-objective VS protocol AD
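Two of the metrics in footnotes (b) and (d) can be sketched from a ranked list of activity labels. The enrichment factor uses its usual definition (hit rate in the top fraction divided by the hit rate in the whole database), and the area under the accumulation curve is approximated with the trapezoidal rule. The function names and the synthetic ranking are hypothetical, and the study's exact implementation may differ.

```python
def enrichment_factor(ranked_labels, fraction=0.01):
    """EF at the given fraction; ranked_labels holds 1 (active) or 0, best-ranked first."""
    n = len(ranked_labels)
    total_actives = sum(ranked_labels)
    if n == 0 or total_actives == 0:
        raise ValueError("need a non-empty list with at least one active")
    n_top = max(1, round(n * fraction))
    hits_top = sum(ranked_labels[:n_top])
    # (hits_top / n_top) / (total_actives / n), written to avoid rounding noise
    return hits_top * n / (n_top * total_actives)

def auac(ranked_labels):
    """Area under the accumulation curve (fraction of actives found versus
    fraction of database screened), by the trapezoidal rule."""
    n = len(ranked_labels)
    total_actives = sum(ranked_labels)
    if n == 0 or total_actives == 0:
        raise ValueError("need a non-empty list with at least one active")
    area, found_prev = 0.0, 0
    for label in ranked_labels:
        found = found_prev + label
        area += (found_prev + found) / (2.0 * total_actives) / n
        found_prev = found
    return area

# Synthetic ranking: 1000 compounds, 10 actives, 2 of them in the top 1%.
labels = [0] * 1000
for position in (0, 5, 50, 120, 300, 450, 600, 700, 850, 990):
    labels[position] = 1

print(enrichment_factor(labels, 0.01))
print(auac(labels))
```

In this synthetic case the top 1% contains 2 of the 10 actives, where a uniform distribution would place 0.1 of them, giving EF = 20. With actives spread uniformly through the ranking, the AUAC would be close to 0.5.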
Fig 3. Accumulative curves for the 18 top-performing VS protocols.
Results are presented for a) the whole screening and b) the top 5% of the screened data.