E Tejera, I Carrera, Karina Jimenes-Vargas, V Armijos-Jaramillo, A Sánchez-Rodríguez, M Cruz-Monteagudo, Y Perez-Castillo.
Abstract
Predicting the sensitivity of cell lines to a given set of compounds is an important factor in the optimization of in vitro assays. To date, the most common prediction strategies are based on machine learning or other quantitative structure-activity relationship (QSAR) approaches. In the present research, we propose and discuss a straightforward strategy that does not rely on any learned model but exclusively on the chemical similarity of a query compound to reference compounds with annotated activity against cell lines. We also compare the performance of the proposed method with machine learning predictions on the same problem. A curated database of compound-cell line associations derived from ChEMBL version 22 was created for algorithm construction and cross-validation. Validation was done using 10-fold cross-validation and by testing the models on new data obtained from ChEMBL version 25. In terms of accuracy, both methods perform similarly, with values around 0.65 across 750 cell lines in 10-fold cross-validation experiments. By combining both methods it is possible to achieve a correct classification rate of 66% on more than 26000 newly reported interactions comprising 11000 new compounds. A Web Service implementing the described approaches (both similarity-based and machine learning-based models) is freely available at: http://bioquimio.udla.edu.ec/cellfishing.
Year: 2019 PMID: 31589649 PMCID: PMC6779297 DOI: 10.1371/journal.pone.0223276
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
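The similarity-based idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: fingerprints are represented as plain sets of on-bit indices (a stand-in for Morgan fingerprints), Tanimoto similarity is assumed as the similarity measure, and the decision rule (take the label of the single most similar reference at or above the cutoff) is an assumption made for illustration. The names `tanimoto`, `predict_by_similarity` and `refs` are hypothetical.

```python
from typing import Dict, Optional, Set, Tuple

# On-bit indices of a binary fingerprint (assumed stand-in for a Morgan fingerprint).
Fingerprint = Set[int]


def tanimoto(a: Fingerprint, b: Fingerprint) -> float:
    """Tanimoto (Jaccard) similarity between two sets of on-bits."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def predict_by_similarity(
    query: Fingerprint,
    references: Dict[str, Tuple[Fingerprint, bool]],
    cutoff: float = 0.3,
) -> Optional[bool]:
    """Return the activity label of the most similar reference compound,
    or None when no reference reaches the cutoff (the query is not covered)."""
    best_label: Optional[bool] = None
    best_sim = -1.0
    for _name, (fp, active) in references.items():
        sim = tanimoto(query, fp)
        if sim >= cutoff and sim > best_sim:
            best_label, best_sim = active, sim
    return best_label


# Hypothetical reference set for one cell line: name -> (fingerprint, active?)
refs = {
    "ref_active": ({1, 2, 3, 4}, True),
    "ref_inactive": ({10, 11, 12, 13}, False),
}

print(predict_by_similarity({1, 2, 3, 9}, refs))      # nearest neighbour is ref_active
print(predict_by_similarity({10, 11, 12, 20}, refs))  # nearest neighbour is ref_inactive
print(predict_by_similarity({50, 51}, refs))          # no neighbour reaches the cutoff
```

Returning `None` below the cutoff is what makes coverage less than 100% possible for a similarity-only predictor: raising the cutoff trades coverage for more reliable neighbours.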
Fig 1. Variation in the true positive (TP) (left) and true negative (TN) (right) rates with respect to the similarity cutoff, for Morgan radii of 2, 4 and 8. Results obtained from 10-fold cross-validation.
Fig 2. (A, left) Variation in accuracy (ACU) with respect to the similarity cutoff, for Morgan radii of 2, 4 and 8. (B, right) Boxplots of true positive (TP) and true negative (TN) rates in 758 cell lines for similarity cutoffs of 0.25 and 0.30 with radius = 8.
Fig 3. Comparison of accuracy, true positive and true negative rates for SVM (blue boxes) and the similarity-based strategy (orange boxes) at several similarity cutoffs.
Performance indexes obtained in 10-fold cross-validation by combining both methods at similarity cutoffs of 0.25 and 0.3.
Results are limited to cell lines with accuracy of 0.65 and 0.7.
| Accuracy (ACU) | 0.65 | 0.65 | 0.7 | 0.7 |
|---|---|---|---|---|
| Similarity Cutoff | 0.25 | 0.3 | 0.25 | 0.3 |
| Cell Lines (SVM) | 258 | 258 | 189 | 189 |
| TP (SVM) | 0.700 | 0.700 | 0.758 | 0.758 |
| TN (SVM) | 0.82 | 0.82 | 0.830 | 0.830 |
| Cell Lines (Similarity) | 162 | 123 | 48 | 54 |
| TP (Similarity) | 0.722 | 0.67 | 0.791 | 0.777 |
| TN (Similarity) | 0.661 | 0.75 | 0.699 | 0.750 |
| Cell Lines (Common) | 65 | 98 | 27 | 45 |
| Cell Lines (Total) | 355 | 283 | 210 | 198 |
| TP (Total) | 0.69 | 0.675 | 0.751 | 0.758 |
| TN (Total) | 0.786 | 0.832 | 0.825 | 0.834 |
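The "Cell Lines (Total)" row appears to follow from inclusion-exclusion over the two methods: cell lines passing the accuracy threshold with SVM, plus those passing with the similarity-based approach, minus those common to both. The short check below uses only the counts reported in the table and assumes the SVM counts (258 and 189) apply to both similarity cutoffs within each accuracy threshold, since SVM does not depend on the cutoff.

```python
# Counts copied from the table; column order is
# (ACU 0.65, cutoff 0.25), (ACU 0.65, cutoff 0.3), (ACU 0.7, cutoff 0.25), (ACU 0.7, cutoff 0.3).
svm = [258, 258, 189, 189]          # assumed independent of the similarity cutoff
similarity = [162, 123, 48, 54]
common = [65, 98, 27, 45]
total_reported = [355, 283, 210, 198]

# Inclusion-exclusion: |SVM or Similarity| = |SVM| + |Similarity| - |Common|
total_computed = [s + m - c for s, m, c in zip(svm, similarity, common)]

print(total_computed)  # [355, 283, 210, 198]
```

All four computed totals match the reported row, which supports reading each SVM value as spanning both cutoff columns.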
Performance indexes across new compounds in external validation.
| Model | Method | Compounds / Interactions | Cell lines | TP | TN | Coverage | TP_C | TN_C |
|---|---|---|---|---|---|---|---|---|
| 1 | SVM | 11202 / | 758 | 0.610 | 0.663 | 100% | | |
| 2 | Similarity-based | | | 0.572 | 0.574 | 91.76% | 0.617 | 0.551 |
| 3 | Similarity-based | | | 0.376 | 0.765 | 65.5% | 0.550 | 0.673 |
| 4 | Similarity-based | | | 0.260 | 0.843 | 47.56% | 0.515 | 0.722 |
| 5 | Similarity-based | | | 0.617 | 0.551 | 100% | | |
| 6 | Similarity-based | | | 0.661 | 0.700 | 100% | | |
| 7 | Similarity-based | | | 0.523 | 0.656 | 100% | | |
Notes: 1) Coverage: the percentage of compounds for which at least one prediction was made with the similarity-based approach. 2) TP_C and TN_C are the true positive and true negative rates evaluated only on the portion of compounds covered by the similarity-based approach.
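As a rough illustration of the quantities defined in the notes, the sketch below computes coverage, TP_C and TN_C from a list of true labels and predictions in which `None` marks a case with no similarity-based prediction. It is a simplification under stated assumptions: TP and TN are treated as rates (fraction of actives or inactives correctly predicted), and coverage is computed over the same items as the rates, whereas the source defines coverage per compound. The function name `covered_metrics` and the toy labels are hypothetical.

```python
from typing import List, Optional, Tuple


def covered_metrics(
    y_true: List[bool], y_pred: List[Optional[bool]]
) -> Tuple[float, float, float]:
    """Return (coverage, TP_C, TN_C).

    coverage: fraction of items that received a prediction (not None)
    TP_C:     fraction of covered actives predicted active
    TN_C:     fraction of covered inactives predicted inactive
    """
    covered = [(t, p) for t, p in zip(y_true, y_pred) if p is not None]
    coverage = len(covered) / len(y_true) if y_true else 0.0
    pos = [p for t, p in covered if t]
    neg = [p for t, p in covered if not t]
    tp_c = sum(p is True for p in pos) / len(pos) if pos else float("nan")
    tn_c = sum(p is False for p in neg) / len(neg) if neg else float("nan")
    return coverage, tp_c, tn_c


# Hypothetical toy data; None marks a compound without a similarity-based prediction.
y_true = [True, True, True, False, False, False]
y_pred = [True, False, None, False, False, None]

coverage, tp_c, tn_c = covered_metrics(y_true, y_pred)
print(f"coverage={coverage:.3f} TP_C={tp_c:.3f} TN_C={tn_c:.3f}")
```

How uncovered compounds enter the overall TP and TN columns is not stated in this record, so the sketch deliberately reports only the covered-subset rates.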