Roger Estrada-Tejedor, Gerhard F Ecker.
Abstract
ATP binding cassette (ABC) transporters play a pivotal role in drug elimination, particularly in several types of cancer in which these proteins are overexpressed. Due to their promiscuous ligand recognition, building computational models for substrate classification is quite challenging. This study evaluates the use of modified Self-Organizing Maps (SOM) for predicting drug resistance associated with P-gp, MRP1 and BCRP activity. Herein, we present a novel multi-labelled unsupervised classification model which combines a new clustering algorithm with SOM. It significantly improves the accuracy of substrate classification, approaching the performance of traditional supervised machine learning algorithms. Results can be applied to predict the pharmacological profile of new drug candidates during the drug development process.
Year: 2018 PMID: 29717183 PMCID: PMC5931609 DOI: 10.1038/s41598-018-25235-9
Source DB: PubMed Journal: Sci Rep ISSN: 2045-2322 Impact factor: 4.379
Description of the benchmark data sets used for the validation of the model.
| Data | Inputs | Attributes | Classes | Imbalance ratio |
|---|---|---|---|---|
| Wines | 178 | 13 | 3 | 1:2 |
| New Thyroid | 193 | 5 | 3 | 1:19 |
| Cars | 1728 | 6 | 4 | 1:19 |
| Yeast | 1484 | 8 | 10 | 1:93 |
Descriptors reported by Demel et al.[16] for the binary classification of P-gp, MRP1 and BCRP substrates.
| Model | Selected Descriptors |
|---|---|
| P-gp | apol, chi0_C, chi0v_C, chi1_C, rings, PEOE_VSA-5, PEOE_VSA_POL, PEOE_VSA_PPOS, SlogP_VSA0, SMR_VSA2, TPSA, opr_brigid |
| MRP1 | a_count, a_hyd, chi1v, opr_nring, PEOE_VSA+3, PEOE_VSA+5, PEOE_VSA-4, PEOE_VSA-6, Q_VSA_PNEG, vsa_acc |
| BCRP | a_count, a_hyd, a_nC, a_nH, chi1v, SlogP_VSA1, SlogP_VSA2, SlogP_VSA8, SMR_VSA1, SMR_VSA6, VDistMa |
| DD17 | apol, opr_brigid, PEOE_VSA+3, PEOE_VSA+5, PEOE_VSA-4, PEOE_VSA-5, PEOE_VSA-6, PEOE_VSA_POL, Q_VSA_PNEG, SlogP_VSA0, SlogP_VSA1, SlogP_VSA2, SlogP_VSA8, SMR_VSA1, SMR_VSA2, SMR_VSA6, vsa_acc |
A total of 17 descriptors with Pearson’s correlation coefficient lower than 0.9 were identified from all models and they were joined into the DD17 set.
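The correlation filter mentioned above can be sketched as follows. This is only an illustration under stated assumptions: the greedy "keep a descriptor if it correlates below the cutoff with everything already kept" strategy, the function names `pearson` and `filter_correlated`, and the toy values for the hypothetical descriptors `d1` to `d3` are not taken from the source, which states only the 0.9 cutoff.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient of two equally long value lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0  # constant column: treat as uncorrelated (assumption)
    return cov / (sx * sy)

def filter_correlated(columns, cutoff=0.9):
    """Greedy filter (an assumed strategy): keep a descriptor only if its
    absolute correlation with every already kept descriptor is below cutoff."""
    kept = []
    for name, values in columns.items():
        if all(abs(pearson(values, columns[k])) < cutoff for k in kept):
            kept.append(name)
    return kept

# Toy descriptor columns with made-up values (hypothetical names).
toy = {
    "d1": [1.0, 2.0, 3.0, 4.0],
    "d2": [2.0, 4.0, 6.0, 8.1],   # almost a multiple of d1
    "d3": [4.0, 1.0, 3.0, 2.0],
}
selected = filter_correlated(toy)
print(selected)
```

With a greedy filter the result depends on the order in which descriptors are visited, so a different ordering could retain a different representative of each correlated group.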
Figure 1. Graphical representation of the probability expansion algorithm proposed for SOM mapping. Considering a binary classification problem (classes are represented as solid and dotted circles), the proposed expansion algorithm performs the following steps: (A) After SOM mapping into a 5 × 5 neuron space, each filled position transfers its content to the adjacent coordinates (grey arrows), excluding the filled positions of the training set. (B) This procedure also affects boundary neurons, since opposite edges of the map are connected, generating a toroidal space. Note that grey cells correspond to the originally occupied neurons, which are not affected by the expansion algorithm. (C) The contributions are added up at every position. (D) Final probabilities of the originally unoccupied neurons are then calculated according to the number of examples within each coordinate.
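Steps (A) to (D) of this caption can be sketched for a single expansion step. The sketch rests on assumptions the caption does not settle: "adjacent" is taken to mean the eight surrounding neurons, only one expansion step is performed, the map has at least three neurons per dimension, and the names `expand_once` and `counts` are hypothetical.

```python
def expand_once(counts, rows, cols):
    """One expansion step on a toroidal SOM grid.

    counts: {(row, col): [n_class0, n_class1, ...]} for occupied neurons,
            each holding at least one training example.
    Returns {(row, col): [p_class0, p_class1, ...]} for the unoccupied
    neurons that received a contribution.
    """
    n_classes = len(next(iter(counts.values())))
    received = {}
    for (r, c), content in counts.items():
        for dr in (-1, 0, 1):
            for dc in (-1, 0, 1):
                if dr == 0 and dc == 0:
                    continue
                # (B) wrap around the edges: the map is treated as a torus
                pos = ((r + dr) % rows, (c + dc) % cols)
                # (A) occupied neurons are excluded from the expansion
                if pos in counts:
                    continue
                # (C) contributions are added up at every position
                acc = received.setdefault(pos, [0] * n_classes)
                for i, n in enumerate(content):
                    acc[i] += n
    # (D) class probabilities from the summed number of examples
    return {pos: [n / sum(acc) for n in acc] for pos, acc in received.items()}

# Binary toy problem on a 5 x 5 map: two occupied neurons.
counts = {(0, 0): [2, 0], (0, 2): [0, 1]}
probs = expand_once(counts, rows=5, cols=5)
print(probs[(0, 1)])  # neuron adjacent to both occupied positions
print(probs[(4, 4)])  # reached from (0, 0) only through the wrap-around
```

In this toy map, neuron (0, 1) receives two examples of the first class and one of the second, while (4, 4) is reached only because the edges are connected.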
Figure 2. Schematic representation of the algorithm implemented to improve predictions in the boundaries. The neighbours of a given compound (k) can be different at each SOM run. This algorithm aims to identify those coordinates that tend to be located in class boundaries. The class of every neighbour is added and the total distribution is averaged over all calculated SOMs. The resulting probabilities were then used as prior probabilities in the expansion algorithm.
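A minimal sketch of this averaging over several SOM runs is given below. Several details are assumptions rather than facts from the source: neighbours are taken to be compounds mapped to the same or an adjacent neuron on a toroidal grid, the averaged counts are normalised to sum to one, a uniform distribution is returned when no neighbour is ever found, and the name `neighbour_class_prior` is hypothetical.

```python
def neighbour_class_prior(runs, labels, k, n_classes, rows, cols):
    """Average class distribution of the neighbours of compound k.

    runs:   list of {compound_index: (row, col)}, one mapping per SOM run
    labels: class index of every compound
    """
    totals = [0.0] * n_classes
    for placement in runs:
        rk, ck = placement[k]
        for j, (r, c) in placement.items():
            if j == k:
                continue
            # toroidal distance along each axis
            dr = min(abs(r - rk), rows - abs(r - rk))
            dc = min(abs(c - ck), cols - abs(c - ck))
            if max(dr, dc) <= 1:          # same or adjacent neuron
                totals[labels[j]] += 1    # add the neighbour's class
    averaged = [t / len(runs) for t in totals]
    norm = sum(averaged)
    if norm == 0:
        return [1.0 / n_classes] * n_classes  # no neighbours found (assumption)
    return [a / norm for a in averaged]

# Four compounds, two classes, two SOM runs on a 5 x 5 map (toy placements).
labels = [0, 0, 1, 1]
runs = [
    {0: (0, 0), 1: (0, 1), 2: (4, 4), 3: (2, 2)},
    {0: (1, 1), 1: (3, 3), 2: (1, 2), 3: (2, 2)},
]
prior = neighbour_class_prior(runs, labels, k=0, n_classes=2, rows=5, cols=5)
print(prior)
```

Compound 0 belongs to the first class but is mostly surrounded by compounds of the second class across the two runs, which is the kind of boundary behaviour the caption describes.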
Figure 3. Differences in class probabilities when mapping the wines data set in a 20 × 20 SOM using the k-NN and CSOM algorithms. Higher P values refer to areas with higher class probability.
Figure 4. (A) Distribution of the accuracies obtained in 10-fold cross-validation by applying the k-NN and CSOM algorithms to the wines database. (B) Example of a mapped CSOM in which the probabilities of the six non-conclusive examples reported by the algorithm are shown. Interestingly, the true class is the one with the highest probability in all the examples, although this probability does not exceed the threshold value (t > 0.5).
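The thresholding behaviour described in panel (B) can be illustrated with a short sketch. The function name `classify` and the probability values are made up for illustration; only the rule that a prediction must exceed the threshold t comes from the caption.

```python
def classify(probabilities, t=0.5):
    """Return the index of the most probable class, or None when that
    probability does not exceed the threshold t (non-conclusive example)."""
    best = max(range(len(probabilities)), key=probabilities.__getitem__)
    return best if probabilities[best] > t else None

# Illustrative three-class probabilities (not taken from the paper).
print(classify([0.70, 0.20, 0.10]))  # conclusive: class 0
print(classify([0.45, 0.35, 0.20]))  # class 0 is most probable, but 0.45 <= 0.5
```

The second call mirrors the situation in the caption: the most probable class may well be the true one, yet the example is reported as non-conclusive.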
Results obtained using k-NN and CSOM applying 10-fold cross-validation.
| Database | Method | Acc. | Mean Recall | F | p-value |
|---|---|---|---|---|---|
| Wines | CSOM (t = 0.6) | 0.86 | 0.84 | 0.86 | 0.008 |
| | k-NN | 0.77 | 0.72 | 0.73 | |
| Yeast | CSOM (t = 0.5) | 0.58 | 0.52 | 0.52 | 0.005 |
| | k-NN | 0.50 | 0.45 | 0.45 | |
| New Thyroid | CSOM (t = 0.5) | 0.93 | 0.90 | 0.91 | 0.17 |
| | k-NN | 0.90 | 0.85 | 0.86 | |
| Cars | CSOM (t = 0.7) | 0.91 | 0.82 | 0.83 | 0.006 |
| | k-NN | 0.88 | 0.74 | 0.75 | |
The effect of the threshold value (t) was studied individually for each data set (see supporting information, Fig. S2).
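For readers unfamiliar with the reported metrics, the sketch below computes accuracy, mean recall and an F score from predicted and true labels. It assumes that "Mean Recall" is the unweighted average of per-class recalls and that "F" is the macro-averaged F1 score; the source does not state the averaging scheme, nor how non-conclusive examples enter the calculation. The function name `summarize` and the labels are illustrative.

```python
def summarize(y_true, y_pred, n_classes):
    """Accuracy, macro-averaged recall and macro-averaged F1 (assumed scheme)."""
    accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
    recalls, f_scores = [], []
    for c in range(n_classes):
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        recall = tp / (tp + fn) if tp + fn else 0.0
        precision = tp / (tp + fp) if tp + fp else 0.0
        denom = precision + recall
        recalls.append(recall)
        f_scores.append(2 * precision * recall / denom if denom else 0.0)
    return accuracy, sum(recalls) / n_classes, sum(f_scores) / n_classes

# Imbalanced toy labels: four examples of class 0, two of class 1.
y_true = [0, 0, 0, 0, 1, 1]
y_pred = [0, 0, 0, 1, 1, 0]
acc, mean_recall, f_macro = summarize(y_true, y_pred, n_classes=2)
print(round(acc, 3), mean_recall, f_macro)
```

On imbalanced data the mean recall falls below the accuracy whenever the minority class is predicted less reliably, which is why both columns are worth reading together.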
Figure 5. SOM projection of the original data set using ChemGPS descriptors (A) in contrast to the SMOTE data set (B) on a 100 × 100 SOM with a variable adaptation radius from 10 to 1 in 100 iterations.
10-fold cross validation results obtained in the classification of ABC-transporter substrates.
| Descriptors | Classification Method | Accuracy | Mean Recall | F-measure | % out |
|---|---|---|---|---|---|
| ChemGPS | | | | | |
| | SOM+k-NN | 0.69 | 0.45 | 0.40 | |
| | k-NN | 0.68 | 0.45 | 0.41 | |
| | RF | 0.73 | 0.43 | 0.41 | |
| SHED | | | | | |
| | SOM+k-NN | 0.66 | 0.42 | 0.38 | |
| | k-NN | 0.66 | 0.44 | 0.39 | |
| | RF | 0.74 | 0.42 | 0.41 | |
| DD17 | | | | | |
| | SOM+k-NN | 0.69 | 0.44 | 0.40 | |
| | k-NN | 0.65 | 0.45 | 0.40 | |
| | RF | 0.77 | 0.45 | 0.44 | |
| AANN | | | | | |
| | SOM+k-NN | 0.65 | 0.41 | 0.37 | |
| | k-NN | 0.65 | 0.45 | 0.40 | |
| | RF | 0.71 | 0.43 | 0.41 | |
Results obtained using the SOM projection combined with CSOM and k-NN are compared with those obtained directly with the original dataset. For the sake of clarity, only results regarding k-NN and Random Forest (RF) are presented. In all cases SMOTE oversampling was applied. The SOM topology was fixed to 100 × 100; the initial adaptation radius and the CSOM threshold were set to 20 and 0.9, respectively. The percentage of non-conclusive examples (% out) is shown when the CSOM approach is used.
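As a reminder of what SMOTE oversampling does, the following sketch generates synthetic minority examples by interpolating between a minority example and one of its nearest minority neighbours. It illustrates the general technique only; the number of neighbours, the random seed, the function name `smote` and the toy points are assumptions, and the source does not report the settings or implementation used.

```python
import random

def smote(minority, n_new, k=2, seed=0):
    """Create n_new synthetic points, each on the segment between a randomly
    chosen minority example and one of its k nearest minority neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        x = rng.choice(minority)
        # k nearest minority neighbours of x by squared Euclidean distance
        neighbours = sorted(
            (p for p in minority if p is not x),
            key=lambda p: sum((a - b) ** 2 for a, b in zip(x, p)),
        )[:k]
        nb = rng.choice(neighbours)
        u = rng.random()  # interpolation factor in [0, 1)
        synthetic.append([a + u * (b - a) for a, b in zip(x, nb)])
    return synthetic

# Three minority examples in a two-dimensional toy descriptor space.
minority = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
new_points = smote(minority, n_new=5)
print(len(new_points))
```

Because every synthetic point lies between two existing minority examples, oversampling fills in the region already occupied by the minority class rather than extending it.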
Figure 6. (A) Distribution of the prior probabilities (calculated by the CSOM algorithm) on SOM coordinates for each type of ABC-transporter substrate in the training set. (B) Mapped SOM obtained by combining these probabilities; black dots mark non-conclusive coordinates. (C) Sum of calculated probabilities for the test set examples, organized as a confusion matrix. The size of each rounded shape reflects the probability of obtaining the corresponding predicted class.