Melisa Edith Gantner, Mauricio Emiliano Di Ianni, María Esperanza Ruiz, Alan Talevi, Luis E Bruno-Blanch.
Abstract
ABC efflux transporters are polyspecific members of the ABC superfamily that, acting as drug and metabolite carriers, provide a biochemical barrier against drug penetration and contribute to detoxification. Their overexpression is linked to multidrug resistance in a variety of diseases. Breast cancer resistance protein (BCRP) is the most expressed ABC efflux transporter throughout the intestine and the blood-brain barrier, limiting oral absorption and brain bioavailability of its substrates. Early recognition of BCRP substrates is thus essential to optimize oral drug absorption, design novel therapeutics for central nervous system conditions, and overcome BCRP-mediated cross-resistance. We present the development of an ensemble of ligand-based machine learning algorithms for the early recognition of BCRP substrates, from a database of 262 substrates and nonsubstrates compiled from the literature. The dataset was rationally partitioned into training and test sets by application of a 2-step clustering procedure. The models were developed through application of linear discriminant analysis to random subsamples of Dragon molecular descriptors. Simple data fusion and statistical comparison of partial areas under the ROC curves were applied to obtain the best 2-model combination, which achieved overall accuracies of 82% and 74.5% in the training and test sets, respectively.
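The data fusion step mentioned in the abstract can be illustrated with a minimal sketch. It assumes that each model outputs one discriminant score per compound, that positive scores indicate substrates, and that the scores are combined with a simple operator such as the mean or the minimum. The function names, the operators, and the scores below are hypothetical and are not taken from the paper.

```python
from statistics import mean


def fuse_scores(scores_by_model, operator=mean):
    """Combine the per-model scores of each compound with one fusion operator."""
    # zip(*...) regroups the lists so each tuple holds one compound's scores.
    return [operator(compound_scores) for compound_scores in zip(*scores_by_model)]


def classify(scores, cutoff=0.0):
    """Label a compound as substrate (True) when its score exceeds the cutoff."""
    return [score > cutoff for score in scores]


# Hypothetical discriminant scores from two models for four compounds.
model_a = [0.8, -0.4, 0.1, -0.9]
model_b = [0.4, 0.6, -0.5, -0.3]

average_fused = fuse_scores([model_a, model_b], operator=mean)
minimum_fused = fuse_scores([model_a, model_b], operator=min)
predictions = classify(average_fused)
```

Swapping the `operator` argument is enough to compare fusion schemes on the same pair of models.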
Year: 2013 PMID: 23984415 PMCID: PMC3747366 DOI: 10.1155/2013/863592
Source DB: PubMed Journal: Biomed Res Int Impact factor: 3.411
Figure 1. Representative BCRP substrates (left) and nonsubstrates (right) from the six most populated clusters in the dataset.
Features of the best individual model (Model 1) and the other individual models (models 2 to 4) that composed the two best 2-model ensembles.
| Descriptors included | | | Sp training set* | Se training set* | Overall accuracy training set* | Sp test set* | Se test set* | Overall accuracy test set* | Leave-group-out CV¹ | Randomization² |
|---|---|---|---|---|---|---|---|---|---|---|
| Model 1: | 8.04 | <0.000000 | 79% | 68% | 74% | 63% | 74% | 66% | 70.4% (±11.9) | 64.4% (±3.4) |
| Model 2: | 7.52 | <0.000000 | 75.3% | 74.7% | 75% | 76% | 66.7% | 73.5% | 67% (±15) | 61.5% (±3.6) |
| Model 3: | 10.39 | <0.000000 | 83.5% | 83.5% | 83.5% | 73.2% | 74% | 73.5% | 81.2% (±11.3) | 62.4% (±5.1) |
| Model 4: | 6.56 | <0.000014 | 63.3% | 70.6% | 67% | 77.5% | 70.4% | 75.5% | 64.8% (±13.6) | 58.3% (±4.05) |
*Considering zero as a cutoff value between substrates and non-substrates. This threshold may be later optimized through ROC curves analysis to provide a background-dependent optimal balance between Sp and Se.
¹Results are presented as the average result for the folds ± the standard deviation.
²Results are presented as the average performance of the randomized models ± the standard deviation.
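The specificity (Sp), sensitivity (Se), and overall accuracy columns follow from applying the zero cutoff to the model scores. The sketch below shows that calculation under the assumption that a score above the cutoff means "predicted substrate"; the function name and the toy data are hypothetical.

```python
def confusion_metrics(scores, is_substrate, cutoff=0.0):
    """Return (specificity, sensitivity, overall accuracy) at a given cutoff."""
    tp = fn = tn = fp = 0
    for score, label in zip(scores, is_substrate):
        predicted_substrate = score > cutoff
        if label and predicted_substrate:
            tp += 1  # substrate correctly recognized
        elif label:
            fn += 1  # substrate missed
        elif predicted_substrate:
            fp += 1  # nonsubstrate flagged as substrate
        else:
            tn += 1  # nonsubstrate correctly rejected
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    accuracy = (tp + tn) / len(scores)
    return specificity, sensitivity, accuracy


# Toy example: three substrates followed by five nonsubstrates.
scores = [1.2, 0.3, -0.2, 0.5, -0.7, -1.1, 0.1, -0.4]
labels = [True, True, True, False, False, False, False, False]
sp, se, acc = confusion_metrics(scores, labels)
```

Moving `cutoff` away from zero trades Sp against Se, which is the optimization the footnote above alludes to.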
Features of the best individual model (Model 1) and the best ensembles selected.
| Model/ensemble | AUC ROC curve training set | AUC ROC curve test set | Sp training set* | Se training set* | Overall accuracy training set* | Sp test set* | Se test set* | Overall accuracy test set* |
|---|---|---|---|---|---|---|---|---|
| Model 1 | 0.796 | 0.748 | 78.8% | 68.3% | 74% | 63.4% | 74% | 66% |
| Ensemble 1 | 0.850 | 0.785 | 83.5% | 74.7% | 79% | 70.4% | 74% | 71.4% |
| Ensemble 2 | 0.902 | 0.804 | 84.7% | 79.7% | 82% | 76% | 70.4% | 74.5% |
*Considering zero as a cutoff value between substrates and non-substrates.
Figure 2. ROC curves of the training set for the best individual model plus the two best model ensembles.
Results of the calculation of the total and partial areas under ROC curve for the best individual model and the 2 best 2-model ensembles.
| | Model 1 | Ensemble 1 | Ensemble 2 |
|---|---|---|---|
| Training set | | | |
| Total ROC curve AUC (95% CI) | 0.7964 (0.7284–0.8643) | 0.8503 (0.7917–0.9089) | 0.9022* (0.8578–0.9466) |
| Partial ROC curve AUC (±SD) | | | |
| Sp from 1 to 0.70 | 0.1612 (±0.0199) | 0.1877 (±0.0199) | 0.2218 (±0.0163)* |
| Sp from 1 to 0.75 | 0.1252 (±0.0171) | 0.1459 (±0.0178) | 0.1771 (±0.0145)* |
| Sp from 1 to 0.80 | 0.0917 (±0.0147) | 0.1059 (±0.0149) | 0.1337 (±0.0122)† |
| Simulated 577-compound database | | | |
| Total ROC curve AUC (95% CI) | 0.7321 (0.6413–0.8229) | 0.7357 (0.6418–0.8297) | 0.7707 (0.6746–0.8669) |
| Partial ROC curve AUC (±SD) | | | |
| Sp from 1 to 0.70 | 0.1035 (±0.0213) | 0.1127 (±0.0223) | 0.1421 (±0.0223) |
| Sp from 1 to 0.75 | 0.0708 (±0.0183) | 0.0794 (±0.0189) | 0.1075 (±0.0208)† |
| Sp from 1 to 0.80 | 0.0458 (±0.0140) | 0.0504 (±0.0148) | 0.0765 (±0.0162)† |
*The value is different from the best individual model (model 1) (P < 0.001).
†The value is different from the best individual model (model 1) (P < 0.01).
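A partial area under the ROC curve restricts the integration to the high-specificity region, so its maximum value is the width of that region (for example, 0.30 for Sp between 1 and 0.70). The following sketch computes total and partial areas with the trapezoidal rule, interpolating linearly at the specificity limit. It covers only the areas, not the standard deviations or the statistical comparison, and the helper names and toy ranking are hypothetical.

```python
def roc_points(scores, labels):
    """Return ROC points as (false positive rate, true positive rate) pairs."""
    positives = sum(labels)
    negatives = len(labels) - positives
    # Rank compounds from the highest score to the lowest.
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    points = [(0.0, 0.0)]
    tp = fp = 0
    for index, (score, label) in enumerate(ranked):
        if label:
            tp += 1
        else:
            fp += 1
        # Emit a point only after the last compound sharing this score (ties).
        if index + 1 == len(ranked) or ranked[index + 1][0] != score:
            points.append((fp / negatives, tp / positives))
    return points


def partial_auc(points, min_specificity):
    """Trapezoidal ROC area for specificity between 1 and min_specificity."""
    max_fpr = 1.0 - min_specificity
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        if x0 >= max_fpr:
            break
        if x1 > max_fpr:
            # Cut the segment at the false positive rate limit.
            y1 = y0 + (y1 - y0) * (max_fpr - x0) / (x1 - x0)
            x1 = max_fpr
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area


# Toy ranking: 4 substrates (1) and 6 nonsubstrates (0).
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.0]
labels = [1, 1, 0, 1, 0, 0, 1, 0, 0, 0]
points = roc_points(scores, labels)
total_area = partial_auc(points, min_specificity=0.0)
area_sp_070 = partial_auc(points, min_specificity=0.70)
```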
Results of the enrichment parameters calculation for the best individual model and the best 2-model ensemble.
| | Model 1 | Ensemble 2 |
|---|---|---|
| Training set | | |
| Accumulation curve AUC (AUCc)‡ | 0.6458 | 0.6938 |
| Enrichment factor (EF) | 1.9294 | 1.9294 |
| Robust initial enhancement (RIE) | 1.8338 | 1.9261 |
| BEDROC | 0.9505 | 0.9983 |
| Simulated 577-compound database | | |
| Accumulation curve AUC (AUCc)‡ | 0.7212 | 0.7581 |
| Enrichment factor (EF) | 2.9630 | 5.9259 |
| Robust initial enhancement (RIE) | 2.9455 | 4.6663 |
| BEDROC | 0.2268 | 0.3593 |
‡The relation $\mathrm{ROC\ AUC} = \mathrm{AUCc}/R_i - R_a/(2R_i)$ holds, where $R_i$ and $R_a$ are the ratios of inactives and actives, respectively.
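The relation between the ROC AUC and the accumulation curve AUC in the footnote above can be checked numerically on a small ranked list. This sketch assumes a ranking without ties, a trapezoidal accumulation curve (fraction of actives found versus fraction of the database screened), and 1-based ranks with rank 1 as the top-scored compound. The function names and the toy ranks are hypothetical.

```python
def accumulation_auc(active_ranks, total):
    """Trapezoidal area under the accumulation curve for 1-based active ranks."""
    n_actives = len(active_ranks)
    # An active at rank r contributes (total - r + 0.5) / (n_actives * total).
    return sum(total - rank + 0.5 for rank in active_ranks) / (n_actives * total)


def roc_auc(active_ranks, total):
    """Fraction of (active, inactive) pairs in which the active is ranked first."""
    n_actives = len(active_ranks)
    n_inactives = total - n_actives
    # The k-th active found, at rank r, has r - k inactives ranked ahead of it.
    pairs_won = sum(
        n_inactives - (rank - k)
        for k, rank in enumerate(sorted(active_ranks), start=1)
    )
    return pairs_won / (n_actives * n_inactives)


# Toy ranking: 10 compounds, actives found at ranks 1, 2, 4, and 7.
active_ranks = [1, 2, 4, 7]
total = 10
ratio_actives = len(active_ranks) / total  # R_a
ratio_inactives = 1.0 - ratio_actives      # R_i

auc_c = accumulation_auc(active_ranks, total)
auc_roc = roc_auc(active_ranks, total)
auc_roc_from_auc_c = auc_c / ratio_inactives - ratio_actives / (2 * ratio_inactives)
```

Under these assumptions `auc_roc` and `auc_roc_from_auc_c` agree up to floating-point rounding, which is the equality the footnote states.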