Eszter Hazai, Istvan Hazai, Isabelle Ragueneau-Majlessi, Sophie P Chung, Zsolt Bikadi, Qingcheng Mao.
Abstract
BACKGROUND: Human breast cancer resistance protein (BCRP) is an ATP-binding cassette (ABC) efflux transporter that confers multidrug resistance in cancers and also plays an important role in the absorption, distribution and elimination of drugs. Predicting whether drugs or new molecular entities are BCRP substrates offers a cost-effective way to help evaluate the pharmacokinetic properties, efficacy, and safety of these drugs or drug candidates. At present, few studies have developed in silico prediction models for BCRP substrates. In this study, we developed support vector machine (SVM) models to predict wild-type BCRP substrates based on a total of 263 known BCRP substrates and non-substrates collected from the literature. The final SVM model was integrated into a free web server.
Year: 2013 PMID: 23586520 PMCID: PMC3641962 DOI: 10.1186/1471-2105-14-130
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
The mean values of SVM prediction performance parameters of 100 runs using various kernels
| Kernel | Data set | ACC (%) | SP (%) | SE (%) | MCC |
| --- | --- | --- | --- | --- | --- |
| Linear | Training | 80.7 | 90.4 | 64.8 | 0.581 |
| | Test | 68.8 | 82.1 | 46.6 | 0.312 |
| | External | 70.7 | 77.6 | 59.1 | 0.375 |
| Polynomial | Training | 79.2 | 96.4 | 50.7 | 0.506 |
| | Test | 65.8 | 86.8 | 30.8 | 0.198 |
| | External | 66.1 | 81.9 | 39.8 | 0.174 |
| RBF | Training | 84.5 | 93.8 | 69.1 | 0.665 |
| | Test | 69.7 | 83.3 | 47.2 | 0.332 |
| | External | 70.9 | 77.7 | 59.6 | 0.382 |
ACC, accuracy (overall prediction accuracy); SP, specificity (prediction accuracy for the non-substrates); SE, sensitivity (prediction accuracy for the substrates); MCC, the Matthews correlation coefficient (a more balanced prediction parameter than ACC). The external data set was only used to validate the prediction power of the models constructed, and was not used for model selection.
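For readers unfamiliar with the three kernels compared above, the following is a minimal sketch of their textbook definitions in plain Python. The parameter names `gamma`, `degree` and `coef0` and their default values are assumptions chosen for illustration; the record does not state the hyperparameters or software the authors used.

```python
import math


def dot(x, y):
    """Inner product of two equal-length feature vectors."""
    return sum(a * b for a, b in zip(x, y))


def linear_kernel(x, y):
    # K(x, y) = x . y
    return dot(x, y)


def polynomial_kernel(x, y, degree=3, gamma=1.0, coef0=0.0):
    # K(x, y) = (gamma * x . y + coef0) ** degree
    # degree, gamma and coef0 are illustrative defaults, not values from the study.
    return (gamma * dot(x, y) + coef0) ** degree


def rbf_kernel(x, y, gamma=1.0):
    # K(x, y) = exp(-gamma * ||x - y||^2); gamma is an illustrative default.
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * squared_distance)


# Toy descriptor vectors (hypothetical values, not taken from the data set).
x = [1.0, 2.0]
y = [2.0, 0.0]
print(linear_kernel(x, y))
print(polynomial_kernel(x, y, degree=2))
print(rbf_kernel(x, x))
```

The RBF kernel of a vector with itself is always 1, because the squared distance is zero; this is one reason it behaves as a similarity measure rather than a raw inner product.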
Performance parameters of 100 runs using various ratios of training/test sets
| Training/test ratio | Data set | ACC (%) | SP (%) | SE (%) | MCC |
| --- | --- | --- | --- | --- | --- |
| 0.5/0.5 | Training | 83.8 | 94.6 | 65.8 | 0.649 |
| | Test | 67.8 | 82.2 | 44.0 | 0.288 |
| | External | 69.5 | 78.3 | 54.7 | 0.344 |
| 0.6/0.4 | Training | 85.1 | 94.7 | 69.1 | 0.678 |
| | Test | 69.0 | 83.2 | 45.5 | 0.315 |
| | External | 70.8 | 79.0 | 57.1 | 0.372 |
| 0.7/0.3 | Training | 83.4 | 93.3 | 67.1 | 0.642 |
| | Test | 71.1 | 84.2 | 49.1 | 0.360 |
| | External | 70.8 | 78.4 | 58.1 | 0.375 |
| 0.75/0.25 | Training | 84.5 | 93.8 | 69.1 | 0.665 |
| | Test | 69.7 | 83.3 | 47.2 | 0.332 |
| | External | 70.9 | 77.7 | 59.6 | 0.382 |
| 0.8/0.2 | Training | 83.6 | 93.5 | 67.1 | 0.644 |
| | Test | 70.4 | 83.6 | 48.7 | 0.35 |
| | External | 70.6 | 77.5 | 59.1 | 0.376 |
| 0.85/0.15 | Training | 82.9 | 93.5 | 65.1 | 0.627 |
| | Test | 70.3 | 85.1 | 46.4 | 0.347 |
| | External | 70.9 | 78.5 | 58.1 | 0.376 |
The total number of molecules used in the training and test data sets was 223. The number of molecules in the external validation data set was 40. ACC, accuracy (overall prediction accuracy); SP, specificity (prediction accuracy for the non-substrates); SE, sensitivity (prediction accuracy for the substrates); MCC, the Matthews correlation coefficient (a more balanced prediction parameter than ACC). The external data set was only used to validate the prediction power of the models constructed, and was not used for model selection.
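The repeated random partitioning described above can be sketched as follows. This is an illustration only: the rounding rule in `split_sizes`, the use of consecutive integers as seeds, and the placeholder molecule identifiers are all assumptions, since the record does not say how the authors drew their splits. For the 0.75/0.25 ratio the sketch gives 167 training and 56 test molecules, which agrees with the row totals of the selected-model table (89 + 15 + 38 + 25 = 167 and 31 + 4 + 11 + 10 = 56).

```python
import random


def split_sizes(n_molecules, train_fraction):
    """Return (n_train, n_test); rounding to the nearest integer is an assumption."""
    n_train = round(n_molecules * train_fraction)
    return n_train, n_molecules - n_train


def random_split(items, train_fraction, seed):
    """Shuffle a copy of items with a seeded generator and cut it into two parts."""
    rng = random.Random(seed)
    shuffled = list(items)
    rng.shuffle(shuffled)
    n_train, _ = split_sizes(len(shuffled), train_fraction)
    return shuffled[:n_train], shuffled[n_train:]


# Placeholder identifiers for the 223 molecules in the training/test pool.
molecule_ids = list(range(223))

for fraction in (0.5, 0.6, 0.7, 0.75, 0.8, 0.85):
    print(fraction, split_sizes(len(molecule_ids), fraction))

# 100 independent splits at the 0.75/0.25 ratio, one per (hypothetical) seed.
splits = [random_split(molecule_ids, 0.75, seed) for seed in range(100)]
```

Each split would then be used to train and evaluate one model, and the tabulated values are the means over the 100 runs. The 40 external molecules stay outside this loop entirely.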
Prediction power of the selected SVM model
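| Data set | TN | FP | TP | FN | ACC (%) | SP (%) | SE (%) | MCC |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Training set | 89 | 15 | 38 | 25 | 76.0 | 85.6 | 60.3 | 0.478 |
| Test set | 31 | 4 | 11 | 10 | 75.0 | 88.6 | 52.4 | 0.448 |
| External set | 19 | 6 | 10 | 5 | 72.5 | 76.0 | 66.7 | 0.422 |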
TP, true positive; TN, true negative; FP, false positive; FN, false negative; ACC, accuracy (overall prediction accuracy); SP, specificity (prediction accuracy for the non-substrates); SE, sensitivity (prediction accuracy for the substrates); MCC, the Matthews correlation coefficient (a more balanced prediction parameter than ACC).
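The four performance parameters follow from the confusion counts by the standard formulas, sketched below. Reading the four count columns as TN, FP, TP, FN (in that order) is an inference from the arithmetic rather than something the record states: it is the assignment under which the counts reproduce the tabulated ACC, SP, SE and MCC values.

```python
import math


def classification_metrics(tp, tn, fp, fn):
    """Return (ACC %, SP %, SE %, MCC) from confusion-matrix counts."""
    total = tp + tn + fp + fn
    acc = 100.0 * (tp + tn) / total   # overall prediction accuracy
    sp = 100.0 * tn / (tn + fp)       # accuracy on non-substrates
    se = 100.0 * tp / (tp + fn)       # accuracy on substrates
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # MCC is conventionally set to 0 when any marginal total is zero.
    mcc = (tp * tn - fp * fn) / denominator if denominator else 0.0
    return acc, sp, se, mcc


# Training-set counts, with the column order assumed to be TN, FP, TP, FN.
acc, sp, se, mcc = classification_metrics(tp=38, tn=89, fp=15, fn=25)
print(round(acc, 1), round(sp, 1), round(se, 1), round(mcc, 3))
```

Because the data are imbalanced toward non-substrates, ACC is dominated by SP; MCC uses all four cells, which is why the legend describes it as the more balanced parameter.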
Overlap of classification in 10 experimental models
| Model | 98 | 87 | 77 | 63 | 91 | 45 | 42 | 62 | 7 | 73 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 98 | 100 | 86.31 | 87.07 | 86.69 | 84.41 | 81.37 | 87.07 | 87.07 | 87.45 | 86.69 |
| 87 | | 100 | 87.83 | 85.93 | 87.45 | 81.37 | 87.83 | 87.07 | 87.45 | 90.49 |
| 77 | | | 100 | 89.73 | 94.30 | 87.45 | 91.64 | 92.34 | 89.73 | 91.26 |
| 63 | | | | 100 | 87.83 | 84.79 | 88.21 | 89.73 | 89.35 | 88.59 |
| 91 | | | | | 100 | 88.59 | 88.97 | 94.30 | 91.64 | 92.40 |
| 45 | | | | | | 100 | 85.93 | 88.21 | 84.79 | 84.79 |
| 42 | | | | | | | 100 | 91.64 | 92.78 | 92.02 |
| 62 | | | | | | | | 100 | 91.26 | 91.26 |
| 7 | | | | | | | | | 100 | 91.26 |
| 73 | | | | | | | | | | 100 |
The overall prediction accuracies (ACC) of the 10 experimental models 98, 87, 77, 63, 91, 45, 42, 62, 7, and 73 were 78.33%, 75.29%, 76.81%, 80.23%, 75.67%, 70.34%, 76.05%, 75.29%, 78.71%, and 77.95%, respectively.
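One plausible way to obtain such an overlap matrix is the percentage of molecules that two models assign to the same class, sketched below. This is an assumption about the calculation, not a statement of the authors' procedure, and the model names and labels in the toy data are hypothetical. It is at least numerically consistent with the table: 227 identical calls out of 263 molecules would give 86.31%, the value reported for models 98 and 87.

```python
def percent_agreement(labels_a, labels_b):
    """Percentage of positions at which two label sequences are identical."""
    if len(labels_a) != len(labels_b):
        raise ValueError("label sequences must have the same length")
    matches = sum(a == b for a, b in zip(labels_a, labels_b))
    return 100.0 * matches / len(labels_a)


def agreement_matrix(predictions):
    """Upper-triangular agreement matrix keyed by (model, model) pairs."""
    names = list(predictions)
    return {
        (first, second): percent_agreement(predictions[first], predictions[second])
        for i, first in enumerate(names)
        for second in names[i:]
    }


# Hypothetical predictions (1 = substrate, 0 = non-substrate) for five molecules.
toy_predictions = {
    "model_a": [1, 1, 0, 0, 1],
    "model_b": [1, 0, 0, 0, 1],
    "model_c": [0, 0, 0, 1, 1],
}
matrix = agreement_matrix(toy_predictions)
for pair, value in matrix.items():
    print(pair, value)
```

Only the upper triangle is computed because agreement is symmetric, which matches the layout of the table.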
Molecular descriptors used by the selected SVM model
| Descriptor | Description |
| --- | --- |
| AAC | Mean information index on atomic composition |
| SPH | Spherosity |
| Mor17m | 3D-MoRSE signal 17, weighted by mass |
| Mor25m | 3D-MoRSE signal 25, weighted by mass |
| R2m | GETAWAY R autocorrelation of lag 2, weighted by mass |