Sorana D Bolboacă, Lorentz Jäntschi.
Abstract
This study presents a statistical approach for assessing the predictivity of structure-based prediction models. The approach was applied to the blood-brain barrier (BBB) permeation of diverse drug-like compounds. For this purpose, 15 statistical parameters and their associated 95% confidence intervals, computed on a 2 × 2 contingency table, were defined as measures of predictivity for binary quantitative structure-property models. The approach was applied to a set of 437 diverse molecules: 122 with measured BBB permeability and 315 classified as active or inactive. A training set of 81 compounds (~2/3 of the 122 compounds, assigned randomly) was used to identify the model, and a test set of 41 compounds served as the internal validation set. The molecular descriptors family on vertices cutting (MDFV) was the computational tool used to generate and calculate structural descriptors for all compounds. The identified model was assessed using the predictivity approach and compared with one previously reported model. The best identified classification model had an accuracy of 69% in the training set (95% CI [58.53–78.37]) and of 73% in the test set (95% CI [58.32–84.77]). The predictive accuracy obtained on the external set was 73% (95% CI [67.58–77.39]). The model classified inactive compounds better (specificity of ~74% [59.20–85.15]) than active compounds (sensitivity of ~64% [48.47–77.70]) in the training and external sets. The overall accuracy of the previously reported model (~81% [71.45–87.80] in the training set, ~93% [78.12–98.17] in the test set and ~79% [70.19–86.58] in the external set) does not appear to be statistically significantly better than that of the identified model.
In conclusion, the predictivity approach allowed us to characterize the model obtained on the investigated set of compounds and to compare it with a previously reported model. According to the results, the model reported here should be chosen when a correct classification of inactive compounds is desired, and the previously reported model should be chosen when a correct classification of active compounds matters most.
Keywords: blood-brain barrier (BBB); in silico prediction; molecular descriptors family on vertices cutting (MDFV); partition-coefficient; permeation; structure-property relationship (SPR)
Year: 2011 PMID: 21845082 PMCID: PMC3155355 DOI: 10.3390/ijms12074348
Source DB: PubMed Journal: Int J Mol Sci ISSN: 1422-0067 Impact factor: 5.923
Diagnostic of classification model presented in Equation 3.
| Parameter (Abbreviation) | Equation | Training Set (n = 81) | Test Set (n = 41) | External Set (n = 315) |
|---|---|---|---|---|
| χ2 statistic (p) | | 10.29 (0.0013) | 7.75 (0.0054) | 28.24 ( |
| Φ | | 0.3564 | 0.4347 | 0.2994 |
| Accuracy (AC) | 5 | 69.14 [58.53–78.37] | 73.17 [58.32–84.77] | 72.70 [67.58–77.39] |
| Error Rate (ER) | 6 | 30.86 | 26.83 | 27.30 |
| Prior proportional probability of | 7 | | | |
| - an active class | | 0.482 [0.371–0.592] | 0.463 [0.318–0.614] | 0.302 [0.253–0.354] |
| - an inactive class | | 0.519 [0.408–0.630] | 0.537 [0.367–0.682] | 0.698 [0.644–0.749] |
| Sensitivity (Se) | 8 | 64.10 [48.47–77.70] | 84.21 [63.16–95.05] | 42.11 [32.54–52.15] |
| False-negative rate (under-classification, FNR) | 9 | 35.90 [22.30–45.51] | 15.79 [4.95–36.84] | 57.89 [47.85–67.46] |
| Specificity (Sp) | 10 | 73.81 [59.20–85.15] | 63.64 [42.87–81.04] | 85.91 [80.80–89.98] |
| False-positive rate (over-classification, FPR) | 11 | 26.19 [14.86–40.80] | 36.36 [18.96–57.12] | 14.09 [10.02–19.20] |
| Positive predictivity (PP) | 12 | 69.44 [53.32–82.51] | 66.67 [46.76–82.76] | 56.34 [44.74–67.43] |
| Negative predictivity (NP) | 13 | 68.89 [54.49–80.89] | 82.35 [59.63–97.48] | 77.46 [72.59–81.80] |
| Post-test probability of classification | 14 | | | |
| - as active (PCA) | | 0.444 [0.340–0.553] | 0.585 [0.433–0.726] | 0.225 [0.177–0.281] |
| - as inactive (PCIC) | | 0.556 [0.447–0.660] | 0.415 [0.274–0.567] | 0.775 [0.7259–0.818] |
| Probability of a wrong classification | 15 | | | |
| - as active compound (PWCA) | | 0.306 [0.175–0.467] | 0.333 [0.172–0.532] | 0.437 [0.326–0.553] |
| - as inactive compound (PWCI) | | 0.311 [0.191–0.455] | 0.177 [0.055–0.404] | 0.225 [0.177–0.281] |
| Odds Ratio (OR) | 16 | 5.03 [1.96–13.12] | 9.33 [2.18–40.07] | 4.43 [2.53–7.76] |
Diagnostic of classification model presented in Equation 2.
| Parameter (Abbreviation) | Equation | Training | Test | External |
|---|---|---|---|---|
| χ2 statistic (p) | | 30.91 ( | 9.82 (0.0017) | 28.76 ( |
| Φ | | 0.5927 | 0.5922 | 0.5591 |
| Accuracy (AC) | 5 | 80.68 [71.45–87.80] | 92.86 [78.12–98.17] | 79.35 [70.19–86.58] |
| Error Rate (ER) | 6 | 19.32 | 7.14 | 20.65 |
| Prior proportional probability of | 7 | | | |
| - an active class | | 0.511 [0.408–0.614] | 0.179 [0.074–0.350] | 0.435 [0.337–0.537] |
| - an inactive class | | 0.489 [0.375–0.602] | 0.821 [0.644–0.927] | 0.565 [0.457–0.674] |
| Sensitivity (Se) | 8 | 77.78 [64.06–87.87] | 60.00 [20.97–90.51] | 77.50 [62.85–88.14] |
| False-negative rate (under-classification, FNR) | 9 | 22.22 [12.13–35.94] | 40.00 [9.49–79.03] | 22.50 [11.86–37.15] |
| Specificity (Sp) | 10 | 83.72 [70.48–92.25] | 100.00 [87.79–100.00] | 80.77 [68.45–89.55] |
| False-positive rate (over-classification, FPR) | 11 | 16.28 [7.78–29.52] | 0.00 [0.00–12.21] | 19.23 [10.45–31.55] |
| Positive predictivity (PP) | 12 | 83.33 [70.48–92.25] | 100.00 [36.84–100.00] | 75.61 [60.69–86.63] |
| Negative predictivity (NP) | 13 | 78.26 [64.76–88.14] | 92.00 [75.85–97.97] | 82.35 [70.12–90.77] |
| Post-test probability of classification | 14 | | | |
| - as active (PCA) | | 0.523 [0.419–0.625] | 0.107 [0.034–0.265] | 0.446 [0.347–0.548] |
| - as inactive (PCIC) | | 0.477 [0.375–0.581] | 0.893 [0.735–0.966] | 0.554 [0.452–0.653] |
| Probability of a wrong classification | 15 | | | |
| - as active compound (PWCA) | | 0.167 [0.079–0.302] | 0.000 [0.000–0.122] | 0.244 [0.134–0.390] |
| - as inactive compound (PWCI) | | 0.217 [0.119–0.352] | 0.080 [0.020–0.242] | 0.177 [0.092–0.299] |
| Odds Ratio (OR) | 16 | 18.00 [6.25–52.39] | n.a. | 14.47 [5.32–39.99] |
Φ = coefficient of correlation in 2 × 2 contingency table; χ2 = Chi-squared statistic.
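The χ2 and Φ rows can be checked with a short calculation. The sketch below rests on assumptions not stated in the source: that χ2 carries Yates's continuity correction, that Φ = sqrt(χ2/n), and that the training-set counts of the Equation 3 model are TP = 25, FN = 14, FP = 11, TN = 31 (back-calculated here from the reported percentages, not taken from the paper). The function names are hypothetical.

```python
import math

def chi2_yates(tp, fn, fp, tn):
    """Chi-squared statistic of a 2 x 2 table with Yates's continuity correction (assumed)."""
    n = tp + fn + fp + tn
    numerator = n * max(abs(tp * tn - fn * fp) - n / 2, 0.0) ** 2
    denominator = (tp + fn) * (fp + tn) * (tp + fp) * (fn + tn)
    return numerator / denominator

def p_value_df1(chi2):
    """Upper-tail probability of a chi-squared variate with 1 degree of freedom."""
    return math.erfc(math.sqrt(chi2 / 2))

def phi_coefficient(chi2, n):
    """Phi taken as sqrt(chi2 / n); an assumption that matches the tabulated values."""
    return math.sqrt(chi2 / n)

# Back-calculated (assumed) training-set counts for the Equation 3 model
tp, fn, fp, tn = 25, 14, 11, 31
chi2 = chi2_yates(tp, fn, fp, tn)
print(round(chi2, 2), round(phi_coefficient(chi2, tp + fn + fp + tn), 4))
print(p_value_df1(chi2))
```

Under these assumptions the statistic and Φ round to the training-set entries 10.29 and 0.3564. The uncorrected χ2 for the same counts is noticeably larger, which is why the continuity correction is assumed here.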
Parameters for the characterization of prediction.
| Parameter (Abbreviation) | Formula | Definition | Equation |
|---|---|---|---|
| Concordance/Accuracy/Non-error Rate (CC/AC) | $100 \times (TP + TN)/n$ | Total fraction of compounds correctly classified | 6 |
| Error Rate (ER) | $100 \times (FP + FN)/n$ | Total fraction of compounds misclassified | 7 |
| Prior proportional probability of a class (PPP) | $n_i/n$ | Fraction of compounds belonging to class i | 8 |
| Sensitivity (Se) | $100 \times TP/(TP + FN)$ | Percentage of BBB+ compounds correctly assigned to the active class | 9 |
| False-negative rate (under-classification, FNR) | $100 \times FN/(TP + FN)$ | Percentage of BBB+ compounds falsely assigned to the non-active class | 10 |
| Specificity (Sp) | $100 \times TN/(TN + FP)$ | Percentage of BBB– compounds correctly assigned to the non-active class | 11 |
| False-positive rate (over-classification, FPR) | $100 \times FP/(TN + FP)$ | Percentage of BBB– compounds falsely assigned to the active class | 12 |
| Positive predictivity (PP) | $100 \times TP/(TP + FP)$ | Percentage of compounds correctly assigned as active out of all compounds assigned as active | 13 |
| Negative predictivity (NP) | $100 \times TN/(TN + FN)$ | Percentage of compounds correctly assigned as non-active out of all compounds assigned as non-active | 14 |
| Probability of classification | | | 15 |
| - as active (PCA) | $(TP + FP)/n$ | Probability of classifying a compound as active (true positive and false positive) | |
| - as inactive (PCIC) | $(TN + FN)/n$ | Probability of classifying a compound as inactive (true negative and false negative) | |
| Probability of a wrong classification | | | 16 |
| - as active compound (PWCA) | $FP/(FP + TP)$ | Probability of a false positive classification | |
| - as inactive compound (PWCI) | $FN/(FN + TN)$ | Probability of a false negative classification | |
| Odds Ratio (OR) | $(TP \times TN)/(FP \times FN)$ | The odds of a correct classification in the group of active compounds divided by the odds of an incorrect classification in the group of inactive compounds | 17 |
TP = number of true positives (BBB+ compounds classified as active); TN = number of true negatives (BBB– compounds classified as non-active); FP = number of false positives (BBB– compounds classified as active); FN = number of false negatives (BBB+ compounds classified as non-active); n = sample size; ni = number of compounds belonging to class i; i = 1, 2 (where 1 = active BBB compounds; 2 = inactive BBB compounds).
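The point estimates defined above follow directly from the four cells of the 2 × 2 table. The sketch below is one reading of those definitions, not the authors' code. The function name `predictivity` and the example counts (TP = 25, FN = 14, FP = 11, TN = 31, back-calculated from the training-set percentages of the Equation 3 model) are assumptions. Confidence intervals are left out because the source does not say how they were computed.

```python
def predictivity(tp, fn, fp, tn):
    """Point estimates of the predictivity parameters for a 2 x 2 contingency table."""
    n = tp + fn + fp + tn
    return {
        "AC": 100 * (tp + tn) / n,       # accuracy, %
        "ER": 100 * (fp + fn) / n,       # error rate, %
        "PPP_active": (tp + fn) / n,     # prior proportion of the active class
        "PPP_inactive": (tn + fp) / n,   # prior proportion of the inactive class
        "Se": 100 * tp / (tp + fn),      # sensitivity, %
        "FNR": 100 * fn / (tp + fn),     # false-negative rate, %
        "Sp": 100 * tn / (tn + fp),      # specificity, %
        "FPR": 100 * fp / (tn + fp),     # false-positive rate, %
        "PP": 100 * tp / (tp + fp),      # positive predictivity, %
        "NP": 100 * tn / (tn + fn),      # negative predictivity, %
        "PCA": (tp + fp) / n,            # probability of classification as active
        "PCIC": (tn + fn) / n,           # probability of classification as inactive
        "PWCA": fp / (fp + tp),          # probability of a wrong classification as active
        "PWCI": fn / (fn + tn),          # probability of a wrong classification as inactive
        # Odds ratio is undefined when FP or FN is zero (reported as n.a.)
        "OR": (tp * tn) / (fp * fn) if fp * fn else None,
    }

# Back-calculated (assumed) training-set counts for the Equation 3 model
for name, value in predictivity(25, 14, 11, 31).items():
    print(f"{name}: {value:.3f}")
```

With these assumed counts the function gives AC = 69.14, Se = 64.10, Sp = 73.81 and OR = 5.03 after rounding, in line with the training-set column of the Equation 3 diagnostic table.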
Compounds on training and test sets.
| No. | Name | ID | logBB | Set | Ref. |
|---|---|---|---|---|---|
| 1 | Cimetidine | CID: 2756 | −1.42 | 2 | [ |
| 2 | Icotidine | CID: 72108 | −2.00 | 1 | [ |
| 3 | Lupitidine | CID: 51671 | −1.06 | 2 | [ |
| 4 | Clonidine | CID: 2803 | 0.11 | 1 | [ |
| 5 | Mepyramine | CID: 4992 | 0.49 | 1 | [ |
| 6 | Imipramine | CID: 3696 | 0.83 | 1 | [ |
| 7 | Ranitidine | CID: 5039 | −1.23 | 2 | [ |
| 8 | Tiotidine | CID: 50287 | −0.82 | 1 | [ |
| 9 | Zolantidine | CID: 91769 | 0.14 | 2 | [ |
| 10 | Butanone | CID: 6569 | −0.08 | 2 | [ |
| 11 | Benzene | CID: 241 | 0.37 | 1 | [ |
| 12 | 3-Methylpentane | CID: 7282 | 1.01 | 1 | [ |
| 13 | 3-Methylhexane | CID: 11507 | 0.90 | 1 | [ |
| 14 | 2-Propanol | CID: 3776 | −0.15 | 1 | [ |
| 15 | 2-Methylpropanol | CID: 6560 | −0.17 | 1 | [ |
| 16 | 2-Methylpentane | CID: 7892 | 0.97 | 2 | [ |
| 17 | 2,2-Dimethylbutane | CID: 580244 | 1.04 | 2 | [ |
| 18 | 1,1,1-Trichloroethane | CID: 6278 | 0.40 | 1 | [ |
| 19 | Diethyl ether | CID: 3283 | 0.00 | 2 | [ |
| 20 | Enflurane | CID: 3226 | 0.24 | 1 | [ |
| 21 | Ethanol | CID: 702 | −0.16 | 2 | [ |
| 22 | Fluroxene | CID: 9844 | 0.13 | 1 | [ |
| 23 | Halothane | CID: 3562 | 0.35 | 1 | [ |
| 24 | Heptane | CID: 8900 | 0.81 | 1 | [ |
| 25 | Hexane | CID: 8058 | 0.80 | 2 | [ |
| 26 | Isoflurane | CID: 3763 | 0.42 | 2 | [ |
| 27 | Methylcyclopentane | CID: 7296 | 0.93 | 2 | [ |
| 28 | Nitrogen | CID: 947 | 0.03 | 1 | [ |
| 29 | Pentane | CID: 8003 | 0.76 | 2 | [ |
| 30 | n-Propanol | CID: 1031 | −0.16 | 2 | [ |
| 31 | Propanone | CID: 180 | −0.15 | 2 | [ |
| 32 | Teflurane | CID: 31300 | 0.27 | 1 | [ |
| 33 | Toluene | CID: 1140 | 0.37 | 1 | [ |
| 34 | Acetylsalicylic acid | CID: 2244 | −0.50 | 1 | [ |
| 35 | Pentobarbital | CID: 4737 | 0.12 | 1 | [ |
| 36 | Physostigmine | CID: 5983 | 0.08 | 2 | [ |
| 37 | Salicylic acid | CID: 338 | −1.10 | 1 | [ |
| 38 | Trifluoperazine | CID: 5566 | 1.44 | 1 | [ |
| 39 | Valproic acid | CID: 3121 | −0.22 | 1 | [ |
| 40 | Verapamil | CID: 2520 | −0.70 | 1 | [ |
| 41 | Zidovudine | CID: 5726 | −0.72 | 1 | [ |
| 42 | Hydroxyzine | CID: 3658 | 0.39 | 2 | [ |
| 43 | Thioridazine | CID: 5452 | 0.24 | 1 | [ |
| 44 | Alprazolam | CID: 2118 | 0.04 | 2 | [ |
| 45 | Phenserine | CID: 192706 | 1.00 | 1 | [ |
| 46 | Midazolam | CID: 4192 | 0.36 | 2 | [ |
| 47 | Codeine | CID: 5284371 | 0.55 | 2 | [ |
| 48 | Chlorpromazine | CID: 2726 | 1.06 | 2 | [ |
| 49 | Promazine | CID: 4926 | 1.23 | 1 | [ |
| 50 | Nevirapine | CID: 4463 | 0.00 | 1 | [ |
| 51 | Thioperamide | CID: 3035905 | −0.16 | 1 | [ |
| 52 | Didanosine | CID: 3043 | −1.30 | 2 | [ |
| 53 | Ibuprofen | CID: 3672 | −0.18 | 1 | [ |
| 54 | Antipyrine | CID: 2206 | −2.00 | 2 | [ |
| 55 | Theophylline | CID: 2153 | −0.29 | 1 | [ |
| 56 | p-Acetamido phenol | CID: 1983 | −0.31 | 1 | [ |
| 57 | Nitrous Oxide | CID: 948 | 0.03 | 1 | [ |
| 58 | Carbon bisulphide | CID: 6348 | 0.60 | 1 | [ |
| 59 | Indomethacin | CID: 3715 | −1.26 | 1 | [ |
| 60 | Indinavir | CID: 5362440 | −0.75 | 1 | [ |
| 61 | Oxazepam | CID: 4616 | 0.61 | 1 | [ |
| 62 | Carbamazepine | CID: 2554 | −0.14 | 2 | [ |
| 63 | Carbamazepine epoxide | CID: 2555 | −0.35 | 1 | [ |
| 64 | Amitriptyline | CID: 2160 | 0.88 | 1 | [ |
| 65 | Desipramine | CID: 2995 | 1.00 | 1 | [ |
| 66 | Mianserin | CID: 4184 | 0.99 | 2 | [ |
| 67 | ORG 4428 | CID: 166560 | 0.82 | 2 | [ |
| 68 | Mirtazapine | CID: 4205 | 0.53 | 1 | [ |
| 69 | Tibolone | CID: 21844 | 0.40 | 1 | [ |
| 70 | Domperidone | CID: 3151 | −0.78 | 2 | [ |
| 71 | Risperidone | CID: 5073 | −0.67 | 2 | [ |
| 72 | 9-OH-Risperidone | CID: 475100 | −0.02 | 1 | [ |
| 73 | Temelastine | CID: 55482 | −1.88 | 2 | [ |
| 74 | BBCPD13 | CSID: 14922095 | −0.66 | 1 | [ |
| 75 | BBCPD15 | CSID: 2992532 | −0.18 | 1 | [ |
| 76 | BBCPD57 | CSID: 10439135 | −1.15 | 2 | [ |
| 77 | BBCPD58 | CSID: 10442225 | −1.54 | 1 | [ |
| 78 | BBCPD17 | CSID: 10442293 | −1.12 | 1 | [ |
| 79 | BBCPD20 | CID: 9971484 | −0.46 | 1 | [ |
| 80 | BBCPD21 | CID: 10498206 | −0.24 | 2 | [ |
| 81 | SB222200 | CSID: 3167851 | 0.30 | 1 | [ |
| 82 | Y-G14 | CSID: 2276 | −0.30 | 1 | [ |
| 83 | Y-G15 | CSID: 72747 | −0.06 | 1 | [ |
| 84 | Caffeine | CID: 2519 | −2.00 | 1 | [ |
| 85 | Chlorambucil | CID: 2708 | −1.60 | 1 | [ |
| 86 | Glycine | CID: 750 | −3.50 | 2 | [ |
| 87 | Morphine | CID: 5288826 | −2.70 | 2 | [ |
| 88 | Phenylalanine | CID: 994 | −1.30 | 2 | [ |
| 89 | Phenytoin | CID: 1775 | −2.20 | 1 | [ |
| 90 | Propranolol | CID: 4946 | −1.20 | 1 | [ |
| 91 | Taurocholic Acid | CID: 444349 | −4.10 | 1 | [ |
| 92 | Trichloroethylene | CID: 6575 | 0.34 | 1 | [ |
| 93 | Carmustine | CID: 450682 | −0.52 | 1 | [ |
| 94 | ORG34167 | CSID: 8036856 | 0.00 | 1 | [ |
| 95 | BBCPD22 | CSID: 8620184 | −0.02 | 1 | [ |
| 96 | BBCPD23 | BBCPD23 | 0.69 | 2 | [ |
| 97 | BBCPD24 | BBCPD24 | 0.44 | 1 | [ |
| 98 | BBCPD26 | BBCPD26 | 0.22 | 2 | [ |
| 99 | 1,1,1-Trifluoro-2-chloro ethane | CSID: 6168 | 0.08 | 1 | [ |
| 100 | T7 | T7 | 0.85 | 1 | [ |
| 101 | BBCPD60 | CSID: 23218171 | −0.73 | 1 | [ |
| 102 | BBCPD18 | BBCPD18 | −0.27 | 1 | [ |
| 103 | BBCPD19 | BBCPD19 | −0.28 | 2 | [ |
| 104 | BBCPD16 | BBCPD16 | −1.57 | 1 | [ |
| 105 | BBCPD14 | BBCPD14 | −0.12 | 2 | [ |
| 106 | Y-G16 | Y-G16 | −0.42 | 1 | [ |
| 107 | Y-G19 | Y-G19 | −1.30 | 2 | [ |
| 108 | Y-G20 | CSID: 5854406 | −1.40 | 1 | [ |
| 109 | SKF89124 | CSID: 117961 | −0.43 | 1 | [ |
| 110 | SKF101468 | CSID: 4916 | 0.25 | 1 | [ |
| 111 | CBZ-EPO | CBZ-EPO | −0.34 | 1 | [ |
| 112 | L-663581 | CSID: 114837 | −0.30 | 1 | [ |
| 113 | M1L-663,581 | CSID: 8560187 | −1.34 | 1 | [ |
| 114 | M2L-663581 | CSID: 8267285 | −1.82 | 1 | [ |
| 115 | ORG5222 | ORG5223 | 1.03 | 2 | [ |
| 116 | ORG12962 | CSID: 7972174 | 1.64 | 1 | [ |
| 117 | ORG13011 | ORG13011 | 0.16 | 1 | [ |
| 118 | ORG32104 | ORG32104 | 0.52 | 1 | [ |
| 119 | ORG30526 | ORG30526 | 0.39 | 1 | [ |
| 120 | ICI17148 | ICI17149 | −0.04 | 2 | [ |
| 121 | SK&F93319 | SK&F93320 | −1.30 | 1 | [ |
| 122 | CBZ | CBZ | 0.00 | 1 | [ |
CID = ID of compounds taken from PubChem; CSID = ID of compounds taken from ChemSpider.
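The abstract states that 81 of the 122 compounds were assigned randomly to the training set and the remaining 41 to the test set. A minimal sketch of such a split is given below. The helper `split_train_test` and the seed are hypothetical, and the result will not reproduce the assignment recorded in the Set column, because the authors' random procedure is not described.

```python
import random

def split_train_test(ids, n_train, seed=0):
    """Randomly assign n_train identifiers to the training set and the rest to the test set."""
    rng = random.Random(seed)   # fixed seed only to make the sketch repeatable
    shuffled = list(ids)
    rng.shuffle(shuffled)
    return sorted(shuffled[:n_train]), sorted(shuffled[n_train:])

# Compound numbers 1..122 as listed in the table above
train_ids, test_ids = split_train_test(range(1, 123), n_train=81)
print(len(train_ids), len(test_ids))  # 81 41
```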
Summary statistical characteristics of training and test sets.
| Parameter | Training set (n = 81) | Test set (n = 41) |
|---|---|---|
| m [95%CI] | −0.2003 [−0.4060; 0.0055] | −0.2529 [−0.5916; 0.0858] |
| StDev | 0.9306 | 1.0731 |
| Min | −4.10 | −3.50 |
| Max | 1.64 | 1.06 |
| KS statistic (p) | 0.1151 (0.2163) | 0.1729 (0.1531) |
| AD statistic | 1.1582 | 0.9939 |
| CS statistic (p) | 8.1850 (0.2249) | 0.3650 (0.9852) |
m = arithmetic mean; 95%CI = 95% confidence interval; StDev = standard deviation; n = sample size; KS = Kolmogorov-Smirnov test of goodness-of-fit; AD = Anderson-Darling test of goodness-of-fit; CS = Chi-Squared test of goodness-of-fit.
critical value = 2.5018.
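The confidence interval of the mean can be cross-checked from the summary values alone. The sketch below assumes a two-sided Student t interval, m ± t·StDev/√n, which the source does not state explicitly. The critical value 2.021 (t at 0.975 with 40 degrees of freedom, from standard tables) and the test-set size n = 41 (from the abstract) are supplied by hand, and `mean_ci` is a hypothetical helper.

```python
import math

def mean_ci(mean, stdev, n, t_crit):
    """Two-sided confidence interval for a mean, given a Student t critical value."""
    half_width = t_crit * stdev / math.sqrt(n)
    return mean - half_width, mean + half_width

# Test-set summary values from the table above; t_crit is an assumed table value
low, high = mean_ci(mean=-0.2529, stdev=1.0731, n=41, t_crit=2.021)
print(round(low, 4), round(high, 4))
```

Under this assumption the result agrees with the reported test-set interval [−0.5916; 0.0858] to four decimals.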