Brooks McPhail, Yunfeng Tie, Huixiao Hong, Bruce A Pearce, Laura K Schnackenberg, Weigong Ge, Luis G Valerio, James C Fuscoe, Weida Tong, Dan A Buzatu, Jon G Wilkes, Bruce A Fowler, Eugene Demchuk, Richard D Beger.
Abstract
An interagency collaboration was established to model chemical interactions that may cause adverse health effects when exposure to a mixture of chemicals occurs. Many of these chemicals (drugs, pesticides, and environmental pollutants) interact at the level of metabolic biotransformations mediated by cytochrome P450 (CYP) enzymes. In the present work, spectral data-activity relationship (SDAR) and structure-activity relationship (SAR) approaches were used to develop machine-learning classifiers of inhibitors and non-inhibitors of the CYP3A4 and CYP2D6 isozymes. The models were built on 602 reference pharmaceutical compounds whose interactions had been deduced from clinical data, and 100 additional chemicals were used to evaluate model performance in an external validation (EV) test. SDAR is an innovative modeling approach that applies discriminant analysis to binned nuclear magnetic resonance (NMR) spectral descriptors. Here, 1D ¹³C- and 1D ¹⁵N-NMR spectra were used together in a novel implementation of the SDAR technique. Increasing the bin size of the 1D ¹³C- and ¹⁵N-NMR spectra increased the tenfold cross-validation (CV) performance in terms of both the rate of correct classification and sensitivity. The SDAR results were verified using SAR modeling, for which a decision forest approach with 6 to 17 Mold2 descriptors per tree was used. Average rates of correct classification of the SDAR and SAR models over one hundred CV runs were 60% and 61% for CYP3A4, and 62% and 70% for CYP2D6, respectively. The rates of correct classification of the SDAR and SAR models in the EV test were 73% and 86% for CYP3A4, and 76% and 90% for CYP2D6, respectively. Thus, the SDAR and SAR methods showed comparable performance in modeling a large set of structurally diverse data.
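The model labels used in the tables (C1&N5, C2&N10, C3&N15) appear to pair ¹³C bins of 1, 2, or 3 ppm with ¹⁵N bins of 5, 10, or 15 ppm, judging by bin labels such as C3 (42–45) and N10 (230–240) in the last table. The Python sketch below shows one way to turn peak lists into a binned descriptor vector, assuming each bin simply counts the chemical shifts that fall into it. The function names, the spectral windows (0–220 ppm for ¹³C, 0–400 ppm for ¹⁵N), and bin edges anchored at the window start are illustrative assumptions, not the authors' settings.

```python
import math
from typing import List, Sequence, Tuple


def bin_spectrum(shifts: Sequence[float], width: float,
                 lo: float, hi: float) -> List[int]:
    """Count chemical shifts (ppm) in each half-open bin [edge, edge + width)."""
    n_bins = math.ceil((hi - lo) / width)
    counts = [0] * n_bins
    for s in shifts:
        if lo <= s < hi:
            # Guard against floating-point rounding at the upper edge.
            idx = min(int((s - lo) // width), n_bins - 1)
            counts[idx] += 1
    return counts


def sdar_descriptor(c_shifts: Sequence[float], n_shifts: Sequence[float],
                    c_width: float, n_width: float,
                    c_window: Tuple[float, float] = (0.0, 220.0),
                    n_window: Tuple[float, float] = (0.0, 400.0)) -> List[int]:
    """Concatenate binned 13C and 15N spectra into one descriptor vector.

    The default windows are hypothetical; the source does not state them.
    """
    return (bin_spectrum(c_shifts, c_width, *c_window)
            + bin_spectrum(n_shifts, n_width, *n_window))


# Hypothetical peak lists for one compound, binned as in a "C3&N15" model.
c_peaks = [21.4, 45.2, 146.9, 199.8]
n_peaks = [238.0]
vec = sdar_descriptor(c_peaks, n_peaks, c_width=3, n_width=15)
print(len(vec), sum(vec))  # 101 5
```

Wider bins give fewer, more heavily populated descriptors, which is consistent with the "bins available" counts shrinking from C1&N5 to C3&N15 in the tables.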
Because it is based on unique NMR structural descriptors, the new SDAR modeling method complements existing SAR techniques, providing an independent estimator that can increase confidence in a structure-activity assessment. When the models were applied to hazardous environmental chemicals, up to 20% of them were found to be possible substrates and up to 10% possible inhibitors of the CYP3A4 and CYP2D6 isoforms. The developed models provide a rare opportunity for the environmental health branch of the public health service to extrapolate to hazardous chemicals directly from human clinical data. Both the pharmacological and environmental health branches are therefore expected to benefit from these models.
Year: 2012 PMID: 22421792 PMCID: PMC6268752 DOI: 10.3390/molecules17033383
Source DB: PubMed Journal: Molecules ISSN: 1420-3049 Impact factor: 4.411
Results of training and tenfold cross-validation for SDAR LDA and SAR DF models of inhibitors and non-inhibitors of CYP3A4.
| Model type | Data set | Forward stepwise analysis criterion | Correct classification * | Sensitivity * | Specificity * |
|---|---|---|---|---|---|
| SDAR LDA | Training |  | 83.4 ± 6.5 | 78.3 ± 8.4 | 86.3 ± 6.1 |
| SDAR LDA | Training |  | 76.1 ± 0.6 | 68.3 ± 1.0 | 80.4 ± 0.6 |
| SDAR LDA | Training |  | 72.5 ± 0.6 | 65.1 ± 1.0 | 76.7 ± 0.8 |
| SDAR LDA | CV |  | 56.9 ± 12.3 | 42.7 ± 21.2 | 64.9 ± 16.7 |
| SDAR LDA | CV |  | 57.2 ± 2.2 | 43.3 ± 5.9 | 64.9 ± 5.1 |
| SDAR LDA | CV |  | 58.6 ± 2.9 | 45.9 ± 6.1 | 65.6 ± 3.5 |
| SDAR LDA | Training |  | 80.0 ± 0.4 | 73.4 ± 0.8 | 83.7 ± 0.6 |
| SDAR LDA | Training |  | 72.6 ± 0.6 | 60.9 ± 1.0 | 79.1 ± 0.6 |
| SDAR LDA | Training |  | 70.1 ± 0.6 | 58.1 ± 1.0 | 76.9 ± 0.6 |
| SDAR LDA | CV |  | 56.6 ± 2.5 | 41.9 ± 5.3 | 64.8 ± 4.3 |
| SDAR LDA | CV |  | 58.8 ± 2.5 | 42.1 ± 3.5 | 68.1 ± 3.3 |
| SDAR LDA | CV |  | 59.8 ± 2.5 | 44.2 ± 5.5 | 68.5 ± 2.2 |
| SAR DF | CV | NA | 61.0 ± 2.0 | 39.0 ± 3.9 | 74.0 ± 2.0 |
* Rates are in percent; 95% confidence intervals are shown.
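The three rates in these tables follow the usual confusion-matrix definitions. The Python sketch below assumes inhibitors are the positive class, so sensitivity is the percentage of inhibitors predicted as inhibitors and specificity the percentage of non-inhibitors predicted as non-inhibitors. The function name and the 1/0 label encoding are illustrative.

```python
from typing import Dict, Sequence


def classification_rates(y_true: Sequence[int], y_pred: Sequence[int],
                         positive: int = 1) -> Dict[str, float]:
    """Return correct classification, sensitivity and specificity in percent."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    return {
        "correct": 100.0 * (tp + tn) / len(pairs),
        "sensitivity": 100.0 * tp / (tp + fn),
        "specificity": 100.0 * tn / (tn + fp),
    }


# Toy example: 4 inhibitors (1) and 6 non-inhibitors (0).
rates = classification_rates([1, 1, 1, 1, 0, 0, 0, 0, 0, 0],
                             [1, 1, 0, 0, 0, 0, 0, 0, 1, 0])
print(rates)  # correct = 70, sensitivity = 50, specificity is about 83.3
```

Because correct classification equals f × sensitivity + (1 − f) × specificity, where f is the fraction of inhibitors in the evaluated set, each correct-classification rate in the tables sits between the corresponding sensitivity and specificity.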
Figure 1. Triangular relationship between the structure, ¹³C- and ¹⁵N-NMR spectral data, and biological activity. Ketoconazole is used as an example.
SDAR LDA model for CYP3A4 showing number of bins selected/number of bins available.
| Forward stepwise analysis criterion | C1&N5 | C2&N10 | C3&N15 |
|---|---|---|---|
|  | 99/273 | 70/147 | 49/99 |
|  | 63/273 | 31/147 | 23/99 |
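The "bins selected/bins available" counts come from forward stepwise selection. The column header and the "F > 2.0" in the caption of the last table suggest an F-to-enter rule, but the entry thresholds behind the two rows are not given in this excerpt. The Python sketch below shows one textbook form of forward stepwise discriminant analysis: at each step it adds the bin whose partial Wilks' lambda gives the largest F-to-enter, stopping when no F exceeds the threshold. The function names, the threshold argument, and the choice to skip degenerate (singular) bins are assumptions; this is not the authors' software.

```python
from typing import List, Optional, Sequence


def det(matrix: List[List[float]]) -> float:
    """Determinant by Gaussian elimination with partial pivoting."""
    a = [row[:] for row in matrix]
    n = len(a)
    d = 1.0
    for c in range(n):
        pivot = max(range(c, n), key=lambda r: abs(a[r][c]))
        if abs(a[pivot][c]) < 1e-12:
            return 0.0
        if pivot != c:
            a[c], a[pivot] = a[pivot], a[c]
            d = -d
        d *= a[c][c]
        for r in range(c + 1, n):
            factor = a[r][c] / a[c][c]
            for k in range(c, n):
                a[r][k] -= factor * a[c][k]
    return d


def _mean(vals: Sequence[float]) -> float:
    return sum(vals) / len(vals)


def wilks_lambda(X: Sequence[Sequence[float]], y: Sequence[int],
                 cols: List[int]) -> Optional[float]:
    """Wilks' lambda det(W)/det(T) on the chosen columns; None if T is singular."""
    if not cols:
        return 1.0
    grand = [_mean([x[c] for x in X]) for c in cols]
    group_means = {g: [_mean([x[c] for x, lab in zip(X, y) if lab == g])
                       for c in cols] for g in set(y)}
    p = len(cols)
    W = [[0.0] * p for _ in range(p)]  # within-group scatter matrix
    T = [[0.0] * p for _ in range(p)]  # total scatter matrix
    for x, lab in zip(X, y):
        dw = [x[c] - m for c, m in zip(cols, group_means[lab])]
        dt = [x[c] - m for c, m in zip(cols, grand)]
        for i in range(p):
            for j in range(p):
                W[i][j] += dw[i] * dw[j]
                T[i][j] += dt[i] * dt[j]
    det_t = det(T)
    if det_t <= 1e-12:
        return None
    return det(W) / det_t


def forward_stepwise(X, y, f_to_enter: float) -> List[int]:
    """Add the column with the largest partial F until none exceeds f_to_enter."""
    n, g = len(X), len(set(y))
    selected: List[int] = []
    remaining = list(range(len(X[0])))
    current = 1.0  # Wilks' lambda of the empty model
    while remaining:
        best, best_f, best_lam = None, f_to_enter, None
        for c in remaining:
            lam = wilks_lambda(X, y, selected + [c])
            df2 = n - g - len(selected)
            if lam is None or lam <= 0.0 or df2 <= 0:
                continue  # skip degenerate candidates in this sketch
            # Partial lambda = lam / current; F = df2/(g-1) * (1/partial - 1).
            f = (df2 / (g - 1)) * (current / lam - 1.0)
            if f > best_f:
                best, best_f, best_lam = c, f, lam
        if best is None:
            break
        selected.append(best)
        remaining.remove(best)
        current = best_lam
    return selected


# Toy data: column 0 separates the classes, column 1 is noise.
X = [[1, 3], [2, 1], [1, 2], [2, 3], [1, 1],
     [5, 2], [6, 3], [5, 1], [6, 2], [5, 3]]
y = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
print(forward_stepwise(X, y, f_to_enter=1.0))  # [0]
```

A lower entry threshold admits more bins, which matches the pattern of the first row selecting more bins than the second.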
External validation statistics (%) of the CYP3A4 models.
| Model type | Forward stepwise analysis criterion | Correct classification | Sensitivity | Specificity |
|---|---|---|---|---|
| SDAR LDA |  | 70 | 60 | 73 |
| SDAR LDA |  | 72 | 52 | 79 |
| SDAR LDA |  | 70 | 56 | 75 |
| SDAR LDA |  | 67 | 48 | 73 |
| SDAR LDA |  | 73 | 52 | 80 |
| SDAR LDA |  | 70 | 56 | 75 |
|  | NA | 86 | 76 | 89 |
|  | NA | 68 | 56 | 72 |
|  | NA | 68 | 12 | 87 |
Results of training and tenfold cross-validation for SDAR LDA and SAR DF models of inhibitors and non-inhibitors of CYP2D6.
| Model type | Data set | Forward stepwise analysis criterion | Correct classification * | Sensitivity * | Specificity * |
|---|---|---|---|---|---|
| SDAR LDA | Training |  | 85.6 ± 2.0 | 86.0 ± 3.5 | 85.5 ± 2.2 |
| SDAR LDA | Training |  | 79.0 ± 0.6 | 80.5 ± 1.2 | 78.4 ± 0.6 |
| SDAR LDA | Training |  | 74.1 ± 0.4 | 75.1 ± 0.8 | 73.7 ± 0.4 |
| SDAR LDA | CV |  | 61.6 ± 12.0 | 47.6 ± 25.5 | 66.8 ± 13.1 |
| SDAR LDA | CV |  | 61.3 ± 3.1 | 50.3 ± 5.7 | 65.3 ± 2.7 |
| SDAR LDA | CV |  | 60.5 ± 3.1 | 51.9 ± 4.3 | 65.6 ± 4.3 |
| SDAR LDA | Training |  | 82.1 ± 0.6 | 80.9 ± 1.6 | 82.5 ± 0.6 |
| SDAR LDA | Training |  | 74.4 ± 0.6 | 72.1 ± 0.8 | 75.2 ± 0.8 |
| SDAR LDA | Training |  | 70.4 ± 0.6 | 68.7 ± 1.4 | 71.0 ± 0.6 |
| SDAR LDA | CV |  | 60.2 ± 2.7 | 45.6 ± 5.3 | 65.4 ± 3.1 |
| SDAR LDA | CV |  | 60.7 ± 3.3 | 50.3 ± 5.7 | 64.6 ± 3.9 |
| SDAR LDA | CV |  | 61.8 ± 3.5 | 54.7 ± 6.7 | 64.5 ± 5.3 |
| SAR DF | CV | NA | 70.0 ± 3.9 | 26.0 ± 5.9 | 85.0 ± 5.9 |
* Rates are in percent; 95% confidence intervals are shown.
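The abstract reports averages over one hundred tenfold CV runs. The Python sketch below shows repeated, stratified tenfold cross-validation that returns the correct-classification rate of each run. Stratifying folds by class, seeding each repeat with its index, the `fit_predict` callback, and the `majority_class` baseline are assumptions made for illustration. The excerpt does not say how the ± 95% intervals were derived from the runs, so the sketch stops at the per-run rates.

```python
import random
from typing import Callable, List, Sequence

FitPredict = Callable[[list, list, list], list]


def stratified_folds(y: Sequence[int], k: int = 10, seed: int = 0) -> List[int]:
    """Assign each sample to one of k folds, dealing each class out round-robin."""
    rng = random.Random(seed)
    folds = [0] * len(y)
    for label in sorted(set(y)):
        idx = [i for i, lab in enumerate(y) if lab == label]
        rng.shuffle(idx)
        for pos, i in enumerate(idx):
            folds[i] = pos % k
    return folds


def repeated_cv(X: Sequence, y: Sequence[int], fit_predict: FitPredict,
                k: int = 10, repeats: int = 100) -> List[float]:
    """Return the correct-classification rate (%) of each of `repeats` k-fold runs."""
    scores = []
    for r in range(repeats):
        folds = stratified_folds(y, k, seed=r)
        correct = 0
        for f in range(k):
            train = [i for i in range(len(y)) if folds[i] != f]
            test = [i for i in range(len(y)) if folds[i] == f]
            if not test:
                continue
            preds = fit_predict([X[i] for i in train], [y[i] for i in train],
                                [X[i] for i in test])
            correct += sum(1 for p, i in zip(preds, test) if p == y[i])
        scores.append(100.0 * correct / len(y))
    return scores


def majority_class(train_X, train_y, test_X):
    """Hypothetical baseline classifier: predict the most common training label."""
    top = max(set(train_y), key=train_y.count)
    return [top] * len(test_X)


y = [0] * 60 + [1] * 40
X = [[i] for i in range(100)]
scores = repeated_cv(X, y, majority_class, repeats=5)
print(scores)  # [60.0, 60.0, 60.0, 60.0, 60.0]
```

In practice `fit_predict` would train an LDA or decision forest model on the training folds; the baseline here only checks the bookkeeping.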
SDAR LDA model for CYP2D6 showing number of bins selected/number of bins available.
| Forward stepwise analysis criterion | C1&N5 | C2&N10 | C3&N15 |
|---|---|---|---|
|  | 100/272 | 78/147 | 60/99 |
|  | 63/272 | 30/147 | 23/99 |
External validation statistics (%) of the CYP2D6 models.
| Model type | Forward stepwise analysis criterion | Correct classification | Sensitivity | Specificity |
|---|---|---|---|---|
| SDAR LDA |  | 68 | 45 | 74 |
| SDAR LDA |  | 71 | 60 | 74 |
| SDAR LDA |  | 70 | 55 | 74 |
| SDAR LDA |  | 66 | 45 | 71 |
| SDAR LDA |  | 76 | 70 | 78 |
| SDAR LDA |  | 68 | 60 | 70 |
|  | NA | 90 | 60 | 98 |
|  | NA | 76 | 55 | 81 |
|  | NA | 62 | 30 | 74 |
Mean values and p-values of selected bins used in C1&N5, C2&N10, and C3&N15 SDAR LDA models with F > 2.0.
| SDAR model bin (ppm range) | CYP3A4 non-inhibitor mean bin occupancy * | CYP3A4 inhibitor mean bin occupancy * | CYP3A4 LDA model p-value | CYP2D6 non-inhibitor mean bin occupancy * | CYP2D6 inhibitor mean bin occupancy * | CYP2D6 LDA model p-value |
|---|---|---|---|---|---|---|
| C1 (45) | 8.29 | 19.44 | 0.000084 | 9.73 | 18.75 | 0.0025 |
| C1 (49) | NA | NA | NA | 9.28 | 15.00 | 0.056 |
| C1 (63) | 3.37 | 8.80 | 0.15 | NA | NA | NA |
| C1 (147) | 9.84 | 15.74 | 0.008 | 8.14 | 23.75 | 0.0000004 |
| C1 (198) | 0.52 | 0.46 | 0.14 | 0.45 | 1.25 | 0.024 |
| N5 (235–240) | NA | NA | NA | 2.94 | 6.88 | 0.031 |
| C2 (45–46) | 29.02 | 35.19 | 0.11 | NA | NA | NA |
| C2 (49–50) | NA | NA | NA | 21.04 | 32.50 | 0.0033 |
| C2 (63–64) | 6.48 | 16.20 | 0.051 | NA | NA | NA |
| C2 (146–148) | 19.17 | 29.17 | 0.044 | 17.19 | 36.88 | 0.00048 |
| C2 (198–200) | 1.55 | 4.17 | 0.024 | 2.71 | 0.62 | 0.00051 |
| N10 (230–240) | NA | NA | NA | 8.82 | 16.25 | 0.0014 |
| C3 (42–45) | 31.87 | 54.63 | 0.00044 | NA | NA | NA |
| C3 (48–51) | NA | NA | NA | 32.13 | 47.50 | 0.00083 |
| C3 (63–66) | 9.84 | 23.61 | 0.022 | NA | NA | NA |
| C3 (144–147) | 27.46 | 38.89 | 0.14 | 25.79 | 48.75 | 0.00022 |
| C3 (198–201) | 2.59 | 4.63 | 0.065 | 3.85 | 0.63 | 0.0073 |
* A mean bin occupancy of 100 corresponds to one chemical shift in the specified range for every compound. NA means the bin was not used in the SDAR LDA model.
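Read literally, the footnote defines mean bin occupancy as 100 times the number of chemical shifts in the bin divided by the number of compounds in the class. The Python sketch below computes that quantity for one class. Treating the ppm range as half-open and the function name are assumptions, and the peak lists are hypothetical. The statistical test behind the p-values is not stated in this excerpt, so it is not reproduced.

```python
from typing import Sequence


def mean_bin_occupancy(compound_shifts: Sequence[Sequence[float]],
                       lo: float, hi: float) -> float:
    """100 x (shifts falling in [lo, hi) across all compounds) / (number of compounds)."""
    hits = sum(1 for shifts in compound_shifts for s in shifts if lo <= s < hi)
    return 100.0 * hits / len(compound_shifts)


# Hypothetical 13C peak lists for four compounds of one class.
inhibitors = [[45.2, 120.1], [45.7, 45.9, 130.0], [130.5], []]
print(mean_bin_occupancy(inhibitors, 45.0, 46.0))  # 75.0
```

Computing this separately for inhibitors and non-inhibitors gives the paired occupancy columns in the table.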