Serena Nembri, Francesca Grisoni, Viviana Consonni, Roberto Todeschini.
Abstract
Cytochromes P450 (CYP) are the main actors in the oxidation of xenobiotics and play a crucial role in drug safety, persistence, bioactivation, and drug-drug/food-drug interactions. This work aims to develop Quantitative Structure-Activity Relationship (QSAR) models to predict drug interaction with two of the most important CYP isoforms, namely 2C9 and 3A4. The presented models are calibrated on 9122 drug-like compounds, using three different modelling approaches and two types of molecular description (classical molecular descriptors and binary fingerprints). For each isoform, three classification models are presented, each based on a different approach and offering different advantages: (1) a very simple and interpretable classification tree; (2) a local (k-Nearest Neighbor) model based on classical descriptors; and (3) a model based on a recently proposed local classifier (N-Nearest Neighbor) applied to binary fingerprints. The salient features of the work are (1) the thorough model validation and the applicability domain assessment; (2) the descriptor interpretation, which highlighted the crucial aspects of P450-drug interaction; and (3) the consensus aggregation of models, which largely increased the prediction accuracy.
Keywords: ADMET; CYP2C9; CYP3A4; QSAR; cytochrome P450; in silico
Year: 2016 PMID: 27294921 PMCID: PMC4926447 DOI: 10.3390/ijms17060914
Source DB: PubMed Journal: Int J Mol Sci ISSN: 1422-0067 Impact factor: 5.923
Figure 1. Scheme of the data splitting.
Model statistics for CYP3A4 isoform. Models are described according to the method and type of descriptors, the Applicability Domain (AD: yes/no (y/n)), number of variables (p) and classification parameters (parameter: object/leaf ratio for Classification and Regression Trees (CART), k for k-Nearest Neighbours (k-NN) and α for N-Nearest Neighbors (N3)). For each model, the Non-Error Rate (NER), the Sensitivity (Sn) and the Specificity (Sp) are reported in Fitting, Cross-Validation and on the test set. %out indicates the percentage of test set compounds outside of the AD. MD: molecular descriptors; ECFP: extended connectivity fingerprints.
| Model | Descriptors | AD | p | Parameter | Fitting NER | Fitting Sn | Fitting Sp | Cross-Validation NER | Cross-Validation Sn | Cross-Validation Sp | Test Set %out | Test Set NER | Test Set Sn | Test Set Sp |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| CART | MD | y | 3 | 210 | 0.74 | 0.74 | 0.75 | 0.74 | 0.73 | 0.75 | - | 0.75 | 0.74 | 0.76 |
| CART | MD | n | 3 | 210 | 0.74 | 0.74 | 0.75 | 0.74 | 0.73 | 0.75 | 0 | 0.75 | 0.74 | 0.76 |
| k-NN | MD | y | 6 | 14 | 0.76 | 0.73 | 0.79 | 0.76 | 0.73 | 0.78 | - | 0.77 | 0.75 | 0.79 |
| k-NN | MD | n | 6 | 14 | 0.76 | 0.73 | 0.79 | 0.76 | 0.73 | 0.78 | 5 | 0.77 | 0.76 | 0.78 |
| N3 | ECFP | y | 1024 | 1 | 0.79 | 0.88 | 0.71 | 0.79 | 0.87 | 0.70 | - | 0.78 | 0.86 | 0.71 |
| N3 | ECFP | n | 1024 | 1 | 0.79 | 0.88 | 0.71 | 0.79 | 0.87 | 0.70 | 1 | 0.78 | 0.86 | 0.71 |
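The statistics reported in this table can be reproduced from a confusion matrix. For a two-class problem, the Non-Error Rate is the arithmetic mean of sensitivity and specificity, which agrees with the tabulated values within rounding. The following is a minimal sketch, assuming labels coded as 1 (active) and 0 (inactive); the function name `classification_metrics` and the toy labels are illustrative and do not come from the original work.

```python
from typing import Sequence, Tuple


def classification_metrics(
    y_true: Sequence[int], y_pred: Sequence[int]
) -> Tuple[float, float, float]:
    """Return (NER, Sn, Sp) for binary labels: 1 = active, 0 = inactive."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")

    # Count the four cells of the confusion matrix.
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)

    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("both classes must be present in y_true")

    sn = tp / (tp + fn)   # sensitivity: fraction of actives correctly predicted
    sp = tn / (tn + fp)   # specificity: fraction of inactives correctly predicted
    ner = (sn + sp) / 2   # non-error rate: mean of the class-wise rates
    return ner, sn, sp


# Toy example (made-up labels, not data from the paper).
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 1, 1]
ner, sn, sp = classification_metrics(y_true, y_pred)
print(f"NER={ner:.2f} Sn={sn:.2f} Sp={sp:.2f}")  # NER=0.71 Sn=0.75 Sp=0.67
```

Because NER averages the two class-wise rates, it is not inflated by the majority class when active and inactive compounds are unevenly represented.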
List and brief description of the classical molecular descriptors (MDs) selected for CYP3A4. Some examples of molecules with low and high MD values are also reported.
| Description | Model | Low Value | High Value |
|---|---|---|---|
| Number of multiple bonds. | CART | 0 | 66 |
| Number of non-hydrogen bonds. | CART | 3 | 59 |
| Percentage of C atoms. | k-NN | 0 | 58.3 |
| Structural Information Content, order 5. | k-NN | 0.28 | 1.00 |
| Centred Broto-Moreau autocorrelations, lag 4 (weighted by atomic polarizability). | k-NN | 0.21 | 46.6 |
| Centred Broto-Moreau autocorrelations, lag 6 (weighted by atomic ionization potential). | k-NN | 0 | 4.1 |
| Number of hydroxyl groups. | CART, k-NN | 0 | 9 |
| Counts of the E-state atom types. | k-NN | 0 | 16 |
Figure 2. Representation of the molecular descriptor (MD)-based models for CYP3A4: (a) Classification and Regression Trees (CART) model; (b) Score plot of the training molecules described by the k-Nearest Neighbours (k-NN) descriptors, coloured according to their activity; (c) Loading plot of the k-NN descriptors.
Figure 3. Occurrence frequency of the 19 selected fragments for CYP3A4 within the active/inactive compounds. Symbols associated with the fragments (according to SMARTS language) have the following meaning: X + number = number of total bonds in which the considered atom is involved; a = aromatic atom; A = aliphatic atom.
Model statistics for CYP2C9 isoform. Models are described according to the method and type of descriptors, the Applicability Domain (AD: yes/no (y/n)), number of variables (p) and classification parameters (parameter: object/leaf ratio for CART, k for k-NN and α for N3). For each model, the Non-Error Rate (NER), the Sensitivity (Sn), and the Specificity (Sp) are reported in Fitting, Cross-Validation and on the test set. %out indicates the percentage of test set compounds outside the AD.
| Model | Descriptors | AD | p | Parameter | Fitting NER | Fitting Sn | Fitting Sp | Cross-Validation NER | Cross-Validation Sn | Cross-Validation Sp | Test Set %out | Test Set NER | Test Set Sn | Test Set Sp |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| CART | MD | y | 4 | 210 | 0.75 | 0.75 | 0.75 | 0.75 | 0.75 | 0.75 | - | 0.75 | 0.75 | 0.74 |
| CART | MD | n | 4 | 210 | 0.75 | 0.75 | 0.75 | 0.75 | 0.75 | 0.75 | 0 | 0.75 | 0.75 | 0.74 |
| k-NN | MD | y | 6 | 14 | 0.77 | 0.69 | 0.85 | 0.77 | 0.68 | 0.85 | - | 0.76 | 0.67 | 0.86 |
| k-NN | MD | n | 6 | 14 | 0.77 | 0.69 | 0.85 | 0.77 | 0.68 | 0.85 | 5 | 0.76 | 0.67 | 0.84 |
| N3 | ECFP | y | 1024 | 1 | 0.80 | 0.87 | 0.73 | 0.80 | 0.86 | 0.73 | - | 0.78 | 0.83 | 0.73 |
| N3 | ECFP | n | 1024 | 1 | 0.80 | 0.87 | 0.73 | 0.80 | 0.86 | 0.73 | 1 | 0.78 | 0.83 | 0.73 |
List and brief description of the classical molecular descriptors (MDs) selected for CYP2C9. Some examples of molecules with low and high MD values are also reported.
| Description | Model | Low Value | High Value |
|---|---|---|---|
| Number of multiple bonds. | CART | 0 | 44 |
| Sum of atomic polarizabilities scaled on Carbon atom. | CART | 5.0 | 56.1 |
| Ratio between the number of aromatic bonds over the total number of non-H bonds. | CART | 0 | 0.96 |
| Hyper-Wiener-like index from Burden matrix weighted by mass. | k-NN | 2.3 | 5.1 |
| Geary autocorrelation of lag 2 weighted by ionization potential. | k-NN | 0.09 | 1.93 |
| Eta pi and lone pair average VEM count. | k-NN | 0 | 1.02 |
| Number of Pyrimidines. | CART | 0 | 2 |
| Number of aliphatic tertiary amines. | k-NN | 0 | 3 |
| Frequency of C–N at topological distance 1. | k-NN | 0 | 19 |
| Moriguchi octanol-water partition coefficient. | k-NN | -6.3 | 9.6 |
Figure 4. Representation of the MD-based models for CYP2C9: (a) CART model; (b) Score plot of the k-NN descriptors, coloured according to their activity; (c) Loading plot of the k-NN descriptors.
Figure 5. Occurrence frequency of the 16 selected fragments for CYP2C9 within the active/inactive compounds. Symbols associated with the fragments (according to SMARTS language) have the following meaning: X + number = number of total bonds in which the considered atom is involved; a = aromatic atom; A = aliphatic atom.
Consensus models (cons) for CYP3A4 and CYP2C9. Non-Error Rate (NER), Sensitivity (Sn), and Specificity (Sp) are reported in Fitting, Cross-Validation and on the test set. %na indicates the percentage of not assigned compounds.
| CYP | Type | Fitting %na | Fitting NER | Fitting Sn | Fitting Sp | Cross-Validation %na | Cross-Validation NER | Cross-Validation Sn | Cross-Validation Sp | Test set %na | Test set NER | Test set Sn | Test set Sp |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 3A4 | | 33 | 0.88 | 0.92 | 0.84 | 33 | 0.88 | 0.92 | 0.84 | 36 | 0.88 | 0.92 | 0.83 |
| 3A4 | | - | 0.79 | 0.80 | 0.78 | - | 0.78 | 0.79 | 0.77 | 6 | 0.80 | 0.81 | 0.80 |
| 2C9 | | 33 | 0.89 | 0.90 | 0.88 | 34 | 0.89 | 0.90 | 0.88 | 40 | 0.89 | 0.89 | 0.88 |
| 2C9 | | - | 0.81 | 0.80 | 0.82 | - | 0.81 | 0.80 | 0.82 | 1 | 0.79 | 0.77 | 0.81 |
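The trade-off visible in this table, where one consensus variant leaves roughly a third of the compounds unassigned but reaches a higher NER, can be illustrated with a small voting sketch. The exact aggregation rules of the original work are not given in this section, so the two rules below (unanimous agreement and simple majority) are assumptions chosen for illustration, as are all function names and the toy votes.

```python
from collections import Counter
from typing import List, Optional, Sequence

# 1 = active, 0 = inactive, None = no prediction (e.g., compound outside the AD)
Vote = Optional[int]


def strict_consensus(votes: Sequence[Vote]) -> Vote:
    """Assumed rule: assign a class only if every model predicts and all agree."""
    if any(v is None for v in votes) or len(set(votes)) != 1:
        return None
    return votes[0]


def majority_consensus(votes: Sequence[Vote]) -> Vote:
    """Assumed rule: most frequent class among available votes; None on a tie."""
    counts = Counter(v for v in votes if v is not None)
    if not counts:
        return None
    ranked = counts.most_common()
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return None
    return ranked[0][0]


def percent_not_assigned(preds: Sequence[Vote]) -> float:
    """Percentage of compounds left without a consensus class (%na)."""
    return 100.0 * sum(p is None for p in preds) / len(preds)


# Toy votes from three hypothetical models for five compounds.
votes_per_compound: List[List[Vote]] = [
    [1, 1, 1],
    [1, 0, 1],
    [0, 0, None],
    [1, 0, None],
    [0, 0, 0],
]

strict = [strict_consensus(v) for v in votes_per_compound]
majority = [majority_consensus(v) for v in votes_per_compound]
print(strict, percent_not_assigned(strict))      # [1, None, None, None, 0] 60.0
print(majority, percent_not_assigned(majority))  # [1, 1, 0, None, 0] 20.0
```

Under these assumed rules, the stricter scheme abstains more often and keeps only the predictions on which all models agree, while the majority scheme covers more compounds at the cost of including less certain assignments.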
Classification results of the models on the external set in terms of Sn, Sp, NER and percentage of not assigned/out of the AD compounds (%na).
| CYP | Mod. | Fitting (a) %na | Fitting (a) NER | Fitting (a) Sn | Fitting (a) Sp | Cross-Validation (a) %na | Cross-Validation (a) NER | Cross-Validation (a) Sn | Cross-Validation (a) Sp | External Set %na | External Set NER | External Set Sn | External Set Sp |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 3A4 | CART | - | 0.75 | 0.74 | 0.75 | - | 0.74 | 0.73 | 0.76 | - | 0.66 | 0.68 | 0.63 |
| 3A4 | k-NN | - | 0.76 | 0.73 | 0.79 | - | 0.76 | 0.73 | 0.79 | 1 | 0.70 | 0.70 | 0.69 |
| 3A4 | N3 | - | 0.80 | 0.87 | 0.73 | - | 0.79 | 0.87 | 0.71 | 1 | 0.72 | 0.85 | 0.59 |
| 3A4 | | 32 | 0.88 | 0.91 | 0.85 | 33 | 0.88 | 0.91 | 0.84 | 42 | 0.80 | 0.89 | 0.70 |
| 3A4 | | - | 0.80 | 0.80 | 0.79 | - | 0.79 | 0.80 | 0.78 | 1 | 0.71 | 0.76 | 0.67 |
| 2C9 | CART | - | 0.75 | 0.77 | 0.74 | - | 0.74 | 0.73 | 0.76 | - | 0.66 | 0.66 | 0.66 |
| 2C9 | k-NN | - | 0.77 | 0.69 | 0.85 | - | 0.77 | 0.68 | 0.85 | 1 | 0.69 | 0.58 | 0.81 |
| 2C9 | N3 | - | 0.80 | 0.86 | 0.74 | - | 0.79 | 0.85 | 0.74 | - | 0.75 | 0.83 | 0.68 |
| 2C9 | | 34 | 0.89 | 0.91 | 0.88 | 35 | 0.89 | 0.90 | 0.88 | 45 | 0.83 | 0.85 | 0.82 |
| 2C9 | | - | 0.81 | 0.81 | 0.82 | - | 0.80 | 0.79 | 0.82 | 1 | 0.73 | 0.71 | 0.75 |
(a) Performance of the models recalibrated on the whole shared set (9122 molecules), i.e., on training and test set compounds.
Characteristics of the shared and external sets for each isoform: n = number of molecules, %act = percentage of active compounds.
| CYP Isoform | Shared Set n | Shared Set %act | External Set n | External Set %act |
|---|---|---|---|---|
| | 9122 | 40 | 2996 | 49 |
| | 9122 | 33 | 2818 | 36 |