Abstract
As an important enzyme in Phase I drug metabolism, the flavin-containing monooxygenase (FMO) also metabolizes some xenobiotics bearing soft nucleophiles. The site of metabolism (SOM) of a molecule is the site at which an enzyme carries out the metabolic reaction. Accurate prediction of SOMs on drug molecules assists the search for drug leads during optimization. Here, quantum mechanical features such as the condensed Fukui function, together with attributes from circular fingerprints (Molprint2D), are computed and classified with a support vector machine (SVM) to predict potential SOMs on a series of drugs that can be metabolized by FMO enzymes. The condensed Fukui function fA− represents the nucleophilicity of the central atom A, and the circular fingerprint attributes account for the influence of neighboring atoms on the central atom. A total of 85 FMO substrates and non-substrates were collected and divided about equally into training and test sets, each carrying roughly the same number of potential SOMs. Only N-oxidation and S-oxidation were considered in the prediction because the available C-oxidation data were scarce. The model was trained with the LibSVM package in WEKA using 10-fold cross-validation. On the test set, the accuracy, Matthews correlation coefficient, and area under the ROC curve are 0.829, 0.659, and 0.877, respectively. This work shows that the SVM model can accurately predict potential SOMs for drug molecules that are metabolized by FMO enzymes.
Year: 2017 PMID: 28072829 PMCID: PMC5224990 DOI: 10.1371/journal.pone.0169910
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Fig 1. An example of an FMO substrate and its SOMs.
The potential SOMs identified for Tozasertib, one of the FMO substrates studied. The potential SOMs are marked with red circles, and the actual SOM is highlighted by a red arrow.
Dividing the 85 compounds collected into the training and the test sets.
| Number | Compound name | Reaction | Potential SOM | SOM or non-SOM |
|---|---|---|---|---|
First, all collected compounds were arranged and numbered in order of collection time. Then, 42 of the 85 compounds were assigned to the test set using the web server Research Randomizer, and the remaining 43 were used as the training set.
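The split described above can be sketched in a few lines. This is only an illustration under assumptions: the study used the Research Randomizer web server, whereas this sketch substitutes Python's `random` module, and the function name `split_compounds` and the seed value are hypothetical.

```python
import random


def split_compounds(n_total=85, n_test=42, seed=0):
    """Randomly assign compound numbers 1..n_total to training and test sets."""
    # The seed is an assumption, used here only to make the sketch repeatable.
    rng = random.Random(seed)
    # Compounds are numbered in order of collection time, as in the caption.
    numbers = list(range(1, n_total + 1))
    # Draw the test set without replacement; everything else is training data.
    test = sorted(rng.sample(numbers, n_test))
    test_lookup = set(test)
    train = [n for n in numbers if n not in test_lookup]
    return train, test


train_ids, test_ids = split_compounds()
print(len(train_ids), len(test_ids))
```

With the default arguments this yields 43 training compounds and 42 test compounds; which compounds land in each set depends on the seed and will not match the assignment made in the study.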
The total numbers of instances assigned to the training and the test sets are shown.
| Dataset | Number of compounds | Substrates of FMO | Non substrates of FMO | Number of instances (Potential SOMs) | Number of actual SOMs | Number of non-SOMs |
|---|---|---|---|---|---|---|
Fig 2. An example of attributes generated by Molprint2D and their definitions.
We converted the original Molprint2D format into numerical values so that it can be read by LibSVM in the WEKA package.
Fig 3. The values of condensed Fukui functions computed for the four selected examples in the training set.
qA(N), qA(N+1), and qA(N−1) are the charges on atom A in the molecule with N, N+1, and N−1 electrons, respectively. PA(N) equals the atomic number of atom A minus qA(N), and likewise for PA(N+1) and PA(N−1). fA+, fA−, and fA0 are the computed condensed Fukui function values; they represent the electrophilicity, the nucleophilicity, and the radical attack susceptibility of atom A, respectively.
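The caption defines the populations PA but does not spell out how the three Fukui values are formed from them. A minimal sketch, assuming the commonly used finite-difference definitions fA+ = PA(N+1) − PA(N), fA− = PA(N) − PA(N−1), and fA0 = [PA(N+1) − PA(N−1)]/2, is given below. The function and field names are hypothetical, and the charges in the example are made up for illustration, not taken from the study.

```python
from dataclasses import dataclass


@dataclass
class CondensedFukui:
    f_plus: float   # electrophilicity of atom A
    f_minus: float  # nucleophilicity of atom A
    f_zero: float   # radical attack susceptibility of atom A


def condensed_fukui(z, q_n, q_n_plus_1, q_n_minus_1):
    """Condensed Fukui values for one atom from its charges in three states.

    z is the atomic number; q_* are the atomic charges in the molecule with
    N, N+1, and N-1 electrons. Assumes finite-difference definitions.
    """
    # Electron population on the atom: P_A = Z_A - q_A.
    p_n = z - q_n
    p_n_plus_1 = z - q_n_plus_1
    p_n_minus_1 = z - q_n_minus_1
    return CondensedFukui(
        f_plus=p_n_plus_1 - p_n,
        f_minus=p_n - p_n_minus_1,
        f_zero=(p_n_plus_1 - p_n_minus_1) / 2.0,
    )


# Hypothetical charges for a nitrogen atom (Z = 7); illustrative values only.
example = condensed_fukui(z=7, q_n=-0.40, q_n_plus_1=-0.45, q_n_minus_1=-0.10)
print(example)
```

Under these definitions fA0 is the average of fA+ and fA−, and a comparatively large fA− on a nitrogen or sulfur atom marks it as a more nucleophilic candidate site.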
Fig 4. Converting the original Molprint2D text format into numerical values.
The original Molprint2D text format is converted to numerical values so that it can be read by the WEKA package. The conversion is shown for three selected training set compounds. The label "I-" denotes the neighbor type in the first layer, and the label "II-" denotes the neighbor type in the second layer.
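One plausible way to carry out such a conversion is to count how often each layer-prefixed neighbor type occurs around the central atom. The sketch below assumes this counting scheme, which the caption does not confirm; the atom type strings, the atom names, and both function names are hypothetical, and the resulting vocabulary is not the 37-attribute set used in the study.

```python
from collections import Counter


def build_vocabulary(all_label_lists):
    """Collect every distinct layer-prefixed label into a fixed column order."""
    return sorted({label for labels in all_label_lists for label in labels})


def encode_environment(labels, vocabulary):
    """Turn one atom's neighbor labels into a count vector over the vocabulary."""
    counts = Counter(labels)
    return [counts.get(name, 0) for name in vocabulary]


# Hypothetical neighbor labels: "I-" is the first layer, "II-" the second layer.
environments = {
    "atom_1": ["I-C.3", "I-C.3", "II-C.ar", "II-O.3"],
    "atom_2": ["I-C.ar", "II-C.ar", "II-C.ar", "II-N.3"],
}

vocabulary = build_vocabulary(environments.values())
vectors = {
    name: encode_environment(labels, vocabulary)
    for name, labels in environments.items()
}
print(vocabulary)
print(vectors)
```

Fixing the vocabulary once, from the training data, keeps the columns aligned between the training and test sets, which is what a tabular learner such as the one in WEKA needs.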
The performance on the training and the test sets given by the models built.
| Method | Dataset | Instances | Features | Parameters | SE | SP | ACC | MCC | AUC |
|---|---|---|---|---|---|---|---|---|---|
| SVM | Training set | 111 | 5+37 | C = 15, γ = 1/111, W = 1 | 0.851 | 0.864 | 0.856 | 0.705 | 0.887 |
| SVM | Test set | 117 | 5+37 | - | 0.790 | 0.917 | 0.829 | 0.659 | 0.877 |
| Naive Bayes | Training set | 111 | 5+37 | Default setting | 0.821 | 0.864 | 0.838 | 0.673 | 0.869 |
| Naive Bayes | Test set | 117 | 5+37 | Default setting | 0.790 | 0.889 | 0.821 | 0.635 | 0.867 |
| Random Forest | Training set | 111 | 5+37 | Default setting | 0.910 | 0.864 | 0.892 | 0.774 | 0.927 |
| Random Forest | Test set | 117 | 5+37 | Default setting | 0.840 | 0.861 | 0.846 | 0.668 | 0.868 |
In the model training process, class I was designated as the non-SOM set and class II as the SOM set. Instances are the potential SOMs identified. Five quantum features and 37 attributes from circular fingerprints were used to build the model. The parameters C, γ, and W were obtained from the SVM training. Sensitivity (SE), specificity (SP), accuracy (ACC), Matthews correlation coefficient (MCC), and area under the ROC curve (AUC) were used to characterize the performance of the models. The symbol "-" indicates that the same parameter values were used for the test set.
Fig 5. The ROC curves of the models.
The ROC curves constructed for (A) the training set by the SVM, (B) the test set by the SVM, (C) the training set by the Naive Bayes method, (D) the test set by the Naive Bayes method, (E) the training set by the Random Forest method, and (F) the test set by the Random Forest method.
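The area under a ROC curve such as those in Fig 5 can be computed directly from the predicted probabilities as the fraction of (SOM, non-SOM) pairs in which the SOM receives the higher score, with ties counted as one half. The sketch below uses this pairwise formulation; it is not how WEKA reports AUC internally, the function name `roc_auc` is hypothetical, and the labels and scores are made-up values.

```python
def roc_auc(labels, scores):
    """AUC as the probability that a SOM outranks a non-SOM (ties count 0.5)."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one SOM and one non-SOM")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical instances: 1 = actual SOM, 0 = non-SOM, with made-up scores.
labels = [1, 1, 1, 0, 0]
scores = [0.9, 0.8, 0.4, 0.5, 0.1]
auc = roc_auc(labels, scores)
print(round(auc, 3))
```

The pairwise loop is quadratic in the number of instances, which is acceptable for data sets of the size used here (111 and 117 instances).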
Fig 6. The prediction probability of each potential SOM computed for two selected FMO substrates, voriconazole and albendazole.
The actual SOMs of each compound are highlighted with red arrows. Each predicted SOM is marked with a red circle and each predicted non-SOM with a green circle, with the computed prediction probability shown alongside. The symbol "+" denotes a false prediction.
Comparison between the prediction results for the test set by the SVM model built and Metaprint2D.
| Method | Number of instances | Enzyme for SOM | TP | FN | FP | TN | SP |
|---|---|---|---|---|---|---|---|
| SVM model (test set) | 117 | FMO | 64 | 17 | 3 | 33 | 0.917 |
| Metaprint2D | 117 | Phase I enzymes | 63 | 18 | 14 | 22 | 0.611 |
Class I set is designated as the non-SOM while class II set is designated as the SOM set in the SVM model built. The number of substrates used in both sets is the same. The performance of the method is characterized by TP (True Positive), FP (False Positive), FN (False Negative), TN (True Negative), and specificity (SP) computed.
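The reported metrics follow from the four confusion-matrix counts. The sketch below applies the standard definitions of sensitivity, specificity, accuracy, and MCC to the counts listed in the comparison table; the function name `confusion_metrics` is hypothetical.

```python
import math


def confusion_metrics(tp, fn, fp, tn):
    """Sensitivity, specificity, accuracy, and MCC from confusion-matrix counts."""
    total = tp + fn + fp + tn
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "SE": tp / (tp + fn),
        "SP": tn / (tn + fp),
        "ACC": (tp + tn) / total,
        # MCC is conventionally set to 0 when a row or column sum is zero.
        "MCC": (tp * tn - fp * fn) / denominator if denominator else 0.0,
    }


# Counts for the 117 test-set instances, taken from the comparison table.
svm = confusion_metrics(tp=64, fn=17, fp=3, tn=33)
metaprint2d = confusion_metrics(tp=63, fn=18, fp=14, tn=22)

print({name: round(value, 3) for name, value in svm.items()})
print({name: round(value, 3) for name, value in metaprint2d.items()})
```

For the SVM counts this gives SE 0.790, SP 0.917, ACC 0.829, and MCC 0.659, matching the test-set values reported for the SVM model, and the Metaprint2D counts give SP 0.611.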