Myeong-Sang Yu, Hyang-Mi Lee, Aaron Park, Chungoo Park, Hyithaek Ceong, Ki-Hyeong Rhee, Dokyun Na.
Abstract
BACKGROUND: Administered drugs are often converted into an ineffective or activated form by enzymes in our body. Conventional in silico prediction approaches have focused on therapeutically important enzymes such as CYP450. However, thousands of different cellular enzymes can potentially convert an administered drug into other forms.
RESULT: We developed an in silico model to predict which human enzymes, including metabolic enzymes as well as the CYP450 family, can act on a given chemical compound. The prediction is based on the chemical and physical similarity between known enzyme substrates and a query chemical compound. The model was developed using multiple linear regression and showed high performance (AUC = 0.896) despite the large number of enzymes. When evaluated on a test dataset, it also showed high performance (AUC = 0.746). Interestingly, evaluation with literature data showed that our model can be used to predict not only enzymatic reactions but also drug conversion and enzyme inhibition.
Keywords: Drug discovery; Enzyme reaction prediction; In silico model; Machine learning
Year: 2018 PMID: 29897324 PMCID: PMC5998764 DOI: 10.1186/s12859-018-2194-2
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Fig. 1 Graphical description of the method. a Entire pipeline used to construct our prediction model. b Brief graphical description of dataset preparation. 1444 molecular descriptors were calculated for each substrate in the training dataset, and the subtractions (differences) of descriptors were calculated for every substrate pair. For supervised learning, the set of descriptor subtractions for each substrate pair was labeled with 1 or 0. a, b and c denote substrates in the training dataset. E1 and E2 denote enzymes in the dataset
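The dataset preparation described in Fig. 1b can be sketched as follows. This is a minimal illustration, not the authors' code: the function name `build_pair_dataset`, the toy descriptor values, the use of an absolute difference, and the rule "label 1 when both substrates share at least one enzyme" are all assumptions, since the caption only says that pairs are labeled with 1 or 0.

```python
from itertools import combinations

def build_pair_dataset(descriptors, enzyme_substrates):
    """Build one labeled row per unordered substrate pair.

    descriptors: dict mapping substrate name -> list of descriptor values
    enzyme_substrates: dict mapping enzyme name -> set of its known substrates
    """
    rows = []
    for s1, s2 in combinations(sorted(descriptors), 2):
        # Assumption: absolute difference, so the row does not depend on pair order.
        diff = [abs(x - y) for x, y in zip(descriptors[s1], descriptors[s2])]
        # Assumption: label 1 if some enzyme has both substrates, else 0.
        shared = any(s1 in subs and s2 in subs
                     for subs in enzyme_substrates.values())
        rows.append((s1, s2, diff, 1 if shared else 0))
    return rows

# Toy data with 3 descriptors instead of 1444 (values are made up).
descriptors = {
    "a": [1.0, 0.5, 3.0],
    "b": [0.8, 0.5, 2.0],
    "c": [4.0, 2.5, 0.0],
}
enzyme_substrates = {"E1": {"a", "b"}, "E2": {"c"}}

pairs = build_pair_dataset(descriptors, enzyme_substrates)
for s1, s2, diff, label in pairs:
    print(s1, s2, diff, label)
```

With n substrates this produces n(n-1)/2 rows, which is why the number of labeled pairs grows much faster than the number of substrates.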
Top 10 features with a high correlation
| Name | Rpb | Category |
|---|---|---|
| minsssCH | −0.0674 | Atom type electrotopological state |
| Hmax | −0.0645 | Atom type electrotopological state |
| SHsOH | −0.0641 | Atom type electrotopological state |
| EE_Dt | −0.0635 | Detour matrix |
| maxHCsatu | −0.0630 | Atom type electrotopological state |
| XLogP | −0.0624 | XLogP |
| CrippenLogP | −0.0619 | Crippen logP and MR |
| Lipoaffinity Index | −0.0618 | Atom type electrotopological state |
| ETA_EtaP_F | −0.0615 | Extended topochemical atom |
| nsOH | −0.0615 | Atom type electrotopological state |
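The Rpb column can be read as a point-biserial correlation between a continuous feature and the binary pair label; that reading of "Rpb" is an assumption here, as the table does not spell it out. Under that assumption, a minimal sketch of the computation looks like this (the function name and the toy values are hypothetical):

```python
import math

def point_biserial(values, labels):
    """Point-biserial correlation between a continuous feature and 0/1 labels."""
    n = len(values)
    group1 = [v for v, y in zip(values, labels) if y == 1]
    group0 = [v for v, y in zip(values, labels) if y == 0]
    if not group1 or not group0:
        raise ValueError("both classes must be present")
    mean_all = sum(values) / n
    # Population standard deviation of the feature over all samples.
    sd = math.sqrt(sum((v - mean_all) ** 2 for v in values) / n)
    if sd == 0:
        return 0.0
    m1 = sum(group1) / len(group1)
    m0 = sum(group0) / len(group0)
    return (m1 - m0) / sd * math.sqrt(len(group1) * len(group0) / n ** 2)

# Toy example: a descriptor difference that is smaller for label-1 pairs.
feature = [0.1, 0.2, 0.9, 1.1]
labels = [1, 1, 0, 0]
r = point_biserial(feature, labels)
print(round(r, 3))
```

If the features are descriptor differences between two substrates, a negative coefficient means that pairs labeled 1 tend to have smaller differences, which would be consistent with the negative signs in the table.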
Fig. 2 Performance (AUC) of four machine learning algorithms with an increasing number of features. The four machine learning algorithms are artificial neural network, random forest, naïve Bayes and multiple linear regression. Their performance was calculated by 20-fold cross-validation. Of the four algorithms, the multiple linear regression model using 500 features showed the best performance (AUC = 0.897)
Performance (AUC) results of four different score-integration methods
| Simple average | Simple maximum | Probability-based method<sup>a</sup> | Our method |
|---|---|---|---|
| 0.842 | 0.877 | 0.884 | 0.896 |
<sup>a</sup> The probability-based method gives the probability that any of the given query-substrate pairs is reacted by the same enzyme
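The first three integration methods can be illustrated with a short sketch. The expression for the probability-based method is not given in the footnote; the form used below, 1 − ∏(1 − p_i), is one common way to compute "the probability that at least one pair is positive" and is an assumption, as are the function names and the toy scores. The authors' own method is not described in this section, so it is not sketched.

```python
import math

def simple_average(scores):
    """Mean of the per-substrate similarity scores for one enzyme."""
    return sum(scores) / len(scores)

def simple_maximum(scores):
    """Best single query-substrate score for one enzyme."""
    return max(scores)

def probability_any(scores):
    """Assumed form: probability that at least one pair is positive,
    treating each score as an independent probability."""
    return 1.0 - math.prod(1.0 - p for p in scores)

# Hypothetical scores between a query compound and three known
# substrates of the same enzyme.
scores = [0.2, 0.5, 0.9]
print(simple_average(scores))
print(simple_maximum(scores))
print(probability_any(scores))
```

Note that the assumed probability-based form can only grow as more substrates are added, so enzymes with many known substrates tend to receive higher integrated scores than under the average or the maximum.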
Fig. 3 MCC with respect to threshold. To find the optimal threshold, Matthews correlation coefficients (MCC) were calculated with the threshold increasing from 0 to 1. MCC gradually improved with increasing threshold and began to drop at 0.75. Accordingly, 0.75 was set as the threshold for our model
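The threshold search described in Fig. 3 amounts to sweeping a cutoff over the score range and keeping the value with the highest MCC. A minimal sketch, assuming scores in [0, 1], a "score >= threshold" decision rule, and a fixed step size (all three are illustrative choices, as are the function names and toy data):

```python
import math

def confusion(scores, labels, threshold):
    """Count TP, FP, TN, FN when predicting 1 for score >= threshold."""
    tp = fp = tn = fn = 0
    for s, y in zip(scores, labels):
        pred = 1 if s >= threshold else 0
        if pred == 1 and y == 1:
            tp += 1
        elif pred == 1 and y == 0:
            fp += 1
        elif pred == 0 and y == 0:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn

def mcc(tp, fp, tn, fn):
    """Matthews correlation coefficient; 0.0 when the denominator vanishes."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom

def best_threshold(scores, labels, steps=20):
    """Return the first threshold on an evenly spaced grid that maximizes MCC."""
    candidates = [i / steps for i in range(steps + 1)]
    return max(candidates, key=lambda t: mcc(*confusion(scores, labels, t)))

# Toy data (made up): three negatives followed by three positives.
scores = [0.1, 0.2, 0.4, 0.7, 0.8, 0.9]
labels = [0, 0, 0, 1, 1, 1]
t = best_threshold(scores, labels)
print(t, mcc(*confusion(scores, labels, t)))
```

MCC is a reasonable criterion when positives are rare, because it uses all four cells of the confusion matrix rather than rewarding a classifier for predicting the majority class.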
Performance results with a threshold of 0.75
| | SEN | SPE | MCC | PPV |
|---|---|---|---|---|
| Training dataset | 0.527 | 0.975 | 0.208 | 0.088 |
| Test dataset | 0.171 | 0.976 | 0.106 | 0.089 |
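The combination of high specificity and low PPV in this table is what one would expect when positives are rare. By Bayes' rule, PPV = SEN·π / (SEN·π + (1 − SPE)·(1 − π)), where π is the fraction of positive pairs. The sketch below inverts that relation to estimate π from the rounded table values; this is a back-of-the-envelope derivation under the assumption that all metrics in a row were computed on the same set of pairs, not a figure reported by the authors, and the function names are hypothetical.

```python
def ppv_from(sen, spe, prevalence):
    """PPV implied by sensitivity, specificity and positive fraction."""
    tp_rate = sen * prevalence
    fp_rate = (1.0 - spe) * (1.0 - prevalence)
    return tp_rate / (tp_rate + fp_rate)

def implied_prevalence(sen, spe, ppv):
    """Solve the PPV relation for the positive fraction."""
    fp_term = ppv * (1.0 - spe)
    return fp_term / (sen * (1.0 - ppv) + fp_term)

# Rounded values taken from the table above.
rows = [("Training dataset", 0.527, 0.975, 0.088),
        ("Test dataset", 0.171, 0.976, 0.089)]
for name, sen, spe, ppv in rows:
    prev = implied_prevalence(sen, spe, ppv)
    print(f"{name}: implied positive fraction ~ {prev:.4f}")
```

Because the inputs are rounded to three decimals, the implied fractions are only approximate, but they suggest that positives make up on the order of one percent or less of the evaluated pairs.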
Top five proteins predicted to interact with the five molecules obtained from the literature<sup>a</sup>
| Substrate | UniProt | Enzyme Name |
|---|---|---|
| | P00915 | Carbonic anhydrase 1 |
| | | |
| | Q14524 | Sodium channel protein type 5 subunit alpha |
| | Q9UI33 | Sodium channel protein type 11 subunit alpha |
| | Q9Y5Y9 | Sodium channel protein type 10 subunit alpha |
| | O60774 | Putative dimethylaniline monooxygenase [N-oxide-forming] 6 |
| | Q15166 | Serum paraoxonase/lactonase 3 |
| | O60774 | Putative dimethylaniline monooxygenase [N-oxide-forming] 6 |
| | P00915 | Carbonic anhydrase 1 |
| | Q15166 | Serum paraoxonase/lactonase 3 |
| | Q14524 | Sodium channel protein type 5 subunit alpha |
| | Q9UI33 | Sodium channel protein type 11 subunit alpha |
| | Q9Y5Y9 | Sodium channel protein type 10 subunit alpha |
| methyl salicylate | Q14524 | Sodium channel protein type 5 subunit alpha |
| | Q9UI33 | Sodium channel protein type 11 subunit alpha |
| | Q9Y5Y9 | Sodium channel protein type 10 subunit alpha |
| | O60774 | Putative dimethylaniline monooxygenase [N-oxide-forming] 6 |
| | | |
| | Q15166 | Serum paraoxonase/lactonase 3 |
| | Q01959 | Sodium-dependent dopamine transporter |
| | Q9NUW8 | Tyrosyl-DNA phosphodiesterase 1 |
| tamoxifen | O15554 | Intermediate conductance calcium-activated potassium channel protein 4 |
| | | |
| agmatine | Q96F10 | Diamine acetyltransferase 2 |
| | Q9H015 | Solute carrier family 22 member 4 |
| | P21673 | Diamine acetyltransferase 1 |
| | Q8NE62 | Choline dehydrogenase, mitochondrial |
| | P19623 | Spermidine synthase |
| | | |
<sup>a</sup> Known interactions are highlighted in bold