Heba El-Behery, Abdel-Fattah Attia, Nawal El-Feshawy, Hanaa Torkey.
Abstract
BACKGROUND: Discovering possible drug-target interactions (DTIs) is a decisive step in detecting the effects of drugs as well as in drug repositioning. There is a strong incentive to develop computational methods that can effectively predict potential DTIs, as traditional DTI laboratory experiments are expensive, time-consuming, and labor-intensive. Some technologies have been developed for this purpose; however, large numbers of interactions have not yet been detected, prediction accuracy is still low, and protein sequences and structured data are rarely used together in the prediction process.
Entities:
Keywords: Covid-19; Deep-learning; Drug-target interactions; Drugs; Machine learning; Prediction; Proteins
Year: 2021 PMID: 34271420 PMCID: PMC8256690 DOI: 10.1016/j.compbiolchem.2021.107536
Source DB: PubMed Journal: Comput Biol Chem ISSN: 1476-9271 Impact factor: 2.877
Fig. 1 Biological steps during drug target interaction (Chen et al., 2013).
Fig. 2 Different computational approaches for DTI prediction.
Evaluating the related work for computational methods of DTI prediction.
| ( | Matador database | Improves the similarity method. The DLS approach combines link prediction with the binary network structure for DPI prediction. The validation method is applied ten times in the trial. | |
| ( | Benchmark dataset | Calculates the NRLMFB score from the similarity matrices and the NRLMF score for all drug-target pairs in the interaction matrix | |
| ( | Downloaded from DrugBank | Uses a Lasso model to create protein and drug features | |
| ( | Downloaded from DrugBank | Extracts features, applies a CNN model to learn features, and uses machine and deep learning for classification (FCNN, SVM, RF, Autoencoder) | |
| ( | Benchmark dataset | First, uses PsePSSM and FP2 to extract features; second, uses Lasso for feature selection followed by the SMOTE sampling technique; finally, applies the RF classifier to the features for prediction | |
| ( | Benchmark dataset | Feature generation using | |
| ( | Downloaded from DrugBank | First uses RDKit tools to extract features, then uses the DBN technique for classification and prediction | |
Unique drugs, targets and DTIs used to create the datasets.
| Sequences | Training | 16011 | 16011 | 5839 | 10712 | |
| Testing | 7926 | 7926 | 3012 | 4914 | ||
| Features | Training | 14000 | 14000 | 5620 | 8380 | |
| Testing | 4118 | 4118 | 1586 | 2532 |
Fig. 3 Proposed model workflow, where a) is the overall workflow for prediction, b) is the data extraction and preprocessing stage for the drug and protein sequences, and c) is the stage of applying the learning methods and calculating the prediction for each classifier.
The parameters of deep learning methods.
| 5 | Sigmoid | 10 | 100 | 32 | |
| 3 | ReLU | 128 | 100 | 32 | |
| 2 | ReLU | 256 | 100 | 32 |
Results of the deep, machine and ensemble techniques according to Accuracy, Mean Square Error, MCC Score and F1-score.
| Dataset | Accuracy | Mean Square Error | MCC Score | F1-score | |
|---|---|---|---|---|---|
| DrugBank | 0.9277 | 0.072 | 0.848 | 0.88 | |
| Benchmark | |||||
| DrugBank | 0.917 | 0.056 | 0.89 | 0.885 | |
| Benchmark | 0.94 | 0.02 | 0.95 | 0.92 | |
| DrugBank | |||||
| Benchmark | 0.9744 | 0.0257 | 0.945 | 0.96 | |
| DrugBank | 0.93 | 0.07 | 0.85 | 0.915 | |
| Benchmark | 0.96 | 0.039 | 0.917 | 0.948 | |
| DrugBank | 0.938 | 0.0197 | 0.958 | 0.918 | |
| Benchmark | |||||
| DrugBank | 0.913 | 0.087 | 0.814 | 0.88 | |
| Benchmark | 0.97 | 0.029 | 0.938 | 0.96 | |
| DrugBank | 0.94 | 0.056 | 0.88 | 0.915 | |
| Benchmark |
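The four measures reported in the table can be derived from the confusion-matrix counts of a binary classifier. The following is a minimal sketch using the standard textbook definitions, not the authors' evaluation code; the function name `binary_metrics` is hypothetical. It assumes hard 0/1 predictions, in which case the mean square error equals one minus the accuracy. Whether the paper computed MSE on hard labels or on predicted probabilities is not stated here.

```python
import math

def binary_metrics(y_true, y_pred):
    """Accuracy, MSE, MCC and F1 for 0/1 labels and 0/1 predictions.

    Hypothetical helper for illustration; standard definitions only.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    n = len(pairs)

    accuracy = (tp + tn) / n
    # With hard 0/1 predictions each squared error is 0 or 1,
    # so this equals 1 - accuracy.
    mse = sum((t - p) ** 2 for t, p in pairs) / n
    # Matthews correlation coefficient; defined as 0 when a margin is empty.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * tp / (2 * tp + fp + fn) if (2 * tp + fp + fn) else 0.0
    return {"accuracy": accuracy, "mse": mse, "mcc": mcc, "f1": f1}

if __name__ == "__main__":
    # Toy labels, not data from the paper.
    print(binary_metrics([1, 1, 1, 0, 0, 0, 0, 1], [1, 1, 0, 0, 0, 1, 0, 1]))
```

MCC uses all four confusion-matrix cells, which makes it more informative than accuracy when interacting and non-interacting pairs are imbalanced.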
The results of the deep, machine and ensemble techniques according to Time.
| Method | Dataset | Time |
|---|---|---|
| ANN | DrugBank | 518.8 |
| ANN | Benchmark | 501.5 |
| DBN | DrugBank | |
| DBN | Benchmark | |
| CNN | DrugBank | 28080 |
| CNN | Benchmark | 15642 |
| Random Forest (RF) | DrugBank | 1.78 |
| Random Forest (RF) | Benchmark | 1.28 |
| SVM | DrugBank | 184.6 |
| SVM | Benchmark | 53.12 |
| LightBoost | DrugBank | 10.1 |
| LightBoost | Benchmark | 12.31 |
| XGBoost | DrugBank | 90.1 |
| XGBoost | Benchmark | 52.14 |
| ExtraTree | DrugBank | |
| ExtraTree | Benchmark | |
Fig. 4 ROC curves and area under the curve (AUC) values for the learning methods. On the DrugBank dataset, the random forest and ANN methods achieve the highest AUC (0.937); on the benchmark dataset, the extra tree method achieves the highest AUC (0.982).
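The AUC values quoted in the caption can be read as the probability that a randomly chosen interacting pair receives a higher score than a randomly chosen non-interacting pair. The sketch below computes AUC through that pairwise (Mann-Whitney) formulation. It is an illustration under that standard definition, not the code used in the paper, and the function name `roc_auc` is hypothetical.

```python
def roc_auc(y_true, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties count as half a correct ranking. Quadratic in the number of
    examples, which is fine for illustration but slow for large datasets.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

if __name__ == "__main__":
    # Toy scores, not data from the paper.
    print(roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))
```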
Fig. 5 Precision-recall curves for the sequence and feature data. The curve shows the trade-off between precision and recall at different thresholds. A large area under the curve represents both high recall and high precision, where high precision corresponds to a low false positive rate and high recall corresponds to a low false negative rate. The precision-recall curve is a better option for evaluating model performance.
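The threshold trade-off described in the caption can be made concrete by sweeping a decision threshold over the predicted scores and recording precision and recall at each step. This is a minimal sketch with a hypothetical function name, `precision_recall_points`, and toy inputs; it is not taken from the paper.

```python
def precision_recall_points(y_true, scores):
    """Return (threshold, precision, recall) tuples, highest threshold first.

    An example is predicted positive when its score is >= the threshold.
    """
    total_pos = sum(y_true)
    points = []
    for thr in sorted(set(scores), reverse=True):
        tp = sum(1 for t, s in zip(y_true, scores) if s >= thr and t == 1)
        fp = sum(1 for t, s in zip(y_true, scores) if s >= thr and t == 0)
        # tp + fp >= 1 because thr is always one of the observed scores.
        precision = tp / (tp + fp)
        recall = tp / total_pos if total_pos else 0.0
        points.append((thr, precision, recall))
    return points

if __name__ == "__main__":
    # Toy scores, not data from the paper.
    for thr, prec, rec in precision_recall_points([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]):
        print(f"threshold={thr:.2f} precision={prec:.2f} recall={rec:.2f}")
```

Lowering the threshold can only keep recall the same or raise it, while precision may move in either direction, which is why the curve is not necessarily monotonic.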
Comparison between the related work and our work according to accuracy.
| Convolutional neural networks (CNNs) ( | CNN | 0.923 |
| | RF | 0.921 |
| | SVM | 0.908 |
| LASSO-DNN ( | SVM | 0.81 |
| Proposed model | ANN | 0.9277 |
| | RF | 0.947 |
| | SVM | 0.93 |
Predicted interacting proteins for drugs that influence Covid-19, listing the drug name, the predicted interacting protein, and the prediction probability.
| Remdesivir ( | GANAB_HUMAN( | 0.94 |
| Lopinavir ( | FAAH1_RAT( | 0.7 |
| Ritonavir ( | Cytochrome c oxidase polypeptide II( | 1 |
| Triazavirin ( | ENDR_PROTEIN( | 0.8 |
| Chloroquine ( | Peptidoglycan D,D-transpeptidase FtsI( | 0.8 |
| Darunavir ( | ACM1_HUMAN( | 0.8 |
Predicted interacting drugs for proteins that influence Covid-19, as mentioned in (Morgat et al., 2019), listing the protein name, the predicted interacting drug, and the prediction probability.
| Angiotensin-converting enzyme 2( | Lisinopril ( | 0.6 |
| Spike glycoprotein( | ZINC00060939 | 0.8 |
| Nucleocapsid protein( | ZINC | 0.8 |
| Nucleoporin NSP1 | ZINC48807828 | 0.8 |
| Inclusion body matrix protein( | ZINC40895665 | 0.7 |
| Adipocyte differentiation-related protein ( | ZINC72116390 | 1 |
| Non-structural protein 7( | Bitolterol ( | 1 |
| ORF1ab polyprotein ( | ZINC13814083 | 1 |
| Cap-specific mRNA (nucleoside-2′-O-)-methyltransferase 1( | ZINC00171159 | 1 |
| Caveolin-2 ( | ZINC00137875 | 1 |
| Mitogen-activated protein kinase 8 ( | ZINC18710082 | 1 |
| Mitogen-activated protein kinase 9 | ZINC13491480 | 0.9 |
| Dihydroorotate dehydrogenase (quinone), mitochondrial ( | ZINC13726735 | 1 |
| RAC-beta serine/threonine-protein kinase ( | ZINC13339634 | 1 |
| RAC-gamma serine/threonine-protein kinase ( | ZINC40949491 | 0.7 |
| E2 glycoprotein (Q99A57) ( | Demecarium ( | 0.8 |
| Peptidyl-prolyl cis-trans isomerase ( | Dimetindene ( | 0.85 |