Talha Burak Alakus, Ibrahim Turkoglu.
Abstract
The novel coronavirus (SARS-COV-2) that emerged in Wuhan, China has spread rapidly around the world and become a pandemic. Beyond its significant impact on daily life, it has affected many areas, including public health and the economy. Currently, there is no vaccine or antiviral drug available to prevent COVID-19. Therefore, determining the protein interactions of the new coronavirus is vital for clinical studies, drug therapy, and the identification of preclinical compounds and protein functions. Protein-protein interactions are important for examining protein functions and the pathways involved in various biological processes, and for determining the cause and progression of diseases. Various high-throughput experimental methods have been used to identify protein-protein interactions in organisms, yet there is still a large gap in specifying all possible protein interactions in an organism. In addition, because these experimental methods involve cloning, labeling, and affinity purification mass spectrometry, they take a long time. Determining these interactions with artificial intelligence-based methods rather than experimental approaches may help to identify protein functions faster. Thus, protein-protein interaction prediction using deep-learning algorithms has been employed in conjunction with experimental methods to explore new protein interactions. However, to predict protein interactions with artificial intelligence techniques, protein sequences must first be mapped to numerical representations. Many protein-mapping methods exist in the literature. In this study, we contribute to the literature by proposing a novel protein-mapping method based on the AVL tree. The proposed method was inspired by the fast search performance of the AVL tree as a dictionary structure and was used to verify the protein interactions between the SARS-COV-2 virus and humans.
First, protein sequences were mapped by both the proposed method and various existing protein-mapping methods. Then, the mapped protein sequences were normalized and classified by bidirectional recurrent neural networks. The performance of the proposed method was evaluated with accuracy, f1-score, precision, recall, and AUC scores. Our results indicated that our mapping method predicts the interactions between SARS-COV-2 virus proteins and human proteins with an accuracy of 97.76%, precision of 97.60%, recall of 98.33%, f1-score of 79.42%, and AUC of 89% on average.
Keywords: AVL tree; COVID-19; Deep learning; Protein mapping; SARS-COV-2
Year: 2021 PMID: 33433784 PMCID: PMC7801232 DOI: 10.1007/s12539-020-00405-4
Source DB: PubMed Journal: Interdiscip Sci ISSN: 1867-1462 Impact factor: 2.233
Fig. 1 Genomic structure of SARS-COV-2 virus
Interacting COVID-19 and human proteins and the total number of these proteins
| COVID-19 proteins | Interacting human proteins | Total number of proteins |
|---|---|---|
| orf3a | ALG5, ARL6IP6, CLCC1, HMOX1, SUN2, TRIM59, VPS11, VPS39 | 8 |
| orf3b | STOML2 | 1 |
| orf6 | MTCH1, NUP98, RAE1 | 3 |
| orf7a | HEATR3, MDN1 | 2 |
| orf8 | ADAM9, ADAMTS1, CHPF2, CHPF, CISD3, COL6A1, DNMT1, EDEM3, EMC1, ERLEC1, ERO1LB, ERP44, FBXL12, FKBP7, FKBP10, FOXRED2, GDF15, GGH, HS6ST2, HYOU1, IL17RA, INHBE, ITGB1, KDELC1, KDELC2, LOX, MFGE8, NEU1, NGLY1, NPC2, NPTX1, OS9, PCSK6, PLAT, PLD3, PLEKHF2, PLOD2, POFUT1, PUSL1, PVR, SDF2, SIL1, SMOC1, STC2, TM2D3, TOR1A, UGGT2 | 47 |
Fig. 2 Insertion of amino acid codes to the AVL tree
Fig. 3 Depth values of each amino acid
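The idea behind Figs. 2 and 3 can be illustrated with a minimal sketch: amino acid codes are inserted into an AVL tree, and each residue is then represented by the depth of its node. The figures themselves are not reproduced here, so the insertion order (alphabetical one-letter codes), the depth convention (root at depth 1), and the names `insert`, `node_depths`, and `encode` are all assumptions made for illustration and may differ from the authors' implementation.

```python
class Node:
    def __init__(self, key):
        self.key = key
        self.left = None
        self.right = None
        self.height = 1  # height of the subtree rooted at this node

def height(node):
    return node.height if node else 0

def update(node):
    node.height = 1 + max(height(node.left), height(node.right))

def rotate_right(y):
    x = y.left
    y.left, x.right = x.right, y
    update(y)
    update(x)
    return x

def rotate_left(x):
    y = x.right
    x.right, y.left = y.left, x
    update(x)
    update(y)
    return y

def insert(node, key):
    """Standard AVL insertion; returns the (possibly new) subtree root."""
    if node is None:
        return Node(key)
    if key < node.key:
        node.left = insert(node.left, key)
    elif key > node.key:
        node.right = insert(node.right, key)
    else:
        return node  # duplicate keys are ignored
    update(node)
    balance = height(node.left) - height(node.right)
    if balance > 1:  # left-heavy
        if key > node.left.key:  # left-right case
            node.left = rotate_left(node.left)
        return rotate_right(node)
    if balance < -1:  # right-heavy
        if key < node.right.key:  # right-left case
            node.right = rotate_right(node.right)
        return rotate_left(node)
    return node

def node_depths(node, depth=1, out=None):
    """Map each key to its depth (assumption: the root has depth 1)."""
    out = {} if out is None else out
    if node is not None:
        out[node.key] = depth
        node_depths(node.left, depth + 1, out)
        node_depths(node.right, depth + 1, out)
    return out

# Assumption: the 20 standard one-letter codes, inserted alphabetically.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

root = None
for code in AMINO_ACIDS:
    root = insert(root, code)
DEPTH = node_depths(root)

def encode(sequence):
    """Replace each residue with the depth of its node in the AVL tree."""
    return [DEPTH[residue] for residue in sequence]
```

Calling `encode("MKT")` on a hypothetical fragment returns one integer per residue. Because an AVL tree stays balanced, the 20 codes occupy only a handful of distinct depth levels, so several amino acids necessarily share the same numeric value under this scheme.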
Fig. 4 Experimental design of the study
Fig. 5 Validation process of the study. While 85% of the original dataset consists of training and validation data, the remaining 15% is the blind dataset. After completing the iteration process, the performances of the mapping methods were determined with the blind dataset
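The split described in Fig. 5 could be sketched as follows. This is only an illustration: the caption gives the 85%/15% proportions, and the table captions mention tenfold cross-validation, but the shuffling, the seed, and the helper names `blind_split` and `k_folds` are assumptions, not details taken from the study.

```python
import random

def blind_split(n_samples, blind_fraction=0.15, seed=0):
    """Shuffle sample indices and hold out a blind set (assumed random split)."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    n_blind = round(n_samples * blind_fraction)
    return indices[n_blind:], indices[:n_blind]  # (development, blind)

def k_folds(indices, k=10):
    """Yield (train, validation) index lists for k-fold cross-validation."""
    folds = [indices[i::k] for i in range(k)]
    for i, validation in enumerate(folds):
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, validation

# Hypothetical dataset of 200 protein pairs.
development, blind = blind_split(200)
splits = list(k_folds(development, k=10))
```

With 200 hypothetical samples, 30 are reserved as the blind set and the remaining 170 are rotated through ten train/validation folds; the blind indices are never seen during the cross-validation iterations.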
Average interaction prediction results of orf3a protein with other human proteins (all values were obtained by averaging the tenfold cross-validation process)
| Mapping methods | Accuracy | Precision | Recall | f1-score | AUC score |
|---|---|---|---|---|---|
| CPNR | 0.8398 | 0.7541 | 0.8911 | 0.8106 | 0.92 |
| EIIP | 0.8144 | 0.7200 | 0.8394 | 0.7737 | 0.91 |
| Hydrophobicity | 0.8019 | 0.7263 | 0.8488 | 0.7788 | 0.89 |
| The proposed method | 0.8107 | 0.8177 | 0.8169 | 0.8163 | 0.89 |
Average interaction prediction results of orf3b protein with other human proteins (all values were obtained by averaging the tenfold cross-validation process)
| Mapping methods | Accuracy | Precision | Recall | f1-score | AUC score |
|---|---|---|---|---|---|
| CPNR | 0.8065 | 0.6429 | 0.6953 | 0.6652 | 0.91 |
| EIIP | 0.6796 | 0.4770 | 0.4918 | 0.4843 | 0.67 |
| Hydrophobicity | 0.7472 | 0.7926 | 0.6406 | 0.7064 | 0.79 |
| The proposed method | 0.9065 | 0.8429 | 0.8953 | 0.8652 | 0.93 |
Average interaction prediction results of orf6 protein with other human proteins (all values were obtained by averaging the tenfold cross-validation process)
| Mapping methods | Accuracy | Precision | Recall | f1-score | AUC score |
|---|---|---|---|---|---|
| CPNR | 0.7790 | 0.6629 | 0.7678 | 0.7081 | 0.90 |
| EIIP | 0.4109 | 0.4680 | 0.4888 | 0.4781 | 0.43 |
| Hydrophobicity | 0.6921 | 0.7225 | 0.6539 | 0.6827 | 0.81 |
| The proposed method | 0.8523 | 0.7291 | 0.7231 | 0.7176 | 0.93 |
Average interaction prediction results of orf7a protein with other human proteins (all values were obtained by averaging the tenfold cross-validation process)
| Mapping methods | Accuracy | Precision | Recall | f1-score | AUC score |
|---|---|---|---|---|---|
| CPNR | 0.8457 | 0.8182 | 0.8121 | 0.8133 | 0.90 |
| EIIP | 0.3840 | 0.4052 | 0.4385 | 0.4198 | 0.40 |
| Hydrophobicity | 0.8145 | 0.7939 | 0.7402 | 0.7147 | 0.85 |
| The proposed method | 0.8891 | 0.8181 | 0.9310 | 0.8709 | 0.95 |
Average interaction prediction results of orf8 protein with other human proteins (all values were obtained by averaging the tenfold cross-validation process)
| Mapping methods | Accuracy | Precision | Recall | f1-score | AUC score |
|---|---|---|---|---|---|
| CPNR | 0.6409 | 0.6203 | 0.5927 | 0.6015 | 0.75 |
| EIIP | 0.6085 | 0.5729 | 0.4329 | 0.4914 | 0.78 |
| Hydrophobicity | 0.8002 | 0.7377 | 0.6528 | 0.6926 | 0.91 |
| The proposed method | 0.8078 | 0.7567 | 0.6605 | 0.7011 | 0.93 |
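For reference, the accuracy, precision, recall, and f1-score columns in the tables above follow the standard binary-classification definitions. The sketch below computes them from confusion-matrix counts; the function name `classification_metrics` and the example counts are hypothetical and are not taken from the study.

```python
def classification_metrics(tp, fp, tn, fn):
    """Standard binary-classification metrics from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}

# Hypothetical counts for a single validation fold.
metrics = classification_metrics(tp=8, fp=2, tn=7, fn=3)
```

Because each table entry is an average over ten folds, a reported f1-score need not equal the harmonic mean of the reported precision and recall in the same row: averaging per-fold f1 values is not the same as computing f1 from averaged precision and recall.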
Fig. 6 ROC and PR plots of protein-mapping methods that predicted the interaction between orf3a protein and human proteins
Fig. 7 ROC and PR plots of protein-mapping methods that predicted the interaction between orf3b protein and human proteins
Fig. 8 ROC and PR plots of protein-mapping methods that predicted the interaction between orf6 protein and human proteins
Fig. 9 ROC and PR plots of protein-mapping methods that predicted the interaction between orf7a protein and human proteins
Fig. 10 The interaction network of orf3a, orf3b, orf6, and orf7a proteins
Fig. 11 ROC and PR plots of protein-mapping methods that predicted the interaction between orf8 protein and human proteins
Fig. 12 The interaction network of orf8 protein
Average protein interaction accuracy results of all protein-mapping methods
| Mapping method | Interaction accuracy |
|---|---|
| CPNR | 78.24% |
| EIIP | 57.95% |
| Hydrophobicity | 77.12% |
| The proposed method | 85.33% |
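The values in this table appear to be the unweighted means of the per-protein accuracies reported for orf3a, orf3b, orf6, orf7a, and orf8. The short check below reproduces them from the accuracy columns of the five result tables; the variable and function names are illustrative only.

```python
# Accuracy per viral protein, in the order orf3a, orf3b, orf6, orf7a, orf8,
# copied from the accuracy columns of the result tables.
ACCURACY = {
    "CPNR": [0.8398, 0.8065, 0.7790, 0.8457, 0.6409],
    "EIIP": [0.8144, 0.6796, 0.4109, 0.3840, 0.6085],
    "Hydrophobicity": [0.8019, 0.7472, 0.6921, 0.8145, 0.8002],
    "The proposed method": [0.8107, 0.9065, 0.8523, 0.8891, 0.8078],
}

def mean_percent(values):
    """Unweighted mean expressed as a percentage, rounded to two decimals."""
    return round(100 * sum(values) / len(values), 2)

AVERAGES = {method: mean_percent(v) for method, v in ACCURACY.items()}
# AVERAGES -> CPNR 78.24, EIIP 57.95, Hydrophobicity 77.12,
#             The proposed method 85.33
```

Note that this is a simple mean over the five viral proteins, so orf3b (1 interacting human protein) carries the same weight as orf8 (47 interacting human proteins).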