Talha Burak Alakus, Ibrahim Turkoglu.
Abstract
Experimental approaches are currently used to determine viral-host interactions, but they are both time-consuming and costly, so computational approaches are recommended as an alternative. In this study, viral-host interactions between SARS-CoV-2 and human proteins were predicted computationally. The study consists of four stages. In the first stage, viral and host protein sequences were obtained. In the second stage, the protein sequences were converted into numerical representations by various protein mapping methods: entropy-based, AVL-tree, FIBHASH, binary encoding, CPNR, PAM250, BLOSUM62, Atchley factors, Meiler parameters, EIIP, AESNN1, Miyazawa energies, Micheletti potentials, Z-scale, and hydrophobicity. In the third stage, a deep learning model based on BiLSTM was designed. In the last stage, the protein sequences were classified and the viral-host interactions were predicted. The performance of each protein mapping method was measured by accuracy, F1-score, specificity, sensitivity, and AUC. The entropy-based method gave the best classification result, with 94.74% accuracy and an AUC of 0.95. Z-scale was the second most successful method, with 91.23% accuracy and an AUC of 0.96. Although the other protein mapping methods were not as effective as the Z-scale and entropy-based methods, they still achieved successful classification. AVL-tree, FIBHASH, binary encoding, CPNR, PAM250, BLOSUM62, Atchley factors, Meiler parameters, and AESNN1 showed over 80% accuracy, F1-score, and AUC score. The accuracy of EIIP, Miyazawa energies, Micheletti potentials, and hydrophobicity remained below 80%. Overall, the results show that computational approaches were successful in predicting viral-host interactions between SARS-CoV-2 and human proteins.
Keywords: Covid-19; Deep learning; Protein mapping; SARS-CoV-2 virus
Year: 2022 PMID: 35879939 PMCID: PMC9301933 DOI: 10.1016/j.chemolab.2022.104622
Source DB: PubMed Journal: Chemometr Intell Lab Syst ISSN: 0169-7439 Impact factor: 4.175
Fig. 1. Genetic structure of the SARS-CoV-2 virus [12].
AESNN1 values of amino acid codes.
| Amino Acid Code | AESNN1 Value | Amino Acid Code | AESNN1 Value |
|---|---|---|---|
| A | −0.99 | L | −0.92 |
| R | 0.28 | K | −0.63 |
| N | 0.77 | M | −0.80 |
| D | 0.74 | F | 0.87 |
| C | 0.34 | P | −0.99 |
| Q | 0.12 | S | 0.99 |
| E | 0.59 | T | 0.42 |
| G | −0.79 | W | −0.13 |
| H | 0.08 | Y | 0.59 |
| I | −0.77 | V | −0.99 |
A protein sequence P (S) = [A R N D C Q …] is mapped as C (S) = [-0.99 0.28 0.77 0.74 0.34 0.12 …] by the AESNN1 protein mapping method.
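The scalar mappings in this record (AESNN1, hydrophobicity, EIIP, and so on) all work the same way: each residue is replaced by one number from a lookup table. The following is a minimal sketch of that idea using the AESNN1 values from the table above; the function name `map_sequence` is illustrative and not taken from the study.

```python
# AESNN1 values copied from the table above (one scalar per amino acid code).
AESNN1 = {
    "A": -0.99, "R": 0.28, "N": 0.77, "D": 0.74, "C": 0.34,
    "Q": 0.12, "E": 0.59, "G": -0.79, "H": 0.08, "I": -0.77,
    "L": -0.92, "K": -0.63, "M": -0.80, "F": 0.87, "P": -0.99,
    "S": 0.99, "T": 0.42, "W": -0.13, "Y": 0.59, "V": -0.99,
}


def map_sequence(sequence, table):
    """Replace each residue with its table value.

    Residues missing from the table raise KeyError (hypothetical helper,
    not the authors' implementation).
    """
    return [table[residue] for residue in sequence.upper()]


print(map_sequence("ARNDCQ", AESNN1))
# [-0.99, 0.28, 0.77, 0.74, 0.34, 0.12]
```

Any other single-valued table in this record can be passed as `table` in the same way.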
Atchley factors of amino acid codes.
| Amino Acid Code | bp | ss | mv | raac | ec |
|---|---|---|---|---|---|
| A | −0.591 | −1.302 | −0.733 | 1.570 | −0.146 |
| R | 1.538 | −0.055 | 1.502 | 0.440 | 2.897 |
| N | 0.945 | 0.828 | 1.299 | −0.169 | 0.933 |
| D | 1.050 | 0.302 | −3.656 | −0.259 | −3.242 |
| C | −1.343 | 0.465 | −0.862 | −1.020 | 0.255 |
| E | 1.357 | −1.453 | 1.477 | 0.113 | −0.837 |
| Q | 0.931 | −0.179 | −3.005 | −0.503 | −1.853 |
| G | −0.384 | 1.652 | 1.330 | 1.045 | 2.064 |
| H | 0.336 | −0.417 | −1.673 | −1.474 | −0.078 |
| I | −1.239 | −0.547 | 2.131 | 0.393 | 0.816 |
| L | −1.019 | −0.987 | −1.505 | 1.266 | −0.912 |
| K | 1.831 | −0.561 | 0.533 | −0.277 | 1.648 |
| M | −0.663 | −1.524 | 2.219 | −1.005 | 1.212 |
| F | −1.006 | −0.590 | 1.891 | −0.397 | 0.412 |
| P | 0.189 | 2.081 | −1.628 | 0.421 | −1.392 |
| S | −0.228 | 1.399 | −4.760 | 0.670 | −2.647 |
| T | −0.032 | 0.326 | 2.213 | 0.908 | 1.313 |
| W | −0.595 | 0.009 | 0.672 | −2.128 | −0.184 |
| Y | 0.260 | 0.830 | 3.097 | −0.838 | 1.512 |
| V | −1.337 | −0.279 | −0.544 | 1.242 | −1.262 |
A protein sequence P (S) = [A R …] is mapped as C (S) = [[-0.591 -1.302 -0.733 1.570 -0.146] [1.538 -0.055 1.502 0.440 2.897] …] by the Atchley factors protein mapping method.
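With a multi-valued scale such as the Atchley factors, each residue becomes a vector, so a sequence of length n becomes an n × 5 matrix. A small sketch of this, assuming only the rows for A and R (from the example above) and M and F (from the table); `map_to_matrix` is an illustrative name.

```python
# Subset of the Atchley factors, copied from the table and the example above.
# Only four residues are included here to keep the sketch short.
ATCHLEY = {
    "A": [-0.591, -1.302, -0.733, 1.570, -0.146],
    "R": [1.538, -0.055, 1.502, 0.440, 2.897],
    "M": [-0.663, -1.524, 2.219, -1.005, 1.212],
    "F": [-1.006, -0.590, 1.891, -0.397, 0.412],
}


def map_to_matrix(sequence, table):
    """Return one row of factors per residue (a list of lists)."""
    return [list(table[residue]) for residue in sequence.upper()]


matrix = map_to_matrix("ARMF", ATCHLEY)
print(len(matrix), len(matrix[0]))  # 4 5
```

The same function would apply to the seven-column Meiler parameters, giving an n × 7 matrix.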
Hydrophobicity values of amino acid codes.
| Amino Acid Code | Hydrophobicity Value | Amino Acid Code | Hydrophobicity Value |
|---|---|---|---|
| A | 1.8 | M | 1.9 |
| C | 2.5 | N | −3.5 |
| D | −3.5 | P | −1.6 |
| E | −3.5 | Q | −3.5 |
| F | 2.8 | R | −4.5 |
| G | −0.4 | S | −0.8 |
| H | −3.2 | T | −0.7 |
| I | 4.5 | V | 4.2 |
| K | −3.9 | W | −0.9 |
| L | 3.8 | Y | −1.3 |
A protein sequence P (S) = [A C D E F G …] is mapped as C (S) = [1.8 2.5 -3.5 -3.5 2.8 -0.4 …] by the hydrophobicity protein mapping method.
Meiler parameters of amino acid codes.
| Amino Acid Code | s | p | v | h | ip | hp | sp |
|---|---|---|---|---|---|---|---|
| A | 1.28 | 0.05 | 1.00 | 0.31 | 6.11 | 0.42 | 0.23 |
| R | 2.34 | 0.29 | 6.13 | −1.01 | 10.74 | 0.36 | 0.25 |
| N | 1.60 | 0.13 | 2.95 | −0.60 | 6.52 | 0.21 | 0.22 |
| D | 1.60 | 0.11 | 2.78 | −0.77 | 2.95 | 0.25 | 0.20 |
| C | 1.77 | 0.13 | 2.43 | 1.54 | 6.35 | 0.17 | 0.41 |
| E | 1.56 | 0.15 | 3.78 | −0.64 | 3.09 | 0.42 | 0.21 |
| Q | 1.56 | 0.18 | 3.95 | −0.22 | 5.65 | 0.36 | 0.25 |
| G | 0.00 | 0.00 | 0.00 | 0.00 | 6.07 | 0.13 | 0.15 |
| H | 2.99 | 0.23 | 4.66 | 0.13 | 7.69 | 0.27 | 0.30 |
| I | 4.19 | 0.19 | 4.00 | 1.80 | 6.04 | 0.30 | 0.45 |
| L | 2.59 | 0.19 | 4.00 | 1.70 | 6.04 | 0.39 | 0.31 |
| K | 1.89 | 0.22 | 4.77 | −0.99 | 9.99 | 0.32 | 0.27 |
| M | 2.35 | 0.22 | 4.43 | 1.23 | 5.71 | 0.38 | 0.32 |
| F | 2.94 | 0.29 | 5.89 | 1.79 | 5.67 | 0.30 | 0.38 |
| P | 2.67 | 0.00 | 2.72 | 0.72 | 6.80 | 0.13 | 0.34 |
| S | 1.31 | 0.06 | 1.60 | −0.04 | −5.70 | 0.20 | 0.28 |
| T | 3.03 | 0.11 | 2.60 | 0.26 | 5.60 | 0.21 | 0.36 |
| W | 3.21 | 0.41 | 8.08 | 2.25 | 5.94 | 0.32 | 0.42 |
| Y | 2.94 | 0.30 | 6.47 | 0.96 | 5.66 | 0.25 | 0.41 |
| V | 3.67 | 0.14 | 3.00 | 1.22 | 6.02 | 0.27 | 0.49 |
A protein sequence P (S) = [A R …] is mapped as C (S) = [[1.28 0.05 1.00 0.31 6.11 0.42 0.23] [2.34 0.29 6.13 -1.01 10.74 0.36 0.25] …] by the Meiler parameters protein mapping method.
EIIP values of amino acid codes.
| Amino Acid Code | EIIP Value | Amino Acid Code | EIIP Value |
|---|---|---|---|
| M | 0.0823 | Q | 0.0761 |
| W | 0.0548 | S | 0.0829 |
| F | 0.0946 | A | 0.0373 |
| Y | 0.0516 | N | 0.0036 |
| P | 0.0198 | G | 0.0050 |
| C | 0.0829 | R | 0.0959 |
| T | 0.0941 | I | 0 |
| H | 0.0242 | D | 0.1263 |
| V | 0.0057 | E | 0.0058 |
| L | 0 | K | 0.0371 |
A protein sequence P (S) = [M W F Y P C …] is mapped as C (S) = [0.0823 0.0548 0.0946 0.0516 0.0198 0.0829 …] by the EIIP protein mapping method.
CPNR values of amino acid codes.
| Amino Acid Code | CPNR Value | Amino Acid Code | CPNR Value |
|---|---|---|---|
| M | 1 | Q | 29 |
| W | 2 | S | 31 |
| F | 3 | A | 37 |
| Y | 5 | N | 41 |
| P | 7 | G | 43 |
| C | 11 | R | 47 |
| T | 13 | I | 53 |
| H | 17 | D | 59 |
| V | 19 | E | 61 |
| L | 23 | K | 67 |
A protein sequence P (S) = [M W F Y P C …] is mapped as C (S) = [1 2 3 5 7 11 …] by the CPNR protein mapping method.
Binary encoding values of amino acid codes.
| Amino Acid Code | Binary Encoding Value | Amino Acid Code | Binary Encoding Value |
|---|---|---|---|
| A | 10000000000000000000 | M | 00000000001000000000 |
| C | 01000000000000000000 | N | 00000000000100000000 |
| D | 00100000000000000000 | P | 00000000000010000000 |
| E | 00010000000000000000 | Q | 00000000000001000000 |
| F | 00001000000000000000 | R | 00000000000000100000 |
| G | 00000100000000000000 | S | 00000000000000010000 |
| H | 00000010000000000000 | T | 00000000000000001000 |
| I | 00000001000000000000 | V | 00000000000000000100 |
| K | 00000000100000000000 | W | 00000000000000000010 |
| L | 00000000010000000000 | Y | 00000000000000000001 |
A protein sequence P (S) = [A C D …] is mapped as C (S) = [10000000000000000000 01000000000000000000 00100000000000000000 …] by the binary encoding protein mapping method.
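Binary encoding is a one-hot scheme: each of the 20 amino acids gets a 20-bit code with a single 1. A minimal sketch, assuming the bit position follows the alphabetical order of the one-letter codes, which is consistent with the table above (A first, M at position 11, Y last); the function names are illustrative.

```python
# Alphabetical one-letter codes; the bit position of each residue is its index here.
ALPHABET = "ACDEFGHIKLMNPQRSTVWY"


def one_hot(residue):
    """Return a 20-character bit string with a single '1' at the residue's index."""
    index = ALPHABET.index(residue)  # raises ValueError for unknown residues
    return "0" * index + "1" + "0" * (len(ALPHABET) - index - 1)


def binary_encode(sequence):
    """Encode every residue of the sequence as a one-hot bit string."""
    return [one_hot(residue) for residue in sequence.upper()]


print(binary_encode("ACD"))
# ['10000000000000000000', '01000000000000000000', '00100000000000000000']
```

Unlike the scalar scales, this encoding carries no physicochemical information; it only distinguishes residues from one another.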
PAM250 values of amino acid codes.
| Amino Acid Code | PAM250 Value | Amino Acid Code | PAM250 Value |
|---|---|---|---|
| A | 2 | L | 6 |
| R | 6 | K | 5 |
| N | 2 | M | 6 |
| D | 4 | F | 9 |
| C | 4 | P | 6 |
| Q | 4 | S | 3 |
| E | 4 | T | 3 |
| G | 5 | W | 17 |
| H | 6 | Y | 10 |
| I | 5 | V | 4 |
A protein sequence P (S) = [A R N D C Q …] is mapped as C (S) = [2 6 2 4 4 4 …] by the PAM250 protein mapping method.
BLOSUM62 values of amino acid codes.
| Amino Acid Code | BLOSUM62 Value | Amino Acid Code | BLOSUM62 Value |
|---|---|---|---|
| A | 4 | L | 4 |
| R | 5 | K | 5 |
| N | 6 | M | 5 |
| D | 6 | F | 6 |
| C | 9 | P | 7 |
| Q | 5 | S | 4 |
| E | 5 | T | 5 |
| G | 6 | W | 11 |
| H | 8 | Y | 7 |
| I | 4 | V | 4 |
A protein sequence P (S) = [A R N D C Q …] is mapped as C (S) = [4 5 6 6 9 5 …] by the BLOSUM62 protein mapping method.
Miyazawa energy values of amino acid codes.
| Amino Acid Code | Miyazawa Energy Value | Amino Acid Code | Miyazawa Energy Value |
|---|---|---|---|
| A | −0.02 | L | −0.32 |
| R | 0.08 | K | 0.30 |
| N | 0.10 | M | −0.25 |
| D | 0.19 | F | −0.33 |
| C | −0.32 | P | 0.11 |
| Q | 0.21 | S | 0.11 |
| E | 0.15 | T | 0.05 |
| G | −0.02 | W | −0.27 |
| H | −0.02 | Y | −0.23 |
| I | −0.28 | V | −0.23 |
A protein sequence P (S) = [A R N D C Q …] is mapped as C (S) = [-0.02 0.08 0.10 0.19 -0.32 0.21 …] by the Miyazawa energies protein mapping method.
Micheletti potential values of amino acid codes.
| Amino Acid Code | Micheletti Potential Value | Amino Acid Code | Micheletti Potential Value |
|---|---|---|---|
| A | −0.001461 | L | −0.000782 |
| R | 0.009875 | K | 0.005109 |
| N | −0.001962 | M | 0.031655 |
| D | −0.000531 | F | −0.013128 |
| C | −0.002544 | P | −0.003621 |
| Q | 0.006456 | S | −0.000802 |
| E | 0.008438 | T | 0.003269 |
| G | 0.000990 | W | 0.131813 |
| H | 0.001314 | Y | −0.007699 |
| I | 0.006801 | V | 0.001445 |
A protein sequence P (S) = [A R N D C Q …] is mapped as C (S) = [-0.001461 0.009875 -0.001962 -0.000531 -0.002544 0.006456 …] by the Micheletti potentials protein mapping method.
FIBHASH values of amino acid codes.
| Amino Acid Code | FIBHASH Value | Amino Acid Code | FIBHASH Value |
|---|---|---|---|
| A | 1 | M | 9 |
| C | 2 | N | 7 |
| D | 3 | P | 16 |
| E | 4 | Q | 17 |
| F | 5 | R | 10 |
| G | 8 | S | 11 |
| H | 13 | T | 18 |
| I | 6 | V | 12 |
| K | 14 | W | 19 |
| L | 15 | Y | 20 |
A protein sequence P (S) = [A C D E F G …] is mapped as C (S) = [1 2 3 4 5 8 …] by the FIBHASH protein mapping method.
AVL-tree values of amino acid codes.
| Amino Acid Code | AVL-tree Value | Amino Acid Code | AVL-tree Value |
|---|---|---|---|
| A | 4 | L | 3 |
| R | 3 | K | 2 |
| N | 0 | M | 4 |
| D | 4 | F | 4 |
| C | 3 | P | 3 |
| Q | 2 | S | 1 |
| E | 2 | T | 3 |
| G | 3 | W | 2 |
| H | 1 | Y | 3 |
| I | 3 | V | 4 |
A protein sequence P (S) = [A R N D C Q …] is mapped as C (S) = [4 3 0 4 3 2 …] by the AVL-tree protein mapping method.
Fig. 2. Flow chart of the study.
Performances of protein mapping methods (results are average values of 10-fold cross-validation).
| Protein Mapping Methods | Accuracy | Specificity | Sensitivity | F1-score | AUC |
|---|---|---|---|---|---|
| AESNN1 | 88.94% | 88.41% | 87.62% | 88.01% | 0.81 |
| Atchley factors | 82.58% | 76.77% | 78.79% | 77.76% | 0.83 |
| AVL-tree | 90.44% | 86.85% | 89.86% | 88.33% | 0.91 |
| Binary encoding | 86.20% | 83.84% | 84.86% | 84.34% | 0.88 |
| BLOSUM62 | 85.54% | 85.83% | 83.80% | 84.80% | 0.83 |
| CPNR | 88.52% | 84.84% | 86.84% | 85.82% | 0.87 |
| EIIP | 78.89% | 79.75% | 79.74% | 79.74% | 0.82 |
| Entropy-based | 94.74% | 88.91% | 90.90% | 89.89% | 0.95 |
| FIBHASH | 89.33% | 86.88% | 88.86% | 87.85% | 0.88 |
| Hydrophobicity | 79.65% | 76.77% | 73.74% | 75.22% | 0.77 |
| Meiler parameters | 81.10% | 82.78% | 83.83% | 83.30% | 0.82 |
| Micheletti potentials | 75.39% | 72.72% | 74.75% | 73.72% | 0.79 |
| Miyazawa energies | 77.31% | 73.70% | 75.70% | 74.68% | 0.81 |
| PAM250 | 87.63% | 85.83% | 85.84% | 85.83% | 0.87 |
| Z-scale | 91.23% | 88.90% | 88.88% | 88.85% | 0.96 |
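The accuracy, specificity, sensitivity, and F1-score columns can all be derived from confusion-matrix counts and then averaged over the cross-validation folds. The sketch below assumes the standard definitions of these metrics, since the record does not spell them out, and uses hypothetical counts that are not taken from the study; `classification_metrics` and `average_folds` are illustrative names.

```python
def classification_metrics(tp, fp, tn, fn):
    """Standard binary-classification metrics from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)  # true positive rate (recall)
    specificity = tn / (tn + fp)  # true negative rate
    precision = tp / (tp + fp)
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "specificity": specificity,
        "sensitivity": sensitivity,
        "f1": 2 * precision * sensitivity / (precision + sensitivity),
    }


def average_folds(fold_metrics):
    """Average each metric over the folds of a cross-validation run."""
    keys = fold_metrics[0].keys()
    return {k: sum(m[k] for m in fold_metrics) / len(fold_metrics) for k in keys}


# Hypothetical counts for two folds (NOT taken from the study).
folds = [
    classification_metrics(tp=45, fp=5, tn=40, fn=10),
    classification_metrics(tp=40, fp=10, tn=45, fn=5),
]
summary = average_folds(folds)
print({name: round(value, 4) for name, value in summary.items()})
```

A 10-fold run would pass ten such dictionaries to `average_folds`, one per held-out fold.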
Fig. 3. ROC plots of protein mapping methods.
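The AUC reported for each method is the area under its ROC curve. This area equals the probability that a randomly chosen positive example receives a higher score than a randomly chosen negative one, with ties counted as one half. The sketch below computes it that way in plain Python; the labels and scores are hypothetical, and `roc_auc` is an illustrative name rather than the authors' code.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison of positives and negatives."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("both classes must be present")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0      # positive ranked above negative
            elif p == n:
                wins += 0.5      # ties count as half
    return wins / (len(positives) * len(negatives))


# Hypothetical predictions (NOT taken from the study).
labels = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
print(round(roc_auc(labels, scores), 4))  # 0.8889
```

The pairwise loop is quadratic in the number of examples, which is fine for illustration; a rank-based computation would scale better on large test sets.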