Lopamudra Dey, Sanjay Chakraborty, Anirban Mukhopadhyay.
Abstract
BACKGROUND: COVID-19 (Coronavirus Disease-19), a disease caused by the SARS-CoV-2 virus, was declared a pandemic by the World Health Organization on March 11, 2020. Over 15 million people have already been affected worldwide by COVID-19, resulting in more than 0.6 million deaths. Protein-protein interactions (PPIs) play a key role in the cellular process of SARS-CoV-2 infection in the human body. A recent study reported some SARS-CoV-2 proteins that interact with several human proteins, but many potential interactions remain to be identified.
Keywords: COVID-19; Classifier ensemble; Machine learning; Protein–protein interaction; SARS-CoV-2; Supervised classification
Year: 2020 PMID: 33036956 PMCID: PMC7470713 DOI: 10.1016/j.bj.2020.08.003
Source DB: PubMed Journal: Biomed J ISSN: 2319-4170 Impact factor: 4.910
Fig. 1. SARS-CoV-2 proteins' frequency of interactions with human proteins.
Fig. 2. A glimpse of the SARS-CoV-2-human PPI network. Purple ovals indicate SARS-CoV-2 proteins, blue ovals indicate human proteins, and edges indicate SARS-CoV-2-human protein interactions.
Comparison of accuracy of all supervised learning algorithms on 1:1 positive:negative training dataset considering random sampling, subcellular localization and degree distribution of preparing negative samples.
| Algorithms | Degree Distribution Accuracy | Random Sampling Accuracy | Subcellular Localization Accuracy |
|---|---|---|---|
| | 68.97 | 57.76 | 54.51 |
| | 59.16 | 52.13 | 54.47 |
| | 67.16 | 56.06 | 53.09 |
| KNN | 67.09 | 52.12 | 58.90 |
| NB | 61.38 | 53.08 | 54.78 |
| RF | 67.28 | 57.78 | 55.07 |
| XGBoost | 51.53 | 49.56 | 52.13 |
| AdaBoost | 49.53 | 56.78 | 55.67 |
| DMLP (epochs = 50, Batch-Size = 10) | 70.91 | 57.78 | 64.13 |
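
The table above compares three ways of preparing negative samples for a 1:1 positive:negative training set. As a rough illustration of the simplest one, random sampling, the sketch below draws as many non-interacting pairs as there are known positives. The function name, the toy identifiers, and the fixed seed are assumptions made for illustration; the source does not describe its sampling code, and the degree-distribution and subcellular-localization strategies are not covered here.

```python
import random


def sample_random_negatives(viral, human, positives, seed=0):
    """Draw as many non-positive (viral, human) pairs as there are positives."""
    positive_set = set(positives)
    # Every possible pair that is not a known interaction is a candidate negative.
    candidates = [(v, h) for v in viral for h in human if (v, h) not in positive_set]
    if len(candidates) < len(positive_set):
        raise ValueError("not enough candidate pairs for a 1:1 ratio")
    rng = random.Random(seed)  # fixed seed only to make the example repeatable
    return rng.sample(candidates, len(positive_set))


# Hypothetical toy identifiers, not taken from the source.
viral = ["V1", "V2"]
human = ["H1", "H2", "H3", "H4"]
positives = [("V1", "H1"), ("V2", "H3"), ("V1", "H4")]
negatives = sample_random_negatives(viral, human, positives)
```

Pairs sampled this way are only presumed non-interacting, which may be one reason the choice of negative-sampling strategy affects accuracy.
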
The seven clusters of amino acids based on their dipoles and side-chain volumes.
| Cluster number | Protein groups |
|---|---|
| Cluster 1 | A, G, V |
| Cluster 2 | I, L, F, P |
| Cluster 3 | Y, M, T, S |
| Cluster 4 | H, N, Q, W |
| Cluster 5 | R, K |
| Cluster 6 | D, E |
| Cluster 7 | C |
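
The cluster table above can be turned into a simple sequence encoding. The sketch below maps each residue to its cluster number and counts consecutive cluster triples, in the style of a conjoint-triad descriptor. That use of the clusters is an assumption (this excerpt does not say how the clusters feed into the features), and the function names and the example peptide are hypothetical.

```python
from collections import Counter

# Cluster membership copied from the table above.
CLUSTERS = {
    1: "AGV",
    2: "ILFP",
    3: "YMTS",
    4: "HNQW",
    5: "RK",
    6: "DE",
    7: "C",
}
AA_TO_CLUSTER = {aa: c for c, aas in CLUSTERS.items() for aa in aas}


def encode(sequence):
    """Replace each residue with its cluster number; unknown symbols are skipped."""
    return [AA_TO_CLUSTER[aa] for aa in sequence.upper() if aa in AA_TO_CLUSTER]


def triad_counts(sequence):
    """Count consecutive cluster triples (7 * 7 * 7 = 343 possible keys)."""
    codes = encode(sequence)
    return Counter(zip(codes, codes[1:], codes[2:]))


codes = encode("MKVLAC")          # hypothetical peptide
triads = triad_counts("MKVLAC")
```

Grouping the 20 amino acids into 7 classes keeps the triple vocabulary at 343 entries instead of 8000, which is the usual motivation for this kind of reduction.
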
Fig. 3. Block diagram of protein–protein interaction prediction methodology.
Comparison of cross-validation performance between all features vs selected best 38 features using all supervised learning algorithms on 1:1 positive and negative training dataset. The best accuracy values for each classifier are highlighted in boldface.
| Method | All features: Accuracy | All features: Kappa | Selected features: Accuracy | Selected features: Kappa |
|---|---|---|---|---|
| | 68.97 | 36.93 | | 36.90 |
| | 59.16 | 23.34 | | 31.79 |
| | 67.16 | 35.93 | | 34.58 |
| KNN | | 34.19 | 59.27 | 28.53 |
| NB | 61.38 | 29.88 | | 30.91 |
| RF | 67.28 | 34.50 | | 36.90 |
| XGBoost | 51.53 | 23.01 | | 28.23 |
| AdaBoost | 49.53 | 20.31 | | 24.02 |
| DMLP (epochs = 50, Batch-Size = 10) | 68.91 | 36.61 | | 38.72 |
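
The Kappa columns above presumably report Cohen's kappa, which corrects accuracy for the agreement expected by chance. A minimal sketch of the standard formula for a binary confusion matrix follows; the function name and the counts are hypothetical and are not taken from the study.

```python
def cohen_kappa(tp, fp, fn, tn):
    """Cohen's kappa for a 2x2 confusion matrix (returned as a fraction)."""
    n = tp + fp + fn + tn
    observed = (tp + tn) / n
    # Chance agreement from the marginal totals of truth and prediction.
    expected = ((tp + fn) * (tp + fp) + (tn + fp) * (tn + fn)) / (n * n)
    return (observed - expected) / (1 - expected)


# Hypothetical counts on a balanced 1:1 dataset: 70% accuracy.
kappa = cohen_kappa(tp=70, fp=30, fn=30, tn=70)
```

With these balanced counts the chance agreement is 0.5, so an accuracy of 0.70 corresponds to a kappa of about 0.40. The table appears to give kappa multiplied by 100.
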
Comparison of performance of all supervised learning algorithms on blind dataset.
| Algorithms | Accuracy | Recall | Specificity | Precision | F1-Score |
|---|---|---|---|---|---|
| | 69.67 | 58.06 | 73.33 | 62.85 | 67.68 |
| | 63.93 | 58.06 | 70 | 61.76 | 65.63 |
| | 68.03 | 56.64 | 80 | 64 | 70 |
| KNN | 64.17 | 66.13 | 56.67 | 61.81 | 59.12 |
| NB | 65.03 | 65 | 56.45 | 66 | 65.18 |
| RF | 68.93 | 66.13 | 70 | 66.67 | 68.29 |
| XGBoost | 61.2 | 63 | 55.23 | 61 | 63.29 |
| AdaBoost | 54.3 | 59.84 | 60 | 60 | 60.17 |
| DMLP(epochs = 50,Batch-Size = 10) | 63.47 | 60 | 57.9 | 61.01 | 60.53 |
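
For reference, the five columns in the table above follow the usual definitions over a binary confusion matrix. The sketch below computes them as percentages; the function name and the counts are hypothetical and do not reproduce any row of the table.

```python
def binary_metrics(tp, fp, fn, tn):
    """Return the standard binary classification metrics as percentages."""
    recall = tp / (tp + fn)            # fraction of true interactions recovered
    specificity = tn / (tn + fp)       # fraction of non-interactions rejected
    precision = tp / (tp + fp)         # fraction of predicted interactions that are true
    f1 = 2 * precision * recall / (precision + recall)
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    return {
        "accuracy": 100 * accuracy,
        "recall": 100 * recall,
        "specificity": 100 * specificity,
        "precision": 100 * precision,
        "f1": 100 * f1,
    }


# Hypothetical blind-set counts, for illustration only.
metrics = binary_metrics(tp=40, fp=15, fn=20, tn=45)
```
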
The top 10 high-degree predicted target human proteins with their degrees and average prediction scores.
| Protein Name | Degree | Average Prediction Score |
|---|---|---|
| THEM4 | 168 | 0.707725 |
| OMG | 156 | 0.724195 |
| RTN4RL1 | 154 | 0.772267 |
| ANXA4 | 135 | 0.846931 |
| TTC3 | 131 | 0.723774 |
| MYO1A | 130 | 0.749535 |
| TEX10 | 88 | 0.797231 |
| NOSIP | 81 | 0.755477 |
| PCSK1N | 76 | 0.711644 |
| MYH11 | 74 | 0.841619 |
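
A ranking like the one above can be derived from a list of scored predictions. The sketch below assumes that a protein's degree is the number of predicted interaction records it appears in and that the average prediction score is the mean over those records; the source does not define either quantity in this excerpt, and the function name and toy records are hypothetical.

```python
from collections import defaultdict


def summarize_targets(predictions, top=10):
    """Rank human proteins by degree, then by mean prediction score."""
    scores = defaultdict(list)
    for _viral, human, score in predictions:
        scores[human].append(score)
    rows = [(h, len(s), sum(s) / len(s)) for h, s in scores.items()]
    rows.sort(key=lambda r: (-r[1], -r[2]))  # highest degree first
    return rows[:top]


# Hypothetical (viral protein, human protein, score) records.
predictions = [
    ("V1", "H1", 0.9),
    ("V2", "H1", 0.7),
    ("V1", "H2", 0.8),
]
summary = summarize_targets(predictions)
```
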
The significant KEGG pathways of the predicted human proteins.
| KEGG Pathway | Protein Count | Predicted Human Proteins |
|---|---|---|
| Proteasome (p = 2.3615E-8) | 19 | PSMB10, SHFM1, PSMB8, PSMA2, PSMB4, PSMC5, PSMD12, PSMA6, PSMB1, PSMC4, PSMA5, PSMC3, PSMC2, PSMD1, PSMC1, PSMB2, POMP, PSMD4, PSMD7 |
| Endocytosis (p = 2.5126E-7) | 49 | LDLR, CHMP4B, TSG101, CHMP5, CAPZA2, CHMP6, PIP5K1C, CLTC, SMAP1, PIP5KL1, VPS4B, SPG21, KIF5B, KIF5A, RAB4A, HLA-A, HLA-B, HLA-E, HLA-F, ARPC1A, ARPC1B, RAB11FIP5, RAB11FIP3, CHMP1B, ACAP3, ACAP1, RAB5A, SH3GL1, VPS29, SNX5, SNX2, SNX1, ARPC4, HSPA1A, SNX4, ARPC5, ARFGEF2, CHMP2B, SH3GLB1, RAB11A, EHD1, EHD2, RAB31, ARF1, RAB35, ARF3, RAB22A, VPS28, DNM1 |
| Biosynthesis of antibiotics (p = 7.0351E-5) | 44 | ALDOA, HSD17B10, LDHB, LDHA, ADPGK, PGAM1, HK2, HK1, ASL, AGXT, PDHB, FDFT1, GOT1, IDH3G, HK3, ENO2, IDH2, GCSH, IDH1, ENO3, PDHA2, CAT, RPIA, PDHA1, HADH, ENO1, SHMT1, PFKL, AK1, SUCLG1, FDPS, IDH3B, ACLY, PFKM, IDH3A, NME5, ALDH7A1, PYCR2, NME2, PKLR, MVK, PRPS2, CBS, PRPS1 |
| Carbon metabolism (p = 1.02674E-6) | 29 | ALDOA, ADPGK, GLUD1, PGAM1, HK2, HK1, AGXT, PDHB, GOT1, IDH3G, HK3, IDH2, ENO2, IDH1, ENO3, PDHA2, CAT, RPIA, PDHA1, ENO1, SHMT1, PFKL, SUCLG1, IDH3B, PFKM, IDH3A, PKLR, PRPS2, PRPS1 |
| Biosynthesis of amino acids (p = 5.6641E-6) | 21 | ALDOA, SHMT1, PFKL, PGAM1, IDH3B, PFKM, ASL, IDH3A, PYCR2, GOT1, IDH3G, PKLR, ENO2, IDH2, ENO3, IDH1, RPIA, PRPS2, CBS, ENO1, PRPS1 |
| Glycolysis/Gluconeogenesis (p = 6.9568E-6) | 20 | ALDOA, LDHB, LDHA, PFKL, ADPGK, HK2, PGAM1, HK1, PFKM, PDHB, ALDH3A1, G6PC, ALDH7A1, HK3, PKLR, ENO2, ENO3, PDHA2, PDHA1, ENO1 |
| Central carbon metabolism in cancer (p = 6.3733E-4) | 16 | PFKL, MET, HK2, PGAM1, RAF1, HK1, SIRT6, PFKM, PDHB, SLC16A3, SLC1A5, HK3, PDGFRB, PDHA2, MTOR, PDHA1 |
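
Pathway p-values such as those above are commonly obtained from an over-representation test. The sketch below uses a one-sided hypergeometric test, which is an assumption: this excerpt does not state which tool or test produced the reported values, and enrichment tools often apply their own variants and corrections. The function name and all counts are hypothetical.

```python
from math import comb


def enrichment_p_value(k, n, K, N):
    """P(X >= k): n genes drawn from a background of N, K of which are in the pathway."""
    upper = min(n, K)
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / comb(N, n)


# Hypothetical numbers: 19 of 1000 predicted proteins fall in a pathway
# that has 45 members in a background of 20000 genes.
p_value = enrichment_p_value(k=19, n=1000, K=45, N=20000)
```

A small p-value here means the predicted set contains more pathway members than random draws of the same size would be expected to contain.
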
The predicted human proteins that interact with the proteins of other viruses.
| Virus | Number of overlapping proteins | Database Name | Number of human proteins present in the database | Reference |
|---|---|---|---|---|
| Dengue | 174 | DenvInt | 480 | [ |
| HIV-1 | 1290 | HIV-1 Human Interaction Database | 4667 | [ |
| HCV | 144 | HCVpro | 467 | [ |
| Ebola | 16 | Zhou et al. | 60 | [ |
| Zika | 5 | ZikaBase | 24 | [ |
| H1N1 | 160 | Shapira et al. | 617 | [ |
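
The overlap counts above amount to set intersections between the predicted human proteins and each database's human proteins. A minimal sketch, with a hypothetical function name and toy protein sets rather than the actual database contents:

```python
def overlap_summary(predicted, databases):
    """Map each database to (overlap with predicted proteins, database size)."""
    predicted = set(predicted)
    return {
        name: (len(predicted & set(proteins)), len(set(proteins)))
        for name, proteins in databases.items()
    }


# Toy sets for illustration; the real databases are far larger.
predicted = {"TNF", "HSPD1", "CALR", "GSK3B"}
databases = {
    "DatabaseA": {"TNF", "CALR", "XYZ1"},
    "DatabaseB": {"GSK3B"},
}
overlaps = overlap_summary(predicted, databases)
```

In practice the protein identifiers would need to be normalized to one naming scheme before intersecting, since databases may use different symbols for the same protein.
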
Fig. 4. The diagrammatic representation of the predicted target human proteins that interact with the proteins of multiple viruses. Black circles represent human proteins. Square boxes represent different viruses. Interactions of human proteins with different viruses are represented as edges colored as per the color of the boxes of respective viruses.
List of drugs associated with the predicted target human proteins that interact with the proteins of at least 3 different viruses.
| Sl. No. | Drugs Name | Human Protein Name |
|---|---|---|
| 1. | Remicade, Etanercept, Adalimumab, Thalidomide, Inamrinone, Golimumab, Certolizumab Pegol, Chloroquine, Glucosamine, Clenbuterol | TNF |
| 2. | Atorvastatin, Cetrorelix | HSPD1 |
| 3. | Melatonin, Tretinoin, Gentamicin, Tenecteplase | CALR |
| 4. | Aspirin, Fluorouracil | HSPA5 |
| 5. | Amlexanox, Procaine | IKBKE |
| 6. | Thyroglobulin, Amikacin, Pembrolizumab | B2M |
| 7. | Carfilzomib, Bortezomib, Ixazomib Citrate | PSMB9 |
| 8. | Rifabutin | HSP90AA1 |
| 9. | Lovastatin, Zinc Sulfate, Doxorubicin, Prasterone, Progesterone, Octreotide, Epinephrine, Dactinomycin, Nandrolone Phenpropionate, Candicidin | PIK3CB |
| 10. | Lithium Citrate Hydrate, Lithium Carbonate, Fluoxetine | GSK3B |
| 11. | Albumin Human, Prednisone, Tretinoin, Ganciclovir, Triamcinolone, Irbesartan, Vitamin E, Lorazepam, Soybean Oil, Gonadotropin, Chorionic | APOE |
| 12. | Mesalamine, Aminosalicylic Acid, Sulfasalazine, Acetylcysteine, Ascorbate | CHUK |
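
The selection criterion behind the table above, predicted proteins that interact with at least 3 different viruses, can be expressed as a count over per-virus protein sets. The sketch below uses a hypothetical function name and toy data; the virus-to-protein assignments shown are not taken from the source.

```python
from collections import Counter


def multi_virus_targets(virus_to_proteins, predicted, min_viruses=3):
    """Return predicted proteins that appear in at least min_viruses virus sets."""
    predicted = set(predicted)
    counts = Counter()
    for proteins in virus_to_proteins.values():
        counts.update(set(proteins) & predicted)  # count each virus at most once
    return sorted(p for p, c in counts.items() if c >= min_viruses)


# Toy assignments for illustration only.
virus_to_proteins = {
    "VirusA": {"P1", "P2"},
    "VirusB": {"P1", "P3"},
    "VirusC": {"P1", "P2"},
}
targets = multi_virus_targets(virus_to_proteins, predicted={"P1", "P2", "P3"})
```
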