Abstract
A novel coronavirus (SARS-CoV-2) has devastated the globe as a pandemic that has killed millions of people. Widespread vaccination is still uncertain, so many scientific efforts have been directed toward discovering antiviral treatments. Many drugs are being investigated to inhibit the coronavirus main protease, 3CLpro, from cleaving its viral polyprotein, but few publications have addressed this protease's interactions with the host proteome or their probable contribution to virulence. Too few host protein cleavages have been experimentally verified to fully understand 3CLpro's global effects on relevant cellular pathways and tissues. Here, I set out to determine this protease's targets and corresponding potential drug targets. Using a neural network trained on cleavages from 392 coronavirus proteomes with a Matthews correlation coefficient of 0.985, I predict that a large proportion of the human proteome is vulnerable to 3CLpro, with 4898 out of approximately 20,000 human proteins containing at least one putative cleavage site. These cleavages are nonrandomly distributed: they are enriched in the epithelium along the respiratory tract, brain, testis, plasma, and immune tissues, and depleted in olfactory and gustatory receptors despite the prevalence of anosmia and ageusia in COVID-19 patients. Affected cellular pathways include cytoskeleton/motor/cell adhesion proteins, nuclear condensation and other epigenetics, host transcription and RNAi, ribosomal stoichiometry and nascent-chain detection and degradation, ubiquitination, pattern recognition receptors, coagulation, lipoproteins, redox, and apoptosis. This whole-proteome cleavage prediction demonstrates the importance of 3CLpro in expected and nontrivial pathways affecting virulence and leads me to propose more than a dozen potential therapeutic targets against coronaviruses; the approach should therefore be applied to all viral proteases and subsequently experimentally verified.
Keywords: 3CLpro; COVID-19; Coronavirus; Machine learning; Neural networks; Protease; Proteomics; SARS-CoV-2
Year: 2022 PMID: 35429835 PMCID: PMC8958254 DOI: 10.1016/j.compbiolchem.2022.107671
Source DB: PubMed Journal: Comput Biol Chem ISSN: 1476-9271 Impact factor: 3.737
Fig. 1 3CLpro cleavage site sequence logo plotted by WebLogo v2.8.2 (Crooks et al., 2004).
Fig. 2 (A) One-hot encoded t-distributed stochastic neighbor embedding (t-SNE) (van der Maaten and Hinton, 2008) vs cleavage number and (B) vs genus, demonstrating that variation within genomes is more important than variation between genomes. (C) Physicochemical encoded t-SNE vs cleavage number and (D) vs genus, showing similar clustering although with worse separation.
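The one-hot encoding named in the Fig. 2 caption can be sketched as follows. This is a minimal illustration and not the author's pipeline: the residue ordering in `AMINO_ACIDS`, the function name `one_hot_encode`, and the 8-residue example window are all assumptions made here.

```python
# Assumed ordering of the 20 standard amino acids (hypothetical choice).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
AA_INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}


def one_hot_encode(window):
    """Flatten a peptide window into a 0/1 vector of length 20 * len(window)."""
    vector = []
    for residue in window:
        row = [0] * len(AMINO_ACIDS)
        row[AA_INDEX[residue]] = 1  # raises KeyError for non-standard residues
        vector.extend(row)
    return vector


# Illustrative 8-residue window (an assumption, not taken from the record above).
encoded = one_hot_encode("AVLQSGFR")
print(len(encoded), sum(encoded))  # 160 8
```

Vectors of this form, one per candidate site, are the kind of input a dimensionality-reduction method such as t-SNE would embed.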
Fig. 3 (A) Information content vs residue position vs cleavage number and (B) vs genus, similarly demonstrating that cleavage variation within single genomes is more important than variation between genomes.
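For readers unfamiliar with per-position information content, the sketch below computes it for one alignment column as the maximum possible entropy minus the observed Shannon entropy. It assumes a 20-letter alphabet and omits the small-sample correction that logo tools commonly apply; the function names are hypothetical, and the record does not say exactly how the values in Fig. 3 were computed.

```python
from collections import Counter
from math import log2


def shannon_entropy(column):
    """Shannon entropy (bits) of the residues observed in one alignment column."""
    n = len(column)
    return -sum((c / n) * log2(c / n) for c in Counter(column).values())


def information_content(column, alphabet_size=20):
    """Information content in bits: log2(alphabet size) minus observed entropy.

    Assumption: no small-sample correction is applied.
    """
    return log2(alphabet_size) - shannon_entropy(column)


# A fully conserved column carries the maximum, log2(20) bits.
print(information_content(["Q", "Q", "Q", "Q"]))
# A column split evenly between two residues loses exactly one bit.
print(information_content(["S", "A", "S", "A"]))
```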
Fig. 4 Unscaled subgenera-averaged tanglegram of 3CLpro and respective cleavages based on BLOSUM62 substitution matrix similarity scores with and without default affine gap penalties (opening 10 and extension 0.2).
Fig. 5 Entropy correlation coefficients (also known as symmetric uncertainties) between positions within the improved sequence logo.
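The symmetric uncertainty between two positions is conventionally defined as U(X, Y) = 2 I(X; Y) / (H(X) + H(Y)), where H is Shannon entropy and I is mutual information. The sketch below applies that standard definition to two alignment columns; the function names and the toy columns are assumptions for illustration, not the code behind Fig. 5.

```python
from collections import Counter
from math import log2


def entropy(symbols):
    """Shannon entropy (bits) of a sequence of hashable symbols."""
    n = len(symbols)
    return -sum((c / n) * log2(c / n) for c in Counter(symbols).values())


def symmetric_uncertainty(col_x, col_y):
    """Return 2 * I(X;Y) / (H(X) + H(Y)) for two equally long columns."""
    h_x = entropy(col_x)
    h_y = entropy(col_y)
    h_xy = entropy(list(zip(col_x, col_y)))  # joint entropy H(X, Y)
    if h_x + h_y == 0:
        return 0.0  # both columns fully conserved; nothing to correlate
    mutual_information = h_x + h_y - h_xy
    return 2 * mutual_information / (h_x + h_y)


# Toy columns (hypothetical): perfectly coupled vs independent positions.
print(symmetric_uncertainty(list("AABB"), list("AABB")))  # 1.0
print(symmetric_uncertainty(list("AABB"), list("ABAB")))  # 0.0
```

The value ranges from 0 for independent positions to 1 for positions that fully determine each other, which makes it convenient for a position-by-position heat map.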
Fig. 6 Sequence bundle with charge-polarity-hydrophobicity encoding (Kultys et al., 2014).
Fig. 7 Random train/test split fraction vs MCC, demonstrating that performance quickly approaches a limit for all classifiers.
Leave-one-(sub)genus-out resampling vs number of samples and MCCs, showing that complete removal of distinct sequences of any lineage from the training set can reduce that lineage’s test accuracy independent of the number of positives and negatives or their imbalance.
| Genus | Subgenus | Positives | Negatives | % Positive | MCC |
|---|---|---|---|---|---|
| Alphaletovirus | All/MileCoV | 11 | 236 | 4.5% | 0.628 |
| AlphaCoV | All | 1241 | 9207 | 11.9% | 0.881 |
| | ColaCoV | 139 | 337 | 29.2% | 0.986 |
| | DecaCoV | 287 | 1108 | 20.6% | 0.990 |
| | DuvinaCoV | 246 | 905 | 21.4% | 0.966 |
| | LuchaCoV | 68 | 374 | 15.4% | 0.985 |
| | MinaCoV | 239 | 661 | 26.6% | 0.973 |
| | MinunaCoV | 243 | 772 | 23.9% | 0.985 |
| | MyotaCoV | 179 | 453 | 28.3% | 0.986 |
| | NyctaCoV | 267 | 871 | 23.5% | 0.990 |
| | PedaCoV | 360 | 1147 | 23.9% | 0.977 |
| | RhinaCoV | 216 | 487 | 30.7% | 0.986 |
| | SetraCoV | 217 | 852 | 20.3% | 0.964 |
| | SunaCoV | 114 | 583 | 16.4% | 0.837 |
| | TegaCoV | 365 | 1716 | 17.5% | 0.891 |
| BetaCoV | All | 1200 | 10,433 | 10.3% | 0.835 |
| | EmbeCoV | 422 | 2741 | 13.3% | 0.925 |
| | HibeCoV | 42 | 378 | 10.0% | 0.973 |
| | MerbeCoV | 330 | 3346 | 9.0% | 0.829 |
| | NobeCoV | 142 | 1488 | 8.7% | 0.938 |
| | SarbeCoV | 381 | 2600 | 12.8% | 0.865 |
| GammaCoV | All | 1761 | 5724 | 23.5% | 0.884 |
| | BrangaCoV | 163 | 321 | 33.7% | 0.985 |
| | CegaCoV | 45 | 337 | 11.8% | 0.943 |
| | IgaCoV | 1706 | 5157 | 24.9% | 0.892 |
| DeltaCoV | All | 111 | 2078 | 5.1% | 0.799 |
| | AndeCoV | 11 | 352 | 3.0% | 0.664 |
| | BuldeCoV | 89 | 1403 | 6.0% | 0.859 |
| | HerdeCoV | 11 | 369 | 2.9% | 0.683 |
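To make the last two columns of the table concrete, the sketch below computes the Matthews correlation coefficient from a binary confusion matrix using its standard definition, and the percent-positive figure from the positive and negative counts. The function names are hypothetical and the confusion-matrix counts in the usage lines are invented for illustration; only the positive/negative counts passed to `percent_positive` come from the table.

```python
from math import sqrt


def matthews_corrcoef(tp, tn, fp, fn):
    """MCC = (TP*TN - FP*FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN))."""
    denominator = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denominator == 0:
        return 0.0  # common convention when any marginal total is zero
    return (tp * tn - fp * fn) / denominator


def percent_positive(positives, negatives):
    """Share of positive samples, in percent."""
    return 100.0 * positives / (positives + negatives)


# Hypothetical confusion matrices: a perfect and a chance-level classifier.
print(matthews_corrcoef(tp=50, tn=50, fp=0, fn=0))    # 1.0
print(matthews_corrcoef(tp=25, tn=25, fp=25, fn=25))  # 0.0

# Counts from the Alphaletovirus row of the table.
print(round(percent_positive(11, 236), 1))  # 4.5
```

Because MCC uses all four cells of the confusion matrix, it stays informative for imbalanced lineages such as those with only a few percent positives, where plain accuracy would look high even for a classifier that predicts no cleavages at all.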
Summary of noteworthy cleavages and possible consequences.
| Pathway | Cleaved proteins | Uncleaved proteins | Possible consequences |
|---|---|---|---|
| Neurodegeneration | APP, tau, eIF4G1, DNAJC13, huntingtin, FUS, ataxin-1 | | Neuropsychiatric symptoms |
| Olfaction/gustation | Olfactory receptors, some adenylate cyclases and PDEs | GNAL, GNAS, some adenylate cyclases and PDEs | Anosmia and ageusia via neural cAMP signaling and programmed cell death |
| Cytoskeleton | Microfilaments, intermediate filaments, microtubules, spectrin | | Altered vesicle and virus trafficking, DMV formation |
| Motor proteins | Myosin, kinesin, dynein | | Altered vesicle and virus trafficking |
| Cell adhesion | Integrins, immunoglobulins, cadherins, selectins | | Disrupted barriers and inflammation |
| Ras superfamily | Rho, Rab, Ran, Arf | Ras | Altered vesicle and virus trafficking, disrupted barriers |
| Cilia | NPHP1/2/4/5/8, BBS1/4/8/9/12, ALMS1, CC2D2A, RP1, RPGRIP1, LCA5, PKD1/2, PKHD1, DNAAF2, IFT80, DNAH5/11, DNAI2, and RSPH6A | | Ciliary dyskinesia, reduced mucociliary escalator effectiveness, and anosmia |
| Coatomers and adaptors | COPII (SEC24A/24B/31A/31B), retromer component VSP13B, synergin gamma, GGA1, LYST, and AP1/2/3/5 | COPI, clathrin, caveolae, AP4 | Altered vesicle and virus trafficking, DMV formation |
| Nucleus | NPC subunits, importins, exportins, lamins, chromatin remodeling proteins (HATs, HDACs, SMC proteins, separase, topoisomerase III alpha), some DNA methyltransferases and demethylases | CTCF, all other topoisomerases, some DNA methyltransferases and demethylases | Altered chromatin condensation |
| Ribosome | A few ribosomal proteins (RPL4/10 and RPS3A/19), nascent-chain detection and degradation (ZNF598, NEMF, and LTN1) | All other ribosomal proteins | Increased frameshifting |
| SRPs | SRP68/72 kDa | SRP9/14/19/54 kDa | Increased envelope protein membrane insertion and increased frameshifting |
| RNAi | DICER1, AGO2, PIWIL1/3 | | Disrupted antiviral RNAi |
| HERVs | Syncytin-1/2, PEG10, HERVK-5/6/7/8/9/10/18/19/21/24/25/113, HERVH-2q24.1/3, HHLA1, HERVFC1-1, HERVS71-1 | | Inflammation, inhibited RTs and/or integrases which may normally produce antiviral RNAi |
| Vault | MVP, TEP1, PARP4, TERT | | Unknown |
| PRRs | TLR2/3/6/8, CLEC4G/H1/H2/4K/10A/12A/13C/13E/14A/16A, KLRC3/C4/F1, ACG1, collectin-12, neurocan core protein, FREM1, layilin, PKD1, E-selectin, and thrombomodulin, NLRC2/3/4/5 and NLRP1/2/3/6/12/14 | | Disrupted innate immune responses |
| Downstream of PRRs | RIP1/2, NF-κB p65 and p100 subunits, CFLAR, TRIF, IRF2/9, DAXX, PI3K/AKT pathway (PIK3CB/G/D, PIK3R2/5/6, PIK3C2A, n/i/eNOS, TSC1/2), mTOR pathway (mTOR itself, SREBP1, RICTOR, PRR5L, ULK1/2, RBCC1, lipin-3, GRB10, FOXO1/3), MAPK pathway (MAP4K2/4/5, MAP3K1/4/5/8/9/10/12/15/16/17/18/19/21, KSR2, MAPK7/13/15), TFs (c-Jun, ATF2/6, CREB1/3/BP, SP1, OCT1/2, HSF2/2BP/4/5/X1/X2, Pol I initiator UBTF, Pol II initiators TFIID, multiple subunits of SL1 (TBP, TBPL2, TAF4/4B/6), and MED1/12/12L/13/15/17/22/23/28, and Pol III initiators TFIIIC and SNAPC1/5) | | Disrupted innate immune responses |
| IFNs and receptors | | All IFNs and their receptors | Uncleaved due to redundancy |
| Downstream of IFNs | STAT1/2/4/5B and ISGs (GBP1, OAS1, PML, mitoferrin-1/2, TRIM5) | | Disrupted innate immune responses |
| Apoptosis | Pro-apoptotic (CASP2/4/5/14), anti-apoptotic (Bcl-2, BIRC1/2/3/6) | | Unknown |
| Lipid transport and adipokines | APOA5/B/C4/L1/(a), CETP, MTTP, and LRP2/6, leptin, leptin receptor, IL-6 | SR-B1 | Correlations with dyslipidemia and cardiovascular disease |
| Ubiquitin-proteasome system | E3s (NEDD4/4L, SMURF1/2, WWP1, ITCH, HECW1), cullins, proteasomal subunits (PSMD1/3/4/5/8/11/14) | Ubiquitin | PLpro cleavages required for full understanding |
| Helicases | SKI and NEXT subunits (SKIV2L, SKIV2L2, TTC37, PAPD5, ZCCHC7, RBM7), exosome subunits (DIS3, DIS3L1), DHX36 | DDX1/5 | Enhanced viral RNA stability, yet both pro- and anti-viral helicases are cleaved |
| Coagulation | Coagulation factors II, III, VIII, XII, XIII, plasmin(ogen), VWF, plasma kallikrein, kininogen-1, and fibronectin, PAI-2, megsin, A1AT, angiotensinogen, PZI, CBG, LEI, and HSP47 | | Unknown |
| Antiproteases | A2M and PZP near their bait regions | Other small antiproteases | Antiviral activity |
| Complement | C1/3/4/5 | | Unknown, but if complement is activated, NETosis and hypercoagulability |
| Redox-active centers | DUOX1/2, NOX5, XO | Antioxidants, selenocysteine synthesis | Inflammation, ROS |