Shalini Gour, Jay Kant Yadav.
Abstract
The emergence of the novel coronavirus SARS-CoV-2, which causes coronavirus disease 2019 (COVID-19), poses a serious threat to global public health. Infection of the host cell by SARS-CoV-2 is characterized by direct translation of the positive-sense single-stranded (+ssRNA) genome into a large polyprotein, polyprotein polymerase 1ab (pp1ab), which acts as the precursor of a number of nonstructural proteins that play vital roles in replication of the viral genome and in the biosynthesis of new virus particles. Maintenance of viral protein homeostasis is essential for the continuation of the viral life cycle in the host cell. To test whether the protein homeostasis of SARS-CoV-2 can be disrupted by inducing specific protein aggregation, we examined whether the viral proteome contains aggregation-prone regions (APRs) that could be exploited to induce toxic aggregation specifically in viral proteins without affecting the host cell. This analysis identified more than 70 potential APRs in the SARS-CoV-2 proteome. The APRs range from 5 to 25 amino acid residues in length, and nearly 70% of them are relatively short, 5–10 residues. The largest number of APRs (> 50) was found in pp1ab. The structural proteins, namely the spike (S), nucleoprotein (N), membrane (M) and envelope (E) proteins, also contain APRs in their primary structures, which together constitute 30% of the total APRs identified. Our findings may open new opportunities to design specific peptide-based anti-SARS-CoV-2 therapeutic molecules against COVID-19.
Keywords: Aggregation prone regions; COVID-19; Protein aggregations; SARS-CoV-2
Year: 2021 PMID: 33613009 PMCID: PMC7882052 DOI: 10.1007/s42485-021-00057-y
Source DB: PubMed Journal: J Proteins Proteom ISSN: 0975-8151
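The summary counts quoted in the abstract can be recomputed from the "Length of APRs" column of the table that follows. The Python sketch below is illustrative only: the dictionaries are a hand transcription of that column as it appears in this record (Nsp1 and Nsp13 list no APRs, Nsp11 has no row, and the Aβ(1–42) control is left out), and the `summarize` function and its 5–10 residue bin are hypothetical choices, not the authors' code.

```python
# APR lengths transcribed from the "Length of APRs" column of this record's table.
# Aβ(1–42) is a reference peptide rather than a viral protein, so it is excluded.
PP1AB = {
    "Nsp2": [6, 8, 6, 9, 13, 8],
    "Nsp3": [6, 6, 5, 7, 8, 7, 16, 17, 10, 11, 6, 7],
    "Nsp4": [15, 7, 5, 13, 17, 13, 11, 5, 6],
    "Nsp5": [14],
    "Nsp6": [14, 14, 6, 11, 14, 11, 8, 9, 6, 16],
    "Nsp7": [7, 9],
    "Nsp8": [5, 6],
    "Nsp9": [6],
    "Nsp10": [8],
    "Nsp12": [5, 5, 5],
    "Nsp14": [6, 5, 6],
    "Nsp15": [6, 5, 6],
    "Nsp16": [9, 8, 5],
}
STRUCTURAL = {
    "Spike": [6, 6, 6, 4, 6, 18],
    "E": [18],
    "M": [17, 7, 12, 18, 8],
    "N": [6, 5, 6],
}


def summarize(pp1ab, structural, lo=5, hi=10):
    """Count APRs per protein group and how many have a length in [lo, hi]."""
    pp = [n for lengths in pp1ab.values() for n in lengths]
    st = [n for lengths in structural.values() for n in lengths]
    everything = pp + st
    short = sum(lo <= n <= hi for n in everything)
    return {
        "total": len(everything),
        "pp1ab": len(pp),
        "structural": len(st),
        "short": short,
        "short_fraction": short / len(everything),
    }


if __name__ == "__main__":
    for key, value in summarize(PP1AB, STRUCTURAL).items():
        print(key, value)
```

For the rows as listed in this record, the tally gives 71 APRs, 56 of them in pp1ab and 15 in the four structural proteins, with 49 (about 69%) in the 5–10 residue range.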
Location of newly identified aggregation prone regions in different proteins of SARS-CoV-2
| Protein | Amino acid sequence | Positions | Residues | Amino acid sequences of APRs | Length of APRs |
|---|---|---|---|---|---|
| Polyprotein polymerase 1ab | |||||
| Nsp1 | MESLVPGFNEKTHVQLSLPVLQVRDVLVRGFGDSVEEVLSEARQHLKDGTCGLVEVEKGVLPQLEQPYVFIKRSDARTAPHGHVMVELVAELEGIQYGRSGETLGVLVPHVGEIPVAYRKVLLRKNGNKGAGGHSYGADLKSFDLGDELGTDPYEDFQENWNTKHSSGVTRELMRELNGG | 1–180 | 180 | Nil | |
| Nsp2 | AVTRYVDNNFCGPDGYPLDCIKDFLARAGKSMCTLSEQLDYIESKRGVYCCRDHEHEIAWFTERSDKSYEHQTPFEIKSAKKFDTFKGECPKFVFPLNSKVKVIQPRVEKKKTEGFMGRIRSVYPVASPQECNNMHLSTLMKCNHCDEVSWQTCDFLKATCEHCGTENLVIEGPTTCGYLPTNAVVKMPCPACQDPEIGPEHSVADYHNHSNIETRLRKGGRTRCFGG | 181–818 | 638 | 409CVFAYV415 473VAIILASF480 565AAVTIL570 595VIIMAYVTG603 645AWEILKFLITGVF657 675VKCFIDVV682 | 6 8 6 9 13 8 |
| Nsp3 | APTKVTFGDDTVIEVQGYKSVNITFELDERIDKVLNEKCSAYTVELGTEVNEFACVVADAVIKTLQPVSELLTPLGIDLDEWSMATYYLFDESGEFKLASHMYCSFYPPDEDEEEGDCEEEEFEPSTQYEYGTEDDYQGKPLEFGATSAALQPEEEQEEDWLDDDSQQTVGQQDGSEDNQTTTIQTIVEVQPQLEMELTPVVQTIEVNSFSGYLKLTDNVYIKNADIVEEAKKVKPTVVVNAANVYLKHGGGVAGALNKATNNAMQVESDDYIATNGPLKVGGSCVLSGHNLAKHCLHVVGPNVNKGEDIQLLKSAYENFNQHEVLLAPLLSAGIFGADPIHSLRVCVDTVRTN | 819–2763 | 1945 | 1173VYLAVF1178 1295VLTAVV1300 1570VFTTV1574 1676LATALLT1682 1710FCALILAY1717 2171YFFTLLL2177 2229IIIWFLLLSVCLGSLI2244 2324VAEWFLAYILFTRFFYV2340 2363WLMWLIINLV2372 2384YIFFASFYYVW2394 2538INVIVF2543 2709IALIWNV2715 | 6 6 5 7 8 7 16 17 10 11 6 7 |
| Nsp4 | KIVNNWLKQLIK | 2764–3263 | 500 | 2776VTLVFLFVAAIFYLI2790 2853LIAAVIT2859 2975VVTTF2979 3052IVAIVVTCLAYYF3064 3077VAFNTLLFLMSFTVLCL3094 3104VYSVIYLYLTFYL3116 3138FWITIAYIICI3148 3153FYWFF3157 3180LCTFLL3185 | 15 7 5 13 17 13 11 5 6 |
| Nsp5 | SGFRKMAFPSGKVEGCMVQVTCGTTTLNGLWLDDVVYCPRHVICTSEDMLNPNYEDLLIRKSNHNFLVQAGNVQLRVIGHSMQNCVLKLKVDTANPKTPKYKFVRIQPGQTFSVLACYNGSPSGVYQCAMRPNFTIKGSFLNGSCGSVGFNIDYDCVSFCYMHHMELPTGVHAGTDLEGNFYGPFVDRQTAQAAGTDTT | 3264–3569 | 306 | 3463ITVNVLAWLYAAVI3476 | 14 |
| Nsp6 | SAVKRTIKGTHH | 3570–3859 | 290 | 3582WLLLTILTSLLVLV3595 3616MGIIAMSAFAMMFV3629 3635FLCLFL3640 3644LATVAYFNMVY3654 3683VMYASAVVLLILMT3696 3709WTLMNVLTLVY3719 3733MWALIISV3740 3747VVTTVMFLA3755 3758IVFMCV3763 3779IMLVYCFLGYFCTCYF3794 | 14 14 6 11 14 11 8 9 6 16 |
| Nsp7 | SKMSDVKCTS | 3860–3942 | 83 | 3870VVLLSVL3876 3911MVSLLSVLL3919 | 7 9 |
| Nsp8 | AIASEFSSLPSYAAFATAQEAYEQAVANGDSEVVLKKLKKSLNVAKSEFDRDAAMQRKLEKMADQAMTQMYKQARSEDKRAKVTSAMQTMLFTMLRKLDNDALNNIINNARDGCVPLNIIPLTTAAK | 3943–4140 | 198 | 128LMVVI132 184LIVTAL189 | 5 6 |
| Nsp9 | NNELSPVALRQMSCAAGTTQTACTDDNALAYYNTTKGGR | 4141–4253 | 113 | 4180FVLALL4185 | 6 |
| Nsp10 | AGNATEVPANST | 4254–4392 | 139 | 4266VLSFCAFA4273 | 8 |
| Nsp12 | SADAQSFLNRVCGVSAARLTPCGTGTSTDVVYRAFDIYNDKVAGFAKFLKTNCCRFQEKDEDDNLIDSYFVVKRHTFSNYQHEETIYNLLKDCPAVAKHDFFKFRIDGDMVPHISRQRLTKYTMADLVYALRHFDEGNCDTLKEILVTYNCCDDDYFNKKDWYDFVENPDILRVYANLGERVRQALLKTVQFCDAMRNAG | 4393–5324 | 932 | 4593IVGVL4597 4763LLVYA4767 4861LLFVV4865 | 5 5 5 |
| Nsp13 | AVGACVLCNSQTSLRCGACIRRPFLCCKCCYDHVISTSHKLVLSVNPYVCNAPGCDVTDVTQLYLGGMSYYCKSHKPPISFPLCANGQVFGLYKNTCVGSDNVTDFNAIATCDWTNAGDYILANTCTERLKLFAAETLKATEETFKLSYGIATVREVLSDRELHLSWEVGKPRPPLNRNYVFTGYRVTKNSKVQIGEYTFEKGDYGDAVVYRGTTTYKLNVGDYFVLTSHTVMPLSAPTLVPQEHYVRITGLYPTLNISDEFSSNVANYQKVGMQKYSTLQGPPGTGKSHFAIGLALYYPSARIVYTACSHAAVDALCEKALKYLPIDKCSRIIPARARVECFDKFKVNSTLEQYVFCTVNALPETTADIVVFDEISMATNYDLSVVNARLRAKHYVYIGDPAQLPAPRTLLTKGTLEPEYFNSVCRLMKTIGPDMFLGTCRRCPAEIVDTVSALVYDNKLKAHKDKSAQCFKMFYKGVITHDVSSAINRPQIGVVREFLTRNPAWRKAVFISPYNSQNAVASKILGLPTQTVDSSQGSEYDYVIFTQTTETAHSCNVNRFNVAITRAKVGILCIMSDRDLYDKLQFTSLEIPRRNVATLQ | 5325–5925 | 601 | ||
| Nsp14 | AENVTGLFKDCSKVITGLHPTQAPTHLSVDTKFKTEGLCVDIPGIPKDMTYRRLISMMGFKMNYQVNGYPNMFITREEAIRHVRAWIGFDVEGCHATREAVGTNLPLQLGFSTGVNLVAVPTGYVDTPNNTDFSRVSAKPPPGDQFKHLIPLMYKGLPWNVVRIKIVQMLSDTLKNLSDR | 5926–6452 | 527 | 6106VVFVLW6111 6306VCLFW6310 6431FSLWVY6436 | 6 5 6 |
| Nsp15 | SLEN | 6453–6798 | 346 | 6457VAFNVV6462 6571LTVFF6575 6779ISFMLW6784 | 6 5 6 |
| Nsp16 | SSQAWQPGVAMPNLYKMQRMLLEKCDLQNYGDSATLPKGIMMNVAKYTQLCQYLNTLTLAVPYNMRVIHFGAGSDKGVAPGTAVLRQWLPTGTLLVDSDLNDFVSDADSTLIGDCATVHTANKWDLIISDMYDPKTKNVTKENDSKEG | 6799–7096 | 298 | 6947FFTYICGFI6955 6985FAWWTAFV6992 7069ILSLL7073 | 9 8 5 |
| Spike protein | M | 1–1273 | 1273 | 2FVFLVL7 140FLGVYY145 510VVVLSF515 1060VVFL1063 1128VVIGIV1133 1215YIWLGFIAGLIAIVMVTI1232 | 6 6 6 4 6 18 |
| E-protein | MYSFVSEETGTLIVNS | 1–75 | 75 | 17VLLFLAFVVFLLVTLAIL34 | 18 |
| M-protein | MADSNGTITVEELKKLLEQWN | 1–222 | 222 | 22LVIGFLFLTWICLLQFA38 51LIFLWLL57 60VTLACFVLAAVY71 80IAIAMACLVGLMWLSYFI97 138LVIGAVIL145 | 17 7 12 18 8 |
| N-protein | MSDNGPQNQRNAPRITFGGPSDSTGSNQNGERSGARSKQRRPQGLPNNTASWFTALTQHGKEDLKFPRGQGVPINTNSSPDDQIGYYRRATRRIRGGDGKMKDLSPR | 1–419 | 419 | 108WYFYYL113 129GIIWV133 219LALLLL224 | 6 5 6 |
| Aβ (1–42) peptide | DAEFRHDSGYEVHHQK | 1–42 | 42 | 17LVFFA21 30AIIGLMVGGVVI41 | 5 12 |
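Each APR in the table is written as first position, one-letter sequence, last position (for example 2FVFLVL7 in the spike protein), and each row gives its Positions range alongside a Residues count. A minimal Python sketch for checking that these numbers agree, assuming 1-based inclusive numbering; `parse_apr`, `check_apr` and `check_row` are hypothetical helpers, not part of the paper.

```python
import re

# An APR entry such as "409CVFAYV415" is read as
# <first position><one-letter peptide><last position>, 1-based and inclusive.
APR_PATTERN = re.compile(r"(\d+)([A-Z]+)(\d+)")


def parse_apr(entry):
    """Split an APR entry into (start, peptide, end)."""
    match = APR_PATTERN.fullmatch(entry.strip())
    if match is None:
        raise ValueError(f"not an APR entry: {entry!r}")
    start, peptide, end = match.groups()
    return int(start), peptide, int(end)


def span_length(start, end):
    """Number of residues in an inclusive 1-based range."""
    return end - start + 1


def check_apr(entry):
    """True when the numbered span matches the number of residues listed."""
    start, peptide, end = parse_apr(entry)
    return span_length(start, end) == len(peptide)


def check_row(positions, residues):
    """Check a Positions cell such as '181–818' (en dash or hyphen) against its Residues count."""
    start, end = (int(p) for p in re.split(r"[–-]", positions))
    return span_length(start, end) == residues


print(check_apr("2FVFLVL7"))         # → True
print(check_apr("409CVFAYV415"))     # → False
print(check_row("819–2763", 1945))   # → True
```

Applied to every APR entry in the table, the span check passes except for 409CVFAYV415 (Nsp2; six residues over a seven-position span) and 3077VAFNTLLFLMSFTVLCL3094 (Nsp4; 17 residues over 18 positions). It cannot detect a change of numbering convention: the Nsp8 entries (128LMVVI132, 184LIVTAL189) appear to use positions within Nsp8 rather than the pp1ab numbering of the other Nsp rows, since Nsp8 occupies positions 3943–4140.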
Fig. 1 Identification of aggregation prone regions (APRs) in the major proteins of SARS-CoV-2. The aggregation scores and propensities of the predicted APRs were found to be equivalent to those of the Aβ peptide, which serves as a classical example of β-structured aggregates.
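Fig. 1 compares per-region aggregation scores with those of the Aβ peptide. This record does not say which predictors produced those scores, so the sketch below is not the authors' method: it swaps in a simple Kyte–Doolittle hydropathy sliding window to show, under stated assumptions, how a per-residue scale becomes a list of candidate regions. The window of 5, threshold of 2.2 and minimum length of 5 are arbitrary illustrative values, and `hydrophobic_regions` is a hypothetical name. The test input is the full Aβ(1–42) sequence; the table above lists only its first 16 residues.

```python
# Kyte–Doolittle hydropathy values (Kyte & Doolittle, 1982).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydrophobic_regions(seq, window=5, threshold=2.2, min_len=5):
    """Return (start, end, peptide) tuples, 1-based and inclusive, for stretches
    covered by windows whose mean hydropathy is at least `threshold`."""
    n = len(seq)
    flagged = [False] * n
    for i in range(n - window + 1):
        mean = sum(KYTE_DOOLITTLE[aa] for aa in seq[i:i + window]) / window
        if mean >= threshold:
            for j in range(i, i + window):
                flagged[j] = True
    # Merge consecutive flagged residues into regions and drop short ones.
    regions = []
    i = 0
    while i < n:
        if not flagged[i]:
            i += 1
            continue
        j = i
        while j < n and flagged[j]:
            j += 1
        if j - i >= min_len:
            regions.append((i + 1, j, seq[i:j]))
        i = j
    return regions


ABETA42 = "DAEFRHDSGYEVHHQKLVFFAEDVGSNKGAIIGLMVGGVVIA"
print(hydrophobic_regions(ABETA42))
# → [(17, 21, 'LVFFA'), (30, 42, 'AIIGLMVGGVVIA')]
```

On Aβ(1–42) this crude scan recovers stretches overlapping both APRs listed in the table (17–21 and 30–41); the second region runs to residue 42 only because of the window and threshold chosen. Dedicated aggregation predictors use more specific scales and structural terms than hydropathy alone.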
Fig. 2 Consensus among different methods for the prediction of aggregation prone regions in the different proteins of SARS-CoV-2.
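Fig. 2 reports consensus among several prediction methods, but this record does not name the methods or the consensus rule. One common way to combine predictors is a per-residue majority vote; the Python sketch below assumes that rule, and `consensus_regions`, `min_votes`, `min_len` and the method names in the example are all hypothetical.

```python
def consensus_regions(length, predictions, min_votes=2, min_len=5):
    """Combine APR predictions from several methods by per-residue voting.

    length:      protein length in residues
    predictions: {method_name: [(start, end), ...]}, 1-based inclusive ranges
    Returns 1-based inclusive (start, end) ranges where at least `min_votes`
    methods agree, keeping only stretches of `min_len` residues or more.
    """
    votes = [0] * (length + 1)  # index 0 unused so positions stay 1-based
    for regions in predictions.values():
        covered = set()  # a method votes at most once per residue
        for start, end in regions:
            covered.update(range(start, end + 1))
        for pos in covered:
            votes[pos] += 1
    result = []
    start = None
    # Iterate one past the end so a run reaching the last residue is closed.
    for pos in range(1, length + 2):
        agreed = pos <= length and votes[pos] >= min_votes
        if agreed and start is None:
            start = pos
        elif not agreed and start is not None:
            if pos - start >= min_len:
                result.append((start, pos - 1))
            start = None
    return result


# Hypothetical outputs from three unnamed predictors on a 50-residue protein.
example = {
    "method_a": [(10, 20), (30, 34)],
    "method_b": [(12, 22)],
    "method_c": [(15, 18), (31, 40)],
}
print(consensus_regions(50, example))  # → [(12, 20)]
```

Raising `min_votes` trades coverage for agreement: with all three methods required, only residues 15–18 survive in this example, and lowering `min_len` to 4 would also keep the two-method stretch 31–34.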
Fig. 3 Schematic representation of APR peptide-based inhibition of viral replication. The events in region 1 represent the usual cycle of infection: release of the viral +ssRNA and its direct translation to form pp1ab, which is subsequently processed into all the nonstructural proteins (NSPs). The NSPs are used in amplification of the viral genomic +ssRNA and in the formation of structural and other accessory proteins. Finally, the genomic +ssRNA assembles with the structural proteins to form new viral particles. Region 2 (left side) depicts the events leading to APR peptide-based targeting of the proteins formed from ORF1a/ORF1ab (pp1ab). Added APR peptides would interfere with the folding of viral proteins and direct them to proteasomal degradation in the host cell. Depletion of essential viral proteins would lead to a complete halt of viral replication and of the formation of new viral particles.