Heidy Elkhaligy, Christian A Balbin, Jessica L Gonzalez, Teresa Liberatore, Jessica Siltberg-Liberles.
Abstract
Most viruses have small genomes that encode proteins needed to perform essential enzymatic functions. Across virus families, primary enzyme functions are under functional constraint; however, secondary functions mediated by exposed protein surfaces that promote interactions with host proteins may be less constrained. Viruses often form transient interactions with host proteins through conformationally flexible interfaces. Exposed flexible amino acid residues are known to evolve rapidly, suggesting that secondary functions may generate diverse interaction potentials between viruses within the same viral family. One mechanism of interaction is viral mimicry through short linear motifs (SLiMs) that act as functional signatures in host proteins. Viral SLiMs display specific patterns of adjacent amino acids that resemble their host SLiMs and may occur by chance numerous times in viral proteins due to mutational and selective processes. Through mimicry of SLiMs in the host cell proteome, viruses can interfere with the protein interaction network of the host and utilize the host-cell machinery to their benefit. The overlap between rapidly evolving protein regions and the location of functionally critical SLiMs suggests that these motifs and their functional potential may be rapidly rewired, causing variation in pathogenicity, infectivity, and virulence of related viruses. This review provides an overview of known viral SLiMs with select examples of their role in the life cycle of a virus, and a discussion of the structural properties of experimentally validated SLiMs. Based on the viral SLiMs from the ELM database, it highlights that a large portion of known viral SLiMs are devoid of predicted intrinsic disorder.
Keywords: SLiMs; intrinsically disordered protein regions; short eukaryotic linear motifs; the ELM database; viral-host protein interaction
Year: 2021 PMID: 34960638 PMCID: PMC8703344 DOI: 10.3390/v13122369
Source DB: PubMed Journal: Viruses ISSN: 1999-4915 Impact factor: 5.048
Figure 1. Predicted structural features of 260 viral SLiMs from the ELM database. The percentage of viral motifs with a certain disorder content as inferred from IUPRED prediction using a cutoff of (a) 0.5 and (b) 0.4. (c) The percentage of viral motifs with a certain Mean IUPRED Disorder Score (MIDS). The percentage of viral motifs with a certain (d) secondary structure (coil) and (e) surface accessibility content as inferred from NetSurfP-2.0 prediction. The percentages shown are approximate, rounded to the nearest whole number for a, b, d, and e, and to the nearest tenth for c. See also Table S1.
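The two per-motif quantities named in Figure 1, disorder content at a cutoff and the Mean IUPRED Disorder Score, can be illustrated with a minimal Python sketch. It assumes that disorder content is the percentage of motif residues whose score is at or above the cutoff and that MIDS is the arithmetic mean of the per-residue scores; the caption does not spell out either definition, so both are assumptions. The function name `disorder_summary` and the scores are invented for illustration and are not IUPRED output.

```python
from statistics import mean


def disorder_summary(scores, cutoff=0.5):
    """Return (disorder content in percent, mean disorder score) for one motif.

    Assumption: a residue counts as disordered when its score >= cutoff.
    """
    if not scores:
        raise ValueError("scores must be non-empty")
    n_disordered = sum(1 for s in scores if s >= cutoff)
    content = 100.0 * n_disordered / len(scores)
    return content, mean(scores)


# Hypothetical per-residue scores for a four-residue motif (made-up values).
example_scores = [0.25, 0.45, 0.75, 0.55]

content_05, mids = disorder_summary(example_scores, cutoff=0.5)
content_04, _ = disorder_summary(example_scores, cutoff=0.4)
print(content_05, content_04, round(mids, 1))
```

Lowering the cutoff from 0.5 to 0.4 can only keep or raise the disorder content of a motif, which is why panels (a) and (b) are reported separately.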
Figure 2. The general lytic virus life cycle inside the cell. (1) The virion attaches to the cell surface receptors. (2) The virus penetrates the cell through endocytosis. (3) The viral genome is replicated and viral proteins are translated inside the cell. (4) New viruses are assembled inside the cell. (5) The cell lyses and releases the new viruses. Created with BioRender.com (accessed on 30 October 2021).
Figure 3. The furin cleavage site in the envelope glycoprotein from HIV. Sequences were identified with BLAST using the envelope protein (accession: NP_057856.1) from HIV-1 as query. Sequence names shown in red represent true positive instances from the ELM database [27]. The multiple sequence alignment (MSA) was built with MAFFT+L-INS-i [57] in Jalview [58]. The regular expression pattern R.[RK]R. from motif CLV_PCSK_FUR_1 in the ELM database [27] was identified using Find in Jalview, shown in black with white text. The region shown under Sequence shows the amino acids that correspond to the true positive motif from ENV_HIV1 plus one additional site on each side. The three additional heatmaps display the same region of the alignment colored by property. The heatmap for Disorder propensity displays disordered (magenta) or ordered (purple) residues based on IUPRED prediction with cutoff = 0.4 [35,36,59]. The heatmap for Surface accessibility displays surface exposed (magenta) and buried (white) residues, and the heatmap for Secondary structure displays coil (orange) and secondary structure (helix: blue, strand: magenta); both are based on NetSurfP-2.0 predictions.
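The kind of pattern search described in the Figure 3 caption can be approximated outside Jalview with Python's standard `re` module. The sketch below is not the authors' workflow: it assumes an ungapped, single-letter amino acid string, and the function name `find_motif` and the toy sequence are invented for illustration. Only the pattern R.[RK]R. is taken from the caption.

```python
import re

# Pattern quoted in the caption for motif CLV_PCSK_FUR_1.
FURIN_PATTERN = re.compile(r"R.[RK]R.")


def find_motif(pattern, sequence):
    """Return (start, end, matched_text) tuples with 1-based inclusive positions.

    Alignment gap characters ('-') are removed first, so positions refer to
    the ungapped sequence. Overlapping matches are not reported.
    """
    ungapped = sequence.replace("-", "")
    return [(m.start() + 1, m.end(), m.group()) for m in pattern.finditer(ungapped)]


# Invented toy sequence, not a real viral protein.
toy_sequence = "MAAGRVKRSLTPPAY"
hits = find_motif(FURIN_PATTERN, toy_sequence)
print(hits)  # [(5, 9, 'RVKRS')]
```

A match of this kind only shows that the pattern occurs; as the abstract notes, such patterns may occur by chance, so a hit is a candidate rather than a validated motif instance.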
Figure 4. The G3BP binding motif has been verified in the nsp3 protein from two alphaviruses, Chikungunya virus and Semliki Forest virus. Sequences were identified with BLAST using residues 1700–2000 from nsp3 (accession: Q5XXP4) from Chikungunya virus as query. Sequence names shown in red represent true positive instances from the ELM database [27]. The multiple sequence alignment was built with MAFFT+L-INS-i [57] in Jalview [58]. The regular expression pattern [FYLIMV].FG[DES]F from motif LIG_G3BP_FGDF_1 in the ELM database [27] was identified using Find in Jalview, shown in black with white text. The region shown under Sequence shows the amino acids that correspond to the true positive motifs from Chikungunya virus and Semliki Forest virus, the connecting amino acids, plus one additional site on each side. The MSA and heatmaps for Disorder, Surface, and Structure are colored as in Figure 3.
Figure 5. The pLxIS site in nsp1 from Simian rotavirus. Sequences were identified with BLAST using full-length nsp1 from Simian rotavirus (accession: AFY98633.1) as query. Sequence names shown in red represent true positive instances from the ELM database [27]. The multiple sequence alignment was built with MAFFT+L-INS-i [57] in Jalview [58]. The regular expression pattern [VILPF].{1,3}L.I(S) from motif LIG_IRF3_LxIS_1 in the ELM database was identified using Find in Jalview, shown in black with white text. The region shown under Sequence shows the amino acids that correspond to the true positive motif from Simian rotavirus plus one additional site on each side. The MSA and heatmaps for Disorder, Surface, and Structure are colored as in Figure 3.
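The pattern in the Figure 5 caption differs from a fixed-length pattern in two ways: `.{1,3}` allows a spacer of one to three residues, and `(S)` is parenthesized. Python's `re` module treats the parentheses as a capture group, which makes it easy to report the position of that serine separately from the full match. What the parentheses denote in the ELM database is not stated in the caption, so the sketch below makes no claim about it. The function name `find_lxis` and the toy sequence are invented for illustration.

```python
import re

# Pattern quoted in the caption for motif LIG_IRF3_LxIS_1.
LXIS_PATTERN = re.compile(r"[VILPF].{1,3}L.I(S)")


def find_lxis(sequence):
    """Return a list of dicts describing each match, using 1-based positions."""
    results = []
    for m in LXIS_PATTERN.finditer(sequence):
        results.append({
            "match": m.group(),                # full matched text
            "start": m.start() + 1,            # first residue of the match
            "end": m.end(),                    # last residue of the match
            "serine_position": m.start(1) + 1  # residue captured by (S)
        })
    return results


# Invented toy sequence, not a real rotavirus protein.
toy_sequence = "GGVDDLKISEE"
lxis_hits = find_lxis(toy_sequence)
print(lxis_hits)
```

For the toy sequence the spacer is two residues long (DD), so the full match is VDDLKIS and the captured serine is the last residue of the match.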
Figure 6. The PDZ domain binding motif in the E6 protein from HPV16 and HPV18. Sequences were identified with BLAST using protein E6 from HPV18 (accession: P06463.1) as query. Sequence names shown in red represent true positive instances from the ELM database [27]. The multiple sequence alignment (MSA) was built with MAFFT+L-INS-i [57] in Jalview [58]. The regular expression pattern …[ST].[ACVILF]$ from motif LIG_PDZ_Class_1 in the ELM database [27] was identified using Find in Jalview, shown in black with white text. The region shown under Sequence shows the amino acids that correspond to the true positive motif from HPV16 and HPV18 plus one additional site on each side. The MSA and heatmaps for Disorder, Surface, and Structure are colored as in Figure 3.
Figure 7. The PPxY motif in the matrix protein VP40 from Ebola virus. Sequences were identified with BLAST using full-length VP40 from Ebola virus (accession: Q05128) as query against the refseq_protein and nr databases. Sequence names shown in red represent true positive instances from the ELM database [27]. The multiple sequence alignment was built with MAFFT+L-INS-i [57] in Jalview [58]. The regular expression pattern PP.Y from motif LIG_WW_1 in the ELM database [27] was identified using Find in Jalview, shown in black with white text. The region shown under Sequence corresponds to the true positive motif from Zaire Ebola virus and Marburg marburg virus plus one additional site on each side. Note that the query protein (UniProt ID Q05128) is identical to protein NP_066245.1 used in the multiple sequence alignment.
Figure 8. Cellular context. Subcellular localization of SARS-CoV-2 proteins (circles) in human cells based on experimental data (thick border: multiple sources; dotted border: [127]; thin black border: [128]; white border: [129,130,131]) (a). Each protein is colored as in the SARS-CoV-2 proteome (b). Proteins that form complexes are colored similarly: nsp 3/4/6, nsp 7/8/12, nsp 10/14. SARS-CoV-2 proteins localize to the following organelles: lysosome (nsp2, orf3a, and orf7b), endosome (orf3a and orf6), plasma membrane (envelope (E), membrane (M), spike (S), and orf3a), Golgi apparatus (E, M, S, nsp5, nsp15, orf6, orf7a, and orf7b), endoplasmic reticulum (E, M, S, nsp6-10, nsp14, orf6, orf7b, orf8, and orf10), nucleolus (E, nsp1, nsp3, nsp5-7, nsp9-10, nsp12-16 and orf9a-9b), punctate cytoplasm (M, nsp1, nsp2, nsp5, nsp7-10, nsp12-16, orf3a, and orf6), and diffuse cytoplasm (E, M, nucleocapsid (N), S, nsp1-16, nsp10, nsp12-16, orf3a-3b, orf6, orf7a-7b, orf8, orf9a-9b, and orf10). Created with BioRender.com (accessed on 30 October 2021).