Resources to Discover and Use Short Linear Motifs in Viral Proteins.

Peter Hraber¹, Paul E O'Maille², Andrew Silberfarb³, Katie Davis-Anderson⁴, Nicholas Generous⁵, Benjamin H McMahon⁶, Jeanne M Fair⁴.

Abstract

Viral proteins evade host immune function by molecular mimicry, often achieved by short linear motifs (SLiMs) of three to ten consecutive amino acids (AAs). Motif mimicry tolerates mutations, evolves quickly to modify interactions with the host, and enables modular interactions with protein complexes. Host cells cannot easily coordinate changes to conserved motif recognition and binding interfaces under selective pressure to maintain critical signaling pathways. SLiMs offer potential for use in synthetic biology, such as better immunogens and therapies, but may also present biosecurity challenges. We survey viral uses of SLiMs to mimic host proteins, and information resources available for motif discovery. As the number of examples continues to grow, knowledge management tools are essential to help organize and compare new findings.

Keywords: cell signaling; gene ontology; host–pathogen interactions; immune modulation; molecular mimicry; short linear motifs

Substances：
Viral Proteins

Year: 2019 PMID： 31427097 PMCID： PMC7114124 DOI： 10.1016/j.tibtech.2019.07.004

Source DB: PubMed Journal: Trends Biotechnol ISSN： 0167-7799 Impact factor: 19.536

How Viruses Do More with Less

Viruses exploit host cellular processes to replicate, and have developed myriad ways to subvert host immune defenses. Molecular mimicry (see Glossary) is a common and effective strategy, enabling a pathogen to usurp host protein function by resemblance 1, 2. Molecular mimicry varies over a continuum, from one extreme that includes sequence and structural similarity (i.e., orthologs) of entire proteins, to another extreme of chemical similarity at only a few localized sites, as is the case for short linear motifs (SLiMs). The growing body of literature on SLiMs indicates that some important virus–host interactions can be attributed to a few well-chosen AAs 3, 4, 5, 6, 7. Rather than devote entire proteins to one function, SLiMs enable multifunctional viral proteins. Interactions between globular virus and host proteins have picomolar affinities, while SLiMs have micromolar binding affinities with globular host proteins [8]. Moderate binding affinity of SLiMs facilitates disruption of signaling interactions, rather than competing for stable formation of persistent protein complexes. Synthetic biology practitioners can benefit from an introduction to how SLiMs enable viral interference with host cell functions and computational resources available for SLiM analysis. Viral SLiMs are potentially useful in synthetic biology, to provide a toolkit for new functions, for example, to modulate immune responses or to complement and interact with newly developed adjuvants in a synergistic manner [9]. Research efforts to develop broad-spectrum antiviral compounds or design broadly cross-protective vaccine immunogens benefit directly from knowledge of gene products, protein functions, and motifs involved with viral immune interference. N-linked glycosylation of the PNG sequon is a well-known example used by viral glycoproteins as camouflage against immune recognition 10, 11. 
The distribution of N-linked glycosylation sites has recently been recognized as essential for the design of immunogens to induce broadly cross-reactive immune protection against such challenging viruses as HIV-1 12, 13. Motifs associated with cellular trafficking (localization, transport, secretion, and sequestration) are readily edited to modify where expression products go, and change interaction profiles with other proteins [14]. In addition to motifs that stabilize the structure of immunogens, such as trimerization (‘foldon’) 15, 16, 17, 18 and dimerization 19, 20 domains, motifs that interact with cellular processes for innate antiviral pathways could be used to enhance immunogenicity. While SLiMs in eukaryotic proteins have been discussed extensively, SLiM involvement in viral immunomodulation remains less thoroughly explored, and suggests new opportunities for use in engineered biotechnology applications. The ability to transfer genetic components across species, or to introduce such components de novo, enables new functions. While such functions are generally well intended, some risk also exists for harmful effects. Subject to the technical advances of synthetic biology, such effects are not necessarily a taxonomically relevant property. It may be necessary to evaluate risks of new functions by other means than taxonomy or even protein functional evaluations. Instead, new methods are needed that assess functions at a finer resolution than the gene, whether by computational analysis or functional phenotypic assessments 21, 22, 23. SLiM analysis might help with such assessments.

Viral Immunomodulatory Proteins

Viral proteins can modulate immunity in several ways, which include: shutdown of host macromolecular synthesis, inhibiting antigen production or apoptosis, and interference with such processes as antigen presentation by MHC, natural killer (NK) cell function, antiviral cytokines, or interferon responses. Each of these processes involves coordination among multiple components in host cells. Viral interference with these functions is frequently attributed to entire proteins, but in some important cases has been localized to SLiMs.

Signaling Interference

Because of their compact size, SLiMs are modular, rapidly evolvable sequence elements. Different instances of a given SLiM can vary in sequence while maintaining the overall functional profile, that is, the regular expression for the sequence motif, where a few positions are invariant while other positions tolerate numerous substitutions. Thus, partial sequence matches are sufficient for transient binding interactions with target domains, for example, signal transduction proteins. This observation led to the proposal of ex nihilo SLiM evolution – the evolution of a novel SLiM ‘from nothing’ – the appearance of a new functional module from a previously nonfunctional region of protein sequence [24]. Because hosts’ interaction networks are often conserved, SLiMs represent a significant vulnerability for opportunistic exploitation. These properties enable pathogens to acquire host-like SLiMs rapidly through ex nihilo convergent evolution, to rewire host interaction networks, and to acquire tropism and virulence traits needed for successful adaptation and propagation [25]. Over 200 motifs are known, with 2,400 validated instances, and many more motifs may await discovery 3, 26. Focus on viral motifs may reveal practical utility, to broaden the repertoire of tools available to reprogram molecular function in synthetic biology. One example of how viral proteins use SLiMs to subvert host cell function is illustrated by Epstein–Barr virus (EBV), which persists in resting memory B cells of nearly all (>95%) individuals throughout their adult lives [5]. Latent membrane protein 1 (LMP1) is central to EBV persistence. The cytoplasmic tail of this membrane-bound protein includes PxxPxP and PxQxT motifs that recruit signaling proteins (JAK3, that is, Janus kinase 3, and several TRAFs, tumor necrosis factor receptor-associated factors, respectively) [5]. 
Together, the motifs mimic the cytoplasmic domain of CD40 to activate nuclear factor-κB via intermediates, including a third motif, YYD$ (where $ denotes the C terminus), which forms the TRADD (tumor necrosis factor receptor-associated death domain) binding domain. The overall result is that LMP1 inhibits apoptosis and promotes proliferation of infected B cells, conferring viral persistence [5]. Other processes in which viral SLiMs mimic host motifs to affect immune function include protein degradation, transcription, translation, and transport into and out of the nucleus [5].
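
The motif shorthand used above maps directly onto regular expressions, which is what makes partial sequence matches easy to search for. The following is a minimal sketch, assuming Python's standard `re` module; the function name `find_motifs` and the toy sequence are hypothetical and chosen only for illustration (the string is not the real LMP1 cytoplasmic tail).

```python
import re

# Motifs named in the text, written as regular expressions.
# 'x' in the shorthand means any residue ('.'); '$' anchors the C terminus.
LMP1_MOTIFS = {
    "PxxPxP": r"P..P.P",  # recruits JAK3
    "PxQxT": r"P.Q.T",    # recruits TRAFs
    "YYD$": r"YYD$",      # TRADD binding, must end the sequence
}


def find_motifs(sequence, motifs):
    """Return {motif name: [1-based start positions]} for every match.

    A zero-width lookahead is used so overlapping matches are reported too.
    """
    hits = {}
    for name, pattern in motifs.items():
        regex = re.compile(f"(?=({pattern}))")
        hits[name] = [m.start() + 1 for m in regex.finditer(sequence)]
    return hits


# Hypothetical toy sequence for illustration only (not the LMP1 sequence).
toy_tail = "GSPHDPLPHAAPQQATGGSYYD"
print(find_motifs(toy_tail, LMP1_MOTIFS))
```

Only a few positions in each pattern are fixed, so many different sequences satisfy the same expression. That is the sense in which instances of a SLiM can vary while keeping the same functional profile.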

Viral Gene Ontologies

Given the continued growth of this field [27], established frameworks can manage and exploit this knowledge beyond catalogues of currently known motifs 5, 7, 28 or details on contributions of one viral protein (e.g., 14, 29). Work to use SLiMs in bioengineering can benefit from understanding viral protein function. This information is organized in viral knowledge bases, such as ViralZone (Box 1). Ontologies describe systematically the many different functional roles of viral proteins, including immune evasion. By promoting use of standard terms for relationships between concepts, an ontology arranges concepts into a framework that can be updated as knowledge grows. Protein function is captured broadly in such a framework, though the nuanced details of interactions with other molecules are not localized to domains or motifs.

A bridge that links literature reports to GO term annotation, ViralZone is an online knowledge base that contains ‘textbook’ information about viral taxonomy, replication, genome organization, and virion structure, and provides links to viral sequence data 62, 63. Importantly, ViralZone staff collaborate with the GO Consortium to define entries for virus-specific molecular functions 63, 64, 65, 66. ViralZone cross-references its keywords with GO Consortium terms and UniProt [67] identifiers. This makes it possible to search for viral proteins by their functional role. ViralZone staff have developed GO concepts specifically for viruses, to represent the diversity of viral replication and processes involved with viral entry, replication, and egress [66]. ViralZone staff have also developed a detailed listing of virus–host interactions, with entries for 65 functions and 57 GO terms 64, 65. Each entry (‘keyword’) has a unique identifier. Unlike Enzyme Commission (EC) numbers [68], ViralZone IDs are arbitrary numbers and do not indicate position in the concept hierarchy. Instead, organization of the keyword hierarchy is provided online. The web address https://viralzone.expasy.org/886 is an entry point into the ViralZone concept space (Figure I).

Figure I

ViralZone Summarizes What Is Known about Viral Proteins Involved in Virus–Host Interactions.

Blue text indicates a link to more detail. Shown is the vertebrate host–virus interactions page [64]. Also available are summaries for invertebrate, plant, and bacterial host–virus interactions.

GO is an authoritative resource for annotating functions of gene sequences 30, 31. An example of interest is ‘evasion or tolerance by virus of host immune response’ [32] (www.ebi.ac.uk/QuickGO/term/GO:0030683, Figure 1). Concepts are hierarchically organized, and include a definition, synonyms, and lists of parents and children. Functional annotation in GO reflects the diverse effects of viral proteins on immune interference.

Figure 1

Immune-Evasion Concept Hierarchy.

Adapted from www.ebi.ac.uk/QuickGO/term/GO:0030683[32]. Most arrows indicate ‘is a’ relations, where the hierarchy is refined by specialization. Blue arrows indicate ‘part of’ relations, which relate to the symbiont process as parts to a whole.

Modulating autophagy is an example of recent advances in this research area 33, 34. A growing number of reports describe how virus proteins and SLiMs therein modulate autophagy to promote various aspects of their life cycle 34, 35. Both GO and ViralZone have developed concepts to detail autophagy processes, including positive and negative regulation of xenophagy, the selective autophagy of pathogens 33, 34, 35. Understanding the functional roles of SLiMs can help identify related mechanisms or processes, or possibly identify knowledge gaps where SLiMs may be posited but not yet identified. An overview of SLiM functions may also help to prioritize which are of greatest potential for use or abuse when artificially added to modify protein function. Databases and discovery tools are also useful to identify known and new SLiMs.

Motif Databases

Identification of shared structural features across divergent protein families led to analysis and identification of modular protein domains. Protein domains are used to categorize protein function, and the InterPro database [36] (www.ebi.ac.uk/interpro) aggregates information at this within-protein level. The identification of protein domains led to recognition of SLiMs as compact, small-scale functional modules [37]. ELM is a database of eukaryotic motifs (Box 2), though its representation of viral–host interactions is not fully developed. At present, 264 ELM entries map to 648 GO terms. Most motifs map to multiple GO terms; the median is seven GO terms per motif and the maximum is 29 (MOD_Plk_1). The LIG_IRF3_LxIS_1 motif has an immune-associated function in signal transduction responses to pathogen-associated molecular patterns; this motif maps to 25 GO terms, but none occur in the ViralZone vocabulary. In total, only three ELM entries utilize GO terms from ViralZone: LIG_BH_BH3_1, LIG_HCF-1_HBM_1, and LIG_Rb_pABgroove_1. These three motifs all map to the most general GO term, GO:0019048 (‘modulation by virus of host morphology or physiology’, a synonym of ‘virus–host interaction’). This underscores that the prevalent mode of ELM motif discovery and annotation does not emphasize host–virus interactions, but rather systems-level interactions within eukaryotic cells. Thus, better integration of ViralZone–GO-term vocabulary with ELM or another domain-level representation of viral SLiMs is needed to promote potential utility for biotechnology.
ELM assigns motif classes to one of six functional categories 69, 70: (i) CLV, proteolytic cleavage sites; (ii) DEG, degradation sites, part of polyubiquitination; (iii) DOC, docking sites, involved in protein recruitment but not directly targeted by an active site; (iv) LIG, ligand binding sites, primarily for protein–protein interactions; (v) MOD, post-translational modification sites; and (vi) TRG, targeting sites for subcellular localization. ELM has also spun off several specialized databases: phospho.ELM for phosphorylation sites with experimental evidence [71], switches.ELM for conditional molecular switches, such as requiring that a site be modified [72], and iELM, with an emphasis on protein–protein interactions [73]. A detailed tutorial provides orientation for ELM use [74]. ELM documents each motif class with a concise description of its function. For example, one type of nuclear localization signal (NLS), TRG_NLS_Bipartite_1, the ‘classic bipartite NLS’, which binds to importin-α for nuclear pore transfer and is utilized by the PB2 protein of influenza A, is documented here: elm.eu.org/elms/TRG_NLS_Bipartite_1. The abstract and functional site descriptions summarize what is known about the motif. ELM provides a virus-specific summary at elm.eu.org/viruses.html and serves downloads in multiple formats from there or elm.eu.org/downloads.html. ELM presently contains 53 viral motif classes and 246 viral motif instances. Viral motif instances are distributed over the functional categories as follows: 4.5% CLV, 1.6% DEG, 4.1% DOC, 47.8% LIG, 26.5% MOD, and 15.5% TRG. Of the 246 instances, the most common (50, or 20.3%) are N-linked glycosylation sites from six distinct viruses.
The most commonly represented viruses with motif instances in ELM are HIV-1 (11.8% of all entries) and severe acute respiratory syndrome-related coronavirus, that is, SARS-CoV (5.7%), though again, these are dominated by the N-linked glycosylation sites: 25 of 29 HIV-1 entries and 13 of 14 from SARS-CoV. More viral motif classes and instances will surely be added to ELM in time.
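
Because the functional category is encoded as the prefix of each ELM class identifier, a category breakdown like the one quoted above can be tallied directly from a list of identifiers. The sketch below assumes only that naming convention; the helper names `elm_category` and `category_shares` are hypothetical, and the short identifier list is taken from names mentioned in this article rather than from an ELM download.

```python
from collections import Counter

# The six ELM functional categories described in the text.
ELM_CATEGORIES = {
    "CLV": "proteolytic cleavage site",
    "DEG": "degradation site",
    "DOC": "docking site",
    "LIG": "ligand binding site",
    "MOD": "post-translational modification site",
    "TRG": "targeting site",
}


def elm_category(identifier):
    """Return the three-letter category prefix of an ELM class identifier."""
    prefix = identifier.split("_", 1)[0]
    if prefix not in ELM_CATEGORIES:
        raise ValueError(f"unrecognized ELM category prefix: {prefix!r}")
    return prefix


def category_shares(identifiers):
    """Return {category: percentage of identifiers}, rounded to one decimal."""
    counts = Counter(elm_category(name) for name in identifiers)
    total = sum(counts.values())
    return {cat: round(100 * n / total, 1) for cat, n in counts.items()}


# Identifiers mentioned in this article, used here only as sample input.
sample = ["TRG_NLS_Bipartite_1", "MOD_Plk_1", "LIG_IRF3_LxIS_1", "LIG_BH_BH3_1"]
print(category_shares(sample))
```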

Integration of Resources

Despite not using the ViralZone ontology, ELM documents other motif classes with GO terms that refer to ‘viral’, ‘virus’, ‘immune’, or ‘immunity’ (Table 1 ). Because the focus is motif function in the host cell context, ELM does not directly indicate how viral immune interference results. Further, the relative lack of viral motifs in ELM does not indicate their absence in vivo, but rather the evidence-based requirement for ELM inclusion.

Table 1

Number of ELM Viral Motif Instances with Virus-Related or Immune-Related GO Terms

Regular expression	Viral instances	Motif	Role
a	15	LIG_Rb_LxCxE_1	Binds retinoblastoma B pocket
.P[TS]AP.	13	LIG_PTAP_UEV_1	UEV domain binding PTAP motif
[LM]YP.[LI]	11	LIG_LYPXL_S_1	Endosomal sorting of membrane proteins
[FVILMY].FG[DES]F	8	LIG_G3BP_FGDF_1	Binds Ras GTPase activating SH3 domain
b	5	LIG_KLC1_WD_1	Binds kinesin light chain TPR region
[LM]YP.[LI]	2	LIG_LYPXL_L_2	Endosomal sorting of membrane proteins
[DE]H.Y	1	LIG_HCF-1_HBM_1	Binds transcriptional coactivator HCF-1
..L.I(S)	1	LIG_IRF3_LxIS_1	Interferon regulatory factor 3 binding site
.[VILM]..[LM][FY]D.	1	LIG_Rb_pABgroove_1	Binds retinoblastoma AB groove
c	0	LIG_BH_BH3_1	Binds BH domains to inhibit apoptosis
EP[IL]Y[TAG]	0	LIG_CSK_EPIYA_1	Binds C-terminal Src kinase SH2 domain

a: ([DEST]|^).{0,4}[LI].C.E.{1,4}[FLMIVAWPHY].{0,8}([DEST]|$)

b: [LMTAFSRI][^KRG]W[DE].{3,5}[LIVMFPA]

c: ....[LIFVYMTE][ASGC][^P]{2}L[^P]{2}[IVMTL][GACS][D][^P][FVLMI].

Related to the earlier observation, a review of how viruses use SLiMs to interfere with host cells [5] lists 52 examples that represent viral mimicry of host SLiMs (Table 2). Only 70% of these have corresponding ELM entries, though the SLiMs are known. The remaining 30% indicate that ELM does not fully capture all known viral motifs. This strong requirement by ELM for evidence-based motif classes and instances is not strictly a drawback. Indeed, the ELM creators are very aware that computational analysis alone is error prone and can yield misleading outcomes. In [38], they discuss this issue in depth, and recommend a workflow for SLiM discovery that culminates in experimental validation, whether in vivo or in vitro. Working with viral–host systems adds layers of difficulty to experimental motif validation, so it should not be surprising or to the detriment of available information resources that viral SLiMs are less thoroughly documented.

Table 2

Examples of Viral Proteins That Mimic Host SLiMs, updated from [5]

Host target	Viral protein	Virus	Motifa	ELM
CDH1	E1	BPV	KEN	DEG_APCC_KENbox_2
Phosphodegron FBW7	LT	SV40	TPxxE	DEG_SCF_FBW7_1
Phosphodegron βTrCP1	Vpu	HIV	DSGxxS	DEG_SCF_TrCP1_1
SIAH1	ORF45	KSHV	PxAxV	DEG_SIAH_1
Tankyrase	EBNA1	EBV	RxxPDG	DOC_ANK_TNKS_1
Cyclins	E1	HPV	RxLF	DOC_CYCLIN_1
PP1	γ134.5	HSV	RVxF	DOC_PP1
Calcineurin	p12	HTLV1	SPxLxLT	DOC_PP2B_1
USP7	EBNA1	EBV	PxE[^P]xS[^P]	DOC_USP7_MATH_2
14-3-3	Rep68	AAV	RSxSxP	LIG_14-3-3_CanoR_1
Clathrin heavy chain	HDAg-L	HDV	LFxAD	LIG_Clathr_ClatBox_1
TR	E1A	Adenovirus	LxxLIxxxL	LIG_CORNRBOX
CtBP SDB	E1A	Adenovirus	PxDLS	LIG_CtBP_PxDLS_1
Dynein light chain 8	P	Rabies	KxTQT	LIG_Dynein_DLC8_1
ALIX	Gag	HIV	LYPxxxL	LIG_LYPXL_L_2
BS69	EBNA2	EBV	PxLxP	LIG_MYND_1
PDZ domain	E6	HPV	TxV$	LIG_PDZ_Class_1
Tsg101	Gag	HIV	PTAP	LIG_PTAP_UEV_1
RB (pocket region)	E7	HPV	LxCxE	LIG_RB_LxCxE_1
RB (E2F competition)	E1A	Adenovirus	LxxLYD	LIG_RB_pABgroove_1
Integrin αvβ3	VP1	FMDV	RGD	LIG_RGD
SH2 domain	stpC	HVS	YxxV	LIG_SH2
SH3 domain	Nef	HIV	PxxP	LIG_SH3_2
TRAF2	LMP1	EBV	PxQxT	LIG_TRAF2_1
TRAF6	U(L)37	HSV	PxExxE	LIG_TRAF6
Syk	LMP2A	EBV	Yxxϕ Yxxϕ	LIG_TYR_ITAM
Farnesyltransferase	HDAg-L	HDV	Cxxx$	MOD_CAAXbox
Oligosaccharyltransferase	E1	HCV	N[^P][ST]	MOD_N-GLC_1
N-myristoyltransferase	G9R	Vaccinia	^GxxxS	MOD_Nmyristoyl
PIAS1	IE1	HCMV	IKxE	MOD_SUMO
AP-2μ	Env	SIV	Yxxϕ	TRG_ENDOCYTIC_2
COP1	E3	Adenovirus	KK	TRG_ER_diLys_1
ERD2	ctxA	Phage CTX	KDEL	TRG_ER_KDEL_1
AP-1	Nef	HIV	ExxxLL	TRG_LysEnd_APsAcLL_1
NESc	Rev	HIV	Ψ-rich	TRG_NES_CRM1_1
NLS, bipartite	PB2	Influenza	KR-rich	TRG_NLS_Bipartite_4
NLS, monopartite	LT	SV40	KR-rich	TRG_NLS_MonoCore_1
CtBP NDB	E1A	Adenovirus	RxxTG
p300/CBP	E1A	Adenovirus	FxDxxxL
Caspases	NS1	ADV	DxxD↓
NEDD4	VP40	Ebola	PPxY
SEC24C	VP40	Ebola	LxMVI
JAK	LMP1	EBV	PxxPxP
TRADD	LMP1	EBV	YYD$
Elongin C	Vif	HIV	SLxxxLxxxI
PACS1	Nef	HIV	EEEE
HCF	VP16	HSV	EHxY
NoLS	γ134.5	HSV	KR-rich
Furin	Spike	IBV	R↓S
PKR	NS1	Influenza	IMxKN
H2A–H2B	LANA	KSHV	MxLRSG
Palmitoyl acyltransferase	G	Rabies	CC

The down arrow (↓) indicates a cleavage site; φ (phi) represents a site occupied by a hydrophobic [VILFWYM] and Ψ (Psi) an aliphatic [VILM] AA. Other motif symbols are regular expression terms. For example, ^ indicates sequence N terminus, but in brackets indicates negation. That is, [^P] indicates any AA except proline. Motifs stated do not necessarily correspond to the general motif patterns currently in ELM.
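
The notation in Table 2 can be translated mechanically into regular expressions that a program can search with. The following is a minimal sketch under the conventions stated in the table note; the function name `shorthand_to_regex` is hypothetical, descriptive entries such as ‘KR-rich’ are not patterns and are out of scope, and the cleavage arrow is simply dropped because it marks a position rather than a residue.

```python
import re

HYDROPHOBIC = "[VILFWYM]"  # phi in the table note
ALIPHATIC = "[VILM]"       # Psi in the table note

# Character-by-character translation of the table shorthand.
_TRANSLATION = {
    "x": ".",               # any residue
    "\u03c6": HYDROPHOBIC,  # Greek small letter phi
    "\u03d5": HYDROPHOBIC,  # Greek phi symbol (variant glyph used in the table)
    "\u03a8": ALIPHATIC,    # Greek capital letter Psi
    "\u2193": "",           # down arrow: cleavage position, matches no residue
    "\u02c6": "^",          # modifier circumflex, normalized to an ASCII caret
}


def shorthand_to_regex(motif):
    """Translate a Table 2 style motif string into a regular expression."""
    return "".join(_TRANSLATION.get(ch, ch) for ch in motif)


print(shorthand_to_regex("Yxx\u03d5"))        # endocytic sorting signal
print(shorthand_to_regex("PxE[^P]xS[^P]"))    # USP7 docking motif
# The N-glycosylation sequon needs no translation beyond the bracket syntax.
print(bool(re.search(shorthand_to_regex("N[^P][ST]"), "ANGT")))
```

Uppercase letters, brackets, and `$` pass through unchanged, so a motif such as TxV$ keeps its C-terminal anchor.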

Searching arbitrary sequences for motif instances is computationally straightforward. Box 3 provides an example of motif searching with ELM, which might facilitate comparative analysis of two related proteins from different species of herpes simplex virus (HSV). Resources such as InterPro and UniProt are able to perform similar assessments, but give broader, domain-level representations with less functional detail than the SLiM searches enabled by ELM. Reports in the primary literature take a different approach, by marking SLiMs in a protein alignment, which includes orthologues to mark conservation (e.g., Figure 3 in [39]). The ELM-generated report combines predicted SLiMs with information from annotated domains and local disorder predictions, for a perspective that complements the other approaches.

HSV-1 and HSV-2 virulence factor ICP34.5 assists in viral immune evasion by molecular mimicry. HSV-1 neurovirulence protein ICP34.5, encoded by the γ34.5 gene, initiates immune interference by binding and sequestering cellular proteins that would stimulate autophagy, translational arrest, and type I interferon responses. HSV-1 ICP34.5 binds TANK-binding kinase 1 (TBK1) to prevent type I interferon induction [75], Beclin-1 to prevent autophagy [76], and both PP1α and eIF2α to overcome translational arrest [77]. HSV-2 γ34.5 contains an intron not present in HSV-1, and up to four isoforms of HSV-2 ICP34.5 are known [78].
Full-length HSV-2 ICP34.5 has conserved PP1α and eIF2α-binding domains, but lacks TBK1 and Beclin-1 binding domains [79]. Additional HSV-1 motifs influence intracellular localization [80], virion maturation, and egress [81], not yet characterized in HSV-2. HSV-2 is recognized as more virulent than HSV-1, but both can cause neuropathology, including viral encephalitis and meningitis [82]. To attenuate virulence, ICP34.5 is routinely deleted or inactivated when making HSV-1 constructs for oncolytic therapy [82]. Both ICP34.5 proteins are the same length and share domain structures, and partially share SLiM compositions (Figure I). Identifying differences in SLiMs from each could provide clues for more detailed experimental investigations to understand ICP34.5 virulence determinants and host protein targets. Searching ELM for motif instances in HSV-2 ICP34.5 (UniProt P28283) gives 127 instances of 39 motif classes, before filtering to exclude globular domains and other likely false hits. Filtering leaves 97 instances, which cover almost all of this 261 AA protein except for a predicted globular domain of 73 AAs. A similar query for the HSV-1 protein (UniProt P08353) gives 158 instances of 42 distinct motifs, and 114 instances of 36 motifs when filtered. Most of the ELM motif hits are probably false positives, if only because the motif expressions overlap. The patterns (.RK)|(RR[^KR]), [KR]R., R...[KR]R., RRGPRRR, MSRRR, and RR all match sites in the first ten AA positions of the HSV-2 query protein sequence, a low-complexity region dominated by positively charged AAs. The HSV-1 version contains a 30 amino acid Ala-Thr-Pro (ATP) triplet repeat at sites 160–190, not found in the HSV-2 protein. ELM matches four motifs here, only one of which (DOC_CKS1_1) is unique to HSV-1.

Viral SLiM Comparisons Using ELM. (A) ELM-generated plots of motif locations in neurovirulence factor ICP34.5 from HSV-1 and HSV-2 74, 83. (B) Summary of motifs found in one or both.
Clearly, false positives are inevitable among SLiM search results. This makes it necessary to filter for the most significant and informative outcomes. This leads to consideration of in silico (computational) methods for SLiM evaluation. A highly recommended, authoritative review of SLiM discovery techniques, from an author of the SLiMSuite software package, discusses motif identification techniques in depth [40].
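
The overlap among patterns noted in Box 3 for the arginine-rich N terminus is easy to reproduce on a small scale. The sketch below applies three of the regular expressions quoted in Box 3 to a toy string and counts how many hits cover each residue. The toy string is assembled for illustration from fragments named in the text and should not be taken as the UniProt sequence; the function names are hypothetical.

```python
import re


def overlapping_hits(sequence, patterns):
    """List (pattern, start, end) for every match, 1-based and inclusive.

    Wrapping each pattern in a lookahead reports matches that overlap
    one another, which a plain left-to-right scan would skip.
    """
    hits = []
    for pattern in patterns:
        for m in re.finditer(f"(?=({pattern}))", sequence):
            start = m.start() + 1
            end = m.start() + len(m.group(1))
            hits.append((pattern, start, end))
    return hits


def coverage(sequence, hits):
    """Count how many hits cover each residue position."""
    counts = [0] * len(sequence)
    for _, start, end in hits:
        for pos in range(start - 1, end):
            counts[pos] += 1
    return counts


# Three of the patterns quoted in Box 3.
PATTERNS = [r"(.RK)|(RR[^KR])", r"[KR]R.", r"R...[KR]R."]
# Hypothetical arginine-rich toy string, not the real ICP34.5 N terminus.
TOY_N_TERMINUS = "MSRRRGPRRR"

hits = overlapping_hits(TOY_N_TERMINUS, PATTERNS)
for hit in hits:
    print(hit)
print(coverage(TOY_N_TERMINUS, hits))
```

Several hits pile onto the same few residues in this example, which illustrates why raw hit counts in low-complexity regions overstate the number of plausible functional sites.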

Motif Discovery

Methods for SLiM discovery can be divided into two broad classes: (i) de novo discovery of new SLiMs, and (ii) instance prediction to find new occurrences of known SLiMs. There are currently at least eight software packages available to discover new SLiMs and 40 packages for SLiM instance detection (25 stand-alone programs or servers and two software suites that consist of multiple tools: SLiMSuite [40], which includes ten utilities, and MEME [41], which consists of five tools). Though MEME was developed for discovery of DNA sequence motifs, it generates ungapped, profile-based motifs using the expectation–maximization (EM) algorithm. No single method is inherently better than the rest; the choice of which to use depends on several factors, such as the input sequence data, whether one sequence, an alignment, or a collection of nonhomologous sequences [40]. To illustrate the diversity of motif discovery methods, this section mentions only a handful of the software tools available (Table 3). Readers seeking to learn more about the full set of alternatives are strongly encouraged to consult [40], particularly Tables 1, 4, and 5 therein. Another helpful resource (Table 1 in [38]) lists online motif discovery bioinformatics services.

Table 3

Computational Resources for SLiM Discovery

Program	Website	Refs
SLiMSuite	https://github.com/slimsuite/SLiMSuite	[40]
SLiMFinder	https://www.slimsuite.unsw.edu.au/servers/slimfinder.php	[45]
DILIMOT	https://dilimot.russelllab.org	46, 47
GLAM2	https://meme-suite.org/tools/glam2	[48]
NestedMICA	https://www.mybiosoftware.com/nestedmica-0-8-0-motif-finder.html	49, 50
ShettiMotif	https://sites.google.com/site/haithamsobhy/ShettiMotif_V1.zip	[56]

De Novo Methods

Several alternative approaches for discovery of new motifs have been advanced. Edwards and Palopoli [40] review the alternatives in depth, discussing their merits and drawbacks. Briefly, they can be divided into alignment-based and alignment-free methods. An alignment-based approach looks for conserved sites among homologous sequences, but can be misled by high sequence conservation in globular domains. A program called SLiMPrints works around this with a specialized approach to model substitutions [42]. SLiMPrints uses a statistical model of relative local conservation, which looks for clusters of overly constrained sites in a window of about 30 AAs, using IUPred scores (intrinsically unordered prediction; see later) to weigh sites in intrinsically disordered protein regions more heavily than sites in globular (ordered) regimes [42]. In contrast, alignment-free methods look for enrichment of amino acid patterns in proteins that are expected by other means to perform similar motif-related roles, for example, by GO category annotations or protein–protein interaction (PPI) data, that is, via databases that capture experimental evidence for protein colocalization and functional interactions. An important caveat is that to assume such sequences are independent could yield spurious enrichment of shared patterns, so alignment-free methods need to compensate for evolutionary constraints at the domain level, rather than for full-length homologous proteins. The development of such corrections and their relative advantages are detailed in [40]. Some programs (e.g., SLiMDisc [43], SLiMFinder 44, 45, and DILIMOT 46, 47) produce regular expressions that compensate for phylogenetic relatedness, while others (MEME suite, GLAM2 [48], and NestedMICA 49, 50) produce probabilistic profiles. For more discussion of these and issues of concern for computational motif discovery, see [40].

Instance Detection

Filtering methods control high false positive rates from SLiM instance detection. Structural information, whether known or predicted, can be used for filtering. Box 3 illustrates how ELM filters results to exclude a region predicted to fold as globular protein. These predictions were made by SMART (simple modular architecture research tool) [51] and Pfam [52] domain matches, corroborated by GlobPlot [53]. Another widely used approach is to identify regions of local disorder, where protein structure is not clearly defined, making that region accessible to interact with other proteins. IUPred [54] is commonly used for this task, though the choice of parameter settings and how to interpret results varies. ELM results include an IUPred disorder score and a simple cutoff of 0.5 to define the disorder transition. Above this value, local protein structure is considered accessible for interaction with other proteins. Scoring schemes filter for statistical enrichment of motif instances. An approach of filtering by homology [55] seems inappropriate for use to detect virus interactions with host proteins, as it may exclude nonhomologous regions with motifs that do interact, yielding false negatives. Regardless, failure to consider evolutionary relatedness among sequences being searched could introduce bias due to common ancestry, rather than independence, among sequences. A simple approach to instance prediction is a stand-alone program called ShettiMotif [56]. It was used to scan 2251 protein sequences from 11 Poxviridae genomes (an average of 205 proteins per poxvirus) for low-complexity regions and regular expressions defined by PROSITE. The approach compared numbers of proteins per genome that carry each motif, and doubtlessly includes many motif instances that are not functional as SLiMs. Also, shorter motifs occur more frequently than longer motifs 3, 27, partly due to chance alone. 
Regardless, systematic error may be considered a source of background noise across the large number of proteins in 11 viral proteomes, each having different host specificities, to enable somewhat meaningful comparisons, in such a ‘statistical genomics’ approach [3]. The comparisons could be more meaningful if false positive motif instances were reduced. Becerra et al. developed another approach to instance counting [57], which involves comparison with a null distribution from permuting primary sequence and testing for presence of the motif in the permuted sequences. A motif is considered rare and therefore significantly unlikely to occur by chance if it is present at or below some cutoff frequency. Restricting the sequence region that is used for permutation testing, such as by use of structural considerations, can further focus the search. Indeed, such a hybrid filtering approach was described recently and evaluated on the HIV-1 proteome [57]. Following methods described in an earlier study [58], Becerra et al. used IUPred with a modified, window-based scoring procedure to identify intrinsically disordered protein regions, and tested for statistical rarity below 1% of 1000 shuffled variants. The approach further considered conservation above 70% in a set of aligned sequences, though combining three filtering criteria was too stringent and excluded all motif candidates [57].
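
The two filters discussed above, a disorder cutoff and a shuffling-based rarity test, can be sketched in a few lines. This is a simplified illustration and not the procedure of Becerra et al., who used a modified, window-based IUPred scoring. Here the per-residue disorder scores are assumed to be supplied by the caller, a span passes only if every residue scores above the cutoff, and all function names are hypothetical.

```python
import random
import re


def passes_disorder_filter(scores, start, end, cutoff=0.5):
    """True if every residue in the 1-based inclusive span scores above cutoff.

    `scores` is assumed to be a per-residue list of disorder scores
    (for example, from a disorder predictor run separately).
    """
    return all(score > cutoff for score in scores[start - 1:end])


def shuffled_match_fraction(sequence, pattern, n_shuffles=1000, seed=0):
    """Fraction of shuffled sequences that contain the motif at least once.

    Shuffling keeps amino acid composition fixed while destroying order,
    giving a null distribution for chance occurrence of the pattern.
    """
    rng = random.Random(seed)
    regex = re.compile(pattern)
    residues = list(sequence)
    found = 0
    for _ in range(n_shuffles):
        rng.shuffle(residues)
        if regex.search("".join(residues)):
            found += 1
    return found / n_shuffles


def is_rare(sequence, pattern, cutoff=0.01, n_shuffles=1000, seed=0):
    """True if the motif appears in fewer than `cutoff` of the shuffles."""
    return shuffled_match_fraction(sequence, pattern, n_shuffles, seed) < cutoff
```

A usage pattern would be to call `passes_disorder_filter` on each motif hit first and then `is_rare` on the restricted region. Short, permissive patterns such as RR will tend to fail the rarity test in arginine-rich sequences, which matches the observation that shorter motifs occur more often by chance.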

Motif-Specific Databases

While algorithmic approaches seek to identify a broad range of SLiM types, more specialized resources have emerged to track the distribution of a particular SLiM in viral proteins. For example, iLIR@viral is a web resource dedicated to detecting LIR motif-containing proteins in viruses [59]. LC3-interacting regions (LIR motifs) are SLiMs that mediate protein–protein interactions involved in autophagy, as used by influenza A virus M2 protein to subvert autophagy and maintain virion stability [60]. Using curated text mining analysis and position-specific scoring matrices, iLIR@viral analyzed 16 609 reviewed viral sequences available from UniProt across 2569 individual viral species and found that 15 589 viral sequences contain LIR motifs. While many predicted instances may represent false positives, the enrichment of LIR motifs in viral sequences is consistent with viral adaptation to host xenophagy [35]. Curiously, ELM currently lists the LIR motif as a candidate, rather than an accepted motif class.
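A back-of-the-envelope calculation helps to show why so many sequences contain a short motif. The Python sketch below uses the commonly cited core LIR pattern [WFY]xx[LIV] and assumes uniform, independent amino acid frequencies. Both are simplifications introduced here: iLIR@viral relies on more refined patterns and position-specific scoring matrices, and real proteins do not have uniform composition. Function names are hypothetical.

```python
import re

# Core LIR pattern [WFY]xx[LIV]; the lookahead counts overlapping matches.
CORE_LIR = re.compile(r"(?=([WFY]..[LIV]))")

def count_core_lir(sequence):
    """Number of (possibly overlapping) core LIR matches in a sequence."""
    return sum(1 for _ in CORE_LIR.finditer(sequence))

def expected_chance_matches(length, motif_len=4, p_match=(3 / 20) * (3 / 20)):
    """Expected matches in a random sequence with uniform residue frequencies.

    p_match assumes 3 of 20 residues are allowed at each of the two fixed positions.
    """
    positions = max(length - motif_len + 1, 0)
    return positions * p_match

def prob_at_least_one(length, motif_len=4, p_match=(3 / 20) * (3 / 20)):
    """Approximate chance of one or more matches, treating positions as independent."""
    positions = max(length - motif_len + 1, 0)
    return 1 - (1 - p_match) ** positions

# Fraction of reviewed viral sequences reported to contain LIR motifs.
fraction_with_lir = 15589 / 16609

print(count_core_lir("WEELYDDV"))
print(expected_chance_matches(300))
print(prob_at_least_one(300))
print(round(fraction_with_lir, 3))
```

Under these assumptions a protein of a few hundred residues is expected to contain several core matches by chance alone, which is consistent with the caution that many predicted instances may be false positives.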

Concluding Remarks

Embedding SLiMs into engineered constructs may enable specific effects on cellular immune processes, for applications that include targeted drug delivery, pathogen-specific adjuvants, potent and broadly effective immunogens, transformational medical countermeasures, and improved design of vectors for gene therapy. SLiM modularity may enable easy ways to reprogram protein function with a few localized modifications. To realize the potential utility of SLiMs in synthetic biology, more research is needed to expand and integrate our collection of knowledge on viral SLiMs (see Outstanding Questions). Detecting SLiMs in variant sequences may help to identify functional innovation or changes in virulence, and to show how sequence-specific variation may interact with host responses, in a manner that does not rely strictly on functional assessment at the whole-gene level. This may be particularly useful for understanding new variants and assessing the risk that they may spread and harm human health or agricultural interests. Such knowledge is needed in an era where synthetic biology may introduce new risks for biological error and biological terror. Detecting and understanding SLiM variants can help to reduce such risks and identify newly emerging threats to global health and security, because watch lists of harmful organisms, which aim to ensure public safety by preventing access to selected known risks, may be inadequate 21, 22, 23.

SLiMs in viral proteins can interact in many different ways with host proteins to modulate immune responses. A motif may be necessary but not sufficient for any inferred function. The simplest case is where a viral SLiM interacts directly with a host protein to yield an immunomodulated phenotype. More elaborate cases are known, such as the multifunctional proteins E1A (adenovirus), Nef (HIV-1), and ICP34.5 (HSV).
Computational prediction of SLiM classes and new instances is a process that involves experimental confirmation and validation. High-throughput methods for experimental assessment of protein interactions are useful to validate computational predictions 38, 61, and more assays are needed to evaluate functional and phenotypic effects of adding or deleting SLiMs.

Outstanding Questions

What incentives or community-based activities would best enable integration of specialized viral gene ontologies with databases of motif classes and instances?

What procedures are needed to validate new SLiMs, so as to minimize spurious results while making the most promising candidates available to use and to discover new instances?

How prevalent are immunomodulatory motifs in viruses, relative to the prevalence of entire viral proteins dedicated to this specialized function? To what extent can viral immunomodulatory function be localized to a motif or domain, or is the larger whole-protein context necessary for function?

What types of motif interactions are most common and important for viral immune modulation? Are they accurately reflected in the current literature and databases, or do many new types of motifs still await discovery?

Is there sound support for an enrichment (or scarcity) of particular motif types in certain viral classes (i.e., Baltimore classification)? If so, what does this reveal about commonalities among viral replication strategies and potential for broad-spectrum antiviral treatments?

What host countermeasures are involved with overcoming viral immune interference, either in general against many viruses, or in particular, against specific taxa? Do these countermeasures explain some of the crosstalk among interactions in antiviral innate signaling pathways?

The distribution of glycosylation sites on enveloped viruses can be extremely variable, even within one host. Such variation may impede bioengineering of specific constructs.

How are SLiMs influenced by dynamic evolutionary processes? What fitness costs are associated with SLiM evolution? What specific constraints limit SLiM evolvability?

What strategies are most effective to advance knowledge of viral immunomodulatory SLiMs in the design of vaccines and therapies to promote global health? For example, can some viral peptides be useful as adjuvants?

81 in total

1. ELM server: A new resource for investigating short functional sites in modular eukaryotic proteins.

Authors: Pål Puntervoll; Rune Linding; Christine Gemünd; Sophie Chabanis-Davidson; Morten Mattingsdal; Scott Cameron; David M A Martin; Gabriele Ausiello; Barbara Brannetti; Anna Costantini; Fabrizio Ferrè; Vincenza Maselli; Allegra Via; Gianni Cesareni; Francesca Diella; Giulio Superti-Furga; Lucjan Wyrwicz; Chenna Ramu; Caroline McGuigan; Rambabu Gudavalli; Ivica Letunic; Peer Bork; Leszek Rychlewski; Bernhard Küster; Manuela Helmer-Citterich; William N Hunter; Rein Aasland; Toby J Gibson
Journal: Nucleic Acids Res Date: 2003-07-01 Impact factor: 16.971

2. A computational strategy for the prediction of functional linear peptide motifs in proteins.

Authors: Holger Dinkel; Heinrich Sticht
Journal: Bioinformatics Date: 2007-10-31 Impact factor: 6.937

Review 3. Convergent evolution and mimicry of protein linear motifs in host-pathogen interactions.

Authors: Lucía Beatriz Chemes; Gonzalo de Prat-Gay; Ignacio Enrique Sánchez
Journal: Curr Opin Struct Biol Date: 2015-04-02 Impact factor: 6.809

4. SLiMPrints: conservation-based discovery of functional motif fingerprints in intrinsically disordered protein regions.

Authors: Norman E Davey; Joanne L Cowan; Denis C Shields; Toby J Gibson; Mark J Coldwell; Richard J Edwards
Journal: Nucleic Acids Res Date: 2012-09-12 Impact factor: 16.971

5. Systematic discovery of new recognition peptides mediating protein interaction networks.

Authors: Victor Neduva; Rune Linding; Isabelle Su-Angrand; Alexander Stark; Federico de Masi; Toby J Gibson; Joe Lewis; Luis Serrano; Robert B Russell
Journal: PLoS Biol Date: 2005-11-15 Impact factor: 8.029

6. Prediction of virus-host protein-protein interactions mediated by short linear motifs.

Authors: Andrés Becerra; Victor A Bucheli; Pedro A Moreno
Journal: BMC Bioinformatics Date: 2017-03-09 Impact factor: 3.169

7. The ins and outs of eukaryotic viruses: Knowledge base and ontology of a viral infection.

Authors: Chantal Hulo; Patrick Masson; Edouard de Castro; Andrea H Auchincloss; Rebecca Foulger; Sylvain Poux; Jane Lomax; Lydie Bougueleret; Ioannis Xenarios; Philippe Le Mercier
Journal: PLoS One Date: 2017-02-16 Impact factor: 3.240

8. UniProt: the universal protein knowledgebase.

Authors:
Journal: Nucleic Acids Res Date: 2016-11-29 Impact factor: 16.971

9. Discovering sequence motifs with arbitrary insertions and deletions.

Authors: Martin C Frith; Neil F W Saunders; Bostjan Kobe; Timothy L Bailey
Journal: PLoS Comput Biol Date: 2008-05-09 Impact factor: 4.475

10. The eukaryotic linear motif resource ELM: 10 years and counting.

Authors: Holger Dinkel; Kim Van Roey; Sushama Michael; Norman E Davey; Robert J Weatheritt; Diana Born; Tobias Speck; Daniel Krüger; Gleb Grebnev; Marta Kuban; Marta Strumillo; Bora Uyar; Aidan Budd; Brigitte Altenberg; Markus Seiler; Lucía B Chemes; Juliana Glavina; Ignacio E Sánchez; Francesca Diella; Toby J Gibson
Journal: Nucleic Acids Res Date: 2013-11-07 Impact factor: 16.971
