András Zeke, Éva Schád, Tamás Horváth, Rawan Abukhairan, Beáta Szabó, Agnes Tantos.
Abstract
Recent efforts to identify RNA binding proteins in various organisms and cellular contexts have yielded a large collection of proteins that are capable of RNA binding in the absence of conventional RNA recognition domains. Many of the recently identified RNA interaction motifs fall into intrinsically disordered protein regions (IDRs). While the recognition mode and specificity of globular RNA binding elements have been thoroughly investigated and described, much less is known about the way IDRs can recognize their RNA partners. Our aim was to summarize the current state of structural knowledge on the RNA binding modes of disordered protein regions and to propose a classification system based on their sequential and structural properties. Through a detailed structural analysis of the complexes that contain disordered protein regions binding to RNA, we found two major binding modes that represent different recognition strategies and, most likely, functions. We compared these examples with DNA binding disordered proteins and found key differences stemming from the nucleic acids as well as similar binding strategies, implying a broader substrate acceptance by these proteins. Due to the very limited number of known structures, we integrated molecular dynamics simulations in our study, whose results support the proposed structural preferences of specific RNA-binding IDRs. To broaden the scope of our review, we included a brief analysis of RNA-binding small molecules and compared their structural characteristics and RNA recognition strategies to the RNA-binding IDRs.
This article is categorized under: RNA Structure and Dynamics > RNA Structure, Dynamics, and Chemistry; RNA Interactions with Proteins and Other Molecules > Protein-RNA Recognition; RNA Interactions with Proteins and Other Molecules > Small Molecule-RNA Interactions.
Keywords: IDP-RNA complex; RNA recognition; RNA structure; intrinsically disordered protein; protein-RNA binding
Year: 2022 PMID: 35098694 PMCID: PMC9539567 DOI: 10.1002/wrna.1714
Source DB: PubMed Journal: Wiley Interdiscip Rev RNA ISSN: 1757-7004 Impact factor: 9.349
Structural classification of intrinsically disordered peptide–RNA complexes found in the Protein Data Bank
| Protein structure upon binding | Target RNA 3D structure | Detailed description | Protein sequence features | Example structures (PDB entries) | References |
|---|---|---|---|---|---|
| Turn‐forming (including beta‐turns), loops or random coil | Distorted major (or minor) groove of a double helix | Either random coils or turns sometimes configured almost as beta‐sheets | Charges (mostly Arg) intermixed with structure breaking residues (e.g., Gly) | 1MNB, 1ZBN, 2KX5, 2KDQ, 2A9X, 6D2U, 1BIV | (Puglisi et al., |
| Turn‐forming (including beta‐turns), loops or random coil | Loop capping at the end of a double helix | A subcase of the above, but also with added loop capping | Charges, structure breaking plus an aromatic position | 484D | (Ye et al., |
| Turn‐forming (including beta‐turns), loops or random coil | Structural transitions: Duplex‐to‐quadruplex | Short, sharp turns, probably also found at other structural transitions | RGG regions and similarly flexible motifs | 2LA5, 5DE5 | (Phan et al., |
| Turn‐forming (including beta‐turns), loops or random coil | Quadruplex capping | Use of planar pi‐stacking of aromatic side chains | Aromatics: Trp, Tyr, or Phe present with Pro | 2RU7 | (Hayashi et al., |
| Partly or completely alpha‐helical | Distorted major (or minor) groove of a double helix | Normally bind to the large groove of the RNA in an alpha‐helical conformation | Very high Arg and Lys content (including R/E/S‐rich regions) | 1ETG, 1ULL, 1G70, 1EXY, 1I9F | (Battiste et al., |
| Partly or completely alpha‐helical | Loop capping at the end of a double helix | Binding the groove with a helix and capping it with pi‐stacking | Numerous charges (Lys, Arg) with 1 aromatic (e.g., Trp) | 1QFQ, 1A4T, 1NYB, 1HJI | (Schärpf et al., |
| Partly or completely alpha‐helical | Structural transitions: Stem‐stem junctions | Complex geometry, with both helical as well as non‐helical segments | In addition to charges and pi‐stacking amino acids: nonhelical and helix breaking | 1XOK | (D'Souza & Summers, |
| Partly or completely alpha‐helical | Quadruplex capping | Helix with a very flat, hydrophobic side contacting the RNA | Small amino acids on one side (e.g., Gly, Ala), hydrophilic on the other | 2N21, 6Q6R (crystal with DNA only) | (Heddi et al., |
Note: Alpha‐helical or turn‐type motifs can also bind to at least four different RNA structures in each case.
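
For readers who want to look up where a given structure falls in this classification, the table can be transcribed into a small lookup. This is only an illustrative sketch: the short class labels and the `classify` helper are our own hypothetical names, not part of the original article, and the PDB entries are copied from the table above.

```python
# Classification table transcribed as (protein structure, RNA target) -> PDB entries.
# The short labels are hypothetical abbreviations of the table's row descriptions.
CLASSES = {
    ("turn/loop/coil", "distorted groove"): ["1MNB", "1ZBN", "2KX5", "2KDQ", "2A9X", "6D2U", "1BIV"],
    ("turn/loop/coil", "loop capping"): ["484D"],
    ("turn/loop/coil", "structural transition"): ["2LA5", "5DE5"],
    ("turn/loop/coil", "quadruplex capping"): ["2RU7"],
    ("alpha-helical", "distorted groove"): ["1ETG", "1ULL", "1G70", "1EXY", "1I9F"],
    ("alpha-helical", "loop capping"): ["1QFQ", "1A4T", "1NYB", "1HJI"],
    ("alpha-helical", "structural transition"): ["1XOK"],
    ("alpha-helical", "quadruplex capping"): ["2N21", "6Q6R"],  # 6Q6R: crystal with DNA only
}


def classify(pdb_id):
    """Return the (protein structure, RNA target) class of a PDB entry, or None if not listed."""
    pdb_id = pdb_id.upper()
    for key, entries in CLASSES.items():
        if pdb_id in entries:
            return key
    return None
```

Calling `classify("1ull")` returns the alpha-helical, distorted-groove class; entries that do not appear in the table return `None`.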
FIGURE 1 Sample structures of a few of the suggested structural cases: (a) Turns/loops binding to an RNA structural transition: PDB 2LA5. (b) Charged helices within a distorted groove: PDB 1ULL. (c) Quadruplex capping: PDB 2RSK.
FIGURE 2 Rulesets governing the biochemistry of glycine‐rich (loop‐like) RNA binding regions. These intrinsically disordered elements contact the RNA through an amino acid capable of pi‐stacking (Phe, Tyr, Arg), H‐bonding (Tyr, Arg), or electrostatic interactions (Arg). To yield the proper side chain geometry, highly flexible residues (preferably Gly, sometimes Ser or other) must flank the central amino acid on both sides. In addition, the physical spacing of nucleobases relative to the smaller protein chain calls for more than one such intervening amino acid for optimal RNA–protein contacts.
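
The ruleset in this caption can be approximated as a simple sequence scan. The sketch below is an illustration under stated assumptions, not the authors' method: it treats only Phe, Tyr, and Arg as contact residues and only Gly and Ser as flexible residues (the caption's "or other" is not modeled), and the function names and the `min_spacers` parameter are hypothetical.

```python
import re

CONTACT = "FYR"   # residues the caption names for pi-stacking, H-bonding, or charge contacts
FLEXIBLE = "GS"   # Gly preferred, sometimes Ser (assumption: other flexible residues ignored)

# A contact residue with a flexible residue immediately before and after it.
FLANKED = re.compile(rf"(?<=[{FLEXIBLE}])[{CONTACT}](?=[{FLEXIBLE}])")


def flanked_contacts(seq):
    """Return 0-based positions of contact residues flanked on both sides by flexible residues."""
    return [m.start() for m in FLANKED.finditer(seq.upper())]


def well_spaced_pairs(seq, min_spacers=2):
    """Return pairs of consecutive flanked contacts separated only by
    at least `min_spacers` flexible residues ("more than one" in the caption)."""
    seq = seq.upper()
    hits = flanked_contacts(seq)
    pairs = []
    for a, b in zip(hits, hits[1:]):
        spacer = seq[a + 1:b]
        if len(spacer) >= min_spacers and all(c in FLEXIBLE for c in spacer):
            pairs.append((a, b))
    return pairs
```

On a made-up toy sequence such as `"GGRGGFGGRGYG"`, `flanked_contacts` reports positions 2, 5, 8, and 10, while `well_spaced_pairs` keeps only the pairs (2, 5) and (5, 8), because the last two contacts are separated by a single Gly. A real analysis would need structural validation; sequence patterns alone do not establish RNA binding.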
FIGURE 3 Modeling of two RNA–protein structures. The RBMX protein has both SR‐rich and RGG‐type regions, of which the latter docked to the model complex RNA (WEC) published as a binding partner (a). Zooming into the RNA–protein contacts shows that four arginine residues play a key role by establishing numerous polar contacts to both the sugar‐phosphate backbone and the nucleobases of the complex RNA (b). Although its exact RNA partners are unknown, the Arg/Glu/Ser‐rich RNA‐binding disordered segments of LUC7L3 are likely to be helical and dock stably into a model RNA molecule that binds helical peptides (c).
FIGURE 4 Atomic‐level details of RNA–protein interactions. The importance of arginine (Arg) lies in the wealth of molecular interactions it can establish: electrostatic interactions, pi–pi stacking, and dedicated H‐bonds, preferably toward the Hoogsteen edge of guanosine (G) nucleobases (a). It can bind the sugar‐phosphate backbone, the nucleobases, or both simultaneously (b). Optimal coordination of nucleobases can only be achieved through consecutive amino acids that provide both pi‐stacking and H‐bonding interactions, as shown by examples from the Protein Data Bank (c).
FIGURE 5 Representative structures of structurally different classes of DNA binding disordered protein segments: (a) Double helix binding helical/helix‐containing motif (basic leucine zipper (bZIP): PDB 1JNM). (b) Loop‐like binding (AT‐hook: PDB 2EZD). (c) Capping motif (G‐quadruplex capping: PDB 6Q6R). [Correction added on 9 February 2022 after first online publication: Figure 5 has been updated; Figure 1 was incorrectly published as Figure 5.]
FIGURE 6 Examples of chemical moieties found in synthetic small molecule RNA ligands (a) compared to amino acid side chains in RNA binding proteins and other biomolecules (b). Guanidine and amidine groups fulfill a special role in both categories, capable of pi‐stacking, charge interactions, and H‐bonding to the RNA nucleobases at the same time.
FIGURE 7 Binding sites of small molecules (blue), disordered peptides (red), or both (magenta) observed in published PDB structures (* marks examples that bind RNA and DNA similarly and were crystallized with the latter).
FIGURE 8 The role of multivalency in disordered protein–RNA interactions. Proteins carrying numerous RNA‐binding tandem repeats can engage multiple RNA molecules at their structurally matching sites to form stochastic complexes (a). These complexes endow the cells with numerous advantages (b): They allow the formation of organelles through liquid–liquid phase separation. Specific, repetitive proteins can recruit specific sets of RNA into these complexes, leaving others out. RNAs and disordered proteins can exist in a symbiotic relationship, properly folding only in the presence of their partner. Last but not least, these complexes can contain enzymatic components that process the recruited nucleic acids.