Transcriptome-scale RNase-footprinting of RNA-protein complexes.

Zhe Ji1,2, Ruisheng Song1, Hailiang Huang2,3, Aviv Regev2,4,5, Kevin Struhl1.   

Abstract

Ribosome profiling is widely used to study translation in vivo, but not all sequence reads correspond to ribosome-protected RNA. Here we describe Rfoot, a computational pipeline that analyzes ribosomal profiling data and identifies native, nonribosomal RNA-protein complexes. We use Rfoot to precisely map RNase-protected regions within small nucleolar RNAs, spliceosomal RNAs, microRNAs, tRNAs, long noncoding (lnc)RNAs and 3' untranslated regions of mRNAs in human cells. We show that RNAs of the same class can show differential complex association. Although only a subset of lncRNAs show RNase footprints, many of these have multiple footprints, and the protected regions are evolutionarily conserved, suggestive of biological functions.

Year:  2016        PMID: 26900662      PMCID: PMC4824641          DOI: 10.1038/nbt.3441

Source DB:  PubMed          Journal:  Nat Biotechnol        ISSN: 1087-0156            Impact factor:   54.908


Target sites for individual RNA-binding proteins have been identified on a transcriptome scale using CLIP-seq (crosslinking and immunoprecipitation-seq) or PAR-CLIP (photoactivatable ribonucleoside-enhanced CLIP) techniques[1,2]. Two transcriptome-scale methods for more comprehensive identification of RNA-protein interactions in vivo have been described. One approach uses UV crosslinking of cells grown in the presence of 4-thiouridine[3,4], but this is limited to short-range interactions whose stereochemistry permits UV crosslinking. The other approach involves RNase footprinting of RNA crosslinked with formaldehyde[5]. Both transcriptome-scale approaches map the regions of RNA bound by proteins in the context of the RNA-protein complex, but they do not identify the specific proteins involved. In addition, both methods identify bound regions on a population basis, not at the level of individual molecules, and hence cannot distinguish between different complexes associated with the same region of RNA.
Sequencing of ribosome-protected RNA, known as ribosome profiling, has been used widely to examine translation in vivo[6]. In this procedure, cell extracts are treated with RNase I to degrade all non-protected RNA, and the resulting material is subjected to velocity sedimentation through sucrose to enrich for material > 7–10S (corresponding to a 100–200 kDa globular protein) while removing degraded RNA and other low-molecular-weight material. In the course of ribosome profiling experiments, we and others noted that many sequencing reads do not correspond to translated regions. Ribosomes are not specifically selected during the biochemical isolation procedure, and therefore non-ribosomal RNA-protein complexes should also be present. In ribosome profiling, sequencing reads derived from ribosomes span the entire translated region and show 3-nt periodicity (Fig. 1a).
In contrast, sequencing reads corresponding to RNase footprints of non-ribosomal RNA-protein complexes should be highly localized (Fig. 1a,b). Each RNA species has a percentage of maximum entropy (PME) value that reflects the degree of localization of sequence reads within this RNA (0 represents highly localized reads and 1 represents a uniform distribution across the gene), and different types of RNA–protein complexes have different PME values (Fig. 1b).
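To make the PME measure concrete, the following is a minimal Python sketch, not the Rfoot implementation. It assumes that PME is the Shannon entropy of the per-nucleotide read distribution within one window (60 nt here, following Fig. 1b) divided by the maximum entropy for that window length. The function name `pme` and the normalization by the logarithm of the window length are illustrative assumptions.

```python
import math


def pme(counts):
    """Entropy of the read distribution divided by the maximum possible entropy.

    counts: per-nucleotide read counts for one window (assumed input format).
    Returns a value between 0 (all reads at one position) and 1 (uniform).
    """
    total = sum(counts)
    n_positions = len(counts)
    if total == 0 or n_positions < 2:
        return 0.0
    entropy = 0.0
    for c in counts:
        if c > 0:
            p = c / total
            entropy -= p * math.log(p)
    # Assumption: the maximum entropy is that of a uniform distribution
    # over every position in the window.
    return entropy / math.log(n_positions)


# Hypothetical 60-nt windows, for illustration only.
uniform_window = [5] * 60                # reads spread evenly
localized_window = [0] * 59 + [300]      # every read at a single position

print(pme(uniform_window))
print(pme(localized_window))
```

Under these assumptions a window whose reads pile up at one position scores 0 and a window with evenly spread reads scores 1, matching the interpretation of the two extremes given in the text.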
Fig. 1

Identifying non-ribosomal protein associated footprints

(a) Read distribution pattern in translated ORFs and non-ribosomal RNA-protein complexes. (b) Distribution of PME values across transcripts (60nt window). (c) Read fragment length of RNase footprints in types of transcripts. (d) Fraction (in percent) of the various types of RNA-protein complexes.

Based on these considerations, we developed a computational pipeline, Rfoot (Supplementary Code), to systematically identify RNA regions protected by non-ribosomal protein complexes. Specifically, Rfoot searches for protected RNA regions with at least 10 sequencing reads that are highly localized and do not show 3-nt periodicity. Rfoot is distinct from the standard peak-detecting methods used in ChIP-seq and CLIP-seq analyses, which identify DNA or RNA regions, respectively, bound by proteins. Rfoot considers read distribution patterns and distinguishes RNA protected by ribosomes, which accounts for the majority of sequence reads, from RNA protected by non-ribosomal complexes. Unlike analyses of ChIP-seq and CLIP-seq data, which require peak detection methods to map bound regions from a population of molecules of varying size whose endpoints lie at varying distances from the protected region, each sequencing read in Rfoot analysis corresponds directly to the fully protected region of an individual RNA-protein complex.
Rfoot analysis of our previous ribosome profiling data[7] from two isogenic human cancer cell models (Src-inducible mammary epithelial and Ras-dependent fibroblast)[8] reveals that 11.3% of the sequencing reads correspond to non-ribosomal RNA-protein complexes. Protected RNA regions, and presumably RNA-protein complexes, are observed for virtually all types of cytoplasmic and nuclear RNAs: mRNAs (3’ UTRs); lncRNAs; small nucleolar (sno) RNAs; spliceosomal RNAs; microRNAs; and tRNAs. Detection of a given RNA–protein complex depends on the abundance of the RNA, the fraction of RNA stably bound by proteins throughout the experimental procedure, and the total number of sequencing reads. Although the sequencing depth used here is sufficient to identify RNA–protein complexes from all RNA classes, greater sequencing depth would likely reveal additional complexes involving mRNAs, miRNAs or lncRNAs that are poorly expressed.
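The three criteria stated above (at least 10 reads, highly localized, no 3-nt periodicity) can be sketched as a simple filter. This is a hedged illustration only and does not reproduce the Supplementary Code: only the 10-read minimum comes from the text. The PME cutoff, the frame-fraction cutoff, the minimum codon count, and all function names are hypothetical choices made for the example.

```python
import math


def pme(counts):
    """Normalized Shannon entropy of per-nucleotide read counts (assumed PME form)."""
    total = sum(counts)
    if total == 0 or len(counts) < 2:
        return 0.0
    entropy = -sum((c / total) * math.log(c / total) for c in counts if c > 0)
    return entropy / math.log(len(counts))


def dominant_frame_fraction(counts):
    """Fraction of reads falling in the most populated of the three frames."""
    frame_totals = [0, 0, 0]
    for position, c in enumerate(counts):
        frame_totals[position % 3] += c
    total = sum(frame_totals)
    return max(frame_totals) / total if total else 0.0


def occupied_codons(counts):
    """Number of consecutive 3-nt blocks that contain at least one read."""
    return sum(1 for i in range(0, len(counts), 3) if any(counts[i:i + 3]))


def is_candidate_footprint(counts, min_reads=10, max_pme=0.6,
                           periodic_fraction=0.7, min_codons=5):
    """Apply the three criteria to one window of per-nucleotide read counts.

    min_reads follows the text; the other thresholds are hypothetical.
    """
    if sum(counts) < min_reads:
        return False                      # too few reads
    if pme(counts) > max_pme:
        return False                      # reads too evenly spread
    # A single pile of reads trivially lands in one frame, so the sketch
    # only calls a window periodic when reads cover several codons.
    periodic = (occupied_codons(counts) >= min_codons
                and dominant_frame_fraction(counts) >= periodic_fraction)
    return not periodic


# Hypothetical windows, for illustration only.
footprint = [0] * 20 + [30, 40, 35, 25] + [0] * 36
short_translated = [10 if i in (0, 3, 6, 9, 12) else 0 for i in range(60)]

print(is_candidate_footprint(footprint))
print(is_candidate_footprint(short_translated))
```

The first window passes all three checks, whereas the second is rejected because its reads fall in a single frame across five consecutive codons. Treating periodicity as a separate test after localization is a design choice of this sketch, not a statement about how Rfoot orders its steps.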
As expected, different types of RNA–protein complexes protect different lengths of RNAs (Fig. 1c), and the same complexes are observed when translation is inhibited by either cycloheximide or harringtonine.
Small nucleolar (sno) RNAs are primarily nuclear, with the C/D box snoRNAs guiding methylation and the H/ACA box class guiding pseudouridylation of other RNAs[9]. We identified RNase footprints for 112 C/D box RNAs and 68 H/ACA box RNAs (Table S1), which represent almost all expressed snoRNAs. The protected region of C/D type snoRNAs covers the stem loop structure between the C motif (UGAUGA) and D motif (CUGA) (Fig. 2a,b). The region between C/D motifs forms an RNA duplex with the methylation site of the target RNA[10], and is bound by C/D ribonucleoproteins[9]. Notably, although C/D box snoRNAs can form symmetric stem loop structures (Fig. 2a), the protected region covers the left arm of SNORD105, the right arm of SNORD110, both arms of SNORD113–9, and the middle D and C motifs from different arms of SNORD87 (Fig. 2b). For H/ACA type snoRNAs, the protected regions flank the H box (ANANNA), the single-stranded region linking two stem loop structures, and the ACA box located in the tail region (Fig. 2c,d). These motifs are bound by the H/ACA ribonucleoproteins[9]. Reads in SNORA23 are mostly in the H box (Fig. 2d), whereas reads in SNORA3 are more associated with the ACA box (Fig. 2d).
Thus, it appears that RNA–protein complexes within an individual snoRNA class can have different stabilities or conformations.
Fig. 2

Footprinted regions on various classes of RNA

(a) Structure of C/D box snoRNAs. (b) Read distribution of the indicated C/D box snoRNAs with respect to the C and D motifs. (c) Structure of H/ACA box snoRNAs. (d) Read distribution of the indicated H/ACA box snoRNAs with respect to the H and ACA motifs. Read distribution in (e) RNU11 and (f) RNU12 spliceosomal RNAs with respect to the indicated motifs and secondary structures. Read distribution in (g) chr1.tRNA9-ArgUCU and (h) chr12.tRNA2-SerCGA tRNAs with respect to the D and TΨC loops. (i) Read distribution in the MALAT1 lncRNA along with protected regions and phastCons scores based on 44-vertebrate Multiz alignment. Read distributions in the indicated cell types, fragment lengths, and RNA structures in two protected regions are shown. The two fragment length peaks in the protected region on the right indicate structurally and/or conformationally distinct RNA-protein complexes. (j) Distribution of mean phastCons scores around lncRNA RNase footprints.

Spliceosomal RNAs associate with spliceosomal proteins to form small nuclear ribonucleoprotein particles (snRNPs) that are critical for RNA splicing[11], and we detected RNase footprints for all types of spliceosomal RNAs (Table S1). For RNU11, the protected region is mainly associated with the Sm site (Fig. 2e), a conserved sequence (consensus AUUUGUGG) bound by the SMN complex[12]. For RNU12, protected regions are observed both for the Sm site and the 5’ hairpin structure (Fig. 2f) that interacts with branch points of pre-mRNA[12].
We detected RNase footprints for almost all expressed tRNAs (157; Table S1). The protected regions are located in the D loop and TΨC loop. The D loop is recognized by aminoacyl-tRNA synthetases[13], whereas the TΨC loop is important for ribosome binding[14]. The read distribution between these loops varies among tRNAs. For example, more sequencing reads are observed for the D loop of tRNA9 on chromosome 1 (Figs. 2g, S1a), whereas more are observed for the TΨC loop of tRNA2 on chromosome 12 (Figs. 2h, S1b). Thus, as observed for snoRNAs, tRNA–protein complexes can have different stabilities or conformations.
We detected RNase-protected regions for 12 miRNAs (Table S1) that cover the mature microRNA (Fig. S2a,b). If one transcript encodes two mature miRNAs (e.g., miR21 and miR21*), sequence reads were observed over both mature miRNAs (Fig. S2c). The RNA-induced silencing complex (RISC) may bind to these regions, but it is unknown why RNase footprints are not detected for most expressed miRNAs.
The fact that mRNAs are associated with ribosomes makes it difficult to identify non-ribosomal RNA–protein complexes that interact with protein-coding or non-canonical translated regions. In this regard, we found 95 protected RNA regions in 3’ UTRs of 69 mRNAs (Table S1). For example, the protected RNA sequence in the AMD1 3’ UTR forms a stable hairpin structure (Fig. S3).
Some lncRNAs interact with polycomb proteins, and it has been suggested that these interactions affect chromatin structure and transcription[15,16]. Although we detect RNase footprints for only 87 (8%) of expressed lncRNAs, this is five times as many footprints as observed for 3’ UTRs, even though the number of nucleotides in 3’ UTRs is higher than in lncRNAs. Moreover, in this subset of 87 lncRNAs, we identified 208 non-ribosomal binding sites (Table S1), an average of 2.4 footprints per lncRNA. For example, the telomerase component TERC contains 3 non-ribosomal protein-binding sites (Fig. S4a) that cover the H- and CAB-boxes of the scaRNA domain and a 5’ single-stranded region (Fig. S4b), whereas MALAT1 shows several RNase footprints at regions tending to form RNA hairpin structures (Fig. 2i). Notably, one MALAT1 region shows two distinct RNase footprints as defined by different protected fragment lengths (Fig. 2i), and a similar situation occurs at other lncRNAs (e.g., Fig. S5). Distinct RNase footprints over the same region could reflect completely different or related RNA–protein complexes or alternative conformations of the same complex. In addition, some RNA–protein complexes are cell-type specific (Figs. 2i, S5). Considering all RNase footprints in lncRNAs, phastCons scores based on 44-vertebrate Multiz alignment[17] of nucleotide sequences reveal that the conservation level is about 2-fold higher than that of surrounding sequences (Fig. 2j; Wilcoxon rank-sum test, P-value < 10^−19). Taken together, these observations suggest that RNase footprints in lncRNAs may represent RNA-protein complexes that carry out biological functions.
Our experimental method differs from a transcriptome-scale RNase footprinting approach described previously[5], and it is advantageous in several respects. First, by avoiding crosslinking, we are able to identify native RNA-protein complexes. Crosslinking can cause artifacts, although it also enables the detection of less stable complexes.
Second, whole-cell extracts are subject to a crude purification step that enriches for RNA–protein complexes and removes degraded RNA, thereby eliminating sequence reads corresponding to RNA not associated with proteins. In principle, distinct RNA-protein complexes could be enriched by fractionation based on molecular weight or by immunoprecipitation with an antibody against a specific protein (analogous to CLIP-seq). In addition, factors important for RNase footprints can be identified by comparing cells depleted of an individual factor with their wild-type counterparts. Third, each sequencing read corresponds to a complete protected region for an individual RNA molecule. By examining the size distribution of the protected region of individual RNase footprints, we detected distinct RNA–protein complexes for some footprints of MALAT1 and several other lncRNAs. In contrast, RNase footprints obtained with the previous approach represent averages over many molecules such that distinct RNA-protein complexes cannot be detected.
Our method can be applied to existing and future ribosome profiling datasets to detect RNase footprints of non-ribosomal RNA-protein complexes. In this regard, we performed Rfoot analysis on published ribosome profiling datasets from mouse cell lines[18,19]. In accord with our results in human cells, 14.5% of the sequencing reads correspond to non-ribosomal RNA–protein complexes, and the PME profiles of the mouse (Fig. S6a) and human (Fig. 1b) samples are similar. Furthermore, RNA–protein complexes representing all types of RNA species are identified in these mouse cell lines, and the relative proportions of these types of complexes are roughly comparable to what we observed in human cells (compare Fig. 1d with Fig. S6b). The ability to analyze translation (ribosome footprints) and non-ribosomal RNA–protein complexes in the same sample is not available with other methods.
Lastly, we note that most of the RNA–protein complexes identified here have not been described previously. As such, our method represents a distinct and complementary approach to identifying RNA–protein complexes on a transcriptome scale.
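The conservation comparison reported above for lncRNA footprints (phastCons scores inside footprints versus surrounding sequence, assessed by a Wilcoxon rank-sum test) can be illustrated with a small pure-Python sketch. The scores below are invented for the example and are not data from the paper. The function `rank_sum_test` is a hypothetical helper that uses the normal approximation without tie correction; in practice a library routine such as `scipy.stats.ranksums` would normally be used instead.

```python
import math


def rank_sum_test(x, y):
    """Two-sided Wilcoxon rank-sum test using the normal approximation.

    Returns (z, p). Ties receive midranks; no tie correction is applied
    to the variance, so this is only a rough sketch for small examples.
    """
    n1, n2 = len(x), len(y)
    if n1 == 0 or n2 == 0:
        raise ValueError("both samples must be non-empty")
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    rank_sum_x = 0.0
    i = 0
    while i < len(pooled):
        j = i
        while j < len(pooled) and pooled[j][0] == pooled[i][0]:
            j += 1
        midrank = (i + 1 + j) / 2          # mean of the 1-based ranks i+1..j
        rank_sum_x += midrank * sum(1 for k in range(i, j) if pooled[k][1] == 0)
        i = j
    expected = n1 * (n1 + n2 + 1) / 2
    sd = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (rank_sum_x - expected) / sd
    p = math.erfc(abs(z) / math.sqrt(2))   # two-sided normal tail probability
    return z, p


# Invented mean conservation scores, for illustration only.
footprint_scores = [0.82, 0.91, 0.75, 0.88, 0.69, 0.95, 0.80, 0.77]
flanking_scores = [0.35, 0.42, 0.51, 0.28, 0.60, 0.33, 0.47, 0.39]

fold = (sum(footprint_scores) / len(footprint_scores)) / (
    sum(flanking_scores) / len(flanking_scores))
z, p = rank_sum_test(footprint_scores, flanking_scores)
print(f"fold difference in mean score: {fold:.2f}")
print(f"z = {z:.2f}, two-sided p = {p:.2g}")
```

A positive z here means that the first sample (footprints) tends to rank higher than the second (flanking sequence). With real data the samples would be far larger than in this toy example.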

METHODS

Methods and any associated references are available in the online version of the paper.
References (19 in total)

Review 1.  Recognizing the D-loop of transfer RNAs.

Authors:  T L Hendrickson
Journal:  Proc Natl Acad Sci U S A       Date:  2001-11-20       Impact factor: 11.205

2.  An early evolutionary origin for the minor spliceosome.

Authors:  Anthony G Russell; J Michael Charette; David F Spencer; Michael W Gray
Journal:  Nature       Date:  2006-10-19       Impact factor: 49.962

Review 3.  Spliceosome structure and function.

Authors:  Cindy L Will; Reinhard Lührmann
Journal:  Cold Spring Harb Perspect Biol       Date:  2011-07-01       Impact factor: 10.005

4.  Sequence and structural elements of methylation guide snoRNAs essential for site-specific ribose methylation of pre-rRNA.

Authors:  Z Kiss-László; Y Henry; T Kiss
Journal:  EMBO J       Date:  1998-02-02       Impact factor: 11.598

5.  A transcriptional signature and common gene networks link cancer with lipid metabolism and diverse human diseases.

Authors:  Heather A Hirsch; Dimitrios Iliopoulos; Amita Joshi; Yong Zhang; Savina A Jaeger; Martha Bulyk; Philip N Tsichlis; X Shirley Liu; Kevin Struhl
Journal:  Cancer Cell       Date:  2010-04-13       Impact factor: 31.743

6.  Evolutionarily conserved elements in vertebrate, insect, worm, and yeast genomes.

Authors:  Adam Siepel; Gill Bejerano; Jakob S Pedersen; Angie S Hinrichs; Minmei Hou; Kate Rosenbloom; Hiram Clawson; John Spieth; Ladeana W Hillier; Stephen Richards; George M Weinstock; Richard K Wilson; Richard A Gibbs; W James Kent; Webb Miller; David Haussler
Journal:  Genome Res       Date:  2005-07-15       Impact factor: 9.043

Review 7.  Non-coding RNAs: lessons from the small nuclear and small nucleolar RNAs.

Authors:  A Gregory Matera; Rebecca M Terns; Michael P Terns
Journal:  Nat Rev Mol Cell Biol       Date:  2007-03       Impact factor: 94.444

8.  Chemical probing of the tRNA--ribosome complex.

Authors:  D A Peattie; W Herr
Journal:  Proc Natl Acad Sci U S A       Date:  1981-04       Impact factor: 11.205

9.  Transcriptome-wide identification of RNA-binding protein and microRNA target sites by PAR-CLIP.

Authors:  Markus Hafner; Markus Landthaler; Lukas Burger; Mohsen Khorshid; Jean Hausser; Philipp Berninger; Andrea Rothballer; Manuel Ascano; Anna-Carina Jungkamp; Mathias Munschauer; Alexander Ulrich; Greg S Wardle; Scott Dewell; Mihaela Zavolan; Thomas Tuschl
Journal:  Cell       Date:  2010-04-02       Impact factor: 41.582

10.  Genome-wide analysis in vivo of translation with nucleotide resolution using ribosome profiling.

Authors:  Nicholas T Ingolia; Sina Ghaemmaghami; John R S Newman; Jonathan S Weissman
Journal:  Science       Date:  2009-02-12       Impact factor: 47.728

