Hongjing Deng1, Jitender Cheema2, Hang Zhang3, Hugh Woolfenden3, Matthew Norris3, Zhenshan Liu3, Qi Liu3, Xiaofei Yang3, Minglei Yang3, Xian Deng4, Xiaofeng Cao5, Yiliang Ding6. 1. State Key Laboratory of Plant Genomics and National Center for Plant Gene Research, CAS Center for Excellence in Molecular Plant Sciences, Institute of Genetics and Developmental Biology, Chinese Academy of Sciences, Beijing 100101, China; Department of Cell and Developmental Biology, John Innes Centre, Norwich Research Park, Norwich NR4 7UH, UK; College of Life Sciences, University of Chinese Academy of Sciences, 100049, Beijing, China; CAS-JIC Centre of Excellence for Plant and Microbial Science (CEPAMS), John Innes Centre, Norwich Research Park, Norwich NR4 7UH, UK. 2. Department of Computational and Systems Biology, John Innes Centre, Norwich Research Park, Norwich NR4 7UH, UK. 3. Department of Cell and Developmental Biology, John Innes Centre, Norwich Research Park, Norwich NR4 7UH, UK. 4. State Key Laboratory of Plant Genomics and National Center for Plant Gene Research, CAS Center for Excellence in Molecular Plant Sciences, Institute of Genetics and Developmental Biology, Chinese Academy of Sciences, Beijing 100101, China. 5. State Key Laboratory of Plant Genomics and National Center for Plant Gene Research, CAS Center for Excellence in Molecular Plant Sciences, Institute of Genetics and Developmental Biology, Chinese Academy of Sciences, Beijing 100101, China; CAS-JIC Centre of Excellence for Plant and Microbial Science (CEPAMS), John Innes Centre, Norwich Research Park, Norwich NR4 7UH, UK. Electronic address: xfcao@genetics.ac.cn. 6. Department of Cell and Developmental Biology, John Innes Centre, Norwich Research Park, Norwich NR4 7UH, UK; CAS-JIC Centre of Excellence for Plant and Microbial Science (CEPAMS), John Innes Centre, Norwich Research Park, Norwich NR4 7UH, UK. Electronic address: yiliang.ding@jic.ac.uk.
Abstract
RNA secondary structure plays a critical role in gene regulation. Rice (Oryza sativa) is one of the most important food crops in the world. However, RNA structure in rice has scarcely been studied. Here, we have successfully generated in vivo Structure-seq libraries in rice. We found that the structural flexibility of mRNAs might associate with the dynamics of biological function. Higher N6-methyladenosine (m6A) modification tends to have less RNA structure in 3' UTR, whereas GC content does not significantly affect in vivo mRNA structure to maintain efficient biological processes such as translation. Comparative analysis of RNA structurome between rice and Arabidopsis revealed that higher GC content does not lead to stronger structure and less RNA structural flexibility. Moreover, we found a weak correlation between sequence and structure conservation of the orthologs between rice and Arabidopsis. The conservation and divergence of both sequence and in vivo RNA structure corresponds to diverse and specific biological processes. Our results indicate that RNA secondary structure might offer a separate layer of selection to the sequence between monocot and dicot. Therefore, our study implies that RNA structure evolves differently in various biological processes to maintain robustness in development and adaptational flexibility during angiosperm evolution.
RNA secondary structure plays a critical role in gene regulation. Rice (Oryza sativa) is one of the most important food crops in the world. However, RNA structure in rice has scarcely been studied. Here, we have successfully generated in vivo Structure-seq libraries in rice. We found that the structural flexibility of mRNAs might associate with the dynamics of biological function. Higher N6-methyladenosine (m6A) modification tends to have less RNA structure in 3' UTR, whereas GC content does not significantly affect in vivo mRNA structure to maintain efficient biological processes such as translation. Comparative analysis of RNA structurome between rice and Arabidopsis revealed that higher GC content does not lead to stronger structure and less RNA structural flexibility. Moreover, we found a weak correlation between sequence and structure conservation of the orthologs between rice and Arabidopsis. The conservation and divergence of both sequence and in vivo RNA structure corresponds to diverse and specific biological processes. Our results indicate that RNA secondary structure might offer a separate layer of selection to the sequence between monocot and dicot. Therefore, our study implies that RNA structure evolves differently in various biological processes to maintain robustness in development and adaptational flexibility during angiosperm evolution.
RNA serves a variety of roles in a wide array of different biological processes and its secondary structure controls many aspects of its synthesis, metabolism, and regulatory pathways (Wan et al., 2011, Vandivier et al., 2016). RNA secondary structure can be changed by RNA modification to affect biological processes (Harcourt et al., 2017). It has been reported in mammals that N6-methyladenosine (m6A) can function as a switch of RNA secondary structure in post-transcriptional regulation (Liu et al., 2015, Roost et al., 2015, Spitale et al., 2015). However, such a relationship is not revealed in plants. Over the past 60 years structure determination relied on traditional time-consuming and labor-intensive techniques limited to individual RNA species, such as X-ray crystallography, nuclear magnetic resonance, cryo-electron microscopy, gel electrophoresis, and capillary electrophoresis (Bevilacqua et al., 2016). Application of next-generation sequencing (NGS) technologies allowed the first genome-wide profiling of RNA secondary structure using enzymatic modification by RNase V1 and RNase S1, but such approaches were limited to in vitro conditions (Kertesz et al., 2010, Underwood et al., 2010, Li et al., 2012a, Li et al., 2012b, Wan et al., 2012, Wan et al., 2014). The in vivo environment of the cell impacts heavily upon the nature of RNA secondary structure, not least because of the many proteins present in the cell that bind and modify RNA. The use of chemicals that can penetrate the cell and efficiently modify RNA in a measurable fashion has provided the first insight into genome-wide profiling of RNA secondary structure in vivo (Ding et al., 2014, Rouskin et al., 2014, Talkish et al., 2014, Spitale et al., 2015, Burkhardt et al., 2017). One of the common chemicals used for in vivo genome-wide probing is dimethyl sulfate (DMS), which specifically alkylates the Watson–Crick face of adenine (A) and cytosine (C) in the single-stranded region (Ding et al., 2014, Rouskin et al., 2014, Talkish et al., 2014).By combining the action of DMS with NGS, the first in vivo genome-wide RNA secondary structure in plants was achieved in Arabidopsis thaliana (Ding et al., 2014). This study defined several novel RNA structural features that appeared to affect multiple aspects of RNA function and processing. A three-nucleotide periodic pattern was found, but this was limited to the coding sequence (CDS), and it was proposed that this may affect translation efficiency. Those mRNAs associated with abiotic stress responses were generally more single-stranded and this was proposed to maintain greater RNA structural flexibility, facilitating responses to various environmental conditions (Ding et al., 2014). Thus, it was proposed that RNA secondary structure may be important for environmental adaptation, and this is consistent with recent studies demonstrating that all levels of mRNA processing, such as translation, splicing, and degradation, are affected by RNA structure (Bevilacqua et al., 2016).Rice (Oryza sativa L.) is one of the most important food crops in the world. While genetic and genomic approaches have achieved many impressive scientific improvements in recent years, the nature of RNA secondary structure in rice remains poorly understood. Several studies have only implied the potential role of RNA secondary structure in rice under cold stress (Chaikam and Karlson, 2008, Kim et al., 2010). Rice has emerged as an important model for monocotyledonous plants. Genome-wide m6A modification has been revealed in rice (Li et al., 2014), but the relationship of m6A and RNA secondary structure has yet to be revealed. Rice and Arabidopsis diverged from a common ancestor about 150 million years ago (Chaw et al., 2004). Comparative sequence analyses between rice and Arabidopsis revealed a major difference in the GC content of genomes, with a bimodal distribution of GC content in rice but a unimodal pattern in Arabidopsis (Carels and Bernardi, 2000, Yu et al., 2002, Wang et al., 2004). It has been proposed using in silico prediction that a region with higher GC content is likely to have a more stable RNA secondary structure (Chan et al., 2009). However, the effect of GC content on in vivo RNA secondary structure remains unknown. In addition, mRNAs associated with stress responses showed significant expression divergence between rice and Arabidopsis (Mustroph et al., 2010, Narsai et al., 2010), perhaps due to different tolerance to environmental stress. Transcriptional regulation is a way to regulate gene expression. However, the mRNA expression levels only partially correlated with protein levels (Vogel and Marcotte, 2012), hence post-transcriptional regulation must contribute and may provide a means for rapid regulation of protein levels under different environmental stresses (Floris et al., 2009). How RNA structurome in rice and Arabidopsis plays a role in such expression divergence, however, is still unknown.Conservation and divergence in RNA structurome may provide an as yet unexplored modality of evolutionary adaptation, and the degree of diversity in RNA structurome between species has yet to be explored. To better understand the relative contribution of RNA structurome to the expression divergence between species (Mustroph et al., 2010, Narsai et al., 2010), we attempted to characterize RNA structurome of rice and to use this in a comparative analysis with Arabidopsis. We successfully generated in vivo Structure-seq libraries to explore the in vivo RNA secondary structurome in rice, which covered 30 848 transcripts, about half of the annotated transcripts in rice. We found that mRNAs with regulatory functions tend to fold into more thermostable structures while those mRNAs associated with functions in cell redox homeostasis, signal transduction, and photosynthesis tend to fold into less thermostable structures. We also confirmed that higher m6A modification tends to have less RNA structure in the 3′ UTR in plants. A comparison of RNA structurome between rice and Arabidopsis found that GC content does not significantly affect in vivo mRNA secondary structurome in either rice or Arabidopsis. Conservation and divergence in both sequence and RNA secondary structure correlate with diverse and specific biological processes. We thus propose that evolutionary selection can not only modify the sequence to regulate gene function but also alter RNA structure to regulate gene expression.
Results
Genome-wide In Vivo Profiling Provides RNA Secondary Structurome with Nucleotide-Resolution Coverage Across 30 848 Transcripts
We optimized DMS treatment conditions for rice seedlings, showing that a 2.5% (v/v) DMS treatment at 28°C achieved efficient RNA modifications while maintaining single-kinetic reactions (Supplemental Figure 1). We then generated two independent biological libraries, with and without DMS, following the established Structure-seq library construction pipeline (Ding et al., 2014) with two rounds of poly(A) selection (Figure 1A). We generated more than 500 million 50-nt single-end reads and achieved nearly 90% reads mapped to the transcriptomic reference in rice (Table 1). We first evaluated the modification specificity by assessing the correlation of modified nucleotide positions on both 18S rRNA and 25S rRNA with well-identified RNA secondary structure information. The Pearson correlation coefficient (PCC) is high between biological replicates while the PCC is low between (−) DMS and (+) DMS libraries for each biological replicate (Supplemental Table 1). High correlation between biological replicates indicates highly reproducible results while low correlation between (−) DMS and (+) DMS libraries indicates the specific chemical modification by DMS. Moreover, the PCC of mRNA abundance between two biological replicates is about 0.98 (Figure 1B), confirming high correlation of mRNA abundance between biological replicates. Thus, we merged the two replicates for further analysis, which yielded a high enrichment of mRNAs (more than 95%) (Figure 1C). The degree of nucleotide modifications by DMS treatment were specific to A and C (Figure 1D), consistent with the mode of action of DMS. With high sequencing depths, we were able to achieve nucleotide-resolution coverage for 30 848 transcripts, nearly half of the transcriptome in rice (Figure 1E). To further validate the quality of our in vivo Structure-seq libraries in rice, we compared the DMS modification patterns between Structure-seq and conventional gel-based structural probing of both 18S rRNA and 25S rRNA. The deep-sequencing results agree well with conventional gel-based structural probing (PCC = 0.64 and 0.89, respectively, Supplemental Figure 2). These validations and assessments on the Structure-seq libraries confirm our in vivo RNA structurome of rice with both high quality and depths.
Figure 1
Overview of Structure-Seq Pipeline and Library Quality in Rice.
(A) Five-day-old rice etiolated seedlings are probed with DMS in vivo at 28°C. DMS prefers to modify A (red) and C (blue) in the single-stranded region. The adducted methyl group (green circle) will block the RT and cDNA synthesis will stop one nucleotide before the modification sites (green star). After single-stranded DNA (ssDNA) adapter (blue line) ligation and PCR, double-stranded DNA will be used for single-end sequencing. Dark-green and light-green lines represent index primers used in (+) DMS and (−) DMS libraries, respectively. Red line represents universal forward primer.
(B) Correlation of mRNA abundance in two biological replicates in (−) DMS (top) and (+) DMS (bottom) libraries.
(C) Mapping percentage of RNA types in (−) DMS (top) and (+) DMS (bottom) libraries.
(D) Nucleotide occurrence is shown in (−) DMS and (+) DMS libraries.
(E) The distribution of coverage in (+) DMS library of rice. Transcripts without coverage are not shown. Stop/n, stop per nucleotide.
Table 1
Mapping Summary of Structure-Seq Libraries in Rice.
Samples
Total reads
Mapped reads
Mapped ratio (%)
(−) DMS biological replicate 1
136 505 727
120 473 632
88.26
(−) DMS biological replicate 2
113 849 099
100 463 751
88.24
(+) DMS biological replicate 1
126 290 666
111 583 286
88.35
(+) DMS biological replicate 2
127 387 317
114 288 995
89.72
Overview of Structure-Seq Pipeline and Library Quality in Rice.(A) Five-day-old rice etiolated seedlings are probed with DMS in vivo at 28°C. DMS prefers to modify A (red) and C (blue) in the single-stranded region. The adducted methyl group (green circle) will block the RT and cDNA synthesis will stop one nucleotide before the modification sites (green star). After single-stranded DNA (ssDNA) adapter (blue line) ligation and PCR, double-stranded DNA will be used for single-end sequencing. Dark-green and light-green lines represent index primers used in (+) DMS and (−) DMS libraries, respectively. Red line represents universal forward primer.(B) Correlation of mRNA abundance in two biological replicates in (−) DMS (top) and (+) DMS (bottom) libraries.(C) Mapping percentage of RNA types in (−) DMS (top) and (+) DMS (bottom) libraries.(D) Nucleotide occurrence is shown in (−) DMS and (+) DMS libraries.(E) The distribution of coverage in (+) DMS library of rice. Transcripts without coverage are not shown. Stop/n, stop per nucleotide.Mapping Summary of Structure-Seq Libraries in Rice.Prior to in vivo RNA structure studies, attempts were made using in silico approaches, based primarily on thermodynamics, to predict the most stable RNA secondary structure (Lorenz et al., 2011). To better understand RNA structurome in vivo, we folded individual mRNAs with nucleotide-resolution coverage to compare RNA secondary structure with and without in vivo DMS reactivity constraints. We then calculated the positive predictive value (PPV) for each mRNA. PPV is a metric to measure the proportion of base pairs in RNA secondary structure in vivo constrained with DMS reactivity that also appear in RNA secondary structure predicted in silico (Seetin and Mathews, 2012). A high PPV means the in vivo RNA secondary structure is more similar to the in silico-predicted structure, which in turn is a more thermostable RNA secondary structure. A low PPV means the in vivo RNA secondary structure is very different to the in silico-predicted structure, which is a less thermostable RNA secondary structure. An average PPV of 0.54 (data not shown) indicates that most mRNAs did not fold in vivo as in silico.To confirm PPV is a representative metric for measuring RNA structure similarity between the in silico- and in vivo-predicted RNA structure, we also calculated both sensitivity and F1 score. All of these are metrics for measuring the accuracy of RNA secondary structure prediction. Sensitivity measures the proportion of base pairs in RNA secondary structure predicted in silico that also appear in RNA secondary structure in vivo constrained by DMS reactivity (Seetin and Mathews, 2012). F1 score is the harmonic mean of sensitivity and PPV (Lorenz et al., 2011). We found that both sensitivity and F1 score are highly correlated with PPV in rice and Arabidopsis (Supplemental Figure 3). Thus, PPV is representative for measuring RNA structure similarity between in silico- and in vivo-predicted RNA structure in our study.
Genome-wide Relationship between In Vivo RNA Secondary Structure and Biological Processes
We then performed gene ontology (GO) enrichment analysis to understand the relationship between in vivo RNA secondary structure and biological processes. We found that mRNAs with the highest 10% PPV are enriched in genes involved in regulatory processes (Figure 2A and Supplemental Table 2). This means that these genes tend to fold similarly to in silico-predicted structures with more thermostable RNA secondary structures. By contrast, the enrichment of mRNAs with the lowest 10% PPVs are enriched in genes involved in cell redox homeostasis, signal transduction, and photosynthesis (Figure 2A and Supplemental Table 2). These latter RNAs tend to fold very differently from the thermostable structures, with more single-stranded regions than the thermodynamically predicted structures.
Figure 2
The Folded RNA Structurome of Rice In Vivo.
(A) GO enrichment for mRNAs with the highest 10% (top) and lowest 10% (bottom) PPV. There are 29 166 mRNAs for RNA secondary structure folding and PPV calculation.
(B) Examples of mRNAs with high (top) and low (bottom) PPV in descending order, showing the comparison of in vivo (red or blue) and in silico (black) RNA structure predictions in arc diagrams.
The Folded RNA Structurome of Rice In Vivo.(A) GO enrichment for mRNAs with the highest 10% (top) and lowest 10% (bottom) PPV. There are 29 166 mRNAs for RNA secondary structure folding and PPV calculation.(B) Examples of mRNAs with high (top) and low (bottom) PPV in descending order, showing the comparison of in vivo (red or blue) and in silico (black) RNA structure predictions in arc diagrams.Some well-studied and interesting examples with high and low PPV are shown in Figure 2B using arc diagrams depicting RNA secondary structures both in vivo and in silico. An arc diagram is a useful tool for visualizing and comparing base pairs between two RNA secondary structures simultaneously (Lai et al., 2012). mRNA with a high PPV shows a more symmetric pattern between the top and bottom half, which represent in vivo and in silico structure, respectively, while mRNA with a low PPV shows a more asymmetric pattern. Consistent with our GO enrichment for high PPV, we found that several well-known mRNAs with high PPVs are involved in the regulation of macromolecules and gene expression (Figure 2B): FLO7 (Zhang et al., 2016) and GIF1 (Wang et al., 2008), which regulate starch synthesis; OsSPL16 (Wang et al., 2012), a transcription factor regulating grain shape; Badh2 (Chen et al., 2008), encoding a betaine aldehyde dehydrogenase involved in 4-aminobutyric acid biosynthesis; and GS5 (Li et al., 2011), a putative serine carboxypeptidase involved in protein processing. In contrast, some mRNAs with low PPVs are correlated with cell redox homeostasis, signal transduction, and photosynthesis: Se5 (Andres et al., 2009, Hsu et al., 2012, Chen et al., 2013, Li et al., 2013a), involved in chloroplast function; Ugp1 (Chen et al., 2007), involved in pollen cell wall biosynthesis under different temperature; COLD1 (Ma et al., 2015), involved in G-protein signaling for temperature sensing; DST (Li et al., 2013b), involved in hydrogen peroxide (H2O2) homeostasis under drought and salt stress; and Bsr-d1 (Li et al., 2017), involved in H2O2 homeostasis for host immunity. These individual mRNA structures further indicate that the structural flexibility of mRNAs might associate with the dynamics of their biological functions.
Global Characterization of RNA Secondary Structural Features in Rice
With over 30 000 mRNAs with nucleotide-resolution coverage in vivo, we were able to assess global structural features of mRNAs in rice. We averaged DMS reactivity 40 nt upstream of the start codon in 5′ UTR, 100 nt downstream of the start codon in CDS, 100 nt upstream of the stop codon in CDS, and 40 nt downstream of the stop codon in 3′ UTR. We found that the global trend of DMS reactivity along the transcripts in rice (Supplemental Figure 4A) is quite similar to that in Arabidopsis (Ding et al., 2014). The region from −4 nt to +1 nt around the start codon shows particularly high DMS reactivity, reflecting greater single-strandedness and thus less structure (Supplemental Figure 4A). The DMS reactivity in 3′ UTR is higher than that in CDS, implying less structure in 3′ UTR (P value <2.2E−16, Supplemental Figure 4A). We also applied discrete Fourier transform on different genic regions, allowing an assessment of repetitive patterns (Ding et al., 2014). In rice, the three-nucleotide periodicity only exists in CDS and not in UTRs (Supplemental Figure 4C), similar to what we previously observed in Arabidopsis (Ding et al., 2014). The DMS reactivity in 5′ UTR is statistically lower than that in 3′ UTR of rice mRNAs (P value <2.2E−16, Supplemental Figure 4A), indicating that 5′ UTR has a higher structure than 3′ UTR.We applied the same method of Arabidopsis (Ding et al., 2014) to the analysis of RNA structure around alternative splice sites in rice. For each mRNA in our dataset, we identified the spliced and unspliced (including intron retention, exon skipping, and alternative 3'/5′ splicing sites) events. We found stronger structure with lower DMS reactivity region approximately 40 nt upstream of 5′ splice site for the unspliced events (Supplemental Figure 4B). This structural pattern was not found in the same region in the spliced events and 3′ splice site (Supplemental Figure 4B). The result is similar to that in Arabidopsis (Ding et al., 2014), which indicates that the strong structure in the first step of splicing is conserved during the evolution of rice and Arabidopsis.m6A is the most prevalent internal modification present in the mRNAs of most eukaryotes (Yue et al., 2015). As RNA structure can be altered by m6A modification in mammals (Liu et al., 2015, Roost et al., 2015, Spitale et al., 2015), we analyzed the data from m6A immunoprecipitation sequencing in rice (Li et al., 2014). By calculating the correlation of m6A enrichment and Gini index in the whole mRNA, 5′ UTR, CDS and 3′ UTR, we only found a significantly weak and negative correlation in 3′ UTR but not in the whole mRNA or any other regions (Supplemental Table 3). This finding indicates that higher m6A modification is weakly correlated with less RNA structure only in 3′ UTR. Moreover, we found significantly weak and negative correlations between m6A peak density and Gini index in the whole mRNA, CDS, and 3′ UTR (Supplemental Table 3), which indicates that higher m6A peak density tends to be more single stranded, especially in 3′ UTR. Because m6A modification in rice has the highest prevalence in 3′ UTR (Li et al., 2014), we further analyzed the relationship of m6A and RNA structure in 3′ UTR. We found that transcripts with more m6A peaks have significantly lower Gini index (Supplemental Figure 4F), which confirms our results of m6A enrichment and peak density (Supplemental Table 3). Moreover, we observed a significantly different DMS reactivity pattern in the motif (UGUAMM, M = A or C) with and without m6A modification (Supplemental Figure 4G). The region around the motif with m6A modification has significantly higher DMS reactivity than that without m6A (P = 9.444E−13, Supplemental Figure 4G). Thus, we indicate that higher m6A modification tends to have less RNA structure in 3′ UTR in plants, which is consistent with the previous findings in mammals (Liu et al., 2015, Roost et al., 2015, Spitale et al., 2015).
GC Content Does Not Significantly Affect In Vivo mRNA Structure
Considering the high GC content in rice genome and transcriptome (Carels and Bernardi, 2000, Yu et al., 2002, Wang et al., 2004), we assessed whether GC content affects RNA structure in vivo. We averaged the DMS reactivity of mRNAs with the highest 15% mRNA GC content and, separately, lowest 15% mRNA GC content. After comparing DMS reactivity in each genic region between the two groups, we found that the group with the highest GC content does not display significantly lower DMS reactivity than the group with the lowest GC content in the 3′ region of CDS (P = 0.1509) but in the 5′ region of CDS, 5′ UTR, or 3′ UTR (P values are 1.253E−05, 0.0143, and 0.0286, respectively, Figure 3A and 3B). The three-nucleotide periodicity in CDS shows a slightly higher magnitude in the mRNAs with the highest mRNA GC content, compared with those mRNAs with the lowest mRNA GC content (Supplemental Figure 4D). These metagene analyses suggest that GC content does not significantly affect the DMS reactivity in CDS near the stop codon but in UTRs, however, slight effects were observed on the three-nucleotide periodicity in CDS.
Figure 3
The Effect of GC Content and Translation on the Global Pattern of RNA Structurome in Rice.
(A and B) Average GC content (A) and average DMS reactivity (B) of mRNAs with the highest 15% and lowest 15% mRNA GC content with more than one stop per nucleotide.
(C and D) Average GC content (C) and average DMS reactivity (D) of mRNAs with the highest 30% and lowest 30% TEI in TRAP-seq transcripts (19 281 in total) in selected regions of transcripts with more than one stop per nucleotide.
(E–H) 11 847 mRNAs with sufficient coverage for Gini index calculation are used to calculate GC content and Gini index in mRNA (E), 5′ UTR (F), CDS (G), and 3′ UTR (H). After sorting the transcripts from highest to lowest GC content in each region, they are separated into 200 bins before calculating average GC content and Gini index in each bin.
P values between two groups were calculated by Kolmogorov-Smirnov test and shown in the corresponding region in (B–D). The PCC and P values of each comparison are presented at the bottom in (E–H).
The Effect of GC Content and Translation on the Global Pattern of RNA Structurome in Rice.(A and B) Average GC content (A) and average DMS reactivity (B) of mRNAs with the highest 15% and lowest 15% mRNA GC content with more than one stop per nucleotide.(C and D) Average GC content (C) and average DMS reactivity (D) of mRNAs with the highest 30% and lowest 30% TEI in TRAP-seq transcripts (19 281 in total) in selected regions of transcripts with more than one stop per nucleotide.(E–H) 11 847 mRNAs with sufficient coverage for Gini index calculation are used to calculate GC content and Gini index in mRNA (E), 5′ UTR (F), CDS (G), and 3′ UTR (H). After sorting the transcripts from highest to lowest GC content in each region, they are separated into 200 bins before calculating average GC content and Gini index in each bin.P values between two groups were calculated by Kolmogorov-Smirnov test and shown in the corresponding region in (B–D). The PCC and P values of each comparison are presented at the bottom in (E–H).To better investigate the effect of GC content of CDS and other genic regions on individual mRNA secondary structures, we calculated the Gini index of DMS reactivity. Gini index measures the inequality among values within a selected region (Rouskin et al., 2014, Burkhardt et al., 2017). A low Gini index indicates less RNA secondary structure while a high Gini index indicates higher RNA secondary structure. We compared both GC content and Gini index in selected regions in all individual mRNAs with nucleotide-resolution coverage. We found that mRNA GC content did not positively correlate with Gini index; rather, we found a slightly negative correlation (PCC = −0.21, Figure 3E). This further proves that GC content does not strongly affect the in vivo DMS reactivity in mRNAs (Figure 3B). We then separated the whole mRNA into 5′ UTR, CDS, and 3′ UTR. Similarly, there was a negative correlation in CDS (PCC = −0.19, Figure 3G). In UTRs, GC content positively correlated with Gini index, especially in 5′ UTR (PCC are 0.87 and 0.26 in 5′ UTR and 3′ UTR, respectively, Figure 3F and 3H). Therefore, our analysis of Gini index in individual mRNAs further confirms that GC content of the whole mRNA does not strongly affect in vivo RNA secondary structure. However, there might be GC-dependent RNA secondary structure in UTRs.More obvious agreement is found in Arabidopsis that mRNAs with higher GC content display significantly higher DMS reactivity in the whole mRNA (Supplemental Figure 5), which indicates more single-strandedness in mRNAs with higher GC content.Previous studies on predicting functional non-coding RNAs (ncRNAs), such as tRNAs, indicate that GC-rich regions tend to form strong structures (Rivas and Eddy, 2000, Klein et al., 2002). Also, that GC content affects RNA secondary structure has been found through in silico prediction (Chan et al., 2009). We further investigated why mRNAs, particularly CDS, do not tend to fold under the effect of GC content. We hypothesized that this might be because of the notable association with ribosome on the mRNAs and, by inference, translation. Translation, one of the major RNA biological processes, has been shown to be influenced by in vivo RNA secondary structure (Ding et al., 2014, Burkhardt et al., 2017). The study on translation in rice shows that high GC content is correlated to high translation efficiency (Zhao et al., 2017). Therefore, we tested our hypothesis by investigating the effect of translation on in vivo RNA secondary structure. We utilized the previous published translatome in rice from TRAP-seq (translating ribosome affinity purification followed by mRNA-sequencing) (Zhao et al., 2017). Consistent with the previous paper (Zhao et al., 2017), mRNAs with high translation efficiency have significantly high GC content in our analysis (P values are 0.01223, 3.13E−05, 2.22E−16, and 5.76E−09, respectively, Figure 3C). We also found that the mRNAs with high translation efficiency have much higher DMS reactivity than the mRNAs with low translation efficiency across the whole transcripts, especially in both 5′ UTR and CDS regions (P values are 1.121E−09, 6.028E−06, 0.01482, and 0.165, respectively, Figure 3D). The region from −4 nt to +1 nt around the start codon has higher DMS reactivity in the mRNAs with high translation efficiency (Figure 3D). The periodicity magnitude is much greater in mRNAs with high translation efficiency than that in mRNAs with low translation efficiency (Supplemental Figure 4E). These results indicate that mRNAs with higher translation efficiency are less structured. Thus, our results confirm our hypothesis that GC content does not affect in vivo RNA secondary structure of mRNAs in order to maintain efficient regulatory roles, such as translation.
Difference of In Vivo RNA Structurome between Rice and Arabidopsis Is Not Mainly due to the Difference in GC Content
In the analyses above, we found that GC content does not affect in vivo mRNA secondary structure in rice. As previously reported, mRNA GC content is globally higher in rice than in Arabidopsis (Carels and Bernardi, 2000, Goff et al., 2002, Yu et al., 2002), and GC content affects RNA secondary structure in in silico prediction (Chan et al., 2009). We were intrigued about the differences between the in vivo RNA structurome of rice and Arabidopsis and whether the different GC content affects RNA structure between these species.We calculated GC content and Gini index of the whole mRNA and different genic regions of each gene. Rice has a significantly higher GC content than Arabidopsis (Figure 4A, Supplemental Table 4). In our analyzed population, the GC content in mRNA, 5′ UTR, CDS, and 3′ UTR in rice are 0.504 ± 0.055, 0.587 ± 0.095, 0.538 ± 0.091, and 0.401 ± 0.042, respectively (Supplemental Table 4); those in Arabidopsis are 0.424 ± 0.024, 0.386 ± 0.058, 0.459 ± 0.031, and 0.330 ± 0.038, respectively (Supplemental Table 4). Gini index in rice is significantly higher than that in Arabidopsis (Figure 4A), which suggests that mRNAs in rice are more highly structured than those in Arabidopsis in general. We then assessed the difference of mRNA GC content and RNA structural flexibility between the two species. Similar to the previous study (Wang et al., 2004), the distribution of mRNA GC content in rice displayed a bimodal distribution and covered a much broader range than that in Arabidopsis (Figure 4B). The PPV distribution of mRNAs in rice shifted to higher PPV compared with that in Arabidopsis (Figure 4B).
Figure 4
The Effect of GC Content on RNA Structurome between Rice and Arabidopsis.
11 847 and 7424 mRNAs were analyzed in rice and Arabidopsis, respectively.
(A) Distribution of GC content and Gini index in rice and Arabidopsis.
(B) Distribution of mRNA GC content (top) and PPV (bottom) in rice and Arabidopsis. The dashed lines represent the mean of distributions.
(C) Histogram of GC content in CDS and the best-fit model for a mixture with two components in rice. The population was separated into 80 bins (green). The purple line represents the probability density estimation. 8316 and 3531 CDS of mRNAs were separated according to the two distributions into groups of low (yellow) and high (red) GC content of CDS. The dashed line represents the threshold of GC content (0.5765).
(D–F) Violin plots show GC content (D), Gini index (E), and PPV (F) in two groups of CDS of mRNAs in rice and all the CDS of mRNAs in Arabidopsis.
The black dots in (A) and (D–F) represent the mean and the vertical black lines represent the SD.
P values were calculated by Wilcoxon rank-sum test.
The Effect of GC Content on RNA Structurome between Rice and Arabidopsis.11 847 and 7424 mRNAs were analyzed in rice and Arabidopsis, respectively.(A) Distribution of GC content and Gini index in rice and Arabidopsis.(B) Distribution of mRNA GC content (top) and PPV (bottom) in rice and Arabidopsis. The dashed lines represent the mean of distributions.(C) Histogram of GC content in CDS and the best-fit model for a mixture with two components in rice. The population was separated into 80 bins (green). The purple line represents the probability density estimation. 8316 and 3531 CDS of mRNAs were separated according to the two distributions into groups of low (yellow) and high (red) GC content of CDS. The dashed line represents the threshold of GC content (0.5765).(D–F) Violin plots show GC content (D), Gini index (E), and PPV (F) in two groups of CDS of mRNAs in rice and all the CDS of mRNAs in Arabidopsis.The black dots in (A) and (D–F) represent the mean and the vertical black lines represent the SD.P values were calculated by Wilcoxon rank-sum test.We next assessed the effect of GC content on in vivo RNA secondary structure in CDS between the two species. We separated CDS of mRNAs in rice into two groups, high and low GC content, according to its bimodal distribution (Figure 4C and 4D). Surprisingly, mRNAs with high GC content shows Gini index and PPV very similar to the values of mRNAs with low GC content (Figure 4E and 4F). The difference of both Gini index and PPV between rice mRNAs with low GC content and all Arabidopsis mRNAs is similar to the difference between the rice mRNAs with high GC content and all Arabidopsis mRNAs. Thus, it appears that the difference in RNA structure between the species is not mainly due to the difference in GC content between the two species.
Conservation and Divergence of Both Sequence and In Vivo RNA Secondary Structure Corresponds to Diverse and Specific Biological Processes
Since GC content in mRNA, especially in CDS, cannot explain the difference in RNA structure between rice and Arabidopsis, we undertook a more detailed comparative analysis to investigate whether sequence conservation affects the in vivo RNA secondary structure conservation between the two species.The previous studies on rRNA structure are based on crystal structure, co-variation analyses, and in vitro probing (Noller and Woese, 1981, Noller et al., 1981, Gutell et al., 1992, Gutell et al., 1994, Gutell et al., 2002). However, it took decades to understand rRNA structure conservation between species using low-throughput methods. By taking advantage of our Structure-seq data, we demonstrated for the first time that Structure-seq can compare RNA structure conservation between species in a time- and labor-efficient manner with high agreement with previous low-throughput methods (Supplemental Figure 6) (Noller and Woese, 1981, Noller et al., 1981, Gutell et al., 1992, Gutell et al., 1994, Gutell et al., 2002).We further expanded our analysis to a transcriptome-wide scale to assess the relationship between sequence identity and structure similarity. We analyzed 19 956 orthologous gene pairs between rice and Arabidopsis with nucleotide-resolution coverage. We calculated CDS sequence identity by pairwise sequence alignment and measured in vivo RNA secondary structure similarity by RNAforester, a tool that aligns RNA secondary structure based on an ordered tree (Hofacker et al., 1994, Jiang et al., 1995, Hochsmann et al., 2003, Hochsmann et al., 2004). We found a significantly weak correlation between sequence identity and structure similarity (PCC = 0.08, Figure 5A). This indicates that in vivo RNA secondary structure conservation is not correlated with sequence conservation in the two species. Furthermore, we assessed which types of mRNAs are selected more on either a sequence and/or a structure level.
Figure 5
The Relationship between Sequence Identity and Structure Similarity.
(A) Scatterplot of CDS sequence identity and structure similarity. The PCC value is 0.08, and P value is less than 2.2E−16. 19 956 homologous gene pairs were analyzed in the analysis. The black dots represent the examples in (F) and (G) and Supplemental Figure 7.
(B–E) The results of GO enrichment in four groups, with high sequence identity and high structure similarity (B), with high sequence identity and low structure similarity (C), with low sequence identity and high structure similarity (D), and with low sequence identity and low structure similarity (E).
(F and G) The example for two notable groups: high sequence identity and low structure similarity (F) and low sequence identity and high structure similarity (G). The results in rice are colored red and the ones in Arabidopsis blue. The reactivity of the homologous pairs is compared based on the sequence alignment of selected region. Best matched regions are colored with a light purple background, the mismatches on A and C with green background, and the mismatches on G and U with a yellow background. Arc diagrams are shown to compare in vivo RNA secondary structure in the selected regions.
The Relationship between Sequence Identity and Structure Similarity.(A) Scatterplot of CDS sequence identity and structure similarity. The PCC value is 0.08, and P value is less than 2.2E−16. 19 956 homologous gene pairs were analyzed in the analysis. The black dots represent the examples in (F) and (G) and Supplemental Figure 7.(B–E) The results of GO enrichment in four groups, with high sequence identity and high structure similarity (B), with high sequence identity and low structure similarity (C), with low sequence identity and high structure similarity (D), and with low sequence identity and low structure similarity (E).(F and G) The example for two notable groups: high sequence identity and low structure similarity (F) and low sequence identity and high structure similarity (G). The results in rice are colored red and the ones in Arabidopsis blue. The reactivity of the homologous pairs is compared based on the sequence alignment of selected region. Best matched regions are colored with a light purple background, the mismatches on A and C with green background, and the mismatches on G and U with a yellow background. Arc diagrams are shown to compare in vivo RNA secondary structure in the selected regions.We classified genes into four groups according to their comparability between rice and Arabidopsis: (1) high sequence identity with high structure similarity; (2) high sequence identity with low structure similarity; (3) low sequence identity with high structure similarity; and (4) low sequence identity with low structure similarity (Figure 5A). We then performed GO enrichment analysis on these four groups (Figure 5B–5E, Supplemental Table 5). The group-1 genes were enriched in transcription-, translation-, and energy-related processes (Figure 5B and Supplemental Table 5). The group-2 mRNAs were enriched in specific biosynthetic pathways, such as small molecules, glucan, and chlorophyll (Figure 5C and Supplemental Table 5). The group-3 mRNAs with low sequence identity but high structure similarity were enriched in gene expression regulation and other processes (Figure 5D and Supplemental Table 5). The group-4 mRNAs with low sequence identity and low structure similarity were enriched in response to chemical stimulus, abiotic stress, and other related processes (Figure 5E and Supplemental Table 5).To evaluate the classification of the four groups, we aligned DMS reactivity based on sequence alignment and folded RNA secondary structure with inclusion of DMS reactivity as in vivo constraints. Individual examples are presented from the four groups (Figure 5F and 5G, Supplemental Figure 7). The fragments from the orthologous gene pair of group-2, LOC_Os11g14220.1-AT1G50010.1, shows a different pattern of DMS reactivity (Figure 5F, top) and different RNA secondary structure in vivo (Figure 5F, bottom), although the sequence identity is 0.82. Another fragment from the orthologous gene pair of group-3, LOC_Os06g46950.1-AT5G54490.1, shows a very similar pattern of DMS reactivity (Figure 5G, top) and a similar base-pairing pattern of RNA secondary structure in vivo (Figure 5G, bottom) with sequence identity of 0.60. In all, we found that RNA secondary structure conservation is not well correlated with sequence conservation.
Discussion
Different In Vivo RNA Secondary Structural Flexibility Is Required by Different Biological Processes
In this study, we successfully applied the Structure-seq method in rice and generated libraries with high quality and depths. The in vivo RNA structurome achieved up to 30 848 transcripts with nucleotide-resolution coverage and allowed us to globally investigate the in vivo RNA secondary structural features in rice. With the RNA structurome in rice, we confirm that m6A can alter RNA structure in 3′ UTR not only in mammals (Anderson et al., 1995, Hochsmann et al., 2003, Roost et al., 2015) but also in plants (Supplemental Figure 4F and 4G).Through the comparison between in vivo and in silico RNA secondary structure, we noticed that most RNAs do not fold in vivo as predicted in silico. By GO enrichment analysis, we found that mRNAs with high PPV tend to have basic regulatory function, such as macromolecule biosynthesis and transcription (Figure 2A and Supplemental Table 2). This indicates that these genes may have evolved with more thermostable structures in vivo to prevent large conformational changes to maintain stability.We also found that those mRNAs with low PPV tend to have more flexible secondary structure to maintain functions, such as cell redox homeostasis, signal transduction, and photosynthesis (Figure 2A and Supplemental Table 2). These biological processes require a direct and fast response to quench reactive oxygen species for cell homeostasis (Fedoroff, 2006). Thus, highly flexible RNA secondary structure may have high capability to undergo conformational changes in response to various environmental stimuli.Photosynthesis is an interesting enriched GO term different from that in Arabidopsis (Ding et al., 2014). It is a key process directly affected by different environmental cues, requiring great flexibility (Anderson et al., 1995, Franklin and Whitelam, 2004). Therefore, this specific enrichment implies the rice-specific requirement for adopting a flexible RNA structure to enable a fast response to daily changes in light intensity in order to ensure efficient photosynthesis.A possible explanation for high flexibility of mRNAs relating to environmental response is stress-mediated decay in post-transcriptional regulation. It has been reported that photosynthetic genes were reduced more rapidly in post-transcriptional levels associated with polysomes under stress conditions (Park et al., 2012). In vitro and in vivo studies have shown that RNA secondary structure plays an important role in RNA degradation (Li et al., 2012b, Roy and Chanfreau, 2014). Thus, it is possible that the flexible RNA secondary structures of these mRNAs involved in cell redox homeostasis, signal transduction, and photosynthesis may undergo the alteration of RNA degradation in response to stress.The individual examples shown in Figure 2B confirm our findings from GO enrichment analysis. These individual genes contribute to important agronomic traits, such as grain yield (Xing and Zhang, 2010, Qian et al., 2016). In our study, we captured the structural information of these genes and suggest that different flexibility of RNA structure is required to maintain diverse biological processes. Our results provide another layer of RNA structure–function analyses for understanding the gene function in addition to the traditional way of sequence–function analyses.
GC Content Does Not Significantly Affect In Vivo mRNA Structurome
Since GC content in rice is high and bimodally distributed (Carels and Bernardi, 2000, Yu et al., 2002, Wang et al., 2004), we investigated the relationship between in vivo RNA secondary structure and GC content. We found that the mRNA GC content does not significantly affect the global pattern of DMS reactivity both in rice and Arabidopsis (Figure 3A and 3B; Supplemental Figure 5A and 5B). Furthermore, we found that GC content does not significantly affect in vivo RNA structure in CDS, but strongly affects the RNA structure of UTRs, especially 5′ UTR in rice (Figure 3A, 3B, 3F, and 3H). Moreover, we suspect that most mRNAs do not tend to fold in vivo under the effect of GC content to maintain efficient and important biological processes, such as translation (Figure 3C and 3D). Thus, we found that GC content in mRNA does not significantly affect the in vivo mRNA structurome.Comparative analysis of in vivo RNA secondary structure between rice and Arabidopsis shows that mRNAs in rice are more structured than those in Arabidopsis (Figure 4A and 4B). However, these RNA secondary structural differences between rice and Arabidopsis are not influenced by the marked difference of GC content between the two species (Figure 4C–4F). This inter-species result confirms the intra-species results above and suggests that the evolution of GC content between monocot and dicot might not affect in vivo RNA secondary structure of mRNAs.The previous computational predictions identified novel structured ncRNAs by searching for GC-rich regions (Rivas and Eddy, 2000, Klein et al., 2002), which indicates that structured RNAs tend to have higher GC content. However, from our results focused on mRNAs, we found that the GC content in mRNAs does not significantly affect the in vivo mRNA structurome. Therefore, the effect of GC content on RNA structure might vary between ncRNAs and mRNAs, which confirms that RNA structural features are different between ncRNAs and mRNAs (Spitale et al., 2015). Perhaps GC content has a stronger effect on the selection of codon usage (Glemin et al., 2014) other than in vivo RNA structure in mRNAs.Other monocots such as wheat, barley, and maize also have bimodally distributed and high GC content (Carels and Bernardi, 2000). Our study on rice, a model monocot, provides a novel insight into the relationship between GC content and RNA secondary structure. It will be of interest to ascertain whether this relationship revealed in rice prevails in other important monocot crops.
Global Pattern of RNA Structure in Rice Shows Conservation and Divergence to that in Arabidopsis
We revealed a similar global pattern of RNA structure between rice and Arabidopsis. Firstly, we found that the global trend and the periodicity of DMS reactivity across genic regions in rice (Supplemental Figure 4A and 4C) are similar to those in Arabidopsis (Ding et al., 2014). Secondly, we found a similar DMS reactivity valley only in the 40 nt upstream of 5′ splice sites for unspliced events (Supplemental Figure 4B). Lastly, we found a weaker periodicity of DMS reactivity in those mRNAs with low translation efficiency (Supplemental Figure 4E). Such comparison shows the conservation of global RNA structural features between rice and Arabidopsis.However, we found some different patterns between the two species. One major difference is that the DMS reactivity in 5′ UTR is significantly lower than that in 3′ UTR in rice (Supplemental Figure 4A), indicating that 5′ UTR has a stronger structure than 3′ UTR. However, there is no such trend in Arabidopsis (Ding et al., 2014). We suspect that this is due to the effect of GC content in 5′ UTR. As we discussed above, GC content affects the in vivo RNA structure in rice UTRs. However, for Arabidopsis, GC content does not affect RNA structure in UTRs (Supplemental Figure 5), and we did not find such a marked difference in GC content between 5′ UTR and 3′ UTR in Arabidopsis (Figure 4A and Supplemental Table 4). This result implies that there might be regulatory RNA structural features in 5′ UTR in rice different from that in Arabidopsis. This difference suggests the divergence of global RNA structure pattern between rice and Arabidopsis.
RNA Secondary Structure Acts as Another Layer of Evolution Selection Independent of Its Primary Sequence
Sequence has been considered as a layer of selection correlated to protein function during evolution. Our Structure-seq libraries in both rice and Arabidopsis allow us to investigate both sequence and in vivo RNA secondary structure conservation between species on a genome-wide scale for the first time. We compared sequence identity and RNA secondary structure similarity between 19 956 orthologous genes. We found a weak correlation between sequence identity and RNA secondary structure similarity (Figure 5A), indicating that RNA secondary structure conservation does not generally correlate with sequence conservation in vivo.By linking to the corresponding biological functions, we found that the conservation and divergence of both sequence and in vivo RNA secondary structure lead to diverse and specific biological processes (Figure 5B–5E and Supplemental Table 5). Sequence selection highly determines gene function during evolution. If sequences are conserved between species, gene function between species is supposed to be similar. If sequences are divergent between species, gene function might be diverse to allow various adaptations in specific species. RNA structure has been found to control gene expression processes in previous studies, such as translation (Qu et al., 2011). If RNA secondary structure is conserved between species, post-transcriptional regulations such as translation are likely to be conserved to control gene expression levels. Moreover, if RNA secondary structure is not conserved between species, gene expression levels might be regulated differently in these species.Using GO analysis, we found that mRNAs with both high sequence identity and high structure similarity between rice and Arabidopsis are enriched in transcription-, translation-, and energy-related processes (Figure 5B and Supplemental Table 5). The high selection on both sequence and RNA secondary structure implies high selection of both gene function and gene expression levels.The mRNAs with high sequence conservation but low RNA secondary structure conservation are enriched in some biosynthetic pathways, such as small molecules, glucan, and chlorophyll biosynthesis pathways (Figure 5C and Supplemental Table 5). The group with conserved sequence but divergent RNA secondary structure implies high selection on gene function but different regulation of gene expression levels.Furthermore, we found that mRNAs in group-3, with low sequence conservation but high secondary structure conservation, are enriched in many regulatory processes such as gene expression and RNA metabolic processes (Figure 5D and Supplemental Table 5). The weak selection on sequence but strong selection on RNA secondary structure implies divergent sequence selection between species but similar gene expression levels to assure the fundamental regulatory role, such as transcriptional regulators GATA factors found in group-3 (data not shown).The last notable group of mRNAs with both low sequence and RNA secondary structure conservation is enriched in response to chemical stimulus, abiotic stress, and oxidative stress (Figure 5E and Supplemental Table 5). These genes are likely to be divergent in both gene function and expression levels. Previous studies showed a significant expression context divergence in orthologous pairs related to abiotic response between rice and Arabidopsis (Narsai et al., 2010, Movahedi et al., 2011). From our analysis, it is possible that such significant expression context divergence in abiotic response is also regulated by divergence of in vivo RNA secondary structure for the maximum flexibility to adapt to environmental stresses.Through our comprehensive analysis on biological processes, we found that in vivo RNA secondary structure might offer a separate layer of evolutionary selection for different biological processes. Such evolutionary footprints on RNA secondary structure seem to be independent from sequence selection. Perhaps this is because RNA secondary structure plays an important role in gene expression levels and such mechanisms are selected differently during evolution in order to secure RNA structural robustness in basic development and flexibility in adaptation.It is notable from our study that many mRNAs involved in response to environmental cues are found to be more flexible in their RNA secondary structure. As environmental stress can substantially affect crop production, it is of major importance to decipher the details of RNA secondary structure in rice and other crops to advance our understanding of the mechanisms of stress resistance. Additional comparative in vivo RNA secondary structure analysis would offer novel perspectives to help decipher the regulatory role of RNA secondary structure during evolution.
Methods
Plant Materials and Growth Conditions
Rice (Oryza sativa ssp. Nipponbare) seedlings were grown on filter paper wetted with distilled water in plates after sterilizing with 40% bleach. The plates were wrapped in foil and grown at 28°C for 5 days in darkness in an incubator.
In Vivo DMS Treatment
All manipulations with DMS were performed in a chemical fume hood. Ten to twelve individuals of 5-day-old etiolated rice seedlings were thoroughly covered in 1× DMS reaction buffer with 40 mM HEPES (pH 7.5), 100 mM KCl, and 0.5 mM MgCl2. For (+) DMS treatment, DMS was added to a final concentration of 2.5% with vigorous vortex until completely dissolved. For (−) DMS treatment, 1× DMS reaction buffer was added as control. Treatment was performed for 15 min at 28°C with shaking at 500 rpm. Dithiothreitol was added to a final concentration of 0.5 M with vigorous vortexing until completely dissolved to quench DMS. After washing with deionized water, the seedlings were immediately frozen with liquid nitrogen and ground into powder. Total RNA was extracted and purified with an RNeasy Plant Mini Kit (Qiagen). For each treatment, two independent biological replicates were performed for RNA isolation and library construction.
Illumina Library Construction
The method of library construction is taken from the published Structure-seq method (Ding et al., 2014, Ding et al., 2015). We performed two rounds of poly(A) selection using a Poly(A) Purist MAG Kit (Ambion). Poly(A)-selected RNA (500 ng) was subjected to reverse transcription (RT) using a SuperScript III First-Strand Synthesis System (Invitrogen) and random hexamers fused with Illumina TruSeq adapter (5′-CAG ACG TGT GCT CTT CCG ATC TNN NNN N-3′). cDNAs were ligated to a 5′ ssDNA linker (5′ /Phos/ NNN AGA TCG GAA GAG CGT CGT GTA G/3SpC/) using CircLigase ssDNA) ligase (Epicentre). After one round of cDNA size selection (>100 nt) purified by 10% TBE-Urea Gel (Life Technologies), PCR amplification was performed using Illumina TruSeq Primers. Three rounds of agarose gel purification were performed to remove the excess adapters and achieve PCR products between 200 and 650 bp. The double-stranded DNA libraries were subjected to 50-bp single-end sequencing on an Illumina HiSeq 4000 platform.
Illumina Sequence Mapping
The rice transcriptomic reference consists of nuclear, chloroplast, and mitochondrial cDNA sequences (MSU version 7.0), and rRNA and tRNA sequences (IRGSP-1.0). The 50-bp single-end reads were mapped to the rice transcriptomic reference with Bowtie (Langmead et al., 2009). The mapping pipeline is the same as that in FoldAtlas (Norris et al., 2017). The length of 21 nt was the minimum length for mapping after the reads were trimmed from 3′ ends. Up to three mismatches without any insertions or deletions were allowed. After confirming the good correlation of two biological replicates, reads from two replicates of the same treatment were combined for further analyses.
Calculation of DMS Reactivity
First, the raw reactivity, θraw(i), of nucleotide i was calculated by Equation (1) (Ding et al., 2015). and are the raw counts on nucleotide i in (+) DMS and (−) DMS libraries, respectively; l is the length of the transcript:Box-plot analysis (Deigan et al., 2009) was performed to normalize positive reactivity of all nucleotides to obtain the normalized reactivity, θnorm(i). When we folded RNAs with θnorm(i), negative reactivity was excluded according to the requirement of RNAfold (ViennaRNA-2.3.3). Reactivity on guanine (G) and uracil (U) was also excluded. In genome-wide analyses, the reactivity, θfinal(i), was calculated by
Comparison of Gel and Structure-Seq Probing Results
In conventional gel analysis of RNA structure, the samples were treated by DMS in the same manner as for Structure-seq. RT was performed with SuperScript III reverse transcriptase (Invitrogen) by Cy5-labeled primers (Integrated DNA Technologies). The RT product was loaded onto 8% urea–PAGE gel. Gel images were collected with a Typhoon FLA 9500, and the band intensity for each nucleotide was quantified using ImageQuant TL. The gel reactivity was calculated by Equation (3). and were the intensity values of target band i in (+) and (−) DMS lanes, respectively. n was the total number of the clearly visible bands in sequencing lanes. The reactivity on G and U was excluded in further analyses. When we calculated the PCC of gel reactivity between the two species, we aligned the sequence first and calculated the PCC of reactivity:When we compared the reactivity of rRNA in Structure-seq with gel-based probing, we used relative reactivity calculation method by Equation (4). It was similar to the gel reactivity calculation. and were the raw counts of nucleotide i in (+) and (−) DMS library, respectively. m was the total stop count of the analyzed region. When we calculated the PCC of relative DMS reactivity in the two species, we aligned the sequence first and calculated the PCC of DMS reactivity:
Global Analyses of mRNA Secondary Structure with DMS Reactivity In Vivo
We calculated the average stop count per A and C for each RNA and chose those mRNAs with more than one stop per nucleotide for all global analyses. When we analyzed the structural features of mRNA in each region, the population of mRNAs was filtered by length (with more than 40 nt in 5′ UTR, 200 nt in CDS, and 40 nt in 3′ UTR).For GC content analysis, we analyzed the highest 15% and the lowest 15% mRNAs after mRNA GC content ranking. For translation analysis, we downloaded the translatome enrichment index (TEI) for each mRNA from TRAP-seq analysis (Zhao et al., 2017). High TEI indicates high translation efficiency. We analyzed the highest 30% and lowest 30% mRNAs after translation efficiency ranking.For periodicity analysis, we assessed periodicity by applying discrete Fourier transform (DFT) for average DMS reactivity in the first 40 nt before start codon in 5′ UTR, the first 100 nt after start codon and the last 100 nt before stop codon in CDS, and the first 40 nt after stop codon in 3′ UTR. The Goertzel algorithm (Goertzel, 1958) was used for an efficient implementation to obtain the periods of DFT.For alternative splicing analysis, we mapped the (−) DMS libraries to Nipponbare genomic sequence (MSU version 7.0) by HISAT2 with the “--no-softclip” option (Kim et al., 2015). We then identified spliced and unspliced (including intron retention, exon skipping, and alternative 3′/5′ splicing sites) events.For m6A modification analysis, we analyzed the processed data of m6A-seq in rice leaf, in which each m6A peak with start and end coordinates were clearly assigned to 5′ UTR, CDS, and 3′ UTR (Li et al., 2014). The m6A peak density was calculated by dividing the number of m6A peaks with the length of mRNA or selected regions. In the motif analysis, we identified all the sequence motifs (UGUAMM, M = A or C) only in those transcripts (19 002) with m6A modification and nucleotide-resolution coverage and filtered by length as described above. We retrieved a set of motifs with m6A by checking whether the whole motif was in a m6A peak. To avoid the bias induced by different number of motifs in the four types of motifs, in each type of motif the motif number is the same or similar in both m6A and control motifs. The control motif was randomly selected from those motifs outside m6A peaks.
Comparison of In Vivo and In Silico Predicted RNA Secondary Structures
A total of 29 166 mRNAs with more than one stop per nucleotide were folded by RNAfold (ViennaRNA-2.3.3) (Lorenz et al., 2011) with and without in vivo DMS reactivity by pseudo-free energy function (Deigan et al., 2009). Their CDS were folded as well. The phylogenetic structure of 18S rRNA was downloaded from the Comparative RNA Web site (http://www.rna.icmb.utexas.edu/). The scorer utility in RNAstructure (RNAstructure-5.8.1) was used for PPV and sensitivity calculation. The F1 score was calculated from the previous definition (Lorenz et al., 2011). When comparing PPV in CDS between rice and Arabidopsis, mRNAs were filtered by length as described above. R-CHIE was used for visualizing the folded RNAs (Lai et al., 2012).
Gaussian Mixture Modeling of GC Content
It is well known that CDS GC content is bimodal (Carels and Bernardi, 2000, Yu et al., 2002, Wang et al., 2004), so we fitted a one-dimensional Gaussian mixture model with two components with the scikit-learn Python package (Pedregosa et al., 2011). We made 500 iterations and set the tolerance to 0.001 to separate genes into two groups.
Calculation of Gini Index
Raw Gini indices on A and C of selected regions were calculated by the numpy-based implementation (http://neuroplausible.com/gini). For fair comparison of the Gini index between different regions, we normalized the Gini index by A and C percentage in each region. Transcripts with more than one stop per nucleotide were calculated using the Gini index. When comparing Gini index in each generic region, mRNAs were filtered by length as described above.
Analysis of Sequence and RNA Secondary Structure Conservation
The orthologous gene pairs were downloaded from the Rice Genome Annotation Project. The pairwise alignment was performed by Clustal Omega. The comparison of RNA secondary structures was performed by RNAforester (ViennaRNA-2.3.3) (Hofacker et al., 1994, Jiang et al., 1995, Hochsmann et al., 2003, Hochsmann et al., 2004, Lorenz et al., 2011). The relative value for RNA secondary structure similarity measurement was extracted for further analyses.
Gene Ontology Analysis
We utilized the online server, PlantGSEA (Yi et al., 2013), for biological process of GO enrichment analysis in rice. We calculated the fold enrichment of the output from PlantGSEA by the method described at http://pantherdb.org/tips/tips_geneListAnalysis.jsp.
Funding
This work was supported by the Biotechnology and Biological Sciences Research Council (BBSRC) (BB/L025000/1 and BB/J004553/1) to H.D., J.C., and Y.D.; and University of Chinese Academy of Sciences (UCAS) Joint PhD Training Program (UCAS (2015)37) to H.D.
Author Contributions
Conceptualization, H.D. and Y.D.; Methodology, H.D., J.C., Q.L., X.Y., M.Y., and Y.D.; Software, J.C., H.Z., H.W., M.N., and Z.L.; Investigation, H.D., J.C., H.Z., Z.L., and Y.D.; Data Curation, H.D., J.C., H.Z., H.W. M.N., Z.L., and Y.D.; Visualization, H.D., J.C., and Y.D.; Writing – Original Draft, H.D. and Y.D.; Writing – Review & Editing, H.D., J.C., H.Z., H.W., X.Y., X.D., X.C., and Y.D.; Funding Acquisition, H.D. and Y.D.; Supervision, X.C. and Y.D.
Authors: David H Burkhardt; Silvi Rouskin; Yan Zhang; Gene-Wei Li; Jonathan S Weissman; Carol A Gross Journal: Elife Date: 2017-01-31 Impact factor: 8.140
Authors: Yue Wan; Kun Qu; Qiangfeng Cliff Zhang; Ryan A Flynn; Ohad Manor; Zhengqing Ouyang; Jiajing Zhang; Robert C Spitale; Michael P Snyder; Eran Segal; Howard Y Chang Journal: Nature Date: 2014-01-30 Impact factor: 49.962
Authors: Jose Antonio Corona-Gomez; Irving Jair Garcia-Lopez; Peter F Stadler; Selene L Fernandez-Valverde Journal: RNA Date: 2020-04-02 Impact factor: 4.942
Authors: Zhao Su; Yin Tang; Laura E Ritchey; David C Tack; Mengmeng Zhu; Philip C Bevilacqua; Sarah M Assmann Journal: Proc Natl Acad Sci U S A Date: 2018-11-09 Impact factor: 11.205