Vahid Khoddami1,2,3, Archana Yerra2,3, Timothy L Mosbruger4, Aaron M Fleming5, Cynthia J Burrows6, Bradley R Cairns7,3. 1. Department of Cell Biology, Harvard Medical School, Boston, MA 02115. 2. Howard Hughes Medical Institute, University of Utah School of Medicine, Salt Lake City, UT 84112. 3. Department of Oncological Sciences, Huntsman Cancer Institute, University of Utah School of Medicine, Salt Lake City, UT 84112. 4. Bioinformatics Shared Resource, Huntsman Cancer Institute, University of Utah School of Medicine, Salt Lake City, UT 84112. 5. Department of Chemistry, University of Utah, Salt Lake City, UT 84112. 6. Department of Chemistry, University of Utah, Salt Lake City, UT 84112 burrows@chem.utah.edu brad.cairns@hci.utah.edu. 7. Howard Hughes Medical Institute, University of Utah School of Medicine, Salt Lake City, UT 84112; burrows@chem.utah.edu brad.cairns@hci.utah.edu.
Abstract
The breadth and importance of RNA modifications are growing rapidly as modified ribonucleotides can impact the sequence, structure, function, stability, and fate of RNAs and their interactions with other molecules. Therefore, knowing cellular RNA modifications at single-base resolution could provide important information regarding cell status and fate. A current major limitation is the lack of methods that allow the reproducible profiling of multiple modifications simultaneously, transcriptome-wide and at single-base resolution. Here we developed RBS-Seq, a modification of RNA bisulfite sequencing that enables the sensitive and simultaneous detection of m5C, Ψ, and m1A at single-base resolution transcriptome-wide. With RBS-Seq, m5C and m1A are accurately detected based on known signature base mismatches and are detected here simultaneously along with Ψ sites that show a 1-2 base deletion. Structural analyses revealed the mechanism underlying the deletion signature, which involves Ψ-monobisulfite adduction, heat-induced ribose ring opening, and Mg2+-assisted reorientation, causing base-skipping during cDNA synthesis. Detection of each of these modifications through a unique chemistry allows high-precision mapping of all three modifications within the same RNA molecule, enabling covariation studies. Application of RBS-Seq on HeLa RNA revealed almost all known m5C, m1A, and ψ sites in tRNAs and rRNAs and provided hundreds of new m5C and Ψ sites in noncoding RNAs and mRNAs. However, our results diverge greatly from earlier work, suggesting ∼10-fold fewer m5C sites in noncoding and coding RNAs and the absence of substantial m1A in mRNAs. Taken together, the approaches and refined datasets in this work will greatly enable future epitranscriptome studies.
The breadth and importance of RNA modifications are growing rapidly as modified ribonucleotides can impact the sequence, structure, function, stability, and fate of RNAs and their interactions with other molecules. Therefore, knowing cellular RNA modifications at single-base resolution could provide important information regarding cell status and fate. A current major limitation is the lack of methods that allow the reproducible profiling of multiple modifications simultaneously, transcriptome-wide and at single-base resolution. Here we developed RBS-Seq, a modification of RNA bisulfite sequencing that enables the sensitive and simultaneous detection of m5C, Ψ, and m1A at single-base resolution transcriptome-wide. With RBS-Seq, m5C and m1A are accurately detected based on known signature base mismatches and are detected here simultaneously along with Ψ sites that show a 1-2 base deletion. Structural analyses revealed the mechanism underlying the deletion signature, which involves Ψ-monobisulfite adduction, heat-induced ribose ring opening, and Mg2+-assisted reorientation, causing base-skipping during cDNA synthesis. Detection of each of these modifications through a unique chemistry allows high-precision mapping of all three modifications within the same RNA molecule, enabling covariation studies. Application of RBS-Seq on HeLa RNA revealed almost all known m5C, m1A, and ψ sites in tRNAs and rRNAs and provided hundreds of new m5C and Ψ sites in noncoding RNAs and mRNAs. However, our results diverge greatly from earlier work, suggesting ∼10-fold fewer m5C sites in noncoding and coding RNAs and the absence of substantial m1A in mRNAs. Taken together, the approaches and refined datasets in this work will greatly enable future epitranscriptome studies.
Covalent modifications of RNA are numerous (1), and transcriptome-wide profiling enables broad and systematic analyses (2–4). Thus far, transcriptome-wide profiling has been reported for a limited number of modifications including N6-methyladenosine (m6A), 5-methylcytosine (m5C), pseudouridine (Ψ), and N1-methyladenosine (m1A) (5–14). However, profiling methods that provide sensitive and true single-base resolution are currently available only for m5C (9, 14, 15) and m1A (16); three of these (m6A, m1A, and Ψ) have involved initial enrichment or detection via antibodies (for m6A or m1A) (5, 6, 8, 10) or by techniques involving polymerase pausing/termination during reverse transcription (for m1A and Ψ) (7, 11–13, 17, 18). Recent single-base techniques for Ψ (19) rely on a bulky adduct formation before detection. Furthermore, although the current methods for Ψ profiling are useful, most lack the sensitivity, resolution, and technical ease needed for widespread adoption or straightforward candidate site validation (7, 11–13, 20). To provide simultaneous detection of m5C, m1A, and Ψ at single-base resolution transcriptome-wide from the same sample, we developed a molecular approach and analysis pipelines for Ψ and improved sequencing-based methods for m5C and m1A. First, we provide the conceptual basis for sequencing/mismatch-based detection of all three modifications (Fig. 1) and an example tRNA (glycine) that illustrates modification clarity within our HeLa cell dataset (Fig. 1 , with multiple additional examples in ).
Fig. 1.
RBS-Seq enables simultaneous base-pair resolution transcriptome-wide mapping of m5C, m1A, and Ψ. (A) Schematic of reactivity of modified nucleotides to bisulfite (Top) and the principle of simultaneous mapping of modified nucleotides (Bottom). For m5C, bisulfite treatment deaminates Cs (converting to Us; Ts upon cDNA sequencing), whereas m5Cs resist bisulfite (remain Cs upon cDNA sequencing) establishing sites of cytosine methylation. For m1A, cDNA synthesis at sites with m1A often confers misincorporation/mismatch in the NBS sample. In contrast, during treatment, m1A becomes converted to m6A via Dimroth rearrangement (methyl passes from N1 to N6), which faithfully templates thymine (cDNA remains adenosine) in the BS sample. Thus, comparison of NBS and BS samples identifies m1A sites. For Ψ, Ψ nucleotides upon bisulfite treatment form a stable monobisulfite adduct (Fig. 4) causing frequent bypass with reverse transcriptase, leaving a deletion signature at the exact modified sites, evident exclusively in the BS samples. (B) Schematic representation of tRNAGly, indicating its well-known multiple m5C and single m1A and Ψ modified sites. (C) Bar graph summarizing the actual RBS-Seq results from HeLa cell line for a tRNAGly locus indicating the exact locations of the modified nucleotides and their levels. The low levels of m5C shown at positions 40 and 60–66 have been shown previously for a subset of tRNA types (31, 32).
RBS-Seq enables simultaneous base-pair resolution transcriptome-wide mapping of m5C, m1A, and Ψ. (A) Schematic of reactivity of modified nucleotides to bisulfite (Top) and the principle of simultaneous mapping of modified nucleotides (Bottom). For m5C, bisulfite treatment deaminates Cs (converting to Us; Ts upon cDNA sequencing), whereas m5Cs resist bisulfite (remain Cs upon cDNA sequencing) establishing sites of cytosine methylation. For m1A, cDNA synthesis at sites with m1A often confers misincorporation/mismatch in the NBS sample. In contrast, during treatment, m1A becomes converted to m6A via Dimroth rearrangement (methyl passes from N1 to N6), which faithfully templates thymine (cDNA remains adenosine) in the BS sample. Thus, comparison of NBS and BS samples identifies m1A sites. For Ψ, Ψ nucleotides upon bisulfite treatment form a stable monobisulfite adduct (Fig. 4) causing frequent bypass with reverse transcriptase, leaving a deletion signature at the exact modified sites, evident exclusively in the BS samples. (B) Schematic representation of tRNAGly, indicating its well-known multiple m5C and single m1A and Ψ modified sites. (C) Bar graph summarizing the actual RBS-Seq results from HeLa cell line for a tRNAGly locus indicating the exact locations of the modified nucleotides and their levels. The low levels of m5C shown at positions 40 and 60–66 have been shown previously for a subset of tRNA types (31, 32).
Fig. 4.
Characterization of a pseudouridine-bisulfite adduct and heat/Mg2+-induced rearrangement to elicit reverse transcriptase bypass. (A) Sequence and intramolecular folding of pseudouridylated 70-mer RNA oligonucleotide used in the downstream experiments. The two Ψ sites (in red) are indicated by arrowheads. (B) Flowchart of oligo treatments, RT-PCR, TA cloning, and Sanger sequencing of individual colonies. (C) Summary of the deletion signatures obtained from oligonucleotide experiments with the reference sequence and the two Ψ sites at the bottom, showing the insufficiency of the bisulfite step, and requirement for the subsequent heat + MgCl2 step to generate the deletion signatures. (D) Sequence and calculated mass for 12-mer control (12-U) and pseudouridylated (12-Ψ) oligomers used in the downstream experiments. (E) Reaction sequence and methods used for Ψ reactivity studies with 12-mers. (F) Mass spectrum for 12-Ψ after bisulfite and subsequent heat + MgCl2 treatments shows formation of a stable monobisulfite adduct. Mass spectrum for the 12-U and 12-Ψ with only bisulfite treatment is provided in . (G) A proposed model showing that during cDNA synthesis, ribose ring-opened Ψ-monobisulfite is oriented away from the polymerization site, reinforced by Mg2+, explaining base skipping.
Detection of m5C in RNA (and DNA) relies on differential sensitivity to bisulfite: unmethylated cytosine is efficiently deaminated by bisulfite ions converting cytosine to uridine, which is subsequently read as thymidine following desulfonation, RT-PCR, and sequencing. In contrast, m5C resists bisulfite and remains cytosine after sequencing (15, 21) (Fig. 1). We improved prior m5C profiling methods by combining heat and the strong chemical denaturant formamide, which improves RNA denaturation and bisulfite treatment (which preferentially modifies single-stranded RNA), providing a global C → T conversion frequency of 99.7% in HeLa RNA (). Optimization was quantified via synthetic RNA oligomers, with m5C bases placed within and/or outside of regions of secondary structure (). We applied RBS-Seq to HeLa RNA, using three types of input RNA species [polyA-selected (∼85 M reads), rRNA-depleted (∼200 M reads), and small RNA (∼92 M reads)] (22), and our analysis pipelines () (23) compared datasets derived from bisulfite-treated (BS) and non–bisulfite-treated (NBS; untreated) RNAs of the same sample, a regimen which reduces false positives generated by incorrect alignments resulting from reduced nucleotide complexity. We then aggregated results from all three input types and filtered out additional false positives via computational and visual inspection for C/G tracts and strong secondary structure and imposed thresholds for nonconversion [≥20% (FDR ≤ 0.05)] (). This combination of approaches and thresholds yielded a list of high-threshold candidate sites in the following RNA categories: 486 total m5C sites, representing 297 unique sites in abundant noncoding RNAs (tRNAs and rRNA, together), 143 sites in mRNAs (e.g., PTEN and HDGF; Figs. 2 and 3, , and Datasets S1 and S2), 14 pseudogenes, and 32 other noncoding RNAs. New sites within prominent mRNAs include PTEN, XRCC3, FANCA, RXRB, FGFR4, and EIF3B (Dataset S1). Importantly, examination of known/validated sites in tRNAs demonstrated that RBS-Seq has a dynamic range for m5C approaching 100% at single cytosines (e.g., C49 in Fig. 1). Although our read numbers exceeded prior studies, our yield of 486 high-threshold candidate m5C sites in mRNA was far lower than the 10,275 sites reported previously (14), largely due to our more effective denaturation and deamination/conversion, lowering false positives (, and Dataset S1). In keeping, more recent m5C profiling in mouse ESCs that applies additional statistical and analytical parameters to remove false-positives reports 266 sites in mRNA passing thresholds (24).
Fig. 2.
Transcriptome-wide RBS-Seq analysis in HeLa cells shows widespread m5C and Ψ in mRNAs and ncRNAs and m1A almost exclusively in rRNAs and tRNAs. (A) HDGF mRNA bears a single m5C base, within its 3′ UTR. (B) Distribution and dynamic range of m5C sites in coding and noncoding RNAs, stratified by the nonconversion rate (percent of reads that contain a single cytosine that remains cytosine after bisulfite treatment). (C) Mismatch rate (percent of reads with a base mismatch) at human 28S rRNA transcript reveals a single well-known m1A nucleotide at position 1322. (D) Mismatch rate at the conserved well-known A58 sites of human tRNAs (from HeLa cells) in NBS and BS samples. Representatives of each tRNA type with the highest mismatch rate are shown. (E) Deletion rate (percent of reads bearing a 1–2 base deletion) along CDC6 mRNA reveals a single Ψ nucleotide. (F) Distribution and dynamic range for Ψ candidate sites in coding and noncoding RNAs.
Fig. 3.
(A) Distribution of candidate m5C sites in RNA species, with expanded mRNA annotations. (B) Distribution of candidate Ψ sites in different RNA species, with expanded mRNA annotations. See Datasets S1 and S5 for all sites and their corresponding annotations.
Transcriptome-wide RBS-Seq analysis in HeLa cells shows widespread m5C and Ψ in mRNAs and ncRNAs and m1A almost exclusively in rRNAs and tRNAs. (A) HDGF mRNA bears a single m5C base, within its 3′ UTR. (B) Distribution and dynamic range of m5C sites in coding and noncoding RNAs, stratified by the nonconversion rate (percent of reads that contain a single cytosine that remains cytosine after bisulfite treatment). (C) Mismatch rate (percent of reads with a base mismatch) at human 28S rRNA transcript reveals a single well-known m1A nucleotide at position 1322. (D) Mismatch rate at the conserved well-known A58 sites of human tRNAs (from HeLa cells) in NBS and BS samples. Representatives of each tRNA type with the highest mismatch rate are shown. (E) Deletion rate (percent of reads bearing a 1–2 base deletion) along CDC6 mRNA reveals a single Ψ nucleotide. (F) Distribution and dynamic range for Ψ candidate sites in coding and noncoding RNAs.(A) Distribution of candidate m5C sites in RNA species, with expanded mRNA annotations. (B) Distribution of candidate Ψ sites in different RNA species, with expanded mRNA annotations. See Datasets S1 and S5 for all sites and their corresponding annotations.Unlike m6A, m1A compromises A:T Watson–Crick base pairing, which pauses reverse transcriptase and elicits frequent nucleotide misincorporation, generating a single-base mismatch signature useful for m1A identification (8, 17, 25). As expected, in our NBS datasets from RBS-Seq, we indeed detected significant (FDR < 0.01) m1A-related mismatches at well-known m1A sites in noncoding RNAs [e.g., m1A-1322 in 28S rRNA (Fig. 2 and ) and m1A-58 in all tRNAs (Fig. 2 and Dataset S3)]. Unexpectedly, these mismatches were wholly absent or greatly diminished in our BS datasets (), with tRNAs displaying the remarkable dynamic range of our method (∼90% for tRNAGly and tRNAThr; Fig. 2). Regarding the basis, conversion of m1A to m6A (involving transfer of the methyl group from N1 to the N6 position) occurs through a well-studied process known as the Dimroth rearrangement (26) (), which readily occurs in the alkaline heat conditions present in the desulfonation step of the RBS-Seq procedure (). Thus, comparisons of base mismatch frequency within a BS sample compared with its matched original NBS sample reveals sites of m1A ( and Dataset S3). Notably, by our methods and analyses, no significant m1A (>1% at individual A sites) was detected at any single site within an mRNA (, and Dataset S4). Our results contrast with initial studies claiming thousands of m1A sites in mRNAs (10, 25) but are corroborated by more recent studies, which quantify m1A in mRNAs and lncRNAs as extremely rare (15 total sites in HEK293T cells) (24, 27–29).We then focused considerable attention on Ψ. Fortuitously, we observed a reproducible highly penetrant 1–2 nucleotide deletion signature at virtually all known Ψ sites in tRNAs, rRNAs, and snRNAs, exclusively in BS samples. Notably, because our approach does not stop reverse transcriptase, it can uniquely reveal two nearby Ψ sites on the same RNA ( and Dataset S5). Regarding penetrance, 47 uniquely mapping, known/validated Ψ sites in tRNAs (from prior studies, including Ψ55) displayed >50% penetrance (e.g., >90% for tRNAGly; Fig. 1 ), demonstrating the exceptional dynamic range of RBS-Seq for Ψ detection (Dataset S5). Below we address in more detail the deletion mechanism; however, this consistent and unique feature motivated expansion to identify novel Ψ sites transcriptome-wide, and for comparative purposes we chose HeLa cells. Here a custom analysis pipeline was developed involving a statistical approach (), which revealed 754 unique sites: 388 sites in various noncoding RNA species, 322 sites in mRNAs, and 44 sites in pseudogenes (including CDC6; Fig. 2 , , and Dataset S5; FDR < 0.001) with a strong bias for coding regions (Fig. 3). Thus, our work provides hundreds of Ψ sites in noncoding and mRNA species, which show clear enrichment for GO categories related to protein translation and metabolism (especially RNA metabolism) (, and Dataset S6). Sites in prominent mRNAs include SMC2, EIF3D, POLE4, LMO7, CCDC22, ATP5F1, and TRIM8 (Dataset S5).Four groups have previously conducted transcriptome-wide Ψ profiling reports under different names: Pseudoseq (7) and Ψ-seq (13) (using yeast and mammalian cells), PSI-seq (12) (yeast), and CeU-Seq (11) (mammalian cells). All four methods share the same principle: treatment of RNA with the chemical N-cyclohexyl-N′-(2-morpholinoethyl)-carbodiimide-metho-p-toluenesulfonate (CMC), which leaves a bulky group on pseudouridines, stopping reverse transcriptase during cDNA synthesis, thus indicating sites of pseudouridylation globally via RNASeq. These CMC-based techniques have proven useful for identifying Ψ sites, especially in high-abundance RNAs, and for identifying candidate Ψ sites in mRNAs. However, despite utilizing similar methods and identical yeast strains, overlap between candidate sites was extremely low (typically <4%) and was equivalently low when we compared published CMC-based results in mammalian cell types (typically HeLa and/or HEK293) (20) (, and Dataset S7). Interestingly, candidate Ψ sites from RBS-Seq using HeLa cells overlapped better with prior CMC-based studies (either HeLa or HEK293) than did the prior studies with themselves (, and Dataset S7), consistent with RBS-Seq revealing a higher proportion of positives.To better understand these differences, we turned to validation approaches. CMC-based methods that rely on cDNA chain termination for mapping Ψ sites present challenges for validation, requiring quantities of pure target RNA beyond feasibility for most mRNAs; thus, most prior studies either lacked validation (7, 13) or verified only very few (fewer than five) highly abundant candidate sites (11, 12). RBS-Seq, in contrast, provides a straightforward high-throughput validation protocol easily adapted to mRNA because the Ψ-dependent deletion signatures that appear within the corresponding cDNAs are easily scaled up through gene-specific PCR amplification and quantified by high-throughput sequencing of barcoded amplicons. Thus, for validation and comparisons to prior studies, we tested 60 candidate sites, which we partitioned into four groups: group I, sites uniquely identified by RBS-Seq (12 sites); group II, sites shared between RBS-Seq and at least one CMC-based method (25 sites); group III, sites detected in at least one of the CMC-based methods but not identified by RBS-Seq due to falling below our read coverage thresholds (14 sites); and group IV, sites from one or more CMC-based methods but not RBS-Seq, despite sufficient coverage in RBS-Seq (nine sites) (Dataset S8). For validation tests, we treated HeLa or HEK293 total RNA to a streamlined bisulfite + heat + MgCl2 protocol via chemistry described below followed by RT-PCR involving barcoded ∼125-bp amplicons (on average), sequencing (termed RBS-MiSeq), and our deletion signature analysis pipeline (). Notably, the vast majority (34 of 37) of group I and II sites validated, yielding a clear deletion signature involving tens of thousands of sequenced reads with HeLa and/or HEK293 datasets, providing confidence in sites identified by RBS-seq. For group III, 10 of 14 validated, suggesting that increasing the sequencing depth and/or applying our focused validation approach can resolve the rare false negatives generated by RBS-Seq. Finally, none of the candidates in group IV (0 of 9) validated, strongly suggesting a much higher false positive rate with any of the CMC-based techniques alone compared with RBS-Seq, consistent with their low overlap in prior studies. Examples of each group are provided in , and complete results are provided in Dataset S8.To independently test our Ψ profiling approaches and results, we examined Dyskerin (DKC1), the most disease-relevant human Ψ synthase. Mutation of DKC1 causes dyskeratosis congenita, characterized by short telomeres and bone marrow failure (30, 31). DKC1 utilizes H/ACA box snoRNAs to guide Ψ targeting to rRNAs via base pairing between the snoRNA and the target rRNA (32), and DKC1 also interacts with telomerase noncoding RNA (TERC) (33, 34), but whether TERC receives substantial Ψ is uncertain (35). To resolve this issue, high-throughput RBS-Seq followed by deletion signature analysis was performed on both total and polyA-selected RNAs isolated from DKC1-depleted HeLa cell via siRNAs, yielding ∼84% reduction in DKC1 transcript levels (). Comparison of the DKC1-siRNA with control-siRNA datasets revealed significant reduction (>25% reduction, FDR < 0.01) of the deletion signature levels in 227 sites; most reside within rRNAs, although 18 sites were observed within mRNAs (, and Dataset S9). Curiously, the 58 DKC1-dependent sites in HEK293 mRNAs reported by Ψ-seq show no overlap with the 18 sites found in HeLa mRNAs by RBS-Seq. Moreover, because Ψ-seq but not RBS-Seq reported two DKC1-dependent Ψ sites within TERC in HEK293 cells (13), we specifically tested TERC at both sites with our streamlined RBS-MiSeq validation procedure in both HeLa and HEK293 cells. Notably, despite over 30 K reads overlapping both sites in both cell types, no significant deletion was observed (Dataset S8), suggesting that TERC is an interacting partner of DKC1 but not a direct pseudouridylation substrate in these cell types under the conditions tested.Finally, to elucidate the chemistry of the observed 1–2 base deletion signature at Ψ and to guide validation methodologies, we determined which step(s) of our RBS-Seq protocol elicited base deletion by utilizing a synthetic 70-mer oligonucleotide bearing two Ψ sites and quantifying base deletion frequencies. Strikingly, bisulfite treatment alone failed to induce any deletion signature, whereas heating (75 °C) the BS RNA in the presence of magnesium ions (20 mM) for 15 min was both necessary and sufficient for generating the penetrant deletion signature (Fig. 4 and ). Similar treatments with a synthetic 12-mer RNA oligonucleotide containing a single Ψ followed by MS analysis displayed a major peak consistent with a stable, monoadduct of bisulfite (Fig. 4 and ). Next, the site of covalent attachment of the bisulfite group was determined by exposing the Ψ nucleoside itself to the reaction sequence followed by structural analysis (). Notably, the nucleoside reaction analyzed by LC-MS revealed two stable monobisulfite adducts (). The UV-vis for these compounds displayed λmax ∼ 260 nm (), supporting the aromaticity of the base remaining intact after the reaction. On the basis of 1H-NMR, the bisulfite attachment was determined to be on the 1′ carbon, which involved an opening of the ribose ring to yield a pair of isomeric products, consistent with prior preliminary results () (36). We propose initial addition of bisulfite to the base, followed by a heat-induced migration of the bisulfite to the ribose, yielding rearomatization and the formation of a stable ring-opened sugar adduct (). Notably, previous studies examining polymerase bypass of similar ring-opened template sites reported a strong tendency for deletions (37), consistent with our observations and interpretation that the ribose ring-opened bisulfite adduct of pseudouridine underlies the deletion signature. Interestingly, although Mg2+ was an absolute requirement for generation of the deletion signature assessed by cDNA sequencing (Fig. 4), it proved dispensable for generating the ring-opened sugar adduct in the nucleoside experiments, suggesting that Mg2+ helps reorient the ribose ring-opened Ψ bisulfite adduct away from the polymerization site or stabilizes a preexisting configuration that eventually causes polymerase bypass (Fig. 4).Characterization of a pseudouridine-bisulfite adduct and heat/Mg2+-induced rearrangement to elicit reverse transcriptase bypass. (A) Sequence and intramolecular folding of pseudouridylated 70-mer RNA oligonucleotide used in the downstream experiments. The two Ψ sites (in red) are indicated by arrowheads. (B) Flowchart of oligo treatments, RT-PCR, TA cloning, and Sanger sequencing of individual colonies. (C) Summary of the deletion signatures obtained from oligonucleotide experiments with the reference sequence and the two Ψ sites at the bottom, showing the insufficiency of the bisulfite step, and requirement for the subsequent heat + MgCl2 step to generate the deletion signatures. (D) Sequence and calculated mass for 12-mer control (12-U) and pseudouridylated (12-Ψ) oligomers used in the downstream experiments. (E) Reaction sequence and methods used for Ψ reactivity studies with 12-mers. (F) Mass spectrum for 12-Ψ after bisulfite and subsequent heat + MgCl2 treatments shows formation of a stable monobisulfite adduct. Mass spectrum for the 12-U and 12-Ψ with only bisulfite treatment is provided in . (G) A proposed model showing that during cDNA synthesis, ribose ring-opened Ψ-monobisulfite is oriented away from the polymerization site, reinforced by Mg2+, explaining base skipping.
Discussion
This work provides five advances in RNA modification profiling: (i) improved methods for profiling m5C and m1A; (ii) quantitative methods for profiling Ψ sites at true base resolution transcriptome-wide; (iii) a chemical understanding of the Ψ-dependent deletion signature; (iv) a coupling of these methods for the simultaneous detection of all three modifications in the same sample, which has provided hundreds of candidate sites of modification; and (v) a streamlined Ψ candidate site validation procedure for bulk verification of dozens of candidate sites in the same sample. Together, the combinatorial ability and relative ease of execution provided by this procedure should greatly forward epitranscriptome studies involving these three very common RNA modifications, and the refined lists of high-threshold mapped sites in HeLa cells should enable better-focused downstream functional studies. Furthermore, because RBS-Seq also provides transcript abundance (like a typical RNAseq), the combined outputs present a multidimensional (4D) dataset that may prove useful both for basic investigations and diagnostic settings.
Methods
Detailed methods are provided in .
Cell Culture (Including siRNA Treatment).
HeLa cells were seeded in 100-mm plates at 2 × 106 per plate in DMEM (Gibco) containing 4.5 g/L d-glucose, l-glutamine, and 110 mg/L sodium pyruvate and supplemented with 10% FBS. At ∼75% confluency, cells were harvested via TrypLE Express (Gibco) and washed once with 1× PBS, pH 7.4 (Gibco).For the DKC1 depletion experiment, siRNA treatment was performed in two sequential transfections per sample, 72 h apart. HeLa cells were seeded at 3 × 105 per well and transfected with Lipofectamine RNAiMAX (Invitrogen) and 60 pmol of siRNA (either Dharmacon’s siGENOME HumanDKC1 siRNA for the DKC1 knockdown or Dharmacon’s siGENOME nontargeting siRNA pool no. 1 for the control sample). Cells were split and reseeded 48 h after the first transfection and harvested 72 h after the second transfection.
RNA Isolation and Preparation.
Total RNA from the above samples was isolated using TRIzol Reagent (Invitrogen) and split. rRNA depleted samples were obtained via RiboMinus Transcriptome Isolation Kit for human/mouse (Ambion). The polyA-selected sample was isolated from total RNA using polyA Spin mRNA Isolation Kit (New England Biolabs). Small RNA (enriching transcripts <200 bp) was isolated using mirVana miRNA Isolation. See , for RNA fragmentation.
Bisulfite Treatment.
Processed RNA was denatured by incubating 45 µL RNA (5 µg) in 240 µL deionized formamide at 95 °C for 5 min before chilling on ice. For the sulfonation step, 312 µL of freshly prepared 5 M sodium bisulfite (pH 5) with 3 µL 100 mM hydroquinone was added to each denatured sample and incubated at 50 °C. To desulfonate, each sample was purified on Illustra NAP-10 columns (GE) and incubated in 2 M Tris buffer, pH 9.0 0 (Trizma Preset crystals, pH 9.0; Sigma-Aldrich), for 2 h at 37 °C. The RNA was recovered by ethanol precipitation.
Library Preparation and Sequencing.
For the transcriptome-wide study and for the DKC1 depletion experiment, we used the Illumina Tru-Seq Small RNA kit to generate paired-end libraries. The resulting libraries were sequenced in a 101-cycle paired-end format on an Illumina HiSeq 2000 instrument.
Validation of Candidate Ψ Sites by RBS-MiSeq.
PCR amplification yielded ∼300 bp regions surrounding each of 60 candidate sites via a primer design compatible with bisulphite, in which all unmethylated Cs have been converted to Ts. In addition, two DCK1-dependent sites in TERC RNA were tested. HeLa and HEK293 cells were cultured as described above, and RNA was extracted and bisulfite-treated, incubated sequentially in library preparation buffers lacking enzyme, and recovered by ethanol precipitation. Random hexamers and SuperScript II Reverse Transcriptase (Invitrogen) generated cDNA, from which we generated PCR amplicons using our PCR primer sets, which were then pooled and used for MiSeq DNA library preparation using different barcodes for HeLa or HEK293 sets according the Illumina protocols. Libraries were sequenced on a MiSeq instrument.
Bioinformatics Methods.
Transcriptome-wide sequencing reads from BS, NBS, and the DKC-1 experiments were aligned using Novoalign (Novocraft) to standard and bisulfite reference index of hg19 chromosome, scaffold and splice junction sequences, accommodating repeat reads, and trimming adaptor sequences. Reads mapping to certain small and repetitive RNA classes (tRNA, rRNA, snRNA, and snoRNA) were extracted and realigned to a custom reference containing only unique representative sequences of the above. All alignment files were processed identically. The processed alignments were then analyzed using custom scripts (https://github.com/HuntsmanCancerInstitute/RBSSeqTools) to generate tables of candidate sites for each individual modification based on the criteria as listed. For m5C, we selected only those sites from all reference “C” positions which had a read depth ≥10 in both BS and NBS datasets, a C->T nonconversion rate of ≥20% in the BS dataset, and an FDR ≤ 0.05 for nonconversion; sites were further screened manually for evidence of nonconversion arising from undenatured structure or mapping errors. For m1A, all reference A positions having coverage of at least 100 reads, positions were selected where the non-A rate of the NBS sample was significantly higher than the BS sample; two of C, G, or T were significantly higher than the error rate; and the FDR of C was not lower than that of both G and T. Additionally, to compare our data to publicly available m1A datasets, we screened our data for previously reported positions and filtered using the same criteria for m1A. Because previously available data were reported as enriched regions, these were merged across replicates, and the RefSeq transcript positions were converted to genomic positions using the python hgvs package. The entire region was scanned within our data for potential m1A sites. Pseudouridine positions were called if the following criteria were met: bisulfite (BS) ≥5 deletions, BS fraction deletion ≥0.01, and BS coverage ≥10 reads. Adjacent positions were merged into individual deletion groups, which were further pruned by removing positions with fraction deletions less than half the maximum observed faction deletion in the group. See , for further details on pruning.
Investigating the Source of the Deletion Signature for Ψ.
A synthetic 12-mer RNA strand (10 nmols), 5′-GCU ACG ΨAC UAG-3′, was bisulfite treated (as described above) and dialyzed against ddH2O for 36 h at 4 °C, and the water was changed every 8 h. After the 36-h dialysis, the sample was then dialyzed against ddH2O containing 3 mM NH4OAc for 36 h at 4 °C, and the NH4OAc solution was changed every 8 h, then lyophilized to dryness and resuspended in 30 µL of 1 mM NH4OAc and 30 µL of isopropanol. The adducted RNA sample was analyzed by ESI−-MS to yield an experimental mass (3880.8) consistent with a monobisulfite adducted RNA strand (calcd = 3880.3).
Structural Analysis of the Monobisulfite Adduct to the Pseudouridine Nucleoside.
All chemicals were obtained from commercial suppliers. The NaHSO3 solution was freshly prepared before reaction as a 5 M stock solution (pH 5.0). The pseudouridine nucleoside (20 mM) was allowed to react with 3 M NaHSO3 at pH 5.0 for 16 h at 50 °C to give a product yield of ∼90%. The reaction was analyzed using a Hypercarb HPLC column running a mobile phase combination of A = 20 mM NH4OAc (pH 7) and B = MeOH with a flow rate = 1 mL/min while monitoring the elution profile at 220 and 260 nm. The method was held at 0% B for 5 min, after which B was changed to 95% B over 20 min via a linear gradient. The two product peaks were collected, dried, and submitted to mass spectrometric analysis in which they gave masses consistent with monoadducts of bisulfite to the nucleoside (calcd mass [M-H]− = 325.27 and expt mass for the first isomer = 325.00 and the second isomer = 325.07). The purified samples were analyzed by UV-vis in ddH2O buffered with 20 mM NaPi showing the first eluting peak named isomer 1 to have a λmax = 265 nm and the second eluting peak named isomer 2 to have a λmax = 266 nm. In a final experiment, the purified compounds were analyzed by 1H-NMR: isomer 1 (500 MHz, D2O) δ 7.60 (s, 1 H), 4.40 (s, 0 Hz, 1 H), 4.31 (dd, 8.32 Hz, 1 H), 3.67 (m, 2.45 Hz, 1 H), 3.61 (dd, 2.17 Hz, 1 H), 3.45 (dd, 7.33, 6.85, and 4.40 Hz, 1 H), and 3.22 (dd, 4.89 and 2.93 Hz, 1 H) and isomer 2 (500 MHz, D2O) δ 7.61 (s, 1 H), 4.39 (d, 6.85 Hz, 1 H), 3.98 (t, 6.85 and 6.35 Hz, 1 H), 3.75 (t, 5.87 Hz, 1 H), 3.65 (dd, 3.91 and 2.93 Hz, 1 H), 3.58 (dd, 3.42 and 2.94 Hz, 1 H), and 3.44 (dd, 6.85 and 4.89 Hz, 1 H).
Authors: Dan Dominissini; Sigrid Nachtergaele; Sharon Moshitch-Moshkovitz; Eyal Peer; Nitzan Kol; Moshe Shay Ben-Haim; Qing Dai; Ayelet Di Segni; Mali Salmon-Divon; Wesley C Clark; Guanqun Zheng; Tao Pan; Oz Solomon; Eran Eyal; Vera Hershkovitz; Dali Han; Louis C Doré; Ninette Amariglio; Gideon Rechavi; Chuan He Journal: Nature Date: 2016-02-10 Impact factor: 49.962
Authors: Jeffrey E Squires; Hardip R Patel; Marco Nousch; Tennille Sibbritt; David T Humphreys; Brian J Parker; Catherine M Suter; Thomas Preiss Journal: Nucleic Acids Res Date: 2012-02-16 Impact factor: 16.971
Authors: Aaron M Fleming; Anton Alenko; Jay P Kitt; Anita M Orendt; Peter F Flynn; Joel M Harris; Cynthia J Burrows Journal: J Am Chem Soc Date: 2019-10-02 Impact factor: 15.419
Authors: Clemens Heissenberger; Lisa Liendl; Fabian Nagelreiter; Yulia Gonskikh; Guohuan Yang; Elena M Stelzer; Teresa L Krammer; Lucia Micutkova; Stefan Vogt; David P Kreil; Gerhard Sekot; Emilio Siena; Ina Poser; Eva Harreither; Angela Linder; Viktoria Ehret; Thomas H Helbich; Regina Grillari-Voglauer; Pidder Jansen-Dürr; Martin Koš; Norbert Polacek; Johannes Grillari; Markus Schosserer Journal: Nucleic Acids Res Date: 2019-12-16 Impact factor: 16.971