
DRUMMER-Rapid detection of RNA modifications through comparative nanopore sequencing.

Jonathan S Abebe1, Alexander M Price2, Katharina E Hayer3, Ian Mohr1, Matthew D Weitzman2,4, Angus C Wilson1, Daniel P Depledge1,5,6.   

Abstract

MOTIVATION: The chemical modification of ribonucleotides regulates the structure, stability, and interactions of RNAs. Profiling of these modifications using short-read (Illumina) sequencing techniques provides high sensitivity but low-to-medium resolution, i.e. modifications cannot be assigned to specific transcript isoforms in regions of sequence overlap. An alternative strategy uses current fluctuations in nanopore-based long-read direct RNA sequencing (DRS) to infer the location and identity of nucleotides that differ between two experimental conditions. While highly sensitive, these signal-level analyses require high-quality transcriptome annotations and thus are best suited to the study of model organisms. By contrast, detecting RNA modifications in microbial organisms, which typically have no or only low-quality annotations, requires an alternative strategy. Here, we demonstrate that signal fluctuations directly influence error rates during base calling and thus provide an alternative approach for identifying modified nucleotides.
RESULTS: DRUMMER (Detection of Ribonucleic acid Modifications Manifested in Error Rates) (i) utilizes a range of statistical tests and background noise correction to identify modified nucleotides with high confidence, (ii) operates with sensitivity similar to that of signal-level analysis approaches, and (iii) correlates very well with orthogonal approaches. Using well-characterized DRS datasets supported by independent meRIP-Seq and miCLIP-Seq datasets, we demonstrate that DRUMMER operates with high sensitivity and specificity.
AVAILABILITY AND IMPLEMENTATION: DRUMMER is written in Python 3 and is available as open source in the GitHub repository: https://github.com/DepledgeLab/DRUMMER.
SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
© The Author(s) 2022. Published by Oxford University Press.


Year:  2022        PMID: 35426900      PMCID: PMC9154255          DOI: 10.1093/bioinformatics/btac274

Source DB:  PubMed          Journal:  Bioinformatics        ISSN: 1367-4803            Impact factor:   6.931


1 Introduction

The selective chemical modification of an RNA transcript impacts its splicing, stability, structure, translation and turnover (Barbieri and Kouzarides, 2020; He and He, 2021; Shi ). Over 170 distinct chemical modifications of RNA have been reported; however, only a minority are well characterized (Boccaletto ). Key challenges in RNA modification studies include the precise mapping of modified bases at nucleotide resolution and on the level of individual RNAs. These challenges are exacerbated where transcriptome annotation quality is low and/or the genome contains many overlapping transcription units (e.g. viruses) (Depledge ). While antibody-based (e.g. meRIP-Seq, miCLIP-Seq) and antibody-independent (e.g. DART-Seq, MAZTER-Seq) approaches have significantly advanced our understanding of RNA modifications such as the methylation of adenosine at the N6 position (m6A) (Dominissini ; Garcia-Campos ; Linder ; Meyer, 2019; Zhang ), they remain constrained by the biases of short-read sequencing, including technical and biological variation that introduces noise and affects reproducibility (Kukurba and Montgomery, 2015). Alternative approaches to RNA modification detection center on the analysis of datasets generated using the Oxford Nanopore Technologies (ONT) direct RNA sequencing (DRS) methodology (Depledge ; Workman ). In DRS, polyadenylated RNAs are sequenced in their native state: a molecular motor ratchets each RNA through a membrane-embedded protein pore, disrupting the flow of ions through the nanopore. These changes in current are subsequently interpreted by neural networks to predict the nucleotide sequence of the RNA. The reader-head, which records these signals, is positioned within a region of the nanopore that holds on average five nucleotides of the RNA, so each signal change reflects this 5-mer rather than a single base.
The structural change that a modification makes to a ribonucleotide will thus alter the current measurements for the entire time that ribonucleotide spends in the reader-head region (Garalde ). During subsequent base-calling, neural networks trained on unmodified nucleotides are prone to misinterpreting the altered signal and calling an incorrect base. Multiple tools applying distinct approaches have been developed to exploit DRS data, ranging from classifiers (Liu ; Lorenz ), comparative signal-level analysis tools (Leger ; Pratanwanich ) and comparative error-rate analysis tools (Jenjaroenpun ; Parker ; Price ) to, most recently, direct signal-level analysis (Begik ; Hendra ). The utility of each approach depends on the experimental question and the datasets available. For instance, signal-level analyses predominantly operate at the transcriptome level, so their success depends on high-quality transcriptome annotations. While powerful, they are often limited to well-characterized datasets and require a higher level of computational expertise. By contrast, organisms for which a genome, but not a high-quality transcriptome annotation, is available (e.g. microbes) require alternative strategies, and comparative error-rate analysis tools provide a simpler way to screen for RNA modifications. Here, we introduce DRUMMER (Detection of Ribonucleic acid Modifications Manifested in Error Rates; https://github.com/DepledgeLab/DRUMMER), an RNA modification detection package that predicts modified nucleotides via a comparative assessment of base-call error rates in two or more datasets. While DRUMMER was designed primarily for the detection and analysis of RNA modifications in diverse viruses, we use well-characterized DRS datasets, supported by meRIP and miCLIP, to demonstrate its ability to map m6A modifications at high resolution across both mammalian and viral transcriptomes.

2 Materials and methods

DRUMMER is implemented in Python 3 and can process up to three biological replicates per condition using a round-robin approach, in which each control sample is compared against each treatment sample. One condition is represented by datasets from a control sample in which the RNA modification(s) of interest are present, while the second is represented by datasets from a treatment sample in which one or more RNA modifications have been ablated using inhibitors, gene knockdown, gene knockout or in vitro transcription strategies (Fig. 1). The choice of ablation approach dictates the specificity of the resulting analysis, i.e. either a broad analysis revealing the location of all modified ribonucleotides or a narrow analysis showing the location of a specific modification, such as m6A.
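The round-robin scheme amounts to a Cartesian product of control and treatment replicates. The minimal Python sketch below enumerates those comparisons; the helper `round_robin_pairs` and the BAM file names are hypothetical, and DRUMMER itself handles replicate pairing internally through its own command-line interface.

```python
from itertools import product


def round_robin_pairs(controls, treatments, max_replicates=3):
    """Pair every control replicate with every treatment replicate.

    Hypothetical helper illustrating the round-robin design; it is not part
    of DRUMMER's API.
    """
    if len(controls) > max_replicates or len(treatments) > max_replicates:
        raise ValueError(f"at most {max_replicates} replicates per condition")
    # product() yields (control, treatment) tuples, control-major order
    return list(product(controls, treatments))


# Toy example: 2 control and 3 treatment replicates give 2 x 3 = 6 comparisons
pairs = round_robin_pairs(["ctrl_1.bam", "ctrl_2.bam"],
                          ["trt_1.bam", "trt_2.bam", "trt_3.bam"])
for control, treatment in pairs:
    print(f"{control} vs {treatment}")
```

Each pair would then be analysed independently, so the number of comparisons grows with the product of the replicate counts (at most 9 with three replicates per condition).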
Fig. 1.

Schematic overview of DRUMMER. (A) DRUMMER identifies putative RNA modifications through comparative analysis of nanopore DRS datasets. The presence or increased abundance of a modified ribonucleotide is more likely to result in an incorrect nucleotide being reported during base-calling (i.e. a higher error rate). (B) DRUMMER can process both genome-level and transcriptome-level alignments. In ‘exome’ mode DRUMMER uses sequence read alignments against a genome to predict the location of putative RNA modifications (triangles) in a genomic context. In ‘isoform’ mode, DRUMMER relies on sequence read alignments (blue, yellow, green lines) against a (high-quality) transcriptome and predicts the location of putative RNA modifications (red triangles) at the level of individual transcript isoforms (large blue, yellow, green boxes) and in a genomic context. Note that low-quality sequence read alignments (grey lines) should be filtered prior to analysis. (C and D) DRUMMER parses BAM files using bam-readcount to generate per-nucleotide counts of A, C, G, U and N (indels) base-calls in both treatment and control datasets. A G-test (2 × 5 contingency table) is used to determine whether a significant difference in erroneous base-calls is observed between the two datasets at a given position, supported by an odds ratio test to determine whether an increased error rate is observed in the control (depletion of RNA modification abundance in treatment relative to control) or treatment (accumulation of RNA modification abundance in treatment relative to control) dataset. A given site is reported (by default) as a depletion/accumulation candidate if G-test padj < 0.05 and O/R > 1.5. Where multiple sites within a five-nucleotide window are classed as candidates, only the site with the largest G-test score is retained, with all others reported as [masked]. Additional reporting shows 11-nt sequence windows centered on the candidate site that can be used for sequence motif/context discovery.
When run specifically in m6A detection mode (−m6A), DRUMMER also reports the distance (nt) between a given candidate site and the nearest AC dinucleotide, along with the 5-nt sequence motif centered on that AC dinucleotide. Data shown in C and D are derived from isoform-level analysis of the Adenovirus L2-Penton transcript.
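The masking and sequence-context steps described in the legend can be sketched in plain Python. This is an illustration under stated assumptions, not DRUMMER's code: all function names are hypothetical, positions are 0-based, masking is done greedily (a candidate is masked when an already-retained, higher-scoring candidate lies fewer than 5 nt away), sequence ends are padded with `-`, and the sign convention for the AC distance (AC position minus candidate position) is our own choice.

```python
def mask_neighbours(candidates, window=5):
    """Greedy masking: keep the highest G-test score within each window.

    candidates: list of (position, g_stat). Returns (kept, masked) positions.
    """
    kept, masked = [], []
    for pos, score in sorted(candidates, key=lambda c: -c[1]):
        if any(abs(pos - k) < window for k, _ in kept):
            masked.append(pos)        # a stronger candidate is too close
        else:
            kept.append((pos, score))
    return sorted(p for p, _ in kept), sorted(masked)


def context_window(seq, pos, flank=5):
    """11-nt window (flank=5) centred on pos, padded with '-' at the ends."""
    padded = "-" * flank + seq + "-" * flank
    return padded[pos:pos + 2 * flank + 1]


def has_homopolymer(window, min_run=3):
    """True if the window contains a run of >= min_run identical bases."""
    run = 1
    for prev, cur in zip(window, window[1:]):
        run = run + 1 if cur == prev and cur != "-" else 1
        if run >= min_run:
            return True
    return False


def nearest_ac(seq, pos):
    """Signed distance to the nearest 'AC' and the 5-nt motif centred on its A."""
    hits = [i for i in range(len(seq) - 1) if seq[i:i + 2] == "AC"]
    if not hits:
        return None, None
    best = min(hits, key=lambda i: abs(i - pos))
    padded = "--" + seq + "--"
    return best - pos, padded[best:best + 5]   # NNACN-style motif


seq = "TTGGACTAAAGCT"  # toy sequence, not real data
window = context_window(seq, 8)
print(window, has_homopolymer(window), nearest_ac(seq, 8))
print(mask_neighbours([(100, 12.0), (103, 30.0), (110, 8.0)]))
```

Whether DRUMMER masks greedily or by clustering, and exactly where its window boundary falls, is not specified here; the `window` parameter makes that choice explicit.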

To prepare data for DRUMMER analysis after nanopore sequencing, all input datasets should be base-called using the same version of Guppy (or equivalent) and aligned against either a representative genome (exome mode) or transcriptome (isoform mode) to produce sorted BAM files (Fig. 1). The choice between exome and isoform mode is guided by the data available for the transcriptome of interest. Exome mode is primarily suited to smaller genomes with low-quality transcriptome annotations, where it is more prudent to align reads directly to the genome before identifying putative modified ribonucleotides with DRUMMER. While this approach reduces sensitivity compared to isoform-level alignments (see Supplementary Materials) and may be more affected by misalignments around splice junctions, it broadens the accessibility of RNA modification detection such that analyses can be performed in the absence of high-quality transcriptome annotations. By contrast, where high-quality reference transcriptomes are available, nanopore reads can be aligned against a transcriptome in FASTA format comprising all documented transcript isoforms, and the resulting alignments can be further parsed to remove noise from 5ʹ-truncated and/or multi-mapping alignments. This filtering notably increases sensitivity compared to exome-level analyses (see Supplementary Materials). DRUMMER processes each genome or transcript isoform individually, parsing alignments from the input BAM files to generate base-call distributions (i.e. the number of A, C, G, U and indel calls) for each position along the genome or transcript.
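As a sketch of how per-position base-call distributions might be derived, the function below collapses one line of bam-readcount output into the five classes used by the G-test. It assumes bam-readcount's tab-separated layout (chromosome, position, reference base, depth, then one `base:count:...` field per base, with indels reported as `+SEQ`/`-SEQ` entries) and folds N calls and indels into a single fifth class to mirror the "N (indels)" column in the Fig. 1 legend. The function name `parse_readcount_line` and the toy record are hypothetical.

```python
def parse_readcount_line(line):
    """Collapse one bam-readcount line into [A, C, G, U, N/indel] counts.

    Assumed layout: chrom, position, ref base, depth, then "base:count:..."
    fields. Folding N calls and indels into one class is an assumption.
    """
    fields = line.rstrip("\n").split("\t")
    chrom, pos, ref = fields[0], int(fields[1]), fields[2]
    counts = {"A": 0, "C": 0, "G": 0, "U": 0, "N": 0}
    for entry in fields[4:]:
        if ":" not in entry:
            continue                      # skip malformed or empty fields
        base, count = entry.split(":")[:2]
        count = int(count)
        if base in ("A", "C", "G"):
            counts[base] += count
        elif base in ("T", "U"):
            counts["U"] += count          # bam-readcount reports T; treat as U
        elif base == "N" or base.startswith(("+", "-")):
            counts["N"] += count          # N calls and indels share one class
        # the "=" placeholder entry is ignored
    return chrom, pos, ref, [counts[b] for b in "ACGUN"]


# Toy record (not real data) for a single reference position
record = ("L2_Penton\t1500\tA\t40\t=:0:0.00:0.00\tA:30:60.00:10.00\t"
          "C:2:60.00:9.00\tG:1:60.00:8.00\tT:3:60.00:9.00\t"
          "N:0:0.00:0.00\t-A:4:60.00:0.00")
print(parse_readcount_line(record))
```

How bam-readcount attributes an indel to a reference position is left to that tool; a production parser would also need to decide whether reads carrying an indel should additionally be counted under their base call.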
Each position is then subjected to a 2 × 5 G-test and an odds ratio (O/R) test, and the resulting P-values are Bonferroni-corrected for multiple testing. Positions are labeled as candidate sites if both the G-test and O/R adjusted P-values are below a user-specified threshold (default < 0.05) and the odds ratio exceeds a user-specified threshold (default > 1.5) (Fig. 1C). Where candidate sites lie within 5 nt of each other, all but the one with the highest G-test score are masked. This increases the specificity of downstream analyses and prevents the inclusion of false positives that may arise from the influence of modified nucleotides on neighboring unmodified nucleotides. Note, however, that this function can be disabled within DRUMMER if neighboring modifications are expected. Additional information collected for each site includes an 11-base sequence motif centered on the position of interest and whether a homopolymer (≥3 nt) is present within that motif. When run specifically in m6A detection mode (−m6A), DRUMMER also determines the distance to the nearest AC dinucleotide and the 5-base sequence centered upon it (i.e. NNACN) (Fig. 1D). Finally, DRUMMER classifies candidate sites according to the direction of the odds ratio. Accumulation sites have a higher error rate in the treatment than in the control sample, whereas depletion sites have a higher error rate in the control than in the treatment sample. This allows users to identify RNA modifications that either increase or decrease in frequency according to the experimental design (e.g. depletion of a methyltransferase or depletion of a demethylase). Importantly, when a specific modification is depleted, the presence of accumulation sites allows DRUMMER to establish a baseline for false-positive detection, which leads to increasingly stringent filtering and higher specificity for true positives (see Supplementary Materials).
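The per-position statistics can be sketched with the Python standard library alone. Several details are assumptions rather than DRUMMER's implementation: the G-test drops columns that are empty in both samples (so the degrees of freedom equal the number of non-empty columns minus one); the odds ratio compares mismatching versus reference-matching calls, with a 0.5 correction added to every cell when any cell is zero; and the O/R P-value comes from a Wald test on the log odds ratio. All function names, the `ref_index` argument and the toy counts are hypothetical.

```python
import math


def chi2_sf(x, df):
    """Upper-tail chi-square probability via the closed-form recursion in df."""
    if x <= 0:
        return 1.0
    if df % 2 == 0:
        q, k = math.exp(-x / 2), 2               # df = 2 base case
    else:
        q, k = math.erfc(math.sqrt(x / 2)), 1    # df = 1 base case
    while k < df:                                # Q(x; k + 2) = Q(x; k) + term
        q += (x / 2) ** (k / 2) * math.exp(-x / 2) / math.gamma(k / 2 + 1)
        k += 2
    return min(q, 1.0)


def g_test(control, treatment):
    """G-test of independence on a 2 x 5 table of [A, C, G, U, N/indel] counts."""
    cols = [(c, t) for c, t in zip(control, treatment) if c + t > 0]
    n_c, n_t = sum(control), sum(treatment)
    total = n_c + n_t
    g = 0.0
    for c, t in cols:
        col = c + t
        for obs, row in ((c, n_c), (t, n_t)):
            if obs:                              # 0 * log(0) is taken as 0
                g += 2 * obs * math.log(obs / (row * col / total))
    df = len(cols) - 1
    return g, (chi2_sf(g, df) if df > 0 else 1.0)


def odds_ratio_test(control, treatment, ref_index):
    """Odds of a mismatching call in control vs treatment, with a Wald P-value."""
    a, b = sum(control) - control[ref_index], control[ref_index]
    c, d = sum(treatment) - treatment[ref_index], treatment[ref_index]
    if 0 in (a, b, c, d):
        a, b, c, d = a + 0.5, b + 0.5, c + 0.5, d + 0.5
    log_or = math.log((a * d) / (b * c))
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    return math.exp(log_or), math.erfc(abs(log_or) / se / math.sqrt(2))


def call_sites(sites, alpha=0.05, min_or=1.5):
    """Bonferroni-correct both tests and classify the direction of each site.

    sites: list of (position, control_counts, treatment_counts, ref_index).
    """
    m = len(sites)
    calls = []
    for pos, ctrl, trt, ref in sites:
        g, g_p = g_test(ctrl, trt)
        o_r, or_p = odds_ratio_test(ctrl, trt, ref)
        if min(g_p * m, 1.0) >= alpha or min(or_p * m, 1.0) >= alpha:
            continue
        if o_r > min_or:
            calls.append((pos, "depletion", g))      # higher error in control
        elif 1 / o_r > min_or:
            calls.append((pos, "accumulation", g))   # higher error in treatment
    return calls


sites = [  # toy counts, not real data: (position, control, treatment, ref_index)
    (10, [60, 5, 5, 10, 20], [95, 1, 1, 2, 1], 0),
    (11, [0, 0, 0, 98, 2], [0, 0, 0, 97, 3], 3),
    (12, [95, 1, 1, 2, 1], [60, 5, 5, 10, 20], 0),
]
for pos, label, g in call_sites(sites):
    print(pos, label, round(g, 1))
```

Sites 10 and 12 carry mirrored counts, so they pass the same G-test but are classified in opposite directions; site 11 is dropped because neither adjusted P-value is significant. Where SciPy is available, `scipy.stats.chi2_contingency(..., lambda_="log-likelihood")` is a standard alternative for the G-test.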
Upon completion, DRUMMER outputs a report table listing all putative modified nucleotides, which can be filtered and visualized using bundled scripts (see Supplementary Materials). Two case studies demonstrating DRUMMER's ability to detect m6A modifications in viral and murine datasets are presented in the Supplementary Materials, with further examples in recent publications on adenovirus, SARS-CoV-2 and herpes simplex virus type 1 (Burgess ; Price ; Srinivas ).

3 Conclusions

The sensitive detection and precise mapping of RNA modifications to the transcriptomes of model and non-model organisms remains challenging in most scenarios. Where read depth is sufficiently high, DRUMMER operates with high specificity and sensitivity, allowing the identification of modified ribonucleotides on individual transcript isoforms. While primarily designed for the analysis of microbial transcriptomes (and in particular, viruses), DRUMMER is also capable of providing rapid analyses of larger eukaryotic transcriptomes.
  24 in total

1.  Highly parallel direct RNA sequencing on an array of nanopores.

Authors:  Daniel R Garalde; Elizabeth A Snell; Daniel Jachimowicz; Botond Sipos; Joseph H Lloyd; Mark Bruce; Nadia Pantic; Tigist Admassu; Phillip James; Anthony Warland; Michael Jordan; Jonah Ciccone; Sabrina Serra; Jemma Keenan; Samuel Martin; Luke McNeill; E Jayne Wallace; Lakmal Jayasinghe; Chris Wright; Javier Blasco; Stephen Young; Denise Brocklebank; Sissel Juul; James Clarke; Andrew J Heron; Daniel J Turner
Journal:  Nat Methods       Date:  2018-01-15       Impact factor: 28.547

2.  Nanopore direct RNA sequencing maps the complexity of Arabidopsis mRNA processing and m6A modification.

Authors:  Matthew T Parker; Katarzyna Knop; Anna V Sherwood; Nicholas J Schurch; Katarzyna Mackinnon; Peter D Gould; Anthony Jw Hall; Geoffrey J Barton; Gordon G Simpson
Journal:  Elife       Date:  2020-01-14       Impact factor: 8.140

3.  Identification of differential RNA modifications from nanopore direct RNA sequencing with xPore.

Authors:  Fei Yao; Ying Chen; Casslynn W Q Koh; Yuk Kei Wan; Ploy N Pratanwanich; Christopher Hendra; Polly Poon; Yeek Teck Goh; Phoebe M L Yap; Jing Yuan Chooi; Wee Joo Chng; Sarah B Ng; Alexandre Thiery; W S Sho Goh; Jonathan Göke
Journal:  Nat Biotechnol       Date:  2021-07-19       Impact factor: 54.908

4.  Quantitative profiling of pseudouridylation dynamics in native RNAs with nanopore sequencing.

Authors:  Oguzhan Begik; Morghan C Lucas; Leszek P Pryszcz; Jose Miguel Ramirez; Rebeca Medina; Ivan Milenkovic; Sonia Cruciani; Huanle Liu; Helaine Graziele Santos Vieira; Aldema Sas-Chen; John S Mattick; Schraga Schwartz; Eva Maria Novoa
Journal:  Nat Biotechnol       Date:  2021-05-13       Impact factor: 54.908

5.  Single-nucleotide-resolution mapping of m6A and m6Am throughout the transcriptome.

Authors:  Bastian Linder; Anya V Grozhik; Anthony O Olarerin-George; Cem Meydan; Christopher E Mason; Samie R Jaffrey
Journal:  Nat Methods       Date:  2015-06-29       Impact factor: 28.547

6.  RNA modifications detection by comparative Nanopore direct RNA sequencing.

Authors:  Adrien Leger; Paulo P Amaral; Luca Pandolfini; Charlotte Capitanchik; Federica Capraro; Valentina Miano; Valentina Migliori; Patrick Toolan-Kerr; Theodora Sideri; Anton J Enright; Konstantinos Tzelepis; Folkert J van Werven; Nicholas M Luscombe; Isaia Barbieri; Jernej Ule; Tomas Fitzgerald; Ewan Birney; Tommaso Leonardi; Tony Kouzarides
Journal:  Nat Commun       Date:  2021-12-10       Impact factor: 14.919

7.  Widespread remodeling of the m6A RNA-modification landscape by a viral regulator of RNA processing and export.

Authors:  Kalanghad Puthankalam Srinivas; Daniel P Depledge; Jonathan S Abebe; Stephen A Rice; Ian Mohr; Angus C Wilson
Journal:  Proc Natl Acad Sci U S A       Date:  2021-07-27       Impact factor: 11.205

8.  m6A RNA methylation: from mechanisms to therapeutic potential.

Authors:  P Cody He; Chuan He
Journal:  EMBO J       Date:  2021-01-20       Impact factor: 11.598

9.  Accurate detection of m6A RNA modifications in native RNA sequences.

Authors:  Huanle Liu; Oguzhan Begik; Morghan C Lucas; Jose Miguel Ramirez; Christopher E Mason; David Wiener; Schraga Schwartz; John S Mattick; Martin A Smith; Eva Maria Novoa
Journal:  Nat Commun       Date:  2019-09-09       Impact factor: 14.919

10.  Topology of the human and mouse m6A RNA methylomes revealed by m6A-seq.

Authors:  Dan Dominissini; Sharon Moshitch-Moshkovitz; Schraga Schwartz; Mali Salmon-Divon; Lior Ungar; Sivan Osenberg; Karen Cesarkas; Jasmine Jacob-Hirsch; Ninette Amariglio; Martin Kupiec; Rotem Sorek; Gideon Rechavi
Journal:  Nature       Date:  2012-04-29       Impact factor: 49.962
