Charlotte Capitanchik, Patrick Toolan-Kerr, Nicholas M Luscombe, Jernej Ule.
Abstract
A flurry of methods has been developed in recent years to identify N6-methyladenosine (m6A) sites across transcriptomes at high resolution. This raises the need to understand both the common features and those that are unique to each method. Here, we complement the analyses presented in the original papers by reviewing their various technical aspects and comparing the overlap between m6A-methylated messenger RNAs (mRNAs) identified by each. Specifically, we examine eight different methods that identify m6A sites in human cells with high resolution: two antibody-based crosslinking and immunoprecipitation (CLIP) approaches, two using endoribonuclease MazF, one based on deamination, two using Nanopore direct RNA sequencing, and finally, one based on computational predictions. We contrast the respective datasets and discuss the challenges in interpreting the overlap between them, including a prominent expression bias in detected genes. This overview will help guide researchers in making informed choices about using the available data and assist with the design of future experiments to expand our understanding of m6A and its regulation.
Keywords: N6-methyladenosine; RNA; bioinformatics; epitranscriptomics; m6A
Year: 2020 PMID: 32508872 PMCID: PMC7251061 DOI: 10.3389/fgene.2020.00398
Source DB: PubMed Journal: Front Genet ISSN: 1664-8021 Impact factor: 4.599
Single-nucleotide-resolution, transcriptome-wide methods for detecting m6A.
| Category | Method | Cell lines | Advantages | Disadvantages | Motif restriction | Signal used to call m6A |  | RNA fraction |
| Antibody based | miCLIP | HEK293, MOLM13 | • High throughput; can be used to assess multiple conditions | • Difficult to correct for nonspecific antibody binding | DRACH | Truncations and C → T mutations | Yes | Total RNA and poly(A)-selected RNA available |
| Antibody based | m6A-CLIP | A549, CD8+ T cells, HeLa | • High throughput; can be used to assess multiple conditions | • Difficult to correct for nonspecific antibody binding | RRACU/RAC | Truncations and mutations (substitutions and deletions) | Yes | poly(A); HeLa: ribo0, poly(A), nucleoplasm, chromatin |
| MazF enzyme based | MAZTER-seq | HEK293T | • Generates stoichiometric data | • Can only detect sites in ACA sequence context | ACA | Enzymatic cleavage efficiency, measured as truncations vs. read-through | No | poly(A) |
| MazF enzyme based | m6A-REF-seq | HEK293T | • Generates stoichiometric data | • Can only detect sites in ACA sequence context | ACA | Enzymatic cleavage efficiency, measured as truncations vs. read-through | No | poly(A) |
| Fusion domain based | DART-seq | HEK293T | • Low RNA input | • Biases in background APOBEC1 targeting | Mutation site must be C → U | C → U mutations | No | None |
| Computational prediction | WHISTLE | Any | • Can predict m6A sites in any gene, regardless of expression | • Trained on CLIP datasets, so learns CLIP biases | RRACH | Truncations and mutations | Yes | poly(A) |
| Direct RNA sequencing by Nanopore | MINES | HEK293 | • Potential for measuring stoichiometry of sites and combinatorial modification dynamics (although currently not systematically implemented) | • Trained on CLIP datasets, so learns CLIP biases | RGACH | Tombo’s fraction modified values and coverage files | NA | poly(A) |
| Direct RNA sequencing by Nanopore | NanoCompore | MOLM13 | • Can detect other modifications as well as m6A | • Currently low throughput | No | Difference in k-mer current intensity and dwell time in pore between WT and METTL3 KD control | NA | poly(A) |
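The motif restrictions in the table are written in IUPAC nucleotide codes (D = A/G/U, R = A/G, H = A/C/U). As a minimal sketch, assuming a plain transcript sequence as input, the Python below scans for overlapping motif occurrences and reports the position of the candidate methylated A. The function names are hypothetical and are not taken from any of the listed tools.

```python
import re

# Standard IUPAC codes needed for the motifs in the table (U is treated as T).
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T", "U": "T",
    "R": "AG", "Y": "CT", "D": "AGT", "H": "ACT", "N": "ACGT",
}


def motif_to_regex(motif: str) -> re.Pattern:
    """Compile an IUPAC motif (e.g. 'DRACH') into a regex.

    The pattern is wrapped in a zero-width lookahead so that overlapping
    occurrences are all reported by finditer().
    """
    body = "".join(f"[{IUPAC[base]}]" for base in motif.upper())
    return re.compile(f"(?=({body}))")


def candidate_adenosines(seq: str, motif: str, a_offset: int) -> list:
    """Return 0-based positions of the putative m6A in each motif match.

    a_offset is the index of the methylated A inside the motif:
    2 for DRACH, RRACH and RGACH; 0 for the MazF ACA context.
    """
    seq = seq.upper().replace("U", "T")
    pattern = motif_to_regex(motif)
    return [m.start() + a_offset for m in pattern.finditer(seq)]


if __name__ == "__main__":
    toy = "GGACUAACA"  # made-up sequence for illustration
    print(candidate_adenosines(toy, "DRACH", 2))
    print(candidate_adenosines(toy, "RRACH", 2))
    print(candidate_adenosines(toy, "ACA", 0))
```

The same function covers the shorter m6A-CLIP context RAC with `a_offset=1`. Comparing the DRACH and RRACH calls on one sequence shows directly how a narrower motif definition shrinks the candidate set.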
FIGURE 1 High-throughput methods to detect or predict m6A in transcriptomes. (A) Crosslinking and immunoprecipitation (CLIP) methods involve UV crosslinking of an m6A antibody to purified RNA. m6A-CLIP and miCLIP differ in the antibodies used, complementary DNA (cDNA) library preparation, and computational processing, among other aspects. (B) The Escherichia coli endoribonuclease MazF preferentially cleaves at unmethylated ACA sites; this forms the basis of MAZTER-seq and m6A-REF-seq. (C) DART-seq expresses an APOBEC1-YTH fusion protein. The YTH domain targets APOBEC1 to m6A sites, where it deaminates nearby cytosines to uracil. (D) Direct RNA sequencing with Nanopore technologies enables detection of m6A because A- and m6A-containing sequences differ in ionic current intensity and in dwell time in the pore. Methods differ in how these signals are deconvolved. m6A identification using nanopore sequencing (MINES) combines four random forest models, pretrained using CLIP m6A sites as true positives. NanoCompore relies on a comparison of signal between two conditions, for example wild type (WT) and METTL3 knockdown, or in vivo RNA vs. nonmodified in vitro transcribed RNA. (E) In silico prediction of m6A sites is performed by WHISTLE, a support vector machine algorithm that uses miCLIP and m6A-CLIP sites as training data.
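The MazF logic in panel B (methylation of the first A in ACA blocks cleavage) can be turned into a per-site score. The sketch below is a simplified illustration, not the MAZTER-seq or m6A-REF-seq pipeline: it assumes hypothetical per-site counts of reads ending at the cleavage point versus reads running through the uncut motif, and calls a site methylated when cleavage efficiency falls below 50%, the MAZTER-seq threshold listed in the thresholding table of this article. The `min_reads` coverage floor is an added assumption.

```python
import math
from dataclasses import dataclass


@dataclass
class AcaSite:
    gene: str
    position: int
    cleaved_reads: int       # hypothetical: reads whose ends fall at the ACA cut point
    read_through_reads: int  # hypothetical: reads spanning the ACA uncut

    @property
    def cleavage_efficiency(self) -> float:
        """Fraction of informative reads showing MazF cleavage at this ACA."""
        total = self.cleaved_reads + self.read_through_reads
        return self.cleaved_reads / total if total else math.nan


def call_methylated(sites, max_efficiency=0.5, min_reads=10):
    """Keep ACA sites with enough coverage whose cleavage is suppressed.

    Low cleavage efficiency is read as evidence that the first A is m6A,
    because methylation blocks MazF cutting.
    """
    return [
        s for s in sites
        if s.cleaved_reads + s.read_through_reads >= min_reads
        and s.cleavage_efficiency < max_efficiency
    ]


if __name__ == "__main__":
    toy_sites = [  # made-up counts
        AcaSite("GENE1", 100, cleaved_reads=2, read_through_reads=18),
        AcaSite("GENE2", 50, cleaved_reads=15, read_through_reads=5),
        AcaSite("GENE3", 10, cleaved_reads=1, read_through_reads=2),
    ]
    print([s.gene for s in call_methylated(toy_sites)])
```

Because the score is a fraction of reads at a single site, one minus the cleavage efficiency gives a rough per-site methylation level. This is why MazF-based methods are described as generating stoichiometric data, although the published pipelines normalize more carefully than this toy version does.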
FIGURE 2 m6A-containing genes identified by eight methods. (A) Bar chart showing the number of m6A-containing transcripts identified by each method. Some methods have data from multiple cell lines or apply several possible thresholds; these are shown separately. The cell lines for each dataset are indicated along with the type of method. The hashed bars denote genes that are expressed in all the cell lines considered here. For DART-Seq, MAZTER-Seq, and MINES, several thresholds were possible: “DART-Seq M3” refers to sites identified by comparison with METTL3 knockdown, and “Low” and “high” refer to two stringency thresholds applied by the authors. “MAZTER-Seq” refers to all sites with a cleavage efficiency < 50%, and “MAZTER-Seq cond” refers to sites meeting the criteria FTO overexpression, WT ≥ 20%, and/or Alkbh5 overexpression, WT ≥ 20%. “MINES” refers to all sites identified by MINES, and “MINES 30×” refers to MINES sites with ≥30× coverage. (B) Bar chart showing the numbers of overlapping target genes between the eight methods, considering all the reported genes.
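One simple way to tally gene-level overlap between methods is sketched below. It assumes each dataset has already been reduced to a set of gene identifiers. It first merges the datasets belonging to the same method (union over cell lines or thresholds) and then counts how many genes are reported by exactly k methods. The gene names and method groupings in the example are made up, and this is not necessarily how panel B was computed.

```python
from collections import Counter


def merge_datasets(datasets_by_method):
    """Union the gene sets of all datasets (cell lines, thresholds) of each method."""
    return {method: set().union(*sets) for method, sets in datasets_by_method.items()}


def overlap_histogram(genes_by_method):
    """Return {k: number of genes reported by exactly k methods}."""
    methods_per_gene = Counter(
        gene for genes in genes_by_method.values() for gene in genes
    )
    return dict(sorted(Counter(methods_per_gene.values()).items()))


if __name__ == "__main__":
    toy = {
        "miCLIP": [{"A", "B"}, {"B", "C"}],  # two datasets, e.g. two cell lines
        "MINES": [{"B", "C"}],
        "WHISTLE": [{"C", "D"}],
    }
    print(overlap_histogram(merge_datasets(toy)))
```

Because each method's genes are held in a set, a gene counts at most once per method, so the histogram counts methods rather than datasets.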
Number of expressed genes per cell line and origin of the expression dataset.
| Cell line | Expressed genes |
| HEK293 | 11,018 |
| HEK293T | 11,703 |
| MOLM13 | 12,968 |
| HeLa | 12,839 |
| A549 | 9,963 |
| CD8+ T cells | 8,235 |
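Figure 2 distinguishes genes expressed in all cell lines (hashed bars). Below is a minimal sketch of that restriction, assuming per-cell-line expression tables keyed by gene. The 1 TPM cutoff in `expressed_genes` is a hypothetical choice, because the expression threshold behind the counts above is not stated here.

```python
from functools import reduce


def expressed_genes(tpm_by_gene, min_tpm=1.0):
    """Genes whose expression meets a cutoff (min_tpm is a hypothetical threshold)."""
    return {gene for gene, tpm in tpm_by_gene.items() if tpm >= min_tpm}


def commonly_expressed(expression_by_cell_line, min_tpm=1.0):
    """Intersect the expressed-gene sets of all cell lines."""
    sets = [expressed_genes(t, min_tpm) for t in expression_by_cell_line.values()]
    return reduce(set.intersection, sets) if sets else set()


def restrict_to(genes_by_method, allowed):
    """Keep only genes in `allowed` for every method, e.g. commonly expressed genes."""
    return {method: genes & allowed for method, genes in genes_by_method.items()}


if __name__ == "__main__":
    toy_expression = {  # made-up TPM values
        "HEK293": {"G1": 5.0, "G2": 0.2, "G3": 3.0},
        "HeLa": {"G1": 2.0, "G3": 0.5, "G4": 9.0},
    }
    common = commonly_expressed(toy_expression)
    print(common)
    print(restrict_to({"miCLIP": {"G1", "G2", "G4"}}, common))
```

Intersecting across cell lines removes the advantage that methods profiled in cell lines with broader expression would otherwise have when their gene lists are compared.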
Number of top-ranking targets selected per method.
| Method | Top-ranking targets |
| DART-seq | 1,019 |
| m6A-CLIP | 1,072 |
| m6A-REF-seq | 1,243 |
| miCLIP | 1,233 |
| NanoCompore | 387 |
| WHISTLE | 1,198 |
| MINES | 1,104 |
| MAZTER-seq | 944 |
FIGURE 3 Comparing the top-ranking target genes identified by eight methods. (A) Bar chart showing the numbers of top-ranking genes that overlap between the eight methods. (B) Heatmap showing overlap between the top targets. Dendrograms are produced by complete-linkage hierarchical clustering using the Jaccard distance (1 minus the Jaccard index) as the distance metric. Dark blue indicates presence of the gene among the top targets for a method, and gray indicates absence. Colored bars denote the category of the method. (C) Proportion of top targets that are unique to each method. (D) Number of methods detecting a target gene plotted against its mean expression decile across all studied cell lines. (E) Minimum expression decile of the top-ranked genes for each method.
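The clustering described for panel B can be sketched in a few lines. This naive illustration assumes each method's top targets are available as a set of gene identifiers. Jaccard distance is 1 minus intersection over union, and complete linkage repeatedly merges the two clusters whose farthest members are closest. In practice a library routine such as SciPy's `scipy.cluster.hierarchy.linkage` with `method="complete"` would normally be used; the hand-rolled version only makes the steps explicit. The method labels `M1` to `M4` are hypothetical.

```python
from itertools import combinations


def jaccard_distance(a, b):
    """1 - |a & b| / |a | b|; defined as 0 for two empty sets."""
    union = a | b
    return 1.0 - len(a & b) / len(union) if union else 0.0


def complete_linkage(gene_sets):
    """Naive agglomerative clustering with complete linkage.

    Returns the merge steps as (cluster1, cluster2, distance) tuples,
    in the order they happen.
    """
    dist = {
        frozenset((x, y)): jaccard_distance(gene_sets[x], gene_sets[y])
        for x, y in combinations(gene_sets, 2)
    }
    clusters = [frozenset([name]) for name in gene_sets]
    merges = []
    while len(clusters) > 1:
        best = None
        for c1, c2 in combinations(clusters, 2):
            # Complete linkage: distance between clusters is the largest
            # distance between any member of c1 and any member of c2.
            d = max(dist[frozenset((x, y))] for x in c1 for y in c2)
            if best is None or d < best[2]:
                best = (c1, c2, d)
        c1, c2, _ = best
        clusters.remove(c1)
        clusters.remove(c2)
        clusters.append(c1 | c2)
        merges.append(best)
    return merges


if __name__ == "__main__":
    toy_top_targets = {  # made-up gene sets for four hypothetical methods
        "M1": {"g1", "g2", "g3"},
        "M2": {"g1", "g2", "g3", "g4"},
        "M3": {"g7", "g8"},
        "M4": {"g7", "g8", "g9"},
    }
    for c1, c2, d in complete_linkage(toy_top_targets):
        print(sorted(c1), sorted(c2), round(d, 3))
```

Complete linkage is conservative: two groups of methods join only when every pair across them is similar. Methods with largely method-specific targets therefore tend to attach late in the dendrogram.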
Number of m6A modified transcripts for each method following thresholding.
| Method | Dataset | Threshold | Transcripts (dataset) | Transcripts (method total) |  |
| miCLIP | CIMs, HEK293 | As from paper | 3,755 | 6,282 | 4,000 |
| miCLIP | CITs, HEK293 | As from paper | 2,779 | | |
| miCLIP | MOLM13 | As from paper | 3,662 | | |
| m6A-CLIP | A549 | As from paper | 5,915 | 8,560 | 4,694 |
| m6A-CLIP | CD8+ T cells | As from paper | 4,697 | | |
| m6A-CLIP | HeLa | As from paper | 6,415 | | |
| DART-seq | High stringency, HEK293T | C → U events from paper filtered for DRACH motif | 5,648 | 8,331 | 5,445 |
| DART-seq | Low stringency, HEK293T | C → U events from paper filtered for DRACH motif | 7,614 | | |
| DART-seq | WT vs. METTL3-depleted HEK293T | C → U events from paper filtered for DRACH motif | 2,370 | | |
| m6A-REF-seq | HEK293T | As from paper | 1,843 | 1,843 | 1,243 |
| MAZTER-seq | HEK293T | MazF cleavage efficiency < 50% | 3,545 | 3,705 | 2,568 |
| MAZTER-seq | HEK293T | FTO overexpression, WT ≥ 20%, and/or Alkbh5 overexpression, WT ≥ 20% | 482 | | |
| WHISTLE | Trained on miCLIP and m6A-CLIP | Posterior probability of being m6A ≥ 0.95 | 3,877 | 3,877 | 2,177 |
| MINES | Nanopore | As from paper | 6,910 | 6,910 | 4,390 |
| MINES | Nanopore | Filtered for 30× coverage (threshold for NanoCompore) | 1,883 | | |
| NanoCompore | WT vs. METTL3 KO Nanopore | DRACHs within clustered 5-mers with contextual | 556 | 556 | 387 |
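The per-dataset and per-method counts in this table follow a common pattern. A method-specific filter is applied to sites, the passing sites are collapsed to transcripts or genes, and the union is taken across a method's datasets. The sketch below assumes hypothetical site records with a generic `score` field (for example, a WHISTLE-style posterior probability cut at 0.95) and a `coverage` field (for example, the 30× filter applied to MINES). It illustrates the bookkeeping only and is not any author's pipeline.

```python
from dataclasses import dataclass


@dataclass
class Site:
    gene: str
    dataset: str
    score: float   # hypothetical per-site score, e.g. a posterior probability
    coverage: int  # read coverage at the site


def genes_passing(sites, min_score=None, min_coverage=None):
    """Collapse the sites that pass the optional thresholds to a set of genes."""
    kept = set()
    for s in sites:
        if min_score is not None and s.score < min_score:
            continue
        if min_coverage is not None and s.coverage < min_coverage:
            continue
        kept.add(s.gene)
    return kept


def per_dataset_and_union(sites, **thresholds):
    """Count passing genes per dataset and in the union over all datasets."""
    by_dataset = {}
    for s in sites:
        by_dataset.setdefault(s.dataset, []).append(s)
    genes = {d: genes_passing(ss, **thresholds) for d, ss in by_dataset.items()}
    union = set().union(*genes.values())
    return {d: len(g) for d, g in genes.items()}, len(union)


if __name__ == "__main__":
    toy = [  # made-up sites in two hypothetical datasets
        Site("g1", "d1", 0.99, 40),
        Site("g2", "d1", 0.90, 50),
        Site("g2", "d2", 0.97, 35),
        Site("g3", "d2", 0.99, 10),
    ]
    print(per_dataset_and_union(toy, min_score=0.95))
    print(per_dataset_and_union(toy, min_coverage=30))
```

Taking the union means the method total is at least as large as its largest dataset but smaller than the sum whenever datasets share genes, which matches how the per-dataset and total columns relate above.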