Florian Pichot (1, 2, 3), Virginie Marchand (2), Lilia Ayadi (1, 2), Valérie Bourguignon-Igel (1, 2), Mark Helm (3), Yuri Motorin (1, 2).
Abstract
A major trend in the epitranscriptomics field over the last 5 years has been the high-throughput analysis of RNA modifications by a combination of specific chemical treatment(s), followed by library preparation and deep sequencing. Multiple protocols have been described for several important RNA modifications, such as 5-methylcytosine (m5C), pseudouridine (ψ), 1-methyladenosine (m1A), and 2'-O-methylation (Nm). One commonly used method is the alkaline cleavage-based RiboMethSeq protocol, in which the positions of read 5'-ends are used to identify nucleotides protected from cleavage by ribose methylation. This method has been successfully applied to detect and quantify Nm residues in various RNA species, such as rRNA, tRNA, and snRNA. Such applications require adaptation of the initially published protocol(s), both at the wet bench and in the bioinformatic analysis. In this manuscript, we describe the optimization of the RiboMethSeq bioinformatic analysis at the levels of initial read treatment, alignment to the reference sequence, counting of 5'- and 3'-ends, and calculation of RiboMethSeq scores, allowing precise detection and quantification of the Nm-related signal. These improvements to the original pipeline allow more accurate detection of Nm candidates and more precise quantification of variations in Nm level. Applications of the improved RiboMethSeq pipeline to different cellular RNA types are discussed.
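The end-counting step can be illustrated with a minimal Python sketch. It assumes that alignments have already been reduced to 0-based, inclusive (start, end) coordinates on the reference and that only reads with a genuine 3'-end (adapter found and trimmed) are passed in. Because alkaline cleavage of the phosphodiester bond 3' of nucleotide i yields one fragment ending at i and another starting at i + 1, the sketch attributes both the 5'-end count at i + 1 and the 3'-end count at i to nucleotide i. The function names `count_ends` and `cleavage_counts` are hypothetical and are not taken from the published pipeline.

```python
from typing import Iterable, List, Tuple


def count_ends(alignments: Iterable[Tuple[int, int]], ref_len: int) -> Tuple[List[int], List[int]]:
    """Count read 5'-ends and 3'-ends per reference position (0-based, inclusive coordinates)."""
    five = [0] * ref_len
    three = [0] * ref_len
    for start, end in alignments:
        if not 0 <= start <= end < ref_len:
            raise ValueError(f"alignment ({start}, {end}) lies outside the reference")
        five[start] += 1   # fragment begins here: cleavage of the bond 5' of this nucleotide
        three[end] += 1    # fragment ends here: cleavage of the bond 3' of this nucleotide
    return five, three


def cleavage_counts(five: List[int], three: List[int]) -> List[int]:
    """Combined signal for the bond 3' of each nucleotide i (assumed attribution):
    5'-ends at i + 1 plus 3'-ends at i. A 5'-end at position 0 is the natural
    RNA 5'-end, so it is not attributed to any cleavage event."""
    n = len(five)
    return [(five[i + 1] if i + 1 < n else 0) + three[i] for i in range(n)]


if __name__ == "__main__":
    f, t = count_ends([(0, 4), (5, 9), (5, 7)], ref_len=10)
    print(cleavage_counts(f, t))
```

A 2'-O-methylated nucleotide should then appear as a local minimum in this combined profile, since the methyl group blocks cleavage of its 3' phosphodiester bond.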
Keywords: 2′-O-methylation; RNA; bioinformatic pipeline; high-throughput sequencing; receiver operating characteristic; ribose methylation
Year: 2020 PMID: 32117451 PMCID: PMC7031861 DOI: 10.3389/fgene.2020.00038
Source DB: PubMed Journal: Front Genet ISSN: 1664-8021 Impact factor: 4.599
Figure 1. Selection of RiboMethSeq datasets for optimization. Three human datasets providing representative performance of 2'-O-Me detection (Sample 1, HUVEC; Sample 2, BMSC; Sample 3, HeLa) were selected on the basis of their receiver operating characteristic (ROC) curves and the associated maximum Matthews correlation coefficient (MCC) values for ScoreMAX6 (A–C). Graphs show the ROC curves zoomed to 0–0.05 false positive rate (FPR) and 0–1 true positive rate (TPR). It was previously shown (Marchand et al., 2016) that 5'-end coverage (light blue curve) is sufficient for reliable construction of the RNA protection profile, but cumulated 5'- and 3'-end coverage (violet curve) provides better discrimination between methylated positions and false positive (FP) hits. (D) shows the read coverage per position for human rRNAs. Coverage of 5S rRNA is quite variable, probably reflecting variations in 5S rRNA content in the total rRNA fraction caused by biased extraction.
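The selection criterion used here (maximum MCC along the ROC curve) can be sketched in Python as follows. The sketch assumes a list of per-position scores and a matching list of boolean labels marking known 2'-O-Me sites; every distinct score is tried as a threshold (positions scoring at or above it are called methylated), and the threshold with the highest MCC is returned together with its TPR, FPR and FDR. Names such as `best_threshold` are illustrative only.

```python
import math
from typing import Dict, Sequence, Tuple


def confusion(scores: Sequence[float], labels: Sequence[bool], threshold: float) -> Tuple[int, int, int, int]:
    """Return (TP, FP, TN, FN) when positions with score >= threshold are called methylated."""
    tp = fp = tn = fn = 0
    for s, is_nm in zip(scores, labels):
        called = s >= threshold
        if called and is_nm:
            tp += 1
        elif called:
            fp += 1
        elif is_nm:
            fn += 1
        else:
            tn += 1
    return tp, fp, tn, fn


def mcc(tp: int, fp: int, tn: int, fn: int) -> float:
    """Matthews correlation coefficient; 0.0 when any marginal total is zero."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


def best_threshold(scores: Sequence[float], labels: Sequence[bool]) -> Dict[str, float]:
    """Scan all distinct scores as thresholds and keep the one with maximal MCC."""
    best: Dict[str, float] = {}
    for t in sorted(set(scores)):
        tp, fp, tn, fn = confusion(scores, labels, t)
        m = mcc(tp, fp, tn, fn)
        if not best or m > best["mcc"]:
            best = {
                "threshold": t,
                "mcc": m,
                "tpr": tp / (tp + fn) if tp + fn else 0.0,
                "fpr": fp / (fp + tn) if fp + tn else 0.0,
                "fdr": fp / (tp + fp) if tp + fp else 0.0,
            }
    return best


if __name__ == "__main__":
    toy_scores = [0.95, 0.9, 0.3, 0.2, 0.85, 0.1]
    toy_labels = [True, True, False, False, False, False]
    print(best_threshold(toy_scores, toy_labels))
```

MCC is used instead of accuracy because 2'-O-Me sites are a small minority of rRNA positions, so a classifier calling nothing methylated would already reach high accuracy while having an MCC of 0.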
Alignment statistics and uncovered rRNA positions in samples used for analysis.
| Sample | Number of raw reads used | 4 mln | 8 mln | 12 mln | 16 mln | 20 mln |
|---|---|---|---|---|---|---|
| Sample 1 | trimmed reads | 3873996 | 7741257 | 11607806 | 15471803 | 19341955 |
| | short reads for alignment | 1761248 | 3581209 | 5338558 | 7104102 | 8909438 |
| | aligned to rRNA reference | 1513287 | 3077770 | 4587105 | 6104305 | 7657111 |
| | uncovered positions, 5S rRNA | 0 | 0 | 0 | 0 | 0 |
| | uncovered positions, 5.8S rRNA | 0 | 0 | 0 | 0 | 0 |
| | uncovered positions, 18S rRNA | | | 0 | 0 | 0 |
| | uncovered positions, 28S rRNA | | | | | |
| Sample 2 | trimmed reads | 3878093 | 7752516 | 11628210 | 15494852 | 19371588 |
| | short reads for alignment | 1473330 | 2986750 | 4455805 | 5927702 | 7428697 |
| | aligned to rRNA reference | 999714 | 2027867 | 3023042 | 4022365 | 5042133 |
| | uncovered positions, 5S rRNA | 0 | 0 | 0 | 0 | 0 |
| | uncovered positions, 5.8S rRNA | 0 | 0 | 0 | 0 | 0 |
| | uncovered positions, 18S rRNA | | | 0 | 0 | 0 |
| | uncovered positions, 28S rRNA | | | 0 | 0 | 0 |
| Sample 3 | trimmed reads | 3882713 | 7764523 | 11644031 | 15516647 | 19387461 |
| | short reads for alignment | 2582027 | 5182220 | 7776132 | 10353556 | 12934621 |
| | aligned to rRNA reference | 2222928 | 4460722 | 6693085 | 8910528 | 11132036 |
| | uncovered positions, 5S rRNA | 0 | 0 | 0 | 0 | 0 |
| | uncovered positions, 5.8S rRNA | 0 | 0 | 0 | 0 | 0 |
| | uncovered positions, 18S rRNA | 0 | 0 | 0 | 0 | 0 |
| | uncovered positions, 28S rRNA | 0 | 0 | 0 | 0 | 0 |
The bold font highlights the number of uncovered positions in the different datasets.
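The read-processing rows of the table can be summarized as conversion rates between successive steps. A short Python sketch computing them for the 4 mln input level, with values copied from the table, is given below; the dictionary layout and the names `TABLE_4M` and `read_fractions` are illustrative, and the interpretation of "short reads for alignment" as reads retained for alignment after trimming is an assumption.

```python
from typing import Dict

# Read counts at the 4 mln raw-read input level, copied from the table above.
TABLE_4M: Dict[str, Dict[str, int]] = {
    "Sample 1": {"trimmed": 3873996, "short": 1761248, "aligned": 1513287},
    "Sample 2": {"trimmed": 3878093, "short": 1473330, "aligned": 999714},
    "Sample 3": {"trimmed": 3882713, "short": 2582027, "aligned": 2222928},
}


def read_fractions(row: Dict[str, int]) -> Dict[str, float]:
    """Fraction of trimmed reads kept as short reads, and fraction of those aligned to rRNA."""
    return {
        "short_of_trimmed": row["short"] / row["trimmed"],
        "aligned_of_short": row["aligned"] / row["short"],
    }


if __name__ == "__main__":
    for sample, row in TABLE_4M.items():
        f = read_fractions(row)
        print(f"{sample}: {f['short_of_trimmed']:.1%} short, {f['aligned_of_short']:.1%} aligned")
```

On these numbers, Sample 2 aligns noticeably less efficiently to the rRNA reference (about 68% of short reads) than Samples 1 and 3 (about 86% each).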
Figure 2. (A) Performance of ScoreMAX and ScoreA calculated using variable numbers of neighboring nucleotides (from ±2 to ±8); the standard RiboMethSeq protocol uses a ±6 interval. Values of false discovery rate (FDR, left scale) and maximum Matthews correlation coefficient (MCC, right scale) are shown. Sample 2 (BMSC) was used for all calculations; other datasets gave similar trends. (B) Global MethScore (ScoreC) values calculated for modified yeast rRNA and for in vitro rRNA transcripts using different neighboring intervals. The total number of “2'-O-Me groups” in rRNA is given (red, in vitro transcript; blue, modified rRNA). (C, D) MethScore2 (ScoreC2) values for individual 2'-O-methylated positions in 18S (C) and 25S (D) rRNA (red, in vitro transcript; blue, modified rRNA). Lines correspond to average values.
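To make the role of the neighboring interval concrete, the Python sketch below implements two RiboMethSeq scores with an adjustable half-window n. The formulas follow our reading of the published definitions (ScoreA from the original RiboMethSeq work; ScoreC/MethScore as in Marchand et al., 2016, with neighbor weights decreasing by 0.1 from 1.0 at the closest position) and should be checked against those papers before use. The use of the population standard deviation in ScoreA, the limit 1 ≤ n ≤ 8, and the helper name `_check` are assumptions of this sketch.

```python
import statistics
from typing import Sequence


def _check(counts: Sequence[float], i: int, n: int) -> None:
    """Hypothetical guard: the window must fit inside the RNA and weights must stay positive."""
    if not 1 <= n <= 8:
        raise ValueError("half-window n must be between 1 and 8")
    if i - n < 0 or i + n >= len(counts):
        raise ValueError("position too close to the RNA ends for this window")


def score_a(counts: Sequence[float], i: int, n: int = 6) -> float:
    """ScoreA: cleavage at i compared with mean and SD of the n left and n right neighbors."""
    _check(counts, i, n)
    left, right = counts[i - n:i], counts[i + 1:i + 1 + n]
    ml, sl = statistics.mean(left), statistics.pstdev(left)   # assumption: population SD
    mr, sr = statistics.mean(right), statistics.pstdev(right)
    denom = 0.5 * abs(ml - sl) + counts[i] + 0.5 * abs(mr - sr) + 1
    return max(0.0, 1 - (2 * counts[i] + 1) / denom)


def score_c(counts: Sequence[float], i: int, n: int = 6) -> float:
    """ScoreC (MethScore): cleavage at i relative to weighted means of left and right neighbors."""
    _check(counts, i, n)
    w = [1.0 - 0.1 * k for k in range(n)]  # assumed weights, closest neighbor first
    left = sum(wk * counts[i - 1 - k] for k, wk in enumerate(w)) / sum(w)
    right = sum(wk * counts[i + 1 + k] for k, wk in enumerate(w)) / sum(w)
    ref = 0.5 * (left + right)
    return max(0.0, 1 - counts[i] / ref) if ref > 0 else 0.0


if __name__ == "__main__":
    profile = [100] * 13
    profile[6] = 5  # strong local drop, as expected for a 2'-O-methylated nucleotide
    for half_window in (2, 4, 6):
        print(half_window, score_a(profile, 6, half_window), score_c(profile, 6, half_window))
```

Shrinking n reduces the number of neighbors that can themselves be methylated or noisy, at the cost of a less stable estimate of the local background, which is the trade-off scanned in panel (A).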
Figure 3. Improvement of ScoreMAX/ScoreMEAN (MAX6 and MEAN2) using combined 5'/3'-end counts and a reduced calculation window (Score 2 calculation scheme). Boxplots show maximum Matthews correlation coefficient (MCC) values (left) and the associated false discovery rate (FDR) (right) for all 19 RiboMethSeq datasets used for validation. The identity of each RiboMethSeq dataset is given on the right.
Figure 4. Validation of ScoreMEAN2 and ScoreA2 with the S. cerevisiae rRNA RiboMethSeq dataset. Comparative distribution of ScoreA6/ScoreMAX6 signals (A) and ScoreA2/ScoreMEAN2 signals (B) for the same S. cerevisiae rRNA dataset. Each graph is a scatter plot of the two scores, with the associated density plots on top (ScoreA6 or ScoreA2) and on the right (ScoreMAX6 or ScoreMEAN2). RiboMethSeq signals for 2'-O-Me positions (light blue), pseudouridines (red), and unmodified nucleotides (gray) are shown.