Lukas Schmidt1, Stephan Werner1, Thomas Kemmer2, Stefan Niebler3, Marco Kristen1, Lilia Ayadi4,5, Patrick Johe1, Virginie Marchand4, Tanja Schirmeister1, Yuri Motorin4,5, Andreas Hildebrandt2, Bertil Schmidt3, Mark Helm1.
Abstract
Modification mapping from cDNA data has become a tremendously important approach in epitranscriptomics. So-called reverse transcription signatures in cDNA contain information on the position and nature of their causative RNA modifications. Data mining of high-throughput sequencing data, e.g. Illumina-based data, is therefore rapidly growing in importance, yet the field still lacks effective tools. Here we present a versatile, user-friendly graphical workflow system for modification calling based on machine learning. The workflow commences with a principal module for trimming, mapping, and postprocessing. The latter includes a quantification of mismatch and arrest rates with single-nucleotide resolution across the mapped transcriptome. Further downstream modules include tools for visualization, machine learning, and modification calling. From the machine-learning module, quality assessment parameters are provided to gauge the suitability of the initial dataset for effective machine learning and modification calling. This output is useful to improve the experimental parameters for library preparation and sequencing. In summary, the automation of the bioinformatics workflow allows a faster turnaround of the optimization cycles in modification calling.
Keywords: Galaxy platform; RNA modifications; RT signature; Watson–Crick face; m1A; machine learning
Year: 2019 PMID: 31608115 PMCID: PMC6774277 DOI: 10.3389/fgene.2019.00876
Source DB: PubMed Journal: Front Genet ISSN: 1664-8021 Impact factor: 4.599
Figure 1. Main overview of the modification calling pipeline. A diagram showing the different steps for creating and analyzing RNA-Seq data. The pipeline has two parts: (A) general workflow for the processing of RNA samples and (B) the implemented automated graphical workflow system with the available modules for bioinformatics data analysis. (A) consists of (A1) possible and partly necessary pretreatments for different RNA species, (A2) library preparation with the possibility of adaptations (e.g. conditions for reverse transcription), (A3) sequencing with Illumina sequencing platforms (e.g. MiSeq/NextSeq and HiSeq), and (A4) data processing including basic data treatment like adapter trimming, alignment, and format conversion, as well as data analysis (e.g. machine learning and RT-signature analysis). The elaborate data processing (A4) was fully automated in (B) by using the open-source Galaxy platform to create and provide a quick and user-friendly feedback mechanism to optimize the experimental design, sample preparation, and data processing. The standard workflow (B1) is supplemented by various additional modules (B2) including workflows for (a) machine learning, (b) visualization, and (c) filtering.
Extracted Profile file after filtering with the Demethylation_relative_change module, listing all m1A candidate positions.
| ref_seg | pos | refbase | cov | prebase | mismatch | A | G | T | C | N | a | g | t | c | n | single_jump_direct | single_jump_delayed | double_jump | arrest |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| tdbR00000370\|Saccharomyces_cerevisiae\|4932\|Arg\|TCT | 57 | A | 699 | C | 0.29471 | 493 | 8 | 2 | 94 | 0 | 0 | 5 | 5 | 92 | 0 | 0.00000 | 0.02710 | 0.00285 | 0.10941 |
| tdbR00000300\|Saccharomyces_cerevisiae\|4932\|Asn\|GTT | 59 | A | 961 | C | 0.37045 | 605 | 7 | 6 | 125 | 0 | 0 | 7 | 69 | 142 | 0 | 0.00000 | 0.00407 | 0.02238 | 0.15544 |
| tdbR00000021\|Saccharomyces_cerevisiae\|4932\|Cys\|GCA | 57 | A | 405 | T | 0.21728 | 317 | 13 | 39 | 0 | 0 | 0 | 7 | 28 | 1 | 0 | 0.00000 | 0.00000 | 0.00000 | 0.43399 |
| tdbM00000003\|Saccharomyces_cerevisiae\|4932\|Gln\|TTG | 57 | A | 475 | A | 0.15789 | 400 | 11 | 18 | 1 | 0 | 0 | 12 | 29 | 4 | 0 | 0.00000 | 0.00000 | 0.00000 | 0.26810 |
| tdbR00000170\|Saccharomyces_cerevisiae\|4932\|Ile\|AAT | 59 | A | 919 | T | 0.38085 | 569 | 55 | 88 | 6 | 0 | 0 | 67 | 127 | 7 | 0 | 0.00429 | 0.00000 | 0.01072 | 0.15350 |
| tdbM00000006\|Saccharomyces_cerevisiae\|4932\|Ile\|TAT | 58 | A | 373 | T | 0.25469 | 278 | 13 | 28 | 4 | 0 | 0 | 7 | 34 | 9 | 0 | 0.00000 | 0.00000 | 0.00000 | 0.31934 |
| tdbR00000192\|Saccharomyces_cerevisiae\|4932\|Lys\|CTT | 58 | A | 2715 | G | 0.16317 | 2272 | 102 | 103 | 9 | 0 | 0 | 108 | 112 | 9 | 0 | 0.00037 | 0.00000 | 0.00293 | 0.07658 |
| tdbR00000193\|Saccharomyces_cerevisiae\|4932\|Lys\|TTT | 58 | A | 619 | G | 0.43942 | 347 | 49 | 75 | 10 | 0 | 0 | 62 | 68 | 8 | 0 | 0.00478 | 0.00000 | 0.00955 | 0.16511 |
| tdbR00000323\|Saccharomyces_cerevisiae\|4932\|Pro\|TGG | 57 | A | 459 | T | 0.43573 | 259 | 3 | 69 | 0 | 0 | 0 | 12 | 112 | 4 | 0 | 0.00000 | 0.00000 | 0.00000 | 0.18905 |
| tdbR00000324\|Saccharomyces_cerevisiae\|4932\|Pro\|TGG | 57 | A | 439 | T | 0.43508 | 248 | 4 | 56 | 1 | 0 | 0 | 9 | 121 | 0 | 0 | 0.00000 | 0.00000 | 0.00000 | 0.20364 |
| tdbR00000443\|Saccharomyces_cerevisiae\|4932\|Thr\|AGT | 58 | A | 396 | A | 0.28283 | 284 | 23 | 23 | 3 | 0 | 0 | 28 | 30 | 5 | 0 | 0.00000 | 0.00222 | 0.12195 | 0.38608 |
| tdbR00000444\|Saccharomyces_cerevisiae\|4932\|Thr\|AGT | 58 | A | 616 | A | 0.31656 | 421 | 39 | 47 | 5 | 0 | 0 | 41 | 54 | 9 | 0 | 0.00145 | 0.00000 | 0.10320 | 0.30152 |
| tdbR00000464\|Saccharomyces_cerevisiae\|4932\|Val\|AAC | 59 | A | 1066 | T | 0.18386 | 870 | 33 | 55 | 22 | 0 | 0 | 18 | 61 | 7 | 0 | 0.00187 | 0.00000 | 0.00094 | 0.69026 |
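The `mismatch` column in the Profile rows above appears to be reproducible from the base-count columns: the coverage equals the sum of the ten count columns, and the mismatch rate equals the fraction of reads that do not carry the reference base. The following minimal sketch illustrates that relationship. The function name `mismatch_rate` is hypothetical, and treating upper- and lowercase columns as the same base is an assumption inferred from the numbers, not a statement about the workflow's internal code.

```python
def mismatch_rate(refbase, counts):
    """Fraction of reads at a position that do not match the reference base.

    Assumption: upper- and lowercase count columns (e.g. 'A' and 'a')
    both represent the same base and both count as matches.
    """
    cov = sum(counts.values())
    if cov == 0:
        return 0.0
    matches = counts.get(refbase.upper(), 0) + counts.get(refbase.lower(), 0)
    return (cov - matches) / cov


# Counts copied from the Arg|TCT row (position 57, cov 699) of the table above.
arg_57 = {"A": 493, "G": 8, "T": 2, "C": 94, "N": 0,
          "a": 0, "g": 5, "t": 5, "c": 92, "n": 0}

# Counts copied from the Lys|CTT row (position 58, cov 2715).
lys_58 = {"A": 2272, "G": 102, "T": 103, "C": 9, "N": 0,
          "a": 0, "g": 108, "t": 112, "c": 9, "n": 0}

print(f"{mismatch_rate('A', arg_57):.5f}")  # 0.29471, as listed in the table
print(f"{mismatch_rate('A', lys_58):.5f}")  # 0.16317, as listed in the table
```

Both values agree with the tabulated `mismatch` entries, which supports reading the column as one minus the reference-base fraction.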
Figure 2. Interface of the Galaxy filtering module Demethylation_relative_change. Two Profile files serve as input: yeast total tRNA untreated and yeast total tRNA AlkB treated. The following filtering parameters are selected: adenosine (A) as the nucleobase of interest, 0.5 (50%) and 0.3 (30%) as thresholds for the minimum relative and absolute changes in the mismatch rate, respectively, and 250 as the threshold for the minimum coverage required.
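The filtering logic described in this caption can be sketched roughly as follows. This is an illustration under stated assumptions, not the module's actual implementation: the `Site` class and the function `demethylation_candidates` are hypothetical names, the absolute change is assumed to be the untreated minus the treated mismatch rate, the relative change is assumed to be that difference divided by the untreated rate, both thresholds are assumed to be required together, and the coverage threshold is assumed to apply to both samples. The example values are toy numbers, not measured data.

```python
from dataclasses import dataclass


@dataclass
class Site:
    """One Profile row, reduced to the fields needed for filtering (hypothetical)."""
    ref_seg: str
    pos: int
    refbase: str
    cov: int
    mismatch: float


def demethylation_candidates(untreated, treated, base="A",
                             min_rel=0.5, min_abs=0.3, min_cov=250):
    """Return untreated sites whose mismatch rate drops after AlkB treatment.

    Assumed definitions (not confirmed by the source):
      absolute change = untreated mismatch - treated mismatch
      relative change = absolute change / untreated mismatch
    """
    treated_by_key = {(s.ref_seg, s.pos): s for s in treated}
    hits = []
    for u in untreated:
        t = treated_by_key.get((u.ref_seg, u.pos))
        if t is None or u.refbase.upper() != base.upper():
            continue
        if min(u.cov, t.cov) < min_cov or u.mismatch == 0:
            continue
        abs_change = u.mismatch - t.mismatch
        rel_change = abs_change / u.mismatch
        if abs_change >= min_abs and rel_change >= min_rel:
            hits.append(u)
    return hits


# Toy input (illustrative values only).
untreated = [
    Site("toy_tRNA_1", 58, "A", 1000, 0.80),  # strong drop: expected to pass
    Site("toy_tRNA_2", 58, "A", 1000, 0.40),  # small drop: expected to fail
    Site("toy_tRNA_3", 58, "A", 100, 0.90),   # coverage too low
    Site("toy_tRNA_4", 34, "G", 1000, 0.90),  # wrong nucleobase
]
treated = [
    Site("toy_tRNA_1", 58, "A", 1000, 0.20),
    Site("toy_tRNA_2", 58, "A", 1000, 0.30),
    Site("toy_tRNA_3", 58, "A", 100, 0.10),
    Site("toy_tRNA_4", 34, "G", 1000, 0.10),
]

candidates = demethylation_candidates(untreated, treated)
print([s.ref_seg for s in candidates])
```

Keying the treated sites by `(ref_seg, pos)` keeps the comparison position-wise, so a site missing from either sample is simply skipped.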
Figure 3. Graphical plots of untreated (A) and AlkB-treated (B) yeast tRNA-Lys (CTT) using the additional module Visualize_V3 for visualization. Sites with error rates of more than 10% are highlighted with yellow arrows, with colored bars indicating the nature of the reads. Mismatch rates are depicted as black crosses, and arrest rates as red lines. The m1A site is located in the middle of the shown sequence segment at position 58.
Figure 4. Graphical plots of yeast tRNA-Asn (GTT), which was used for library preparation, visualized by using the additional module Visualize_V3. The reverse transcription step was performed by using SuperScript® III Reverse Transcriptase in different reaction buffers. The supplier’s standard reaction buffer (First Strand Synthesis buffer) with Mg²⁺ serves as reference (A), and the tested buffer mixtures differ by increased concentrations of Mn²⁺ [0.5 mM (B), 1.0 mM (C), 3.0 mM (D)] as Mg²⁺ substitute. Sites with error rates of more than 10% are highlighted with yellow arrows, with colored bars indicating the nature of the reads. Mismatch rates are depicted as black crosses, and arrest rates as red lines. The m1A site is located in the middle of the shown sequence segment at position 59.
Extracted Profile data for yeast tRNA-Asn (GTT) after library preparation with four different buffer mixtures for the reverse transcription step. Shown are data for positions 58, 59 (m1A), and 60.
| ref_seg | pos | refbase | cov | prebase | mismatch | A | G | T | C | N | a | g | t | c | n | single_jump_direct | single_jump_delayed | double_jump | arrest |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| tdbR00000300\|Saccharomyces_cerevisiae\|4932\|Asn\|GTT Reference | 58 | A | 3238 | A | 0.02471 | 3158 | 4 | 4 | 33 | 2 | 0 | 5 | 7 | 25 | 0 | 0.01927 | 0.00056 | 0.00000 | 0.4574 |
| tdbR00000300\|Saccharomyces_cerevisiae\|4932\|Asn\|GTT 0.5 mM Mn | 58 | A | 1380 | A | 0.04855 | 1313 | 4 | 1 | 47 | 3 | 0 | 2 | 0 | 10 | 0 | 0.02404 | 0.00000 | 0.00060 | 0.32355 |
| tdbR00000300\|Saccharomyces_cerevisiae\|4932\|Asn\|GTT 1.0 mM Mn | 58 | A | 3546 | A | 0.04061 | 3402 | 15 | 9 | 79 | 0 | 0 | 13 | 6 | 22 | 0 | 0.02913 | 0.00000 | 0.00067 | 0.14965 |
| tdbR00000300\|Saccharomyces_cerevisiae\|4932\|Asn\|GTT 3.0 mM Mn | 58 | A | 2239 | A | 0.04332 | 2142 | 9 | 6 | 37 | 7 | 0 | 12 | 5 | 21 | 0 | 0.05623 | 0.00172 | 0.00138 | 0.0565 |
| tdbR00000300\|Saccharomyces_cerevisiae\|4932\|Asn\|GTT Reference | 59 | A (m1A) | 6311 | C | 0.90160 | 621 | 79 | 36 | 3431 | 6 | 0 | 119 | 25 | 1994 | 0 | 0.00000 | 0.01048 | 0.04161 | 0.84647 |
| tdbR00000300\|Saccharomyces_cerevisiae\|4932\|Asn\|GTT 0.5 mM Mn | 59 | A (m1A) | 2210 | C | 0.93167 | 151 | 37 | 59 | 1238 | 8 | 0 | 37 | 15 | 665 | 0 | 0.00041 | 0.01630 | 0.09902 | 0.86879 |
| tdbR00000300\|Saccharomyces_cerevisiae\|4932\|Asn\|GTT 1.0 mM Mn | 59 | A (m1A) | 4454 | C | 0.95757 | 189 | 65 | 95 | 2208 | 1 | 0 | 75 | 35 | 1786 | 0 | 0.00038 | 0.02481 | 0.14907 | 0.70422 |
| tdbR00000300\|Saccharomyces_cerevisiae\|4932\|Asn\|GTT 3.0 mM Mn | 59 | A (m1A) | 2568 | C | 0.96145 | 99 | 9 | 9 | 1149 | 14 | 0 | 7 | 5 | 1276 | 0 | 0.00000 | 0.05323 | 0.16101 | 0.06965 |
| tdbR00000300\|Saccharomyces_cerevisiae\|4932\|Asn\|GTT Reference | 60 | C | 42890 | C | 0.00445 | 87 | 30 | 22 | 42699 | 21 | 20 | 10 | 1 | 0 | 0 | 0.00000 | 0.00000 | 0.00000 | 0.36943 |
| tdbR00000300\|Saccharomyces_cerevisiae\|4932\|Asn\|GTT 0.5 mM Mn | 60 | C | 18703 | C | 0.00733 | 51 | 12 | 10 | 18566 | 50 | 11 | 3 | 0 | 0 | 0 | 0.00000 | 0.00005 | 0.00000 | 0.43528 |
| tdbR00000300\|Saccharomyces_cerevisiae\|4932\|Asn\|GTT 1.0 mM Mn | 60 | C | 17706 | C | 0.00345 | 17 | 7 | 10 | 17645 | 10 | 9 | 6 | 2 | 0 | 0 | 0.00006 | 0.00011 | 0.00011 | 0.35852 |
| tdbR00000300\|Saccharomyces_cerevisiae\|4932\|Asn\|GTT 3.0 mM Mn | 60 | C | 3287 | C | 0.01156 | 2 | 1 | 9 | 3249 | 14 | 5 | 5 | 2 | 0 | 0 | 0.00000 | 0.00000 | 0.00030 | 0.03294 |
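To make the buffer comparison at the m1A site easier to read, the mismatch and arrest rates at position 59 can be expressed as differences from the reference buffer. The short sketch below does only that arithmetic on values copied from the table above. The helper name `delta_vs_reference` is hypothetical and not part of the published workflow.

```python
# (condition, mismatch, arrest) at position 59 (m1A), copied from the table above.
pos59 = [
    ("Reference", 0.90160, 0.84647),
    ("0.5 mM Mn", 0.93167, 0.86879),
    ("1.0 mM Mn", 0.95757, 0.70422),
    ("3.0 mM Mn", 0.96145, 0.06965),
]


def delta_vs_reference(rows):
    """Return {condition: (mismatch change, arrest change)} relative to the first row."""
    _, ref_mismatch, ref_arrest = rows[0]
    return {
        label: (mismatch - ref_mismatch, arrest - ref_arrest)
        for label, mismatch, arrest in rows[1:]
    }


deltas = delta_vs_reference(pos59)
for label, (d_mismatch, d_arrest) in deltas.items():
    print(f"{label}: mismatch {d_mismatch:+.5f}, arrest {d_arrest:+.5f}")
```

In these rows the mismatch rate at position 59 rises with each increase in Mn²⁺ concentration, while the arrest rate is lowest at 3.0 mM.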