Anne Hoffmann (1), Jörg Fallmann (1), Elisa Vilardo (2), Mario Mörl (3), Peter F Stadler (1,4,5,6,7,8,9), Fabian Amman (9,10).
1. Bioinformatics Group, Department of Computer Science, and Interdisciplinary Center for Bioinformatics, D-04107 Leipzig, Germany.
2. Center for Anatomy and Cell Biology, Medical University of Vienna, Austria.
3. Institute for Biochemistry, Leipzig University, D-04103 Leipzig, Germany.
4. German Centre for Integrative Biodiversity Research (iDiv) Halle-Jena-Leipzig, Competence Center for Scalable Data Services and Solutions, and Leipzig Research Center for Civilization Diseases, Leipzig University, D-04107 Leipzig, Germany.
5. Max Planck Institute for Mathematics in the Sciences, D-04103 Leipzig, Germany.
6. Fraunhofer Institute for Cell Therapy and Immunology, D-04103 Leipzig, Germany.
7. Center for RNA in Technology and Health, University of Copenhagen, Frederiksberg C, Denmark.
8. Santa Fe Institute, Santa Fe, NM 87501, USA.
9. Department of Theoretical Chemistry of the University of Vienna, A-1090 Vienna, Austria.
10. Department of Chromosome Biology of the University of Vienna, A-1030 Vienna, Austria.
Abstract
Motivation: Many repetitive DNA elements are transcribed at appreciable expression levels. Mapping the corresponding RNA sequencing reads back to a reference genome is, however, a notoriously difficult and error-prone task. This is particularly true when chemical modifications introduce systematic mismatches while the genomic loci are only approximately identical, as in the case of tRNAs. Results: We therefore developed a dedicated mapping strategy for RNA-seq reads that map to tRNAs. It relies on a modified target genome in which known tRNA loci are masked and intronless tRNA precursor sequences are instead appended as artificial 'chromosomes'. In a first pass, reads that overlap the boundaries of mature tRNAs are extracted. In the second pass, the remaining reads are mapped to a tRNA-masked target that is augmented by representative mature tRNA sequences. Using both simulated and real-life data, we show that our best-practice workflow removes most of the mapping artefacts introduced by simpler mapping schemes and makes it possible to reliably identify many chemical tRNA modifications in generic small RNA-seq data. On simulated data, the false discovery rate (FDR) is only 2%. We find compelling evidence for tissue-specific differences in tRNA modification patterns. Availability and implementation: The workflow is available both as a bash script and as a Galaxy workflow from https://github.com/AnneHoffmann/tRNA-read-mapping. Contact: fabian@tbi.univie.ac.at. Supplementary information: Supplementary data are available at Bioinformatics online.
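The construction of the two mapping targets can be illustrated with a minimal Python sketch. It only builds the target sequences and the boundary test; the actual read alignment is left to a read mapper. Everything named here is a hypothetical illustration, not the published bash/Galaxy workflow: the `TRNALocus` record, the helper functions, the `pre_`/`mature_` sequence names, plus-strand-only 0-based coordinates, the default flank length of 50 nt, the appended 3'-CCA tail, and the reading of "representative" as "one entry per distinct mature sequence" are all assumptions.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class TRNALocus:
    """Hypothetical tRNA annotation: 0-based, half-open, plus strand only."""
    name: str
    chrom: str
    start: int
    end: int
    intron: tuple[int, int] | None = None  # genomic coordinates of the intron


def mask_loci(genome: dict[str, str], loci: list[TRNALocus]) -> dict[str, str]:
    """Replace every annotated tRNA locus with N so no read can map there."""
    masked = {chrom: list(seq) for chrom, seq in genome.items()}
    for t in loci:
        masked[t.chrom][t.start:t.end] = "N" * (t.end - t.start)
    return {chrom: "".join(seq) for chrom, seq in masked.items()}


def spliced(genome: dict[str, str], t: TRNALocus) -> str:
    """Genomic tRNA sequence with its intron (if any) removed."""
    seq = genome[t.chrom][t.start:t.end]
    if t.intron is not None:
        i0, i1 = t.intron[0] - t.start, t.intron[1] - t.start
        seq = seq[:i0] + seq[i1:]
    return seq


def precursor(genome: dict[str, str], t: TRNALocus, flank: int) -> tuple[str, int]:
    """Intronless precursor = 5' flank + spliced tRNA + 3' flank.

    Returns the sequence and the offset at which the mature part starts.
    """
    chrom = genome[t.chrom]
    left = chrom[max(0, t.start - flank):t.start]
    right = chrom[t.end:t.end + flank]
    return left + spliced(genome, t) + right, len(left)


def build_targets(genome: dict[str, str], loci: list[TRNALocus], flank: int = 50):
    """Return (pass-1 target, pass-2 target, mature boundaries per precursor)."""
    masked = mask_loci(genome, loci)
    pass1, pass2 = dict(masked), dict(masked)
    boundaries: dict[str, tuple[int, int]] = {}
    seen_mature: set[str] = set()
    for t in loci:
        pre, offset = precursor(genome, t, flank)
        pass1[f"pre_{t.name}"] = pre  # artificial precursor "chromosome"
        boundaries[f"pre_{t.name}"] = (offset, offset + len(spliced(genome, t)))
        mature = spliced(genome, t) + "CCA"  # assumed 3'-CCA addition
        if mature not in seen_mature:  # keep one representative per sequence
            seen_mature.add(mature)
            pass2[f"mature_{t.name}"] = mature
    return pass1, pass2, boundaries


def overlaps_boundary(ref_start: int, ref_end: int, mat_start: int, mat_end: int) -> bool:
    """True if alignment [ref_start, ref_end) on a precursor crosses either
    end of the mature region, i.e. it also covers flanking precursor sequence."""
    return ref_start < mat_start < ref_end or ref_start < mat_end < ref_end


if __name__ == "__main__":
    # Toy genome: a 12-nt "tRNA" with a 3-nt intron (TTT) at chr1:10-22.
    genome = {"chr1": "A" * 10 + "GGGCCCTTTAAA" + "T" * 10}
    loci = [TRNALocus("tRNA-X", "chr1", 10, 22, intron=(16, 19))]
    pass1, pass2, bounds = build_targets(genome, loci, flank=5)
    print(pass1["pre_tRNA-X"])   # AAAAAGGGCCCAAATTTTT
    print(bounds["pre_tRNA-X"])  # (5, 14)
    print(pass2["mature_tRNA-X"])  # GGGCCCAAACCA
```

In this sketch, a read aligned to `pre_tRNA-X` in the first pass would be extracted only if `overlaps_boundary` is true for its alignment span, because coverage of the flanks indicates an unprocessed precursor; reads lying entirely inside the mature region are left for the second pass against the mature sequences. The genomic tRNA loci are masked in both targets, so reads cannot be split between a genomic copy and its artificial counterpart.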