Literature DB >> 29730207

SPORTS1.0: A Tool for Annotating and Profiling Non-coding RNAs Optimized for rRNA- and tRNA-derived Small RNAs.

Junchao Shi¹, Eun-A Ko², Kenton M Sanders², Qi Chen³, Tong Zhou⁴.

Abstract

High-throughput RNA-seq has revolutionized the process of small RNA (sRNA) discovery, leading to a rapid expansion of sRNA categories. In addition to the previously well-characterized sRNAs such as microRNAs (miRNAs), piwi-interacting RNAs (piRNAs), and small nucleolar RNA (snoRNAs), recent emerging studies have spotlighted on tRNA-derived sRNAs (tsRNAs) and rRNA-derived sRNAs (rsRNAs) as new categories of sRNAs that bear versatile functions. Since existing software and pipelines for sRNA annotation are mostly focused on analyzing miRNAs or piRNAs, here we developed the sRNA annotation pipelineoptimized for rRNA- and tRNA-derived sRNAs (SPORTS1.0). SPORTS1.0 is optimized for analyzing tsRNAs and rsRNAs from sRNA-seq data, in addition to its capacity to annotate canonical sRNAs such as miRNAs and piRNAs. Moreover, SPORTS1.0 can predict potential RNA modification sites based on nucleotide mismatches within sRNAs. SPORTS1.0 is precompiled to annotate sRNAs for a wide range of 68 species across bacteria, yeast, plant, and animal kingdoms, while additional species for analyses could be readily expanded upon end users' input. For demonstration, by analyzing sRNA datasets using SPORTS1.0, we reveal that distinct signatures are present in tsRNAs and rsRNAs from different mouse cell types. We also find that compared to other sRNA species, tsRNAs bear the highest mismatch rate, which is consistent with their highly modified nature. SPORTS1.0 is an open-source software and can be publically accessed at https://github.com/junchaoshi/sports1.0.

Entities: Chemical Disease Gene Species

Keywords: Annotation pipeline; RNA-seq data analysis; Small RNA; rsRNA; tsRNA

Mesh：

Substances：

Year: 2018 PMID： 29730207 PMCID： PMC6112344 DOI： 10.1016/j.gpb.2018.04.004

Source DB: PubMed Journal: Genomics Proteomics Bioinformatics ISSN： 1672-0229 Impact factor: 7.691

Introduction

Expanding classes of small RNAs (sRNAs) have emerged as key regulators of gene expression, genome stability, and epigenetic regulation [1], [2]. In addition to the previously well-characterized sRNA classes such as microRNAs (miRNAs), Piwi-interacting RNA (piRNAs), and small nucleolar RNA (snoRNAs), recent analysis of sRNA-seq data has led to the identification of expanding novel sRNA families. These include tRNA-derived sRNAs (tsRNAs; also known as tRNA-derived fragments, tRFs) and rRNA-derived sRNAs (rsRNAs) [3]. tsRNAs and rsRNAs have been discovered in a wide range of species with evolutionary conservation, supposedly due, in part, to the highly conservative sequence of their respective precursors, i.e., tRNAs and rRNAs [3]. Interestingly, tsRNAs and rsRNAs have been abundantly found in unicellular organisms (e.g., protozoa), where canonical sRNA pathways such as miRNA, siRNA, and piRNAs are entirely lacking [4], [5], [6]. The dynamic regulation of tsRNAs and rsRNAs in these unicellular organisms suggests that they are among the most ancient classes of sRNAs for intra- and inter-cellular communications [7]. Moreover, recent emerging evidence from mammalian species have highlighted the diverse biological functions mediated by tsRNAs, including regulating ribosome biogenesis, translation initiation, retrotransposon control, cancer metastasis, stem cell differentiation, neurological diseases, and epigenetic inheritance [3], [8], [9], [10], [11], [12], [13], [14], [15]. Although tsRNAs are known to be involved in regulating these processes at both post-transcriptional and translational levels [11], [14], [16], the exact molecular mechanisms of how tsRNAs exert their functions have not been fully understood. Compared to tsRNAs, rsRNAs are more recently discovered and also exhibit tissue-specific distribution. Dynamic expression of rsRNAs is associated with diseases such as metabolic disorders and inflammation [17], [18], [19]. The diverse biological functions of tsRNAs and rsRNAs and their strong disease associations are now pushing the new frontier of sRNA research. Currently, there are multiple existing general sRNA annotation software and pipelines [20], [21], [22], [23], [24], and some have been developed aiming to analyze tsRNAs [25], [26], [27]. However, there is still a lack of specialized tools that can simultaneously analyze both tsRNAs and rsRNAs in addition to other canonical sRNAs. Here, we provide the sRNA annotation pipeline optimized for rRNA- and tRNA-derived sRNAs (SPORTS1.0), which can annotate and profile canonical sRNAs such as miRNAs and piRNAs, and is also optimized to analyze tsRNAs and rsRNAs from sRNA-seq data. In addition, SPORTS1.0 can help predict potential RNA modification sites based on nucleotide mismatches within sRNAs.

Method

The source code of SPORTS1.0 is written in Perl and R. The whole package and installation instructions are available on Github (https://github.com/junchaoshi/sports1.0). SPORTS1.0 can apply to a wide-range of species and the annotation references of 68 species are precompiled for downloading (Table S1). The workflow of SPORTS1.0 consists of four main steps, i.e., pre-processing, mapping, annotation output, and annotation summary (Figure 1). SRA, FASTQ, and FASTA are the acceptable formats for data input. By calling Cutadapt [28] and Perl scripts extracted from miRDeep2 [29], SPORTS1.0 outputs clean reads by removing sequence adapters, and discarding sequences with length beyond the defined range and those with bases other than ATUCG. The clean reads obtained in pre-processing step are sequentially mapped against reference genome, miRBase [30], rRNA database (collected from NCBI), GtRNAdb [31], piRNA database [32], [33], Ensembl [34] and Rfam [35], upon users’ setting. sRNA sequences are first annotated by Bowtie [36]. Next, a Perl script precompiled in SPORTS1.0 is used to identify the locations of tsRNAs regarding whether they are derived from 5′ terminus, 3′ terminus, or 3′CCA end of tRNAs. Then an R script precompiled in SPORTS1.0 is applied to obtain rsRNA expression level and positional mapping information regarding their respective rRNA precursors (5.8S, 18S, 28S, etc.).

Figure 1

Workflow of SPORTS1.0

SPORTS1.0 contains four main steps, i.e., pre-processing, mapping, annotation output, and annotation summary, as outlined in the figure.

Workflow of SPORTS1.0 SPORTS1.0 contains four main steps, i.e., pre-processing, mapping, annotation output, and annotation summary, as outlined in the figure. SPORTS1.0 can also be used to analyze sequence mismatch information if mismatches are allowed during alignment process. Such information can help predict potential modification sites that have caused nucleotide misincorporation during the reverse transcription (RT) process as previously reported [37]. In the current version, a mismatch site is designated using criteria as previously described [37]. Binomial distribution is used to address whether the observed mismatch enrichment is significantly higher than the base-calling error. Here, we define p as the base-calling error rate, n as the number of nucleotides perfectly fitted to the reference sites, n as the number of mismatched nucleotides, and n as the sum of n and n. The probability of observing not larger than k perfectly matched nucleotides out of n can be calculated as: SPORTS1.0 provides two methods to evaluate n number. The first option is to simply calculate n as the read number of sequences containing particular mismatches. Since some sequences may align to multiple reference loci, using this method may result in an increased false-positive rate. A second method is thus included, in which read number of sequences from multiple matching loci are uniformly distributed (based on the assumption that each of these multiple sites equally expresses RNAs) and consequently generates an adjusted n. SPORTS1.0 summary output includes annotation details for each sequence and length distribution along with other statistics. (See sample output Figures 2 and 3, Tables S2 and S3). User guideline is provided online (https://github.com/junchaoshi/sports1.0).

Figure 2

Exemplary annotation and profiling of sRNA-seq datasets generated by SPORTS1.0

Categorization and length distribution analysis of different sRNA types in mouse sperm (A), bone marrow cells (B), and intestinal epithelial cells (C). RPM, reads per million clean reads; Unanno, unannotated; MG, match genome; UMG, unmatch genome.

Figure 3

Cell-specific rsRNA profiles revealed by SPORTS1.0

Subtypes of rRNA precursors (5.8S, 18S, 28S, etc.) for rsRNAs from mouse sperm (A), bone marrow cells (B), and intestinal epithelial cells (C). Comparison of rsRNA-generating loci from different rRNA precursors reveals distinct pattern between sperm (D), bone marrow cells (E), and intestinal epithelial cells (F). RPM, reads per million clean reads.

Exemplary annotation and profiling of sRNA-seq datasets generated by SPORTS1.0 Categorization and length distribution analysis of different sRNA types in mouse sperm (A), bone marrow cells (B), and intestinal epithelial cells (C). RPM, reads per million clean reads; Unanno, unannotated; MG, match genome; UMG, unmatch genome. Cell-specific rsRNA profiles revealed by SPORTS1.0 Subtypes of rRNA precursors (5.8S, 18S, 28S, etc.) for rsRNAs from mouse sperm (A), bone marrow cells (B), and intestinal epithelial cells (C). Comparison of rsRNA-generating loci from different rRNA precursors reveals distinct pattern between sperm (D), bone marrow cells (E), and intestinal epithelial cells (F). RPM, reads per million clean reads.

Results

As an example, we used SPORTS1.0 to analyze sRNA-seq datasets from mouse sperm (GSM2304822 [38]), bone marrow cells (GSM1604100 [39]), and intestinal epithelial cells (GSM1975854 [40]) samples. Graphic output by SPORTS1.0 reveals distinct sRNA profiles in sperm (Figure 2A), bone marrow cells (Figure 2B), and intestinal epithelial cells (Figure 2C) samples. tsRNAs and rsRNAs are found equally or more abundantly than the well-known miRNAs or piRNAs (length distribution data for each sRNA type are exemplified in Table S2). In particular, tsRNAs are dominant in sperm, and rsRNAs are highest in bone marrow cells, whereas intestinal epithelial cells contain an appreciable amount of both tsRNAs and rsRNAs in addition to a miRNA peak. Importantly, SPORTS1.0 found an appreciable portion of rsRNAs annotated in sperm (48.7%), bone marrow cell (11.1%) and intestinal epithelial cell (61.1%) samples that are previously deemed as “unmatch genome” (UMG) (Figure 2A−C upper pie chart). This is because these newly annotated rsRNAs are derived from rRNA genes (rDNA), which were not assembled and shown in current mouse genome (mm10) [41], and thus were discarded before analysis by previous sRNA analyzing pipelines. SPORTS1.0 can now annotate and analyze these rsRNAs, including providing the subtypes of rRNA precursors (5.8S, 18S, 28S, etc.) from which they are derived from (Figure 3A−C), as well as the loci mapping information (Figure 3D−F). Interestingly, our analyses revealed that the specific loci that generate rsRNAs are completely distinct among sperm, bone marrow cell, and intestinal epithelial cell samples (Figure 3D−F), suggesting distinct biogenesis and functions of these rsRNAs. Similarly, SPORTS1.0 also revealed tissue-specific landscape of tsRNAs in terms of their relative abundance (Figure 2A−C lower pie chart) and the tRNA loci where they are derived from (5’ terminus, 3’ terminus, 3’CCA end, etc.) (Figure 4 and Figures S1–S3). Since tsRNAs from different loci bear distinct biological functions [3], the tissue-specific tsRNA composition may represent features that define the unique functions of respective tissue/cell types.

Figure 4

Cell-specific tsRNA profiles revealed by SPORTS1.0

Examples of 3 cell-specific tsRNA profiles revealed in mouse sperm (A), bone marrow cells (B), and intestinal epithelial cells (C). Full tsRNA mapping results against tRNA loci are included in Figures S1–S3 for sperm (Figure S1), bone marrow cells (Figure S2), and intestinal epithelial cells (Figure S3), respectively. RPM, reads per million clean reads.

Cell-specific tsRNA profiles revealed by SPORTS1.0 Examples of 3 cell-specific tsRNA profiles revealed in mouse sperm (A), bone marrow cells (B), and intestinal epithelial cells (C). Full tsRNA mapping results against tRNA loci are included in Figures S1–S3 for sperm (Figure S1), bone marrow cells (Figure S2), and intestinal epithelial cells (Figure S3), respectively. RPM, reads per million clean reads. In addition, SPORTS1.0 also revealed distinct mismatch rates among different types of sRNAs (Figure 5 and Table S3), with tsRNAs showing the highest. The detected mismatch sites represent the modified nucleotides that might have caused misincorporation of nucleotides during the RT process. The relatively higher mismatch rate detected in tsRNA sequences is consistent with their highly modified nature. The mismatch sites detected by SPORTS1.0 could provide a potential source for further analyses of RNA modifications within sRNAs.

Figure 5

sRNA mismatch statistics by SPORTS1.0

The percentage of unique sequences that contain significantly-enriched mismatches out of total number of unique sequences from each subtype of sRNAs (miRNAs, piRNAs, tsRNAs, and rsRNAs) is provided for sperm (A), bone marrow cells (B), and intestinal epithelial cells (C). EMS, enrichment mismatch sequences; TUS, total unique sequences.

sRNA mismatch statistics by SPORTS1.0 The percentage of unique sequences that contain significantly-enriched mismatches out of total number of unique sequences from each subtype of sRNAs (miRNAs, piRNAs, tsRNAs, and rsRNAs) is provided for sperm (A), bone marrow cells (B), and intestinal epithelial cells (C). EMS, enrichment mismatch sequences; TUS, total unique sequences. Finally, SPORTS1.0 can analyze sRNAs of a wide range of species, depending on the availability of their reference genome and sRNA sequences (Figure 6 and Table S1). The species to be analyzed and their associated sRNA references are subject to update in future versions, or can be customized by the end users.

Figure 6

Species recompiled for analysis by SPORTS1.0

The 68 species and their respective reference database included in SPORTS1.0 precompiled for analysis.

Species recompiled for analysis by SPORTS1.0 The 68 species and their respective reference database included in SPORTS1.0 precompiled for analysis.

Conclusion

SPORTS1.0 is an easy-to-use and flexible pipeline for analyzing sRNA-seq data across a wide-range of species. Using mice as example, SPORTS1.0 provides a far more complicated sRNA landscape than having been previously seen, highlighting a tissue-specific dynamic regulation of tsRNAs and rsRNAs. SPORTS1.0 can also predict potential RNA modification sites based on nucleotide mismatches within sRNAs, and show a distinct pattern between different sRNA types. SPORTS1.0 may set the platform for potential new discoveries in biomedical and evolutionary studies that are related to sRNAs. The real voyage of discovery consists not in seeking new landscapes, but in looking with new eyes.

Authors’ contributions

JS, TZ, and QC conceived the idea and wrote the manuscript. JS and TZ developed the SPORTS1.0 software and analyzed the RNA-seq data. JS, EK, KMS, QC, and TZ contributed to the interpretation of the results. All authors read and approved the final manuscript.

Competing interests

The authors have declared no competing interests.

40 in total

1. A sequence of 28S rRNA-derived small RNAs is enriched in mature sperm and various somatic tissues and possibly associates with inflammation.

Authors: Chen Chu; Lu Yu; Bin Wu; Li Ma; Lan-Tao Gou; Miao He; Yunli Guo; Zhi-Tong Li; Wei Gao; Huijuan Shi; Mo-Fang Liu; Hongyan Wang; Charlie Degui Chen; Joël R Drevet; Yuchuan Zhou; Yonglian Zhang
Journal: J Mol Cell Biol Date: 2017-06-01 Impact factor: 6.216

2. Functional Transcriptomics in Diverse Intestinal Epithelial Cell Types Reveals Robust MicroRNA Sensitivity in Intestinal Stem Cells to Microbial Status.

Authors: Bailey C E Peck; Amanda T Mah; Wendy A Pitman; Shengli Ding; P Kay Lund; Praveen Sethupathy
Journal: J Biol Chem Date: 2017-01-04 Impact factor: 5.157

3. tRF2Cancer: A web server to detect tRNA-derived small RNA fragments (tRFs) and their expression in multiple cancers.

Authors: Ling-Ling Zheng; Wei-Lin Xu; Shun Liu; Wen-Ju Sun; Jun-Hao Li; Jie Wu; Jian-Hua Yang; Liang-Hu Qu
Journal: Nucleic Acids Res Date: 2016-05-13 Impact factor: 16.971

Review 4. tRNA fragments in human health and disease.

Authors: Paul Anderson; Pavel Ivanov
Journal: FEBS Lett Date: 2014-09-16 Impact factor: 4.124

5. sRNAtoolbox: an integrated collection of small RNA research tools.

Authors: Antonio Rueda; Guillermo Barturen; Ricardo Lebrón; Cristina Gómez-Martín; Ángel Alganza; José L Oliver; Michael Hackenberg
Journal: Nucleic Acids Res Date: 2015-05-27 Impact factor: 16.971

6. Rfam 12.0: updates to the RNA families database.

Authors: Eric P Nawrocki; Sarah W Burge; Alex Bateman; Jennifer Daub; Ruth Y Eberhardt; Sean R Eddy; Evan W Floden; Paul P Gardner; Thomas A Jones; John Tate; Robert D Finn
Journal: Nucleic Acids Res Date: 2014-11-11 Impact factor: 19.160

7. The tRNA methyltransferase Dnmt2 is required for accurate polypeptide synthesis during haematopoiesis.

Authors: Francesca Tuorto; Friederike Herbst; Nader Alerasool; Sebastian Bender; Oliver Popp; Giuseppina Federico; Sonja Reitter; Reinhard Liebers; Georg Stoecklin; Hermann-Josef Gröne; Gunnar Dittmar; Hanno Glimm; Frank Lyko
Journal: EMBO J Date: 2015-08-13 Impact factor: 11.598

8. Ensembl 2016.

Authors: Andrew Yates; Wasiu Akanni; M Ridwan Amode; Daniel Barrell; Konstantinos Billis; Denise Carvalho-Silva; Carla Cummins; Peter Clapham; Stephen Fitzgerald; Laurent Gil; Carlos García Girón; Leo Gordon; Thibaut Hourlier; Sarah E Hunt; Sophie H Janacek; Nathan Johnson; Thomas Juettemann; Stephen Keenan; Ilias Lavidas; Fergal J Martin; Thomas Maurel; William McLaren; Daniel N Murphy; Rishi Nag; Michael Nuhn; Anne Parker; Mateus Patricio; Miguel Pignatelli; Matthew Rahtz; Harpreet Singh Riat; Daniel Sheppard; Kieron Taylor; Anja Thormann; Alessandro Vullo; Steven P Wilder; Amonida Zadissa; Ewan Birney; Jennifer Harrow; Matthieu Muffato; Emily Perry; Magali Ruffier; Giulietta Spudich; Stephen J Trevanion; Fiona Cunningham; Bronwen L Aken; Daniel R Zerbino; Paul Flicek
Journal: Nucleic Acids Res Date: 2015-12-19 Impact factor: 16.971

9. A transfer-RNA-derived small RNA regulates ribosome biogenesis.

Authors: Hak Kyun Kim; Gabriele Fuchs; Shengchun Wang; Wei Wei; Yue Zhang; Hyesuk Park; Biswajoy Roy-Chaudhuri; Pan Li; Jianpeng Xu; Kirk Chu; Feijie Zhang; Mei-Sze Chua; Samuel So; Qiangfeng Cliff Zhang; Peter Sarnow; Mark A Kay
Journal: Nature Date: 2017-11-29 Impact factor: 49.962

10. Highly sensitive sequencing reveals dynamic modifications and activities of small RNAs in mouse oocytes and early embryos.

Authors: Qiyuan Yang; Jimin Lin; Miao Liu; Ronghong Li; Bin Tian; Xue Zhang; Beiying Xu; Mofang Liu; Xuan Zhang; Yiping Li; Huijuan Shi; Ligang Wu
Journal: Sci Adv Date: 2016-06-10 Impact factor: 14.136

31 in total

1. RNAdetector: a free user-friendly stand-alone and cloud-based system for RNA-Seq data analysis.

Authors: Alessandro La Ferlita; Salvatore Alaimo; Sebastiano Di Bella; Emanuele Martorana; Georgios I Laliotis; Francesco Bertoni; Luciano Cascione; Philip N Tsichlis; Alfredo Ferro; Roberta Bosotti; Alfredo Pulvirenti
Journal: BMC Bioinformatics Date: 2021-06-03 Impact factor: 3.169

2. The expanding repertoire of hereditary information carriers.

Authors: Ying Zhang; Qi Chen
Journal: Development Date: 2019-03-15 Impact factor: 6.868

3. Early cleavage of preimplantation embryos is regulated by tRNA^Gln-TTG-derived small RNAs present in mature spermatozoa.

Authors: Xiaoxu Chen; Yi Zheng; Anmin Lei; Hanxue Zhang; Huimin Niu; Xueliang Li; Pengfei Zhang; Mingzhi Liao; Yinghua Lv; Zhendong Zhu; Chuanying Pan; Wuzi Dong; Hong Chen; Wansheng Liu; Geert Hamer; Shenming Zeng; Wenxian Zeng
Journal: J Biol Chem Date: 2020-06-02 Impact factor: 5.157

4. Testicular miRNAs and tsRNAs provide insight into gene regulation during overwintering and reproduction of Onychostoma macrolepis.

Authors: Guofan Peng; Chao Zhu; Qingfang Sun; Jincan Li; Yining Chen; Yingjie Guo; Hong Ji; Fangxia Yang; Wuzi Dong
Journal: Fish Physiol Biochem Date: 2022-05-20 Impact factor: 2.794

5. tsRFun: a comprehensive platform for decoding human tsRNA expression, functions and prognostic value by high-throughput small RNA-Seq and CLIP-Seq data.

Authors: Jun-Hao Wang; Wen-Xin Chen; Shi-Qiang Mei; Yue-Dong Yang; Jian-Hua Yang; Liang-Hu Qu; Ling-Ling Zheng
Journal: Nucleic Acids Res Date: 2022-01-07 Impact factor: 16.971