Literature DB >> 26093149

Chimira: analysis of small RNA sequencing data and microRNA modifications.

Abstract

UNLABELLED: Chimira is a web-based system for microRNA (miRNA) analysis from small RNA-Seq data. Sequences are automatically cleaned, trimmed, size selected and mapped directly to miRNA hairpin sequences. This generates count-based miRNA expression data for subsequent statistical analysis. Moreover, it is capable of identifying epi-transcriptomic modifications in the input sequences. Supported modification types include multiple types of 3'-modifications (e.g. uridylation, adenylation), 5'-modifications and also internal modifications or variation (ADAR editing or single nucleotide polymorphisms). Besides cleaning and mapping of input sequences to miRNAs, Chimira provides a simple and intuitive set of tools for the analysis and interpretation of the results (see also Supplementary Material). These allow the visual study of the differential expression between two specific samples or sets of samples, the identification of the most highly expressed miRNAs within sample pairs (or sets of samples) and also the projection of the modification profile for specific miRNAs across all samples. Other tools have already been published in the past for various types of small RNA-Seq analysis, such as UEA workbench, seqBuster, MAGI, OASIS and CAP-miRSeq, CPSS for modifications identification. A comprehensive comparison of Chimira with each of these tools is provided in the Supplementary Material. Chimira outperforms all of these tools in total execution speed and aims to facilitate simple, fast and reliable analysis of small RNA-Seq data allowing also, for the first time, identification of global microRNA modification profiles in a simple intuitive interface.
AVAILABILITY AND IMPLEMENTATION: Chimira has been developed as a web application and it is accessible here: http://www.ebi.ac.uk/research/enright/software/chimira. CONTACT: aje@ebi.ac.uk SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

Entities: Chemical Gene Species

Mesh：

Substances：

Year: 2015 PMID： 26093149 PMCID： PMC4595902 DOI： 10.1093/bioinformatics/btv380

Source DB: PubMed Journal: Bioinformatics ISSN： 1367-4803 Impact factor: 6.937

1 Introduction

Small RNA sequencing data are among the most straightforward types of NGS data to analyse. However many laboratories that generate such data still struggle to develop or adapt computational pipelines for the analysis and interpretation of these data. Additionally, in recent years, it has been reported that many miRNAs go through post-transcriptional alterations that modify their 3′ end, mainly via mono/poly-Uridylation (Heo et al., 2009, 2012) or poly-Adenylation (Shanfa ). Such modifications are believed to impart significant functional changes to the miRNA. Indeed, other modifications and/or editing have also been observed to occur in several other instances (Burroughs ; Sung-Chou ; Yu and Chen, 2010). Hence, it is imperative to explore the full profile of all modifications and/or edits that can be identified in small RNA-Seq data. The functional roles of small RNAs in different conditions may be greatly influenced by such modifications. This can be accomplished by aligning small RNA sequences against their hairpin precursors. The alignment region spanning each miRNA is analysed to detect bases in the miRNA sequence that could not possibly have derived from the precursor it aligns to. These unalignable nucleotides are likely either: (i) base-calling errors, (ii) single nucleotide polymorphisms (SNPs), (iii) ADAR edits or (iv) post-transcriptional miRNA modifications (e.g. via TUTases). Base-calling errors are pseudo-random depending on the platform used and usually more likely to occur towards the 3′ end of sequences. In order to study this diverse pool of possible miRNA post-transcriptional modifications, we have developed Chimira. This is a cohesive platform for the processing and analysis of small RNA NGS data allowing simultaneous detection of 3′, 5′ and internal miRNA modifications.

2 Input

Chimira accepts FASTQ/FASTA files as input, containing adapter and/or barcode stripped small RNA-Seq data. The user is provided with a simple system for uploading each sample and replicate and selecting among the available run options. Additionally, Chimira provides a limited 3′ adapter cleaning functionality using reaper (Davis ) supporting different adapters for each input sample. Finally, the system provides a simple interface for computationally determining likely 3′ sequencing adapters in case the user does not have this information.

3 Usage notes

In its current release (v1.0), Chimira supports mapping of small RNA-Seq data against 209 genomes registered in miRBase (Griffiths-Jones ). In order to optimize and speed-up the analysis, tally (Davis ) is being used for de-duplicating the sequence fragments. Tally dramatically reduces the size of input sequence files by collapsing identical sequences into a single one while storing the total read depth. When input sequence files are uploaded and the appropriate genome is selected, the user is able to launch a new queued job. Every job is submitted to a high-performance computing cluster and the user can follow its progress via an analysis console. Notification via e-mail upon completion of each job is also available, provided the user has supplied a valid e-mail address before launching the job.

4 Methods

Two types of miRNAs identification are provided: Plain Counts and Modifications. Plain Counts refers to the quantification of the miRNA molecules that are expressed in any form (template or modified) in each of the input samples. The input sequences are mapped against miRBase using BLASTn (Boratyn ) allowing up to two mismatches for each sequence. BLASTn output is filtered so that antisense matches are discarded. The extracted counts are normalized across all samples using DESeq2 (Love ) and basic QC plots are provided together with plots for the total counts per sample and the top-10 miRNAs expression levels across all samples. In cases where a small RNA sequence identically matches more than one precursor sequences (i.e. miRNA paralogues) the user can choose between calling only the first optimal alignment hit or assigning counts fractionally with equal weights between the paralogues. Apart from the quantification of miRNA counts and the prevalence of modifications, Chimira integrates basic functionality for comparative analysis of input samples. Specifically, differential expression (of plain counts) between two samples or sets of samples can be visualized through an interactive scatterplot that allows the user to view the miRNA identifier and the different expression levels at each point of the plot (Supplementary Fig. S1). Moreover, the user can display the 10, 20 or 50 most highly expressed miRNAs in two samples (or sets of samples) side by side. Secondly, ‘modifications’ refers to the quantification of any sequence segments that are part of the input sequences and cannot be explained by genomic sequence. In the example shown (Fig. 1), uridylation and adenylation are the most prevalent modification types in the 1 nt after the 3′ end of the miRNAs, while C modifications are highly enriched exactly at the 3′ end. ADAR editing is the predominant modification type in the internal modifications followed by a moderately expressed C-SNP, 11 nt upstream of the 3′ end (index position: 11).

Fig. 1.

Modification profile from 12 Heart, Liver and Brain tissue samples in H. sapiens, as detected by Chimira: (a) Global profile (b) 3′-Modifications (c) 5′-Modifications (d) Internal modifications (ADAR edits and SNPs) (e) Internal modifications (SNPs). The x-axis corresponds to the index positions across a miRNA molecule. They y-axis corresponds to the raw counts of the identified modification patterns. The start of a miRNA on the x-axis is at index ‘0’ (5′ end) while its end is at index ‘22’ (3′ end) The types of modifications being identified include: 3′-modifications: any non-templated sequences in a window that starts at the 5th nt upstream of the 3′ end of each miRNA and ends at the 6th nt downstream of the 3′ end. 5′-modifications: any non-templated sequences in a window that starts at the 8th nt upstream of the 5′ end of each miRNA and ends at the 5th nt downstream of the the 5′ end. Internal modifications: SNPs and ADAR edits. In order for a modification to be classified as a SNP, an arbitrary 70% value is used as a threshold for the ratio of the modified counts to the overall counts. It is worth noting that the window lengths being used for identification of 3′ and 5′ modifications include nucleotide positions also within the original miRNA sequence to better distinguish all possible modifications from multiple miRNA variants originating from the same precursor but with different length mature products. Modification types are inferred from BLAST alignments of input sequences to their hairpin precursors and then examining the content of alignment mismatches returned. In order to decipher the correct modification position a reference database has been built initially for all supported genomes, containing canonical alignments between mature miRNAs and their hairpin precursors. Based on these data each of the identified modification patterns is assigned a position index (Supplementary Table S1) in order to build the full depth-wise modification profile. Chimira also allows the display of the modification profiles across all samples for a specific miRNA, defined by the user. Finally, all counts (plain and modifications) can be downloaded for further analysis as separate files as soon as the processing of a set of samples is complete.

5 Conclusion

Chimira has been specifically developed for fast analysis of simple small RNA datasets with a user-friendly interface. It complements our previous pipeline sequenceImp, which was designed for command-line level analysis of larger-scale experiments with complex designs. Chimira is a fast and robust system for the cleaning, filtering, QC and mapping of small RNA-Seq data aiming to simplify the process of small RNA NGS analysis to a straightforward online workflow. Additionally, it allows the extraction of global modification profiles across and within each of the input samples thus allowing the inference of correlations between certain modification patterns and any conditions indicated by the input samples. We hope the system will prove useful to the small RNA community. Conflict of Interest: none declared.

10 in total

1. TUT4 in concert with Lin28 suppresses microRNA biogenesis through pre-microRNA uridylation.

Authors: Inha Heo; Chirlmin Joo; Young-Kook Kim; Minju Ha; Mi-Jeong Yoon; Jun Cho; Kyu-Hyeon Yeom; Jinju Han; V Narry Kim
Journal: Cell Date: 2009-08-21 Impact factor: 41.582

2. A comprehensive survey of 3' animal miRNA modification events and a possible role for 3' adenylation in modulating miRNA targeting effectiveness.

Authors: A Maxwell Burroughs; Yoshinari Ando; Michiel J L de Hoon; Yasuhiro Tomaru; Takahiro Nishibu; Ryo Ukekawa; Taku Funakoshi; Tsutomu Kurokawa; Harukazu Suzuki; Yoshihide Hayashizaki; Carsten O Daub
Journal: Genome Res Date: 2010-08-18 Impact factor: 9.043

3. Mono-uridylation of pre-microRNA as a key step in the biogenesis of group II let-7 microRNAs.

Authors: Inha Heo; Minju Ha; Jaechul Lim; Mi-Jeong Yoon; Jong-Eun Park; S Chul Kwon; Hyeshik Chang; V Narry Kim
Journal: Cell Date: 2012-10-11 Impact factor: 41.582

4. Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2.

Authors: Michael I Love; Wolfgang Huber; Simon Anders
Journal: Genome Biol Date: 2014 Impact factor: 13.583

5. Adenylation of plant miRNAs.

Authors: Shanfa Lu; Ying-Hsuan Sun; Vincent L Chiang
Journal: Nucleic Acids Res Date: 2009-02-02 Impact factor: 16.971

6. Analysis of miRNA Modifications.

Authors: Bin Yu; Xuemei Chen
Journal: Methods Mol Biol Date: 2010

7. BLAST: a more efficient report with usability improvements.

Authors: Grzegorz M Boratyn; Christiam Camacho; Peter S Cooper; George Coulouris; Amelia Fong; Ning Ma; Thomas L Madden; Wayne T Matten; Scott D McGinnis; Yuri Merezhuk; Yan Raytselis; Eric W Sayers; Tao Tao; Jian Ye; Irena Zaretskaya
Journal: Nucleic Acids Res Date: 2013-04-22 Impact factor: 16.971

8. MicroRNA 3' end nucleotide modification patterns and arm selection preference in liver tissues.

Authors: Sung-Chou Li; Kuo-Wang Tsai; Hung-Wei Pan; Yung-Ming Jeng; Meng-Ru Ho; Wen-Hsiung Li
Journal: BMC Syst Biol Date: 2012-12-12

9. miRBase: tools for microRNA genomics.

Authors: Sam Griffiths-Jones; Harpreet Kaur Saini; Stijn van Dongen; Anton J Enright
Journal: Nucleic Acids Res Date: 2007-11-08 Impact factor: 16.971

10. Kraken: a set of tools for quality control and analysis of high-throughput sequence data.

Authors: Matthew P A Davis; Stijn van Dongen; Cei Abreu-Goodger; Nenad Bartonicek; Anton J Enright
Journal: Methods Date: 2013-06-29 Impact factor: 3.608

10 in total

53 in total

Review 1. Identifying and characterizing functional 3' nucleotide addition in the miRNA pathway.

Authors: A Maxwell Burroughs; Yoshinari Ando
Journal: Methods Date: 2018-08-20 Impact factor: 3.608

Review 2. Toward the promise of microRNAs - Enhancing reproducibility and rigor in microRNA research.

Authors: Kenneth W Witwer; Marc K Halushka
Journal: RNA Biol Date: 2016-09-19 Impact factor: 4.652

3. The Transcription Factor, α1ACT, Acts Through a MicroRNA Network to Regulate Neurogenesis and Cell Death During Neonatal Cerebellar Development.

Authors: Cenfu Wei; Kellie Benzow; Michael D Koob; Christopher M Gomez; Xiaofei Du
Journal: Cerebellum Date: 2022-06-22 Impact factor: 3.847

4. Accurate detection for a wide range of mutation and editing sites of microRNAs from small RNA high-throughput sequencing profiles.

Authors: Yun Zheng; Bo Ji; Renhua Song; Shengpeng Wang; Ting Li; Xiaotuo Zhang; Kun Chen; Tianqing Li; Jinyan Li
Journal: Nucleic Acids Res Date: 2016-05-26 Impact factor: 16.971

5. MicroRNAs Regulating Autophagy in Neurodegeneration.

Authors: Qingxuan Lai; Nikolai Kovzel; Ruslan Konovalov; Ilya A Vinnikov
Journal: Adv Exp Med Biol Date: 2021 Impact factor: 2.622

6. Genetic Inactivation of ZCCHC6 Suppresses Interleukin-6 Expression and Reduces the Severity of Experimental Osteoarthritis in Mice.

Authors: Mohammad Y Ansari; Nazir M Khan; Nashrah Ahmad; Jonathan Green; Kimberly Novak; Tariq M Haqqi
Journal: Arthritis Rheumatol Date: 2019-03-06 Impact factor: 10.995