Xiaogang Wu1, Taek-Kyun Kim1, David Baxter1, Kelsey Scherler1, Aaron Gordon1, Olivia Fong2, Alton Etheridge2, David J Galas2, Kai Wang1.
Abstract
Although many tools have been developed to analyze small RNA sequencing (sRNA-Seq) data, it remains challenging to accurately analyze the small RNA population, mainly because short read lengths cause reads to be assigned to multiple sequence IDs. Additional issues in small RNA analysis include low consistency of microRNA (miRNA) measurement results across different platforms, miRNA mapping affected by miRNA sequence variation (isomiRs) and RNA editing, and the origin of reads that remain unmapped after screening against all endogenous reference sequence databases. To address these issues, we built a comprehensive and customizable sRNA-Seq data analysis pipeline, sRNAnalyzer, which enables: (i) comprehensive miRNA profiling strategies to better handle isomiRs and summarization based on each nucleotide position to detect potential SNPs in miRNAs, (ii) different sequence mapping result assignment approaches to simulate results from microarray/qRT-PCR platforms and a local probabilistic model to assign mapping results to the most-likely IDs, and (iii) comprehensive ribosomal RNA filtering for accurate mapping of exogenous RNAs and summarization based on taxonomy annotation. We evaluated our pipeline on both artificial samples (including synthetic miRNA and Escherichia coli cultures) and biological samples (human tissue and plasma). sRNAnalyzer is implemented in Perl and available at: http://srnanalyzer.systemsbiology.net/.
Year: 2017 PMID: 29069500 PMCID: PMC5716150 DOI: 10.1093/nar/gkx999
Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971
Figure 1. Main framework of sRNAnalyzer. The pipeline can be divided into three functional modules which are separated by dotted lines. The data format for each process is indicated in square brackets.
Figure 2. Comparison of miRNA profiles from sRNA-Seq, miRNA array and qRT-PCR platforms for synthetic miRNA samples. (A) Distribution of miRNA profiles from miRNA array on three plates; (B) Correlation between miRNA profiles from miRNA array and qRT-PCR; (C) Correlation between miRNA profiles from miRNA array and sRNA-Seq using single assignment approach; (D) Correlation between miRNA profiles from miRNA array and sRNA-Seq using multiple assignment approach; (E) Approximation from sRNA-Seq to miRNA array; and (F) Approximation from sRNA-Seq to qRT-PCR.
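The single and multiple assignment approaches compared in panels (C) and (D) can be illustrated with a minimal Python sketch. It is not the sRNAnalyzer implementation (which is in Perl). It assumes that "single assignment" credits a read to one ID only, that "multiple assignment" credits it to every ID it maps to, and it stands in for the local probabilistic model with a simple rule that sends an ambiguous read to the candidate with the most uniquely mapped reads. The function names, the input layout and the toy data are all hypothetical.

```python
from collections import Counter

def single_assignment(read_hits):
    """Credit each read to exactly one ID (assumption: the first listed hit)."""
    counts = Counter()
    for read_count, ids in read_hits:
        counts[ids[0]] += read_count
    return counts

def multiple_assignment(read_hits):
    """Credit each read to every ID it maps to."""
    counts = Counter()
    for read_count, ids in read_hits:
        for seq_id in ids:
            counts[seq_id] += read_count
    return counts

def most_likely_assignment(read_hits):
    """Illustrative stand-in for a probabilistic model (an assumption):
    an ambiguous read goes to the candidate with the most unique reads,
    with ties resolved in favour of the earlier-listed candidate."""
    unique = Counter()
    for read_count, ids in read_hits:
        if len(ids) == 1:
            unique[ids[0]] += read_count
    counts = Counter(unique)
    for read_count, ids in read_hits:
        if len(ids) > 1:
            best = max(ids, key=lambda i: (unique[i], -ids.index(i)))
            counts[best] += read_count
    return counts

# Hypothetical toy input: (read count, IDs the read sequence maps to).
read_hits = [
    (10, ["miR-A"]),
    (4, ["miR-A", "miR-B"]),
    (2, ["miR-B"]),
    (3, ["miR-B", "miR-C"]),
]

for name, fn in [("single", single_assignment),
                 ("multiple", multiple_assignment),
                 ("most-likely", most_likely_assignment)]:
    print(name, dict(fn(read_hits)))
```

Under multiple assignment the per-ID totals can exceed the number of sequenced reads, which is loosely analogous to a probe-based platform where one molecule can contribute signal to several closely related probes.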
Differential analysis of miRNA profiles for human tissue samples from a CRC study (GEO accession number: GSE46622 or SRA accession number: SRP022054)
| miRNA | T/N_FC | T/N_pVal | M/N_FC | M/N_pVal | M/T_FC | M/T_pVal |
|---|---|---|---|---|---|---|
| | −8.22250 | 0.00221 | −12.19746 | 0.00019 | −0.39062 | 0.25391 |
| | −4.76817 | 0.00251 | −8.74696 | 0.00009 | −0.59894 | 0.10591 |
| hsa-miR-194-1-3p | −4.30349 | 0.02117 | −6.83087 | 0.00324 | −0.29064 | 0.29476 |
| | −4.15678 | 0.01461 | −1.40532 | 0.10508 | 0.72822 | 0.22547 |
| hsa-miR-143-5p | −4.05022 | 0.00980 | −3.44819 | 0.00970 | 0.02421 | 0.41680 |
| | −3.42637 | 0.00068 | −3.98052 | 0.00014 | −0.02076 | 0.35883 |
| hsa-miR-129-2-3p | −3.41944 | 0.02247 | −3.91576 | 0.00998 | −0.01681 | 0.41124 |
| hsa-miR-129-1-3p | −3.39301 | 0.01807 | −3.05203 | 0.01421 | 0.00903 | 0.42976 |
| hsa-miR-1224-5p | −3.01311 | 0.00238 | −0.49318 | 0.23264 | 1.06825 | 0.14720 |
| hsa-miR-147b | −2.74872 | 0.02911 | −4.74095 | 0.00365 | −0.26982 | 0.29074 |
| hsa-miR-124-3p | −2.70090 | 0.00639 | −1.08949 | 0.14865 | 0.35959 | 0.26224 |
| hsa-miR-490-3p | −2.68564 | 0.02562 | −3.50480 | 0.01366 | −0.05444 | 0.24816 |
| hsa-miR-215-3p | −2.64760 | 0.07514 | −5.24797 | 0.01009 | −0.44050 | 0.29178 |
| hsa-miR-133a-5p | −2.31880 | 0.00400 | −3.63812 | 0.00081 | −0.14794 | 0.13683 |
| | −2.26530 | 0.02788 | −4.19600 | 0.00459 | −0.29520 | 0.21543 |
| | 2.55000 | 0.01849 | 4.31795 | 0.00278 | 0.23145 | 0.25813 |
| hsa-miR-182-5p | 3.42372 | 0.03417 | 7.79898 | 0.00119 | 0.88800 | 0.17776 |
| | 3.99934 | 0.01719 | 7.40056 | 0.00479 | 0.51921 | 0.25077 |
| hsa-miR-135a-5p | 4.57508 | 0.00792 | 7.91757 | 0.00206 | 0.45545 | 0.20791 |
| hsa-miR-122-5p | 5.42554 | 0.08238 | 34.37772 | 0.00572 | 12.48896 | 0.06981 |
| hsa-miR-31-3p | 6.04394 | 0.00683 | 2.12517 | 0.01529 | −1.00130 | 0.15600 |
| | 6.65842 | 0.00524 | 10.20095 | 0.00138 | 0.37638 | 0.24436 |
| | 17.06142 | 0.00031 | 10.03755 | 0.00234 | −0.92609 | 0.17938 |
Note: N—Normal, T—Tumor, M—Metastasis, FC—Fold Change and pVal—P-Value. Nine miRNAs validated by qRT-PCR in the original study (34) are highlighted (bold).
Figure 3. Example of summarized read count distributions at every nucleotide position, for both match and mismatch events, across the miRNA precursor sequence (hsa-mir-1-1).
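A per-nucleotide summary of this kind can be sketched in Python as follows. This is only an illustration of the idea, not the pipeline's code: it assumes ungapped alignments given as a 0-based start position, a read sequence and a read count, and it flags candidate variant positions with a simple mismatch-fraction threshold. The function names, the thresholds and the toy precursor are hypothetical.

```python
def positional_profile(precursor, alignments):
    """Tally match and mismatch read counts at each precursor position.

    alignments: iterable of (start, read_seq, read_count) with a 0-based,
    ungapped start position on the precursor (an assumption of this sketch).
    """
    match = [0] * len(precursor)
    mismatch = [0] * len(precursor)
    for start, read, count in alignments:
        for offset, base in enumerate(read):
            pos = start + offset
            if pos >= len(precursor):
                break  # ignore bases that run past the end of the precursor
            if base == precursor[pos]:
                match[pos] += count
            else:
                mismatch[pos] += count
    return match, mismatch

def candidate_variant_positions(match, mismatch, min_fraction=0.1, min_depth=5):
    """Return positions whose mismatch fraction reaches min_fraction
    at a total depth of at least min_depth (both thresholds are arbitrary)."""
    positions = []
    for pos, (m, mm) in enumerate(zip(match, mismatch)):
        depth = m + mm
        if depth >= min_depth and mm / depth >= min_fraction:
            positions.append(pos)
    return positions

# Hypothetical toy precursor and alignments.
precursor = "ACGUACGU"
alignments = [
    (0, "ACGU", 5),
    (2, "GUAC", 3),
    (2, "GAAC", 2),  # differs from the precursor at position 3
]

match, mismatch = positional_profile(precursor, alignments)
print(match)
print(mismatch)
print(candidate_variant_positions(match, mismatch))
```

A position with consistent mismatches across many reads is a candidate SNP or editing site, whereas mismatches scattered at low frequency are more likely sequencing errors.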
Figure 4. Overall mapping ratios and taxonomy summarization for Escherichia coli bacterial culture samples. (A) Overall domain mapping ratios; (B) Taxonomy summary at Genus level using ribosomal RNA (rRNA) filtering and applying a local probabilistic model (LPM); (C) Taxonomy summary at Genus level using rRNA filtering but without LPM; and (D) Taxonomy summary at Genus level neither using rRNA filtering nor applying LPM.
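The effect of rRNA filtering on a genus-level summary, as contrasted in panels (C) and (D), can be sketched in Python under some simplifying assumptions. The sketch assumes reads already flagged as rRNA hits are identified by read ID, derives the genus from the first word of a binomial species name, and leaves reads that hit more than one genus unassigned. It does not reproduce the LPM or the actual sRNAnalyzer logic, and all names and data are hypothetical.

```python
from collections import Counter

def summarize_by_genus(reads, rrna_read_ids=frozenset()):
    """Summarize read counts at genus level.

    reads: iterable of (read_id, read_count, species_hits).
    rrna_read_ids: IDs of reads that matched an rRNA reference; these are
    dropped before summarization (assumed to be computed elsewhere).
    Returns (per-genus counts, count of reads hitting several genera).
    """
    counts = Counter()
    unassigned = 0
    for read_id, count, species_hits in reads:
        if read_id in rrna_read_ids:
            continue  # rRNA filtering step
        # Assumption: species are binomial names, so the genus is the first word.
        genera = {species.split()[0] for species in species_hits}
        if len(genera) == 1:
            counts[genera.pop()] += count
        else:
            unassigned += count
    return counts, unassigned

# Hypothetical toy input.
reads = [
    ("r1", 6, ["Escherichia coli"]),
    ("r2", 4, ["Escherichia coli", "Shigella flexneri"]),
    ("r3", 9, ["Escherichia coli", "Escherichia fergusonii"]),
    ("r4", 20, ["Escherichia coli"]),  # flagged as rRNA below
]

filtered, ambiguous = summarize_by_genus(reads, rrna_read_ids={"r4"})
unfiltered, _ = summarize_by_genus(reads)
print(dict(filtered), ambiguous)
print(dict(unfiltered))
```

Reads that are ambiguous at species level (such as `r3`) can still be resolved at genus level, which is one reason to summarize along the taxonomy rather than per reference sequence.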
Figure 5. Overall domain mapping ratios and taxonomy summarization for human plasma samples from an exogenous RNA spectra study (SRA accession: ERP002414). (A) Overall domain mapping ratios; (B) Taxonomy summary at Genus level; and (C) Taxonomy summary at Species level.
Figure 6. Overall mapping ratios and taxonomy summarization for human tissue samples from a CRC study (SRA accession: SRP022054). (A) Overall domain mapping ratios; (B) Taxonomy summary on Pseudomonas; and (C) Distribution of taxonomy summary on Pseudomonas.