Shanrong Zhao, Li Xi, Jie Quan, Hualin Xi, Ying Zhang, David von Schack, Michael Vincent, Baohong Zhang.
Abstract
BACKGROUND: RNA sequencing (RNA-seq), a next-generation sequencing technique for transcriptome profiling, is being used increasingly, driven in part by the decreasing cost of sequencing. Nevertheless, analyzing the massive amounts of data generated by large-scale RNA-seq remains a challenge. Multiple algorithms for basic analyses have been developed, and there is a growing need to automate the use of these tools so that results can be obtained efficiently and in a user-friendly manner. Increased automation and improved visualization will help make the results and findings of the analyses readily available to experimental scientists.
Year: 2016 PMID: 26747388 PMCID: PMC4706714 DOI: 10.1186/s12864-015-2356-9
Source DB: PubMed Journal: BMC Genomics ISSN: 1471-2164 Impact factor: 3.969
Fig. 1 Overview of the QuickRNASeq pipeline. Step #1 is computationally intensive and processes individual samples independently. Step #2 integrates the RNA-seq analysis results from the individual samples in Step #1 and generates a comprehensive project report. Step #3 offers interactive navigation and visualization of the RNA-seq analysis results.
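Because Step #1 treats each sample independently while Step #2 needs all of them, the two steps follow a fan-out/fan-in pattern. The Python sketch below illustrates only that control flow. It is not QuickRNASeq's implementation (which is driven by shell master scripts such as star-fc-qc.sh), and `process_sample`, `run_step1`, and `run_step2` are hypothetical names; `process_sample` is a placeholder stub.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List


def process_sample(sample_id: str) -> Dict[str, str]:
    """Hypothetical placeholder for the Step #1 work on one sample.

    A real implementation would run alignment, read counting and QC here
    (e.g. by calling external tools); this stub only reports completion.
    """
    return {"sample": sample_id, "status": "done"}


def run_step1(sample_ids: List[str], max_workers: int = 4) -> List[Dict[str, str]]:
    # Samples do not depend on each other, so they can run concurrently.
    # executor.map yields results in the same order as sample_ids.
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        return list(executor.map(process_sample, sample_ids))


def run_step2(per_sample: List[Dict[str, str]]) -> Dict[str, object]:
    # Step #2 starts only after every sample has finished and merges the
    # per-sample records into a single project-level summary.
    return {
        "n_samples": len(per_sample),
        "samples": [r["sample"] for r in per_sample],
        "all_done": all(r["status"] == "done" for r in per_sample),
    }


if __name__ == "__main__":
    results = run_step1(["SRR607214", "SRR615261", "SRR603068"])
    print(run_step2(results))
```

Threads are used here on the assumption that the per-sample work would be delegated to external programs, so the Python side mostly waits; on a compute cluster the same fan-out could instead be expressed as separate scheduler jobs.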
Description of main scripts in the QuickRNASeq package
| Script | Function |
|---|---|
| star-fc-qc.sh | Master script for Step #1 in Fig. 1 |
| star-fc-qc.ws.sh | Same as star-fc-qc.sh, but implemented for a standalone workstation |
| star-fc-qc.summary.sh | Master script for Step #2 in Fig. 1 |
| get-star-summary.pl | Merge STAR mapping summary |
| get-fc-summary.pl | Merge featureCounts counting summary |
| get-read-dist.pl | Merge read distribution from RSeQC |
| get-snp-corr.pl | Calculate all-against-all pairwise SNP correlations |
| get-expr-table.R | Merge counts table from individual samples |
| get-expr-qc.R | Perform correlation-based QC and calculate normalization factors |
| plot-rnaseq-metrics.R | Plot the summaries for read mapping, counting, or read distribution |
| plot-corr-matrix.R | Plot a correlation matrix |
| plot-expr-count.R | Plot the number of genes with varying RPKM cut-offs |
| RSeQC-html.pl | Generate an HTML QC report for an individual sample |
| make_HTMLs.sh | Generate a comprehensive, integrated, and interactive project report |
| gtf2annot.pl | Utility to extract gene annotation from a GTF file |
| gtf2bed.pl | Utility to convert a gene annotation from GTF to BED format |
| star-fc-qc.config.template | Template configuration file for customization |
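In the package, get-expr-table.R merges the per-sample count tables into one matrix. A rough Python analogue is sketched below, assuming the standard featureCounts output layout (a `#` comment line, a header beginning with `Geneid`, then tab-separated Geneid, Chr, Start, End, Strand, Length, and count columns). It reads only the first count column, the function names are illustrative rather than part of QuickRNASeq, and the gene IDs in the demo are made up.

```python
import io
from typing import Dict, TextIO, Tuple


def read_featurecounts(handle: TextIO) -> Tuple[Dict[str, int], Dict[str, int]]:
    """Parse a single-sample featureCounts table into (counts, lengths).

    Assumes the standard layout: '#' comment lines, a header starting with
    'Geneid', then tab-separated Geneid, Chr, Start, End, Strand, Length and
    count columns. Only the first count column (index 6) is read.
    """
    counts: Dict[str, int] = {}
    lengths: Dict[str, int] = {}
    for line in handle:
        if line.startswith("#") or line.startswith("Geneid"):
            continue  # skip the program/command comment and the header
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 7:
            continue  # skip blank or malformed lines
        gene_id = fields[0]
        lengths[gene_id] = int(fields[5])
        counts[gene_id] = int(fields[6])
    return counts, lengths


def merge_counts(per_sample: Dict[str, Dict[str, int]]) -> Dict[str, Dict[str, int]]:
    """Combine per-sample counts into a gene -> {sample -> count} matrix.

    Genes absent from a sample's table are filled with 0.
    """
    genes = sorted(set().union(*(c.keys() for c in per_sample.values())))
    return {g: {s: c.get(g, 0) for s, c in per_sample.items()} for g in genes}


if __name__ == "__main__":
    # Toy input in featureCounts layout; gene IDs and numbers are made up.
    demo = io.StringIO(
        "# Program:featureCounts\n"
        "Geneid\tChr\tStart\tEnd\tStrand\tLength\tS1.bam\n"
        "GENE_A\tchr1\t100\t600\t+\t501\t42\n"
        "GENE_B\tchr2\t200\t1200\t-\t1001\t7\n"
    )
    s1_counts, s1_lengths = read_featurecounts(demo)
    print(merge_counts({"S1": s1_counts, "S2": {"GENE_A": 3}}))
```

Keeping the gene lengths alongside the counts is deliberate: they are needed later for length-normalized values such as RPKM.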
Annotation and mapping summary for the 48 samples used in the QuickRNASeq test run
| Sample | Subject | Tissue | Sex | Total_reads | Uniq_Rate (%)^a | Multi_Rate (%)^b | Unmap_Rate (%)^c |
|---|---|---|---|---|---|---|---|
| SRR607214 | GTEX-N7MS | Blood | M | 39769361 | 54.59 | 23.5 | 21.91 |
| SRR615261 | GTEX-N7MS | Blood Vessel | M | 47785162 | 79.69 | 2.21 | 18.1 |
| SRR603068 | GTEX-N7MS | Brain | M | 53339811 | 59.45 | 2.15 | 38.4 |
| SRR821282 | GTEX-N7MS | Esophagus | M | 44678159 | 65.58 | 2.62 | 31.8 |
| SRR608096 | GTEX-N7MS | Heart | M | 58482196 | 72.91 | 2.8 | 24.29 |
| SRR612839 | GTEX-N7MS | Muscle | M | 52016412 | 70.81 | 2.37 | 26.82 |
| SRR816609 | GTEX-N7MS | Pituitary | M | 38214685 | 62.27 | 2.37 | 35.36 |
| SRR821518 | GTEX-N7MS | Testis | M | 61509101 | 83.31 | 3.85 | 12.84 |
| SRR607679 | GTEX-N7MS | Thyroid | M | 80820067 | 51.37 | 2.38 | 46.25 |
| SRR809283 | GTEX-N7MT | Blood | F | 48818685 | 64.62 | 10.77 | 24.61 |
| SRR808044 | GTEX-N7MT | Blood Vessel | F | 44714926 | 81.42 | 2.92 | 15.66 |
| SRR598671 | GTEX-N7MT | Brain | F | 45163430 | 70.26 | 3.12 | 26.62 |
| SRR598509 | GTEX-N7MT | Heart | F | 44403911 | 71.19 | 4.3 | 24.51 |
| SRR600784 | GTEX-N7MT | Lung | F | 28065576 | 76.74 | 2.3 | 20.96 |
| SRR813208 | GTEX-N7MT | Pancreas | F | 53422565 | 72.34 | 4.37 | 23.29 |
| SRR821573 | GTEX-N7MT | Pituitary | F | 54452379 | 85.61 | 3.52 | 10.87 |
| SRR810945 | GTEX-NFK9 | Blood | M | 41131423 | 60.85 | 18.12 | 21.03 |
| SRR811819 | GTEX-NFK9 | Blood Vessel | M | 49527122 | 85.48 | 2.81 | 11.71 |
| SRR820689 | GTEX-NFK9 | Esophagus | M | 33541344 | 81.35 | 3.4 | 15.25 |
| SRR602106 | GTEX-NFK9 | Heart | M | 65071994 | 80.04 | 4.76 | 15.2 |
| SRR607166 | GTEX-NFK9 | Lung | M | 58741362 | 76.22 | 2.91 | 20.87 |
| SRR598044 | GTEX-NFK9 | Muscle | M | 58643842 | 80.85 | 3.36 | 15.79 |
| SRR614287 | GTEX-NFK9 | Nerve | M | 47388876 | 70.58 | 2.4 | 27.02 |
| SRR811029 | GTEX-NFK9 | Pancreas | M | 51304957 | 71.95 | 7.01 | 21.04 |
| SRR815280 | GTEX-NFK9 | Prostate | M | 85593813 | 80.46 | 4.55 | 14.99 |
| SRR820839 | GTEX-NFK9 | Testis | M | 51113138 | 66.02 | 2.89 | 31.09 |
| SRR603834 | GTEX-NFK9 | Thyroid | M | 61642193 | 79.4 | 3.49 | 17.11 |
| SRR808836 | GTEX-NPJ8 | Blood Vessel | M | 53974446 | 80.59 | 3.31 | 16.1 |
| SRR598124 | GTEX-NPJ8 | Brain | M | 55608656 | 65.46 | 3.1 | 31.44 |
| SRR817306 | GTEX-NPJ8 | Esophagus | M | 62209065 | 79.22 | 3.9 | 16.88 |
| SRR598148 | GTEX-NPJ8 | Heart | M | 53693956 | 68.13 | 3.35 | 28.52 |
| SRR603750 | GTEX-NPJ8 | Lung | M | 25962857 | 67.55 | 3.24 | 29.21 |
| SRR601695 | GTEX-NPJ8 | Muscle | M | 96240522 | 43.22 | 1.77 | 55.01 |
| SRR615790 | GTEX-NPJ8 | Nerve | M | 61182017 | 58.84 | 2.45 | 38.71 |
| SRR819771 | GTEX-NPJ8 | Pancreas | M | 60265701 | 80.07 | 4.82 | 15.11 |
| SRR807949 | GTEX-NPJ8 | Pituitary | M | 95246707 | 85.12 | 3.44 | 11.44 |
| SRR820234 | GTEX-NPJ8 | Prostate | M | 60423220 | 79.72 | 3.97 | 16.31 |
| SRR810899 | GTEX-NPJ8 | Testis | M | 57950635 | 81.5 | 3.71 | 14.79 |
| SRR602951 | GTEX-NPJ8 | Thyroid | M | 100317976 | 38.72 | 2.1 | 59.18 |
| SRR815494 | GTEX-O5YT | Blood | M | 61808169 | 65.24 | 4.9 | 29.86 |
| SRR809785 | GTEX-O5YT | Blood Vessel | M | 60730604 | 86.73 | 2.59 | 10.68 |
| SRR814003 | GTEX-O5YT | Esophagus | M | 64985455 | 85.69 | 3.07 | 11.24 |
| SRR820316 | GTEX-O5YT | Heart | M | 66455677 | 81.96 | 2.79 | 15.25 |
| SRR821525 | GTEX-O5YT | Lung | M | 56250586 | 78.65 | 2.75 | 18.6 |
| SRR815044 | GTEX-O5YT | Muscle | M | 65449073 | 84.77 | 2.96 | 12.27 |
| SRR812080 | GTEX-O5YT | Nerve | M | 58246823 | 86.85 | 3.1 | 10.05 |
| SRR810761 | GTEX-O5YT | Pancreas | M | 64065959 | 73.8 | 5.49 | 20.71 |
| SRR818850 | GTEX-O5YT | Testis | M | 64388347 | 84.18 | 3.52 | 12.3 |
The samples are from the Genotype-Tissue Expression (GTEx) project [39, 40].
^a Uniq_Rate, percentage of reads that were uniquely mapped. ^b Multi_Rate, percentage of reads mapped to multiple locations. ^c Unmap_Rate, percentage of unmapped reads.
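The three rate columns partition each library, so they can be derived from read counts. The sketch below assumes that every read not uniquely or multi-mapped is counted as unmapped; how STAR's own categories (for example, reads mapped to too many loci) are assigned to these columns is not stated here, so the function and its inputs are hypothetical.

```python
from typing import Tuple


def mapping_rates(total: int, unique: int, multi: int) -> Tuple[float, float, float]:
    """Return (Uniq_Rate, Multi_Rate, Unmap_Rate) as percentages of total reads.

    Assumes every read that is neither uniquely nor multi-mapped counts as
    unmapped, so the three rates sum to 100 (up to rounding).
    """
    if total <= 0:
        raise ValueError("total must be positive")
    if unique < 0 or multi < 0 or unique + multi > total:
        raise ValueError("read counts are inconsistent with the total")
    unmapped = total - unique - multi
    uniq_rate, multi_rate, unmap_rate = (
        round(100.0 * n / total, 2) for n in (unique, multi, unmapped)
    )
    return uniq_rate, multi_rate, unmap_rate


if __name__ == "__main__":
    # Hypothetical toy library of 1,000 reads.
    print(mapping_rates(1000, 700, 50))  # (70.0, 5.0, 25.0)
```

As a consistency check on the table, the three rates reported for SRR607214 (54.59, 23.5, and 21.91) sum to 100.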
Fig. 2 Representative entry webpage for a QuickRNASeq project report. The page layout and the printable version of the page can be controlled by the icons at the top. The QC Metrics section provides QC results in plain-text, static-plot, and interactive-plot formats, accessible by clicking on the corresponding hyperlinked text, the iconized figures, and the pointing-hand icon, respectively. The Parallel Plot of QC values offers an integrated view of linked QC measures for a single sample or a group of samples (see also Fig. 4). The Expression Tables section provides links to the raw read counts, a normalized RPKM table, and an interactive display of gene expression levels (see also Fig. 6).
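RPKM (reads per kilobase of gene length per million reads) is defined as count × 10^9 / (gene length in bp × library size). The sketch below computes it and also counts genes at several RPKM cut-offs, the quantity plot-expr-count.R plots. Two choices are assumptions rather than documented QuickRNASeq behavior: the default library size is the sum of counted reads, and a gene "passes" a cut-off when its RPKM is at or above it. The gene names and numbers are made up.

```python
from typing import Dict, Iterable, Optional


def rpkm(counts: Dict[str, int], lengths: Dict[str, int],
         total_reads: Optional[int] = None) -> Dict[str, float]:
    """Reads Per Kilobase of gene length per Million reads.

    RPKM = count * 1e9 / (length_bp * total_reads). If total_reads is not
    supplied, the sum of the counts is used as the library size; this is
    an assumption, since pipelines differ in how they define it.
    """
    n = total_reads if total_reads is not None else sum(counts.values())
    if n <= 0:
        raise ValueError("library size must be positive")
    return {g: c * 1e9 / (lengths[g] * n) for g, c in counts.items()}


def genes_above_cutoffs(values: Dict[str, float],
                        cutoffs: Iterable[float]) -> Dict[float, int]:
    # Number of genes whose RPKM is at or above each cut-off.
    return {cut: sum(v >= cut for v in values.values()) for cut in cutoffs}


if __name__ == "__main__":
    # Made-up counts and lengths for two genes in a 1-million-read library.
    r = rpkm({"GENE_A": 500, "GENE_B": 500},
             {"GENE_A": 1000, "GENE_B": 2000}, total_reads=1_000_000)
    print(r)  # {'GENE_A': 500.0, 'GENE_B': 250.0}
    print(genes_above_cutoffs(r, [1, 300, 1000]))  # {1: 2, 300: 1, 1000: 0}
```

The example shows why length normalization matters: both genes have the same raw count, but the longer gene receives half the RPKM.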
Fig. 4 Parallel plot and table of multi-dimensional QC measures. The top panel displays one representative sample, with each measure shown in a shaded tooltip. The bottom panel provides the sample annotation and the full QC measures in a searchable table. Hovering the cursor over a sample in the table highlights the corresponding sample in the parallel plot. Parallel plots can be customized using the control instructions below the plot.
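In a parallel-coordinates plot each QC measure gets its own vertical axis, and measures with very different ranges are usually rescaled so that every axis spans the same height. QuickRNASeq renders its plot in the browser, and whether it rescales exactly this way is not stated; the sketch below shows generic per-axis min-max scaling, using mapping rates for three samples from the mapping-summary table. Treating a constant measure as 0.5 is an arbitrary, hypothetical choice.

```python
from typing import Dict


def scale_axes(rows: Dict[str, Dict[str, float]]) -> Dict[str, Dict[str, float]]:
    """Min-max scale each QC measure to [0, 1] across samples.

    rows maps sample -> {measure: value}, with the same measures for every
    sample. After scaling, each parallel-coordinates axis spans the same
    vertical range. A measure that is constant across samples maps to 0.5.
    """
    measures = list(next(iter(rows.values())))
    scaled: Dict[str, Dict[str, float]] = {s: {} for s in rows}
    for m in measures:
        values = [r[m] for r in rows.values()]
        lo, hi = min(values), max(values)
        for s, r in rows.items():
            scaled[s][m] = 0.5 if hi == lo else (r[m] - lo) / (hi - lo)
    return scaled


if __name__ == "__main__":
    # Mapping rates (%) for three samples from the mapping-summary table.
    qc = {
        "SRR607214": {"Uniq_Rate": 54.59, "Multi_Rate": 23.5, "Unmap_Rate": 21.91},
        "SRR615261": {"Uniq_Rate": 79.69, "Multi_Rate": 2.21, "Unmap_Rate": 18.1},
        "SRR603068": {"Uniq_Rate": 59.45, "Multi_Rate": 2.15, "Unmap_Rate": 38.4},
    }
    for sample, axes in scale_axes(qc).items():
        print(sample, {m: round(v, 2) for m, v in axes.items()})
```

Each sample then becomes one polyline through the scaled axes, so outliers such as the high Multi_Rate of SRR607214 stand out at the top of their axis.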
Fig. 6 Interactive visualization of gene expression. a Expression levels of selected genes displayed in a searchable table. b Boxplot view of the expression levels of CKM (creatine kinase, muscle). c Heat map view of the expression levels of selected genes. Expression values can be grouped or split according to sample annotations, such as tissue type. Each plot can be customized on the fly by right-clicking on it and selecting options from the drop-down menu.
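Grouping a gene's expression values by a sample annotation and summarizing each group is what a boxplot such as panel b displays. The sketch below is a generic illustration, not QuickRNASeq's plotting code: the sample IDs, tissue labels, and values are made up, and the whiskers are taken as the group minimum and maximum, whereas many plotting tools instead use a 1.5 × IQR rule.

```python
import statistics
from collections import defaultdict
from typing import Dict, List, Tuple


def group_by_annotation(expr: Dict[str, float],
                        annot: Dict[str, str]) -> Dict[str, List[float]]:
    # Collect one gene's per-sample expression values into groups keyed by a
    # sample annotation such as tissue type.
    groups: Dict[str, List[float]] = defaultdict(list)
    for sample, value in expr.items():
        groups[annot[sample]].append(value)
    return dict(groups)


def box_summary(values: List[float]) -> Tuple[float, float, float, float, float]:
    """Five-number summary (min, Q1, median, Q3, max) for one box in a boxplot."""
    data = sorted(values)
    if not data:
        raise ValueError("no values in group")
    if len(data) == 1:
        v = data[0]
        return v, v, v, v, v
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    return data[0], q1, median, q3, data[-1]


if __name__ == "__main__":
    # Made-up expression values and tissue labels for four samples.
    expr = {"S1": 1.0, "S2": 2.0, "S3": 3.0, "S4": 10.0}
    annot = {"S1": "Muscle", "S2": "Muscle", "S3": "Muscle", "S4": "Heart"}
    for tissue, vals in group_by_annotation(expr, annot).items():
        print(tissue, box_summary(vals))
```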
Fig. 3 Representative SNP correlation plots used to detect sample swapping. a Samples cluster by donor, as expected. b Clustering is disrupted after SRR598044 and SRR608096 are purposely swapped.
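The idea behind the SNP correlation check (computed in the package by get-snp-corr.pl) is that samples from the same donor share genotypes, so each sample should correlate best with another sample from its own donor. The sketch below turns that idea into a simple automatic flag. The genotype encoding (for example, alternate-allele fractions at a shared, ordered set of SNP sites), the "best partner" rule, and all names and toy values are assumptions for illustration, not the package's method.

```python
import math
from typing import Dict, List


def pearson(x: List[float], y: List[float]) -> float:
    # Pearson correlation; assumes equal lengths and non-constant vectors.
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def flag_possible_swaps(genotypes: Dict[str, List[float]],
                        donor: Dict[str, str]) -> List[str]:
    """Return samples whose most SNP-correlated partner has a different donor label.

    genotypes maps sample -> genotype values at a shared, ordered set of SNP
    sites (hypothetical encoding, e.g. alternate-allele fractions). At least
    two samples are required.
    """
    flagged = []
    for s, g in genotypes.items():
        others = [t for t in genotypes if t != s]
        best = max(others, key=lambda t: pearson(g, genotypes[t]))
        if donor[best] != donor[s]:
            flagged.append(s)
    return flagged


if __name__ == "__main__":
    # Toy genotypes: A1/A2 come from one donor, B1/B2 from another.
    geno = {
        "A1": [0.0, 0.5, 1.0, 0.0, 1.0, 0.5],
        "A2": [0.05, 0.5, 0.95, 0.0, 1.0, 0.45],
        "B1": [1.0, 0.0, 0.5, 1.0, 0.0, 0.0],
        "B2": [0.95, 0.05, 0.5, 1.0, 0.0, 0.05],
    }
    correct = {"A1": "D1", "A2": "D1", "B1": "D2", "B2": "D2"}
    swapped = {"A1": "D1", "A2": "D2", "B1": "D2", "B2": "D1"}  # A2/B2 labels swapped
    print(flag_possible_swaps(geno, correct))  # []
    print(flag_possible_swaps(geno, swapped))
```

A flagged list is a prompt for manual review, much like reading the clustering in panel b, rather than proof of which samples were exchanged.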
Fig. 5 RNA-seq quality control metrics for sample SRR603068. a Duplication rates of the reads, determined using a sequence-based and a mapping-based strategy. b Distribution of reads by percentage GC content. c Nucleotide composition bias of the reads. d Distribution of read quality scores. e Junction saturation plot. f Characteristics of the splice junction sites.
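Panel b bins reads by their GC content. These per-sample metrics appear to come from RSeQC (compare RSeQC-html.pl in the scripts table), which computes them itself; the sketch below only illustrates the underlying per-read calculation and is not RSeQC's code. It assumes reads are already available as plain nucleotide strings, and the demo reads are made up.

```python
from collections import Counter
from typing import Dict, Iterable


def gc_percent(read: str) -> float:
    """Percentage of G and C bases in one read (case-insensitive).

    Ambiguous bases such as N stay in the denominator.
    """
    if not read:
        raise ValueError("empty read")
    seq = read.upper()
    return 100.0 * (seq.count("G") + seq.count("C")) / len(seq)


def gc_distribution(reads: Iterable[str]) -> Dict[int, int]:
    # Histogram: number of reads at each GC percentage, rounded to an integer
    # (Python's round() sends exact .5 values to the nearest even integer).
    return dict(Counter(round(gc_percent(r)) for r in reads))


if __name__ == "__main__":
    # Made-up reads, not taken from SRR603068.
    reads = ["ATGC", "GGCC", "AATT", "atgc", "GCGA"]
    print(gc_distribution(reads))
```

A library whose histogram is shifted or multi-modal relative to comparable samples is a hint of contamination or GC bias worth inspecting.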
Comparison of QuickRNASeq with QuickNGS
| Feature | QuickNGS | QuickRNASeq |
|---|---|---|
| Scope and application | Next-generation sequencing: WGS, RNA-seq, miRNA-seq, ChIP-seq | RNA-seq only |
| Dependence | Requires external MySQL database and web server support | None |
| Purpose of web interface | Track the progress of data analysis and provide access to result files | Provide access to analysis results and interactive visualization |
| Visualization | Limited | Interactive, rich, and dynamic interface built on Web 2.0 technology |
| RNA-seq functionalities | Limited; focused on reducing hands-on time | "ONE-STOP" integrated report. Specifically designed to support large-scale RNA-seq. High level of automation and efficiency |