Yichuan Liu, Jane F Ferguson, Chenyi Xue, Ian M Silverman, Brian Gregory, Muredach P Reilly, Mingyao Li.
Abstract
Recent advances in RNA sequencing (RNA-Seq) have enabled the discovery of novel transcriptomic variations that could not be detected with traditional microarray-based methods. Tissue- and cell-specific transcriptome changes during pathophysiological stress in disease cases versus controls and in response to therapies are of particular interest to investigators studying cardiometabolic diseases. Thus, knowledge of the relationships between sequencing depth and detection of transcriptomic variation is needed for designing RNA-Seq experiments and for interpreting results of analyses. Using deeply sequenced Illumina HiSeq 2000 101 bp paired-end RNA-Seq data derived from adipose of a healthy individual before and after systemic administration of endotoxin (LPS), we investigated the sequencing depths needed for studies of gene expression and alternative splicing (AS). In order to detect expressed genes and AS events, we found that ∼100 to 150 million (M) filtered reads were needed. However, the requirement on sequencing depth for the detection of LPS-modulated differential expression (DE) and differential alternative splicing (DAS) was much higher. To detect 80% of events, ∼300 M filtered reads were needed for DE analysis whereas at least 400 M filtered reads were necessary for detecting DAS. Although the majority of expressed genes and AS events can be detected with modest sequencing depths (∼100 M filtered reads), the estimated gene expression levels and exon/intron inclusion levels were less accurate. We report the first study that evaluates the relationship between RNA-Seq depth and the ability to detect DE and DAS in human adipose. Our results suggest that a much higher sequencing depth is needed to reliably identify DAS events than for DE genes.
Year: 2013 PMID: 23826166 PMCID: PMC3691247 DOI: 10.1371/journal.pone.0066883
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Mapping statistics.
| Sample | Time | Reads | Reads mapped (%) | Reads after filtering (%) | Autosomal and sex chromosome reads after filtering (%) |
| Original | Pre-LPS | 911,584,508 | 771,290,702 (85%) | 655,529,906 (72%) | 481,769,060 (53%) |
| Original | Post-LPS | 1,039,937,222 | 856,379,122 (82%) | 718,792,994 (69%) | 518,576,050 (50%) |
| Technical replicate | Pre-LPS | 66,603,980 | 57,113,510 (86%) | 49,217,950 (74%) | 36,253,892 (54%) |
| Technical replicate | Post-LPS | 64,824,708 | 53,726,630 (83%) | 45,005,478 (69%) | 32,587,354 (50%) |
Data were aligned to the hg19 reference genome using TopHat v1.3.3.
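The percentages in the mapping statistics table appear to be taken relative to the total read count in each row, not relative to the preceding column. The following minimal Python sketch recomputes them from the counts in the table to illustrate this reading; the function name `percent_of_total` and the dictionary layout are illustrative choices, not part of the original analysis.

```python
# Read counts copied from the mapping statistics table:
# (total reads, mapped, after filtering, autosomal + sex chromosome after filtering)
samples = {
    "Original pre-LPS": (911_584_508, 771_290_702, 655_529_906, 481_769_060),
    "Original post-LPS": (1_039_937_222, 856_379_122, 718_792_994, 518_576_050),
    "Technical replicate pre-LPS": (66_603_980, 57_113_510, 49_217_950, 36_253_892),
    "Technical replicate post-LPS": (64_824_708, 53_726_630, 45_005_478, 32_587_354),
}


def percent_of_total(count, total):
    """Return count as a whole-number percentage of the total read count."""
    return round(100 * count / total)


for name, (total, mapped, filtered, nuclear) in samples.items():
    print(
        name,
        percent_of_total(mapped, total),
        percent_of_total(filtered, total),
        percent_of_total(nuclear, total),
    )
```

Under this reading, roughly half of all sequenced reads remain usable after filtering and restriction to autosomal and sex chromosomes, which is worth keeping in mind when translating the "filtered reads" targets in the abstract into raw sequencing depth.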
Figure 1. Analysis results for differentially expressed (DE) genes in adipose.
(A) Percentage of detected expressed genes and differentially expressed (DE) genes for datasets with various sequencing depths. PreLPS: detection rate for expressed genes in the pre-LPS sample; postLPS: detection rate for expressed genes in the post-LPS sample; DE: detection rate for DE genes. The curves for pre-LPS and post-LPS samples overlap, although the numbers of detected genes were different (Table S1). (B) Spearman correlation between FPKM values in datasets with various sequencing depths and FPKM values in the 500 M-read datasets. PreLPS: correlation of FPKM values in the pre-LPS sample; postLPS: correlation of FPKM values in the post-LPS sample; fold-change: correlation of the fold change of FPKM values. (C) Percentage of detected DE genes according to gene expression levels. preLPS_high: detection rate for gene expression in highly expressed genes in the pre-LPS sample; preLPS_low: detection rate for gene expression in lowly expressed genes in the pre-LPS sample; postLPS_high: detection rate for gene expression in highly expressed genes in the post-LPS sample; postLPS_low: detection rate for gene expression in lowly expressed genes in the post-LPS sample; DE_high: detection rate for DE genes in highly expressed genes; DE_low: detection rate for DE genes in lowly expressed genes. (D) Performance of DE genes detected in datasets with various sequencing depths.
Figure 2. Analysis results for alternative splicing (AS) and differential AS (DAS) in adipose.
(A) Percentage of detected alternative splicing (AS) and differential AS (DAS) events for datasets with various sequencing depths. PreLPS: detection rate for AS events in the pre-LPS sample; postLPS: detection rate for AS events in the post-LPS sample; DAS: detection rate for DAS events. (B) Spearman correlation between exon or intron inclusion levels in datasets with various sequencing depths and inclusion levels in the 500 M-read datasets. PreLPS: correlation of inclusion levels in the pre-LPS sample; postLPS: correlation of inclusion levels in the post-LPS sample; fold-change: correlation of the fold change of isoform ratios. (C) Percentage of detected AS and DAS events according to gene expression levels. preLPS_high: detection rate for AS in highly expressed genes in the pre-LPS sample; preLPS_low: detection rate for AS in lowly expressed genes in the pre-LPS sample; postLPS_high: detection rate for AS in highly expressed genes in the post-LPS sample; postLPS_low: detection rate for AS in lowly expressed genes in the post-LPS sample; DAS_high: detection rate for DAS in highly expressed genes; DAS_low: detection rate for DAS in lowly expressed genes. (D) Performance of DAS events detected in datasets with various sequencing depths.
Figure 3. Spearman correlation boxplot for randomly simulated datasets.
(A) Boxplot of Spearman correlations for FPKM values and fold change of FPKM values among 10 randomly sampled datasets. For each sequencing depth (10 M or 100 M), the correlation was calculated for each of the 45 pair-wise comparisons. (B) Boxplot of Spearman correlations for exon/intron inclusion levels and fold change of inclusion levels among 10 randomly sampled datasets. For each sequencing depth (10 M or 100 M), the correlation was calculated for each of the 45 pair-wise comparisons.
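The 45 comparisons per depth follow from taking every unordered pair among 10 resampled datasets: C(10, 2) = 45. A minimal Python sketch of that pairwise scheme is given below. It runs on toy data only: the gene abundances, the 200-gene universe, the 5,000-read depth and all function names are hypothetical, and raw counts stand in for FPKM values (with equal gene lengths and equal depth the ranks are the same). It is not the authors' pipeline.

```python
import itertools
import random


def average_ranks(values):
    """Rank values from 1..n, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the tie-averaged ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def subsample_counts(gene_weights, n_reads, rng):
    """Draw n_reads reads, each assigned to a gene in proportion to its weight."""
    counts = [0] * len(gene_weights)
    for g in rng.choices(range(len(gene_weights)), weights=gene_weights, k=n_reads):
        counts[g] += 1
    return counts


rng = random.Random(0)
gene_weights = [rng.lognormvariate(0, 2) for _ in range(200)]  # toy abundances
datasets = [subsample_counts(gene_weights, 5000, rng) for _ in range(10)]

pairs = list(itertools.combinations(range(10), 2))
correlations = [spearman(datasets[a], datasets[b]) for a, b in pairs]
print(len(correlations))  # 45 pairwise comparisons
```

The list `correlations` is what a boxplot such as the one described in the caption would summarize for a single sequencing depth; repeating the procedure at a second depth gives the second box.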
Numbers of expressed genes, differentially expressed (DE) genes, alternative splicing (AS) events and differential AS (DAS) events detected in the technical replicate samples and the resampled data with the same sequencing depth* as the technical replicate.
| Sample | Expressed genes (pre-LPS) | Expressed genes (post-LPS) | DE genes | AS events (pre-LPS) | AS events (post-LPS) | DAS events |
| Technical replicate | 15,962 | 15,324 | 732 | 7,506 | 7,347 | 12 |
| Resampled data | 16,064 | 15,375 | 748 | 8,805 | 8,586 | 8 |
| Overlap | 15,400 | 14,756 | 598 | 5,562 | 5,510 | 1 |
*67 M and 65 M reads for the pre-LPS and post-LPS samples, respectively, with 36 M reads and 33 M reads after filtering and removal of reads mapped to mitochondria.
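To make the overlap row easier to interpret, the counts in the table can be turned into proportions. The short Python sketch below does this for each column. The Jaccard index is added here as one convenient summary and is not a statistic reported in the source; the function and variable names are illustrative.

```python
# Counts copied from the table: (technical replicate, resampled data, overlap)
table = {
    "Expressed genes (pre-LPS)": (15_962, 16_064, 15_400),
    "Expressed genes (post-LPS)": (15_324, 15_375, 14_756),
    "DE genes": (732, 748, 598),
    "AS events (pre-LPS)": (7_506, 8_805, 5_562),
    "AS events (post-LPS)": (7_347, 8_586, 5_510),
    "DAS events": (12, 8, 1),
}


def overlap_fractions(n_replicate, n_resampled, n_overlap):
    """Return (share of replicate calls recovered,
    share of resampled calls recovered,
    Jaccard index = overlap / union)."""
    union = n_replicate + n_resampled - n_overlap
    return n_overlap / n_replicate, n_overlap / n_resampled, n_overlap / union


for feature, counts in table.items():
    frac_rep, frac_res, jaccard = overlap_fractions(*counts)
    print(f"{feature}: {frac_rep:.1%} of replicate, "
          f"{frac_res:.1%} of resampled, Jaccard {jaccard:.2f}")
```

With these counts, about 82% of the DE genes called in the technical replicate are also called in the resampled data, whereas only 1 of 12 DAS events is shared, consistent with the abstract's conclusion that DAS detection needs far more depth than DE detection.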