Chen Chu, Zhaoben Fang, Xing Hua, Yaning Yang, Enguo Chen, Allen W Cowley, Mingyu Liang, Pengyuan Liu, Yan Lu.
Abstract
BACKGROUND: The advent of next-generation sequencing (NGS) technologies has permitted profiling of whole-genome transcriptomes (i.e., RNA-Seq) at unprecedented speed and very low cost. RNA-Seq provides a far more precise measurement of transcript levels and their isoforms than other methods such as microarrays. A fundamental goal of RNA-Seq is to better identify expression changes between different biological or disease conditions. However, existing methods for detecting differential expression from RNA-Seq count data have not been comprehensively evaluated in large-scale RNA-Seq datasets. Many of them suffer from inflated type I error and fail to control the false discovery rate, especially in the presence of abnormally high sequence read counts in RNA-Seq experiments.
Year: 2015 PMID: 26070955 PMCID: PMC4465298 DOI: 10.1186/s12864-015-1676-0
Source DB: PubMed Journal: BMC Genomics ISSN: 1471-2164 Impact factor: 3.969
Fig. 1 Overview of deGPS for analyzing sequence count data in RNA-Seq
Fig. 2 Modeling sequence read counts from RNA-Seq with the NB and GP distributions. a Read counts fitted by the NB and GP distributions and b QQ plots
Fig. 3 Type I error and false discovery rate. Data were simulated from large-scale TCGA lung cancer sequencing studies: a miRNA and b mRNA. Two different types of data transformation, “shift” and “scaling & shift”, were applied. Boxplots summarize the type I error and false discovery rate of different statistical methods for DE detection under a wide range of simulations. Methods in red font are those that do not have correct type I error and/or false discovery rate
Fig. 4 True positive rate. a miRNA and b mRNA. True positive rate (TPR) can be interpreted as statistical power
Fig. 5 Sensitivity and specificity. a miRNA and b mRNA. The AUC with false positive rate less than 0.05 was calculated. Boxplots summarize AUC values from a wide range of simulation settings. TPR, true positive rate; FPR, false positive rate
Fig. 6 Benchmark data from compcodeR. Type I error rate, FDR, TPR and AUC are evaluated under 0, 0.5, 1 and 2% of outliers in RNA-Seq data. Sample size is 5 subjects per group
Fig. 7 Analysis of the developmental transcriptome of Drosophila melanogaster. Four developmental stages (early embryo, later embryo, larval and adult) were analyzed (Graveley et al., 2011). The numbers of genes differentially expressed between two adjacent stages are presented at an FDR threshold of 0.05. The “overlap proportion” is calculated by dividing each overlap number by the number of DE genes in its column
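
The “overlap proportion” described for Fig. 7 can be sketched as follows. This is a minimal illustration of one reading of the caption, assuming each method's DE calls are available as a set of gene identifiers; the function name, method names and gene identifiers are hypothetical and are not taken from the paper or from deGPS.

```python
from typing import Dict, Set


def overlap_proportions(de_calls: Dict[str, Set[str]]) -> Dict[str, Dict[str, float]]:
    """Return table[row][col] = |DE(row) & DE(col)| / |DE(col)|.

    Each cell divides the overlap count by the number of DE genes
    called by the column's method, following the Fig. 7 caption.
    """
    table: Dict[str, Dict[str, float]] = {}
    for row, row_genes in de_calls.items():
        table[row] = {}
        for col, col_genes in de_calls.items():
            if col_genes:
                table[row][col] = len(row_genes & col_genes) / len(col_genes)
            else:
                # No DE genes in this column: the proportion is undefined.
                table[row][col] = float("nan")
    return table


# Hypothetical DE calls for two methods (illustrative identifiers only).
calls = {
    "method_A": {"g1", "g2", "g3", "g4"},
    "method_B": {"g2", "g3"},
}
props = overlap_proportions(calls)
```

Because the denominator is the column's DE count, the resulting table is generally not symmetric: a method that calls few genes can be fully contained in another method's calls while the reverse proportion stays well below 1.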