Anto P Rajkumar, Per Qvist, Ross Lazarus, Francesco Lescai, Jia Ju, Mette Nyegaard, Ole Mors, Anders D Børglum, Qibin Li, Jane H Christensen.
Abstract
BACKGROUND: Massively parallel cDNA sequencing (RNA-seq) experiments are gradually superseding microarrays in quantitative gene expression profiling. However, many biologists are uncertain about which differentially expressed gene (DEG) analysis method to choose and whether cost-saving sample pooling strategies are valid for their RNA-seq experiments. We therefore experimentally validated the DEGs identified by Cuffdiff2, edgeR, DESeq2 and the Two-stage Poisson Model (TSPM) in an RNA-seq experiment on mouse amygdala micro-punches, using high-throughput qPCR on independent biological replicate samples. We also sequenced RNA pools and compared the results with those obtained by sequencing the corresponding individual RNA samples.
Year: 2015 PMID: 26208977 PMCID: PMC4515013 DOI: 10.1186/s12864-015-1767-y
Source DB: PubMed Journal: BMC Genomics ISSN: 1471-2164 Impact factor: 3.969
Fig. 1 Agreement between four methods for DEG analysis of RNA-seq data. a Intersections between the DEGs detected by Cuffdiff2, edgeR, DESeq2 and the Two-stage Poisson Model (TSPM) after Benjamini-Hochberg false discovery rate correction at 5 %. b-g Pairwise comparisons of the logarithmic (base 2) fold changes (LFC) in expression estimated by Cuffdiff2, edgeR, DESeq2 and TSPM: b edgeR vs Cuffdiff2; c edgeR vs DESeq2; d edgeR vs TSPM; e Cuffdiff2 vs TSPM; f Cuffdiff2 vs DESeq2; g TSPM vs DESeq2. Spearman correlation coefficients (Rho) are shown in each graph. RNA samples were obtained from amygdala micro-punches of female mice heterozygous for a targeted deletion in the Brd1 gene on a congenic C57BL/6NTac background, and from their wild-type (WT) littermates (8 biological replicates/group).
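The DEG calls in Fig. 1 rely on Benjamini-Hochberg false discovery rate (FDR) correction at 5 %. The following is a minimal pure-Python sketch of that step-up adjustment, assuming a plain list of per-gene p-values. The function names and toy p-values are hypothetical, and each tool (Cuffdiff2, edgeR, DESeq2, TSPM) applies the correction to its own test statistics, so this illustrates the procedure rather than reproducing the authors' pipeline.

```python
def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values (q-values), in the input order."""
    m = len(pvalues)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Step up from the largest p-value: q_(k) = min over j >= k of p_(j) * m / j,
    # capped at 1, which keeps the adjusted values monotone in rank.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def call_degs(pvalues, fdr=0.05):
    """Flag genes whose adjusted p-value is at or below the FDR threshold."""
    return [q <= fdr for q in benjamini_hochberg(pvalues)]


# Toy p-values for four genes (hypothetical, not from the study).
pvals = [0.01, 0.2, 0.03, 0.5]
print(call_degs(pvals))  # [True, False, False, False]
```

In this toy case the third gene has a raw p-value of 0.03, below 0.05, but its adjusted value is 0.03 × 4 / 2 = 0.06, so it is not called a DEG at a 5 % FDR.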
Validation of four differential gene expression analysis methods for RNA-Seq
| Parametersᵃ | edgeR | Cuffdiff2 | TSPM | DESeq2 |
|---|---|---|---|---|
| Total number of identified DEGsᵇ | 82 | 136 | 8 | 1 |
| Number of DEGs selected for qPCR validation | 51 | 79 | 8 | 1 |
| Sensitivity (true positive rate) (%) | 76.67 | 51.67 | 5.00 | 1.67 |
| Specificity (true negative rate) (%) | 90.91 | 12.73 | 90.91 | 100.00 |
| False positive rate (%) | 9.09 | 87.27 | 9.09 | 0.00 |
| False negative rate (%) | 23.33 | 48.33 | 95.00 | 98.33 |
| Positive predictive value (%) | 90.20 | 39.24 | 37.50 | 100.00 |
| Negative predictive value (%) | 78.13 | 19.44 | 46.73 | 48.25 |
| Positive likelihood ratio | 8.43 | 0.59 | 0.55 | ∞ |
| Negative likelihood ratio | 0.26 | 3.80 | 1.05 | 0.98 |
| Overall agreement (%) | 83.48 | 33.04 | 46.09 | 48.70 |
ᵃ Replication of differential expression by quantitative polymerase chain reaction (qPCR) was the reference standard.
ᵇ Differentially expressed genes (DEGs) after Benjamini-Hochberg false discovery rate correction at 5 %. TSPM: Two-stage Poisson Model.
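All rows from sensitivity down to overall agreement follow from a 2 × 2 table of DEG calls against the qPCR reference. The sketch below computes them in pure Python. The function name is hypothetical, and the per-method counts are an assumption: they are not stated in the source but were back-calculated from the published percentages by assuming that 60 qPCR-confirmed and 55 qPCR-negative genes (115 in total) formed the reference set for every method. Under that assumption, TP + FP reproduces the "selected for qPCR validation" row.

```python
import math


def diagnostic_metrics(tp, fp, fn, tn):
    """Accuracy of DEG calls scored against a reference standard (here, qPCR).

    tp: called DEG and confirmed; fp: called DEG but not confirmed;
    fn: not called but confirmed; tn: not called and not confirmed.
    """
    sens = tp / (tp + fn)  # true positive rate
    spec = tn / (tn + fp)  # true negative rate
    return {
        "sensitivity_pct": 100 * sens,
        "specificity_pct": 100 * spec,
        "false_positive_rate_pct": 100 * (1 - spec),
        "false_negative_rate_pct": 100 * (1 - sens),
        "ppv_pct": 100 * tp / (tp + fp) if tp + fp else math.nan,
        "npv_pct": 100 * tn / (tn + fn) if tn + fn else math.nan,
        # LR+ = sens / (1 - spec); it is infinite when no reference-negative
        # gene is called (spec = 1), as for DESeq2 in the table.
        "lr_positive": sens / (1 - spec) if spec < 1 else math.inf,
        # LR- = (1 - sens) / spec.
        "lr_negative": (1 - sens) / spec if spec > 0 else math.inf,
        "overall_agreement_pct": 100 * (tp + tn) / (tp + fp + fn + tn),
    }


# Hypothetical (tp, fp, fn, tn) counts back-calculated from the published
# percentages, assuming 60 qPCR-confirmed and 55 qPCR-negative genes.
# These counts are NOT stated in the source.
counts = {
    "edgeR": (46, 5, 14, 50),
    "Cuffdiff2": (31, 48, 29, 7),
    "TSPM": (3, 5, 57, 50),
    "DESeq2": (1, 0, 59, 55),
}

for method, (tp, fp, fn, tn) in counts.items():
    metrics = diagnostic_metrics(tp, fp, fn, tn)
    print(method, {name: round(value, 2) for name, value in metrics.items()})
```

Rounded to two decimals, these assumed counts agree with the tabulated values to within 0.01. Note that Python's round-half-even rule prints an NPV of 78.12 for edgeR where the table shows 78.13 (the exact value is 78.125).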
Fig. 2 Agreement between sequencing RNA pools and sequencing the corresponding individual RNA samples. a Intersection between the differentially expressed genes (DEGs) detected by edgeR in RNA-seq data from pooled RNA (3 samples/pool; 2 pools/group) and in data from the corresponding individual RNA samples (3 samples/group). The rectangle represents all expressed genes. b Correlation between the logarithmic (base 2) fold changes (LFC) in expression estimated by sequencing RNA pools (3 samples/pool) and by sequencing the corresponding individual samples (3 samples/group). c Intersection between the DEGs detected by edgeR in RNA-seq data from pooled RNA (8 samples/pool; 2 pools/group) and in data from the corresponding individual RNA samples (8 samples/group). The rectangle represents all expressed genes. d Correlation between the LFC in expression estimated by sequencing RNA pools (8 samples/pool) and by sequencing the corresponding individual samples (8 samples/group).
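Panels b and d, like Fig. 1b-g, summarise agreement between two sets of LFC estimates with Spearman's rho, which is the Pearson correlation of the ranks. The following is a pure-Python sketch that gives tied values their average rank. The function names and the five toy LFC values are hypothetical, and `scipy.stats.spearmanr` computes the same coefficient if SciPy is available.

```python
def average_ranks(values):
    """1-based ranks; tied values share the mean of their sorted positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next sorted value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks


def pearson(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


def spearman_rho(x, y):
    """Spearman's rho: Pearson correlation of the average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Toy LFCs for five genes (hypothetical, not from the study).
lfc_individual = [1.2, -0.4, 0.1, 2.3, -1.5]
lfc_pooled = [0.9, 0.4, -0.2, 1.8, -1.1]
print(round(spearman_rho(lfc_individual, lfc_pooled), 3))  # 0.9
```

Because rho depends only on ranks, it can remain high even when pooled LFCs are systematically shrunk or inflated. This is one reason the second table also reports a deviation measure alongside the correlation.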
Validation of two pooling strategies for RNA-Seq
| Parametersᵃ | Pooling 3 samples | Pooling 8 samples |
|---|---|---|
| Total number of identified DEGsᵇ | 4175 | 2513 |
| Sensitivity (true positive rate) (%) | 93.75 | 90.24 |
| Specificity (true negative rate) (%) | 81.27 | 86.59 |
| False positive rate (%) | 18.73 | 13.41 |
| False negative rate (%) | 6.25 | 9.76 |
| Positive predictive value (%) | 0.36 | 2.94 |
| Negative predictive value (%) | 99.99 | 99.95 |
| Agreement between identified DEGsᶜ | 0.006 | 0.049 |
| Correlation between reported LFCsᵈ | 0.380 | 0.517 |
| Root-mean-square deviation of LFCsᵉ | 1.198 | 0.518 |
ᵃ Sequencing the corresponding individual biological samples was the reference standard.
ᵇ Differentially expressed genes (DEGs) after Benjamini-Hochberg false discovery rate correction at 5 %.
ᶜ Inter-rater agreement (Cohen's kappa) between sequencing individual samples (3 or 8/group) and sequencing pooled samples (3 or 8 biological replicates/pool; 2 pools/group) in identifying DEGs.
ᵈ Spearman correlation coefficient between the logarithmic (base 2) fold changes (LFCs) estimated by sequencing individual samples and by sequencing pooled samples.
ᵉ Standard deviation of the differences between the LFCs estimated by sequencing individual samples and by sequencing pooled samples.
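The rows marked with footnotes c and e rest on two summary statistics. The sketch below computes Cohen's kappa from the 2 × 2 table of DEG calls (pooled vs individual). For a pair of LFC vectors, it also computes both the root-mean-square deviation named in the row label and the standard deviation of the differences named in footnote e. These are not the same quantity: with the population standard deviation, RMSD² = (mean difference)² + SD², so the two coincide only when the mean difference is zero. Function names and toy inputs are hypothetical, and the convention the authors used is not stated here.

```python
def cohens_kappa(tp, fp, fn, tn):
    """Cohen's kappa for two binary raters.

    tp: both call DEG; fp: pooled calls DEG, individual does not;
    fn: individual calls DEG, pooled does not; tn: neither calls DEG.
    """
    n = tp + fp + fn + tn
    observed = (tp + tn) / n
    # Chance agreement from each rater's marginal "DEG" / "not DEG" rates.
    p_yes = ((tp + fp) / n) * ((tp + fn) / n)
    p_no = ((fn + tn) / n) * ((fp + tn) / n)
    expected = p_yes + p_no
    return (observed - expected) / (1 - expected)


def lfc_deviation(individual, pooled):
    """Return (RMSD, population SD of differences) between two LFC vectors."""
    diffs = [a - b for a, b in zip(individual, pooled)]
    n = len(diffs)
    rmsd = (sum(d * d for d in diffs) / n) ** 0.5
    mean_d = sum(diffs) / n
    sd = (sum((d - mean_d) ** 2 for d in diffs) / n) ** 0.5
    return rmsd, sd


# Toy inputs (hypothetical, not from the study).
print(cohens_kappa(20, 5, 10, 15))  # about 0.4
print(lfc_deviation([1.0, 2.0, 3.0], [0.0, 1.0, 2.0]))  # constant offset: RMSD 1, SD 0
```

Kappa corrects raw agreement for agreement expected by chance. When almost every gene is "not DEG" in both analyses, raw agreement can be high while kappa stays near zero, which matches the pattern of high negative predictive values alongside kappa values of 0.006 and 0.049 in the table.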