Xiao Xu, Yuanhao Zhang, Jennie Williams, Eric Antoniou, W Richard McCombie, Song Wu, Wei Zhu, Nicholas O Davidson, Paula Denoya, Ellen Li.
Abstract
BACKGROUND: High-throughput parallel sequencing (RNA-Seq) has recently emerged as an appealing alternative to microarrays for identifying differentially expressed genes (DEGs) between biological groups. However, considerable discrepancies remain between the two platforms in both gene expression measurements and DEG results. The objective of this study was to compare parallel paired-end RNA-Seq and microarray data generated on 5-aza-deoxycytidine (5-Aza) treated HT-29 colon cancer cells, supplemented by a simulation study.
Year: 2013 PMID: 23902433 PMCID: PMC3697991 DOI: 10.1186/1471-2105-14-S9-S1
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Figure 1. Expression consistency between RNA-Seq and microarray data. A. Detectable genes reported by each technology based on a common filter procedure (see Methods). Venn diagrams of detectable genes are shown for each of the 3 experimental conditions (0 µM, 5 µM and 10 µM), and overlap rates are calculated by dividing the number of commonly detectable genes by the number of genes in the union. B. By-group scatter plots depicting the expression profiles of all genes. Log2-transformed FPKM values from RNA-Seq and log2-scaled normalized microarray gene intensities are used in the scatter plots. We added 1 to each FPKM value before the log2 transformation so that genes with an FPKM of zero remain defined. Commonly detected genes are shown in red, while platform-exclusive genes are shown in black. Both Pearson correlation coefficients (PCC) and Spearman correlation coefficients (SCC) were calculated from all gene entries (except those lacking a probe name on the array or an entry in the RNA-Seq reference genome). C. By-group expression density histograms for commonly detectable genes and RNA-Seq-specific genes. The x-axis shows the RNA-Seq FPKM value (log2 scale) and the y-axis shows the frequency of genes within each category. Commonly detectable genes are depicted in black, while RNA-Seq-exclusive genes are shown in grey.
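The transformation and the two correlation coefficients described for panel B can be sketched as follows. This is a minimal illustration using only the Python standard library; the function names and the five example values are hypothetical and are not taken from the study data.

```python
import math

def log2_fpkm(fpkm):
    """log2(FPKM + 1); the pseudocount of 1 keeps zero-FPKM genes finite."""
    return math.log2(fpkm + 1.0)

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def average_ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def spearman(x, y):
    """Spearman correlation: Pearson correlation of the rank vectors."""
    return pearson(average_ranks(x), average_ranks(y))

# Hypothetical per-gene values for one treatment group (not study data).
fpkm = [0.0, 1.0, 3.0, 15.0, 63.0]
array_log2 = [3.1, 4.0, 5.2, 6.9, 9.5]

rnaseq_log2 = [log2_fpkm(v) for v in fpkm]
pcc = pearson(array_log2, rnaseq_log2)
scc = spearman(array_log2, rnaseq_log2)
```

Because the SCC depends only on ranks, it is unaffected by the choice of pseudocount, whereas the PCC is sensitive to how low-expression genes are transformed.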
Figure 2. EIV regression model comparing microarray and RNA-Seq gene profiles. An errors-in-variables (EIV) regression model is fitted with the microarray normalized gene intensities (log2 unit-free scale) as the independent variable and the RNA-Seq FPKM values (log2 unit-free scale) as the dependent variable for each of the experimental groups (5 µM, 10 µM and 0 µM) of HT-29 samples. Log2-scaled unit-free normalized gene intensities are shown as grey circles in the scatter plot, and the EIV regression line is drawn in bold black. In each plot, a dashed reference line Y = X (corresponding to perfect platform agreement) is included to show how far the fitted regression line deviates from the reference. The estimated regression equation is shown in the lower-right section of each plot. The 95% bootstrap confidence intervals for the regression intercept and slope (α and β) are shown at the top of each plot.
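To illustrate the kind of fit described in this caption, the sketch below uses Deming regression, one common errors-in-variables estimator, together with a percentile bootstrap for the intercept and slope. The caption does not state which EIV estimator or error-variance ratio the authors used, so the choice of Deming regression, the default `delta = 1`, the function names, and the simulated data are all assumptions made for illustration.

```python
import math
import random

def deming_fit(x, y, delta=1.0):
    """Return (alpha, beta) for y = alpha + beta * x with errors in x and y.

    delta is the ASSUMED ratio of the y-error variance to the x-error
    variance; delta = 1 corresponds to orthogonal regression.
    """
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    if sxy == 0:
        raise ValueError("slope is undefined when the sample covariance is zero")
    diff = syy - delta * sxx
    beta = (diff + math.sqrt(diff ** 2 + 4 * delta * sxy ** 2)) / (2 * sxy)
    return my - beta * mx, beta

def bootstrap_ci(x, y, n_boot=1000, level=0.95, seed=0):
    """Percentile bootstrap intervals for alpha and beta (genes resampled in pairs)."""
    rng = random.Random(seed)
    n = len(x)
    alphas, betas = [], []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        alpha, beta = deming_fit([x[i] for i in idx], [y[i] for i in idx])
        alphas.append(alpha)
        betas.append(beta)
    alphas.sort()
    betas.sort()
    lo = int(n_boot * (1 - level) / 2)
    hi = int(n_boot * (1 + level) / 2) - 1
    return (alphas[lo], alphas[hi]), (betas[lo], betas[hi])

# Simulated log2 profiles (hypothetical): both platforms measure a shared
# latent expression level with noise, and RNA-Seq follows 1.2 * level - 3.
rng = random.Random(1)
latent = [rng.uniform(2, 12) for _ in range(200)]
array_log2 = [t + rng.gauss(0, 0.5) for t in latent]
rnaseq_log2 = [1.2 * t - 3.0 + rng.gauss(0, 0.5) for t in latent]

alpha_hat, beta_hat = deming_fit(array_log2, rnaseq_log2)
alpha_ci, beta_ci = bootstrap_ci(array_log2, rnaseq_log2, n_boot=500)
```

Under this reading, perfect platform agreement (the Y = X reference line) corresponds to α = 0 and β = 1, so intervals that exclude those values indicate a systematic offset or scale difference between the platforms.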
Cross-platform overlap in DEG lists using RNA-Seq and microarray HT-29 data.
| Comparison (5-Aza) | Microarray method | Cuffdiff | SAMSeq | baySeq | DESeq | NOISeq |
|---|---|---|---|---|---|---|
| 5 µM vs 0 µM | T-test | 25.8% | 22.5% | 24.5% | 25.8% | 29.7% |
| SAM | 39.9% | 39.9% | 33.5% | |||
| Ebayes | 39.5% | 39.5% | 33.3% | |||
| 10 µM vs 5 µM | T-test | 0.5% | 0.5% | 0.6% | 0.6% | 0.0% |
| 10 µM vs 5 µM | SAM | 31.1% | 30.2% | 32.6% | 34.2% | 19.4% |
| 10 µM vs 5 µM | eBayes | 30.3% | 28.3% | 31.9% | 33.5% | 19.1% |
The cross-platform overlap rates (%)¹ were calculated between the three microarray DEG algorithms (T-test, SAM, eBayes) and the five RNA-Seq algorithms (Cuffdiff, SAMSeq, baySeq, DESeq, and NOISeq) as described in Methods. Comparisons were made to measure 1) the effect of treating HT-29 cells with 5 µM 5-Aza compared to control cells and 2) the effect of increasing the 5-Aza concentration from 5 µM to 10 µM.
¹The DEG overlap rate was calculated by dividing the number of DEGs identified by both methods by the number of DEGs in the union of the two lists. Overlap rates higher than 40% are denoted in bold style.
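The overlap rate defined in the footnote is the size of the intersection divided by the size of the union (the Jaccard index). A minimal sketch follows; the function name and the gene labels are hypothetical placeholders, not DEGs from the study.

```python
def overlap_rate(degs_a, degs_b):
    """Shared DEGs divided by the union of both DEG lists (Jaccard index)."""
    a, b = set(degs_a), set(degs_b)
    union = a | b
    if not union:
        return 0.0  # convention chosen here for two empty lists
    return len(a & b) / len(union)

# Hypothetical DEG lists from one microarray and one RNA-Seq method.
microarray_degs = ["GENE_A", "GENE_B", "GENE_C"]
rnaseq_degs = ["GENE_B", "GENE_C", "GENE_D", "GENE_E"]

rate = overlap_rate(microarray_degs, rnaseq_degs)  # 2 shared / 5 in union
print(f"{rate:.1%}")
```

Because the denominator is the union, a method that calls many more DEGs than its counterpart lowers the overlap rate even when it recovers every gene on the shorter list.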
Intra- and cross-platform comparison of DEG lists generated from simulated microarray and RNA-Seq data.
| | T-test | eBayes | SAM | baySeq | DESeq | SAMseq | NOISeq |
|---|---|---|---|---|---|---|---|
| T-test | 100.0% | | | | | | |
| eBayes | 5.1% | 100.0% | | | | | |
| SAM | 4.8% | 95.1% | 100.0% | | | | |
| baySeq | | | | 100.0% | | | |
| DESeq | | | | 75.7% | 100.0% | | |
| SAMseq | | | | 54.1% | 51.9% | 100.0% | |
| NOISeq | | | | 46.5% | 50.4% | 46.9% | 100.0% |
The simulations were carried out as described in Methods. The overlap percentages were calculated from the true positive DEGs (95% minimum fold change method: FC ≥ 2 and FDR ≤ 0.05) identified by each of the three microarray algorithms (T-test, SAM, eBayes) and four RNA-Seq algorithms (SAMseq, baySeq, DESeq, NOISeq).
¹The overlap rate was calculated based on true positive DEGs called by each method. The microarray and RNA-Seq cross-platform DEG overlap rates are shown in bold style.
Figure 3. Sensitivity and false discovery rate (FDR) curves for simulated data using each DEG method. Sensitivity (A) and FDR (B) are calculated for 4 RNA-Seq DEG methods (SAMSeq, baySeq, DESeq, and NOISeq) and 3 microarray DEG algorithms (T-test, SAM, eBayes). Method curves are shown in different colors (see figure legends) at each 95% minimum fold change level for pre-determined DEGs. Each fold change on the x-axis (log2 scale) corresponds to the lower 5% fold change of the normally distributed DEGs predefined in the simulation process (see Methods).
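In a simulation the true DEGs are known in advance, so both quantities plotted in Figure 3 can be computed directly from the set of predefined DEGs and the set a method calls. The sketch below uses the standard definitions (sensitivity = TP / number of true DEGs, FDR = FP / number of called DEGs); the function name and gene labels are hypothetical.

```python
def sensitivity_and_fdr(true_degs, called_degs):
    """Sensitivity and false discovery rate of one DEG call set."""
    truth, called = set(true_degs), set(called_degs)
    true_pos = len(truth & called)    # predefined DEGs that were called
    false_pos = len(called - truth)   # called genes that are not predefined DEGs
    sensitivity = true_pos / len(truth) if truth else 0.0
    fdr = false_pos / len(called) if called else 0.0
    return sensitivity, fdr

# Hypothetical simulation: 4 predefined DEGs, a method calls 4 genes.
predefined = {"g1", "g2", "g3", "g4"}
called = {"g1", "g2", "g3", "g9"}

sens, fdr = sensitivity_and_fdr(predefined, called)
```

Repeating this calculation at each simulated fold change level would give one point per method on each curve.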
Confirmation of DEGs selected using both RNA-Seq and microarray data, RNA-Seq only and microarray only by qRT-PCR.
| Selected by¹ | Gene symbol | FC² qPCR | P value³ | FC² microarray | FC² RNA-Seq |
|---|---|---|---|---|---|
| BOTH | TGM2 | 1.6 | 0.762 | 0.4 | 0.3 |
| Array | IRF7 | 0.5 | 0.068 | 2.0 | 1.4 |
| Array | PLCL1 | 1.2 | 0.469 | 0.7 | 0.8 |
| Array | LIPE | 1.5 | 0.565 | 0.6 | 0.5 |
| SEQ | GGT7 | 4.8 | 0.077 | 1.7 | 2.9 |
| SEQ | EFNB1 | 2.0 | 0.605 | 0.8 | 0.4 |
| SEQ | SPARC⁴ | INF | <<0.001 | 0.8 | INF |
The predicted polarity of the fold change for each DEG is listed along with the fold change calculated from qRT-PCR (2^(−ΔΔCt)). The P-values were calculated by applying the T-test to the ΔCt values of the control cells and of the 5 µM 5-Aza treated cells (see Methods).
¹Genes confirmed by RT-PCR as DEGs are marked in bold style (SEQ: Illumina RNA-Seq platform; Array: Affymetrix microarray platform).
²FC (fold change) is calculated as the ratio of group averages, 5 µM 5-aza-deoxycytidine / control. Normalized expression intensity values are used for microarray data, while FPKM values from the Cufflinks program are used for RNA-Seq data.
³P value is calculated using an unpaired t-test on qPCR data. Genes that differ significantly between the two groups are highlighted in bold style.
⁴The SPARC gene (not part of the 13-gene list) was added to the qPCR experiment for validation. On both the qPCR and RNA-Seq platforms, its expression values in the control group were all zero (below the detection cutoff), whereas it was moderately expressed in the 5 µM group. We therefore report the fold change as positive infinity (INF), with a correspondingly minimal P value.
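The 2^(−ΔΔCt) fold change used for the qPCR column can be sketched as follows, assuming the usual definitions: ΔCt is the target gene Ct minus a reference gene Ct in the same sample, and ΔΔCt is the mean ΔCt of the treated group minus the mean ΔCt of the control group. The reference gene is not named in this excerpt, and the function names and Ct values below are hypothetical.

```python
from statistics import mean

def delta_ct(ct_target, ct_reference):
    """Per-sample delta-Ct: target gene Ct minus reference gene Ct."""
    return [t - r for t, r in zip(ct_target, ct_reference)]

def fold_change_ddct(dct_treated, dct_control):
    """Fold change (treated / control) as 2 ** (-delta-delta-Ct)."""
    ddct = mean(dct_treated) - mean(dct_control)
    return 2.0 ** (-ddct)

# Hypothetical Ct values for three replicates per group (not study data).
control_dct = delta_ct([25.0, 26.0, 24.0], [20.0, 21.0, 19.0])  # all 5.0
treated_dct = delta_ct([24.0, 25.0, 23.0], [20.0, 21.0, 19.0])  # all 4.0

fc = fold_change_ddct(treated_dct, control_dct)  # one cycle earlier -> 2-fold up
```

A gene with no detectable amplification in the control group has no finite control ΔCt, which is why such a case is reported as an infinite fold change rather than computed with this formula.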
Pathways commonly detected by SAM, eBayes, DESeq and baySeq
| Up-regulated | |
|---|---|
| Atherosclerosis Signaling | Human Embryonic Stem Cell Pluripotency |
| Interferon Signaling | Bladder Cancer Signaling |
| LPS/IL-1 Mediated Inhibition of RXR Function | Activation of IRF by Cytosolic Pattern Recognition Receptors |
| Antigen Presentation Pathway | Factors Promoting Cardiogenesis in Vertebrates |
| Role of BRCA1 in DNA Damage Response | Role of CHK Proteins in Cell Cycle Checkpoint Control |
| Hepatic Fibrosis/Hepatic Stellate Cell Activation | FXR/RXR Activation |
| Type I Diabetes Mellitus Signaling | Glutathione-mediated Detoxification |
| Estrogen-mediated S-phase Entry | Hereditary Breast Cancer Signaling |
| GADD45 Signaling | Neuroprotective Role of THOP1 in Alzheimer's Disease |
| Caveolar-mediated Endocytosis Signaling | JAK/Stat Signaling |
| Graft-versus-Host Disease Signaling | Protein Ubiquitination Pathway |
| LXR/RXR Activation | PI3K/AKT Signaling |
| Oncostatin M Signaling | CDK5 Signaling |
| Autoimmune Thyroid Disease Signaling | Role of IL-17A in Arthritis |
| Cell Cycle Regulation by BTG Family Proteins | Aryl Hydrocarbon Receptor Signaling |
| ATM Signaling | Role of Osteoblasts, Osteoclasts and Chondrocytes in Rheumatoid Arthritis |
| FXR/RXR Activation | |