Qifeng Xu (1, 2), Xuegong Zhang (1).
Abstract
The assumption that the total abundance of RNAs in a cell is roughly the same across different cells underlies most studies based on gene expression analyses. However, experiments have shown that changes in the expression of some master regulators, such as c-MYC, can cause a global shift in the expression of almost all genes in some cell types, including cancer cells. Such a shift violates this assumption and can lead to wrong or biased conclusions in standard data analysis practices, such as the detection of differentially expressed (DE) genes and the molecular classification of tumors based on gene expression. Most existing gene expression data were generated without considering this possibility and, if such a global shift effect exists in the data, are therefore at risk of having produced unreliable results. To evaluate this risk, we conducted a systematic study of the possible influence of the global gene expression shift effect on differential expression analysis and on molecular classification analysis. We collected data with a known global shift effect, generated data simulating different situations of the effect based on a wide collection of real gene expression data, and conducted comparative studies of representative existing methods. We observed that some DE analysis methods are more tolerant of the global shift, while others are very sensitive to it. Classification accuracy is not sensitive to the shift and can actually benefit from it, but the genes selected for classification can be greatly affected.
Year: 2016 PMID: 27092944 PMCID: PMC4836657 DOI: 10.1371/journal.pone.0153903
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
A simple, hypothetical illustrative example of the effect of a global expression shift.
| Experiment | #1 | #1 | #2 | #2 |
|---|---|---|---|---|
| Group | Control | Test | Control | Test |
| Shift factor | 1.0 | 2.0 | 1.0 | 2.0 |
| Number of cells | 2000 | 1000 | 2000 | 2000 |
| Expression of gene A | 0.5 | | | |
| Expression of gene B | 0.5 | | | |
| Test-control ratio of gene A | 0.5 | | | |
| Test-control ratio of gene B | 0.5 | | | |
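The arithmetic behind a hypothetical example of this kind can be sketched in a few lines. The sketch assumes a simple model that the table does not spell out: the signal of a gene in a sample is proportional to the number of cells loaded times that gene's per-cell transcript count, and "shift factor" is the ratio of total RNA per test cell to total RNA per control cell. The profiles `PER_CELL_CONTROL` and `PER_CELL_TEST` and the helper names are illustrative only.

```python
# Hypothetical model: signal of a gene in a sample is proportional to
# (number of cells loaded) x (per-cell transcript count of that gene).

# Genes A and B are unchanged per cell; the rest of the transcriptome grows,
# so a test cell carries twice the total RNA of a control cell.
PER_CELL_CONTROL = {"A": 10.0, "B": 10.0, "other": 980.0}  # 1000 units per cell
PER_CELL_TEST = {"A": 10.0, "B": 10.0, "other": 1980.0}    # 2000 units per cell


def total_per_cell(profile):
    """Total RNA per cell, in arbitrary units."""
    return sum(profile.values())


def sample_signal(profile, n_cells):
    """Signal of every gene in a sample pooling n_cells identical cells."""
    return {gene: n_cells * count for gene, count in profile.items()}


def measured_ratio(gene, n_control, n_test):
    """Measured test/control signal ratio for one gene."""
    control = sample_signal(PER_CELL_CONTROL, n_control)
    test = sample_signal(PER_CELL_TEST, n_test)
    return test[gene] / control[gene]


shift_factor = total_per_cell(PER_CELL_TEST) / total_per_cell(PER_CELL_CONTROL)

# Equal total RNA loaded: test cells are twice as RNA-rich, so half as many cells.
ratio_equal_rna = measured_ratio("A", n_control=2000, n_test=1000)
# Equal number of cells loaded.
ratio_equal_cells = measured_ratio("A", n_control=2000, n_test=2000)

print(shift_factor, ratio_equal_rna, ratio_equal_cells)
```

Under these assumptions gene A is unchanged per cell, yet with equal RNA input its measured ratio is 0.5, so a naive comparison would call it down-regulated; with equal cell numbers the ratio stays at 1.0.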
Gene expression datasets used in the experiments.
| Data ID | Cancer type | Reference | Number of samples | Number of cancer (normal) samples | Number of probesets |
|---|---|---|---|---|---|
| 1 | bladder | Dyrskjøt,L. et al. Cancer Res 2004 | 43 | 29(14) | 22215 |
| 2 | brain | Sun,L. et al. Cancer Cell 2006 | 61 | 38(23) | 54613 |
| 3 | cervical | Scotto,L. et al. Genes Chromosomes Cancer 2008 | 52 | 28(24) | 22215 |
| 4 | cervical | Zhai,Y. et al. Cancer Res 2007 | 31 | 21(10) | 22215 |
| 5 | colorectal | Sabates-Bellver,J. et al. Mol Cancer Res 2007 | 64 | 32(32) | 54613 |
| 6 | colorectal | Hong,Y. et al. Clin Exp Metastasis 2010 | 82 | 70(12) | 54613 |
| 7 | esophageal | Hu,N. et al. BMC Genomics 2010 | 34 | 17(17) | 22215 |
| 8 | esophageal | Su,H. et al. Clin Cancer Res 2011 | 106 | 53(53) | 22215 |
| 9 | esophageal | Su,H. et al. Clin Cancer Res 2011 | 102 | 51(51) | 22477 |
| 10 | head and neck | Kuriakose,MA. et al. Cell Mol Life Sci 2004 | 44 | 22(22) | 12558 |
| 11 | head and neck | Pyeon,D. et al. Cancer Res 2007 | 56 | 42(14) | 54613 |
| 12 | leukemia | Stirewalt,DL. et al. Genes Chromosomes Cancer 2008 | 64 | 26(38) | 22215 |
| 13 | lung | Landi,MT. et al. PLoS One 2008 | 107 | 58(49) | 22215 |
| 14 | lung | Spira,A. et al. Nat Med 2007 | 187 | 97(90) | 22215 |
| 15 | lung | Stearman,RS. et al. Am J Pathol 2005 | 39 | 20(19) | 12558 |
| 16 | lung | Su,LJ. et al. BMC Genomics 2007 | 54 | 27(27) | 22215 |
| 17 | pancreatic | Badea,L. et al. Hepatogastroenterology 2008 | 78 | 39(39) | 54613 |
| 18 | pancreatic | Pei,H. et al. Cancer Cell 2009 | 52 | 36(16) | 54613 |
| 19 | prostate | Yu,YP. et al. J Clin Oncol 2004 | 75 | 58(17) | 12579 |
| 20 | prostate | Wallace,TA. et al. Cancer Res 2008 | 87 | 69(18) | 22215 |
Fig 1. Flowchart of the classification and gene-selection experiments on data with a simulated global shift.
Fig 2. Overlap proportions of differentially expressed genes detected by fold-change in Loven et al.'s data with and without correction for the global shift effect.
(A) Up-regulated DE genes. (B) Down-regulated DE genes. The x-axis is the number of top-ranked genes taken from the up-regulated or down-regulated DE gene list; the y-axis is the overlap proportion of these top genes.
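The overlap proportion plotted against the number of top genes can be computed as below. The captions do not give the formula, so this sketch assumes the common definition: for a cut-off k, the fraction of the top-k genes of one ranked list that also appear in the top-k of the other. The gene names and rankings are hypothetical.

```python
# Overlap proportion between the top-k genes of two ranked gene lists.
# Assumed definition: |top_k(list_a) intersect top_k(list_b)| / k.


def top_k_overlap(list_a, list_b, k):
    """Fraction of the first k genes of list_a also among the first k of list_b."""
    if k <= 0:
        raise ValueError("k must be positive")
    top_a = set(list_a[:k])
    top_b = set(list_b[:k])
    return len(top_a & top_b) / k


def overlap_curve(list_a, list_b, ks):
    """Overlap proportion for each cut-off k (one point per x-axis value)."""
    return [top_k_overlap(list_a, list_b, k) for k in ks]


# Hypothetical DE rankings from shift-corrected and uncorrected data.
corrected = ["g1", "g2", "g3", "g4", "g5", "g6"]
uncorrected = ["g2", "g7", "g1", "g8", "g3", "g4"]

curve = overlap_curve(corrected, uncorrected, [1, 2, 4, 6])
print(curve)
```

A curve that stays near 1 means the two analyses agree on their top genes; low values at small k mean the shift reorders the most prominent hits.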
Fig 3. Overlap proportions of differentially expressed genes detected by SAM in Loven et al.'s data with and without correction for the global shift effect.
(A) Up-regulated DE genes. (B) Down-regulated DE genes. The settings are the same as in Fig 2.
Fig 4. Overlap proportions of differentially expressed genes detected by fold-change, SAM and t-test on data with simulated global shift effects, averaged over the 20 datasets.
(A) DE genes ranked by overall differential expression, regardless of direction; (B) up-regulated DE genes; (C) down-regulated DE genes. The settings are the same as in Fig 2.
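One way to see why the three panels can behave differently: multiplying every test value by a shift factor s adds log2(s) to every gene's log2 fold change. The order of genes by signed log fold change is therefore unchanged, but the ranking by absolute fold change (both directions together) and the split into up- and down-regulated genes can change. The sketch below uses five hypothetical genes and simulates the shift by scaling test values, as in the illustrative t-test tables; it does not reproduce the paper's figures.

```python
import math

# Hypothetical mean expression of five genes in control and test groups.
control_mean = {"g1": 100.0, "g2": 100.0, "g3": 100.0, "g4": 100.0, "g5": 100.0}
test_mean = {"g1": 400.0, "g2": 150.0, "g3": 100.0, "g4": 60.0, "g5": 20.0}


def log2_fold_changes(control, test, shift_factor=1.0):
    """log2(test / control) after multiplying every test value by shift_factor."""
    return {g: math.log2(shift_factor * test[g] / control[g]) for g in control}


def rank(scores, key):
    """Gene names sorted from highest to lowest key(score)."""
    return sorted(scores, key=lambda g: key(scores[g]), reverse=True)


results = {}
for s in (1.0, 2.0):
    lfc = log2_fold_changes(control_mean, test_mean, s)
    signed = rank(lfc, key=lambda x: x)  # most up-regulated first
    overall = rank(lfc, key=abs)         # both directions pooled
    up = [g for g in signed if lfc[g] > 0]
    results[s] = (signed, overall, up)
    print(s, signed, overall, up)
```

With s = 2, the signed order is identical, while genes g3 and g4 move from "not up" to "up" and the pooled ranking is reshuffled.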
Illustrative examples of experiments without a global shift and with shifts in two directions.
| Experiment | Global shift | FC value | Identification / Truth |
|---|---|---|---|
| #1 | Shift factor 2.0 | 0.5~1 | down / up |
| #2 | No global shift | - | - |
| #3 | Shift factor 0.5 | 1~2 | up / down |
FC, fold-change ratio; down, down-regulated gene; up, up-regulated gene.
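The FC intervals in this table follow from a one-line relation, assuming that the shift factor $s$ multiplies the total RNA of every test cell and that measurements are normalized to equal total RNA (a model the table itself does not state). The observed fold change $\widehat{\mathrm{FC}}_g$ of gene $g$ is then its true per-cell fold change $\mathrm{FC}_g$ divided by $s$:

$$
\widehat{\mathrm{FC}}_g = \frac{\mathrm{FC}_g}{s}
\quad\Longrightarrow\quad
\begin{cases}
s = 2: & 1 < \mathrm{FC}_g < 2 \;\Rightarrow\; 0.5 < \widehat{\mathrm{FC}}_g < 1 \quad (\text{truly up, identified as down}),\\[4pt]
s = 0.5: & 0.5 < \mathrm{FC}_g < 1 \;\Rightarrow\; 1 < \widehat{\mathrm{FC}}_g < 2 \quad (\text{truly down, identified as up}).
\end{cases}
$$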
Illustrative example of an experiment with multiple samples and no shift effect.
| Gene | Expression in control samples | Expression in test samples | Difference of means (control minus test) | Standard error of the difference | t | p | Inference (p < 0.05) | p-value rank (smallest to largest) |
|---|---|---|---|---|---|---|---|---|
| #1 | 150, 200, 250 | 1, 50, 100 | 149.67 | 40.62 | 3.68 | 0.021 | Significant, Down | 3 |
| #2 | 101.1, 101.2, 101.3 | 100.1, 100.2, 100.3 | 1 | 0.082 | 12.25 | 2.6e-04 | Significant, Down | 1 |
| #3 | 150, 200, 250 | 50, 100, 150 | 100 | 40.82 | 2.45 | 0.07 | Non-significant | 4 |
| #4 | 180, 200, 220 | 95.1, 100.2, 105.3 | 99.8 | 11.92 | 8.37 | 0.0096 | Significant, Down | 2 |
Illustrative example of an experiment with multiple samples and a shift factor of 2 in the test samples.
| Gene | Expression in control samples | Expression in test samples | Difference of means (control minus test) | Standard error of the difference | t | p | Inference (p < 0.05) | p-value rank (smallest to largest) |
|---|---|---|---|---|---|---|---|---|
| #1 | 150, 200, 250 | 2, 100, 200 | 99.33 | 64.03 | 1.55 | 0.220 | Non-significant | 2 |
| #2 | 101.1, 101.2, 101.3 | 200.2, 200.4, 200.6 | -99.2 | 0.13 | -768.4 | 6.81e-09 | Significant, Up | 1 |
| #3 | 150, 200, 250 | 100, 200, 300 | 0 | 64.55 | 0 | 1 | Non-significant | 4 |
| #4 | 180, 200, 220 | 190.2, 200.4, 210.6 | -0.4 | 12.96 | -0.03 | 0.97 | Non-significant | 3 |
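The statistics in the two illustrative tables can be recomputed with a short, dependency-free sketch. The fifth column works out to the standard error of the difference of means, sqrt(s1²/n1 + s2²/n2), and the listed p-values appear consistent with Welch's unequal-variance t-test (gene #4 in the no-shift table gives about 2.26 degrees of freedom). That reading is an inference from the numbers, not something the tables state. This sketch computes only t and the Welch degrees of freedom; p-values could be obtained with `scipy.stats.ttest_ind(control, test, equal_var=False)`, which is not called here. The helper names are hypothetical.

```python
from statistics import mean, variance

# Expression values copied from the no-shift table above.
CONTROL = {
    "#1": [150, 200, 250],
    "#2": [101.1, 101.2, 101.3],
    "#3": [150, 200, 250],
    "#4": [180, 200, 220],
}
TEST = {
    "#1": [1, 50, 100],
    "#2": [100.1, 100.2, 100.3],
    "#3": [50, 100, 150],
    "#4": [95.1, 100.2, 105.3],
}


def welch_t(control, test):
    """Return (difference of means, standard error, t, Welch df), control minus test."""
    n1, n2 = len(control), len(test)
    v1 = variance(control) / n1  # squared standard error of the control mean
    v2 = variance(test) / n2     # squared standard error of the test mean
    se = (v1 + v2) ** 0.5
    diff = mean(control) - mean(test)
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return diff, se, diff / se, df


def table(shift_factor):
    """Per-gene statistics after multiplying every test value by shift_factor."""
    return {
        gene: welch_t(CONTROL[gene], [shift_factor * x for x in TEST[gene]])
        for gene in CONTROL
    }


stats = {s: table(s) for s in (1, 2)}
for s, rows in stats.items():
    for gene, (diff, se, t, df) in rows.items():
        print(f"shift={s} gene={gene} diff={diff:.2f} se={se:.2f} t={t:.2f} df={df:.2f}")
```

Running it with shift factors 1 and 2 reproduces the difference, standard-error and t columns of both tables, which makes the reversal of gene #2 and the loss of significance for genes #1 and #4 easy to trace.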
Classification errors of the gene rank lists on Dataset 1.
| Number of genes | 21649 | 10824 | 5412 | 2706 | 1353 | 676 | 338 | 169 | 84 | 42 | 21 | 10 | 5 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| On original data | 0.047 | 0.07 | 0.047 | 0.047 | 0.047 | 0.07 | 0.07 | 0.07 | 0.07 | 0.07 | 0.07 | 0.047 | 0.047 |
| On data with shift factor 2 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
Classification errors of the gene rank lists on Dataset 2.
| Number of genes | 54429 | 27214 | 13607 | 6804 | 3402 | 1701 | 850 | 425 | 212 | 106 | 53 | 26 | 13 | 6 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| On original data | 0.115 | 0.098 | 0.098 | 0.098 | 0.098 | 0.098 | 0.098 | 0.115 | 0.082 | 0.098 | 0.082 | 0.082 | 0.082 | 0.082 |
| On data with shift factor 2 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
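The gene counts in the headers of the two tables above shrink by about half at each step, consistent with a recursive-elimination scheme that halves the gene list each round. The sketch below regenerates both header rows under two assumptions inferred from the numbers (not stated in the source): each step keeps round(n / 2) genes with round-half-to-even (Python's built-in `round`), and elimination stops before the count would fall below a hypothetical `min_genes = 5`.

```python
def elimination_schedule(n_genes, min_genes=5):
    """Gene counts kept at each step when the list is halved each round.

    Assumptions (hypothetical, inferred from the table headers):
    - each step keeps round(n / 2) genes, using round-half-to-even;
    - elimination stops before the count would drop below min_genes.
    """
    counts = [n_genes]
    while True:
        nxt = round(counts[-1] / 2)  # e.g. 6803.5 -> 6804, 850.5 -> 850
        if nxt < min_genes:
            return counts
        counts.append(nxt)


print(elimination_schedule(21649))  # Dataset 1 header row
print(elimination_schedule(54429))  # Dataset 2 header row
```

Plain floor division would give 6803 instead of 6804 for Dataset 2, which is why round-half-to-even is assumed here.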
Fig 5. Overlap proportions of the gene lists selected by R-SVM.
(A) On Dataset 1; (B) on Dataset 2. The settings are the same as in Fig 2.