Yuhang Liu, Jinfeng Zhang, Xing Qiu.
Abstract
BACKGROUND: Normalization is an important data preparation step in gene expression analyses, designed to remove various sources of systematic noise. Sample variance is greatly reduced after normalization, hence the power of subsequent statistical analyses is likely to increase. On the other hand, variance reduction is made possible by borrowing information across all genes, including differentially expressed genes (DEGs) and outliers, which will inevitably introduce some bias. This bias typically inflates type I error and can reduce statistical power in certain situations. In this study we propose a new differential expression analysis pipeline, dubbed super-delta, that consists of a multivariate extension of global normalization and a modified t-test. A robust procedure is designed to minimize the bias introduced by DEGs in the normalization step. The modified t-test, derived from asymptotic theory for hypothesis testing, pairs suitably with the proposed robust normalization.
Keywords: Differential expression analysis; Gene expression; Modified t-test; Robust normalization; Super-delta
Year: 2017 PMID: 29268715 PMCID: PMC5740711 DOI: 10.1186/s12859-017-1992-2
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
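The abstract's point that borrowing information across all genes introduces bias can be illustrated with a small simulation. The sketch below is not the super-delta procedure; it only applies plain global normalization (subtracting each sample's mean log-expression) to hypothetical data. The gene count, the 10% DEG fraction, the effect size of 2, and the function name `global_normalize` are all assumptions made for illustration.

```python
import numpy as np

def global_normalize(log_expr):
    """Plain global normalization: subtract each sample's mean log-expression.

    Rows are genes, columns are samples.
    """
    return log_expr - log_expr.mean(axis=0, keepdims=True)

rng = np.random.default_rng(0)
n_genes, n_per_group = 1000, 50   # assumed sizes, for illustration only
n_deg, effect = 100, 2.0          # assumed: 10% of genes up-regulated in group 2

# Hypothetical log-expression: gene-level noise plus one shift per sample
# (the sample shift plays the role of the systematic noise).
truth = rng.normal(0.0, 1.0, size=(n_genes, 2 * n_per_group))
truth[:n_deg, n_per_group:] += effect
sample_shift = rng.normal(0.0, 1.0, size=(1, 2 * n_per_group))
observed = truth + sample_shift

normalized = global_normalize(observed)

# The sample-specific shift is removed: every column now has mean zero.
column_means = normalized.mean(axis=0)

# But the DEGs raise the column means of group 2, so non-DE genes are pushed
# down in group 2 relative to group 1 after normalization.
null_rows = normalized[n_deg:]
null_gene_diff = null_rows[:, n_per_group:].mean() - null_rows[:, :n_per_group].mean()
print("spurious group difference for non-DE genes:", null_gene_diff)
```

Under these assumptions the spurious difference for non-DE genes is about -(DEG fraction × effect size) = -0.2. This is the kind of normalization-induced bias that can turn null genes into false positives and that a robust normalization step aims to reduce.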
A summary table of three simulation scenarios
| | Classical | | | | | Super-delta | | |
|---|---|---|---|---|---|---|---|---|
| | Oracle | Global | medIQR | Quantile | Cyclic-loess | Mean | Median | MFTM |
| SIM1 | ||||||||
| Power | 88.82(1.02) | 88.17(1.08) | 87.84(0.99) | 87.71(1.00) | 87.61(1.01) | 88.48(1.04) | 88.89(1.13) | 88.85(1.00) |
| Type I error | 0.46(0.07) | 0.49(0.08) | 0.50(0.07) | 0.51(0.08) | 0.52(0.07) | 0.44(0.08) | 0.41(0.08) | 0.40(0.07) |
| SIM2 | ||||||||
| Power | 92.11(0.81) | 90.94(0.89) | 90.37(0.95) | 90.17(0.93) | 89.94(0.95) | 91.26(1.01) | 91.79(0.94) | 92.09(0.77) |
| Type I error | 0.93(0.14) | 1.08(0.17) | 1.25(0.22) | 1.26(0.19) | 1.36(0.23) | 1.03(0.22) | 0.85(0.15) | 0.83(0.15) |
| SIM3 | ||||||||
| Power | 89.18(1.49) | 76.67(1.51) | 77.28(1.84) | 76.20(1.63) | 76.51(1.59) | 77.16(1.89) | 86.05(1.70) | 88.55(1.62) |
| Type I error | 0.61(0.11) | 1.53(0.19) | 1.42(0.20) | 1.55(0.20) | 1.52(0.20) | 1.46(0.26) | 0.61(0.14) | 0.54(0.12) |
Sample size is 50 for both groups. All p-values are Benjamini-Hochberg adjusted. Power: approximate statistical power; Type I error: approximate type I error rate. All measurements are averages over 50 replicates. Numbers in parentheses are standard deviations. All numbers are percentages.
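For readers who want to reproduce this kind of summary, the following is a minimal sketch of the Benjamini-Hochberg adjustment and of empirical power and type I error rates in a single simulated replicate. The source does not give the significance threshold or the exact definitions behind "approximate" power and type I error, so the `alpha` parameter, the definitions used (the fraction of true DEGs called significant and the fraction of non-DE genes called significant), and the function names `bh_adjust` and `power_and_type1` are assumptions.

```python
import numpy as np

def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    p = np.asarray(pvalues, dtype=float)
    m = p.size
    order = np.argsort(p)
    # Scale the i-th smallest p-value by m / i.
    scaled = p[order] * m / np.arange(1, m + 1)
    # Enforce monotonicity, working down from the largest p-value.
    adjusted_sorted = np.minimum.accumulate(scaled[::-1])[::-1]
    adjusted = np.empty(m)
    adjusted[order] = np.minimum(adjusted_sorted, 1.0)
    return adjusted

def power_and_type1(adjusted, is_deg, alpha=0.05):
    """Empirical power and type I error rate in one replicate.

    Assumed definitions: power is the fraction of true DEGs called
    significant; type I error is the fraction of non-DE genes called
    significant. alpha is an assumed threshold.
    """
    called = np.asarray(adjusted) < alpha
    is_deg = np.asarray(is_deg, dtype=bool)
    return called[is_deg].mean(), called[~is_deg].mean()

# Tiny hand-checkable example with made-up p-values.
example_adjusted = bh_adjust([0.01, 0.04, 0.03, 0.005])
example_power, example_type1 = power_and_type1(
    example_adjusted, [True, True, False, False], alpha=0.03
)
print(example_adjusted, example_power, example_type1)
```

Averaging the two rates over replicates and reporting their standard deviations would give entries in the same format as the table, expressed as percentages.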
Fig. 1 Venn diagram of significant genes (numbers in the figure are numbers of genes)
A summary table of high-frequency pairing genes
| Times being paired | 29 | 27 | 26 | 25 | 24 | 23 | 22 | 21 |
|---|---|---|---|---|---|---|---|---|
| Number of genes | 2 | 1 | 1 | 1 | 2 | 1 | 4 | 4 |
| Times being paired | 20 | 19 | 18 | 17 | 16 | 15 | 14 | 13 |
| Number of genes | 7 | 4 | 9 | 9 | 19 | 20 | 21 | 18 |
| Times being paired | 12 | 11 | 10 | 9 | 8 | 7 | 6 | 5 |
| Number of genes | 25 | 33 | 49 | 60 | 67 | 91 | 127 | 168 |
Fig. 2 Gene RPL3 is selected as a DEG by quantile normalization but not by super-delta. Upper left: parallel boxplot of raw gene expressions; Upper right: parallel boxplot of quantile-normalized gene expressions; Lower left: parallel boxplot of super-delta differences; Lower right: histogram of super-delta t-statistics obtained by normalizing with all genes. Diamonds on boxplots represent sample means. The dashed vertical line in the histogram represents MFTM
Fig. 3 Gene BIRC5 is selected as a DEG by super-delta but not by quantile normalization. Annotations are the same as in Fig. 2