Abstract
BACKGROUND: The detection of small yet statistically significant differences in gene expression in spotted DNA microarray studies is an ongoing challenge. Meeting this challenge requires careful examination of the performance of a range of statistical models, as well as empirical examination of the effect of replication on the power to resolve these differences.
Year: 2004 PMID: 15128431 PMCID: PMC420235 DOI: 10.1186/1471-2105-5-54
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Analysis of general, unconstrained variance vs. nested, constrained variance models
| Data | Model* | Genes | s / gene** | | Parameters | BIC*** |
|---|---|---|---|---|---|---|
| Townsend | AU | 4506 | 5.2 | 33312.6 | 7 | -11147.1 |
| Townsend | AC | 5759 | 4.1 | 21296.8 | 4 | -4108.8 |
| Townsend | AV | 5759 | 4.8 | 21365.8 | 4 | -4039.8 |
| Townsend | MU | 4506 | 7.0 | 33969.2 | 7 | -10490.6 |
| Townsend | MC | 5759 | 5.7 | 21469.2 | 4 | -3936.4 |
| Townsend | MV | 5759 | 6.2 | 21459.1 | 4 | -3946.5 |
| Sudarsanam | AU | 4756 | 3.5 | 19874.9 | 5 | -1053.6 |
| Sudarsanam | AC | 5888 | 3.2 | 16199.2 | 3 | 502.9 |
| Sudarsanam | AV | 5888 | 3.6 | 16200.4 | 3 | 504.1 |
| Sudarsanam | MU | 4756 | 4.4 | 20229.8 | 5 | -698.7 |
| Sudarsanam | MC | 5888 | 4.1 | 16494.4 | 3 | 798.0 |
| Sudarsanam | MV | 5888 | 4.7 | 16370.7 | 3 | 674.3 |
* (A)dditive or (M)ultiplicative error, with (U)nconstrained variances, a common (C)oefficient of variation, or a common (V)ariance.
** Seconds of processor time per gene on a dual 1 GHz PowerPC G4.
*** Bayesian Information Criterion.
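The table reports BIC values without stating the sample size or likelihood term behind them, so they cannot be recomputed from the columns shown. For orientation, the textbook definition is BIC = k ln n - 2 ln L, where k is the number of free parameters, n the number of observations, and L the maximized likelihood; lower values indicate the preferred model. The sketch below implements only that generic formula. The model labels reuse the table's codes, but the log-likelihoods, parameter counts, and sample size in it are hypothetical placeholders, not values from this study.

```python
import math

def bic(log_likelihood: float, n_params: int, n_obs: float) -> float:
    """Textbook Bayesian Information Criterion: k*ln(n) - 2*ln(L_hat)."""
    if n_obs <= 0:
        raise ValueError("n_obs must be positive")
    return n_params * math.log(n_obs) - 2.0 * log_likelihood

def preferred_model(candidates: dict[str, tuple[float, int]], n_obs: float) -> str:
    """Return the label with the lowest BIC.

    candidates maps a model label to (maximized log-likelihood, parameter count).
    All candidates must have been fitted to the same n_obs observations.
    """
    return min(candidates, key=lambda name: bic(*candidates[name], n_obs))

# Hypothetical fits to the same 1000 observations (NOT values from the table).
fits = {"AC": (-2050.0, 4), "AV": (-2047.5, 4), "AU": (-2040.0, 7)}
for name, (ll, k) in fits.items():
    print(name, round(bic(ll, k, 1000), 1))
print("preferred:", preferred_model(fits, 1000))
```

With these placeholder numbers AU has the highest likelihood, but its larger parameter penalty leaves AV with the lowest BIC. Such comparisons are only meaningful between models fitted to the same observations; in the table, the unconstrained (U) models were fitted to fewer genes than the constrained (C, V) models within each dataset (4506 vs 5759 for Townsend, 4756 vs 5888 for Sudarsanam).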
Figure 1. Detection of gene expression differences from ratio data that follow a truncated ratio-of-normals distribution. Frequencies of affirmative significance calls with six analytical models are plotted against the factor of gene expression difference. Symbols represent the analysis model used: AC(+), AV(◇), AU(O), MC(×), MV(□), and MU(△). Panels A) and B) correspond to data simulated with equal variance in the two nodes of the experimental design; panels C) and D) correspond to data simulated with standard deviations proportional to expression level in each node.
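Figure 1 contrasts two noise regimes: a constant standard deviation in both nodes versus a standard deviation that scales with expression level (a constant coefficient of variation), loosely mirroring the V and C constraints in the table footnote. A minimal sketch, assuming hypothetical values (a reference expression level of 1000, a standard deviation of 200, a coefficient of variation of 0.2) that are not taken from the study, draws one ratio as the quotient of two normal draws. Truncation is implemented here by rejecting non-positive draws; the exact truncation rule used for the figure is not stated in this caption.

```python
import random
import statistics

def positive_normal(mean: float, sd: float, rng: random.Random) -> float:
    """Draw from N(mean, sd**2), rejecting non-positive values (truncation at zero)."""
    while True:
        value = rng.gauss(mean, sd)
        if value > 0:
            return value

def simulated_ratio(factor: float, base: float, rng: random.Random,
                    noise: str, sd: float = 200.0, cv: float = 0.2) -> float:
    """Ratio of node-2 expression (true level base*factor) to node-1 expression (base).

    noise="equal":        both nodes share the same standard deviation `sd`.
    noise="proportional": each node's SD is `cv` times its true level.
    The default values are hypothetical, chosen only for illustration.
    """
    level1, level2 = base, base * factor
    if noise == "equal":
        sd1 = sd2 = sd
    elif noise == "proportional":
        sd1, sd2 = cv * level1, cv * level2
    else:
        raise ValueError("noise must be 'equal' or 'proportional'")
    return positive_normal(level2, sd2, rng) / positive_normal(level1, sd1, rng)

rng = random.Random(42)
for noise in ("equal", "proportional"):
    ratios = [simulated_ratio(1.5, 1000.0, rng, noise) for _ in range(5000)]
    quartiles = statistics.quantiles(ratios, n=4)  # spread of the simulated ratios
    print(noise, round(statistics.median(ratios), 2), [round(q, 2) for q in quartiles])
```

Under the proportional regime the spread of the ratio does not depend on the base level, whereas under the equal-variance regime weakly expressed genes yield much noisier ratios, which is why the two regimes can favor different analysis models.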
Figure 2. Detection of gene expression differences from ratio data that are lognormally distributed. Frequencies of affirmative significance calls with six analytical models are plotted against the factor of gene expression difference. Symbols represent the analysis model used: AC(+), AV(◇), AU(O), MC(×), MV(□), and MU(△). Panels A) and B) correspond to ratio data simulated with equal variance in the two nodes of the experimental design; panels C) and D) correspond to data simulated with standard deviations proportional to expression level in each node.
Figure 3. Detection of gene expression differences from ratio data that are gamma or truncated-Cauchy distributed. Frequencies of affirmative significance calls with six analytical models are plotted against the factor of gene expression difference. Symbols represent the analysis model used: AC(+), AV(◇), AU(O), MC(×), MV(□), and MU(△). Panels A) and B) correspond to data simulated from a gamma distribution of ratios; panels C) and D) correspond to data simulated from a truncated Cauchy distribution of ratios.
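Figures 2 and 3 repeat the test with ratios drawn directly from lognormal, gamma, and truncated Cauchy distributions. The captions do not give the parameterizations, so the sketch below assumes one hypothetical choice per family: a lognormal whose median equals the true factor, a gamma whose mean equals the true factor (with a fixed shape of 10), and a Cauchy centred on the true factor and truncated to positive values by rejection. The spread parameters (sigma, shape, scale) are illustrative, not the study's.

```python
import math
import random
import statistics

def lognormal_ratio(factor: float, rng: random.Random, sigma: float = 0.3) -> float:
    # ln(ratio) ~ N(ln factor, sigma^2), so the median ratio equals `factor`.
    return rng.lognormvariate(math.log(factor), sigma)

def gamma_ratio(factor: float, rng: random.Random, shape: float = 10.0) -> float:
    # gammavariate(alpha, beta) has mean alpha*beta; pick beta so the mean is `factor`.
    return rng.gammavariate(shape, factor / shape)

def truncated_cauchy_ratio(factor: float, rng: random.Random, scale: float = 0.2) -> float:
    # Inverse-CDF Cauchy draw centred on `factor`; non-positive draws are rejected.
    while True:
        value = factor + scale * math.tan(math.pi * (rng.random() - 0.5))
        if value > 0:
            return value

rng = random.Random(3)
for name, draw in [("lognormal", lognormal_ratio), ("gamma", gamma_ratio),
                   ("truncated Cauchy", truncated_cauchy_ratio)]:
    sample = [draw(2.0, rng) for _ in range(5000)]
    print(f"{name:>16}: median={statistics.median(sample):.2f}, max={max(sample):.1f}")
```

The Cauchy family is the stress case: it has no finite mean or variance, so occasional extreme ratios appear even though the median stays close to the true factor.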
Figure 4. Logistic regressions of the probability of detection of gene expression differences from simulated data. Logistic regressions of the frequency of affirmative significance calls on the log2 factor of difference in gene expression. The logistic model plotted is $\ln[p/(1-p)] = mx + b$, where p is the probability of an affirmative significance call and x is the log2 factor of difference in gene expression. Cross symbols represent actual data points. Each is placed at its estimated expression level, either at the top of the plot when identified as significant (S) or at the bottom when identified as not significant (NS). A) Logistic regression of statistical significance calls on the "true" factors of fold change from which the data were simulated. The model has a highly significant fit (χ² = 884.5, P < 0.0001). The estimated intercept for the log odds, b, of an affirmative significance call is -16.4 (significant, P < 0.0001). This corresponds to a probability of a positive call of 0.02, which is the observed average false-positive rate. The estimated slope with log2 factor of difference in gene expression, m, is 12.5 (significant, P < 0.0001). B) Logistic regression of significance calls on the factors of difference estimated from the simulated data. The model has a highly significant fit (χ² = 890.5, P < 0.0001). The estimated intercept for the log odds, b, of a significant call versus no significant call is -3.9 (significant, P < 0.0001), and the estimated slope with log2 factor of difference in gene expression, m, is 10.7 (significant, P < 0.0001).
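Figure 4 summarizes detection power by fitting the logistic model ln[p/(1 - p)] = mx + b to binary significance calls. A minimal sketch of such a fit follows, using maximum likelihood via Newton-Raphson in plain Python; this is a generic method, and the caption does not say which software or algorithm produced the published fits. The synthetic calls are generated from the panel B coefficients (b = -3.9, m = 10.7); the uniform range of log2 factors and the sample size are hypothetical.

```python
import math
import random

def sigmoid(z: float) -> float:
    """Numerically stable inverse logit, 1 / (1 + exp(-z))."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def fit_logistic(xs, ys, iterations=50, tol=1e-10):
    """Maximum-likelihood fit of logit(p) = m*x + b by Newton-Raphson.

    xs: log2 factors of difference; ys: 1 for a significant call, 0 otherwise.
    Returns (m, b).
    """
    m = b = 0.0
    for _ in range(iterations):
        # Score vector (g_m, g_b) and Fisher information entries at the current (m, b).
        g_m = g_b = h_mm = h_mb = h_bb = 0.0
        for x, y in zip(xs, ys):
            p = sigmoid(m * x + b)
            w = p * (1.0 - p)
            g_m += (y - p) * x
            g_b += y - p
            h_mm += w * x * x
            h_mb += w * x
            h_bb += w
        # Newton step: solve the 2x2 system I * delta = score.
        det = h_mm * h_bb - h_mb * h_mb
        d_m = (h_bb * g_m - h_mb * g_b) / det
        d_b = (h_mm * g_b - h_mb * g_m) / det
        m += d_m
        b += d_b
        if abs(d_m) + abs(d_b) < tol:
            break
    return m, b

rng = random.Random(2004)
TRUE_M, TRUE_B = 10.7, -3.9                   # borrowed from the Figure 4B caption
xs = [rng.random() for _ in range(20000)]     # hypothetical log2 factors in [0, 1)
ys = [1 if rng.random() < sigmoid(TRUE_M * x + TRUE_B) else 0 for x in xs]
m_hat, b_hat = fit_logistic(xs, ys)
print(f"m = {m_hat:.2f}, b = {b_hat:.2f}")
```

Newton-Raphson converges in a handful of iterations here because the simulated calls are not perfectly separated by x; with perfectly separated calls the maximum-likelihood slope would diverge.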
Figure 5. Logistic regressions of the probability of detection of gene expression differences from experimental data. Logistic regressions of the frequency of affirmative significance calls on the estimated log2 factor of difference in gene expression for five datasets from four published studies that publicly reported replicated ratio results for each hybridization. The logistic model plotted is $\ln[p/(1-p)] = mx + b$, where p is the probability of an affirmative significance call and x is the log2 factor of difference in gene expression. Cross symbols (+) are plotted at the estimated expression level of each gene, either at the top of the plot when identified as significant (S) or at the bottom when identified as not significant (NS). See Townsend and Hartl [8] and Townsend et al. [10] for diagrams of the experimental designs for these studies. Logistic regressions of significance call on the factor of difference are computed from the data of A) Alexandre et al. [27], comparing yeast in log-phase growth with yeast in log-phase growth after 30 minutes of exposure to high ethanol. The model has a highly significant fit (χ² = 2126.4, P < 0.00001). The estimated intercept for the log odds, b, of an affirmative significance call is -6.0 (significant, P < 0.0001), and the estimated slope with log2 factor of difference in gene expression, m, is 4.0 (significant, P < 0.0001). Three microarray comparisons were performed on two samples. The factor of gene expression difference at which 50% of estimated differences were identified as significant (GEL50) was 2.8-fold. B) Lyons et al. [28], comparing expression in wild-type and zap1 yeast strains at log-phase growth in low-zinc media. The model has a highly significant fit (χ² = 2844.0, P < 0.00001). The estimated intercept for the log odds, b, of a significant call is -4.2 (significant, P < 0.00001), and the estimated slope with log2 factor of difference in gene expression, m, is 5.8 (significant, P < 0.0001).
Nine microarray comparisons were reported on six samples, and GEL50 = 1.65-fold. C) Sudarsanam et al. [26], comparing expression in wild-type and snf2 yeast strains at log-phase growth in rich and minimal media. Cross symbols representing the data are plotted only for the left-hand curve, which regresses data from the comparison in minimal media. The model has a highly significant fit (χ² = 2429.3, P < 0.00001). The estimated intercept for the log odds, b, of a significant call is -3.9 (significant, P < 0.00001), and the estimated slope with log2 factor of difference in gene expression, m, is 6.7 (significant, P < 0.0001). Six microarray hybridizations were performed among three samples, and GEL50 = 1.49-fold. The right-hand curve is from an experiment on rich media. The model has a highly significant fit (χ² = 1458.7, P < 0.0001). The estimated intercept for the log odds, b, of an affirmative significance call is -4.0 (significant, P < 0.00001), and the estimated slope with log2 factor of difference in gene expression, m, is 4.3 (significant, P < 0.0001). The data were restricted to five microarray hybridizations among three samples, and GEL50 = 1.91-fold. D) Townsend et al. [10], comparing expression in two natural isolates of yeast at log-phase growth. The model has a highly significant fit (χ² = 925.5, P < 0.0001). The estimated intercept for the log odds, b, of an affirmative significance call is -2.9 (significant, P < 0.0001), and the estimated slope, m, is 4.5 (significant, P < 0.0001). Ten microarray comparisons were performed among four samples, and GEL50 = 1.56-fold.
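Under the logistic model, the probability of a significance call reaches 0.5 where mx + b = 0, so the log2 factor at 50% detection is x = -b/m and GEL50 = 2^(-b/m). The short check below applies that relation to the rounded coefficients reported in the Figure 5 caption. It reproduces each reported GEL50 to within rounding; for panel A, for example, b = -6.0 and m = 4.0 give 2^1.5, about 2.83-fold. The helper names are illustrative, not from the paper.

```python
import math

# (intercept b, slope m, reported GEL50) transcribed from the Figure 5 caption.
FIGURE5 = {
    "A  Alexandre et al.": (-6.0, 4.0, 2.8),
    "B  Lyons et al.": (-4.2, 5.8, 1.65),
    "C  Sudarsanam et al., minimal media": (-3.9, 6.7, 1.49),
    "C  Sudarsanam et al., rich media": (-4.0, 4.3, 1.91),
    "D  Townsend et al.": (-2.9, 4.5, 1.56),
}

def gel50(b: float, m: float) -> float:
    """Fold change at which logit(p) = m*log2(fold) + b gives p = 0.5."""
    return 2.0 ** (-b / m)

def detection_probability(fold: float, b: float, m: float) -> float:
    """Predicted probability of a significance call at a given fold change.

    Uses the identity 1/(1 + exp(-z)) = (1 + tanh(z/2)) / 2, which cannot overflow.
    """
    z = m * math.log2(fold) + b
    return 0.5 * (1.0 + math.tanh(z / 2.0))

for label, (b, m, reported) in FIGURE5.items():
    print(f"{label}: computed {gel50(b, m):.2f}-fold, reported {reported}-fold, "
          f"P(call at 2-fold) = {detection_probability(2.0, b, m):.2f}")
```

A steeper slope m means detection switches from unlikely to likely over a narrower range of fold changes, while GEL50 locates where that switch is centred; both are needed to compare the resolving power of the studies.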