Samuel O M Manda, Rebecca E Walls, Mark S Gilthorpe.
Abstract
BACKGROUND: In many laboratory-based high-throughput microarray experiments, gene expression levels are measured with very few replicates, so estimates of gene variances are inaccurate. Visual inspection of graphical summaries of these data usually reveals heteroscedasticity, and the standard remedy is a log2 transformation. It is then common to assume that gene variability is constant when the data are analysed. However, this assumption is perhaps too stringent: closer inspection shows that the simple log2 transformation does not remove the heteroscedasticity. An alternative strategy is to assume independent gene-specific variances, although this too is problematic, because variance estimates based on few replicates are highly unstable. More meaningful and reliable comparisons of gene expression between conditions or tissue samples might be achieved if the test statistics are based on accurate estimates of gene variability, a crucial step in identifying differentially expressed genes.
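The contrast between assuming a constant variance and assuming gene-specific variances can be sketched with two-sample t-like statistics. This is a minimal illustration on simulated log2-scale data, assuming a balanced two-group design with three replicates per group; the function name `t_statistics`, the simulated variance classes and all numbers are hypothetical and are not taken from the paper.

```python
import numpy as np


def t_statistics(x, y, variance="gene"):
    """Two-sample t-like statistic for every gene (row) of x and y.

    variance="gene":     each gene uses its own pooled within-group variance.
    variance="constant": all genes share one variance (the mean over genes).
    """
    n1, n2 = x.shape[1], y.shape[1]
    diff = x.mean(axis=1) - y.mean(axis=1)
    # Pooled within-group variance per gene; only n1 + n2 - 2 degrees of freedom.
    s2 = ((n1 - 1) * x.var(axis=1, ddof=1)
          + (n2 - 1) * y.var(axis=1, ddof=1)) / (n1 + n2 - 2)
    if variance == "constant":
        s2 = np.full_like(s2, s2.mean())
    elif variance != "gene":
        raise ValueError("variance must be 'gene' or 'constant'")
    return diff / np.sqrt(s2 * (1.0 / n1 + 1.0 / n2))


# Hypothetical data: 1000 genes, 3 replicates per group, no true differences,
# but true standard deviations drawn from three classes (heteroscedasticity).
rng = np.random.default_rng(0)
n_genes, reps = 1000, 3
true_sd = rng.choice([0.3, 0.6, 1.2], size=n_genes)
x = rng.normal(0.0, true_sd[:, None], size=(n_genes, reps))
y = rng.normal(0.0, true_sd[:, None], size=(n_genes, reps))

t_gene = t_statistics(x, y, "gene")
t_const = t_statistics(x, y, "constant")
print("sd of gene-specific t:", t_gene.std())
print("sd of constant-variance t:", t_const.std())
```

With only four degrees of freedom per gene, the gene-specific statistics are heavy-tailed, while the constant-variance statistics ignore genuine differences in variability between genes; variance mixture models, such as those summarised in the tables of this entry, are one way to sit between these two extremes.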
Year: 2007 PMID: 17439644 PMCID: PMC1876253 DOI: 10.1186/1471-2105-8-124
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Posterior mean (SD) for various variance mixture models
| Model | Component | π (Bayesian) | Component parameter (Bayesian) | π (EM) | Component parameter (EM) |
|---|---|---|---|---|---|
| 2 components | 1 | 0.2276 (0.009) | 0.7624 (0.008) | 0.2286 | 0.7607 |
| | 2 | 0.7724 (0.0085) | 0.3426 (0.0024) | 0.7713 | 0.3422 |
| | | 22911.89 | | 22657.89 | |
| 3 components | 1 | 0.0677 (0.0039) | 1.0220 (0.2627) | 0.0688 | 1.0225 |
| | 2 | 0.4462 (0.0170) | 0.4977 (0.0077) | 0.4546 | 0.4970 |
| | 3 | 0.4860 (0.0182) | 0.2841 (0.0035) | 0.4766 | 0.2839 |
| | | 21819.82 | | 21703.82 | |
| 4 components | 1 | 0.2303 (0.0146) | 0.6125 (0.0129) | 0.2317 | 0.6125 |
| | 2 | 0.0384 (0.0045) | 1.1547 (0.0321) | 0.0370 | 1.1523 |
| | 3 | 0.2318 (0.0218) | 0.2413 (0.0053) | 0.2285 | 0.2427 |
| | 4 | 0.4994 (0.0201) | 0.3775 (0.0161) | 0.5028 | 0.3782 |
| | | 21727.75 | | 21620.86 | |
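A finite mixture for gene variances can be fitted by EM. The sketch below uses an assumed model, not necessarily the parameterisation used in the paper: each gene's sample variance s_g² on `df` degrees of freedom is taken to satisfy df·s_g²/σ_k² ~ χ²_df within component k, so s_g² follows a gamma distribution with shape df/2 and scale 2σ_k²/df. Under that assumption the M-step is closed-form: π_k is the mean posterior membership probability and σ_k² is the membership-weighted mean of s_g². The function names `em_variance_mixture` and `_e_step` and the simulated inputs are hypothetical.

```python
import math
import numpy as np


def _e_step(s2, a, pi, sigma2):
    """Membership probabilities and log-likelihood under the gamma mixture (shape a)."""
    theta = sigma2 / a  # gamma scale 2*sigma2/df, since a = df/2
    logp = ((a - 1.0) * np.log(s2)[:, None] - s2[:, None] / theta
            - a * np.log(theta) - math.lgamma(a) + np.log(pi))
    m = logp.max(axis=1, keepdims=True)  # log-sum-exp for numerical stability
    log_norm = m[:, 0] + np.log(np.exp(logp - m).sum(axis=1))
    return np.exp(logp - log_norm[:, None]), log_norm.sum()


def em_variance_mixture(s2, df, k, n_iter=500, tol=1e-8):
    """Fit a k-component mixture to sample variances s2 (assumed model)."""
    s2 = np.asarray(s2, dtype=float)
    a = df / 2.0
    # Start the component variances at spread-out quantiles of the data.
    sigma2 = np.quantile(s2, np.linspace(0.1, 0.9, k))
    pi = np.full(k, 1.0 / k)
    prev = -np.inf
    for _ in range(n_iter):
        resp, loglik = _e_step(s2, a, pi, sigma2)    # E-step
        if loglik - prev < tol:
            break
        prev = loglik
        nk = resp.sum(axis=0)                         # M-step (closed form)
        pi = nk / s2.size
        sigma2 = (resp * s2[:, None]).sum(axis=0) / nk
    order = np.argsort(sigma2)                        # label components by variance
    pi, sigma2 = pi[order], sigma2[order]
    resp, loglik = _e_step(s2, a, pi, sigma2)         # match returned parameters
    return pi, sigma2, resp, loglik


# Hypothetical check: two variance classes, 4 degrees of freedom per gene.
rng = np.random.default_rng(1)
df, n = 4, 20000
z = rng.choice(2, size=n, p=[0.3, 0.7])
s2 = np.array([0.1, 1.0])[z] * rng.chisquare(df, size=n) / df
pi_hat, sigma2_hat, resp, loglik = em_variance_mixture(s2, df, k=2)
print(pi_hat, sigma2_hat)
```

This gives maximum-likelihood point estimates only, so at best it corresponds loosely to the EM columns; a Bayesian fit would instead place priors on π and the component parameters and report posterior means and SDs. The sketch does not guard against a component becoming empty.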
Figure 1. Distribution of the . Plot (A) is a histogram based on gene-specific standard errors, (B) is based on the standard errors estimated by the four-component mixture model, and (C) is based on a homogeneous standard error.
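The three panels differ only in how the standard error of a gene's difference in means is computed. A minimal sketch, assuming a two-group design and assuming that the mixture-based variance for gene g is the membership-weighted average Σ_k r_gk σ_k² of the component variances (one plausible choice; the paper may instead use, for example, the most probable component). The function name `standard_errors` and the toy inputs are hypothetical.

```python
import numpy as np


def standard_errors(s2, resp, sigma2, n1, n2):
    """Standard errors of a difference in group means, three ways.

    s2:     gene-specific pooled variances, shape (G,)
    resp:   posterior component probabilities from a mixture fit, shape (G, K)
    sigma2: component variances, shape (K,)
    """
    s2 = np.asarray(s2, dtype=float)
    scale = 1.0 / n1 + 1.0 / n2
    se_gene = np.sqrt(s2 * scale)                                 # like panel (A)
    mix_var = np.asarray(resp) @ np.asarray(sigma2)              # weighted by membership
    se_mixture = np.sqrt(mix_var * scale)                         # like panel (B), assumed
    se_constant = np.full_like(s2, np.sqrt(s2.mean() * scale))    # like panel (C)
    return se_gene, se_mixture, se_constant


# Toy inputs: three genes, two variance components, two replicates per group.
se_a, se_b, se_c = standard_errors(
    s2=[0.04, 0.5, 1.2],
    resp=[[1.0, 0.0], [0.5, 0.5], [0.0, 1.0]],
    sigma2=[0.1, 1.0],
    n1=2, n2=2,
)
print(se_b)  # sqrt([0.1, 0.55, 1.0])
```

The mixture-based error shrinks each gene's noisy variance towards the variance of the component(s) it probably belongs to, whereas the homogeneous error applies one value to every gene.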
Classification of genes under the Bayesian and EM mixture models, with four components
| EM component | Bayesian component 1 | Bayesian component 2 | Bayesian component 3 | Bayesian component 4 | Total |
|---|---|---|---|---|---|
| 1 | 1621 | 3 | 0 | 52 | 1676 |
| 2 | 18 | 242 | 0 | 0 | 260 |
| 3 | 0 | 0 | 1848 | 161 | 2009 |
| 4 | 42 | 0 | 111 | 5118 | 5271 |
| Total | 1681 | 245 | 1959 | 5331 | 9216 |
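The agreement between the two classifications can be read off the diagonal of the table. The sketch below recomputes the marginal totals, the proportion of genes assigned to the same component by both methods, and, as an extra summary not reported in the source, Cohen's kappa. It assumes that EM component k corresponds to Bayesian component k, as the table's layout implies.

```python
import numpy as np

# Counts from the table: rows are EM components 1-4, columns Bayesian components 1-4.
counts = np.array([
    [1621,   3,    0,   52],
    [  18, 242,    0,    0],
    [   0,   0, 1848,  161],
    [  42,   0,  111, 5118],
])

total = counts.sum()
row_totals = counts.sum(axis=1)   # EM component sizes
col_totals = counts.sum(axis=0)   # Bayesian component sizes

# Observed agreement: genes placed in the same component by both methods.
p_observed = np.trace(counts) / total
# Chance agreement implied by the margins, and Cohen's kappa.
p_expected = (row_totals * col_totals).sum() / total**2
kappa = (p_observed - p_expected) / (1.0 - p_expected)

print(f"agreement = {p_observed:.3f}, kappa = {kappa:.3f}")
# agreement = 0.958, kappa = 0.929
```

Both methods place 8829 of the 9216 genes in the same component, about 95.8%; most disagreements involve components 3 and 4 or components 1 and 4.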
Top ten ranked genes under different variance models (genes are listed by their codes)
| Gene-specific | Constant | 2 Classes | 3 Classes | 4 Classes | Weighted |
|---|---|---|---|---|---|
| 4323 | 2602 | 1939 | 2143 | 2143 | 2143 |
| 4532 | 1945 | 2143 | 3181 | 4069 | 4069 |
| 4069 | 257 | 3181 | 4069 | 4323 | 4323 |
| 8076 | 1400 | 7161 | 8903 | 4532 | 4532 |
| 4331 | 1939 | 2003 | 4323 | 7496 | 8076 |
| 6635 | 2143 | 4069 | 4347 | 8076 | 3181 |
| 2026 | 3181 | 6731 | 4532 | 4586 | 4586 |
| 2143 | 7003 | 6343 | 6592 | 5674 | 8903 |
| 4586 | 4323 | 8649 | 8628 | 8048 | 2542 |
| 8892 | 3151 | 4323 | 5746 | 1631 | 5674 |
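How far the rankings agree can be summarised by the overlap between the top-ten lists. The sketch below transcribes the table and counts, for each variance model, how many of its top ten genes also appear in the top ten of the "Weighted" column; using that column as the reference is an arbitrary illustrative choice, not one made in the source.

```python
# Top-ten gene codes transcribed from the table, one list per variance model.
top10 = {
    "Gene-specific": [4323, 4532, 4069, 8076, 4331, 6635, 2026, 2143, 4586, 8892],
    "Constant":      [2602, 1945, 257, 1400, 1939, 2143, 3181, 7003, 4323, 3151],
    "2 Classes":     [1939, 2143, 3181, 7161, 2003, 4069, 6731, 6343, 8649, 4323],
    "3 Classes":     [2143, 3181, 4069, 8903, 4323, 4347, 4532, 6592, 8628, 5746],
    "4 Classes":     [2143, 4069, 4323, 4532, 7496, 8076, 4586, 5674, 8048, 1631],
    "Weighted":      [2143, 4069, 4323, 4532, 8076, 3181, 4586, 8903, 2542, 5674],
}

reference = set(top10["Weighted"])
overlap = {model: len(reference & set(genes)) for model, genes in top10.items()}
# Genes that every variance model places in its top ten.
common = set.intersection(*(set(genes) for genes in top10.values()))

print(overlap)
print(sorted(common))  # [2143, 4323]
```

For these lists, the four-class model shares 7 genes with the weighted ranking, the gene-specific and three-class models 6 each, the two-class model 4 and the constant-variance model 3; only genes 2143 and 4323 appear in every column.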