Abstract
BACKGROUND: Using suitable error models for gene expression measurements is essential in the statistical analysis of microarray data. However, the true probabilistic model underlying gene expression intensity readings is generally not known. Current approaches instead either assume a simple parametric model (usually a transformed normal distribution) or estimate the empirical distribution. Neither strategy may be optimal for gene expression data: the non-parametric approach ignores known structural information, whereas fully parametric models run the risk of misspecification. A further related problem is the choice of a suitable scale for the model (e.g. observed vs. log-scale).
Year: 2003 PMID: 12659637 PMCID: PMC153502 DOI: 10.1186/1471-2105-4-10
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Table 1. Examples for extended quasi-log-likelihood functions
| Variance function | Comment |
| σ² | normal distribution |
| μσ² | Poisson distribution (σ² = 1) |
| μ²σ² | approx. Gamma distribution |
| (μ - β)²σ² + ρ² | this paper |
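The quasi-likelihood formulas themselves did not survive in this table, so the following is only a minimal sketch of how an extended quasi-log-likelihood can be evaluated for any of the variance functions listed. It assumes the standard Nelder-Pregibon form, Q+(y; μ) = -½ log(2π V(y)) - ½ D(y; μ) with deviance D(y; μ) = 2 ∫ from μ to y of (y - t)/V(t) dt, and it assumes the dispersion is absorbed into V as in the table. The helper names (`simpson`, `deviance`, `eql`, `v_paper`) are hypothetical and the integral is done numerically rather than in closed form.

```python
import math


def simpson(f, a, b, n=2000):
    """Composite Simpson rule on [a, b] with n (even) subintervals."""
    if a == b:
        return 0.0
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3.0


def deviance(y, mu, var_fn):
    """D(y; mu) = 2 * integral from mu to y of (y - t) / V(t) dt."""
    return 2.0 * simpson(lambda t: (y - t) / var_fn(t), mu, y)


def eql(y, mu, var_fn):
    """Extended quasi-log-likelihood of a single observation y (assumed form)."""
    return -0.5 * math.log(2.0 * math.pi * var_fn(y)) - 0.5 * deviance(y, mu, var_fn)


def v_paper(beta, sigma2, rho2):
    """Variance function from the last table row: (mu - beta)^2 * sigma2 + rho2."""
    return lambda t: (t - beta) ** 2 * sigma2 + rho2


if __name__ == "__main__":
    # Illustrative parameter values only; they are not taken from the paper.
    v = v_paper(beta=100.0, sigma2=0.04, rho2=400.0)
    print(eql(1200.0, 1000.0, v))
```

With a constant variance function the sketch reduces to the ordinary normal log-density, which is a convenient sanity check before trying the mean-dependent variance functions.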
Table 2. Parameter mapping in the simulation study
| observed scale | normal scale |
| μ = ( | |
| σ² = | |
| var( | var( |
Figure 1. Variance-mean relationship for simulated data: true value (red), maximum-likelihood estimate β (green), maximum-quasi-likelihood estimate (blue)
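A variance-mean relationship of this kind can be checked on simulated readings. The sketch below is a hedged illustration, not the authors' simulation code: it assumes each reading is a shifted log-normal signal plus independent normal noise, y = β + α·exp(η) + ε with η ~ N(0, s²) and ε ~ N(0, ρ²), which is one way to read the "convolution of normal and log-normal distribution" named in the parameter-estimate tables. Under that assumption the variance is (μ - β)²σ² + ρ² with σ² = exp(s²) - 1. All function names and numeric values are hypothetical.

```python
import math
import random


def simulate(alpha, beta, s, rho, n, rng):
    """Draw n readings y = beta + alpha * exp(eta) + eps (assumed model).

    eta ~ N(0, s^2) is the multiplicative error, eps ~ N(0, rho^2) the additive one.
    """
    return [beta + alpha * math.exp(rng.gauss(0.0, s)) + rng.gauss(0.0, rho)
            for _ in range(n)]


def theoretical_mean(alpha, beta, s):
    """E[y] = beta + alpha * exp(s^2 / 2) for the assumed model."""
    return beta + alpha * math.exp(0.5 * s * s)


def implied_sigma2(s):
    """Squared coefficient of variation of exp(eta): exp(s^2) - 1."""
    return math.exp(s * s) - 1.0


def variance_function(mu, beta, sigma2, rho2):
    """V(mu) = (mu - beta)^2 * sigma2 + rho2."""
    return (mu - beta) ** 2 * sigma2 + rho2


def sample_mean_var(ys):
    """Sample mean and unbiased sample variance."""
    n = len(ys)
    m = sum(ys) / n
    v = sum((y - m) ** 2 for y in ys) / (n - 1)
    return m, v


if __name__ == "__main__":
    rng = random.Random(1)  # fixed seed so the run is repeatable
    alpha, beta, s, rho = 1000.0, 50.0, 0.3, 20.0  # illustrative values only
    m, v = sample_mean_var(simulate(alpha, beta, s, rho, 100000, rng))
    print("sample mean/variance:", m, v)
    print("V(sample mean):", variance_function(m, beta, implied_sigma2(s), rho ** 2))
```

Repeating the comparison over a grid of signal levels α would trace out a variance-mean curve; the quadratic term dominates at high intensities and the additive term ρ² at low intensities.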
Table 3. Parameter estimates (true model: convolution of normal and log-normal distribution)
| | EQL | | | ANL | | | true value |
| Replicates | 4 | 10 | 20 | 4 | 10 | 20 | |
| err(μ̂) | 0.0008 | 0.0007 | 0.0006 | 0.0010 | 0.0014 | 0.0009 | 0 |
| | 20605 | 20012 | 19332 | 22979 | 24178 | 25161 | 25000 |
| | 0.1906 | 0.2035 | 0.2032 | 0.2340 | 0.2395 | 0.2427 | 0.25 |
| | 4150.1 | 4559.8 | 4669.6 | 4995.4 | 5053.6 | 5082.3 | 5000 |
| -log | 280822 | 708618 | 1420480 | 280205 | 708308 | 1419534 | |
err(μ̂) = avg((μ̂ - μ)/μ). EQL: all parameters estimated via EQL. ANL: all parameters estimated using asinh-normal assumption.
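The asinh-normal assumption is not spelled out in this extract. One common reading, used here purely as an assumption, is that an asinh transform of the readings is treated as normally distributed. The sketch below shows why asinh is a natural partner for the variance function (μ - β)²σ² + ρ²: the transform h(y) = asinh(σ(y - β)/ρ)/σ has derivative 1/√V(y), so by the delta method the transformed readings have approximately unit standard deviation at every intensity. The function names and parameter values are hypothetical, and σ² and ρ² must be positive.

```python
import math


def variance_function(mu, beta, sigma2, rho2):
    """V(mu) = (mu - beta)^2 * sigma2 + rho2."""
    return (mu - beta) ** 2 * sigma2 + rho2


def asinh_transform(y, beta, sigma2, rho2):
    """h(y) = asinh(sigma * (y - beta) / rho) / sigma, so h'(y) = 1 / sqrt(V(y))."""
    sigma = math.sqrt(sigma2)
    rho = math.sqrt(rho2)
    return math.asinh(sigma * (y - beta) / rho) / sigma


def delta_method_sd(mu, beta, sigma2, rho2, step=1e-3):
    """Approximate sd of h(Y) for a reading with mean mu: h'(mu) * sqrt(V(mu)).

    The derivative is taken by a central finite difference.
    """
    upper = asinh_transform(mu + step, beta, sigma2, rho2)
    lower = asinh_transform(mu - step, beta, sigma2, rho2)
    slope = (upper - lower) / (2.0 * step)
    return slope * math.sqrt(variance_function(mu, beta, sigma2, rho2))


if __name__ == "__main__":
    # Illustrative parameter values only; they are not taken from the tables.
    for mu in (100.0, 500.0, 5000.0):
        print(mu, delta_method_sd(mu, beta=100.0, sigma2=0.04, rho2=400.0))
```

For readings far above β the transform behaves like a scaled logarithm, and near β it is approximately linear, which sidesteps the choice between observed scale and log-scale raised in the abstract.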
Table 4. Parameter estimates (true model: ANL)
| | EQL | | | ANL | | | true value |
| Replicates | 4 | 10 | 20 | 4 | 10 | 20 | |
| err(μ̂) | 0.0013 | 0.0007 | 0.0003 | 0.0013 | 0.0007 | 0.0003 | 0 |
| | 20646 | 20686 | 19102 | 24848 | 25206 | 24999 | 25000 |
| | 0.1907 | 0.2073 | 0.2065 | 0.2176 | 0.2376 | 0.2427 | 0.25 |
| | 4132.2 | 4539.3 | 4591.8 | 4293.4 | 4727.8 | 4876.3 | 5000 |
| -log | 280546 | 708608 | 1419646 | 280362 | 707998 | 1418279 | |
See Table 3 for abbreviations.
Table 5. Parameter estimates (true model: Gamma distribution)
| | EQL | | | ANL | | | true value |
| Replicates | 4 | 10 | 20 | 4 | 10 | 20 | |
| err(μ̂) | 0.0001 | 0.0006 | 0.0001 | 0.0001 | 0.0006 | 0.0001 | 0 |
| | -647.63 | 973.56 | 988.59 | 7616.7 | 5663.5 | 4821.5 | 0 |
| | 0.2142 | 0.2458 | 0.2538 | 0.2308 | 0.2479 | 0.2519 | 0.25 |
| | 34.713 | 7.7921 | 5.3068 | 3991.5 | 3846.1 | 3818.3 | 0 |
| -log | 289207 | 728069 | 1462227 | 289469 | 728668 | 1463451 | |
See Table 3 for abbreviations.
Table 6. Fit to Leukemia data
| | | | | -log |
| EQL | 0.0010 | 0.6895 | 0.0054 | 825827 |
| ANL | 0.0010 | 0.7957 | 0.0216 | 828054 |
See Table 3 for abbreviations.
Figure 2. Number of differentially expressed genes as a function of the nominal α value (type I error), computed using the approximate error model (open triangles) and the ANL model (filled triangles).