Morgan N Price, Adam P Arkin, Eric J Alm.
Abstract
BACKGROUND: Differentially expressed genes are typically identified by analyzing the variation between replicate measurements. These procedures implicitly assume that there are no systematic errors in the data even though several sources of systematic error are known.
Year: 2006 PMID: 16412220 PMCID: PMC1397872 DOI: 10.1186/1471-2105-7-19
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Figure 1. Accuracy of . (A) A typical simulation matching the OpWise model. The solid line shows the estimated log odds for each gene () as a function of the "ideal" log odds based on the true values of the hyperparameters. The slope is from linear regression with the intercept fixed at zero. (B) Slopes from 50 simulations for each data set's hyperparameters. The boxes show the first and third quartiles and the medians, the whiskers show the most extreme point within 1.5 times the inter-quartile range of the box, and the points indicate outliers. (C) A typical "uncoupled" simulation where means and variances were independent. We sorted the genes by their estimated log odds into 10 bins of equal size. For each bin, a point shows the true log odds (from the number of genes with μ > 0 and μ < 0) and the average of the estimated log odds. Logistic regression gave a slope of 0.97 (solid line). (D) Slopes from 50 uncoupled simulations for each data set and from 50 heavy-tailed simulations for the ecox data set. The dashed lines in (A) and (C) show x = y.
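The binning check described for panel (C) can be illustrated with a minimal sketch. It assumes natural-log odds, and it substitutes a least-squares slope through the origin on the binned points for the logistic regression used in the figure. The function names `calibration_bins` and `slope_through_origin` are hypothetical and are not taken from the OpWise software.

```python
import math

def calibration_bins(est_log_odds, true_mu, n_bins=10):
    """Sort genes by estimated log odds, split them into n_bins groups of
    (nearly) equal size, and return one (mean estimated log odds,
    empirical log odds) point per group.
    Assumption: natural-log odds; groups with no positive or no negative
    genes are skipped because their empirical log odds are infinite."""
    if len(est_log_odds) < n_bins:
        raise ValueError("need at least one gene per bin")
    order = sorted(range(len(est_log_odds)), key=lambda i: est_log_odds[i])
    size = len(order) // n_bins
    points = []
    for b in range(n_bins):
        # The last bin absorbs any remainder.
        idx = order[b * size:(b + 1) * size] if b < n_bins - 1 else order[b * size:]
        n_pos = sum(1 for i in idx if true_mu[i] > 0)
        n_neg = sum(1 for i in idx if true_mu[i] < 0)
        if n_pos == 0 or n_neg == 0:
            continue
        mean_est = sum(est_log_odds[i] for i in idx) / len(idx)
        points.append((mean_est, math.log(n_pos / n_neg)))
    return points

def slope_through_origin(points):
    """Least-squares slope of y on x with the intercept fixed at zero."""
    sxy = sum(x * y for x, y in points)
    sxx = sum(x * x for x, _ in points)
    return sxy / sxx
```

A slope near 1 on such points would indicate that the estimated log odds are well calibrated against the simulated truth.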
Systematic bias in four biological data sets.
| | dvSalt30 | ecox | shHeat5 | shCold5 |
| Typical bias | 0.25 | 0.12 | 0.37 | 0.88 |
| Bias/signal (%) | 70.4% | 19.6% | 49.9% | 86.9% |
| Bias/replication error (%) | 72.7% | 35.8% | 143.1% | 199.1% |
| Bias/total (%) | 52.4% | 15.8% | 47.2% | 74.6% |
| Significance of bias | | | | |
| Likelihood ratio | 1.74e+02 | 9.38e+00 | 1.48e+03 | 1.81e+03 |
| | < 10^-77 | < 10^-5 | < 10^-646 | < 10^-786 |
The typical size of the bias in the apparent log2-ratio is the square root of its variance, or , where E(1/θi) = α/(ν - 1). The bias over the signal is the square root of the ratio of variances (). The bias over the replicate error is also the square root of the ratio of variances (); it considers a single measurement (that is, it is not divided by the number of replicates). We also report the typical bias divided by the standard deviation of the observed log-changes mi. To show that the bias is statistically significant, we compared the likelihood of the best-fitting model with systematic error to that of the best-fitting model without it (γ = ∞), using Eq. 10. Because we are testing whether γ lies at a boundary, in the absence of bias the distribution of 2·log(ratio) approximates a 50:50 mixture of two chi-squared distributions with 0 and 1 degrees of freedom [26].
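The boundary test described in this note can be sketched as follows. Under a 50:50 mixture of chi-squared distributions with 0 and 1 degrees of freedom, the p-value for a positive statistic is half the upper tail of a chi-squared distribution with 1 degree of freedom. The sketch assumes the statistic is 2·ln(ratio) with a natural logarithm, and it works in log10 space because p-values this small underflow in double precision. The function name `log10_boundary_pvalue` is hypothetical.

```python
import math

def log10_boundary_pvalue(stat):
    """log10 of the p-value for a likelihood-ratio statistic
    stat = 2 * ln(L_alt / L_null) when the tested parameter lies on a
    boundary, so the null distribution is a 50:50 mixture of
    chi-squared(0 df) and chi-squared(1 df)."""
    if stat <= 0:
        return 0.0  # p-value is 1 when the fit does not improve
    z = math.sqrt(stat / 2.0)
    tail = math.erfc(z)  # P(chi-squared with 1 df > stat)
    if tail > 0.0:
        return math.log10(0.5 * tail)
    # erfc underflowed: use its leading asymptotic term,
    # erfc(z) ~ exp(-z^2) / (z * sqrt(pi)), evaluated in log space.
    return (-z * z / math.log(10)
            - math.log10(z * math.sqrt(math.pi))
            + math.log10(0.5))
```

If the tabulated likelihood ratios are read as natural-log ratios (an assumption; the table does not say), `log10_boundary_pvalue(2 * 174)` falls just below -77. The tabulated values are rounded, so the sketch is not expected to reproduce every reported bound exactly.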
Figure 2. Single-gene significance and agreement with operons. For each data set and for three methods of assessing significance (OpWise, OpWise without bias, and significance analysis of microarrays), we divided the changers into eight groups of genes with different levels of confidence. The x axis shows the average confidence within each group of genes. For each group, the y axis shows the adjusted agreement with operon pairs (the adjusted proportion of pairs which have the same sign of log-ratio), which ranges from 0 for random data to 1 for perfect measurements. We also show average results from simulations for each data set (simulated and analyzed with the OpWise model). The error bars give the 95% confidence interval (from a t test) for the mean agreement for each group from the OpWise significance values. The irregular left side of the ecox SAM curve is due to noise in the local FDR.
Figure 3. Sensitivity of single-gene and operon-wise methods. For each data set, we show the cumulative number of changers identified at varying levels of significance. Note the log scales. The horizontal line is at 0.01. Genes that are not in operons are included in the operon-wise results.
Genes with significant changes in expression as identified by OpWise methods and by SAM.
| Method | dvSalt30 (genes) | dvSalt30 (agreement) | ecox (genes) | ecox (agreement) | shHeat5 (genes) | shHeat5 (agreement) | shCold5 (genes) | shCold5 (agreement) |
| 1-gene (OpWise) | 220 | 100% | 1062 | 98% | 1002 | 97% | 187 | 100% |
| operon-wise | 401 | 99% | 1318 | 100% | 1284 | 99% | 374 | 100% |
| no-bias | 1090 | 90% | 1269 | 98% | 3020 | 87% | 3063 | 70% |
| SAM | 852 | 94% | 957 | 99% | 3348 | 83% | 3258 | 68% |
For OpWise, genes were selected if the two-tailed confidence was 95% or higher (P(μ > 0) < 0.025 or P(μ > 0) > 0.975). For SAM, genes were selected if the false discovery rate was 5% or lower. For each method and for each data set, we report how many genes were selected as significant changers and what percentage of the operon pairs that contain those genes changed in the same direction. This "agreement" should be 100% for perfect microarray data and perfect operon predictions and 50% for random data.
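The selection rule and the agreement statistic can be illustrated with a minimal sketch. It assumes the two-tailed rule means P(μ > 0) falls below 0.025 or above 0.975, and that a pair "agrees" when both log-ratios have the same nonzero sign. The function names, gene identifiers, and data layout are hypothetical and are not taken from the OpWise software.

```python
def select_changers(p_positive, tail=0.025):
    """Return genes whose posterior P(mu > 0) lies in either tail.
    p_positive: dict mapping gene id -> P(mu > 0)."""
    return {g for g, p in p_positive.items() if p < tail or p > 1.0 - tail}

def operon_agreement(pairs, log_ratios, selected):
    """Fraction of operon pairs containing at least one selected gene
    whose two log-ratios share the same sign.
    pairs: iterable of (gene_a, gene_b) predicted to share an operon.
    Returns None if no pair contains a selected gene."""
    agree = total = 0
    for a, b in pairs:
        if a not in selected and b not in selected:
            continue
        total += 1
        if log_ratios[a] * log_ratios[b] > 0:
            agree += 1
    return agree / total if total else None

# Hypothetical toy data, for illustration only.
p_positive = {"g1": 0.99, "g2": 0.01, "g3": 0.40, "g4": 0.60, "g5": 0.50}
log_ratios = {"g1": 1.2, "g2": -0.8, "g3": -0.3, "g4": 0.2, "g5": 0.05}
pairs = [("g1", "g4"), ("g2", "g3"), ("g2", "g5"), ("g3", "g4")]

selected = select_changers(p_positive)
agreement = operon_agreement(pairs, log_ratios, selected)
```

In this toy example only `g1` and `g2` are selected, three pairs contain a selected gene, and two of those three pairs change in the same direction.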