Abstract
Control-treatment design is widely used in microarray gene expression experiments. The purpose of such a design is to detect genes that are differentially expressed between the control and the treatment. Many statistical procedures have been developed to detect differentially expressed genes, but each has its strengths and weaknesses, and there is still room for improvement. In this study, we propose a Bayesian mixture model approach that classifies each gene into one of three clusters, corresponding to downregulated, neutral, and upregulated genes. The Bayesian method is implemented via a Markov chain Monte Carlo (MCMC) algorithm. The cluster means of the down- and upregulated genes are sampled from truncated normal distributions, whereas the cluster mean of the neutral genes is set to zero. Using simulated data as well as data from a real microarray experiment, we demonstrate that the new method outperforms all methods commonly used in differential expression analysis.
Year: 2008 PMID: 18431446 PMCID: PMC2292802 DOI: 10.1155/2008/892927
Source DB: PubMed Journal: Int J Plant Genomics ISSN: 1687-5389
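The abstract states that the cluster means of the down- and upregulated genes are drawn from truncated normal distributions. The following is a minimal sketch of one way to draw such values, using inverse-CDF sampling from the Python standard library. It is not the authors' implementation: the function name `sample_truncated_normal` and the means and standard deviations in the usage lines are hypothetical, chosen only for illustration.

```python
import random
from statistics import NormalDist


def sample_truncated_normal(mean, sd, lower, upper, rng):
    """Draw one value from N(mean, sd^2) restricted to [lower, upper].

    Illustrative helper (hypothetical name), using inverse-CDF sampling.
    """
    dist = NormalDist(mean, sd)
    p_lo = dist.cdf(lower)   # probability mass below the lower bound
    p_hi = dist.cdf(upper)   # probability mass below the upper bound
    # inv_cdf is only defined on the open interval (0, 1), so stay inside it.
    eps = 1e-12
    u = rng.uniform(max(p_lo, eps), min(p_hi, 1.0 - eps))
    x = dist.inv_cdf(u)
    # Guard against rounding error at the boundaries.
    return min(max(x, lower), upper)


rng = random.Random(0)
# Hypothetical settings: a downregulated mean constrained to be negative
# and an upregulated mean constrained to be positive.
down_mean = sample_truncated_normal(-1.0, 0.5, float("-inf"), 0.0, rng)
up_mean = sample_truncated_normal(1.0, 0.5, 0.0, float("inf"), rng)
print(down_mean, up_mean)
```

In a full sampler such a draw would sit inside each MCMC iteration, with the mean and standard deviation taken from the conditional posterior rather than fixed as they are here.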
Parameters used in the simulation experiment and their estimates from the Bayesian mixture model analysis.
| Parameter | | | | | | |
|---|---|---|---|---|---|---|
| True | Group 1 | 3.5 | 1.2 | 0.005 | 0.01 | 0.03 |
| | Group 2 | 4 | 0 | 0.345 | 0.04 | |
| | Group 3 | 4.5 | −1 | 0.02 | 0.02 | |
| | Group 4 | 0 | 0 | 0.615 | 0.04 | |
| | Group 5 | 0 | 0.8 | 0.005 | 0.01 | |
| | Group 6 | 0 | 0.9 | 0.01 | 0.01 | |
| True (combined) | Cluster 1 | — | −1 | 0.020 | 0.02 | 0.03 |
| | Cluster 2 | — | 0 | 0.960 | 0.04 | |
| | Cluster 3 | — | 0.95 | 0.020 | 0.03 | |
| Estimated | Cluster 1 | — | −0.984 | 0.023 | 0.03 | 0.03 |
| | Cluster 2 | — | 0 | 0.937 | 0.04 | |
| | Cluster 3 | — | 0.702 | 0.040 | 0.11 | |
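The "True (combined)" rows can be reproduced from the six simulated groups. The sketch below assumes that the second and third numeric columns of the table hold each group's mean effect and its mixing proportion; that reading is an inference from the values, since the column symbols are not shown. Under that assumption, groups are pooled by the sign of their effect, proportions are summed, and the combined mean is the proportion-weighted average.

```python
# (mean effect, proportion) per simulated group; column meanings are assumed.
groups = {
    "Group 1": (1.2, 0.005),
    "Group 2": (0.0, 0.345),
    "Group 3": (-1.0, 0.02),
    "Group 4": (0.0, 0.615),
    "Group 5": (0.8, 0.005),
    "Group 6": (0.9, 0.01),
}


def combine(groups):
    """Pool groups into down / neutral / up clusters by the sign of the effect."""
    clusters = {"down": [], "neutral": [], "up": []}
    for effect, prop in groups.values():
        key = "down" if effect < 0 else "up" if effect > 0 else "neutral"
        clusters[key].append((effect, prop))
    summary = {}
    for key, members in clusters.items():
        total = sum(p for _, p in members)                 # combined proportion
        mean = sum(e * p for e, p in members) / total      # weighted mean effect
        summary[key] = (round(mean, 3), round(total, 3))
    return summary


summary = combine(groups)
for name, (mean, prop) in summary.items():
    print(name, mean, prop)
```

For the upregulated cluster this gives (1.2 × 0.005 + 0.8 × 0.005 + 0.9 × 0.01) / 0.02 = 0.95 with proportion 0.02, matching the Cluster 3 row of the combined block.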
Figure 1. Original expression patterns for the six simulated groups of genes. In each plot, the first half represents the observed data from chips 1–8 and the second half represents the observed data from chips 9–16.
Figure 2. Expression patterns for the three combined clusters after normalization. See Figure 1 for the legends.
Numbers of genes assigned into each of the three clusters for six different methods of differential expression analysis. (The sum of each column within a method represents the true number of genes simulated from that cluster and the sum of each row represents the number of genes assigned into that cluster.)
| Method | Estimate | True Cluster 1 | True Cluster 2 | True Cluster 3 | Sum | Type I error |
|---|---|---|---|---|---|---|
| I | Cluster 1 | 20 | 1 | 0 | 21 | 0.002 |
| | Cluster 2 | 0 | 958 | 0 | 958 | |
| | Cluster 3 | 0 | 1 | 20 | 21 | |
| II | Cluster 1 | 20 | 205 | 0 | 225 | 0.459 |
| | Cluster 2 | 0 | 519 | 0 | 519 | |
| | Cluster 3 | 0 | 236 | 20 | 256 | |
| III | Cluster 1 | 0 | 0 | 0 | 0 | 0.033 |
| | Cluster 2 | 0 | 928 | 0 | 928 | |
| | Cluster 3 | 20 | 32 | 20 | 72 | |
| IV | Cluster 1 | 20 | 37 | 0 | 57 | 0.073 |
| | Cluster 2 | 0 | 890 | 0 | 890 | |
| | Cluster 3 | 0 | 33 | 20 | 53 | |
| V | Cluster 1 | 20 | 153 | 0 | 173 | 0.325 |
| | Cluster 2 | 0 | 648 | 0 | 648 | |
| | Cluster 3 | 0 | 159 | 20 | 179 | |
| VI | Cluster 1 | 20 | 5 | 0 | 25 | 0.011 |
| | Cluster 2 | 0 | 949 | 0 | 949 | |
| | Cluster 3 | 0 | 6 | 20 | 26 | |
| Sum (method) | | 20 | 960 | 20 | 1000 | |
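The Type I error column appears to be the fraction of the 960 truly neutral genes that a method assigns to Cluster 1 or Cluster 3. This definition is inferred from the counts rather than stated in the table, so the sketch below should be read as a consistency check under that assumption. The function name `type_i_error` is illustrative.

```python
# Truly neutral genes (true Cluster 2) as assigned by each method to
# (Cluster 1, Cluster 2, Cluster 3); counts copied from the table.
neutral_assignments = {
    "I": (1, 958, 1),
    "II": (205, 519, 236),
    "III": (0, 928, 32),
    "IV": (37, 890, 33),
    "V": (153, 648, 159),
    "VI": (5, 949, 6),
}


def type_i_error(assigned):
    """Share of neutral genes wrongly called down- or upregulated."""
    down, neutral, up = assigned
    return (down + up) / (down + neutral + up)


rates = {m: round(type_i_error(c), 3) for m, c in neutral_assignments.items()}
for method, rate in rates.items():
    print(method, rate)
```

For Method II, for example, (205 + 236) / 960 ≈ 0.459, which agrees with the tabulated value.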
Parameters estimated for the mice data using the Bayesian mixture model analysis.
| | | | | |
|---|---|---|---|---|
| Cluster 1 | −0.0065 | 0.1724 | 0.0616 | 0.0904 |
| Cluster 2 | 0 | 0.8246 | 0.0006 | |
| Cluster 3 | 0.6418 | 0.0030 | 2.0266 | |
Numbers of genes assigned to the three clusters for the mice data for six different methods.
| Method | Cluster 1 (down) | Cluster 2 (neutral) | Cluster 3 (up) |
|---|---|---|---|
| I | 0 | 6332 | 10 |
| II | 40 | 6182 | 120 |
| III | 66 | 6276 | 0 |
| IV | 13 | 6300 | 29 |
| V | 12 | 6293 | 37 |
| VI | 0 | 6329 | 13 |
Some differentially expressed genes for the mice data detected by six different methods.
a: genes reported in Callow et al. [23]; b: a subset of genes not reported in Callow et al. [23].
Figure 3. Expression patterns of some genes from the mice data. The first two rows (ten genes) show genes detected by all methods. The third row (five genes) shows genes detected by none of the methods. The last row (five genes) shows genes detected by Methods II–VI but not by Method I.