Abbas Khalili, Dustin Potter, Pearlly Yan, Lang Li, Joe Gray, Tim Huang, Shili Lin.
Abstract
With state-of-the-art microarray technologies now available for whole-genome CpG island (CGI) methylation profiling, there is a need to develop statistical models that are specifically geared toward the analysis of such data. In this article, we propose a Gamma-Normal-Gamma (GNG) mixture model for describing three groups of CGI loci (hypomethylated, undifferentiated, and hypermethylated) from a single methylation microarray. This model was applied to study the methylation signatures of three breast cancer cell lines: MCF7, T47D, and MDAMB361. Biologically interesting and interpretable results are obtained, which highlight the heterogeneous nature of the three cell lines. This heterogeneity supports analyzing each microarray slide individually rather than pooling the slides for a single analysis. Our comparisons with the fitted densities from the Normal-Uniform (NU) mixture model, proposed in the literature for gene expression analysis, show an improved goodness of fit of the GNG model over the NU model. Although the GNG model was proposed in the context of single-slide methylation analysis, it can be readily adapted to analyze multi-slide methylation data as well as other types of microarray data.
Keywords: CpG islands; breast cancer cell lines; methylation/epigenetic signature; microarrays; mixture modeling
Year: 2007 PMID: 19455234 PMCID: PMC2675845
Source DB: PubMed Journal: Cancer Inform ISSN: 1176-9351
Figure 1. Heat map of the normalized log-ratio of the Cy5 to Cy3 intensity for the three analyzed data sets. The color scheme represents the relative ranking of the log-ratio, with green denoting a low ranking (a relatively small ratio) and red denoting a high ranking (a relatively large ratio). The x and y axes of the plots denote the position of the probe on the microarray.
Figure 2. Scatter plots of the average log intensity versus the log ratio for each of the three data sets. A–C: the un-normalized data. D–F: the loess-normalized data, with the hyper- and hypomethylated probes highlighted in red and green, respectively. The unclassified probes are highlighted in blue. The black bullets in D are some of the genes that we had validated as hypermethylated.
Parameter Estimates in the Three Experiments.
| Parameter | Experiment 1 | Experiment 2 | Experiment 3 |
|---|---|---|---|
| (α̂1, β̂1) | (0.94, 0.42) | (1.07, 0.31) | (0.95, 0.37) |
| (μ̂, σ̂) | (0.002, 0.24) | (−0.01, 0.21) | (0.01, 0.20) |
| (α̂2, β̂2) | (1.11, 0.31) | (0.93, 0.36) | (0.91, 0.24) |
| (π̂1, π̂2, π̂3) | (0.16, 0.68, 0.16) | (0.13, 0.73, 0.15) | (0.16, 0.71, 0.14) |
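To make the estimates above more concrete, the following is a minimal sketch of how a three-component Gamma-Normal-Gamma density could be evaluated. It rests on several assumptions that the record does not confirm: that β is a scale (not rate) parameter, that the first gamma component is reflected onto the negative log-ratio axis, that the second gamma component sits on the positive axis, and that neither gamma component has a location shift. The function names `gamma_pdf`, `normal_pdf`, and `gng_pdf` are illustrative only, and the numbers are taken from the first column of estimates.

```python
import math

def gamma_pdf(x, shape, scale):
    """Gamma density in the shape/scale parameterization (assumed); zero for x <= 0."""
    if x <= 0:
        return 0.0
    return (x ** (shape - 1) * math.exp(-x / scale)) / (math.gamma(shape) * scale ** shape)

def normal_pdf(x, mu, sigma):
    """Normal density with mean mu and standard deviation sigma."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

def gng_pdf(x, a1, b1, mu, sigma, a2, b2, p1, p2, p3):
    """Hypothetical GNG mixture density at log-ratio x.

    Assumption: component 1 is a gamma reflected onto the negative axis,
    component 2 is the normal, component 3 is a gamma on the positive axis.
    """
    return (p1 * gamma_pdf(-x, a1, b1)
            + p2 * normal_pdf(x, mu, sigma)
            + p3 * gamma_pdf(x, a2, b2))

# Estimates from the first column of the table above.
params = dict(a1=0.94, b1=0.42, mu=0.002, sigma=0.24,
              a2=1.11, b2=0.31, p1=0.16, p2=0.68, p3=0.16)

# Midpoint-rule check that the mixture integrates to roughly one on [-10, 10].
step = 0.001
total = sum(gng_pdf(-10.0 + (i + 0.5) * step, **params) * step for i in range(20000))
```

Under these assumptions the mixing proportions weight three proper densities, so `total` should be close to the sum of the proportions. A location-shifted gamma or a rate parameterization would change the shape of the tails but not the overall structure of the sketch.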
Figure 3. Density plots and QQ plots of the fitted model. A–C: density plots of the fitted model superimposed on the observed data histograms for each of the three data sets. D–F: QQ plots of the fitted model against the observed empirical distribution for each of the three data sets.
Kullback-Leibler Distance Between the Fitted Models (GNG or NU) and the Observed Data.
| K-L Distance | MCF7 | T47D | MDAMB361 |
|---|---|---|---|
| KL(GNG, Obs.) | 0.00055 | 0.00047 | 0.00086 |
| KL(NU, Obs.) | 0.022 | 0.017 | 0.026 |
| (KL(NU, Obs.) − KL(GNG, Obs.))/KL(NU, Obs.) | 0.97 | 0.97 | 0.97 |
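The last row of the table is the relative reduction in Kullback-Leibler distance obtained by the GNG model over the Normal-Uniform model. As a quick check of that arithmetic, the sketch below recomputes the ratio from the two distance rows; the dictionary and function names are illustrative and not taken from the article.

```python
# K-L distances copied from the table above.
kl_gng = {"MCF7": 0.00055, "T47D": 0.00047, "MDAMB361": 0.00086}
kl_nu = {"MCF7": 0.022, "T47D": 0.017, "MDAMB361": 0.026}

def relative_reduction(kl_nu_value, kl_gng_value):
    """Fraction of the Normal-Uniform K-L distance removed by the GNG fit."""
    return (kl_nu_value - kl_gng_value) / kl_nu_value

# One ratio per cell line, in the same order as the table columns.
reductions = {name: relative_reduction(kl_nu[name], kl_gng[name]) for name in kl_gng}

for name, value in reductions.items():
    print(f"{name}: {value:.4f}")
```

Each ratio comes out within about 0.01 of the tabulated 0.97, which is consistent with the two-decimal rounding used in the table.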
Figure 4. Venn diagram denoting the intersection between the sets of hypermethylated genes for each of the three experiments. The radius of each circle is relative to the number of genes in each set.