Paul Marjoram, Jing Chang, Peter W Laird, Kimberly D Siegmund.
Abstract
BACKGROUND: DNA methylation, a molecular feature used to investigate tumor heterogeneity, can be measured on many genomic regions using the MethyLight technology. Because of the underlying biology of DNA methylation combined with the MethyLight technology, the measurements, although generated on a continuous scale, contain a large number of 0 values. This suggests that conventional clustering methodology may not perform well on these data.
Year: 2006 PMID: 16872497 PMCID: PMC1555616 DOI: 10.1186/1471-2105-7-361
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Figure 1. Distribution of methylation values for 91 genes in 48 samples. A histogram of methylation values (PMR) is shown. PMR values were transformed using the natural log. Zeros were assigned a value of -5.5, a value slightly below the lowest log-transformed value. The x-axis shows the methylation value. The y-axis shows the percentage of values in that range.
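The transformation described in the Figure 1 caption can be sketched as follows. This is a minimal illustration and not the authors' code: the function names `log_pmr` and `percent_histogram` and the bin edges in the usage note are hypothetical, and only the natural log and the -5.5 placeholder for zeros come from the caption.

```python
import math

ZERO_FLOOR = -5.5  # value the caption assigns to PMR = 0


def log_pmr(values, zero_floor=ZERO_FLOOR):
    """Natural-log transform PMR values; zeros are mapped to zero_floor."""
    out = []
    for v in values:
        if v < 0:
            raise ValueError("PMR values are assumed to be non-negative")
        out.append(math.log(v) if v > 0 else zero_floor)
    return out


def percent_histogram(values, edges):
    """Percentage of values in each half-open bin [edges[i], edges[i+1])."""
    counts = [0] * (len(edges) - 1)
    for x in values:
        for i in range(len(edges) - 1):
            if edges[i] <= x < edges[i + 1]:
                counts[i] += 1
                break
    n = len(values)
    if n == 0:
        return [0.0] * len(counts)
    return [100.0 * c / n for c in counts]
```

For example, `percent_histogram(log_pmr(pmr), [-6, -5, 0, 2, 6])` would place every zero PMR value in the first bin, which is how a spike at -5.5 arises in a histogram of this kind.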
Figure 2. Mean of the log-transformed positive PMR. The mean of the log-transformed positive PMR values is plotted against the percentage of positive PMR values in (a) lung cancer cell lines (87 samples/21 genes; x: small cell-predicting genes, o: non-predicting genes) and (b) colon cancer tissue (48 samples/91 genes; x: CIMP-predicting genes, o: non-predicting genes).
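The two per-gene quantities plotted in Figure 2 can be computed along these lines. This is a sketch under stated assumptions: the function name `gene_summary` is hypothetical, and the natural log is assumed to match the transformation named in the Figure 1 caption.

```python
import math


def gene_summary(pmr_values):
    """Return (percent of positive PMR values, mean natural log of the positive values).

    The mean is None when a gene has no positive values, since the
    log of zero is undefined and there is nothing to average.
    """
    if not pmr_values:
        raise ValueError("at least one PMR value is required")
    positives = [v for v in pmr_values if v > 0]
    percent_positive = 100.0 * len(positives) / len(pmr_values)
    if not positives:
        return percent_positive, None
    mean_log = sum(math.log(v) for v in positives) / len(positives)
    return percent_positive, mean_log
```

Calling `gene_summary` once per gene across all samples yields one point per gene, with the percentage on one axis and the mean on the other.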
Misclassification rates for analysis of two observed data sets
| Method | Colon cancer error rate (Silhouette width/Dunn's index) | Lung cancer error rate (Silhouette width/Dunn's index) |
| Bernoulli-lognormal model | 0.04 (0.15/1.16) | 0.31 (0.29/1.19) |
| Single threshold model | 0.01 (0.17/1.20) | 0.30 (0.29/1.25) |
| k-means | 0.04 (0.17/1.18) | 0.38 (0.35/1.25) |
| PAM | 0.04 (0.17/1.18) | 0.30 (0.29/1.18) |
| HCD | 0.04 (0.15/1.18) | 0.38 (0.35/1.25) |
| MCLUST | 0.04 (0.14/1.12) | 0.36 (0.30/1.24) |
| SOM | 0.21 (0.12/1.14) | 0.37 (0.34/1.20) |
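An error rate for a two-group clustering, such as those tabulated above, can be computed as in the sketch below. The record does not say how cluster labels were matched to the known classes, so this example assumes binary labels and takes the better of the two possible label assignments. The function name `misclassification_rate` is hypothetical.

```python
def misclassification_rate(true_labels, cluster_labels):
    """Fraction of samples misassigned by a two-cluster solution.

    Cluster labels are arbitrary, so the rate is the minimum over the
    two ways of matching clusters to classes (assumption: labels are 0/1).
    """
    n = len(true_labels)
    if n == 0 or n != len(cluster_labels):
        raise ValueError("label lists must be non-empty and of equal length")
    mismatches = sum(t != c for t, c in zip(true_labels, cluster_labels))
    # Swapping the cluster labels turns every mismatch into a match and vice versa.
    return min(mismatches, n - mismatches) / n
```

With this definition, a clustering that recovers both groups perfectly but with the labels swapped has an error rate of 0 rather than 1.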
Misclassification rates (standard error) by the number of genes selected (48 samples)
| | | | Misclassification rate (SE) | | |
| Percentage of genes selected | No. of CIMP genes | No. of non-CIMP genes | Bernoulli-lognormal | Single threshold | k-means |
| 100% | 15 | 76 | 0.018 (0.020) | 0.006 (0.009) | 0.010 (0.002) |
| 80% | 12 | 61 | 0.030 (0.004) | 0.009 (0.002) | 0.020 (0.004) |
| 60% | 9 | 46 | 0.067 (0.007) | 0.022 (0.003) | 0.070 (0.013) |
| 40% | 6 | 30 | 0.147 (0.013) | 0.073 (0.009) | 0.140 (0.014) |
| 20% | 3 | 15 | 0.265 (0.012) | 0.177 (0.013) | 0.283 (0.016) |
| | | | 0.035 (0.004) | 0.020 (0.004) | 0.037 (0.005) |
Misclassification rates for different pairwise correlations within the two gene clusters (48 samples/91 genes)
| | Misclassification rate (SE) | | |
| Pairwise correlation | Bernoulli-lognormal | Single threshold | k-means |
| 0.01 | 0.024 (0.004) | 0.013 (0.004) | 0.024 (0.004) |
| 0.05 | 0.076 (0.008) | 0.062 (0.006) | 0.119 (0.012) |
| 0.1 | 0.155 (0.012) | 0.169 (0.015) | 0.220 (0.015) |
| 0.2 | 0.229 (0.013) | 0.235 (0.012) | 0.342 (0.012) |
Summary statistics for colorectal cancer data
| | CIMP group | Non-CIMP group |
| CIMP-predicting genes (N = 15) | 4.15 (92) | 0.95 (61) |
| Other genes (N = 76) | 3.02 (82) | 1.74 (72) |