E Andres Houseman, Brock C Christensen, Ru-Fang Yeh, Carmen J Marsit, Margaret R Karagas, Margaret Wrensch, Heather H Nelson, Joseph Wiemels, Shichun Zheng, John K Wiencke, Karl T Kelsey.
Abstract
BACKGROUND: Epigenetics is the study of heritable changes in gene function that cannot be explained by changes in DNA sequence. One of the most commonly studied epigenetic alterations is cytosine methylation, which is a well-recognized mechanism of epigenetic gene silencing and often occurs at tumor suppressor gene loci in human cancer. Arrays are now being used to study DNA methylation at a large number of loci; for example, the Illumina GoldenGate platform assesses DNA methylation at 1505 loci associated with over 800 cancer-related genes. Model-based cluster analysis is often used to identify DNA methylation subgroups in data, but it is unclear how to cluster DNA methylation data from arrays in a scalable and reliable manner.
Year: 2008 PMID: 18782434 PMCID: PMC2553421 DOI: 10.1186/1471-2105-9-365
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Classification error and computation time for various clustering methods applied to simulated data.
| J | HC | DynTree | HOPACH(best) | HOPACH(greedy) | MM(1–6) | RPMM (ICL-BIC) | RPMM (BIC) |
| Classification error | | | | | | | |
| 25 | 33.2 | 44.7 | 9.9 | 16.4 | 12.6 | 15.5 | 15.4 |
| 50 | 32.5 | 43.8 | 5.0 | 10.0 | 6.2 | 5.5 | 5.5 |
| 500 | 33.9 | 38.4 | 3.5 | 11.3 | 1.5 | 0.1 | 0.1 |
| 1000 | 34.0 | 38.5 | 9.2 | 14.4 | 1.1 | 0.1 | 0.1 |
| 5 | 59.4 | 60.5 | 65.1 | 65.8 | 59.4 | 59.4 | 59.4 |
| 10 | 58.9 | 60.0 | 66.9 | 67.5 | 59.2 | 59.2 | 59.2 |
| 25 | 30.0 | 39.6 | 4.1 | 8.1 | 0.0 | 0.0 | 0.0 |
| 50 | 29.9 | 39.6 | 3.6 | 6.4 | 0.3 | 0.3 | 0.3 |
| Computation time | | | | | | | |
| 25 | 0.00 | 0.04 | 4.15 | 1.18 | 36.39 | 13.80 | 13.83 |
| 50 | 0.01 | 0.05 | 3.29 | 1.09 | 51.14 | 14.23 | 14.23 |
| 500 | 0.03 | 0.08 | 2.98 | 1.04 | 436.82 | 90.99 | 91.05 |
| 1000 | 0.06 | 0.11 | 3.05 | 1.10 | 848.10 | 176.99 | 176.81 |
| 5 | 0.00 | 0.04 | 2.80 | 1.21 | 29.73 | 5.14 | 6.09 |
| 10 | 0.00 | 0.04 | 2.01 | 1.13 | 46.48 | 9.69 | 10.05 |
| 25 | 0.00 | 0.01 | 3.33 | 1.23 | 34.56 | 8.85 | 8.86 |
| 50 | 0.01 | 0.01 | 2.63 | 1.16 | 47.52 | 10.90 | 10.86 |
HC = Hierarchical clustering
DynTree = Hierarchical clustering with classes determined by dynamic tree cutting
HOPACH(best) = HOPACH with 'best' number of classes
HOPACH(greedy) = HOPACH with 'greedy' number of classes
MM(1–6) = Beta mixture model fitting 1–6 classes sequentially
RPMM (ICL-BIC) = Recursively partitioned mixture model employing ICL-BIC
RPMM (BIC) = Recursively partitioned mixture model employing BIC
J = Number of loci considered in analysis
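As a rough illustration of the two criteria named in the legend, the sketch below computes BIC and an ICL-BIC variant from a fitted mixture's log-likelihood and its posterior class-membership weights. It assumes the common "smaller is better" convention, BIC = -2 log L + p log n, and the entropy-penalized form ICL-BIC = BIC + 2 × entropy. The function names and toy inputs are hypothetical, and the definitions used in the paper may differ in detail.

```python
import math

def bic(log_lik, n_params, n_obs):
    """BIC in the 'smaller is better' convention (an assumption of this sketch)."""
    return -2.0 * log_lik + n_params * math.log(n_obs)

def classification_entropy(weights):
    """Entropy of posterior class-membership weights, one row per subject."""
    return -sum(w * math.log(w) for row in weights for w in row if w > 0.0)

def icl_bic(log_lik, n_params, weights):
    """BIC plus twice the classification entropy; penalizes fuzzy assignments."""
    return bic(log_lik, n_params, len(weights)) + 2.0 * classification_entropy(weights)

# Toy posterior weights for four subjects and two classes (made-up numbers).
crisp = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0], [0.0, 1.0]]
fuzzy = [[0.6, 0.4], [0.5, 0.5], [0.7, 0.3], [0.4, 0.6]]

bic_value = bic(-10.0, 3, 4)
icl_crisp = icl_bic(-10.0, 3, crisp)
icl_fuzzy = icl_bic(-10.0, 3, fuzzy)
```

When every subject is assigned to a class with weight 1, the entropy term vanishes and the two criteria coincide; with uncertain assignments ICL-BIC is strictly larger.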
Number of classes obtained for various clustering methods applied to simulated data
| DynTree | | | | | | | |
| 25 | 3 | 2.5 | 0.50 | 25 | 2 | 2.0 | 0.00 |
| 50 | 3 | 2.5 | 0.50 | 50 | 2 | 2.0 | 0.00 |
| 500 | 3 | 2.7 | 0.58 | 500 | 2 | 2.0 | 0.00 |
| 1000 | 3 | 2.8 | 0.59 | 1000 | 2 | 2.0 | 0.00 |
| HOPACH(best) | | | | | | | |
| 25 | 40 | 38.0 | 12.10 | 5 | 17 | 18.9 | 9.10 |
| 50 | 35 | 35.4 | 11.38 | 10 | 14 | 15.0 | 8.27 |
| 500 | 23 | 23.0 | 9.52 | 25 | 25 | 24.7 | 9.80 |
| 1000 | 23 | 23.1 | 9.47 | 50 | 25 | 25.3 | 7.34 |
| HOPACH(greedy) | | | | | | | |
| 25 | 8 | 13.4 | 14.41 | 5 | 5 | 7.1 | 6.35 |
| 50 | 6 | 11.9 | 12.66 | 10 | 5 | 7.1 | 7.11 |
| 500 | 5 | 6.6 | 5.19 | 25 | 7.5 | 10.8 | 8.52 |
| 1000 | 4 | 6.2 | 4.41 | 50 | 8 | 10.1 | 7.85 |
| RPMM | | | | | | | |
| 25 | 8 | 7.7 | 2.00 | 5 | 2 | 2.0 | 0.10 |
| 50 | 5 | 5.6 | 1.32 | 10 | 2 | 2.4 | 2.28 |
| 500 | 5 | 5.0 | 0.22 | 25 | 4 | 4.0 | 0.20 |
| 1000 | 5 | 5.0 | 0.00 | 50 | 4 | 4.1 | 0.58 |
DynTree = Hierarchical clustering with classes determined by dynamic tree cutting
HOPACH(best) = HOPACH with 'best' number of classes
HOPACH(greedy) = HOPACH with 'greedy' number of classes
RPMM = Recursively partitioned mixture model employing BIC
J = Number of loci considered in analysis
Figure 1. Profiles of latent classes among normal tissue samples. Average value (equation 1) depicted by color: yellow = 1.0, black = 0.5, blue = 0.0. Classes are separated by a yellow dividing line, with height indicating the relative proportion of subjects within each class. Loci are ordered by their position in a dendrogram obtained via hierarchical clustering.
Cross-classification of sample type with latent classes obtained from proposed method
| Class | bladder | blood (ad) | blood (nb) | brain | cervical | H & N | kidney | lung | placenta | pleura | sm intestine | Total |
| 000 | 3 | 2 | 12 | 8 | 3 | 28 | ||||||
| 0010 | 19 | 5 | 24 | |||||||||
| 0011 | 20 | 2 | 1 | 23 | ||||||||
| 0100 | 2 | 2 | 1 | 4 | 2 | 2 | 1 | 14 | ||||
| 01010 | 1 | 4 | 5 | |||||||||
| 0101100 | 3 | 3 | ||||||||||
| 0101101 | 3 | 3 | ||||||||||
| 010111 | 2 | 2 | ||||||||||
| 01100 | 1 | 1 | 2 | |||||||||
| 01101 | 5 | 5 | ||||||||||
| 0111 | 13 | 13 | ||||||||||
| 1000 | 3 | 3 | ||||||||||
| 100100 | 2 | 2 | ||||||||||
| 100101 | 4 | 4 | ||||||||||
| 1001100 | 3 | 3 | ||||||||||
| 1001101 | 4 | 4 | ||||||||||
| 100111 | 5 | 5 | ||||||||||
| 101 | 34 | 34 | ||||||||||
| 1100 | 18 | 18 | ||||||||||
| 1101 | 12 | 12 | ||||||||||
| 11100 | 5 | 5 | ||||||||||
| 11101 | 3 | 3 | ||||||||||
| 1111 | 1 | 1 | 2 | |||||||||
| Total | 5 | 30 | 55 | 12 | 3 | 11 | 6 | 53 | 19 | 18 | 5 | 217 |
Classes are labeled with the sequence vector representing the terminal node from which the class was derived.
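The class labels above can be read as paths through a binary tree. The sketch below assumes that each digit records which of two branches was taken at successive splits (the excerpt does not say what 0 and 1 mean individually) and checks that the 23 labels in the table are consistent with being the terminal nodes of one tree in which every split has two children. The helper names are hypothetical.

```python
from fractions import Fraction

# Class labels copied from the cross-classification table.
LABELS = [
    "000", "0010", "0011", "0100", "01010", "0101100", "0101101", "010111",
    "01100", "01101", "0111", "1000", "100100", "100101", "1001100",
    "1001101", "100111", "101", "1100", "1101", "11100", "11101", "1111",
]

def is_prefix_free(labels):
    """No label may be a prefix of another if all of them are terminal nodes."""
    ordered = sorted(labels)
    # In lexicographic order, a prefix is always immediately followed by one of its extensions.
    return all(not b.startswith(a) for a, b in zip(ordered, ordered[1:]))

def covers_full_tree(labels):
    """Terminal nodes of a tree with only two-way splits satisfy sum(2**-depth) == 1."""
    return sum(Fraction(1, 2 ** len(label)) for label in labels) == 1

def parent(label):
    """Label of the node that was split to produce this one ('' is the root)."""
    return label[:-1]

n_classes = len(LABELS)
max_depth = max(len(label) for label in LABELS)
```

Under this reading, classes 0101100 and 0101101 are siblings that share the parent node 010110, and the longest labels correspond to seven successive splits.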
Cross-classification of sample type with clusters obtained from HOPACH
| Class | bladder | blood (ad) | blood (nb) | brain | cervical | H & N | kidney | lung | placenta | pleura | sm intestine | Total |
| 1 | 30 | 1 | 31 | |||||||||
| 2 | 55 | 55 | ||||||||||
| 3 | 10 | 10 | ||||||||||
| 4 | 2 | 1 | 10 | 13 | ||||||||
| 5 | 5 | 2 | 6 | 53 | 18 | 5 | 89 | |||||
| 6 | 1 | 1 | ||||||||||
| 7 | 16 | 16 | ||||||||||
| 8 | 1 | 1 | ||||||||||
| 9 | 1 | 1 | ||||||||||
| Total | 5 | 30 | 55 | 12 | 3 | 11 | 6 | 53 | 19 | 18 | 5 | 217 |
Cross-classification of latent classes obtained from proposed method with clusters obtained from HOPACH
| Class | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | Total |
| 000 | 28 | 28 | ||||||||
| 0010 | 24 | 24 | ||||||||
| 0011 | 23 | 23 | ||||||||
| 0100 | 2 | 12 | 14 | |||||||
| 01010 | 5 | 5 | ||||||||
| 0101100 | 3 | 3 | ||||||||
| 0101101 | 3 | 3 | ||||||||
| 010111 | 1 | 1 | 2 | |||||||
| 01100 | 1 | 1 | 2 | |||||||
| 01101 | 4 | 1 | 5 | |||||||
| 0111 | 12 | 1 | 13 | |||||||
| 1000 | 3 | 3 | ||||||||
| 100100 | 2 | 2 | ||||||||
| 100101 | 4 | 4 | ||||||||
| 1001100 | 3 | 3 | ||||||||
| 1001101 | 4 | 4 | ||||||||
| 100111 | 5 | 5 | ||||||||
| 101 | 34 | 34 | ||||||||
| 1100 | 18 | 18 | ||||||||
| 1101 | 12 | 12 | ||||||||
| 11100 | 5 | 5 | ||||||||
| 11101 | 3 | 3 | ||||||||
| 1111 | 1 | 1 | 2 | |||||||
| Total | 31 | 55 | 10 | 13 | 89 | 1 | 16 | 1 | 1 | 217 |
Rows represent classes from proposed method, labeled with the sequence vector representing the terminal node from which the class was derived. Columns represent clusters from HOPACH.
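A cross-classification of this kind is a contingency table of two labelings of the same subjects. The sketch below shows one way to build it with the Python standard library; the labels are hypothetical examples for six subjects, not the study data, and the function names are illustrative.

```python
from collections import Counter

def cross_classify(row_labels, col_labels):
    """Count how many subjects fall in each (row class, column cluster) pair."""
    if len(row_labels) != len(col_labels):
        raise ValueError("both labelings must cover the same subjects")
    return Counter(zip(row_labels, col_labels))

def margins(table):
    """Row and column totals of a cross-classification."""
    rows, cols = Counter(), Counter()
    for (r, c), n in table.items():
        rows[r] += n
        cols[c] += n
    return rows, cols

# Hypothetical labels for six subjects (not taken from the study data).
rpmm_class = ["000", "000", "0010", "0100", "0100", "101"]
hopach_cluster = [5, 5, 5, 4, 5, 2]

table = cross_classify(rpmm_class, hopach_cluster)
row_totals, col_totals = margins(table)
```

Pairs that never occur are simply absent from the counter, which corresponds to the blank cells of the printed table.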
Cross-classification of sample type with latent classes obtained from proposed method among subjects within the 5th class obtained by HOPACH
| Class | bladder | cervical | kidney | lung | pleura | sm intestine | Total |
| 000 | 3 | 2 | 12 | 8 | 3 | 28 | |
| 0010 | 19 | 5 | 24 | ||||
| 0011 | 20 | 2 | 1 | 23 | |||
| 0100 | 2 | 1 | 4 | 2 | 2 | 1 | 12 |
| 010111 | 1 | 1 | |||||
| 01100 | 1 | 1 | |||||
| Total | 5 | 2 | 6 | 53 | 18 | 5 | 89 |
Classes are labeled with the sequence vector representing the terminal node from which the class was derived.
Figure 2. Unadjusted Average Beta values obtained from the Illumina GoldenGate methylation platform for 1413 tumor suppressor loci on 217 normal tissue samples. Yellow = 1.0, black = 0.5, blue = 0.0. Autosomal chromosomes are grouped to aid visualization. For each chromosome group, loci are ordered by their position in a dendrogram produced by hierarchical clustering. Similarly, within tissue sample groups, samples are ordered by their position in a hierarchical clustering dendrogram.
Figure 3. Examples of simulated data. Yellow = 1.0, black = 0.5, blue = 0.0. True classes are indicated and separated by a yellow dividing line. The height of each region indicates the relative number of subjects in each class.
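The figure captions share one color key: yellow = 1.0, black = 0.5, blue = 0.0. A minimal sketch of such a mapping follows, assuming linear interpolation between the three anchor colors; the captions give only the anchors, so the interpolation and the function name are assumptions.

```python
def beta_to_rgb(beta):
    """Map a value in [0, 1] to (r, g, b): blue at 0.0, black at 0.5, yellow at 1.0."""
    if not 0.0 <= beta <= 1.0:
        raise ValueError("beta must lie in [0, 1]")
    if beta < 0.5:
        # Fade from pure blue to black over the lower half of the scale.
        return (0.0, 0.0, 1.0 - 2.0 * beta)
    # Fade from black to pure yellow (red + green) over the upper half.
    level = 2.0 * beta - 1.0
    return (level, level, 0.0)
```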