Jonathan Kenyon, Gabrielle Nickel-Meester, Yulan Qing, Gabriela Santos-Guasch, Ellen Drake, Shuying Sun, Xiaodong Bai, David Wald, Eric Arts, Stanton L Gerson.
Abstract
Normal human hematopoietic stem and progenitor cells (HPC) lose expression of MLH1, an important mismatch repair (MMR) pathway gene, with age. Loss of MMR leads to replication-dependent mutational events and microsatellite instability observed in secondary acute myelogenous leukemia and other hematologic malignancies. Epigenetic CpG methylation upstream of the MLH1 promoter is a contributing factor to acquired loss of MLH1 expression in tumors of the epithelia and proximal mucosa. Using single-molecule high-throughput bisulfite sequencing, we have characterized the CpG methylation landscape from -938 to -337 bp upstream of the MLH1 transcriptional start site (position +0) in 30 hematopoietic colony-forming cell (CFC) clones either expressing or not expressing MLH1. We identify a correlation between MLH1 promoter methylation and loss of MLH1 expression. Additionally, using the CpG site methylation frequencies obtained in this study, we were able to generate a classification algorithm capable of sorting the expressing and non-expressing CFC. Thus, as has been previously described for many tumor cell types, we report for the first time a correlation between the loss of MLH1 expression and increased MLH1 promoter methylation in CFC derived from CD34+ selected hematopoietic stem and progenitor cells.
Keywords: Epigenetics; Hematopoietic stem cells; High throughput bisulfite sequencing; Mismatch repair
Year: 2016 PMID: 27570841 PMCID: PMC4996274 DOI: 10.23937/2469-570x/1410031
Source DB: PubMed Journal: Int J Stem Cell Res Ther ISSN: 2469-570X
Donor CFC number, barcode, and corresponding sequence frequency generated.
| Sample ID | Clone | Donor | Barcode sequence | MLH1 | Fragment 1 Methylated | Fragment 1 Unmethylated | Fragment 1 CpG Methylation | Fragment 2 Methylated | Fragment 2 Unmethylated | Fragment 2 CpG Methylation |
|---|---|---|---|---|---|---|---|---|---|---|
| BMA01 | BMA01-C13 | 73 | ATATCTCAA | - | 315 | 2175 | 0.145 | 106 | 4706 | 0.023 |
| BMA01 | BMA01-C14 | 73 | ACTAGGTCC | - | 726 | 1922 | 0.378 | 185 | 4220 | 0.044 |
| BMA01 | BMA01-C2 | 73 | CCACGGCCG | - | 494 | 1173 | 0.421 | 63 | 1907 | 0.033 |
| BMA01 | BMA01-C3 | 73 | TCCTTAGCT | - | 2829 | 8051 | 0.351 | 198 | 3721 | 0.053 |
| BMA01 | BMA01-C4 | 73 | GAGACGTAA | - | 2945 | 13298 | 0.221 | 483 | 2891 | 0.167 |
| BMA01 | BMA01-C8 | 73 | GTTATGTAT | - | 1400 | 2578 | 0.543 | 289 | 3385 | 0.085 |
| BMA01 | BMA01-C1 | 73 | GTTTACAGT | + | 2807 | 8399 | 0.334 | 1118 | 5777 | 0.194 |
| BMA01 | BMA01-C11 | 73 | CAATCCCTC | + | 299 | 1710 | 0.175 | 65 | 2599 | 0.025 |
| BMA01 | BMA01-C5 | 73 | ACGGCCCTA | + | 1292 | 5065 | 0.255 | 500 | 9305 | 0.054 |
| BMA01 | BMA01-C7 | 73 | TTGCTAGGT | + | 1548 | 7087 | 0.218 | 1634 | 6205 | 0.263 |
| BMA02 | BMA02-T1C1 | 42 | GAGTAGGCA | - | 2810 | 3381 | 0.831 | 139 | 2570 | 0.054 |
| BMA02 | BMA02-T1C4 | 42 | TACGCTGGA | - | 855 | 978 | 0.874 | 290 | 1092 | 0.266 |
| BMA02 | BMA02-T1C6 | 42 | GGGCCATTG | - | 171 | 439 | 0.390 | 20 | 374 | 0.053 |
| BMA02 | BMA02-T1C8 | 42 | CGTGTACGC | - | 254 | 801 | 0.317 | 47 | 846 | 0.056 |
| BMA02 | BMA02-T2C7 | 42 | GACACCGGT | - | 3751 | 3841 | 0.977 | 60 | 1346 | 0.045 |
| BMA02 | BMA02-T2C8 | 42 | ACCAACCTT | - | 3883 | 3968 | 0.979 | 52 | 1153 | 0.045 |
| BMA02 | BMA02-T3C8 | 42 | TACAGGTTT | - | 805 | 2976 | 0.270 | 147 | 2172 | 0.068 |
| BMA02 | BMA02-T1C12 | 42 | GTACTCATG | + | 297 | 4544 | 0.065 | 93 | 2114 | 0.044 |
| BMA02 | BMA02-T1C9 | 42 | ACTCCGAGT | + | 225 | 779 | 0.289 | 45 | 750 | 0.060 |
| BMA03 | BMA03-T1C1 | 47 | TCTCCACAG | - | 928 | 7605 | 0.122 | 137 | 4139 | 0.033 |
| BMA03 | BMA03-T2C1 | 47 | TGAGCATGG | - | 1301 | 6712 | 0.194 | 175 | 5350 | 0.033 |
| BMA03 | BMA03-T2C4 | 47 | CAGAGTGTT | - | 830 | 6879 | 0.121 | 80 | 3174 | 0.025 |
| BMA03 | BMA03-T2C5 | 47 | AACCGCGTT | - | 1354 | 7333 | 0.185 | 48 | 2573 | 0.019 |
| BMA03 | BMA03-T2C9 | 47 | TCTAATGTT | - | 844 | 4334 | 0.195 | 172 | 3073 | 0.056 |
| BMA03 | BMA03-T1C5 | 47 | AACCCAAGA | + | 1224 | 5789 | 0.211 | 113 | 3275 | 0.035 |
| BMA03 | BMA03-T2C7 | 47 | TCCTTCTGG | + | 6301 | 13041 | 0.483 | 166 | 2503 | 0.066 |
| BMA04 | BMA04-C13 | 74 | GTAGCCTCG | - | 2233 | 6501 | 0.343 | 516 | 3064 | 0.168 |
| BMA04 | BMA04-C14 | 74 | AATGGCTTA | - | 11820 | 37208 | 0.318 | 387 | 2595 | 0.149 |
| BMA04 | BMA04-C6 | 74 | TGCCGGATA | + | 1609 | 5991 | 0.269 | 429 | 2590 | 0.166 |
| BMA04 | BMA04-C7 | 74 | CCCAAGGTG | + | 2063 | 5976 | 0.345 | 474 | 2787 | 0.170 |
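The CpG Methylation columns can be recomputed from the count columns. The following minimal Python sketch uses three Fragment 1 rows copied from the table. It suggests that each reported value equals the Methylated count divided by the count in the adjacent Unmethylated column, rather than by the sum of the two counts. That reading is inferred from the numbers and is not stated in the source; the function name `reported_ratio` is hypothetical.

```python
# Fragment 1 rows copied from the table:
# (clone, methylated count, adjacent count column, reported CpG methylation)
ROWS = [
    ("BMA01-C13", 315, 2175, 0.145),
    ("BMA02-T2C8", 3883, 3968, 0.979),
    ("BMA04-C14", 11820, 37208, 0.318),
]


def reported_ratio(methylated: int, other: int) -> float:
    """Ratio that appears to match the table's CpG Methylation column.

    Assumption: the value is methylated / other, inferred from the data only.
    """
    return methylated / other


for clone, methylated, other, reported in ROWS:
    ratio = reported_ratio(methylated, other)
    # The recomputed ratio agrees with the tabulated value to within rounding.
    assert abs(ratio - reported) < 0.001, clone
    print(f"{clone}: recomputed {ratio:.3f}, table {reported:.3f}")
```

If the second column held only unmethylated reads, a methylation frequency would normally be computed as methylated / (methylated + unmethylated), which does not reproduce the tabulated values.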
Figure 1. A) An illustration of the MLH1 promoter region identifying the MLH1 transcriptional start site, CpG residues, CCAAT box, and primer binding locations. CpG sites are numbered from the transcription start site located at position 0 (NCBI Homo sapiens chromosome 3 genomic contig, GRCh37.p9 Primary Assembly, Reference Sequence NT_022517.18). Fragment 1 CpG residues are located at: -896, -884, -872, -809, -807, -786, -776, -765, -731, -722, -714, -708, -694, -692, -690, -686, -683, -679, -669, -665, -656, -644, -636, -629, -626, -624, -620, -618, -608, -600, -597, -572, -565, -543, -530, -525, -509, and -506 bp. Fragment 2 CpG residues are located at: -572, -565, -543, -530, -525, -509, -506, -481, -465, -449, -428, -400, -384, -377, -345, and -339 bp. B) Bias-corrected CpG methylation frequency is depicted as a heat map, each block representing the frequency of methylation at a single CpG within a single CFC sample. The methylation frequencies at CpG residues are read from right to left along the horizontal axis. Each row represents a unique CFC and each column represents a specific CpG. On the frequency scale, yellow is equivalent to a frequency of 1.0, blue to a frequency of 0.5, and black to a frequency of 0.0. C) A t-test comparison of the mean frequency of all non-CpG site methylation events to the total frequency of methylation at CpG residues.
Primer sequences used. The A-linker (forward) and B-linker (reverse) adapter sequences are each followed by a unique 9 bp [BARCODE] and the Fragment 1- or Fragment 2-specific forward or reverse primer.
| Name | Sequence 5′ to 3′ |
|---|---|
| MLH1-1f | ACTCAAAATCCTCTACCTTATAATATC |
| MLH1-1r | TTAAAAGAAGTAAGATGGAAG |
| MLH1-2f | ACAAACCAAACACAAAACCCCAT |
| MLH1-2r | TTTAGTTAATAGGAGTAGAGATG |
| A-linker adapter forward | CGTATCGCCTCCCTCGCGCCATCAG[BARCODE][MLH1-1f or MLH1-2f] |
| B-linker adapter reverse | CTATGCGCCTTGCCAGCCCGCTCAG[BARCODE][MLH1-1r or MLH1-2r] |
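The adapter, barcode, and primer layout described for this table can be sketched as simple string concatenation. This is an illustrative Python sketch only: the helper name `fusion_primer` is hypothetical, and pairing the same barcode on the forward and reverse primers of one clone is an assumption that the source does not confirm.

```python
# Sequences copied from the primer table.
A_LINKER = "CGTATCGCCTCCCTCGCGCCATCAG"
B_LINKER = "CTATGCGCCTTGCCAGCCCGCTCAG"
PRIMERS = {
    "MLH1-1f": "ACTCAAAATCCTCTACCTTATAATATC",
    "MLH1-1r": "TTAAAAGAAGTAAGATGGAAG",
    "MLH1-2f": "ACAAACCAAACACAAAACCCCAT",
    "MLH1-2r": "TTTAGTTAATAGGAGTAGAGATG",
}


def fusion_primer(adapter: str, barcode: str, primer: str) -> str:
    """Join adapter + 9 bp barcode + fragment-specific primer (5' to 3')."""
    if len(barcode) != 9 or set(barcode) - set("ACGT"):
        raise ValueError("barcode must be 9 bases drawn from A, C, G, T")
    return adapter + barcode + primer


# Barcode of clone BMA01-C13, taken from the sample table.
barcode = "ATATCTCAA"
forward = fusion_primer(A_LINKER, barcode, PRIMERS["MLH1-1f"])
# Assumption: the reverse primer carries the same barcode as the forward one.
reverse = fusion_primer(B_LINKER, barcode, PRIMERS["MLH1-1r"])
print(forward)
print(reverse)
```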
Figure 2. T-test comparison of the average frequency of CpG methylation at the CpG residues of MLH1 non-expressing CFC (n=20) compared to the average CpG methylation at the CpG residues of expressing CFC (n=10) in A) Fragment 1 and B) Fragment 2. Expressing CFC have a significantly lower average frequency of CpG methylation than observed in MLH1 non-expressing CFC.
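A group comparison of this kind can be illustrated with the per-clone Fragment 1 values from the sample table. The sketch below is hedged in two ways: the source does not say which t-test variant was used, so Welch's unequal-variance statistic is an assumption, and the figure compares per-CpG frequencies, so these per-clone aggregates will not reproduce the published statistic. No p-value is computed.

```python
from math import sqrt
from statistics import mean, variance

# Fragment 1 CpG Methylation values copied from the sample table.
NON_EXPRESSING = [
    0.145, 0.378, 0.421, 0.351, 0.221, 0.543,          # BMA01, MLH1 -
    0.831, 0.874, 0.390, 0.317, 0.977, 0.979, 0.270,   # BMA02, MLH1 -
    0.122, 0.194, 0.121, 0.185, 0.195,                 # BMA03, MLH1 -
    0.343, 0.318,                                      # BMA04, MLH1 -
]
EXPRESSING = [
    0.334, 0.175, 0.255, 0.218,  # BMA01, MLH1 +
    0.065, 0.289,                # BMA02, MLH1 +
    0.211, 0.483,                # BMA03, MLH1 +
    0.269, 0.345,                # BMA04, MLH1 +
]


def welch_t(a: list, b: list) -> float:
    """Welch's t statistic for two independent samples (unequal variances)."""
    standard_error = sqrt(variance(a) / len(a) + variance(b) / len(b))
    return (mean(a) - mean(b)) / standard_error


print(f"mean non-expressing: {mean(NON_EXPRESSING):.3f}")
print(f"mean expressing:     {mean(EXPRESSING):.3f}")
print(f"Welch t statistic:   {welch_t(NON_EXPRESSING, EXPRESSING):.2f}")
```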
Figure 3. A) An illustration of the MLH1 promoter region identifying the MLH1 transcriptional start site, CpG residues, CCAAT box, and primer binding locations. CART analysis of B) Fragment 1 and C) the combination of Fragments 1 and 2, showing clustering of similar CpG methylation frequency patterns. Red arrows (→) indicate misidentified CFC and black vertical arrows (↓) indicate CpG residues identified by CART analysis.
Figure 4. CART decision algorithms generated with CpG methylation frequencies from A) Fragment 1 (f1) and B) Fragment 2 (f2) combined. Each branch node (ellipse) defines a branch point that filters CFC into progressively more homogeneous classes. The terminal nodes (rectangles) indicate that no further partitioning is necessary (either the size of the node is small or the node is sufficiently homogeneous). Branch nodes are labeled with the majority CFC expression identity: A indicates that the majority of CFC classified within a node lack MLH1 expression, while P indicates that the majority of CFC express MLH1. The misclassification ratio is indicated below each branch and terminal node. The segregating parameter is indicated by the CpG residue location followed by an inequality statement and a CpG methylation value indicating the optimal threshold between the two nodes.
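The tree structure described in this caption (branch nodes that test one CpG site against a threshold, terminal nodes labeled A or P) can be sketched as a small data structure. Everything specific in the example tree is hypothetical: the chosen sites, the thresholds 0.30 and 0.10, and the branch directions are placeholders for illustration and are not the values reported in the figure. The class and function names are also illustrative.

```python
from dataclasses import dataclass
from typing import Dict, Union


@dataclass
class Leaf:
    label: str  # "A" = MLH1 expression absent, "P" = MLH1 expression present


@dataclass
class Branch:
    site: str         # CpG residue key, e.g. "f1_-626" (fragment and position)
    threshold: float  # methylation frequency separating the two child nodes
    below: Union["Branch", Leaf]  # followed when frequency < threshold
    above: Union["Branch", Leaf]  # followed when frequency >= threshold


def classify(node: Union[Branch, Leaf], freqs: Dict[str, float]) -> str:
    """Walk from the root to a terminal node and return its label."""
    while isinstance(node, Branch):
        node = node.below if freqs[node.site] < node.threshold else node.above
    return node.label


# HYPOTHETICAL tree: sites, thresholds, and labels are placeholders only.
TREE = Branch(
    site="f1_-626",
    threshold=0.30,
    below=Leaf("P"),
    above=Branch(site="f2_-481", threshold=0.10, below=Leaf("P"), above=Leaf("A")),
)

# A made-up clone with high methylation at both sites is sorted into class A.
print(classify(TREE, {"f1_-626": 0.60, "f2_-481": 0.50}))
```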