Lillian Sun, Surya Namboodiri, Emily Chen, Shuying Sun.
Abstract
DNA methylation plays a significant role in regulating the expression of certain genes in both cancerous and normal breast tissues. It is therefore important to study within-sample co-methylation, ie, methylation patterns between consecutive sites in a chromosome. In this article, we develop 2 new methods to compare co-methylation patterns between normal and cancerous breast samples. In particular, we investigate the co-methylation patterns of 4 different methylation states/levels separately. Using these 2 methods, we address the following questions: How often does 1 methylation state change to another methylation state, and how does this change depend on chromosome distance? What co-methylation patterns do normal and cancerous breast samples have? Do genomic sites with different methylation states/levels have different co-methylation patterns? Our results show that cancerous and normal co-methylation patterns are significantly different. This difference exists even when the physical distance between 2 sites is less than 50 bases. Breast cancer cell lines tend to remain in the same methylation state more often than normal samples, especially for the no/low or high/full methylation states. We also find that the co-methylation region lengths for the various methylation states (no/low, partial, and high/full methylation states) are very different. For example, the co-methylation regions for partial methylation are shorter than the unmethylated or fully methylated regions. Our research may provide a deeper understanding of co-methylation patterns. These co-methylation patterns will aid in discovering and understanding new methylation events that may be related to novel biomarkers.
Keywords: Within-sample co-methylation; bioinformatics; breast cancer
Year: 2019 PMID: 31631960 PMCID: PMC6778999 DOI: 10.1177/1176935119880516
Source DB: PubMed Journal: Cancer Inform ISSN: 1176-9351
Sample section of chromosome 1 breast cancer data.
| chr | Position | Sequence coverage | MC ratio | Methylation state | Distance |
|---|---|---|---|---|---|
| chr1 | 534314 | 6 | 0.666667 | C | 12 |
| chr1 | 534326 | 3 | 1 | D | 3 |
| chr1 | 534329 | 4 | 1 | D | 14 |
| chr1 | 534343 | 4 | 0.25 | B | 17 |
| chr1 | 534360 | 4 | 0.75 | D | 45 |
| chr1 | 534405 | 3 | 0.666667 | C | 31 |
| chr1 | 534436 | 0 | NA | NA | 108 |
Abbreviation: MC, methylation ratio.
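The Distance column in the sample table appears to be the gap, in bases, between a CG site and the next CG site on the chromosome. A minimal sketch under that assumption follows; the function name `distances_to_next` is illustrative and does not come from the article.

```python
# Positions copied from the sample table above (chromosome 1).
positions = [534314, 534326, 534329, 534343, 534360, 534405, 534436]

def distances_to_next(pos):
    """Return the gap in bases between each site and the site that follows it."""
    return [b - a for a, b in zip(pos, pos[1:])]

gaps = distances_to_next(positions)
print(gaps)
```

The computed gaps can be compared with the first six Distance entries in the table. The last entry (108) depends on the next CG site, which is not shown in the excerpt.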
Frequency of each methylation state change between consecutive CG Sites.
| Sample | Distance | State (%) | A | B | C | D |
|---|---|---|---|---|---|---|
| Normal breast tissue (HMEC) | [0, 200) | A | 74.023 | 6.737 | 6.401 | 12.839 |
| | | B | 19.504 | 15.226 | 18.13 | 47.14 |
| | | C | 11.33 | 10.798 | 18.947 | 58.925 |
| | | D | 3.726 | 4.656 | 9.664 | 81.953 |
| | [200, 500) | A | 25.517 | 13.172 | 17.084 | 44.226 |
| | | B | 17.459 | 13.454 | 18.274 | 50.813 |
| | | C | 14.065 | 11.868 | 17.02 | 57.047 |
| | | D | 8.4 | 7.476 | 12.754 | 71.37 |
| | [500, Inf) | A | 20.769 | 13.505 | 18.161 | 47.565 |
| | | B | 17.769 | 14.098 | 18.916 | 49.218 |
| | | C | 15.841 | 12.355 | 16.92 | 54.884 |
| | | D | 9.554 | 7.78 | 12.704 | 69.962 |
| Cancerous breast tissue (HCC1954) | [0, 200) | A | 78.109 | 8.521 | 5.177 | 8.192 |
| | | B | 29.762 | 21.303 | 18.234 | 30.701 |
| | | C | 13.009 | 13.128 | 20.433 | 53.43 |
| | | D | 2.802 | 2.941 | 7.187 | 87.07 |
| | [200, 500) | A | 61.845 | 12.098 | 8.824 | 17.233 |
| | | B | 32.744 | 15.121 | 15.946 | 36.189 |
| | | C | 18.333 | 11.779 | 16.348 | 53.54 |
| | | D | 6.382 | 5.335 | 9.774 | 78.509 |
| | [500, Inf) | A | 63.188 | 12.141 | 8.801 | 15.869 |
| | | B | 34.707 | 16.363 | 15.401 | 33.53 |
| | | C | 20 | 13.03 | 16.145 | 50.825 |
| | | D | 7.72 | 5.704 | 10.39 | 76.186 |
| Cancerous breast tissue (MCF7) | [0, 200) | A | 80.565 | 7.379 | 5.035 | 7.02 |
| | | B | 25.821 | 22.294 | 20.165 | 31.719 |
| | | C | 10.358 | 11.716 | 22.022 | 55.904 |
| | | D | 1.71 | 2.219 | 6.713 | 89.357 |
| | [200, 500) | A | 53.536 | 13.723 | 12.088 | 20.654 |
| | | B | 28.046 | 14.891 | 17.737 | 39.326 |
| | | C | 16.117 | 11.487 | 17.55 | 54.845 |
| | | D | 5.562 | 5.336 | 11.264 | 77.838 |
| | [500, Inf) | A | 57.04 | 13.038 | 11.614 | 18.308 |
| | | B | 31.504 | 15.667 | 17.877 | 34.953 |
| | | C | 18.535 | 12.939 | 18.586 | 49.941 |
| | | D | 7.735 | 6.562 | 13.143 | 72.56 |
Abbreviation: HMEC, human mammary epithelial cell.
Distance is the number of base pairs between 2 consecutive CG sites. Cells show the percentages of CG sites with a specific methylation state remaining the same or changing to a different state in the next consecutive CG site.
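The percentages described above can be thought of as row-normalized counts of state pairs at consecutive CG sites, grouped by distance bin. The sketch below is one possible way to compute them and is not the authors' code. It assumes that sites are given as `(position, state)` tuples sorted by position and that pairs involving an NA state (here `None`) are skipped; the article excerpt does not say how NA sites were handled. All function and variable names are hypothetical.

```python
from collections import defaultdict

STATES = ["A", "B", "C", "D"]
BINS = [(0, 200), (200, 500), (500, float("inf"))]  # same intervals as the table

def bin_label(distance):
    """Map a distance in bases to its half-open interval label."""
    for lo, hi in BINS:
        if lo <= distance < hi:
            return f"[{lo}, {hi})"
    raise ValueError(f"negative distance: {distance}")

def transition_percentages(sites):
    """sites: list of (position, state) sorted by position; state None means NA."""
    counts = defaultdict(lambda: defaultdict(lambda: defaultdict(int)))
    for (p1, s1), (p2, s2) in zip(sites, sites[1:]):
        if s1 is None or s2 is None:
            continue  # assumption: pairs that touch an NA site are ignored
        counts[bin_label(p2 - p1)][s1][s2] += 1
    result = {}
    for label, rows in counts.items():
        result[label] = {}
        for s1, row in rows.items():
            total = sum(row.values())
            # Each row sums to 100: the share of s1 sites followed by each state.
            result[label][s1] = {s2: 100.0 * row[s2] / total for s2 in STATES}
    return result

# Rows of the sample chromosome 1 table (position, methylation state).
sample = [(534314, "C"), (534326, "D"), (534329, "D"), (534343, "B"),
          (534360, "D"), (534405, "C"), (534436, None)]
table = transition_percentages(sample)
print(table["[0, 200)"]["D"])
```

With only seven sites the percentages are not meaningful; the example only illustrates the bookkeeping.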
P-values of chr1 chi-square test with smaller distance restrictions.
| Distance interval | A | B | C | D |
|---|---|---|---|---|
| 0-10 | 0 | 0 | 7.63E–291 | 0 |
| 10-20 | 5.41E–283 | 0 | 2.10E–155 | 0 |
| 20-30 | 1.46E–255 | 1.30E–244 | 5.16E–72 | 0 |
| 30-40 | 1.06E–282 | 4.12E–210 | 8.51E–42 | 0 |
| 40-50 | 4.78E–252 | 1.29E–122 | 4.21E–21 | 0 |
| 50-60 | 1.28E–240 | 4.22E–95 | 1.38E–13 | 0 |
| 60-70 | 2.47E–235 | 1.68E–84 | 4.72E–11 | 0 |
| 70-80 | 3.46E–239 | 5.96E–55 | 7.17E–15 | 3.45E–295 |
| 80-90 | 1.58E–222 | 2.42E–49 | 4.62E–08 | 6.62E–228 |
| 90-100 | 5.45E–211 | 1.48E–52 | 6.53E–07 | 3.84E–206 |
| 100-Inf | 0 | 0 | 1.72E–99 | 0 |
Each “E–number” means “× 10 raised to the power –number.” For example, 5.41E–283 = $5.41 \times 10^{-283}$. Shown are test results using data with restrictions on distance between CG sites. Distance levels are shown in the first column. All distance intervals show a significant difference between normal and cancerous data.
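When the P-values in this table are read into a program, the typeset minus sign has to be normalized first. The helper below is a small illustrative sketch (the name `parse_p_value` is hypothetical). It also shows that a value below the double-precision range underflows to zero, which is one plausible reason, though not one stated in the article, that some cells are reported as 0.

```python
def parse_p_value(text):
    """Convert a table entry such as '5.41E–283' into a float."""
    # Normalize typeset minus signs (en dash U+2013, minus U+2212) to ASCII '-'.
    cleaned = text.replace("\u2013", "-").replace("\u2212", "-")
    return float(cleaned)

p = parse_p_value("5.41E\u2013283")
zero_cell = parse_p_value("0")
tiny = float("1e-400")  # smaller than any positive double, so it becomes 0.0
print(p, zero_cell, tiny)
```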
Figure 1. Methylation state changes between consecutive CG sites based on 50 base intervals. HMEC indicates human mammary epithelial cell.
This figure displays the percentage of methylation state changes with an initial methylation state of A, B, C, and D. The horizontal axis is the distance between consecutive CG sites. The vertical axis is the percentage of occurrence on a 0 to 1 scale. The lines of 3 colors are for HMEC (yellow), HCC1954 (gray), and MCF7 (brown), respectively.
Figure 2. Methylation state changes between consecutive CG sites based on 10 base intervals. HMEC indicates human mammary epithelial cell.
The figure displays the percentage of methylation state changes with an initial methylation state of A, B, C, and D. The horizontal axis is the distance between consecutive CG sites. The vertical axis is the percentage of occurrence on a 0 to 1 scale. The lines of 3 colors are for HMEC (yellow), HCC1954 (gray), and MCF7 (brown), respectively.
Methylation state changes in chromosome 1 for normal and cancerous data.
| Sample | Initial state | A | B | C | D |
|---|---|---|---|---|---|
| Normal (HMEC) | A count | 233 808 | 24 568 | 24 753 | 52 697 |
| | A% | 69.622 | 7.316 | 7.371 | 15.692 |
| | B count | 24 443 | 19 061 | 23 208 | 60 942 |
| | B% | 19.148 | 14.932 | 18.18 | 47.74 |
| | C count | 24 709 | 22 917 | 38 791 | 121 969 |
| | C% | 11.857 | 10.997 | 18.615 | 58.53 |
| | D count | 52 596 | 60 721 | 121 570 | 973 855 |
| | D% | 4.351 | 5.023 | 10.058 | 80.568 |
| Cancerous (HCC1954) | A count | 328 514 | 39 165 | 24 707 | 40 949 |
| | A% | 75.80 | 9.00 | 5.70 | 9.50 |
| | B count | 39 096 | 26 029 | 22 900 | 40 656 |
| | B% | 30.40 | 20.20 | 17.80 | 31.60 |
| | C count | 24 581 | 22 795 | 34 766 | 93 966 |
| | C% | 14.00 | 12.90 | 19.70 | 53.40 |
| | D count | 40 813 | 40 416 | 93 585 | 1 068 796 |
| | D% | 3.30 | 3.30 | 7.50 | 85.90 |
| Cancerous (MCF7) | A count | 236 253 | 25 277 | 18 400 | 27 103 |
| | A% | 76.947 | 8.233 | 5.993 | 8.827 |
| | B count | 25 372 | 19 960 | 18 880 | 31 782 |
| | B% | 26.431 | 20.793 | 19.668 | 33.108 |
| | C count | 18 502 | 18 807 | 34 024 | 89 049 |
| | C% | 11.537 | 11.726 | 21.214 | 55.523 |
| | D count | 26 903 | 31 898 | 89 217 | 1 079 114 |
| | D% | 2.192 | 2.599 | 7.27 | 87.938 |
Abbreviation: HMEC, human mammary epithelial cell.
The number in each cell represents the count or percentage of CG sites that display the specific methylation change.
0-Inf chromosome 1 chi-square test results.
| Value | A | B | C | D |
|---|---|---|---|---|
| P-value | ~0 | ~0 | ~0 | ~0 |
| Chi-square | 11 756.38 | 9951.727 | 1570.923 | 30 424.83 |
The input of this chi-square test is shown in Table 3 with no restriction on the distance between CG sites. A, B, C, and D columns represent the results of tests conducted for each of the 4 methylation states.
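To make the test setup concrete, the sketch below computes a Pearson chi-square statistic for one initial state. It assumes, without confirmation from this excerpt, that the test for state A is run on the 3 × 4 table formed by the "A count" rows of the three samples (HMEC, HCC1954, MCF7) in the state-change table. The function `chi_square` is a plain-Python illustration, not the authors' implementation.

```python
def chi_square(table):
    """Pearson chi-square statistic and degrees of freedom for an r x c count table."""
    row_totals = [sum(r) for r in table]
    col_totals = [sum(c) for c in zip(*table)]
    grand = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of sample and next state.
            expected = row_totals[i] * col_totals[j] / grand
            stat += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return stat, dof

# "A count" rows from the chromosome 1 state-change table (assumed test input).
state_a = [
    [233808, 24568, 24753, 52697],  # HMEC
    [328514, 39165, 24707, 40949],  # HCC1954
    [236253, 25277, 18400, 27103],  # MCF7
]
stat, dof = chi_square(state_a)
print(round(stat, 2), dof)
```

The printed statistic can be compared with the value reported for column A. With counts this large, even modest differences in percentages yield very large statistics and P-values that are numerically indistinguishable from 0.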
Co-methylation region length summary.
| Sample | State | Minimum | First quartile | Median | Mean | Third quartile | Maximum |
|---|---|---|---|---|---|---|---|
| Normal data (HMEC) | A | 2 | 42 | 116 | 223.6 | 288 | 5493 |
| | B | 2 | 22 | 61 | 132.7 | 161 | 2576 |
| | C | 2 | 24 | 63 | 133.2 | 162 | 3278 |
| | D | 2 | 55 | 189 | 394.3 | 499 | 13 700 |
| Cancerous data (HCC1954) | A | 2 | 62 | 206 | 503.4 | 573 | 18 750 |
| | B | 2 | 16 | 46 | 125.5 | 147 | 2805 |
| | C | 2 | 17 | 51 | 121.9 | 147 | 2013 |
| | D | 2 | 66 | 247 | 584.3 | 711 | 100 300 |
| Cancerous data (MCF7) | A | 2 | 67 | 207 | 394.3 | 504 | 11 780 |
| | B | 2 | 20 | 56 | 136.4 | 168 | 3347 |
| | C | 2 | 22 | 62 | 138.7 | 175 | 2473 |
| | D | 2 | 75 | 275 | 565.5 | 728 | 18 410 |
Abbreviation: HMEC, human mammary epithelial cell.
Shown are co-methylation region length summaries for normal and cancerous data separately.
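One way to picture a co-methylation region is as a maximal run of consecutive CG sites that share the same methylation state. The sketch below extracts such runs under two assumptions that this excerpt does not confirm: a region needs at least 2 CG sites, and its length is the position of the last site minus the position of the first (which would be consistent with the reported minimum of 2). The function name is hypothetical.

```python
from itertools import groupby

def comethylation_regions(sites):
    """sites: list of (position, state) sorted by position; state None means NA.

    Returns a list of (state, length_in_bases, number_of_sites) per region.
    """
    regions = []
    for state, run in groupby(sites, key=lambda s: s[1]):
        run = list(run)
        if state is None or len(run) < 2:
            continue  # assumption: single sites and NA sites do not form regions
        regions.append((state, run[-1][0] - run[0][0], len(run)))
    return regions

# Rows of the sample chromosome 1 table (position, methylation state).
sample = [(534314, "C"), (534326, "D"), (534329, "D"), (534343, "B"),
          (534360, "D"), (534405, "C"), (534436, None)]
regions = comethylation_regions(sample)
print(regions)
```

Summary statistics such as the median or quartiles per state could then be taken over the `length_in_bases` values, for example with the standard `statistics` module.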
Wilcoxon rank sum test results.
| Measure | Statistic | A | B | C | D |
|---|---|---|---|---|---|
| Co-methylation region length | P-value | 0 | 1.42E–40 | 9.05E–69 | 0 |
| | Chi-square | 3583.585 | 183.5097 | 313.3517 | 4959.549 |
| Number of CG sites in each co-methylation region | P-value | 1.07E–157 | 5.80E–71 | 2.74E–21 | 0 |
| | Chi-square | 722.87 | 323.4529 | 94.69338 | 6575.451 |
Summary of number of CG sites in different co-methylation regions.
| Sample | State | 2 CGs | 3 CGs | 4 CGs | 5 CGs | 6 CGs | >6 CGs |
|---|---|---|---|---|---|---|---|
| HMEC | A count | 16 680 | 5964 | 2922 | 1680 | 1153 | 7620 |
| | A% | 46.3 | 16.6 | 8.1 | 4.7 | 3.2 | 21.2 |
| | B count | 13 147 | 2115 | 378 | 92 | 16 | 14 |
| | B% | 83.4 | 13.4 | 2.4 | 0.6 | 0.1 | 0.1 |
| | C count | 24 954 | 4767 | 968 | 225 | 55 | 32 |
| | C% | 80.5 | 15.4 | 3.1 | 0.7 | 0.2 | 0.1 |
| | D count | 58 092 | 37 016 | 24 969 | 17 915 | 13 088 | 54 715 |
| | D% | 28.2 | 18 | 12.1 | 8.7 | 6.4 | 26.6 |
| HCC1954 | A count | 19 413 | 9316 | 5487 | 3551 | 2640 | 13 048 |
| | A% | 36.3 | 17.4 | 10.3 | 6.6 | 4.9 | 24.4 |
| | B count | 15 005 | 3301 | 844 | 255 | 90 | 60 |
| | B% | 76.7 | 16.9 | 4.3 | 1.3 | 0.5 | 0.3 |
| | C count | 20 892 | 4354 | 1045 | 263 | 99 | 65 |
| | C% | 78.2 | 16.3 | 3.9 | 1 | 0.4 | 0.2 |
| | D count | 36 875 | 23 435 | 16 334 | 12 360 | 9402 | 54 911 |
| | D% | 24.1 | 15.3 | 10.7 | 8.1 | 6.1 | 35.8 |
| MCF7 | A count | 16 173 | 7922 | 4562 | 2799 | 1888 | 7995 |
| | A% | 39.1 | 19.2 | 11 | 6.8 | 4.6 | 19.3 |
| | B count | 11 204 | 2437 | 619 | 211 | 68 | 92 |
| | B% | 76.6 | 16.7 | 4.2 | 1.4 | 0.5 | 0.6 |
| | C count | 20 012 | 4283 | 1032 | 290 | 92 | 87 |
| | C% | 77.6 | 16.6 | 4 | 1.1 | 0.4 | 0.3 |
| | D count | 34 363 | 21 931 | 15 550 | 11 632 | 9066 | 56 394 |
| | D% | 23.1 | 14.7 | 10.4 | 7.8 | 6.1 | 37.9 |
Abbreviation: HMEC, human mammary epithelial cell.
Rows 2 to 9 are for the normal sample HMEC. Rows 10 to 17 are for the cancer cell line HCC1954. Rows 18 to 25 are for the cancer cell line MCF7. The top row gives the number of CG sites in each type of co-methylation region, ie, 2 CG sites, 3 CG sites, and so on. For each sample, “A count” and “A%” are the total number and percentage of co-methylation regions of methylation state “A.” For example, for the HMEC sample, in the “2 CGs” column, “A count” is 16 680, and “A%” is 46.3. These 2 numbers mean that, among all the AA . . . A type co-methylation regions, 16 680, ie, 46.3%, have only 2 CGs.
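The relationship between a "count" row and its "%" row can be checked directly: each percentage is the count divided by the row total. The short sketch below does this for the HMEC "A count" row; the helper name `percentages` is illustrative.

```python
LABELS = ["2 CGs", "3 CGs", "4 CGs", "5 CGs", "6 CGs", ">6 CGs"]

# HMEC "A count" row from the table above.
hmec_a_counts = [16680, 5964, 2922, 1680, 1153, 7620]

def percentages(counts):
    """Share of regions in each size class, in percent, rounded to 1 decimal."""
    total = sum(counts)
    return [round(100.0 * c / total, 1) for c in counts]

hmec_a_pct = percentages(hmec_a_counts)
for label, pct in zip(LABELS, hmec_a_pct):
    print(f"{label}: {pct}")
```

The resulting values agree with the "A%" row for HMEC, so the percentages are taken within each state and sample rather than across all regions.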
Figure 3. Comparison of co-methylation patterns of 11 samples. HMEC indicates human mammary epithelial cell.
In each plot, the 11 lines represent the 11 samples/tissues (HMEC, HCC1954, MCF7, and the 8 tissues of the STL0001). Each plot is for a state pair, ie, AA, AB, AC, AD, BA, BB, and so on.