Shuying Sun, Yu Ri Lee, Brittany Enfield.
Abstract
DNA methylation is an epigenetic event that adds a methyl group to a cytosine (C), especially a cytosine that is followed by a guanine (G) (ie, a CG or CpG site), in a human genome. This event plays an important role in both cancerous and normal cell development. Previous studies often assume symmetric methylation on both DNA strands. However, asymmetric methylation, or hemimethylation (methylation that occurs on only 1 DNA strand), does exist and has been reported in several studies. Because of the limitations of earlier DNA methylation sequencing technologies, researchers could study hemimethylation only on specific genes, and the overall genomic hemimethylation landscape remains relatively unexplored. With the development of advanced next-generation sequencing techniques, it is now possible to measure methylation levels on both forward and reverse strands at all CpG sites in an entire genome. Analyzing hemimethylation patterns may reveal regions associated with ongoing tumor growth. In this study, we first identify hemimethylated CpG sites in breast cancer cell lines using Wilcoxon signed rank tests. We then identify hemimethylation patterns by grouping consecutive hemimethylated CpG sites based on their methylation states, methylated ("M") or unmethylated ("U"). These patterns include regular (or consecutive) hemimethylation clusters (eg, "MMM" on one strand and "UUU" on the other strand) and polarity (or reverse) clusters (eg, "MU" on one strand and "UM" on the other strand). Our results reveal that most hemimethylation clusters are the polarity type, and hemimethylation does occur across the entire genome with notably higher numbers in the breast cancer cell lines. The lengths or sizes of most hemimethylation clusters are very short, often less than 50 base pairs.
After mapping hemimethylation clusters and sites to corresponding genes, we study the functions of these genes and find that several of the highly hemimethylated genes may influence tumor growth or suppression. These genes may also indicate a progressing transition to a new tumor stage.
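The site-level screen mentioned in the abstract (a Wilcoxon signed rank test combined with a mean-difference cutoff) can be sketched roughly as follows. This is a minimal illustration, not the authors' pipeline: it assumes that each CpG site has one forward-strand and one reverse-strand methylation level per cell line, that these are compared as paired samples, and that a site is called hemimethylated when the two-sided P value is below .05 and the absolute mean difference reaches the cutoff. The function names `signed_rank_p` and `is_hemimethylated` are hypothetical, and the exact enumeration of sign patterns is only practical for a handful of samples.

```python
from itertools import product


def signed_rank_p(forward, reverse):
    """Exact two-sided Wilcoxon signed-rank P value for paired samples.

    Zero differences are dropped; tied absolute differences share the
    average rank. Assumed pairing: one (forward, reverse) pair per cell line.
    """
    diffs = [f - r for f, r in zip(forward, reverse) if f != r]
    n = len(diffs)
    if n == 0:
        return 1.0
    # Rank the absolute differences, averaging ranks over ties.
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        i = j + 1
    total = sum(ranks)
    w_plus = sum(rank for rank, d in zip(ranks, diffs) if d > 0)
    observed = abs(w_plus - total / 2)
    # Enumerate every sign assignment under the null hypothesis.
    extreme = 0
    for signs in product((0, 1), repeat=n):
        w = sum(rank for rank, s in zip(ranks, signs) if s)
        if abs(w - total / 2) >= observed - 1e-12:
            extreme += 1
    return extreme / 2 ** n


def is_hemimethylated(forward, reverse, cutoff=0.8, alpha=0.05):
    """Assumed calling rule: P < alpha and |mean difference| >= cutoff."""
    mean_diff = sum(f - r for f, r in zip(forward, reverse)) / len(forward)
    return signed_rank_p(forward, reverse) < alpha and abs(mean_diff) >= cutoff


# Made-up methylation levels for one CpG site across 7 cell lines.
forward = [0.95, 0.90, 0.92, 0.88, 0.97, 0.91, 0.93]
reverse = [0.05, 0.10, 0.03, 0.08, 0.02, 0.06, 0.04]
print(signed_rank_p(forward, reverse))      # 0.015625, i.e. 2 / 2**7
print(is_hemimethylated(forward, reverse))  # True
```

With 7 paired samples the smallest attainable two-sided P value is 2/128, so a site can pass the .05 threshold only when all 7 differences point the same way.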
Keywords: Methylation; bioinformatics; breast cancer; hemimethylation
Year: 2019 PMID: 31496635 PMCID: PMC6716185 DOI: 10.1177/1176935119872959
Source DB: PubMed Journal: Cancer Inform ISSN: 1176-9351
Figure 1. Two types of methylation: (A) Example of de novo methylation; (B) Example of maintenance methylation. “F” and “R” mean forward (or “+”) and reverse (or “‒”) strands, respectively. “CG” on the F and R strands means a CG site is not methylated; “CmG” on the F or R strand means a CG site is methylated.
Figure 2. Two hemimethylation clusters (regular and polarity clusters). “F” and “R” mean forward (or “+”) and reverse (or “‒”) strands, respectively. “M” means methylation and “U” means unmethylation.
Table 1. HM CpG sites and percentage of HM sites that form clusters.
| Mean difference cutoff | HM sites | HM sites in clusters | Percentage |
|---|---|---|---|
| \|Mean difference\| ⩾ 0.4 | 19 736 | 3492 | 17.69% |
| \|Mean difference\| ⩾ 0.6 | 15 526 | 2526 | 16.27% |
| \|Mean difference\| ⩾ 0.8 | 10 136 | 1382 | 13.63% |
Abbreviation: HM, hemimethylation.
The first column is the mean difference cutoff values. The second column shows the total number of identified HM CpG sites. The third column is the number of HM sites that form or belong to a cluster with at least 2 consecutive HM sites.
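The third and fourth columns can be illustrated with a short sketch. It assumes a hypothetical input layout, one boolean per CpG site in genomic order, and treats any run of at least 2 adjacent HM sites as a cluster; how the authors define "consecutive" (for example, across chromosome boundaries) is not stated here, so this is only one plausible reading. The helper names are made up for illustration.

```python
def clustered_site_count(hm_flags, min_run=2):
    """Count HM sites that belong to a run of at least `min_run` adjacent HM sites.

    `hm_flags` holds one boolean per CpG site in genomic order
    (an assumed input layout, not taken from the paper).
    """
    clustered = 0
    run = 0
    for flag in list(hm_flags) + [False]:  # trailing False closes the last run
        if flag:
            run += 1
        else:
            if run >= min_run:
                clustered += run
            run = 0
    return clustered


def percent(part, whole):
    """Percentage rounded to 2 decimals, as in the Percentage column."""
    return round(100.0 * part / whole, 2)


# Toy example: runs of length 2, 1, and 3, so 5 of 6 HM sites are clustered.
flags = [True, True, False, True, False, True, True, True]
print(clustered_site_count(flags))  # 5

# Reproducing the Percentage column from the reported counts.
print(percent(3492, 19736))  # 17.69
print(percent(2526, 15526))  # 16.27
print(percent(1382, 10136))  # 13.63
```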
Table 2. Summary of hemimethylation clusters.
| Cutoff | Regular: MM-UU | Regular: UU-MM | Regular: Length > 2 | Polarity: MU-UM | Polarity: UM-MU |
|---|---|---|---|---|---|
| \|Mean difference\| ⩾ 0.4 | 69 (42.86%) | 56 (34.78%) | 36 (22.36%) | 1534 (98.46%) | 24 (1.54%) |
| \|Mean difference\| ⩾ 0.6 | 42 (47.73%) | 34 (38.64%) | 12 (13.64%) | 1160 (99.40%) | 7 (0.60%) |
| \|Mean difference\| ⩾ 0.8 | 15 (48.39%) | 10 (32.26%) | 6 (19.35%) | 652 (99.39%) | 4 (0.61%) |
Abbreviation: HM, hemimethylation.
The table shows regular HM clusters with 2 or more CpG sites and polarity clusters with only 2 CpG sites. The number of clusters and the percentage of each type are calculated separately for each mean difference cutoff value, with percentages taken within the regular group and within the polarity group.
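As a quick check of how the percentages relate to the counts, the sketch below recomputes the shares for the 0.4 cutoff. It assumes that each percentage is the count divided by the total for its own group (regular or polarity), which is consistent with the reported values; the function name `shares` is hypothetical.

```python
def shares(counts):
    """Return each count as a percentage of the group total, rounded to 2 decimals."""
    total = sum(counts.values())
    return {name: round(100.0 * n / total, 2) for name, n in counts.items()}


# Counts reported for |mean difference| >= 0.4.
regular = {"MM-UU": 69, "UU-MM": 56, "Length > 2": 36}
polarity = {"MU-UM": 1534, "UM-MU": 24}

print(shares(regular))   # {'MM-UU': 42.86, 'UU-MM': 34.78, 'Length > 2': 22.36}
print(shares(polarity))  # {'MU-UM': 98.46, 'UM-MU': 1.54}
```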
Table 3. Frequency or count of HM clusters with |mean difference| (d) ⩾ 0.4.
| HM clusters | Frequency/Count | Polarity or not |
|---|---|---|
| MMMMM-UUUUU | 1 | Regular |
| MMMM-UUUU | 3 | Regular |
| MMM-UUU | 12 | Regular |
| MMMU-UUUM | 1 | Regular |
| MM-UU | 69 | Regular |
| MMU-UUM | 2 | Regular* |
| MUM-UMU | 1 | Regular* |
| MU-UM | 1534 | Polarity |
| MUU-UMM | 1 | Regular* |
| UM-MU | 24 | Polarity |
| UMU-MUM | 1 | Regular* |
| UU-MM | 56 | Regular |
| UUU-MMM | 7 | Regular |
| UUUU-MMMM | 4 | Regular |
| UUUUU-MMMMM | 2 | Regular |
| UUUUUUU-MMMMMMM | 1 | Regular |
| Total | 1719 | |
Abbreviation: HM, hemimethylation.
The first and second columns of Table 3 are cluster patterns and counts. The third column indicates whether an HM pattern is a regular or polarity cluster. Four regular clusters are labeled as “Regular*.” These clusters are classified as regular clusters with more than 2 CpG sites, but each of them has at least 1 pair of polarity CpG sites embedded in it. For example, “MMU-UUM” is a regular cluster, but its last 2 CpG sites have the “MU-UM” polarity pattern. Because only 4 of the 1719 clusters are like this, we treat them as regular clusters and, to keep the definition simple, define a polarity cluster as one that consists of exactly 2 CpG sites. We point these 4 clusters out using the label “Regular*” to show the complexity of hemimethylation clusters.
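The labeling rule described in this note can be sketched as a small classifier. It assumes patterns are written as `forward-reverse` strings of "M" and "U", that a polarity cluster has exactly 2 sites with opposite states on each strand, and that "Regular*" marks any longer cluster whose forward strand changes state between adjacent sites. The function name `classify` is hypothetical.

```python
def classify(pattern):
    """Label an HM cluster pattern such as 'MMU-UUM'.

    Assumed rule: 2 sites with a state flip -> 'Polarity';
    longer clusters with any flip -> 'Regular*'; otherwise 'Regular'.
    """
    forward, reverse = pattern.split("-")
    if len(forward) != len(reverse) or len(forward) < 2:
        raise ValueError("strands must have equal length of at least 2 sites")
    for f, r in zip(forward, reverse):
        # Every site in an HM cluster is methylated on exactly one strand.
        if {f, r} != {"M", "U"}:
            raise ValueError(f"site {f}/{r} is not hemimethylated")
    has_flip = any(a != b for a, b in zip(forward, forward[1:]))
    if len(forward) == 2 and has_flip:
        return "Polarity"
    return "Regular*" if has_flip else "Regular"


for p in ["MM-UU", "MU-UM", "UM-MU", "MMU-UUM", "UUUUUUU-MMMMMMM"]:
    print(p, classify(p))
```

Under this simple rule "MMMU-UUUM" would also come out as "Regular*", although the table lists it as plain "Regular", so the authors' labeling criterion may differ from the one assumed here.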
Summary of HM CpG site distribution.
| Summary of HM sites | | | | | | | | | | |
|---|---|---|---|---|---|---|---|---|---|---|
| No. of HM CpG sites per gene | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | ⩾10 |
| No. of genes | 2194 | 1875 | 885 | 581 | 335 | 277 | 172 | 101 | 92 | 319 |
| Summary of HM sites on promoter regions | | | | | | | | | | |
| No. of HM CpG sites per promoter region | 1 | 2 | 3 | 4 | 5 | | | | | |
| No. of promoter regions | 849 | 373 | 105 | 50 | 22 | | | | | |
Abbreviation: HM, hemimethylation.
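A distribution like the one above could be tallied along the following lines. This is a rough sketch under stated assumptions: genes are given as hypothetical `(chromosome, start, end)` intervals, an HM site counts toward every gene whose interval contains it, and counts of 10 or more are pooled into a single bin. The gene names and coordinates are made up, and the annotation source and overlap rule the authors used are not specified here.

```python
from collections import Counter


def sites_per_gene(sites, genes):
    """Count HM CpG sites falling inside each gene interval.

    `sites` is a list of (chrom, position); `genes` maps a gene name to
    (chrom, start, end). Both layouts are assumptions for illustration.
    """
    counts = Counter()
    for chrom, pos in sites:
        for name, (g_chrom, start, end) in genes.items():
            if chrom == g_chrom and start <= pos <= end:
                counts[name] += 1
    return counts


def distribution(counts, cap=10):
    """Number of genes per site count; counts >= cap share the `cap` bin."""
    hist = Counter(min(n, cap) for n in counts.values())
    return dict(sorted(hist.items()))


# Toy data with made-up coordinates.
genes = {
    "GENE_A": ("chr1", 100, 500),
    "GENE_B": ("chr1", 400, 900),
    "GENE_C": ("chr2", 1, 50),
}
sites = [("chr1", 150), ("chr1", 450), ("chr1", 800), ("chr2", 10), ("chr3", 5)]

counts = sites_per_gene(sites, genes)
print(dict(counts))          # {'GENE_A': 2, 'GENE_B': 2, 'GENE_C': 1}
print(distribution(counts))  # {1: 1, 2: 2}
```

The nested loop is fine for a toy example; a genome-scale run would normally use an interval index instead.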
Figure 3. The length of HM clusters. HM indicates hemimethylation.
The left plot in Figure 3 is the histogram of the regular HM cluster length; the right plot is the histogram of the polarity cluster length.
Figure 4. Hemimethylated CpG sites by chromosome. HM indicates hemimethylation.
The horizontal axis corresponds to the 23 pairs of chromosomes; 1 stacked bar represents each chromosome. The vertical axis corresponds to the number of HM clusters; 1 color corresponds to each type. For example, in the left plot, red is for “MM-UU,” orange is for “UU-MM,” and green is for clusters with more than 2 CpG sites (ie, length > 2).
Forty-five genes with at least 25 hemimethylated CpG sites.
| Gene | Hemimethylation sites | Description |
|---|---|---|
| PTPRN2 | 90 | Member of the protein tyrosine phosphatase (PTP) family that regulates a variety of cellular processes including cell growth, differentiation, mitotic cycle, and oncogenic transformation. |
| MAD1L1 | 87 | Mitotic spindle-assembly checkpoint component that prevents the onset of anaphase until all chromosomes are properly aligned at the metaphase plate. May play a role in cell cycle control and tumor suppression. |
| PRDM16 | 72 | Zinc finger transcription factor. Translocation results in overexpression, which plays a role in myelodysplastic syndrome (MDS) and acute myeloid leukemia (AML). |
| DIP2C | 48 | Encodes a member of the disco interacting protein homolog 2 family expressed in the nervous system. |
| CBFA2T3 | 44 | Encodes a member of the myeloid translocation gene family, which interacts with DNA-bound transcription factors. Also known to be a putative breast tumor suppressor. |
| PRKAR1B | 40 | Encodes protein kinase A (PKA) enzyme that assists cell in regulation of metabolism, ion transport, and gene transcription. |
| KDM4B | 39 | Regulates gene expression by demethylating histone. Known to bind to ESR1 leading to tumorigenesis of various cancers. |
| KCNT1 | 37 | Potassium sodium-activated channel subfamily T member 1. |
| SORCS2 | 36 | Sortilin-related VPS10 domain-containing receptor 2. |
| KCNQ1 | 35 | Encodes a voltage-gated potassium channel required for repolarization phase of the cardiac action potential. |
| MACROD1 | 35 | Estrogen and androgen-responsive gene; known to have higher expression in hormone-dependent cancer cells such as MCF7. |
| TERT | 35 | Ribonucleoprotein polymerase that maintains telomere ends by addition of telomere repeats. Deregulation of telomerase expression in somatic cells may be involved in oncogenesis. |
| EXD3 | 35 | Protein required for 3’-end trimming of AGO1-bound miRNAs. |
| NCOR2 | 34 | Mediates transcriptional silencing of certain target genes. Aberrant expression of this gene is associated with certain cancers. |
| RPTOR | 34 | Component of a signaling pathway that regulates cell growth in response to nutrient and insulin levels. |
| C7orf50 | 34 | Chromosome 7 open reading frame 50; GO annotations include poly(A) RNA binding. |
| TRAPPC9 | 34 | Encodes a protein that likely plays a role in NF-kappa-B signaling. |
| CACNA1H | 33 | Encodes a protein in the voltage-dependent calcium channel complex. |
| HDAC4 | 33 | Histone deacetylase; plays a critical role in transcriptional regulation, cell cycle progression, and developmental events. Affects transcription factor access to DNA. |
| ADAMTS2 | 33 | Responsible for the degradation of a major proteoglycan of cartilage, leading to arthritic disease. |
| SEPT9 | 32 | Involved in cytokinesis and cell cycle control. Candidate for ovarian tumor suppressor gene. |
| NOTCH1 | 30 | GO annotations include transcription factor activity and sequence-specific DNA binding. |
| VAV2 | 30 | Second member of the VAV guanine nucleotide exchange factor family of oncogenes. |
| MOB2 | 29 | MOB kinase activator 2. |
| FBRSL1 | 29 | Fibrosin-like 1 protein. |
| GNAS | 29 | GNAS complex locus protein. |
| FAM20C | 29 | Encodes a protein that binds calcium and phosphorylates proteins involved in bone mineralization. Mutations in this gene are associated with Raine syndrome. |
| COL18A1 | 28 | Collagen type XVIII alpha 1 chain protein. |
| CUX1 | 28 | Member of the homeodomain family of DNA binding proteins. May regulate gene expression, morphogenesis, differentiation, and cell cycle progression. |
| MEGF6 | 27 | Multiple EGF like domains 6 protein. |
| OBSCN | 27 | Obscurin, cytoskeletal calmodulin and titin-interacting RhoGEF protein. |
| SHANK2 | 27 | Encodes synaptic proteins that may function as molecular scaffolds in the postsynaptic density of excitatory synapses. Alterations of the protein may be associated with susceptibility to autism spectrum disorder. |
| RASA3 | 27 | Member of the GAP1 family of GTPase-activating proteins. |
| SOLH | 27 | Calpain 15 protein. |
| BAIAP2 | 27 | BAI1 associated protein 2 protein. |
| AGAP1 | 27 | Member of an ADP-ribosylation factor involved in membrane trafficking and cytoskeleton dynamics. |
| CDH4 | 27 | Cadherin 4 protein. |
| COL5A1 | 27 | Collagen type V alpha 1 chain protein. |
| PRKCZ | 26 | Protein kinase C Zeta protein. |
| ASPSCR1 | 26 | UBX domain-containing tether for SLC2A4; related pathways include transcriptional misregulation in cancer. |
| KCNQ2 | 26 | Potassium voltage-gated channel subfamily Q member 2 protein. |
| ZC3H3 | 26 | Zinc finger CCCH type containing 3. |
| LMF1 | 25 | Lipase maturation factor 1 protein. |
| RBFOX3 | 25 | RNA binding protein. |
| IQSEC1 | 25 | Promotes binding of GTP and is particularly important in regulating cell adhesion. Highly expressed in the prefrontal cortex. |
Fourteen genes involved in breast cancer.
| Gene | Involvement in breast or breast cancer |
|---|---|
| PTPRN2 | Related to increased cell death in breast cancer cell line MDA-MB-435 |
| MAD1L1 | Related to increased cell death in breast cancer cell line MDA-MB-435 |
| DIP2C | Genetic variants (VAR_035905, VAR_035907) are found in this gene (found in a breast cancer sample) |
| CBFA2T3 | This gene is a putative breast tumor suppressor. Alternative splicing results in transcript variants. |
| MACROD1 | Overexpressed by estrogens in breast cancer MCF-7 cells, probably via an activation of nuclear receptors for steroids (ESR1 but not ESR2) |
| TERT | Related to an anomaly of the structure of the breast and the abnormal growth of breast tissue |
| CACNA1H | Related to ductal breast carcinoma, and a single nucleotide polymorphism (SNP) (rs761025927) found in this gene (uncertain-significance) |
| HDAC4 | Involved in the MTA1-mediated epigenetic regulation of ESR1 expression in breast cancer. |
| NOTCH1 | NOTCH1 is 1 of 4 known genes encoding the NOTCH family of proteins, a group of receptors involved in the Notch signaling pathway. Activation of Notch has been shown to be correlative with mammary tumorigenesis in mice and increased expression of Notch receptors has been observed in a variety of cancer types including cervical, colon, head and neck, lung, renal, pancreatic, leukemia, and breast cancer. A number of treatment modalities have been explored related to Notch inhibition especially in breast cancer with mixed results. |
| GNAS | Related to the abnormal growth of breast tissue and neoplasm of breast. |
| FAM20C | Related to an anomaly of the sternum, also known as the breastbone. |
| CUX1 | A genetic variant (VAR_036285) is found in this gene (found in a breast cancer sample) |
| OBSCN | Genetic variants (VAR_035534, VAR_035537, VAR_035538) of this gene are found in a breast cancer sample |
| COL5A1 | Related to an anomaly of the sternum, also known as the breastbone. |
Genes known to be hypermethylated in breast cancer.
| Genes | | | | | |
|---|---|---|---|---|---|
| APAF1 | EPM2AIP1 | GPC3 | WDR79 | IKIP | CDKN2A |
| MLH1 | PGR | CDKN2B | THBS1 | TWIST1 | CDH1 |
| FHIT | HOXA6 | STAT1 | DAPK2 | GJB2 | HSD17B4 |
| SYK | TP53 | WIT1 | | | |
The genes with asterisks and highlighted in bold are found to have HM CpG sites as noted in the parentheses. For example, “TERT*(35)” means the gene TERT is a hypermethylated gene, and it covers 35 HM CpG sites.
Oncogenes and tumor suppressors with HM sites.
| Oncogene | No. of HM sites | Tumor suppressor | No. of HM sites |
|---|---|---|---|
| ASPSCR1 | 26 | APC | 3 |
| BCL11B | 13 | CBLC | 3 |
| BCR | 17 | FANCA | 6 |
| CARD11 | 13 | FANCC | 3 |
| CBFA2T3 | 44 | PTCH1 | 4 |
| CRTC1 | 11 | SMARCA4 | 7 |
| GNAS | 29 | STK11 | 5 |
| MN1 | 10 | TSC1 | 3 |
| NOTCH1 | 30 | TSC2 | 14 |
| NTRK1 | 11 | WT1 | 5 |
| PDE4DIP | 10 | ||
| PRDM16 | 72 | ||
| SEPT9 | 32 | ||
Abbreviation: HM, hemimethylation.
Figure 5. Relationships of 45 genes with at least 25 hemimethylated CpG sites.
Figure 6. Comparing HM CpG sites in the HMEC and 7 breast cancer cell lines. HM indicates hemimethylation; HMEC, human mammary epithelial cell.
In total, 10 136 CpG sites are hemimethylated in breast cancer cell lines (ie, with a Wilcoxon test P value < .05 and an absolute mean difference of at least 0.8, as shown in Table 1). Of these 10 136 sites, 657 are hemimethylated in breast cancer cell lines but have no data in the HMEC sample. “No data in HMEC” means that no sequencing reads, or not enough sequencing reads, cover those CpG sites (ie, <3× coverage). Another 9477 sites are hemimethylated in breast cancer but not in HMEC. Only 2 sites are hemimethylated in both the breast cancer cell lines and HMEC.
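The three categories above can be expressed as a small helper, followed by a check that the reported counts add up to the total. This is a sketch only: it assumes each site comes with an HMEC read coverage value and an HMEC hemimethylation call, and that coverage below 3× means "no data". How the authors made the HMEC call is not described here, and the name `hmec_status` is hypothetical.

```python
def hmec_status(coverage, is_hm_in_hmec, min_coverage=3):
    """Assign one breast-cancer HM site to an HMEC comparison category."""
    if coverage < min_coverage:
        return "no data in HMEC"
    return "HM in HMEC" if is_hm_in_hmec else "not HM in HMEC"


print(hmec_status(coverage=1, is_hm_in_hmec=False))   # no data in HMEC
print(hmec_status(coverage=12, is_hm_in_hmec=False))  # not HM in HMEC
print(hmec_status(coverage=12, is_hm_in_hmec=True))   # HM in HMEC

# The reported category counts account for every site at the 0.8 cutoff.
reported = {"no data in HMEC": 657, "not HM in HMEC": 9477, "HM in HMEC": 2}
print(sum(reported.values()))  # 10136
```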