Leandro A Loss, Anguraj Sadanandam, Steffen Durinck, Shivani Nautiyal, Diane Flaucher, Victoria E H Carlton, Martin Moorhead, Yontao Lu, Joe W Gray, Malek Faham, Paul Spellman, Bahram Parvin.
Abstract
BACKGROUND: Methylation of CpG islands within the DNA promoter regions is one mechanism that leads to aberrant gene expression in cancer. In particular, the abnormal methylation of CpG islands may silence associated genes. Therefore, using high-throughput microarrays to measure CpG island methylation will lead to better understanding of tumor pathobiology and progression, while revealing potentially new biomarkers. We have examined a recently developed high-throughput technology for measuring genome-wide methylation patterns called mTACL. Here, we propose a computational pipeline for integrating gene expression and CpG island methylation profiles to identify epigenetically regulated genes for a panel of 45 breast cancer cell lines, which is widely used in the Integrative Cancer Biology Program (ICBP). The pipeline (i) reduces the dimensionality of the methylation data, (ii) associates the reduced methylation data with gene expression data, and (iii) ranks methylation-expression associations according to their epigenetic regulation. Dimensionality reduction is performed in two steps: (i) methylation sites are grouped across the genome to identify regions of interest, and (ii) methylation profiles are clustered within each region. Associations between the clustered methylation and the gene expression data sets generate candidate matches within a fixed neighborhood around each gene. Finally, the methylation-expression associations are ranked through a logistic regression, and their significance is quantified through permutation analysis.
Year: 2010 PMID: 20525369 PMCID: PMC2903569 DOI: 10.1186/1471-2105-11-305
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Figure 1. Computational pipeline. The computational pipeline for identifying epigenetically regulated genes from a panel of breast cancer cell lines. It was developed to (i) reduce the dimensionality of the methylation data in two sub-steps: (i.i) clustering of methylation profiles on the basis of proximity, and (i.ii) clustering within methylation sub-regions on the basis of similarity; (ii) associate the reduced methylation data with gene expression data; and (iii) rank the methylation-expression associations according to their epigenetic regulation.
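Steps (i.i) and (ii) of the pipeline can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the `max_gap` and `window` parameters, and the toy coordinates are all assumptions made for illustration, and the sketch treats a single chromosome only.

```python
def group_by_proximity(positions, max_gap):
    """Group methylation site positions into regions of interest.

    A new region starts whenever the distance to the previous site
    exceeds max_gap (a hypothetical threshold, in base pairs).
    """
    regions = []
    for pos in sorted(positions):
        if regions and pos - regions[-1][-1] <= max_gap:
            regions[-1].append(pos)
        else:
            regions.append([pos])
    return regions


def regions_near_gene(regions, gene_start, gene_end, window):
    """Return the regions that overlap a fixed neighborhood around a gene.

    The neighborhood is [gene_start - window, gene_end + window];
    window is a hypothetical parameter, not a value from the paper.
    """
    lo, hi = gene_start - window, gene_end + window
    return [r for r in regions if r[0] <= hi and r[-1] >= lo]


# Toy coordinates (made up for this example).
sites = [100, 150, 180, 5000, 5040, 20000]
regions = group_by_proximity(sites, max_gap=500)
print(regions)
print(regions_near_gene(regions, gene_start=5500, gene_end=7000, window=1000))
```

Each region returned by `regions_near_gene` would be a candidate match for the gene; the similarity-based clustering of step (i.ii) is not covered here.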
Figure 2. Optimization example. We tested our clustering approach on synthetic data with linear and non-linear boundaries to predict the validity of the results on real data. This example shows the determination of KSC's optimal parameters σ and k for the solution of a problem with samples distributed into three concentric circles. Each combination of σ and k produces a compactness value. The solution is selected from the set of parameters which produced the minimum value of compactness (marked by the blue box).
Figure 3. Logistic regression. Evaluation of the logistic regression on synthetic data reveals that the expected inverse relationship between expression and methylation can be correctly ranked. R is the correlation coefficient and determines the quality of the logistic regression. The associations are ordered according to their R value, which reflects the strength of the method's confidence in an epigenetic regulation. The logistic approach is flexible enough to incorporate any data scale and distribution, and does not contain rigid and arbitrary definitions that could limit its application.
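The ranking idea in this caption can be sketched as follows. The sketch makes several assumptions not stated in the source: both variables are scaled to [0, 1], the logistic curve is fitted by least squares with plain gradient descent, and R is taken as the Pearson correlation between fitted and observed expression. The function names and toy data are hypothetical.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient; returns 0.0 for constant input."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(meth, expr, steps=5000, lr=0.5):
    """Least-squares fit of expr ~ sigmoid(a + b * meth) by gradient descent."""
    a, b = 0.0, 0.0
    n = len(meth)
    for _ in range(steps):
        grad_a = grad_b = 0.0
        for m, e in zip(meth, expr):
            p = sigmoid(a + b * m)
            g = (p - e) * p * (1.0 - p)  # derivative of 0.5 * (p - e)^2 w.r.t. z
            grad_a += g
            grad_b += g * m
        a -= lr * grad_a / n
        b -= lr * grad_b / n
    return a, b


def association_score(meth, expr):
    """Return (R, slope): fit quality and the sign of the relationship."""
    a, b = fit_logistic(meth, expr)
    fitted = [sigmoid(a + b * m) for m in meth]
    return pearson(fitted, expr), b


# Toy data: expression drops as methylation rises.
meth = [0.05, 0.10, 0.20, 0.40, 0.60, 0.80, 0.90, 0.95]
expr = [0.95, 0.90, 0.85, 0.60, 0.30, 0.15, 0.10, 0.05]
r, slope = association_score(meth, expr)
print(f"R = {r:.3f}, slope = {slope:.3f}")
```

Under these assumptions, a negative slope indicates the inverse relationship expected of epigenetic silencing, and candidate associations would be sorted by descending R.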
Table 1. Panel of cell lines.
| 600MPE | AU565 | BT20 | BT474 | BT483 |
| BT549 | CAMA1 | DU4475 | HBL100 | HCC1143 |
| HCC1187 | HCC1428 | HCC1500 | HCC1569 | HCC1599 |
| HCC1937 | HCC1954 | HCC202 | HCC2185 | HCC38 |
| HCC3153 | HCC70 | HS578T | LY2 | MCF10A |
| MCF12A | MCF7 | MDAMB157 | MDAMB231 | MDAMB361 |
| MDAMB415 | MDAMB435 | MDAMB453 | MDAMB468 | SKBR3 |
| SUM1315 | SUM149PT | SUM159PT | SUM185PE | SUM44PE |
| SUM52PE | T47D | UACC812 | ZR751 | ZR75B |
Forty-five cell lines were found in common between the ICBP expression and the mTACL methylation data. The cell lines listed here formed the gene signature used in our analysis.
Table 2. Gene ranking.
| Gene | R | p-Value |
|---|---|---|
| [missing] | 0.888126 | 0.001200 |
| S100A2 | 0.770036 | 0.008100 |
| [missing] | 0.764194 | 0.000000 |
| INHBA | 0.761402 | 0.000400 |
| WNT5A | 0.731727 | 0.002700 |
| GJA1 | 0.722746 | 0.000300 |
| GNG11 | 0.722025 | 0.000600 |
| GSTM3 | 0.693819 | 0.000000 |
| IGFBP5 | 0.684982 | 0.000000 |
| IFI16 | 0.615905 | 0.002200 |
| FDXR | 0.611562 | 0.000500 |
| CTGF | 0.594878 | 0.000000 |
| NUPR1 | 0.586186 | 0.000100 |
| GSTP1 | 0.560942 | 0.004200 |
| CYP1B1 | 0.550128 | 0.000200 |
| [missing] | 0.522335 | 0.009000 |
| ESR1 | 0.518515 | 0.015700 |
| IFITM3 | 0.514558 | 0.002600 |
| MX1 | 0.503719 | 0.012400 |
| [missing] | 0.500448 | 0.008800 |
| CD44 | 0.496155 | 0.034700 |
| MTHFD1 | 0.494175 | 0.028900 |
| [missing] | 0.481777 | 0.000800 |
| TFAP2A | 0.474620 | 0.000200 |
| HOXA9 | 0.473453 | 0.000000 |
| DHRS2 | 0.454703 | 0.009000 |
| CBFA2T3 | 0.443504 | 0.021400 |
| ZIC1 | 0.435035 | 0.016000 |
| LITAF | 0.434958 | 0.001700 |
| ADAM12 | 0.428524 | 0.016000 |
| IFITM2 | 0.421762 | 0.019400 |
| EFS | 0.412792 | 0.007300 |
| TACSTD2 | 0.407764 | 0.006500 |
| GSTO1 | 0.390240 | 0.010400 |
| CGREF1 | 0.372320 | 0.000000 |
| MAFB | 0.366501 | 0.011300 |
| CAMK2N1 | 0.353566 | 0.008600 |
| SEMA3F | 0.348895 | 0.000000 |
| RAB25 | 0.347329 | 0.023900 |
| ANXA13 | 0.341399 | 0.012600 |
| ALCAM | 0.335584 | 0.009400 |
| EIF4B | 0.328433 | 0.000000 |
| GATA3 | 0.328377 | 0.008500 |
| RAB21 | 0.321558 | 0.012700 |
| PTN | 0.320676 | 0.030900 |
| PYCARD | 0.319203 | 0.035600 |
| MAPK13 | 0.316035 | 0.013700 |
| IGFBP2 | 0.315176 | 0.021300 |
| S100A6 | 0.310833 | 0.033000 |
| C12orf24 | 0.310481 | 0.020100 |
| IGFBP7 | 0.309320 | 0.049000 |
| ALDH4A1 | 0.302697 | 0.000000 |
| APITD1 | 0.296412 | 0.000000 |
| CRABP2 | 0.285055 | 0.048900 |
| ITGB4 | 0.281394 | 0.031500 |
| BMP1 | 0.279983 | 0.001700 |
| UNG | 0.275001 | 0.000000 |
| FAM134A | 0.268202 | 0.043500 |
Top 58 genes predicted as epigenetically regulated according to our logistic model and p-value calculation.
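A permutation p-value of the kind listed in the table can be sketched as below. This is an illustration under stated assumptions rather than the authors' procedure: the score is the negated Pearson correlation (a stand-in for the logistic-fit R), expression values are shuffled across cell lines, and `n_perm`, `seed`, and the function names are hypothetical.

```python
import math
import random


def inverse_score(meth, expr):
    """Stand-in association score: negated Pearson correlation, so a
    stronger inverse methylation-expression relationship scores higher."""
    n = len(meth)
    mx, my = sum(meth) / n, sum(expr) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(meth, expr))
    sxx = sum((x - mx) ** 2 for x in meth)
    syy = sum((y - my) ** 2 for y in expr)
    if sxx == 0 or syy == 0:
        return 0.0
    return -sxy / math.sqrt(sxx * syy)


def permutation_p_value(meth, expr, score_fn, n_perm=1000, seed=0):
    """Fraction of shuffled data sets scoring at least as high as the
    observed pairing. Shuffling expr breaks the sample-wise pairing."""
    rng = random.Random(seed)  # fixed seed for reproducibility
    observed = score_fn(meth, expr)
    shuffled = list(expr)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if score_fn(meth, shuffled) >= observed:
            hits += 1
    return hits / n_perm


# Toy data: expression drops as methylation rises.
meth = [0.05, 0.10, 0.20, 0.40, 0.60, 0.80, 0.90, 0.95]
expr = [0.95, 0.90, 0.85, 0.60, 0.30, 0.15, 0.10, 0.05]
print(permutation_p_value(meth, expr, inverse_score))
```

With this estimator the p-value is a multiple of `1 / n_perm` and can be exactly zero when no permutation matches the observed score.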
Methylation-expression associations for underlined genes are shown in Figure 4.
Figure 4. Methylation-expression associations. Five genes with known (based on the literature) epigenetic regulation demonstrate that logistic regression is appropriate as a model system. It is clear that the methylation patterns are highly heterogeneous for the panel of breast cancer cell lines.
Table 3. Subnetwork enrichment.
| Common regulator | Predicted biomarkers | p-value |
|---|---|---|
| GF | S100A6, CD44, CDKN2A, CTGF, ESR1, IGFBP5, TFF1, GSTP1, IGFBP2, GJA1, WNT5A, NUPR1, S100A2 | 2.86E-07 |
| Jun/Fos | CD44, CYP1B1, CDKN2A, CTGF, ESR1, IGFBP5, TFF1, GSTP1, GJA1, PTN, COL1A2, S100A2, IFI16 | 1.68E-06 |
| EGF | CD44, CTGF, ESR1, IGFBP5, TFF1, KRT18, INHBA, IGFBP2, GJA1, PTN, COL1A2, S100A2 | 4.09E-06 |
| TP53 | CD44, CDKN2A, ESR1, SEMA3F, ANXA1, TFAP2A, COL1A2, LITAF, S100A2, TOP2A, FDXR, IFI16 | 8.52E-06 |
| BMP2 | CTGF, IGFBP5, INHBA, GJA1, WNT5A, CRABP2, COL1A2, ZIC1 | 8.56E-06 |
| MAPK | GATA3, CD44, CDKN2A, CTGF, ESR1, IGFBP5, MAFB, TFF1, IGFBP2, GJA1, TFAP2A, COL1A2 | 9.57E-06 |
| PKA | CYP1B1, CTGF, ESR1, IGFBP5, TFF1, KRT18, INHBA, IGFBP2, GJA1, TFAP2A | 1.04E-05 |
| TNF | S100A6, CD44, CYP1B1, CTGF, ESR1, IGFBP5, TFF1, GSTP1, INHBA, GJA1, PTN, IGFBP7, TFAP2A, COL1A2, MX1 | 1.71E-05 |
| SRC | CD44, CTGF, ESR1, TFF1, KRT18, IGFBP2, COL1A2 | 2.30E-05 |
| TGF family | GATA3, CD44, CDKN2A, CTGF, IGFBP5, INHBA, GJA1, IGFBP7, CRABP2, COL1A2, VAV3 | 2.89E-05 |
The 10 pathways with the lowest p-values identified by subnetwork enrichment analysis of the 58 predicted genes in Pathway Studio (see the complete list in Additional file 2). The common regulator, the genes involved, and the respective p-values are shown. The occurrence of multiple predicted markers within the same pathway suggests that epigenetic regulation of individual genes occurs in a coordinated fashion and through common regulators.
Figure 5. Subnetwork enrichment (Jun/Fos). The Jun/Fos complex has been shown to be a common regulator for 13 of the predicted epigenetic biomarkers. The p-value of the Jun/Fos subnetwork is 1.68E-06, as shown in Table 3, row 2. Together with the AP1 transcription factor, Jun and Fos drive expression of a number of genes necessary for cell cycle progression.
Figure 6. Subnetwork enrichment (Jun/Fos and GF). Interaction of two common regulators and their relations to 17 of the predicted epigenetic markers (Table 3, rows 1 and 2). Subnetwork enrichment and the presence of a large number of common regulators further substantiate our methodology.
Figure 7. Cell line subtypes. Methylation-expression associations for CD44 (basal A and B specific gene) and GATA3 (luminal specific gene) according to the cellular subtype. There is evidence that the methylation pattern reflects the basal and luminal subtypes in breast cancer cell lines.