Alexander Kel, Ulyana Boyarskikh, Philip Stegmaier, Leonid S Leskov, Andrey V Sokolov, Ivan Yevshin, Nikita Mandrik, Daria Stelmashenko, Jeannette Koschmann, Olga Kel-Margoulis, Mathias Krull, Anna Martínez-Cardús, Sebastian Moran, Manel Esteller, Fedor Kolpakov, Maxim Filipenko, Edgar Wingender.
Abstract
BACKGROUND: The search for molecular biomarkers of early-onset colorectal cancer (CRC) is an important but challenging and still unsolved task. Detection of CpG methylation in human DNA obtained from blood or stool has been proposed as a promising approach to noninvasive early diagnosis of CRC. Many of the thousands of abnormally methylated CpG positions in CRC genomes are located in non-coding parts of genes. Novel bioinformatic methods for multi-omics data analysis are therefore urgently needed to reveal causative biomarkers with a potential driver role in the early stages of cancer.
Keywords: Circulating DNA; Colorectal cancer; DNA methylation; Genetic algorithm; Multi-omics analysis; Prognostic biomarkers; Signal transduction; Transcription factor binding sites
Year: 2019 PMID: 30999858 PMCID: PMC6471696 DOI: 10.1186/s12859-019-2687-7
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Fig. 1 Schematic representation of a match of a composite module (CM) in a particular promoter sequence s (TSS: transcription start site). Several TF binding sites are concentrated around the position x identified by the algorithm in the sequence
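To make the idea of a composite-module match concrete, the sketch below scores one module at a candidate position x and scans the sequence for the best x. It assumes a Gaussian weighting of site scores by their distance from x (width `sigma`), a per-PWM cut-off, and summation of the top `n_top` sites per PWM, echoing the parameters named in the Fig. 4 caption (cut-off, N, module width sigma); the exact CMA scoring formula is given in the paper's Methods and may differ. The function names, the input layout and the PWM names are hypothetical, and the paper's convention that a cut-off of 0.0 means "use the original profile cut-off" is not modelled.

```python
import math


def module_score(sites, x, n_top, sigma, cutoffs):
    """Hypothetical composite-module score at sequence position x.

    sites:   {pwm_name: [(position, site_score), ...]} predicted in one sequence
    cutoffs: {pwm_name: minimum site score kept for that PWM}
    Each retained site is down-weighted by a Gaussian of its distance to x
    (width sigma); the n_top best weighted sites of every PWM are summed.
    """
    total = 0.0
    for pwm, hits in sites.items():
        weighted = sorted(
            (score * math.exp(-((pos - x) ** 2) / (2.0 * sigma ** 2))
             for pos, score in hits if score >= cutoffs[pwm]),
            reverse=True,
        )
        total += sum(weighted[:n_top])
    return total


def best_match(sites, seq_len, n_top, sigma, cutoffs):
    """Scan every position of the sequence; return (best_x, best_score)."""
    return max(
        ((x, module_score(sites, x, n_top, sigma, cutoffs)) for x in range(seq_len)),
        key=lambda pair: pair[1],
    )


# Toy input: two hypothetical PWMs with sites clustered near positions 50-55.
sites = {"PWM_A": [(50, 0.90), (200, 0.95)], "PWM_B": [(55, 0.80)]}
print(best_match(sites, 300, n_top=1, sigma=10, cutoffs={"PWM_A": 0.85, "PWM_B": 0.75}))
```

Raising the PWM_A cut-off above 0.90 discards its site at position 50, and the best match then moves to the isolated site at 200. This shows how cut-off optimisation and module width jointly decide where a module "matches".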
Fig. 2 General scheme of the data analysis pipeline applied in this study
Fig. 3 Venn diagram of the numbers of differentially expressed genes identified for each tumor stage
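Region counts in a Venn diagram such as Fig. 3 come from grouping each gene by the exact combination of stages in which it is differentially expressed. Below is a minimal sketch, assuming the per-stage DEG lists are already available as Python sets. The function name, stage labels and gene IDs are hypothetical placeholders, not data from the study.

```python
from collections import Counter


def venn_region_counts(deg_sets):
    """Count genes per Venn region.

    deg_sets: {stage_label: set of gene IDs differentially expressed in that stage}
    Returns a Counter keyed by frozenset(stage labels): the number of genes that
    are differentially expressed in exactly that combination of stages.
    """
    membership = {}
    for stage, genes in deg_sets.items():
        for gene in genes:
            membership.setdefault(gene, set()).add(stage)
    return Counter(frozenset(stages) for stages in membership.values())


# Placeholder gene sets for three stages.
deg_sets = {
    "stage I": {"G1", "G2", "G3"},
    "stage II": {"G2", "G3", "G4"},
    "stage III": {"G3", "G5"},
}
for region, n in venn_region_counts(deg_sets).items():
    print(sorted(region), n)
```

Genes shared by all stages fall under the key that contains every stage label. The same grouping works for any number of stages, although a drawn Venn diagram becomes impractical beyond four or five sets.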
Fig. 4 Screenshot of the CMAcorrel results for 5746 CpG loci whose DNA methylation and gene expression levels are correlated with r > 0.18 or r < −0.18. Regions of 500 bp around each CpG were analysed. Right: the composite model, consisting of two composite modules with 10 PWMs each. Left: plot of the DNA methylation/gene expression correlation versus the composite score of the region around each CpG (Spearman correlation coefficient = 0.38). The position weight matrices (PWMs) shown were selected by the CMAcorrel algorithm for inclusion in the two-module model. The cut-off values optimized by CMAcorrel are given below each matrix name (a cut-off of 0.0 means that the algorithm chose the original profile cut-off). The parameter N (e.g., N = 2) is the number of top-scoring TF sites in the sequence that were considered in the score calculation. The module width is the sigma value of the score (see Methods section)
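The two numerical steps in the Fig. 4 caption can be sketched in plain Python. The first keeps CpG loci with r > 0.18 or r < −0.18. The second measures the Spearman correlation between each locus's methylation/expression correlation and the composite score of its surrounding region. Spearman's coefficient is computed here as the Pearson correlation of average ranks, so tied values share their mean rank. The function names and the example inputs are hypothetical; in the study the composite scores come from CMAcorrel.

```python
def average_ranks(values):
    """Return 1-based ranks; tied values receive the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation (undefined if either input is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman rank correlation = Pearson correlation of the average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def select_correlated(cpg_correlations, threshold=0.18):
    """Keep CpG IDs whose methylation/expression correlation satisfies |r| > threshold."""
    return {cpg: r for cpg, r in cpg_correlations.items() if abs(r) > threshold}


cpg_r = {"cg_a": 0.35, "cg_b": -0.22, "cg_c": 0.05}   # hypothetical r values
composite = {"cg_a": 4.1, "cg_b": 1.7, "cg_c": 2.9}   # hypothetical composite scores
kept = select_correlated(cpg_r)
ids = sorted(kept)
print(spearman([kept[c] for c in ids], [composite[c] for c in ids]))
```

Rank-based correlation is less sensitive than Pearson's r to a few extreme composite scores, which is one plausible reason to report a Spearman coefficient for this kind of plot.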
Final list of 19 TFs after filtering by differential expression, differential DNA methylation, and the level of correlation between their expression and the methylation of the associated CpG loci
| Ensembl ID | TF protein name | correl | logFC stage I vs. Control | logFC Cancer vs. Control | Methylation logFC Cancer vs. Control | Number of site-enrichment analyses marked + (of CMA NEG_200bp, CMA POS_200bp, CMAcorrel_500bp, F-Match NEG_200bp, F-Match POS_200bp) |
|---|---|---|---|---|---|---|
| ENSG00000198521 | ZNF43 | −0.763 | −0.706 | −0.711 | 0.296 | 1 |
| ENSG00000257591 | ZNF625 | −0.667 | −0.461 | −0.365 | 0.558 | 1 |
| ENSG00000136997 | c-Myc | −0.567 | 2.126 | 2.442 | −0.682 | 1 |
| ENSG00000197905 | TEF-3 | −0.453 | 1.829 | 2.335 | −0.589 | 2 |
| ENSG00000139515 | IPF-1 | −0.407 | 2.565 | 2.869 | 0.541 | 2 |
| ENSG00000125798 | HNF-3beta | −0.364 | 0.821 | 1.312 | −0.332 | 2 |
| ENSG00000177426 | TGIF | −0.282 | 1.003 | 1.511 | −0.294 | 2 |
| ENSG00000204103 | MafB | −0.244 | −0.678 | −0.226 | 0.280 | 1 |
| ENSG00000149948 | HMGI-C | −0.242 | 0.533 | 1.016 | −0.284 | 2 |
| ENSG00000137309 | HMGIY | −0.220 | 1.422 | 1.758 | −0.328 | 2 |
| ENSG00000169016 | E2F-6 | −0.204 | 0.793 | 1.259 | −0.420 | 1 |
| ENSG00000196628 | ITF-2 | −0.166 | −0.429 | 0.021ᵃ | 0.332 | 1 |
| ENSG00000159216 | AML1 | −0.057 | 0.911 | 1.525 | −0.556 | 1 |
| ENSG00000075426 | Fra-2 | −0.050 | −0.559 | −0.027ᵃ | −0.386 | 2 |
| ENSG00000164749 | HNF-4gamma | 0.064 | −0.708 | −0.209 | −0.365 | 2 |
| ENSG00000129514 | HNF-3alpha | 0.183 | −1.296 | −0.768 | −0.430 | 2 |
| ENSG00000156127 | B-ATF | 0.222 | 0.607 | 1.028 | −0.416 | 1 |
| ENSG00000176842 | IRX2a | 0.355 | 0.890 | 1.155 | 0.400 | 3 |
| ENSG00000113580 | GR | 0.548 | −1.331 | −0.931 | −0.588 | 3 |
ᵃ The logFC was not significant for the full Cancer vs. Control comparison but was highly significant for the stage I vs. Control comparison
23 genes selected as potential master regulators, prioritized by the level of differential gene expression in different cancer stages and in metastatic cancer, and by the level of differential DNA methylation in cancer versus control sets
| Master molecule name | Methylation probe (Illumina ID) | correl | logFC stage I vs. Normal | logFC Cancer vs. Normal | Methylation logFC | Number of target TFsᵃ | Master-regulator score |
|---|---|---|---|---|---|---|---|
| MKP-2 | cg13635007 | −0.018 | 1.694 | 2.309 | −0.408 | 13 | 0.751 |
| c-Myc | cg00163372 | −0.498 | 2.126 | 2.442 | −0.682 | 13 | 0.725 |
| IL-17A | cg11924517 | −0.145 | 1.083 | 0.889 | −0.747 | 13 | 0.649 |
| MT1-MMP | cg05931439 | −0.212 | 0.793 | 1.439 | −0.486 | 13 | 0.619 |
| eNOS | cg08018731 | −0.059 | 1.460 | 1.958 | −0.631 | 13 | 0.560 |
| TGFbeta-2A | cg06899755 | −0.171 | 0.645 | 0.946 | 0.296 | 13 | 0.556 |
| IGF-2 | cg02425416 | −0.025 | 0.760 | 1.353 | −0.419 | 13 | 0.517 |
| col1A1 | cg18618815 | −0.142 | 1.340 | 2.085 | −0.428 | 13 | 0.502 |
| Matrin | cg01813071 | −0.055 | 4.505 | 4.814 | −0.367 | 13 | 0.498 |
| CTLA-4 | cg08460026 | −0.110 | 0.892 | 1.022 | −0.699 | 13 | 0.496 |
| amphiregulin-NTF | cg02334660 | −0.438 | 1.649 | 1.711 | −0.644 | 13 | 0.494 |
| alpha-enolase | cg06972019 | −0.405 | 0.783 | 1.148 | −0.653 | 13 | 0.480 |
| CXCR2 | cg06547715 | 0.005 | 1.156 | 1.036 | −0.570 | 13 | 0.479 |
| calcitonin | cg01421342 | 0.052 | 0.834 | 1.043 | 0.309 | 13 | 0.455 |
| IRAK-2 | cg09386682 | −0.419 | 1.292 | 1.614 | −0.444 | 13 | 0.446 |
| WT1 | cg01952234 | −0.227 | 0.412 | 1.004 | 0.346 | 13 | 0.415 |
| IL-11 | cg26367719 | 0.082 | 1.789 | 2.039 | −0.568 | 13 | 0.401 |
| Wnt-2 | cg07697895 | 0.128 | 2.171 | 2.494 | 0.288 | 13 | 0.385 |
| CD86ᵇ | cg00697440 | 0.141 | −0.153 | 0.105 | −0.584 | 13 | 0.384 |
| GROalpha | cg00419314 | −0.145 | 3.922 | 3.903 | −0.313 | 11 | 0.378 |
| trip6 | cg00374672 | −0.292 | 0.949 | 1.356 | −0.610 | 11 | 0.363 |
| mgat5 | cg20063095 | −0.209 | 0.570 | 1.162 | −0.459 | 13 | 0.343 |
| Fcgamma RIIIB | cg04567009 | 0.048 | 1.709 | 1.727 | −0.573 | 11 | 0.284 |
ᵃ 13 target TFs: AML1a, E2F-6, Fra-2, GR-alpha, HMGI-C, HMGIY, HNF-3beta, HNF-4gamma, ITF-2-A-, SEF2-1A, TGIF, c-Myc, IPF-1; 11 target TFs: AML1a, E2F-6, Fra-2, GR-alpha, HMGI-C, HMGIY, HNF-3beta, HNF-4gamma, TGIF, c-Myc, IPF-1
ᵇ CD86 did not reach statistically significant levels of gene expression fold change but was selected because of its highly significant methylation fold change
Fig. 5 A fragment of a diagram of the signal transduction network combined with the gene regulatory network, predicted by the MGE workflow to play a regulatory role in CRC. Red nodes represent master regulators identified by the network analysis algorithm. Blue nodes represent transcription factors predicted by CMA in the gene regulatory regions of the differentially expressed genes (green arrows on blue lines at the bottom). Red stars represent methylated CpG loci identified in this work whose methylation levels correlate with the expression levels of the genes. Red arrows show translation of the genes into proteins, which creates multiple feedback loops in the system. The products of the differentially expressed genes play a master-regulatory role in the system. Brown and violet shading around some nodes shows the level of up-regulation or down-regulation of the genes encoding these proteins
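The master-regulator table above reports, for each master regulator, the "Number of target TFs" it reaches in the network. One simple way to obtain such counts is a breadth-first search downstream from every node, counting the target TFs reached within a maximum number of steps. This is only an assumption-laden sketch, not the MGE workflow's actual search algorithm or its master-regulator score. The graph, node names, `max_depth` and function names are hypothetical. The search starts from each node's successors, so a feedback loop like those in Fig. 5 can bring the search back to a TF and count it as its own indirect target.

```python
from collections import deque


def reachable_targets(graph, source, targets, max_depth):
    """Target TFs reachable from `source` along 1..max_depth directed edges.

    graph:   {node: [downstream nodes]} (hypothetical signalling + regulatory edges)
    targets: set of TF names whose binding sites were found enriched
    """
    hits, seen = set(), set()
    queue = deque((nxt, 1) for nxt in graph.get(source, ()))
    while queue:
        node, depth = queue.popleft()
        if node in seen:
            continue
        seen.add(node)  # BFS pops each node first at its shortest depth
        if node in targets:
            hits.add(node)
        if depth < max_depth:
            for nxt in graph.get(node, ()):
                if nxt not in seen:
                    queue.append((nxt, depth + 1))
    return hits


def rank_by_target_count(graph, targets, max_depth):
    """Sort nodes by the number of target TFs they reach (ties broken by name)."""
    counts = {n: len(reachable_targets(graph, n, targets, max_depth)) for n in graph}
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))


graph = {"MR1": ["K1"], "K1": ["TF_A", "TF_B"], "TF_A": ["MR1"], "MR2": ["TF_B"]}
print(rank_by_target_count(graph, {"TF_A", "TF_B"}, max_depth=2))
# [('K1', 2), ('MR1', 2), ('MR2', 1), ('TF_A', 0)]
```

A pure reachability count ranks intermediate kinases (K1 here) as highly as true upstream regulators. A real prioritisation would also weigh expression and methylation evidence, as the master-regulator score column suggests.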
Selected set of 47 potential DNA methylation biomarkers
| ID | CHR | Position (hg19) | Methylation Cancer vs. Control logFC | Expression Cancer vs. Control logFC | Correlation | Enriched TFᵃ | Master regulatorᵇ |
|---|---|---|---|---|---|---|---|
| cg02612618 | 19 | 22,018,605 | 0.296 | −0.711 | −0.743 | + | |
| cg07945582 | 7 | 26,206,579 | −0.534 | 2.858 | −0.518 | + | |
| cg00163372 | 8 | 128,752,988 | −0.682 | 2.442 | −0.498 | + | + |
| cg02915837 | 12 | 3,069,243 | −0.306 | 2.335 | −0.453 | + | |
| cg02334660 | 4 | 75,312,483 | −0.644 | 1.711 | −0.438 | | + |
| cg09386682 | 3 | 10,207,069 | −0.444 | 1.614 | −0.419 | | + |
| cg06972019 | 1 | 8,937,448 | −0.653 | 1.148 | −0.405 | | + |
| cg01777575 | 20 | 22,566,140 | −0.332 | 1.312 | −0.307 | + | |
| cg19377250 | 7 | 100,463,206 | −0.712 | 1.356 | −0.292 | | + |
| cg01952234 | 11 | 32,457,130 | 0.346 | 1.004 | −0.227 | | + |
| cg05931439 | 14 | 23,305,957 | −0.486 | 1.439 | −0.212 | | + |
| cg20063095 | 2 | 134,977,141 | −0.459 | 1.162 | −0.209 | | + |
| cg17726575 | 2 | 11,606,945 | −0.420 | 1.259 | −0.204 | + | |
| cg18696576 | 6 | 34,203,630 | −0.328 | 1.758 | −0.190 | + | |
| cg06899755 | 1 | 218,520,325 | 0.296 | 0.946 | −0.171 | | + |
| cg01742897 | 18 | 53,257,019 | 0.184 | 0.021 | −0.166 | + | |
| cg15555970 | 18 | 3,452,317 | −0.294 | 1.511 | −0.161 | + | |
| cg00419314 | 4 | 74,735,092 | −0.313 | 3.903 | −0.145 | | + |
| cg11924517 | 6 | 52,050,597 | −0.747 | 0.889 | −0.145 | | + |
| cg18618815 | 17 | 48,275,324 | −0.428 | 2.085 | −0.142 | | + |
| cg08460026 | 2 | 204,732,474 | −0.699 | 1.022 | −0.110 | | + |
| cg00425708 | 12 | 66,217,779 | −0.284 | 1.016 | −0.105 | + | |
| cg08018731 | 7 | 150,687,961 | −0.631 | 1.958 | −0.059 | | + |
| cg01813071 | 11 | 102,401,616 | −0.367 | 4.814 | −0.055 | | + |
| cg02425416 | 11 | 2,163,808 | −0.419 | 1.353 | −0.025 | | + |
| cg13635007 | 8 | 29,210,154 | −0.408 | 2.309 | −0.018 | | + |
| cg07330438 | 21 | 37,258,460 | −0.556 | 1.525 | −0.011 | + | |
| cg06547715 | 2 | 218,990,976 | −0.570 | 1.036 | 0.005 | | + |
| cg08836542 | 2 | 28,618,831 | −0.386 | −0.027 | 0.006 | + | |
| cg02059626 | 8 | 76,319,264 | −0.365 | −0.209 | 0.026 | + | |
| cg04567009 | 1 | 161,600,769 | −0.573 | 1.727 | 0.048 | | + |
| cg01421342 | 11 | 14,995,754 | 0.309 | 1.043 | 0.052 | | + |
| cg26367719 | 19 | 55,875,605 | −0.568 | 2.039 | 0.082 | | + |
| cg01830294 | 7 | 116,963,492 | 0.132 | 2.494 | 0.128 | | + |
| cg01664670 | 20 | 39,316,308 | 0.280 | −0.226 | 0.140 | + | |
| cg01824511 | 14 | 38,064,456 | −0.430 | −0.768 | 0.141 | + | |
| cg00697440 | 3 | 121,795,768 | −0.584 | 0.105 | 0.141 | | + |
| cg01589587 | 14 | 76,002,440 | −0.416 | 1.028 | 0.222 | + | |
| cg24093411 | 5 | 133,449,651 | 0.387 | 2.022 | 0.321 | + | |
| cg02991571 | 13 | 28,501,126 | 0.541 | 2.869 | 0.353 | + | |
| cg06613263 | 5 | 142,779,552 | −0.588 | −0.931 | 0.410 | + | |
| cg03130910 | 1 | 234,908,226 | −1.107 | −2.482 | 0.601 | | |
| cg05259836 | 6 | 74,290,516 | −0.813 | −5.604 | 0.601 | | |
| cg24032190 | 15 | 67,442,893 | −1.071 | −3.781 | 0.603 | | |
| cg04786142 | 1 | 234,908,381 | −1.075 | −1.644 | 0.606 | | |
| cg03800922 | 6 | 74,290,220 | −0.842 | −6.934 | 0.610 | | |
| cg26541218 | 7 | 47,826,387 | −0.771 | −1.938 | 0.617 | | |
ᵃ The "Enriched TF" column marks genes that encode TFs whose binding sites were found to be enriched around the CpG loci
ᵇ The "Master regulator" column marks genes that encode the master-regulator molecules identified in this study
Fig. 6 Diagram of DNA methylation values of two markers, cg00163372 (in gene MYC) and cg08018731 (in gene NOS3). Red dots show values obtained in tumor samples; green dots show values for normal samples
Six DNA methylation markers selected for building a CRC diagnostic classification function with the SVM method
| Probe ID | Chromosome | Gene Symbol | Gene Name |
|---|---|---|---|
| cg01421342 | 11 | CALCA | calcitonin-related polypeptide alpha |
| cg06972019 | 1 | ENO1 | enolase 1, (alpha) |
| cg00163372 | 8 | MYC | v-myc avian myelocytomatosis viral oncogene homolog |
| cg02991571 | 13 | PDX1 | pancreatic and duodenal homeobox 1 |
| cg24093411 | 5 | TCF7 | transcription factor 7 (T-cell specific, HMG-box) |
| cg02612618 | 19 | ZNF43 | zinc finger protein 43 |
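A linear SVM over these six probes can be sketched without external libraries by using the Pegasos stochastic sub-gradient method (shown without its optional projection step). This is only an illustration. The kernel, regularisation, preprocessing and training data used in the study are not given here, so several choices below are assumptions: the linear kernel, `lam`, `epochs`, centring beta values at 0.5, and the simulated samples. The simulated beta values are synthetic. Only the probe IDs and the direction of methylation change come from the tables above; the direction is the sign of each probe's cancer-vs-control methylation logFC in the 47-biomarker table.

```python
import random

# Illumina probe IDs of the six markers (CALCA, ENO1, MYC, PDX1, TCF7, ZNF43).
PROBES = ["cg01421342", "cg06972019", "cg00163372",
          "cg02991571", "cg24093411", "cg02612618"]
# Sign of each probe's cancer-vs-control methylation logFC in the 47-biomarker
# table; used here ONLY to shape the synthetic data below.
DIRECTION = [+1, -1, -1, +1, +1, +1]


def train_linear_svm(X, y, lam=0.1, epochs=300, seed=0):
    """Linear SVM fitted with Pegasos (stochastic sub-gradient descent on hinge loss).

    X: feature vectors (beta values centred at 0.5, one value per probe)
    y: labels in {-1, +1}; +1 = tumor, -1 = normal (an assumed convention)
    A constant 1.0 feature is appended so the bias is learned as a weight.
    """
    rng = random.Random(seed)
    data = [(list(x) + [1.0], t) for x, t in zip(X, y)]
    w = [0.0] * len(data[0][0])
    step = 0
    for _ in range(epochs):
        rng.shuffle(data)
        for x, t in data:
            step += 1
            eta = 1.0 / (lam * step)
            violated = t * sum(wi * xi for wi, xi in zip(w, x)) < 1.0
            # Sub-gradient step for the L2 penalty (lam / 2) * ||w||^2 ...
            w = [(1.0 - eta * lam) * wi for wi in w]
            # ... plus the hinge-loss term when the margin is violated.
            if violated:
                w = [wi + eta * t * xi for wi, xi in zip(w, x)]
    return w


def predict(w, x):
    """Return +1 (tumor) or -1 (normal) for one centred feature vector."""
    score = sum(wi * xi for wi, xi in zip(w, list(x) + [1.0]))
    return 1 if score >= 0 else -1


def simulate(n_per_class, seed=1):
    """SYNTHETIC beta values (not study data), centred at 0.5."""
    rng = random.Random(seed)
    X, y = [], []
    for label in (+1, -1):
        for _ in range(n_per_class):
            betas = [0.5 + 0.3 * d * label + rng.uniform(-0.05, 0.05) for d in DIRECTION]
            X.append([b - 0.5 for b in betas])
            y.append(label)
    return X, y


X, y = simulate(10)
w = train_linear_svm(X, y)
for probe, wk in zip(PROBES, w):
    print(probe, round(wk, 3))
accuracy = sum(predict(w, x) == t for x, t in zip(X, y)) / len(y)
print(f"training accuracy on toy data: {accuracy:.2f}")
```

On this deliberately well-separated toy data every sample should be classified correctly, and each probe weight takes the sign of its methylation change. Real methylation profiles overlap far more, so any classifier built this way would need cross-validation on held-out samples before its accuracy means anything.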