An-Phi Nguyen, Paola Nicoletti, Damien Arnol, Andrea Califano, María Rodríguez Martínez.
Abstract
In the last decade, a large number of genome-wide association studies have uncovered many single-nucleotide polymorphisms (SNPs) that are associated with complex traits and confer susceptibility to diseases, such as cancer. However, so far only a few heritable traits with medium-to-high penetrance have been identified. The vast majority of the discovered variants lead to disease only in combination with other, still unknown factors. Furthermore, while many studies have aimed to link the effect of SNPs to changes in molecular phenotypes, the analysis has often focused on testing associations between a single SNP and a transcript, hence disregarding the dysregulation of gene regulatory networks, which has been shown to play an essential role in disease onset, notably in cancer. Here we take a systems biology approach and develop GVITamIN (Genetic VarIaTIoN functional analysis tool), a new statistical and computational approach to characterize the effect of a SNP on both genes and transcriptional regulatory programs. GVITamIN exploits a novel statistical approach to combine the usually small effects of disease-susceptibility SNPs and reveals important potential oncogenic mechanisms, hence taking one step further toward understanding the SNP mechanism of action. We apply GVITamIN to a breast cancer cohort and identify well-known cancer-related transcription factors (TFs), such as CTCF, LEF1, and FOXA1, as TFs dysregulated by breast cancer-associated SNPs. Furthermore, our results reveal that SNPs located in the RAD51B gene are significantly associated with abnormal regulatory activity, suggesting a pivotal role for homologous recombination repair mechanisms in breast cancer.
Keywords: SNP mechanism of action; breast cancer; cancer-susceptibility SNP; multi-omics integration; nonparametric hypothesis test; p-value combination; transcription factor dysregulation
Year: 2020 PMID: 32850701 PMCID: PMC7417307 DOI: 10.3389/fbioe.2020.00798
Source DB: PubMed Journal: Front Bioeng Biotechnol ISSN: 2296-4185
Figure 1. GVITamIN pipeline. For the analysis of each SNP, we split the patients into two cohorts according to their allele. In the first-order analysis, we compare the gene expression distributions of the two cohorts using the non-parametric Mann-Whitney U-test. In the second-order analysis, given a list of transcription factors (TFs) and an optional list of TF-target gene (TF-TG) pairs, we compute correlations between TFs and their targets. After Fisher's z-transform, we test each TF-TG pair for abnormal regulatory activity via a z-test. Finally, we test the global dysregulation of each TF by combining the p-values over its targets.
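The three tests named in the pipeline can be sketched as follows. This is a minimal illustration under stated assumptions, not the GVITamIN implementation: the function names are hypothetical, Pearson correlation is assumed for the TF-TG correlations, and Fisher's method is used as a stand-in for the p-value combination step, whose exact form the caption does not specify.

```python
import numpy as np
from scipy import stats


def first_order_test(expr, carriers):
    """Mann-Whitney U-test on one gene's expression, split by allele cohort."""
    expr = np.asarray(expr, dtype=float)
    carriers = np.asarray(carriers, dtype=bool)
    result = stats.mannwhitneyu(expr[carriers], expr[~carriers],
                                alternative="two-sided")
    return result.pvalue


def second_order_test(tf, tg, carriers):
    """z-test on the difference of Fisher z-transformed TF-TG correlations."""
    tf = np.asarray(tf, dtype=float)
    tg = np.asarray(tg, dtype=float)
    carriers = np.asarray(carriers, dtype=bool)
    z_values, sizes = [], []
    for mask in (carriers, ~carriers):
        r = np.corrcoef(tf[mask], tg[mask])[0, 1]  # Pearson r (assumption)
        z_values.append(np.arctanh(r))             # Fisher's z-transform
        sizes.append(int(mask.sum()))
    # Standard error of the difference between two Fisher z values.
    se = np.sqrt(1.0 / (sizes[0] - 3) + 1.0 / (sizes[1] - 3))
    z_stat = (z_values[0] - z_values[1]) / se
    return 2.0 * stats.norm.sf(abs(z_stat))        # two-sided p-value


def combine_pvalues(pvalues):
    """Fisher's method: a stand-in for the combination over a TF's targets."""
    p = np.asarray(pvalues, dtype=float)
    chi_stat = -2.0 * np.log(p).sum()
    return stats.chi2.sf(chi_stat, df=2 * p.size)


# Synthetic data (hypothetical): 200 patients, the first 100 carry the allele.
rng = np.random.default_rng(0)
carriers = np.arange(200) < 100
expr = np.where(carriers, 5.0, 0.0) + rng.normal(size=200)
tf = rng.normal(size=200)
tg = np.where(carriers, tf, -tf) + 0.1 * rng.normal(size=200)

p_first = first_order_test(expr, carriers)
p_second = second_order_test(tf, tg, carriers)
p_global = combine_pvalues([1e-4, 1e-3, 1e-2])
```

In this toy setup the expression shift and the sign flip of the TF-TG correlation are deliberately large, so both tests return very small p-values. Real SNP effects are expected to be much weaker, which is the motivation for combining evidence over all targets of a TF.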
First-order analysis: top 10 most significant results.
| SNP | Gene | p-value | Adjusted p-value |
|---|---|---|---|
| rs4455437 | LCE1F | 1.09224e-80 | 1.75194e-75 |
| rs4455437 | LCE2D | 8.99746e-80 | 7.21591e-75 |
| rs4455437 | LCE2C | 7.94305e-79 | 4.24685e-74 |
| rs4455437 | LCE1A | 5.00239e-78 | 2.00594e-73 |
| rs2842347 | KRTAP9-4 | 1.20898e-77 | 1.93918e-72 |
| rs2842347 | GAGE12J | 1.00726e-76 | 8.07816e-72 |
| rs4455437 | LCE1E | 7.45837e-75 | 2.39262e-70 |
| rs2842347 | SPRR2B | 2.71334e-74 | 1.45072e-69 |
| rs2842347 | KRTAP4-12 | 1.34211e-73 | 5.38182e-69 |
| rs7107217 | LCE6A | 2.14795e-72 | 3.44529e-67 |
Most of the top results involve genes from the late cornified envelope (LCE) family and SNP rs4455437. This SNP is located close to the gene TNIP3, which has been linked to breast cancer in African American women.
Gene enrichment analysis for two of the SNPs associated with the highest number of genes: rs421379 and rs2048672.
| rs421379 (1329) | TBK1.DF_DN | 2.341e-07 | 0.005 |
| | GCM_RAB10 | 2.489e-06 | 0.025 |
| rs2048672 (117) | MODULE_54 (Cell cycle) | 2.684e-18 | 0.005 |
| | HALLMARK_G2M_CHECKPOINT | 1.900e-06 | 0.013 |
| | HALLMARK_E2F_TARGETS | 1.990e-06 | 0.013 |
| | REACTOME_CELL_CYCLE | 4.897e-06 | 0.024 |
rs421379 might be related to dysregulation of cell proliferation and apoptosis via TBK1. rs2048672 seems to play a role in the G2-M cell cycle checkpoint.
Second-order analysis: top 15 most significant results.
| SNP | TF | Target gene | p-value | Adjusted p-value |
|---|---|---|---|---|
| rs16882214 | PDX1 | INS | 1.68881e-12 | 2.36652e-06 |
| rs1876206 | MYOD1 | CRCT1 | 1.53497e-11 | 2.15094e-05 |
| rs1876206 | MYOD1 | HTN1 | 9.40789e-11 | 6.59161e-05 |
| rs10509373 | HEY1 | CDH23 | 3.46953e-10 | 0.000486182 |
| rs1876206 | MYOD1 | CELA3A | 2.07202e-08 | 0.00967838 |
| rs1801270 | SMARCA4 | DGCR8 | 8.87854e-09 | 0.0124414 |
| rs3784099 | CDC5L | PCDHGC3 | 3.96714e-08 | 0.0185304 |
| rs3784099 | ATF1 | CD2AP | 2.97751e-08 | 0.0208618 |
| rs1876206 | MYOD1 | NKX2-1 | 6.05328e-08 | 0.0212061 |
| rs3784099 | FOXA1 | TM4SF1 | 2.0034e-08 | 0.0280735 |
| rs10510102 | CEBPB | MBNL3 | 2.283e-08 | 0.0319916 |
| rs3784099 | CTCF | SCAF8 | 5.86418e-07 | 0.035728 |
| rs3784099 | LMO2 | LYL1 | 5.69159e-07 | 0.0362527 |
| rs3784099 | CTCF | TLK1 | 2.84675e-07 | 0.0362648 |
| rs3784099 | CTCF | C8G | 3.36879e-07 | 0.0363128 |
| rs3784099 | CTCF | SFN | 5.21694e-07 | 0.0365523 |
Several of these TFs and targets have been implicated in cancer, such as PDX1, MYOD1, CTCF, LMO2, and INS.
List of SNP-TF pairs ranked by number of significant targets.
| SNP | TF | Number of significant targets |
|---|---|---|
| rs3784099 | CTCF | 5 |
| rs1876206 | MYOD1 | 4 |
| rs3784099 | EP300 | 4 |
| rs3784099 | YY1 | 4 |
| rs3784099 | LMO2 | 2 |
| rs10509373 | HEY1 | 1 |
| rs10510102 | CEBPB | 1 |
| rs16882214 | PDX1 | 1 |
| rs1801270 | SMARCA4 | 1 |
| rs3784099 | ATF1 | 1 |
| rs3784099 | BDP1 | 1 |
| rs3784099 | CDC5L | 1 |
| rs3784099 | ELF1 | 1 |
| rs3784099 | FOXA1 | 1 |
| rs3784099 | IRF2 | 1 |
| rs3784099 | JUND | 1 |
| rs3784099 | MYOD1 | 1 |
| rs3784099 | SIN3A | 1 |
| rs3784099 | SMC3 | 1 |
| rs3784099 | SOX9 | 1 |
| rs3784099 | SP3 | 1 |
| rs4987047 | RAD21 | 1 |
Among the top pairs, CTCF, EP300, and LMO2 are reported on the COSMIC cancer gene list.
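The ranking in this table amounts to counting, for each SNP-TF pair, the targets that pass a significance threshold. The sketch below illustrates this on a subset of the second-order results listed earlier. It assumes that the last column of that table is the corrected p-value and that 0.05 is the cutoff; neither is stated in the text. Because only a subset of rows is used, the counts are lower than those reported in the full table.

```python
from collections import Counter

# (SNP, TF, target, corrected p-value): a subset of the second-order rows.
rows = [
    ("rs16882214", "PDX1", "INS", 2.36652e-06),
    ("rs1876206", "MYOD1", "CRCT1", 2.15094e-05),
    ("rs1876206", "MYOD1", "HTN1", 6.59161e-05),
    ("rs1876206", "MYOD1", "CELA3A", 0.00967838),
    ("rs1876206", "MYOD1", "NKX2-1", 0.0212061),
    ("rs3784099", "CTCF", "SCAF8", 0.035728),
    ("rs3784099", "LMO2", "LYL1", 0.0362527),
    ("rs3784099", "CTCF", "TLK1", 0.0362648),
    ("rs3784099", "CTCF", "C8G", 0.0363128),
    ("rs3784099", "CTCF", "SFN", 0.0365523),
]

ALPHA = 0.05  # assumed significance cutoff

# Count significant targets per (SNP, TF) pair.
counts = Counter((snp, tf) for snp, tf, _target, p_adj in rows if p_adj < ALPHA)

# Pairs ordered from most to fewest significant targets.
ranking = counts.most_common()

for (snp, tf), n_targets in ranking:
    print(f"| {snp} | {tf} | {n_targets} |")
```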
Significant targets for the top significant transcription factors with at least one significant target.
| SNP | TF | Significant targets |
|---|---|---|
| rs3784099 | CTCF | SCAF8, TLK1, C8G, SFN, RRP9 |
| rs3784099 | EP300 | PICALM, RFXANK, MAD2L2, PMVK |
| rs1876206 | MYOD1 | CRCT1, HTN1, CELA3A, NKX2-1 |
| rs3784099 | YY1 | LYPLA2, ABCE1, WDR13, DDX6 |
| rs3784099 | LMO2 | LYL1, CDH5 |
Most TFs in this table have been associated with cancer. Interestingly, several of the target genes are also reported on the COSMIC cancer gene list, for example PICALM (targeted by EP300), DDX6 (targeted by YY1), and LYL1 (targeted by LMO2).
Results of the global second-order analysis.
| SNP | TF |
|---|---|
| rs2380205 | LEF1 |
| rs10822013 | LEF1 |
| rs421379 | LEF1 |
| rs3784099 | EP300 |
| rs3784099 | LEF1 |
| rs2842347 | LEF1 |
| rs11613298 | FOXO4 |
| rs12762549 | LEF1 |
| rs2842347 | TCF3 |
| rs2048672 | LEF1 |
| rs3784099 | CTCF |
| rs10822013 | TCF3 |
| rs737387 | MAZ |
| rs2842347 | CTCF |
| rs3784099 | YY1 |
The table shows the top 15 most significant results. GVITamIN reported a p-value of zero for all of these SNP-TF associations, so we used a Chi-score to rank them instead. LEF1, which may play an important role in cancer invasion, migration, and proliferation, is associated with many SNPs.
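Why a p-value can be reported as exactly zero, and why a test statistic can still rank the results, is illustrated below. The sketch assumes that the Chi-score is the chi-square statistic of a Fisher-type p-value combination. That reading is an assumption made for illustration, and the per-target p-values are hypothetical.

```python
import numpy as np
from scipy import stats


def fisher_chi_score(pvalues):
    """Return Fisher's chi-square statistic and its combined p-value."""
    # Clip to the smallest positive normal float so that log(0) never occurs.
    p = np.clip(np.asarray(pvalues, dtype=float), np.finfo(float).tiny, 1.0)
    chi_score = -2.0 * np.log(p).sum()
    combined_p = stats.chi2.sf(chi_score, df=2 * p.size)
    return chi_score, combined_p


# Hypothetical per-target p-values for two SNP-TF pairs (not from the paper).
weak = fisher_chi_score(np.full(50, 1e-20))
strong = fisher_chi_score(np.full(50, 1e-30))

# Both combined p-values lie far below the double-precision range, so they
# cannot separate the two pairs, whereas the chi-square statistic still can.
ranking = sorted([("weak", weak), ("strong", strong)],
                 key=lambda item: item[1][0], reverse=True)
```

Sorting on the statistic rather than on the p-value keeps the ordering meaningful once all p-values have underflowed.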
Global second-order analysis.
| SNP (number of associated TFs) | TFs (p-value, FDR-corrected p-value) |
|---|---|
| rs3784099 (61) | AHR, |
| rs10509373 (37) | |
| rs2048672 (25) | |
| rs421379 (23) | |
| rs2842347 (21) | |
| rs4784227 (20) | |
| rs2380205 (18) | |
| rs10822013 (18) | |
| rs704010 (13) | BPTF, CDC5L (1.89e-15, 3.48e-13), |
| rs1314913 (10) | |
We report the top 10 SNPs that are significantly associated with the highest number of TFs. The p-value and the FDR-corrected p-value are given in brackets for each TF. If no values are given, GVITamIN reported a p-value of zero. Cancer-related genes, as reported in the COSMIC gene list, are denoted in bold.
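The text does not say which FDR procedure was used. As an illustration only, the sketch below implements the Benjamini-Hochberg step-up adjustment, a common choice, on hypothetical p-values. The function name is not taken from GVITamIN.

```python
import numpy as np


def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    p = np.asarray(pvalues, dtype=float)
    n = p.size
    order = np.argsort(p)
    # Scale the i-th smallest p-value by n / i.
    scaled = p[order] * n / np.arange(1, n + 1)
    # Enforce monotonicity, starting from the largest p-value.
    adjusted = np.minimum.accumulate(scaled[::-1])[::-1]
    out = np.empty(n)
    out[order] = np.clip(adjusted, 0.0, 1.0)
    return out


# Hypothetical raw p-values.
raw = [0.01, 0.04, 0.03, 0.005]
adjusted = benjamini_hochberg(raw)
```

Under this procedure the smallest of n p-values is multiplied by n, the second smallest by n/2, and so on, subject to the monotonicity step.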
TFs significantly associated with the highest number of SNPs.
| TF | Number of SNPs |
|---|---|
| FOXA1 | 10 |
| CTCF | 9 |
| ESR1 | 8 |
| AR | 7 |
| LEF1 | 7 |
| ZEB1 | 7 |
| BCL11A | 6 |
| E2F1 | 6 |
| ELF2 | 6 |
| POU2F1 | 6 |
| CEBPB | 5 |
| GATA2 | 5 |
| MYB | 5 |
| SP1 | 5 |
Among these transcription factors, FOXA1, CTCF, ESR1, AR, LEF1, ZEB1, BCL11A, GATA2, and MYB are established cancer-related genes.