Stephanie A Bien1, Paul L Auer2, Tabitha A Harrison1, Conghui Qu1, Charles M Connolly1, Peyton G Greenside3, Sai Chen4, Sonja I Berndt5, Stéphane Bézieau6, Hyun M Kang4, Jeroen Huyghe1, Hermann Brenner7,8,9, Graham Casey10, Andrew T Chan11,12, John L Hopper13, Barbara L Banbury1, Jenny Chang-Claude14,15, Stephen J Chanock5, Robert W Haile16, Michael Hoffmeister7, Christian Fuchsberger4, Mark A Jenkins13, Suzanne M Leal17, Mathieu Lemire18, Polly A Newcomb1, Steven Gallinger19, John D Potter1, Robert E Schoen20, Martha L Slattery21, Joshua D Smith22, Loic Le Marchand23, Emily White1,24, Brent W Zanke25,26, Goncalo R Abeçasis4, Christopher S Carlson1,24, Ulrike Peters1,24, Deborah A Nickerson22, Anshul Kundaje27, Li Hsu1,28.
Abstract
BACKGROUND: The evaluation of less frequent genetic variants and their effect on complex disease poses new challenges for genomic research. To investigate whether epigenetic data can be used to inform aggregate rare-variant association methods (RVAM), we assessed whether variants more significantly associated with colorectal cancer (CRC) were preferentially located in non-coding regulatory regions, and whether enrichment was specific to colorectal tissues.
Year: 2017 PMID: 29161273 PMCID: PMC5697874 DOI: 10.1371/journal.pone.0186518
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Fig 1. Analysis approaches for assessing ARE enrichment of stronger CRC associations with common and less frequent variants.
ARE, Active Regulatory Elements; CR, Colorectal; ND, Non-digestive. Fig 1 describes the series of analyses performed to examine the enrichment of ARE for more significant common single-variant CRC associations from GWAS and for less frequent variant-set associations from an integrative epigenomic RVAM.
Fig 2. ARE enrichment of stronger GWAS CRC p-values.
Each point in this scatter plot corresponds to one of the 127 tissues and cell types examined for ARE enrichment of more significant single common variant association p-values with CRC. The x-axis shows the number of variants in ARE examined for the corresponding tissue or cell type. The y-axis shows the −log10 of the p-value from the KS test comparing the distribution of CRC-variant association p-values for variants inside and outside of ARE. The size of the point corresponds to the number of ARE in the tissue or cell type. The color of the point represents membership in an epigenomic group, based on a hierarchical clustering of the ARE. ARE of digestive tissues (in light purple) and immune cell types (in two shades of green) were more significantly enriched (lower KS test p-values). The numbers of ARE and ARE variants span both high and low KS p-values, suggesting that differences in the number of ARE across tissues do not bias the enrichment results. The red dashed line corresponds to the Bonferroni p-value threshold correcting for 127 comparisons (0.05/127 = 4 × 10⁻⁴).
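The enrichment statistic plotted in Fig 2 can be illustrated with a minimal sketch of a two-sample KS test. The caption does not state whether the test was one- or two-sided or which software was used, so the two-sided form, the asymptotic p-value approximation, and the function name `ks_two_sample` are all assumptions made for illustration.

```python
import bisect
import math

def ks_two_sample(xs, ys):
    """Two-sided two-sample KS test (assumed form): returns (D, asymptotic p)."""
    xs, ys = sorted(xs), sorted(ys)
    n, m = len(xs), len(ys)
    d = 0.0
    for v in xs + ys:
        # Empirical CDFs at v: fraction of each sample that is <= v.
        fx = bisect.bisect_right(xs, v) / n
        fy = bisect.bisect_right(ys, v) / m
        d = max(d, abs(fx - fy))
    # Asymptotic Kolmogorov distribution with a small-sample correction.
    en = math.sqrt(n * m / (n + m))
    lam = (en + 0.12 + 0.11 / en) * d
    if lam < 0.2:
        # The series below converges slowly here; the p-value is ~1.
        return d, 1.0
    p = 2.0 * sum((-1) ** (k - 1) * math.exp(-2.0 * k * k * lam * lam)
                  for k in range(1, 101))
    return d, min(max(p, 0.0), 1.0)

# Hypothetical association p-values for variants inside and outside ARE.
inside_are = [1e-6, 3e-5, 2e-4, 0.001, 0.004]
outside_are = [0.05, 0.2, 0.4, 0.7, 0.9]
d_stat, p_val = ks_two_sample(inside_are, outside_are)
y_value = -math.log10(p_val)  # quantity shown on the y-axis of Fig 2
```

In this toy input every inside-ARE p-value is smaller than every outside-ARE p-value, so the two empirical distributions are fully separated and D reaches its maximum of 1.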
Fig 3. CR-specific ARE enrichment of stronger GWAS CRC p-values.
Each point in this scatter plot corresponds to an enrichment result comparing CR ARE (n = 108,297) to ARE in one of the 124 non-colorectal tissues and cell types tested. The x-axis shows the number of variants in ARE examined for the corresponding non-colorectal tissue or cell type (n = 270,030 CR ARE variants). The y-axis shows the −log10 of the p-value from the KS test comparing the distribution of CRC association p-values for variants inside CR ARE versus those inside ARE of a non-colorectal tissue or cell type. The size of the point corresponds to the number of ARE in the tissue or cell type. The color of the point represents membership in an epigenomic group, based on a hierarchical clustering of the ARE. In comparison to ARE of digestive tissues (in light purple) and immune cell types (in two shades of green), CR-specific ARE did not exhibit additional enrichment. The strongest enrichment was observed for induced pluripotent stem cells (iPSCs), embryonic stem cells (ESC), ESC-derived cells, and brain cell types. The red dashed line corresponds to the Bonferroni p-value threshold correcting for 124 comparisons (0.05/124 = 4 × 10⁻⁴). The red solid line corresponds to the Bonferroni p-value threshold correcting for 19 epigenomic group comparisons (0.05/19 = 3 × 10⁻³).
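The Bonferroni thresholds quoted in the captions for Figs 2 and 3 follow from dividing the significance level of 0.05 by the number of comparisons. A short sketch of that arithmetic, with the hypothetical helper name `bonferroni_threshold`, also gives the height at which each threshold line would sit on a −log10 axis.

```python
import math

def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold under a Bonferroni correction."""
    return alpha / n_tests

# Comparison counts taken from the captions: 127 tissues and cell types,
# 124 non-colorectal tissues and cell types, and 19 epigenomic groups.
thresholds = {n: bonferroni_threshold(0.05, n) for n in (127, 124, 19)}

# Position of each threshold line on a -log10(p) axis.
line_heights = {n: -math.log10(t) for n, t in thresholds.items()}
```

Rounded to one significant figure, these values match the 4 × 10⁻⁴ and 3 × 10⁻³ reported in the captions.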
Fig 4. CR-specific ARE enrichment of stronger GWAS CRC p-values accounting for known loci and variant correlation.
Variant-CRC association p-values for CR ARE variants were compared to those for ARE variants of all non-digestive tissues. The y-axis shows the −log10(KS test p-value), reflecting the significance of the enrichment. Enrichment analyses were performed for single variant-CRC association p-values from GWAS unadjusted for known loci (shown in black) and from GWAS that included a polygenic risk score (PRS) in the model (shown in gray). Results across different LD pruning schemes are shown using three different correlation r² thresholds. For each threshold, LD blocks were defined as sets of correlated CR ARE variants or ARE variants from non-digestive tissues with r² greater than or equal to 0.9, 0.8, or 0.5. For each LD block, a priority pruning scheme was employed, selecting the ARE variant with the strongest CRC association p-value. Enrichment tests were repeated for each pruning threshold and compared to enrichment results without LD pruning (‘No Threshold’).
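The priority pruning step can be sketched as follows. The caption does not say how pairwise correlations were grouped into LD blocks, so this example assumes a block is a connected component of variants linked by r² at or above the threshold; the function name `ld_prune`, the variant labels, and the input values are hypothetical.

```python
def ld_prune(pvalues, r2_pairs, threshold):
    """Keep one variant per LD block: the one with the smallest p-value.

    pvalues:  dict mapping variant ID -> CRC association p-value
    r2_pairs: dict mapping (variant_a, variant_b) -> pairwise r^2
    Blocks are connected components of pairs with r^2 >= threshold
    (an assumption; the source does not specify the grouping rule).
    """
    parent = {v: v for v in pvalues}

    def find(v):
        # Union-find lookup with path halving.
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v

    for (a, b), r2 in r2_pairs.items():
        if r2 >= threshold:
            parent[find(a)] = find(b)

    best = {}  # block root -> variant with the strongest association
    for v, p in pvalues.items():
        root = find(v)
        if root not in best or p < pvalues[best[root]]:
            best[root] = v
    return sorted(best.values())

# Hypothetical inputs for illustration only.
pvals = {"v1": 1e-6, "v2": 1e-3, "v3": 0.2, "v4": 0.04}
r2 = {("v1", "v2"): 0.95, ("v2", "v3"): 0.6, ("v3", "v4"): 0.1}
kept_strict = ld_prune(pvals, r2, 0.9)  # ['v1', 'v3', 'v4']
kept_loose = ld_prune(pvals, r2, 0.5)   # ['v1', 'v4']
```

Lowering the threshold merges more variants into each block, so fewer variants survive pruning, which is why the caption reports results separately for each threshold.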
Fig 5. Rare variant test set.
Variant sets were anchored on Transcription Start Sites (TSS) as defined by protein-coding gene transcripts with validated RefSeq records. If a gene had multiple TSS, the 5'-most and 3'-most TSS were used as anchors. Accordingly, variants overlapping ARE within 200 kb of a TSS were pooled into a test set.
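One way to read the test-set definition is sketched below for a single gene on a single chromosome. It assumes the window runs from 200 kb beyond the outermost TSS on one side to 200 kb beyond the outermost TSS on the other, and that a variant "overlaps" an ARE when its position falls inside the ARE interval. The function name `build_variant_set` and all coordinates are hypothetical.

```python
def build_variant_set(tss_positions, are_intervals, variant_positions,
                      flank=200_000):
    """Pool variants that lie in an ARE within `flank` bp of a gene's TSS.

    tss_positions:     TSS coordinates for one gene (same chromosome)
    are_intervals:     list of (start, end) ARE coordinates, inclusive
    variant_positions: variant coordinates on the same chromosome
    """
    # The outermost TSS act as anchors; by coordinate these are the
    # 5'-most and 3'-most TSS regardless of strand.
    window_start = min(tss_positions) - flank
    window_end = max(tss_positions) + flank

    selected = []
    for pos in variant_positions:
        if not (window_start <= pos <= window_end):
            continue  # outside the TSS-anchored window
        if any(start <= pos <= end for start, end in are_intervals):
            selected.append(pos)
    return selected

# Hypothetical coordinates for illustration only.
tss = [1_000_000, 1_050_000]
ares = [(790_000, 800_500), (1_100_000, 1_101_000), (1_300_000, 1_301_000)]
variants = [795_000, 800_200, 1_100_500, 1_200_000, 1_300_500]
test_set = build_variant_set(tss, ares, variants)  # [800200, 1100500]
```

Here the variant at 795,000 sits in an ARE but outside the window, and the variant at 1,200,000 sits inside the window but in no ARE, so both are excluded.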