Alexa A Woodward, Deanne M Taylor, Elizabeth Goldmuntz, Laura E Mitchell, A J Agopian, Jason H Moore, Ryan J Urbanowicz.
Abstract
BACKGROUND: Gene set enrichment analysis (GSEA) uses gene-level univariate associations to identify gene set-phenotype associations for hypothesis generation and interpretation. We propose that GSEA can be adapted to incorporate SNP and gene-level interactions. To this end, gene scores are derived by Relief-based feature importance algorithms that efficiently detect both univariate and interaction effects (MultiSURF) or exclusively interaction effects (MultiSURF*). We compare these interaction-sensitive GSEA approaches to traditional χ² rankings in simulated genome-wide array data, and in a target and replication cohort of congenital heart disease patients with conotruncal defects (CTDs).
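The ranking-then-enrichment idea described in the abstract can be illustrated with a minimal sketch of a weighted running-sum enrichment score in the style of preranked GSEA. This is not the authors' implementation: the function name `enrichment_score`, the weighting exponent `p`, and the toy gene scores are assumptions made for illustration. The gene scores could come from any ranking method, such as χ² statistics or Relief-based feature importances.

```python
def enrichment_score(gene_scores, gene_set, p=1.0):
    """Weighted running-sum enrichment score for one gene set.

    gene_scores: dict mapping gene name -> score (hypothetical input;
                 e.g. a chi-square statistic or a Relief-based importance).
    gene_set:    set of gene names belonging to the pathway or GO term.
    p:           weighting exponent (p=0 gives an unweighted statistic).
    Returns (ES, leading_edge_genes).
    """
    # Rank genes from highest to lowest score.
    ranked = sorted(gene_scores.items(), key=lambda kv: kv[1], reverse=True)
    hits = [gene in gene_set for gene, _ in ranked]
    n_hits = sum(hits)
    n_miss = len(ranked) - n_hits
    if n_hits == 0 or n_miss == 0:
        raise ValueError("gene set must contain some, but not all, ranked genes")

    # Normalizing constant for the hit increments.
    norm = sum(abs(s) ** p for (_, s), h in zip(ranked, hits) if h)

    running, best, best_idx = 0.0, 0.0, -1
    for i, ((_, score), is_hit) in enumerate(zip(ranked, hits)):
        if is_hit:
            # Fall back to uniform steps if every hit has a zero weight.
            running += (abs(score) ** p / norm) if norm > 0 else 1.0 / n_hits
        else:
            running -= 1.0 / n_miss
        # Track the maximum deviation from zero.
        if abs(running) > abs(best):
            best, best_idx = running, i

    # Leading edge: set members up to the peak (positive ES)
    # or after the trough (negative ES).
    if best >= 0:
        window = zip(ranked[: best_idx + 1], hits[: best_idx + 1])
    else:
        window = zip(ranked[best_idx + 1:], hits[best_idx + 1:])
    leading_edge = [gene for (gene, _), h in window if h]
    return best, leading_edge


# Toy example with made-up scores.
toy_scores = {"A": 5.0, "B": 4.0, "C": 3.0, "D": 2.0, "E": 1.0, "F": 0.5}
es_top, edge_top = enrichment_score(toy_scores, {"A", "B"})
es_bottom, edge_bottom = enrichment_score(toy_scores, {"E", "F"})
```

In practice the score would be compared against a permutation-based null distribution to obtain significance; that step is omitted here.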
Keywords: Congenital heart disease; Epistasis; GWAS; Gene set enrichment analysis
Year: 2022; PMID: 35151364; PMCID: PMC8841104; DOI: 10.1186/s13040-022-00287-w
Source DB: PubMed; Journal: BioData Min; ISSN: 1756-0381; Impact factor: 4.079
Fig. 1. Flowchart of the univariate and Relief-based methods applied to the case-control and trio data
Fig. 2. Example case and pseudo-control generated from a case-parent trio
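A pseudo-control of the kind shown in Fig. 2 is commonly built from the parental alleles that were not transmitted to the affected child. The sketch below assumes that construction and an additive 0/1/2 minor-allele coding; the record does not state the authors' exact procedure, and the function names are hypothetical.

```python
# Alleles (0 = major, 1 = minor) that a parent with a given genotype can transmit.
TRANSMITTABLE = {0: {0}, 1: {0, 1}, 2: {1}}


def is_mendelian_consistent(father, mother, child):
    """True if the child's genotype can arise from one allele per parent."""
    return any(
        a + b == child
        for a in TRANSMITTABLE[father]
        for b in TRANSMITTABLE[mother]
    )


def pseudo_control(father, mother, child):
    """Build a pseudo-control genotype vector from a case-parent trio.

    Each argument is a list of minor-allele counts (0, 1, or 2), one per SNP.
    The pseudo-control carries the two untransmitted parental alleles, so its
    minor-allele count is father + mother - child. SNPs with missing values
    (None) or Mendelian inconsistencies are returned as None.
    """
    if not (len(father) == len(mother) == len(child)):
        raise ValueError("genotype vectors must have equal length")
    result = []
    for f, m, c in zip(father, mother, child):
        if None in (f, m, c) or not is_mendelian_consistent(f, m, c):
            result.append(None)
        else:
            result.append(f + m - c)
    return result


# Toy trio: the last SNP is Mendelian-inconsistent (the mother must transmit a minor allele).
example = pseudo_control([0, 1, 2, 0], [2, 1, 2, 2], [1, 0, 2, 0])
```

Each trio then contributes one case (the affected child) and one matched pseudo-control, which allows case-control style methods to be applied to family data.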
Fig. 3. Heatmap of leading edge genes from the top 10 GO terms for each analysis of the simulated data. Hierarchical clustering was applied to both the rows and columns
Fig. 6. Heatmap of leading edge genes captured in all four of the Relief-based methods applied to the case-control and trio data. Hierarchical clustering was applied to both the rows and columns
Spearman’s rank-order correlation coefficients (ρ) for gene ranks between analyses
| Data | MultiSURF & MultiSURF* | ||
|---|---|---|---|
| Cohort 1 | 0.479 | 0.293 | 0.644 |
| Cohort 2 | 0.543 | 0.325 | 0.737 |
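For reference, Spearman's ρ between two gene-score vectors can be computed as the Pearson correlation of their ranks. The following is a minimal pure-Python sketch with average ranks for ties; the function names and toy inputs are illustrative and are not taken from the study.

```python
from math import sqrt


def average_ranks(values):
    """Return 1-based ranks, assigning tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rank-order correlation between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("inputs must have equal length of at least 2")
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    if vx == 0 or vy == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / sqrt(vx * vy)


# Toy gene scores from two hypothetical analyses of the same five genes.
rho_same = spearman_rho([1, 2, 3, 4, 5], [10, 20, 30, 40, 50])
rho_reversed = spearman_rho([1, 2, 3, 4, 5], [5, 4, 3, 2, 1])
rho_ties = spearman_rho([1, 2, 3, 4, 5], [5, 6, 7, 8, 7])
```

Because only ranks enter the calculation, ρ is unaffected by the different scales of χ² statistics and Relief-based importance scores.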
Fig. 4. Density plots depicting the correlation between the gene ranks across the three analyses in Cohort 1. Spearman’s rank-order correlation coefficient (ρ) is given for each comparison
Fig. 5. For each analysis, the bar plot illustrates the percentage of genes in each category that appeared on the leading edge. Numbers to the right of each bar are the number of genes on the leading edge for that category