Jaleal S Sanjak, Anthony D Long, Kevin R Thornton.
Abstract
Genome-wide association studies (GWAS) have associated many single variants with complex disease, yet the better part of heritable complex disease risk remains unexplained. Analytical tools designed to work under specific population genetic models are needed. Rare variants are increasingly shown to be important in human complex disease, but most existing GWAS data do not cover rare variants. Explicit population genetic models predict that genes contributing to complex traits and experiencing recurrent, unconditionally deleterious, mutation will harbor multiple rare, causative mutations of subtle effect. It is difficult to identify genes harboring rare variants of large effect that contribute to complex disease risk via the single marker association tests typically used in GWAS. Gene/region-based association tests may have the power to detect associations by combining information from multiple markers, but have yielded limited success in practice. This is partially because many methods have not been widely applied. Here, we empirically demonstrate the utility of a procedure based on the rank truncated product (RTP) method, filtered to reduce the effects of linkage disequilibrium. We apply the procedure to the Wellcome Trust Case Control Consortium (WTCCC) data set, and uncover previously unidentified associations, some of which have been replicated in much larger studies. We show that, in the absence of significant rare variant coverage, RTP-based methods still have the power to detect associated genes. We recommend that RTP-based methods be applied to all existing GWAS data to maximize the usefulness of those data. For this, we provide efficient software implementing our procedure.
Keywords: GWAS; gene-based; rare variants
Year: 2016 PMID: 26865700 PMCID: PMC4825638 DOI: 10.1534/g3.115.026013
Source DB: PubMed Journal: G3 (Bethesda) ISSN: 2160-1836 Impact factor: 3.154
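The rank truncated product idea named in the abstract can be illustrated with a minimal sketch. This is not the authors' software, and it leaves out their linkage disequilibrium filtering. The function names, the choice of `k`, and the assumption that null p-value sets come from label permutations done elsewhere are all hypothetical. The statistic is the product of the `k` smallest single marker p-values in a region, computed here on the log scale, and its significance is estimated empirically against the null sets.

```python
import math


def rtp_statistic(pvalues, k):
    """Log of the product of the k smallest p-values (all assumed > 0)."""
    smallest = sorted(pvalues)[:k]
    return sum(math.log(p) for p in smallest)


def rtp_empirical_pvalue(observed, null_sets, k):
    """Empirical p-value of the observed region against null p-value sets.

    null_sets is assumed to hold one list of marker p-values per
    permutation of case/control labels (generated elsewhere).
    """
    obs = rtp_statistic(observed, k)
    # A smaller (more negative) log-product is more extreme.
    hits = sum(1 for null in null_sets if rtp_statistic(null, k) <= obs)
    # Add-one correction keeps the estimate strictly positive.
    return (hits + 1) / (len(null_sets) + 1)
```

Working on the log scale avoids numerical underflow when many small p-values are multiplied. Permuting labels, rather than assuming independent markers, is what lets the null distribution reflect correlation among nearby markers.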
New associations: regions significant under the ESM test (p-value threshold 1e-6) with no corresponding hit in the original Wellcome Trust Case Control Consortium (WTCCC) analysis are reported below.
| Disease | Chr | Position (Mb) | Gene Region | Source |
|---|---|---|---|---|
| CAD | 7 | 80.78–80.88 | SEMA3C | This analysis |
| CAD | 7 | 129.993–130.123 | ZC3HC1/KLHDC10 | ( |
| T1D, RA | 22 | 37.096–37.203 | IL2RB | ( |
| IBD | 1 | 172.872–172.983 | FASLG/TNFSF18 | ( |
Three of the four regions contain corresponding hits in the NHGRI GWAS database that are not due to the original Wellcome Trust Case Control Consortium (WTCCC) analysis, or were otherwise previously implicated in the particular disease, as cited in the Source column above. One region is novel based on our analysis and overlaps a biologically plausible gene, SEMA3C. CAD, coronary artery disease; T1D, type 1 diabetes; RA, rheumatoid arthritis; IBD, Crohn’s disease.
Figure 1. Manhattan plots with ESM significant regions highlighted. Single marker p-values vs. chromosomal position (BP) for all seven diseases analyzed, with SNPs in ESM significant regions (ESM test p-value threshold 1e-6) highlighted in green. Horizontal lines mark the typical single marker genome-wide significance threshold. SNP clusters that are highlighted in green but do not contain a single genome-wide significant SNP are reported as novel.
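The reporting rule in the Figure 1 caption can be sketched as a simple filter: keep regions that pass the ESM threshold but contain no single marker at genome-wide significance. This is only an illustration under stated assumptions. The function name and tuple layouts are hypothetical, the default `snp_threshold` of 5e-8 is a common convention rather than a value given in the text, and treating both thresholds as inclusive (`<=`) is also an assumption.

```python
def novel_regions(regions, snps, esm_threshold=1e-6, snp_threshold=5e-8):
    """Return ESM significant regions lacking any genome-wide significant SNP.

    regions: iterable of (chrom, start, end, esm_p) tuples.
    snps: iterable of (chrom, pos, p) tuples for single marker tests.
    snp_threshold is an assumed default, not taken from the source text.
    """
    snps = list(snps)  # allow repeated scans even if a generator is passed
    novel = []
    for chrom, start, end, esm_p in regions:
        if esm_p > esm_threshold:
            continue  # region is not ESM significant
        has_single_marker_hit = any(
            c == chrom and start <= pos <= end and p <= snp_threshold
            for c, pos, p in snps
        )
        if not has_single_marker_hit:
            novel.append((chrom, start, end))
    return novel


# Toy data only; these values are not results from the study.
toy_regions = [("1", 100, 200, 1e-7), ("1", 300, 400, 1e-7), ("2", 100, 200, 1e-3)]
toy_snps = [("1", 150, 1e-5), ("1", 350, 1e-9), ("2", 150, 1e-9)]
toy_novel = novel_regions(toy_regions, toy_snps)
```

The linear scan over all SNPs per region is fine for a sketch. A genome-wide run would more sensibly index SNPs by chromosome and sorted position.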
Figure 2. Region plot for SEMA3C hit. The top panel contains single marker (black points) and ESM test (red triangles) p-values for coronary artery disease vs. chromosomal position in the region chr7:80-82 (Mb). Each ESM test point is plotted at the midpoint of the genomic window to which that p-value corresponds. The single 100 kb ESM significant region (ESM test p-value threshold 1e-6), chr7:80.78-80.88 (Mb), is demarcated by vertical dashed lines, and the horizontal lines mark the ESM test significance threshold. The middle panel contains the recombination rate in cM/Mb obtained from HapMap throughout the same region. The lower panel shows the RefSeq gene UCSC Genome Browser track for the region.