S Hong Lee, Denise Harold, Dale R Nyholt, Michael E Goddard, Krina T Zondervan, Julie Williams, Grant W Montgomery, Naomi R Wray, Peter M Visscher.
Abstract
Common diseases such as endometriosis (ED), Alzheimer's disease (AD) and multiple sclerosis (MS) account for a significant proportion of the health care burden in many countries. Genome-wide association studies (GWASs) for these diseases have identified a number of individual genetic variants that contribute to disease risk. However, the effect sizes of most variants are small, and collectively the known variants explain only a small proportion of the estimated heritability. We used a linear mixed model to fit all single nucleotide polymorphisms (SNPs) simultaneously and estimated genetic variances on the liability scale, using SNPs from GWASs of unrelated individuals for these three diseases. For each disease, the case and control samples were not all genotyped in the same laboratory. We demonstrate that careful analysis can yield robust estimates, but also that insufficient quality control (QC) of SNPs can lead to spurious results and that overly stringent QC is likely to remove real genetic signals. Our estimates show that common SNPs on commercially available genotyping chips capture significant variation contributing to liability for all three diseases. The estimated proportion of total variation tagged by all SNPs was 0.26 (SE 0.04) for ED, 0.24 (SE 0.03) for AD and 0.30 (SE 0.03) for MS. Further, we partitioned the explained genetic variance by minor allele frequency (MAF, five categories), by chromosome and by gene annotation. We provide strong evidence that a substantial proportion of variation in liability is explained by common SNPs, and thereby give insights into the genetic architecture of these diseases.
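Estimates from case–control data are first obtained on the observed 0/1 scale and must be transformed to the liability scale before they can be compared across diseases or with family-based heritability. The sketch below implements the transformation commonly used for linear-mixed-model estimates (Lee et al., Am J Hum Genet 2011); that exactly this form was applied here is an assumption, and the function name and the observed-scale value and prevalence `K` in the usage line are hypothetical placeholders, not values from this study.

```python
from statistics import NormalDist


def observed_to_liability(h2_obs: float, prevalence: float, case_fraction: float) -> float:
    """Transform an observed-scale (0/1) variance estimate to the liability scale.

    h2_obs        -- proportion of variance explained on the observed case/control scale
    prevalence    -- K, the population prevalence of the disease (assumed known)
    case_fraction -- P, the proportion of cases in the genotyped sample
    """
    K, P = prevalence, case_fraction
    std_normal = NormalDist()
    t = std_normal.inv_cdf(1 - K)  # liability threshold above which individuals are affected
    z = std_normal.pdf(t)          # height of the standard normal density at the threshold
    # First factor: rescale from the 0/1 scale to the liability scale.
    # Second factor: correct for cases being over-represented in the sample.
    return h2_obs * (K * (1 - K) / z**2) * (K * (1 - K) / (P * (1 - P)))


if __name__ == "__main__":
    # Case fraction of the ED sample (3154 cases, 6981 controls).
    p_ed = 3154 / (3154 + 6981)
    # h2_obs = 0.30 and K = 0.08 are hypothetical illustration values.
    print(round(observed_to_liability(0.30, 0.08, p_ed), 3))
```

The sample case fraction enters only through the ascertainment correction, which is why studies with the same observed-scale estimate but different case/control ratios can map to different liability-scale values.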
Year: 2012 PMID: 23193196 PMCID: PMC3554206 DOI: 10.1093/hmg/dds491
Source DB: PubMed Journal: Hum Mol Genet ISSN: 0964-6906 Impact factor: 6.150
Estimated heritability using genome-wide SNP data after stringent QC
| Disease | Case/control | No. of SNPs | Estimate (SE)ᵃ | P-value | Heritabilityᵇ | GWASᶜ |
|---|---|---|---|---|---|---|
| ED | 3154/6981 | 488 532 | 0.26 (0.04) | 3.62e-11 | ∼0.5 | <0.01 |
| AD | 3290/3849 | 499 757 | 0.24ᵈ (0.03) | 2.15e-15 | ∼0.76 | 0.18ᵈ |
| MS | 1604/1953 | 293 474 | 0.30ᵉ (0.03) | 7.15e-22 | 0.25–0.76 | 0.06ᵉ |
ᵃ Estimated genetic variance as a proportion of the total variance on the liability scale. ᵇ Heritability estimated from twin or family-based studies. ᶜ Variance explained by genome-wide significant SNPs. ᵈ Of this, ∼0.04 can be attributed to the APOE locus. ᵉ Of this, ∼0.03 can be attributed to the MHC region.
Estimated proportion of variance on the liability scale explained by all SNPs, partitioned by SNP minor allele frequency (MAF)
| MAF | ED: No. of SNPs | ED: Estimate (SE) | AD: No. of SNPs | AD: Estimate (SE) | MS: No. of SNPs | MS: Estimate (SE) |
|---|---|---|---|---|---|---|
| <0.1 | 83034 | 0.03 (0.03) | 83002 | 0.08 (0.02) | 40360 | 0.03 (0.02) |
| 0.1–0.2 | 118571 | 0.03 (0.04) | 121780 | 0.00 (0.03) | 70550 | 0.08 (0.03) |
| 0.2–0.3 | 102261 | 0.07 (0.04) | 104937 | 0.06 (0.03) | 63876 | 0.07 (0.03) |
| 0.3–0.4 | 94183 | 0.08 (0.03) | 96610 | 0.08 (0.03) | 60243 | 0.09 (0.03) |
| 0.4–0.5 | 90483 | 0.05 (0.03) | 93428 | 0.02 (0.02) | 58445 | 0.03 (0.02) |
| Total | 488532 | 0.25 | 499757 | 0.25 | 293474 | 0.30 |
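Because the MAF bins partition the SNP set, the bin SNP counts should add up to the total, and the bin estimates should add up to the total variance apart from rounding. The sketch below checks both from the values in the MAF table; the dictionary layout and the function name `check_partition` are illustrative choices, not part of the original analysis.

```python
# Per-MAF-bin results copied from the MAF table: SNP counts and estimated
# liability-scale variance, bins ordered <0.1, 0.1-0.2, 0.2-0.3, 0.3-0.4, 0.4-0.5.
PARTITION = {
    "ED": {"snps": [83034, 118571, 102261, 94183, 90483],
           "h2": [0.03, 0.03, 0.07, 0.08, 0.05], "total_snps": 488532},
    "AD": {"snps": [83002, 121780, 104937, 96610, 93428],
           "h2": [0.08, 0.00, 0.06, 0.08, 0.02], "total_snps": 499757},
    "MS": {"snps": [40360, 70550, 63876, 60243, 58445],
           "h2": [0.03, 0.08, 0.07, 0.09, 0.03], "total_snps": 293474},
}


def check_partition(row: dict) -> tuple[bool, float]:
    """Return (do bin SNP counts add up to the total?, sum of the bin estimates)."""
    counts_ok = sum(row["snps"]) == row["total_snps"]
    # Round to two decimals, the precision at which the bin estimates are reported.
    return counts_ok, round(sum(row["h2"]), 2)


for disease, row in PARTITION.items():
    counts_ok, h2_sum = check_partition(row)
    print(f"{disease}: counts consistent={counts_ok}, sum of bin estimates={h2_sum:.2f}")
```

The counts add up exactly for all three diseases. The summed bin estimates (0.26 for ED, 0.24 for AD, 0.30 for MS) differ by 0.01 from the reported ED and AD totals, presumably because the individual bin estimates are rounded.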
Figure 1. Genetic variance estimated for each chromosome from SNP data, with all chromosomes fitted jointly. (A) ED: y = −0.0002 + 0.00009x, R² = 0.37, P = 0.003. (B) AD: y = 0.0081 + 0.00003x, R² = 0.024, P = 0.49; omitting chromosome 19, y = 0.00061 + 0.00007x, R² = 0.25, P = 0.02. (C) MS: y = −0.002509 + 0.00012x, R² = 0.31, P = 0.007; omitting chromosome 6, y = 0.0014 + 0.0001x, R² = 0.45, P = 0.0009.
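Each regression line in Figure 1 implies a genome-wide total: summing the fitted values a + b·x over chromosomes gives n·a + b·Σx. The sketch below uses that identity as a rough consistency check under three assumptions that the caption does not state: x is chromosome length in Mb, the fit covers the 22 autosomes, and Σx equals the ∼Mb totals in the gene-annotation table. The function name is hypothetical.

```python
# Intercept (a) and slope (b) of the per-chromosome regressions in Figure 1.
FITS = {"ED": (-0.0002, 0.00009), "MS": (-0.002509, 0.00012)}
# ASSUMPTIONS (not stated in the caption): x is chromosome length in Mb,
# the analysis covers the 22 autosomes, and the summed lengths equal the
# ~Mb totals reported in the gene-annotation table.
N_CHROMOSOMES = 22
TOTAL_MB = {"ED": 2778, "MS": 2777}


def implied_total_variance(a: float, b: float, n_chrom: int, total_length: float) -> float:
    """Sum of fitted values a + b*x_i over all chromosomes: n*a + b*sum(x_i)."""
    return n_chrom * a + b * total_length


for disease, (a, b) in FITS.items():
    total = implied_total_variance(a, b, N_CHROMOSOMES, TOTAL_MB[disease])
    print(f"{disease}: implied genome-wide variance = {total:.3f}")
```

Under these assumptions the fits imply about 0.25 for ED and 0.28 for MS, close to the genome-wide estimates of 0.26 and 0.30. AD is left out because its fit changes markedly depending on whether chromosome 19 (which carries APOE) is included.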
Estimated proportion of variance on the liability scale explained by SNPs in or near annotated genes and by SNPs outside annotated genes
| Disease / SNP set | No. of SNPs | ∼Mb (% of total) | Estimate (SE) | Share of estimated variance |
|---|---|---|---|---|
| ED | | | | |
| Genesᵃ | 253514 | 1370 (49%) | 0.13 (0.03) | 50% |
| Not in genes | 235018 | 1408 (51%) | 0.13 (0.03) | 50% |
| Total | 488532 | 2778 (100%) | 0.27 | |
| AD | | | | |
| Genesᵃ | 259031 | 1367 (49%) | 0.15 (0.03) | 62% |
| Not in genes | 240726 | 1412 (51%) | 0.09 (0.03) | 38% |
| Total | 499757 | 2779 (100%) | 0.24 | |
| MS | | | | |
| Genesᵃ | 150499 | 1368 (49%) | 0.19 (0.03) | 62% |
| Not in genes | 142975 | 1409 (51%) | 0.11 (0.03) | 38% |
| Total | 293474 | 2777 (100%) | 0.30 | |
ᵃ SNPs were assigned to genes if they were positioned within 20 kb of a gene boundary. The P-values for the difference between the genic region's share of physical coverage and its share of genetic variance are 0.785, 0.117 and 0.059 for ED, AD and MS, respectively.
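The footnote's comparison sets the genic share of the genome against the genic share of the estimated variance. The sketch below computes both shares and their ratio from the rounded values in the gene-annotation table. The ratio is only a descriptive summary; reproducing the reported P-values would also require the sampling covariance of the genic and non-genic estimates, which the table does not give. The function name `genic_shares` is hypothetical.

```python
# Values copied from the gene-annotation table:
# (genic Mb, total Mb, genic estimate, non-genic estimate).
ANNOTATION = {
    "ED": (1370, 2778, 0.13, 0.13),
    "AD": (1367, 2779, 0.15, 0.09),
    "MS": (1368, 2777, 0.19, 0.11),
}


def genic_shares(genic_mb, total_mb, genic_h2, other_h2):
    """Return (share of physical coverage, share of variance, variance/coverage ratio)."""
    coverage = genic_mb / total_mb
    variance = genic_h2 / (genic_h2 + other_h2)
    return coverage, variance, variance / coverage


for disease, row in ANNOTATION.items():
    cov, var, ratio = genic_shares(*row)
    print(f"{disease}: coverage {cov:.1%}, variance {var:.1%}, ratio {ratio:.2f}")
```

From the rounded estimates the genic share of variance is 50% for ED, 62.5% for AD and about 63% for MS, against roughly 49% of physical coverage in each case. The table's 62% for MS presumably reflects unrounded estimates.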
Impact of the genotype-calling algorithm on heritability estimates from control–control comparisons
| Comparison | No. of samples | No. of SNPs | Estimate (SE) | P-value |
|---|---|---|---|---|
| Standard QC | | | | |
| 1958A (I)/1958B (I) | 1076/1576 | 492893 | 0.00 (0.13) | 1.00 |
| 1958A (G)/1958B (I) | 1076/1576 | 492893 | 0.20 (0.14) | 0.13 |
| 1958A (I)/OECX (I) | 1076/2565 | 473047 | 0.04 (0.10) | 0.66 |
| 1958A (G)/OECX (I) | 1076/2565 | 466005 | 0.48 (0.10) | 1.9e-06 |
| 1958A (I)/QIMR (G) | 1076/1836 | 488150 | 0.53 (0.12) | 4.7e-06 |
| 1958A (G)/QIMR (G) | 1076/1836 | 472019 | 0.64 (0.12) | 1.0e-07 |
| Most stringent QC | | | | |
| 1958A (G)/OECX (I) | 1076/2565 | 313076 | 0.04 (0.10) | 0.68 |
| 1958A (I)/QIMR (G) | 1076/1836 | 397802 | 0.08 (0.12) | 0.48 |
| 1958A (G)/QIMR (G) | 1076/1836 | 337695 | 0.16 (0.12) | 0.20 |
The 1958 cohort was split into a sample (1958A) whose genotypes had been called with both Illuminus (I) and GenCall (G), and a sample (1958B) whose genotypes had been called only with Illuminus. OECX is the OEC control sample with the 1958 cohort controls removed. 1958A, 1958B and OECX are therefore three independent control samples.
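Because both groups in each comparison are controls, any significant estimate must come from technical differences such as the calling algorithm, not from genetics. The sketch below takes the three comparisons run under both QC levels, flags which estimates are significant, and computes the fraction of SNPs the most stringent QC removed. The 0.05 threshold and the function name `summarize` are illustrative assumptions, not taken from the source.

```python
# Control-control comparisons run under both QC levels, copied from the table:
# (comparison, SNPs after standard QC, P, SNPs after most stringent QC, P).
ROWS = [
    ("1958A (G)/OECX (I)", 466005, 1.9e-06, 313076, 0.68),
    ("1958A (I)/QIMR (G)", 488150, 4.7e-06, 397802, 0.48),
    ("1958A (G)/QIMR (G)", 472019, 1.0e-07, 337695, 0.20),
]
ALPHA = 0.05  # illustrative significance threshold (assumption)


def summarize(rows, alpha=ALPHA):
    """For each comparison return (name, significant under standard QC?,
    significant under stringent QC?, fraction of SNPs removed by stringent QC)."""
    out = []
    for name, n_std, p_std, n_strict, p_strict in rows:
        out.append((name, p_std < alpha, p_strict < alpha, 1 - n_strict / n_std))
    return out


for name, sig_std, sig_strict, removed in summarize(ROWS):
    print(f"{name}: spurious signal {sig_std} -> {sig_strict}, SNPs removed {removed:.1%}")
```

In all three comparisons the spurious signal disappears under the most stringent QC, at a cost of roughly 19% to 33% of the SNPs. This is the trade-off between insufficient and overly stringent QC described in the abstract.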
Figure 2. Histogram of the per-SNP proportion of 1958A genotypes that were called identically by the Illuminus and GenCall algorithms.
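The quantity in Figure 2 is a per-SNP concordance between two call sets for the same individuals. The sketch below computes it and bins the result on [0, 1]. It assumes both callers code genotypes as 0/1/2 copies of the same allele and that individuals lacking a call from either algorithm are skipped for that SNP; how missing calls were actually handled is not stated. The function names are hypothetical.

```python
import math
from typing import Optional, Sequence

Genotype = Optional[int]  # 0, 1 or 2 copies of the coded allele; None = no call


def per_snp_concordance(calls_a: Sequence[Sequence[Genotype]],
                        calls_b: Sequence[Sequence[Genotype]]) -> list[float]:
    """Fraction of individuals with identical calls at each SNP.

    calls_x[i][j] is the genotype of individual i at SNP j from caller x.
    Individuals missing a call in either set are skipped for that SNP.
    """
    n_snps = len(calls_a[0])
    concordance = []
    for j in range(n_snps):
        pairs = [(a[j], b[j]) for a, b in zip(calls_a, calls_b)
                 if a[j] is not None and b[j] is not None]
        if not pairs:
            concordance.append(math.nan)  # no individual called by both algorithms
            continue
        concordance.append(sum(x == y for x, y in pairs) / len(pairs))
    return concordance


def histogram(values: Sequence[float], n_bins: int = 10) -> list[int]:
    """Count values in equal-width bins on [0, 1]; NaNs are ignored."""
    counts = [0] * n_bins
    for v in values:
        if math.isnan(v):
            continue
        counts[min(int(v * n_bins), n_bins - 1)] += 1  # a value of 1.0 goes in the last bin
    return counts
```

For example, with three individuals and three SNPs, `per_snp_concordance([[0, 1, 2], [1, 1, None], [2, 0, 2]], [[0, 2, 2], [1, 1, 1], [2, 0, 1]])` gives 1.0, 2/3 and 0.5. SNPs with low concordance are the ones most likely to produce the spurious control–control signals in the table.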