Ming-Huei Chen, Jie Huang, Wei-Min Chen, Martin G Larson, Caroline S Fox, Ramachandran S Vasan, Sudha Seshadri, Christopher J O'Donnell, Qiong Yang.
Abstract
Imputation has been widely used in genome-wide association studies (GWAS) to infer genotypes of un-genotyped variants based on the linkage disequilibrium in external reference panels such as the HapMap and 1000 Genomes. However, imputation has only rarely been performed based on family relationships to infer genotypes of un-genotyped individuals. Using 8998 Framingham Heart Study (FHS) participants genotyped with Affymetrix 550K SNPs, we imputed genotypes of the same set of SNPs for an additional 3121 participants, most of whom were never genotyped due to lack of a DNA sample. Prior to imputation, 122 pedigrees were too large to be handled by the imputation software Merlin. Therefore, we developed a novel pedigree splitting algorithm that can maximize the number of genotyped relatives for imputing each un-genotyped individual, while keeping new sub-pedigrees under a pre-specified size. In GWAS of four phenotypes available in FHS (Alzheimer disease, circulating levels of fibrinogen, high-density lipoprotein cholesterol, and uric acid), we compared results using genotyped individuals only with results using both genotyped and imputed individuals. We studied the impact of applying different imputation quality filtering thresholds on the association results and did not find a universal threshold that always resulted in a more significant p-value for previously identified loci. However, most of these loci had a lower p-value when we only included imputed genotypes with ≥60% SNP-specific and ≥50% person-specific imputation certainty. In summary, we developed a novel algorithm for splitting large pedigrees for imputation and found a plausible imputation quality filtering threshold based on FHS. Further examination may be required to generalize this threshold to other studies.
Year: 2012 PMID: 23284720 PMCID: PMC3524237 DOI: 10.1371/journal.pone.0051589
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
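The quality filter described in the abstract (≥60% SNP-specific and ≥50% person-specific imputation certainty) can be sketched as follows. This is a minimal illustration, not the authors' code. It assumes that each imputed genotype carries its own certainty value, that the SNP-specific threshold is applied to that value, and that person-specific certainty is the mean of a person's values across SNPs; none of these definitions is given in this section. The function name, data layout, and toy values are hypothetical.

```python
from statistics import mean

SNP_THRESHOLD = 0.60     # SNP-specific certainty cut-off from the abstract
PERSON_THRESHOLD = 0.50  # person-specific certainty cut-off from the abstract


def filter_imputed(certainty, snp_threshold=SNP_THRESHOLD,
                   person_threshold=PERSON_THRESHOLD):
    """Return the set of (person, snp) imputed genotypes that pass both cut-offs.

    certainty: {person_id: {snp_id: certainty in [0, 1]}} (hypothetical layout).
    """
    kept = set()
    for person, by_snp in certainty.items():
        if not by_snp:
            continue
        # ASSUMPTION: person-specific certainty = mean certainty over SNPs.
        if mean(by_snp.values()) < person_threshold:
            continue
        for snp, value in by_snp.items():
            # ASSUMPTION: the SNP-specific cut-off applies per genotype.
            if value >= snp_threshold:
                kept.add((person, snp))
    return kept


# Toy values, made up for illustration only.
toy = {
    "p1": {"rsA": 0.95, "rsB": 0.55},
    "p2": {"rsA": 0.45, "rsB": 0.40},
    "p3": {"rsA": 0.60, "rsB": 0.90},
}
print(sorted(filter_imputed(toy)))
```

In the toy data, person `p2` is dropped entirely because the mean certainty is below 0.5, while `p1` keeps only the SNP whose certainty reaches 0.6.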
Figure 1. Box plots of imputation certainty in FHS imputed samples.
Figure 2. Scatter plot of imputation certainty against MAF.
Figure 3. Scatter plot of MAF in well-genotyped sample and filtered (person-specific imputation certainty greater than 50%) imputed sample.
Top SNPs (p-value <1.25E-7) from GWAS of Alzheimer disease, fibrinogen, HDL and uric acid using 550K genotype data.
| Trait | SNP | Chr | Position | ClosestRefGene | HWE p | Call rate | MAF | N | beta | se | p |
| Alzheimer disease | rs4420638 | 19 | 50114786 | | 0.78 | 0.999 | 0.16 | 3192 | 0.856 | 0.124 | 5.96E-12 |
| Fibrinogen | rs4681 | 4 | 155710282 | | 0.53 | 0.998 | 0.18 | 7271 | 10.009 | 1.446 | 4.48E-12 |
| HDL | rs3764261 | 16 | 55550825 | | 0.04 | 0.982 | 0.31 | 7996 | 3.077 | 0.266 | 5.71E-31 |
| | rs1919484 | 8 | 19913956 | | 0.11 | 0.981 | 0.27 | 7999 | 1.948 | 0.276 | 1.76E-12 |
| | rs10186236 | 2 | 115096721 | | 0.21 | 0.999 | 0.19 | 8128 | −1.647 | 0.307 | 7.79E-8 |
| | rs1800588 | 15 | 56510967 | | 0.65 | 1.000 | 0.22 | 8134 | 1.514 | 0.293 | 2.42E-7 |
| Uric acid | rs16890979 | 4 | 9531265 | | 0.01 | 0.998 | 0.25 | 8229 | −0.352 | 0.022 | 2.64E-59 |
| | rs2231142 | 4 | 89271347 | | 0.76 | 0.999 | 0.11 | 8234 | 0.246 | 0.031 | 1.46E-15 |
| | rs1165205 | 6 | 25978521 | | 0.19 | 0.985 | 0.46 | 8096 | −0.105 | 0.019 | 4.34E-8 |
Position in base pairs, based on NCBI build 36.1 (hg18).
MAF is computed in the genotyped and phenotyped sample.
rs4420638 is a marker of the APOE haplotype.
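As a rough consistency check on the beta, se and p columns above, a two-sided Wald p-value can be computed from an effect estimate and its standard error. The sketch below assumes a normal approximation, which this section does not state; the published p-values come from the authors' own models and from unrounded estimates, so they need not match exactly. The function name is hypothetical.

```python
import math


def wald_p_value(beta, se):
    """Two-sided p-value for z = beta / se under a standard normal reference.

    ASSUMPTION: a normal (Wald) approximation; the source does not say which
    test statistic or reference distribution was used.
    """
    z = beta / se
    # 2 * (1 - Phi(|z|)) equals erfc(|z| / sqrt(2)), which stays accurate
    # for very small p-values where 1 - Phi(|z|) would round to zero.
    return math.erfc(abs(z) / math.sqrt(2.0))


# rs4420638 (Alzheimer disease): beta and se as printed in the table.
print(f"{wald_p_value(0.856, 0.124):.2e}")
```

Using `math.erfc` rather than subtracting a normal CDF from 1 avoids loss of precision at the extreme tail, which matters for p-values as small as those reported for uric acid.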
Mean imputation certainty of the top SNPs in the entire imputed sample of 3121 individuals and in the phenotyped sample with person-specific certainty >0.5 and SNP-specific certainty >0.6.
| Trait | SNP | mean(sd) certainty in 3121 imputed sample | N | mean(sd) certainty in the N filtered individuals |
| Alzheimer disease | rs4420638 | 0.838(0.162) | 288 | 0.955(0.074) |
| Fibrinogen | rs4681 | 0.824(0.161) | 331 | 0.971(0.068) |
| HDL | rs3764261 | 0.742(0.187) | 512 | 0.928(0.128) |
| | rs1919484 | 0.753(0.174) | 524 | 0.930(0.117) |
| | rs10186236 | 0.802(0.167) | 431 | 0.962(0.072) |
| | rs1800588 | 0.817(0.164) | 419 | 0.951(0.092) |
| Uric acid | rs16890979 | 0.769(0.170) | 595 | 0.939(0.100) |
| | rs2231142 | 0.884(0.142) | 638 | 0.974(0.049) |
| | rs1165205 | 0.658(0.199) | 553 | 0.980(0.068) |
N: the number of phenotyped and imputed individuals with person-specific certainty >0.5 and SNP-specific certainty >0.6.
Results of top SNPs (p-value <1.25E-7) from GWAS of Alzheimer disease, fibrinogen, HDL and uric acid using 550K genotype data and incorporated genotype data.
| | | Genotyped subjects only | | | | | | Genotyped and imputed subjects | | | | | |
| Trait | SNP | N | MAF | beta | se | p | | N | MAF | beta | se | p | |
| Alzheimer disease | rs4420638 | 3192 | 0.16 | 0.856 | 0.124 | 5.96E-12 | 1.02 | 3480 | 0.16 | 0.902 | 0.108 | 4.93E-17 | 1.03 |
| Fibrinogen | rs4681 | 7271 | 0.18 | 10.009 | 1.446 | 4.48E-12 | 1.03 | 7602 | 0.18 | 9.729 | 1.431 | 1.05E-11 | 1.02 |
| HDL | rs3764261 | 7996 | 0.31 | 3.077 | 0.266 | 5.71E-31 | 1.03 | 8508 | 0.32 | 3.155 | 0.259 | 3.15E-34 | 1.02 |
| | rs1919484 | 7999 | 0.27 | 1.948 | 0.276 | 1.76E-12 | | 8523 | 0.27 | 2.020 | 0.270 | 7.59E-14 | |
| | rs10186236 | 8128 | 0.19 | −1.647 | 0.307 | 7.79E-8 | | 8559 | 0.19 | −1.577 | 0.301 | 1.62E-7 | |
| | rs1800588 | 8134 | 0.22 | 1.514 | 0.293 | 2.42E-7 | | 8553 | 0.22 | 1.548 | 0.288 | 8.09E-8 | |
| Uric acid | rs16890979 | 8229 | 0.25 | −0.352 | 0.022 | 2.64E-59 | 1.03 | 8824 | 0.23 | −0.351 | 0.021 | 8.13E-61 | 1.02 |
| | rs2231142 | 8234 | 0.11 | 0.246 | 0.031 | 1.46E-15 | | 8872 | 0.10 | 0.248 | 0.030 | 2.86E-16 | |
| | rs1165205 | 8096 | 0.46 | −0.105 | 0.019 | 4.34E-8 | | 8649 | 0.47 | −0.109 | 0.019 | 6.96E-9 | |
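The comparison in the table above can be tallied with a few lines of code. The sketch below copies the two p-value columns for the nine listed SNPs and reports which of them became more significant once imputed subjects were added. Variable and function names are hypothetical; only the p-values come from the table.

```python
import math

# (trait, SNP, p with genotyped subjects only, p with genotyped and imputed
# subjects), copied from the table above.
TOP_SNPS = [
    ("Alzheimer disease", "rs4420638", 5.96e-12, 4.93e-17),
    ("Fibrinogen", "rs4681", 4.48e-12, 1.05e-11),
    ("HDL", "rs3764261", 5.71e-31, 3.15e-34),
    ("HDL", "rs1919484", 1.76e-12, 7.59e-14),
    ("HDL", "rs10186236", 7.79e-8, 1.62e-7),
    ("HDL", "rs1800588", 2.42e-7, 8.09e-8),
    ("Uric acid", "rs16890979", 2.64e-59, 8.13e-61),
    ("Uric acid", "rs2231142", 1.46e-15, 2.86e-16),
    ("Uric acid", "rs1165205", 4.34e-8, 6.96e-9),
]


def log10_gain(p_before, p_after):
    """Change in -log10(p); positive means the second analysis is more significant."""
    return math.log10(p_before) - math.log10(p_after)


improved = [snp for _, snp, p0, p1 in TOP_SNPS if p1 < p0]
print(f"{len(improved)} of {len(TOP_SNPS)} listed SNPs have a smaller p-value")
for trait, snp, p0, p1 in TOP_SNPS:
    print(f"{trait:18s} {snp:11s} gain in -log10(p): {log10_gain(p0, p1):+.2f}")
```

Seven of the nine listed SNPs have a smaller p-value in the combined analysis; rs4681 and rs10186236 do not.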
Number of genome-wide significant SNPs (p-value <1.25E-7) with improved statistical significance (smaller p-value) from GWAS of Alzheimer disease, fibrinogen, HDL and uric acid using incorporated genotype data (new GWAS).
| Trait | # SNPs with smaller p/# SNPs with p<1.25E-7 in 550K GWAS | # SNPs with p<1.25E-7 in new GWAS but not in 550K GWAS |
| Alzheimer disease | 1/1 | 0 |
| Fibrinogen | 0/4 | 0 |
| HDL | 14/15 | 1 |
| Uric acid | 83/126 | 1 |
Figure 4. Scatter plots of −log10(p-value) from 550K GWAS and GWAS using incorporated genotype data.
Figure 5. Example for pedigree splitting and trimming.
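The authors' splitting and trimming procedure is not described in this section, so it cannot be reproduced here. To make the stated goal concrete (keep many genotyped relatives around an un-genotyped individual while staying under a size cap), the sketch below uses a simple size-capped breadth-first walk over parent-child links. This is a substitute heuristic chosen for illustration, not the published algorithm; it does not guarantee that each kept individual has both parents present, which pedigree software typically requires. All names and the toy pedigree are hypothetical.

```python
from collections import deque


def select_relatives(parents, genotyped, target, max_size):
    """Pick up to max_size individuals around `target`, nearest relatives first.

    parents:   {child_id: (father_id or None, mother_id or None)}
    genotyped: set of individual ids that have genotype data
    NOTE: a breadth-first heuristic for illustration only, not the
    pedigree splitting algorithm developed in the paper.
    """
    # Undirected parent-child adjacency.
    neighbours = {}
    for child, pair in parents.items():
        for parent in pair:
            if parent is not None:
                neighbours.setdefault(child, set()).add(parent)
                neighbours.setdefault(parent, set()).add(child)

    selected = [target]
    seen = {target}
    queue = deque([target])
    while queue and len(selected) < max_size:
        current = queue.popleft()
        # Genotyped neighbours sort first so they claim remaining slots.
        ordered = sorted(neighbours.get(current, ()),
                         key=lambda r: (r not in genotyped, r))
        for relative in ordered:
            if relative not in seen and len(selected) < max_size:
                seen.add(relative)
                selected.append(relative)
                queue.append(relative)
    return selected


# Toy three-generation pedigree, made up for illustration.
toy_parents = {"c1": ("f", "m"), "c2": ("f", "m"), "g1": ("c1", "s")}
toy_genotyped = {"m", "c2", "g1"}
subset = select_relatives(toy_parents, toy_genotyped, "c1", max_size=4)
print(subset, sum(person in toy_genotyped for person in subset))
```

Because the walk expands outward from the target, the selected set is always connected to the target, and the cap bounds the size of the sub-pedigree handed to the imputation software.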
Sample characteristics of Alzheimer disease, fibrinogen, HDL and uric acid data in the genotyped and imputed sample.
| | Alzheimer disease | Fibrinogen (mg/dl) | HDL (mg/dl) | Uric acid (mg/dl) |
| Sample size | 4200 | 8229 | 9453 | 10491 |
| Phenotype | 284 (6.8%) | 321.2 (67.9) | 52.6 (16.1) | 5.3 (1.5) |
| Age | 78.2 (8.2) | 48.0 (12.0) | 51.2 (13.2) | 38.9 (9.8) |
| Sex (female) | 2318 (55.2%) | 4414 (53.6%) | 5079 (53.7%) | 5443 (51.9%) |
| Original cohort | 1899 (45.2%) | 1062 (13%) | 2044 (21.6%) | 1984 (18.9%) |
| Offspring cohort | 2301 (54.8%) | 3131 (38%) | 3339 (35.3%) | 4459 (42.5%) |
| Third Generation cohort | NA | 4036 (49%) | 4070 (43.1%) | 4048 (38.6%) |
| Imputed | 978 (23.3%) | 942 (11.4%) | 1313 (13.9%) | 2248 (21.4%) |
| Length of follow-up (years) | 13.2 (8.2) | NA | NA | NA |
For continuous variables, the mean and standard deviation (in parentheses) are presented; for binary variables, the number of cases and its proportion (in parentheses) are presented.