Young Jin Kim, Juyoung Lee, Bong-Jo Kim, Taesung Park.
Abstract
BACKGROUND: Rare variants have gathered increasing attention as a possible alternative source of missing heritability. Since next generation sequencing technology is not yet cost-effective for large-scale genomic studies, a widely used alternative approach is imputation. However, the imputation approach may be limited by the low accuracy of the imputed rare variants. To improve imputation accuracy of rare variants, various approaches have been suggested, including increasing the sample size of the reference panel, using sequencing data from study-specific samples (i.e., specific populations), and using local reference panels by genotyping or sequencing a subset of study samples. While these approaches mainly utilize reference panels, imputation accuracy of rare variants can also be increased by using exome chips containing rare variants. The exome chip contains 250 K rare variants selected from the discovered variants of about 12,000 sequenced samples. If exome chip data are available for previously genotyped samples, the combined approach using a genotype panel of merged data, including exome chips and SNP chips, should increase the imputation accuracy of rare variants.
Year: 2015 PMID: 26715385 PMCID: PMC4696174 DOI: 10.1186/s12864-015-2192-y
Source DB: PubMed Journal: BMC Genomics ISSN: 1471-2164 Impact factor: 3.969
Fig. 1 Scatter plot of estimated r2 against dosage r2 by MAF bins. Estimated r2 was plotted against dosage r2 by MAF bins: (a) MAF ≥ 5 %, (b) MAF = 1–5 %, (c) MAF = 0.5–1 %, (d) MAF < 0.5 %, (e) MAF = 0.3–0.5 %, and (f) MAF < 0.3 %. The red dotted line represents the diagonal.
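The record does not give a formula for dosage r2. A common definition, assumed here, is the squared Pearson correlation between true genotypes (coded 0/1/2) and imputed allele dosages. The sketch below computes that quantity and assigns a variant to one of the MAF bins used in panels (a) to (d). The function names and the inclusive lower bin boundaries are illustrative assumptions, not taken from the paper.

```python
import math


def minor_allele_frequency(genotypes):
    """MAF from genotypes coded as 0/1/2 copies of one allele."""
    p = sum(genotypes) / (2 * len(genotypes))
    return min(p, 1 - p)


def dosage_r2(true_genotypes, imputed_dosages):
    """Squared Pearson correlation between true genotypes and imputed dosages.

    Assumed definition; returns NaN when either vector has zero variance.
    """
    n = len(true_genotypes)
    mean_t = sum(true_genotypes) / n
    mean_d = sum(imputed_dosages) / n
    cov = sum((t - mean_t) * (d - mean_d)
              for t, d in zip(true_genotypes, imputed_dosages))
    var_t = sum((t - mean_t) ** 2 for t in true_genotypes)
    var_d = sum((d - mean_d) ** 2 for d in imputed_dosages)
    if var_t == 0 or var_d == 0:
        return math.nan
    return cov * cov / (var_t * var_d)


def maf_bin(maf):
    """Map a MAF to the bins of panels (a)-(d); lower bounds assumed inclusive."""
    if maf >= 0.05:
        return "MAF >= 5 %"
    if maf >= 0.01:
        return "MAF = 1-5 %"
    if maf >= 0.005:
        return "MAF = 0.5-1 %"
    return "MAF < 0.5 %"
```

Plotting this value against the imputation software's own estimated r2 for each variant, one panel per bin, would give a scatter plot of the kind the caption describes.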
Fig. 2 Mean estimated r2 of genotype panels by MAF bins
Genomic coverage of genotype panels of SNP chip only and combined approach
| MAF bin | Estimated | | | Estimated | | |
|---|---|---|---|---|---|---|
| | Exome chip | SNP chip | Combined | Exome chip | SNP chip | Combined |
| ALL | 0.367 | 0.435 | 0.560 | 0.492 | 0.749 | 0.818 |
| ≥5 % | 0.600 | 0.794 | 0.901 | 0.756 | 0.953 | 0.983 |
| 1–5 % | 0.374 | 0.403 | 0.588 | 0.510 | 0.799 | 0.881 |
| 0.5–1 % | 0.192 | 0.146 | 0.290 | 0.290 | 0.585 | 0.686 |
| <0.5 % | 0.107 | 0.079 | 0.172 | 0.192 | 0.491 | 0.591 |
Fig. 3 Mean estimated r2 of various combinations of reference panels and genotype panels. Reference panels are the 1000 Genomes phase 1 dataset (1KG) and various combinations of whole exome sequencing data (WES), SNP chip data (GWAS), and exome chip data (EXOME)
Number of overlapping variants between reference panels and genotype panels
| Reference panels | Exome chip | SNP chip | Combined |
|---|---|---|---|
| WES | 21,120 | 4,472 | 24,514 |
| WES + EXOME | 38,243 | 7,323 | 41,637 |
| WES + GWAS | 23,972 | 344,359 | 364,402 |
| WES + GWAS + EXOME | 38,243 | 344,359 | 378,695 |
| 1KG | 49,286 | 344,359 | 389,715 |
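In every row of the table above, the combined count is smaller than the sum of the exome chip and SNP chip counts. If the combined genotype panel is the union of the two chips' variant sets (an assumption; the record does not state how the panels were merged), the difference is the number of overlapping variants present on both chips, by inclusion-exclusion. A minimal sketch using the table's own numbers:

```python
# Overlap counts copied from the table: (exome chip, SNP chip, combined).
overlap_counts = {
    "WES": (21_120, 4_472, 24_514),
    "WES + EXOME": (38_243, 7_323, 41_637),
    "WES + GWAS": (23_972, 344_359, 364_402),
    "WES + GWAS + EXOME": (38_243, 344_359, 378_695),
    "1KG": (49_286, 344_359, 389_715),
}


def shared_between_chips(exome, snp, combined):
    """Inclusion-exclusion: |A and B| = |A| + |B| - |A or B|.

    Assumes `combined` counts the union of the two chips' variants.
    """
    return exome + snp - combined


shared = {
    panel: shared_between_chips(*counts)
    for panel, counts in overlap_counts.items()
}

for panel, n_shared in shared.items():
    print(f"{panel}: {n_shared} variants counted on both chips")
```

Under that assumption, roughly 3,900 overlapping variants are shared by the two chips for every reference panel except WES alone, where the figure is about 1,100.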
Fig. 4 Mean estimated r2 varied by sample size of reference panel
Relative increase in mean estimated r2 by reference sample size (MAF 0.5–1 %)
| Genotype panel | 300 to 500 | 500 to 700 | 700 to 848 |
|---|---|---|---|
| SNP chip only | 9.27 % | 4.90 % | 2.09 % |
| Combined (SNP + exome chip) | 8.18 % | 4.46 % | 1.74 % |
Relative increase in mean estimated r2 from combining reference panel size with the combined approach (MAF 0.5–1 %)
| | R300-C | R500-C | R700-C | R848-C |
|---|---|---|---|---|
| R300-G | 10.88 % | 19.95 % | 25.30 % | 27.49 % |
| R500-G | 1.47 % | 9.77 % | 14.67 % | 16.67 % |
| R700-G | −3.27 % | 4.64 % | 9.31 % | 11.21 % |
| R848-G | −5.25 % | 2.50 % | 7.07 % | 8.94 % |
The names of panels are abbreviated as follows: R (reference panel), G (the genotype panel of SNP chip only), and C (the genotype panel of combined data). R300, R500, R700, and R848 indicate reference panel sample sizes of 300, 500, 700, and 848, respectively. R(sample size)-G represents the combination of the R(sample size) reference panel and the G genotype panel. R(sample size)-C represents the combination of the R(sample size) reference panel and the C genotype panel.
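The record does not state how the relative increases are computed. The usual reading, assumed here, is (new − baseline) / baseline, with the row panel (R…-G) as baseline and the column panel (R…-C) as the new value. On that reading a negative entry means the column panel scored lower than the row panel. The mean r2 values behind the two relative-increase tables are not reproduced in this record, so the sketch below applies the same formula to the SNP chip and combined values from the first "Estimated" column group of the MAF-bin table. The helper name is illustrative.

```python
def relative_increase(baseline, improved):
    """Percent change of `improved` relative to `baseline` (assumed formula)."""
    return (improved - baseline) / baseline * 100.0


# (SNP chip only, combined) from the first "Estimated" column group
# of the MAF-bin table in this record.
snp_vs_combined = {
    "ALL": (0.435, 0.560),
    ">=5 %": (0.794, 0.901),
    "1-5 %": (0.403, 0.588),
    "0.5-1 %": (0.146, 0.290),
    "<0.5 %": (0.079, 0.172),
}

gains = {
    maf_bin_label: relative_increase(snp, combined)
    for maf_bin_label, (snp, combined) in snp_vs_combined.items()
}

for maf_bin_label, gain in gains.items():
    print(f"{maf_bin_label}: {gain:.1f} % relative increase")
```

In that column group the relative gain from adding exome chip data grows as MAF falls, from about 13 % for common variants to more than 100 % in the rarest bin.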