Wen-Chi Chou1,2, Hou-Feng Zheng3, Chia-Ho Cheng1, Han Yan4, Li Wang3, Fang Han4, J Brent Richards5,6, David Karasik1, Douglas P Kiel1,2, Yi-Hsiang Hsu1,2,7.
Abstract
Imputation using the 1000 Genomes haplotype reference panel has been widely adopted to estimate genotypes in genome-wide association studies. To evaluate imputation quality with a larger reference panel and with a reference panel composed of different ethnic populations, we conducted imputations in the Framingham Heart Study and the North Chinese Study using a combined reference panel from the 1000 Genomes (N = 1,092) and UK10K (N = 3,781) projects. For rare variants with 0.01% < MAF ≤ 0.5%, imputation in the Framingham Heart Study with the combined reference panel increased the proportion of well-imputed genotypes (imputation quality score ≥0.4) from 62.9% to 76.1% compared with imputation using the 1000 Genomes panel alone. For the North Chinese samples, imputation of rare variants in the same MAF range with the combined reference panel increased the proportion of well-imputed genotypes from 49.8% to 61.8%. The predominantly European ancestry of the UK10K and combined reference panels may explain why the increase in imputation success was smaller in the North Chinese samples. Our results underscore the importance and potential of larger reference panels for imputing rare variants, while recognizing that adding more ethnicity-specific variants to reference panels may improve genotype imputation in some ethnic groups.
Year: 2016 PMID: 28004816 PMCID: PMC5177868 DOI: 10.1038/srep39313
Source DB: PubMed Journal: Sci Rep ISSN: 2045-2322 Impact factor: 4.379
Number of actual genotypes and imputed variants in FHS.
| | MAF = 0 | 0 < MAF ≤ 0.01% | 0.01% < MAF ≤ 0.5% | 0.5% < MAF ≤ 1% | 1% < MAF ≤ 5% | MAF > 5% | Total |
|---|---|---|---|---|---|---|---|
| Genotyped by Affy550K chips | 1,578 (0.3%) | 0 | 30,175 (5.6%) | 15,171 (2.8%) | 60,450 (11.3%) | 428,843 (80%) | 536,217 |
| Imputed with 1000 G | 1,235,417 (4%) | 6,834,647 (23%) | 11,141,317 (38%) | 1,410,031 (5%) | 2,799,620 (9%) | 6,184,400 (21%) | 29,605,432 |
| Imputed with 1000 G + UK10K | 3,897,784 (10%) | 7,743,094 (20%) | 17,568,163 (45%) | 1,296,955 (3%) | 2,689,779 (7%) | 6,193,345 (16%) | 39,389,120 |
Figure 1. Imputation quality of FHS data evaluated by squared correlation (R²) between actual allelic dosages and imputed allelic dosages from imputations with 1000 G and 1000 G + UK10K.
The actual allelic dosages were from a second set of FHS genotype data, OMNI5, and the original genotypes (Affy550K, the input genotype data) were excluded. The MAFs were estimated from variants imputed with 1000 G + UK10K reference panel.
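The squared correlation used in this comparison can be illustrated with a minimal sketch. It assumes that actual genotypes are coded as allele counts (0, 1, 2) and imputed dosages as expected allele counts per sample. The function name `dosage_r2` and the toy values are hypothetical and are not taken from the study.

```python
def dosage_r2(actual, imputed):
    """Squared Pearson correlation between two equal-length dosage lists."""
    n = len(actual)
    if n != len(imputed) or n < 2:
        raise ValueError("need two equal-length lists with at least 2 samples")
    mean_a = sum(actual) / n
    mean_i = sum(imputed) / n
    # Sums of cross-products and squares of the centred values
    cov = sum((a - mean_a) * (i - mean_i) for a, i in zip(actual, imputed))
    var_a = sum((a - mean_a) ** 2 for a in actual)
    var_i = sum((i - mean_i) ** 2 for i in imputed)
    if var_a == 0 or var_i == 0:
        return float("nan")  # no variation in one data set: R2 is undefined
    return cov * cov / (var_a * var_i)


# Toy data (made up): one variant, six samples
actual = [0, 1, 2, 0, 1, 0]                # genotyped allele counts
imputed = [0.1, 0.9, 1.8, 0.0, 1.2, 0.2]   # imputed allelic dosages
r2 = dosage_r2(actual, imputed)
print(round(r2, 3))
```

In practice this value would be computed per variant and then summarized within MAF bins; the sketch only shows the per-variant step.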
Proportion of well-imputed variants in FHS imputation results.
| MAF | | 0 < MAF ≤ 0.01% | | 0.01% < MAF ≤ 0.5% | | 0.5% < MAF ≤ 1% | | 1% < MAF ≤ 5% | | MAF > 5% | |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Reference panel | | 1000 G | 1000 G + UK10K | 1000 G | 1000 G + UK10K | 1000 G | 1000 G + UK10K | 1000 G | 1000 G + UK10K | 1000 G | 1000 G + UK10K |
| Total imputed variants | | 6,834,647 | 7,743,094 | 11,141,317 | 17,568,163 | 1,410,031 | 1,296,955 | 2,799,620 | 2,689,779 | 6,184,400 | 6,193,345 |
| INFO | ≥0.4 | 12.0% | 21.2% | 62.9% | 76.1% | 90.8% | 98.4% | 97.1% | 99.2% | 99.3% | 99.8% |
| | ≥0.5 | 9.0% | 17.9% | 46.4% | 62.1% | 74.5% | 94.3% | 89.3% | 97.7% | 97.8% | 99.4% |
| | ≥0.6 | 4.9% | 15.1% | 33.0% | 46.0% | 55.7% | 83.2% | 77.3% | 93.2% | 95.0% | 98.4% |
| | ≥0.7 | 0.0% | 11.6% | 22.5% | 31.4% | 38.8% | 64.2% | 64.2% | 83.4% | 91.0% | 96.4% |
| | ≥0.8 | 0.0% | 7.4% | 13.4% | 19.5% | 25.0% | 41.9% | 51.0% | 67.8% | 85.0% | 92.2% |
| | ≥0.9 | 0.0% | 3.6% | 5.2% | 10.1% | 14.1% | 23.0% | 36.9% | 48.8% | 73.9% | 83.5% |
The imputations were performed with 1000 G and 1000 G + UK10K reference panels.
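Each percentage in the table above is the share of variants in one MAF bin whose INFO score meets a threshold. A minimal sketch of that tabulation follows, assuming the INFO scores of one bin are available as a plain list. The name `proportion_well_imputed` and the example scores are hypothetical; the thresholds are the ones listed in the table.

```python
THRESHOLDS = [0.4, 0.5, 0.6, 0.7, 0.8, 0.9]  # INFO cut-offs used in the table


def proportion_well_imputed(info_scores, thresholds=THRESHOLDS):
    """Return {threshold: fraction of variants with INFO >= threshold}."""
    if not info_scores:
        raise ValueError("info_scores must not be empty")
    n = len(info_scores)
    return {t: sum(s >= t for s in info_scores) / n for t in thresholds}


# Made-up INFO scores for the variants of a single MAF bin
scores = [0.95, 0.82, 0.61, 0.45, 0.38, 0.10, 0.73, 0.55]
props = proportion_well_imputed(scores)
for t, p in props.items():
    print(f"INFO >= {t}: {p:.1%}")
```

Because the thresholds are cumulative, the proportions can only stay equal or decrease from ≥0.4 to ≥0.9, which matches the pattern in every column of the table.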
Proportion of well-imputed variants by functional role and MAF.
| Functional roles | 0 < MAF ≤ 0.1% | 0.1% < MAF ≤ 0.5% | 0.5% < MAF ≤ 1% | 1% < MAF ≤ 5% | MAF > 5% |
|---|---|---|---|---|---|
| Exons | 78.3% | 90.1% | 95.7% | 97.9% | 99.1% |
| Nonsense | 79.6% | 87.1% | 96.3% | 95.8% | 100% |
| Splicing | 100% | 83.3% | 100% | 100% | 100% |
| Missense | 78.2% | 90.8% | 97.0% | 98.6% | 99.3% |
| 3′UTR | 79.1% | 91.2% | 96.8% | 98.7% | 99.3% |
| 5′UTR | 78.7% | 91.3% | 95.9% | 98.2% | 99.4% |
| Non-coding RNAs located in exons | 77.7% | 88.1% | 92.7% | 96.2% | 98.7% |
| Non-coding RNAs located in introns | 80.3% | 91.9% | 97.3% | 99.1% | 99.4% |
| Intergenic regions | 79.2% | 90.9% | 96.7% | 98.7% | 99.3% |
Number of actual genotypes and imputed variants in NCS.
| | MAF = 0 | 0 < MAF ≤ 0.01% | 0.01% < MAF ≤ 0.5% | 0.5% < MAF ≤ 1% | 1% < MAF ≤ 5% | MAF > 5% | Total |
|---|---|---|---|---|---|---|---|
| Genotyped by Affymetrix Axiom CHB 1 array | 0 | 0 | 0 | 0 | 16,173 (3%) | 511,811 (97%) | 527,984 |
| Imputed with 1000 G | 1,642,218 (6%) | 8,111,504 (28%) | 10,354,147 (35%) | 1,386,886 (5%) | 2,299,157 (8%) | 5,404,143 (19%) | 29,198,055 |
| Imputed with 1000 G + UK10K | 4,791,282 (12%) | 13,485,420 (33%) | 12,236,827 (30%) | 1,450,070 (4%) | 2,464,567 (6%) | 6,183,666 (15%) | 40,611,832 |
The imputations were performed with 1000 G and 1000 G + UK10K reference panels.
Figure 2. Imputation quality of NCS data evaluated by squared correlation (R²) between actual allelic dosages and imputed allelic dosages from imputations with 1000 G and 1000 G + UK10K.
The actual allelic dosages were from the input genotype data, and the actual genotypes were first masked and then imputed to get imputed allelic dosages. The MAFs were estimated from variants imputed with 1000 G + UK10K.
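How a MAF can be estimated from imputed dosages and then assigned to one of the MAF bins used in the tables can be sketched as follows. This is an illustration under assumptions: dosages are expected counts of one allele per diploid sample (0 to 2), and the helper names `maf_from_dosages` and `maf_bin` are hypothetical. The source does not describe the exact procedure.

```python
from bisect import bisect_left

# Inclusive upper edges of the MAF bins, as fractions (0.01%, 0.5%, 1%, 5%)
BIN_UPPER = [0.0001, 0.005, 0.01, 0.05]
BIN_LABELS = [
    "0 < MAF <= 0.01%",
    "0.01% < MAF <= 0.5%",
    "0.5% < MAF <= 1%",
    "1% < MAF <= 5%",
    "MAF > 5%",
]


def maf_from_dosages(dosages):
    """Minor allele frequency from per-sample allelic dosages (0..2)."""
    freq = sum(dosages) / (2 * len(dosages))  # frequency of the counted allele
    return min(freq, 1 - freq)                # fold to the minor allele


def maf_bin(maf):
    """Label of the MAF bin that contains maf."""
    if maf == 0:
        return "MAF = 0"
    # bisect_left keeps a value equal to an edge in the lower bin (<= edge)
    return BIN_LABELS[bisect_left(BIN_UPPER, maf)]


# Made-up dosages for one variant in four samples
maf = maf_from_dosages([0, 0, 1, 0])
print(maf, maf_bin(maf))
```

Using `bisect_left` on the upper edges reproduces the "greater than lower edge, at most upper edge" convention of the table headers without a chain of comparisons.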
Proportion of well-imputed variants in NCS imputation results.
| MAF | | 0 < MAF ≤ 0.01% | | 0.01% < MAF ≤ 0.5% | | 0.5% < MAF ≤ 1% | | 1% < MAF ≤ 5% | | MAF > 5% | |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Reference panel | | 1000 G | 1000 G + UK10K | 1000 G | 1000 G + UK10K | 1000 G | 1000 G + UK10K | 1000 G | 1000 G + UK10K | 1000 G | 1000 G + UK10K |
| Total imputed variants | | 8,111,504 | 13,485,420 | 10,354,147 | 12,236,827 | 1,386,886 | 1,450,070 | 2,299,157 | 2,464,567 | 5,404,143 | 6,183,666 |
| INFO | ≥0.4 | 0.2% | 0.9% | 49.8% | 61.8% | 87.4% | 94.0% | 96.1% | 98.3% | 99.5% | 99.8% |
| | ≥0.5 | 0.0% | 0.2% | 33.8% | 47.8% | 67.9% | 80.6% | 86.3% | 93.0% | 98.3% | 99.4% |
| | ≥0.6 | 0.0% | 0.0% | 21.1% | 33.2% | 47.6% | 60.3% | 72.9% | 81.7% | 96.3% | 98.3% |
| | ≥0.7 | 0.0% | 0.0% | 12.3% | 20.8% | 31.4% | 40.4% | 59.8% | 67.5% | 93.8% | 96.5% |
| | ≥0.8 | 0.0% | 0.0% | 6.2% | 11.5% | 19.1% | 24.5% | 47.4% | 53.2% | 90.3% | 93.6% |
| | ≥0.9 | 0.0% | 0.0% | 2.2% | 4.9% | 9.3% | 11.9% | 33.8% | 37.8% | 83.3% | 88.0% |
The imputations were performed with 1000 G and 1000 G + UK10K reference panels.
Figure 3. Correlation between the FHS and NCS imputation results, by MAF and FST.
The x and y axes of each small plot are the INFO scores from FHS and NCS, respectively. Each small plot shows the joint distribution of the two INFO scores within one range of FST and MAF. CEU and CHB stand for Utah residents with ancestry from Northern and Western Europe and Han Chinese in Beijing, respectively.
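The record does not say which FST estimator was used to stratify variants. As an illustration only, the sketch below uses a textbook heterozygosity-based definition, FST = (H_T − H_S) / H_T, for one biallelic variant and two equally weighted populations (for example CEU and CHB). The function name `fst_two_pops` and the allele frequencies are hypothetical.

```python
def fst_two_pops(p1, p2):
    """Heterozygosity-based FST for one biallelic variant, two populations.

    p1 and p2 are the frequencies of the same allele in each population.
    Assumption: both populations get equal weight.
    """
    p_bar = (p1 + p2) / 2
    h_total = 2 * p_bar * (1 - p_bar)                       # pooled heterozygosity
    h_within = (2 * p1 * (1 - p1) + 2 * p2 * (1 - p2)) / 2  # mean within-population
    if h_total == 0:
        return float("nan")  # allele fixed or absent in both: FST undefined
    return (h_total - h_within) / h_total


# Made-up allele frequencies for one variant
fst = fst_two_pops(0.1, 0.3)
print(round(fst, 4))
```

Identical frequencies give an FST of 0 and opposite fixation gives 1, so larger values mark variants whose frequencies differ more between the two populations.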