Xiaoyi Gao, Talin Haritunians, Paul Marjoram, Roberta McKean-Cowdin, Mina Torres, Kent D Taylor, Jerome I Rotter, William J Gauderman, Rohit Varma.
Abstract
Genotype imputation is a vital tool in genome-wide association studies (GWAS) and meta-analyses of multiple GWAS results. Imputation enables researchers to increase genomic coverage and to pool data generated using different genotyping platforms. HapMap samples are often employed as the reference panel. More recently, the 1000 Genomes Project resource is becoming the primary source for reference panels. Multiple GWAS and meta-analyses are targeting Latinos, the most populous and fastest-growing minority group in the US. However, genotype imputation resources for Latinos are at present rather limited compared to those for individuals of European ancestry, largely because of the lack of good reference data. One candidate reference panel for Latinos is derived from the individuals of Mexican ancestry in Los Angeles included in the HapMap Phase 3 project and the 1000 Genomes Project. However, a detailed evaluation of the quality of the imputed genotypes derived from the public reference panels has not yet been reported. Using simulation studies, the Illumina OmniExpress GWAS data from the Los Angeles Latino Eye Study, and the MACH software package, we evaluated the accuracy of genotype imputation in Latinos. Our results show that the 1000 Genomes Project AMR + CEU + YRI reference panel provides the highest imputation accuracy for Latinos, and that adding Asian samples to the panel can reduce imputation accuracy. We also provide the imputation accuracy for each autosomal chromosome using the 1000 Genomes Project panel for Latinos. Our results serve as a guide to future imputation-based analyses in Latinos.
Keywords: 1000 Genomes Project; HapMap Project; Latino; genotype imputation
Year: 2012 PMID: 22754564 PMCID: PMC3384355 DOI: 10.3389/fgene.2012.00117
Source DB: PubMed Journal: Front Genet ISSN: 1664-8021 Impact factor: 4.599
Phased haplotypes downloaded from the MACH website.
| Population | Code | HapMap Phase 3: number of haplotypes | 1000 Genomes Project: number of haplotypes | 1000 Genomes Project: group code |
|---|---|---|---|---|
| Mexican ancestry in Los Angeles, California | MEX | 104 | 132 | AMR |
| Colombian in Medellin, Colombia | CLM | | 120 | AMR |
| Puerto Rican in Puerto Rico | PUR | | 110 | AMR |
| CEPH in Utah residents | CEU | 234 | 174 | EUR |
| Tuscans in Italy | TSI | 176 | 196 | EUR |
| Finnish individuals from Finland | FIN | | 186 | EUR |
| British individuals from England and Scotland | GBR | | 178 | EUR |
| Iberian populations in Spain | IBS | | 28 | EUR |
| Yoruba in Ibadan, Nigeria | YRI | 230 | 176 | AFR |
| African ancestry individuals from Southwest, US | ASW | | 122 | AFR |
| Luhya in Webuye, Kenya | LWK | | 194 | AFR |
| Han Chinese in Beijing, China | CHB | 168 | 194 | ASN |
| Japanese in Tokyo, Japan | JPT | 172 | 178 | ASN |
| Han Chinese South, China | CHS | | 200 | ASN |
| Total | | 1084 | 2188 | |
The population labels were obtained from the HapMap and the 1000 Genomes Project websites.
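The haplotype counts reported for combined reference panels are sums of the per-population counts in the table above. The following Python sketch reproduces that arithmetic. The names `HM3`, `KGP`, `GROUPS`, and `panel_size` are illustrative choices made here and are not part of MACH or of either project's tooling.

```python
# Haplotype counts per population, transcribed from the table above.
HM3 = {"MEX": 104, "CEU": 234, "TSI": 176, "YRI": 230, "CHB": 168, "JPT": 172}
KGP = {
    "MEX": 132, "CLM": 120, "PUR": 110,                         # AMR
    "CEU": 174, "TSI": 196, "FIN": 186, "GBR": 178, "IBS": 28,  # EUR
    "YRI": 176, "ASW": 122, "LWK": 194,                         # AFR
    "CHB": 194, "JPT": 178, "CHS": 200,                         # ASN
}
# 1000 Genomes Project group codes and their member populations.
GROUPS = {
    "AMR": ["MEX", "CLM", "PUR"],
    "EUR": ["CEU", "TSI", "FIN", "GBR", "IBS"],
    "AFR": ["YRI", "ASW", "LWK"],
    "ASN": ["CHB", "JPT", "CHS"],
}


def panel_size(panel, counts, groups=None):
    """Sum the haplotypes in a panel written like 'AMR + CEU + YRI'."""
    groups = groups or {}
    total = 0
    for code in (part.strip() for part in panel.split("+")):
        # A group code expands to its member populations;
        # any other code is treated as a single population.
        for population in groups.get(code, [code]):
            total += counts[population]
    return total


print(panel_size("MEX + CEU + YRI + TSI", HM3))    # 744
print(panel_size("AMR + CEU + YRI", KGP, GROUPS))  # 712
```

Both printed values match the "Number of haplotypes" entries reported for those panels in the chromosome 22 results.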
Figure A1. Principal components analysis of the simulated individuals and the HapMap Mexican–American individuals.
Figure 1. Distributions of per-individual and per-SNP errors and the imputed MACH Rsq. Pink and blue denote the 200-individual and 52-individual reference panels, respectively. Genotype imputation accuracy is tested in an additional 500 simulated individuals. The true (simulated) genotypes of 11,825 SNPs on chromosome 22 (those HapMap Phase 3 SNPs not present on the Illumina OmniExpress BeadChip) are compared with the imputed genotypes. (A) Distribution of per-individual errors. (B) Distribution of per-SNP errors. (C) Boxplots of the MACH Rsq for the imputed SNPs.
Genotype imputation accuracy for chromosome 22 based on the HM3 and 1KGP reference panels.
| Reference data | Reference panel | Number of haplotypes | Memory used (GB) | Per genotype error rate (%) | Per allele error rate (%) |
|---|---|---|---|---|---|
| HM3 | CEU + YRI + JPT + CHB | 804 | 2.3 | 5.17 | 2.67 |
| HM3 | MEX | 104 | 0.2 | 6.09 | 3.13 |
| HM3 | MEX + CEU | 338 | 1.7 | 5.11 | 2.63 |
| HM3 | MEX + CEU + YRI | 568 | 2.3 | 4.19 | 2.16 |
| HM3 | MEX + CEU + YRI + JPT + CHB | 908 | 2.3 | 4.24 | 2.18 |
| HM3 | MEX + CEU + YRI + TSI | 744 | 2.3 | 4.00 | 2.06 |
| HM3 | MEX + CEU + YRI + TSI + JPT + CHB | 1084 | 2.3 | 4.12 | 2.12 |
| 1KGP | MEX | 132 | 1.3 | 4.84 | 2.49 |
| 1KGP | AMR | 362 | 3.5 | 3.72 | 1.91 |
| 1KGP | AMR + EUR | 1124 | 4.5 | 3.69 | 1.90 |
| 1KGP | AMR + EUR + AFR | 1616 | 4.9 | 3.27 | 1.68 |
| 1KGP | AMR + EUR + AFR + ASN | 2188 | 5.3 | 3.35 | 1.73 |
| 1KGP | AMR + CEU | 536 | 4.1 | 3.58 | 1.84 |
| 1KGP | AMR + CEU + YRI | 712 | 4.3 | 3.23 | 1.66 |
| 1KGP | AMR + CEU + YRI + JPT + CHB | 1084 | 4.5 | 3.35 | 1.73 |
HM3, HapMap Project Phase 3; 1KGP, 1000 Genomes Project. Population and group codes are defined in the table of phased haplotypes above.
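The two error columns can be illustrated with a short Python sketch. The record does not spell out the exact formulas, so the definitions below are assumptions: genotypes are coded as allele counts (0, 1, or 2), the per-genotype error rate is the share of imputed genotypes that differ from the true genotype, and the per-allele error rate is the number of mismatched alleles divided by the total number of alleles (two per genotype). The function name `error_rates` is hypothetical.

```python
def error_rates(true_genotypes, imputed_genotypes):
    """Return (per-genotype, per-allele) error rates in percent.

    Assumed definitions, not taken from the source:
    genotypes are allele counts in {0, 1, 2}; a genotype error is any
    mismatch; an allele error is one unit of |true - imputed|.
    """
    n = len(true_genotypes)
    if n == 0 or n != len(imputed_genotypes):
        raise ValueError("inputs must be non-empty and of equal length")
    pairs = list(zip(true_genotypes, imputed_genotypes))
    wrong_genotypes = sum(1 for t, i in pairs if t != i)
    wrong_alleles = sum(abs(t - i) for t, i in pairs)
    return 100.0 * wrong_genotypes / n, 100.0 * wrong_alleles / (2 * n)


true_calls = [0, 1, 2, 1, 0, 2, 1, 0, 0, 1]
imputed_calls = [0, 1, 1, 1, 0, 0, 1, 0, 0, 1]
genotype_rate, allele_rate = error_rates(true_calls, imputed_calls)
print(genotype_rate, allele_rate)  # 20.0 15.0
```

Under these assumed definitions, a genotype that is wrong by a single allele contributes half as much to the per-allele rate as to the per-genotype rate, which would be consistent with the per-allele rates in the table being slightly more than half of the per-genotype rates.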
Genotype imputation accuracy for chromosome 9 based on the 1000 Genomes Project reference panels.
| Reference panels | Number of haplotypes | Memory used (GB) | Per genotype error rate (%) | Per allele error rate (%) |
|---|---|---|---|---|
| MEX | 132 | 4.5 | 3.75 | 1.93 |
| AMR | 362 | 11.4 | 2.79 | 1.44 |
| AMR + EUR | 1124 | 14.9 | 2.79 | 1.43 |
| AMR + EUR + AFR | 1616 | 16.0 | 2.64 | 1.40 |
| AMR + EUR + AFR + ASN | 2188 | 17.4 | 3.21 | 1.68 |
| AMR + CEU | 536 | 13.5 | 2.68 | 1.38 |
| AMR + CEU + YRI | 712 | 13.9 | 2.36 | 1.22 |
| AMR + CEU + YRI + JPT + CHB | 1084 | 14.8 | 2.51 | 1.29 |
Figure 2. Pairwise plot of the dosage r2 against the MACH Rsq. The red diagonal line marks a perfect match between the MACH Rsq and the dosage r2; points further from the diagonal indicate a poorer estimate. The correlation coefficient between Rsq and r2 is 0.96.
Figure A2. Pairwise plot of the dosage r2 against the MACH Rsq. The red diagonal line marks a perfect match between the MACH Rsq and the dosage r2; points further from the diagonal indicate a poorer estimate.
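For readers unfamiliar with the dosage r2 that these figures compare against the MACH Rsq, the sketch below computes it for one SNP as the squared Pearson correlation between the true genotypes and the imputed dosages. This is the usual definition of dosage r2 and is assumed here rather than quoted from the source; the function name `dosage_r2` and the example values are hypothetical.

```python
def dosage_r2(true_genotypes, dosages):
    """Squared Pearson correlation between true genotypes (0/1/2)
    and imputed dosages (expected allele counts between 0 and 2)."""
    n = len(true_genotypes)
    if n < 2 or n != len(dosages):
        raise ValueError("need two or more paired observations")
    mean_t = sum(true_genotypes) / n
    mean_d = sum(dosages) / n
    cov = sum((t - mean_t) * (d - mean_d)
              for t, d in zip(true_genotypes, dosages))
    var_t = sum((t - mean_t) ** 2 for t in true_genotypes)
    var_d = sum((d - mean_d) ** 2 for d in dosages)
    if var_t == 0 or var_d == 0:
        # Correlation is undefined when either variable is constant.
        raise ValueError("r2 is undefined for a monomorphic SNP or constant dosages")
    return cov * cov / (var_t * var_d)


# Toy example: two of four dosages are uninformative (1.0 for a true 0 and a true 2).
print(dosage_r2([0, 2, 0, 2], [1.0, 1.0, 0.0, 2.0]))  # 0.5
```

Unlike the dosage r2, which requires the true genotypes, the MACH Rsq is reported by the software without them, which is why the figures assess how closely the two agree.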
Figure 3. Boxplot of the MACH Rsq for the imputed SNPs stratified by the minor allele frequency. Boxplot of the MACH Rsq for 485,313 imputed SNPs on chromosome 22 (excluding all SNPs typed on the Illumina OmniExpress) based on the 1000 Genomes Project AMR + CEU + YRI reference panel. Abbreviations: MAF, minor allele frequency.
Figure 4. Genotype imputation accuracy by chromosome. Accuracy is measured as the per-genotype error rate after randomly masking 2% of genome-wide SNPs.
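The masking step behind this figure can be sketched as follows. The source states only that 2% of genome-wide SNPs were masked at random; how the draw was made (for example, per chromosome or across the whole genome) is not given, so the sketch simply samples from a flat list. The function name `mask_snps`, the fixed seed, and the `rs` identifiers are hypothetical.

```python
import random


def mask_snps(snp_ids, fraction=0.02, seed=0):
    """Randomly split typed SNPs into a masked set and a kept set.

    The masked SNPs are hidden from the imputation input, imputed,
    and then compared with their known genotypes.
    """
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    n_masked = round(len(snp_ids) * fraction)
    masked = set(rng.sample(list(snp_ids), n_masked))
    kept = [snp for snp in snp_ids if snp not in masked]
    return sorted(masked), kept


typed = [f"rs{i}" for i in range(1000)]  # placeholder SNP identifiers
masked, kept = mask_snps(typed)
print(len(masked), len(kept))  # 20 980
```

The error rate is then computed only on the masked SNPs, because those are the only imputed positions for which the true genotypes are known.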
Genotype imputation accuracy for chromosome 13 based on the 1000 Genomes Project reference panels.
| Reference panels | Number of haplotypes | Memory used (GB) | Per genotype error rate (%) | Per allele error rate (%) |
|---|---|---|---|---|
| AMR + EUR + AFR | 1616 | 13.0 | 2.14 | 1.11 |
| AMR + EUR + AFR + ASN | 2188 | 14.1 | 2.68 | 1.46 |
| AMR + CEU + YRI | 712 | 11.4 | 2.05 | 1.06 |
| AMR + CEU + YRI + JPT + CHB | 1084 | 12.0 | 2.22 | 1.15 |
Genotype imputation accuracy for chromosome 19 based on the 1000 Genomes Project reference panels.
| Reference panels | Number of haplotypes | Memory used (GB) | Per genotype error rate (%) | Per allele error rate (%) |
|---|---|---|---|---|
| AMR + EUR + AFR | 1616 | 7.2 | 3.88 | 2.05 |
| AMR + EUR + AFR + ASN | 2188 | 7.9 | 4.09 | 2.16 |
| AMR + CEU + YRI | 712 | 6.3 | 3.58 | 1.89 |
| AMR + CEU + YRI + JPT + CHB | 1084 | 6.7 | 4.01 | 2.12 |
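The effect of adding Asian samples can be read directly from the four per-chromosome tables. The Python sketch below tabulates the per-genotype error rates reported above for chromosomes 22, 9, 13, and 19 and computes the change when JPT + CHB are added to AMR + CEU + YRI. The names `ERROR` and `error_increase` are illustrative; the numbers are copied from the tables.

```python
# Per-genotype error rates (%) for 1000 Genomes Project panels, by chromosome.
ERROR = {
    22: {"AMR + EUR + AFR": 3.27, "AMR + EUR + AFR + ASN": 3.35,
         "AMR + CEU + YRI": 3.23, "AMR + CEU + YRI + JPT + CHB": 3.35},
    9:  {"AMR + EUR + AFR": 2.64, "AMR + EUR + AFR + ASN": 3.21,
         "AMR + CEU + YRI": 2.36, "AMR + CEU + YRI + JPT + CHB": 2.51},
    13: {"AMR + EUR + AFR": 2.14, "AMR + EUR + AFR + ASN": 2.68,
         "AMR + CEU + YRI": 2.05, "AMR + CEU + YRI + JPT + CHB": 2.22},
    19: {"AMR + EUR + AFR": 3.88, "AMR + EUR + AFR + ASN": 4.09,
         "AMR + CEU + YRI": 3.58, "AMR + CEU + YRI + JPT + CHB": 4.01},
}


def error_increase(errors, base, extended):
    """Change in error rate (percentage points) from base to extended panel."""
    return {chrom: round(rates[extended] - rates[base], 2)
            for chrom, rates in errors.items()}


print(error_increase(ERROR, "AMR + CEU + YRI", "AMR + CEU + YRI + JPT + CHB"))
# {22: 0.12, 9: 0.15, 13: 0.17, 19: 0.43}

# Panel with the lowest per-genotype error on each chromosome.
print({chrom: min(rates, key=rates.get) for chrom, rates in ERROR.items()})
```

On all four chromosomes the error rate rises when the Asian samples are added, and AMR + CEU + YRI has the lowest error among the four panels compared here.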