Worachart Lert-Itthiporn, Bhoom Suktitipat, Harald Grove, Anavaj Sakuntabhai, Prida Malasit, Nattaya Tangthawornchaikul, Fumihiko Matsuda, Prapat Suriyaphol.
Abstract
BACKGROUND: Imputation involves the inference of untyped single nucleotide polymorphisms (SNPs) in genome-wide association studies. The haplotypic reference of choice for imputation in Southeast Asian populations is unclear. Moreover, the influence of SNP annotation on imputation results has not been examined.
Keywords: Genotype; Imputation; Pan-Asian SNP; Reference; SNP annotation
Year: 2018 PMID: 29439659 PMCID: PMC5812212 DOI: 10.1186/s12881-018-0534-8
Source DB: PubMed Journal: BMC Med Genet ISSN: 1471-2350 Impact factor: 2.103
Fig. 1 Boxplots of imputation accuracy and yield across all populations. Five percent of SNPs were randomly removed and then imputed with IMPUTE2, using either the 1000 Genomes Project phase I (1000G) or the combined Chinese and Japanese haplotypes from the International HapMap Project phase II (HMII) as a reference. The imputed genotypes were checked for accuracy against the previously removed SNPs. The same set of removed SNPs was applied to all population datasets. The procedure was repeated five times. a Boxplot of accuracy by population and reference. b Boxplot of yield by population and reference. Abbreviations: Indonesia (ID), Malaysia (MY), the Philippines (PI), Singapore (SG), and Thailand (TH)
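The masking procedure described in the Fig. 1 caption can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the function names are hypothetical, and the definitions used here (accuracy as the fraction of called genotypes that match the removed genotype, yield as the fraction of removed genotypes that received a call) are assumptions, since the record does not define either metric.

```python
import random


def mask_snps(snp_ids, fraction=0.05, seed=0):
    """Pick a random subset of SNP ids to hide before imputation.

    The 5% default mirrors the caption; the seed is an illustrative choice.
    """
    rng = random.Random(seed)
    n_masked = max(1, round(len(snp_ids) * fraction))
    return set(rng.sample(sorted(snp_ids), n_masked))


def accuracy_and_yield(true_calls, imputed_calls):
    """Compare imputed genotypes with the genotypes that were removed.

    true_calls:    dict mapping (sample, snp) -> genotype string
    imputed_calls: dict mapping (sample, snp) -> genotype string,
                   or None when no call was made (assumed convention)
    Returns (accuracy, yield) under the assumed definitions above.
    """
    called = {
        key: genotype
        for key, genotype in imputed_calls.items()
        if key in true_calls and genotype is not None
    }
    n_total = len(true_calls)
    n_called = len(called)
    n_correct = sum(1 for key, g in called.items() if g == true_calls[key])
    yield_ = n_called / n_total if n_total else float("nan")
    accuracy = n_correct / n_called if n_called else float("nan")
    return accuracy, yield_
```

Repeating the mask with five different seeds and collecting one (accuracy, yield) pair per population and reference panel would give the kind of values a boxplot like Fig. 1 summarizes.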
Fig. 2 Imputation accuracy and yield by chromosome. Five percent of SNPs were randomly removed and then imputed with IMPUTE2. The imputed genotypes were checked for accuracy against the previously removed SNPs. The same set of removed SNPs was applied to all population datasets. The procedure was repeated five times. a Imputation accuracy by chromosome using HMII as a reference. b Imputation accuracy by chromosome using 1000G as a reference. c Imputation yield by chromosome using HMII as a reference. d Imputation yield by chromosome using 1000G as a reference. Abbreviations: Indonesia (ID), Malaysia (MY), the Philippines (PI), Singapore (SG), and Thailand (TH)
Fig. 3 Multidimensional scaling plot of Southeast Asian populations from PanSNPdb. Genotype data for samples from Southeast Asian populations were downloaded from PanSNPdb. After quality control, multidimensional scaling (MDS) was performed in PLINK v1.07. a Plot of the first (C1) and second (C2) axes. b Plot of the third (C3) and fourth (C4) axes. Abbreviations: Indonesia (ID), Malaysia (MY), the Philippines (PI), Singapore (SG), Thailand (TH), China (CHB) and Japan (JPT)
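MDS in PLINK operates on pairwise identity-by-state (IBS) distances between samples. The sketch below shows only that distance step, assuming genotypes are coded as minor-allele counts (0, 1, 2) with None for a missing call. The function names and the coding are illustrative assumptions, and this is not PLINK's implementation.

```python
def ibs_distance(g1, g2):
    """Return 1 minus the mean proportion of alleles shared IBS.

    g1, g2: equal-length sequences of allele counts (0, 1, 2) or None.
    SNPs with a missing call in either sample are skipped.
    """
    shared = 0
    n_snps = 0
    for a, b in zip(g1, g2):
        if a is None or b is None:
            continue
        shared += 2 - abs(a - b)  # alleles shared IBS at this SNP: 0, 1 or 2
        n_snps += 1
    if n_snps == 0:
        raise ValueError("no SNPs with calls in both samples")
    return 1.0 - shared / (2 * n_snps)


def distance_matrix(genotypes):
    """Build a symmetric matrix of pairwise IBS distances.

    genotypes: list of per-sample genotype sequences.
    """
    n = len(genotypes)
    matrix = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d = ibs_distance(genotypes[i], genotypes[j])
            matrix[i][j] = d
            matrix[j][i] = d
    return matrix
```

A matrix of this kind is the input to the scaling step, whose leading axes correspond to C1 through C4 in the figure.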
Squared correlation of allele frequencies and chi-square P-values from SNPs in different regions
| Region | r2 of minor allele frequency, before post-imputation QC | r2 of minor allele frequency, after post-imputation QC | r2 of chi-square P-values, before post-imputation QC | r2 of chi-square P-values, after post-imputation QC |
|---|---|---|---|---|
| Coding region | 0.868 | 0.997 | 0.387 | 0.813 |
| Complex region | 0.817 | 0.991 | 0.267 | 0.782 |
| Intergenic region | 0.864 | 0.996 | 0.328 | 0.789 |
| Intron region | 0.863 | 0.995 | 0.340 | 0.784 |
| UTR region | 0.830 | 0.991 | 0.303 | 0.756 |
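
The r2 values in the table are squared correlations. As a minimal sketch, the squared Pearson correlation between two paired lists (for example, minor allele frequencies from two sources for the same SNPs) can be computed as below. Which two quantities the authors paired is not stated in this record, so the pairing and the function name are assumptions.

```python
def squared_correlation(x, y):
    """Return the squared Pearson correlation (r2) of two paired lists."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have equal length of at least 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sums of cross-products and squared deviations from the means.
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("r2 is undefined when either list is constant")
    return (sxy * sxy) / (sxx * syy)
```

Computing this once per annotation class (coding, complex, intergenic, intron, UTR), before and after filtering, would produce one table row per class.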