Yosuke Kawai, Takahiro Mimori, Kaname Kojima, Naoki Nariai, Inaho Danjoh, Rumiko Saito, Jun Yasuda, Masayuki Yamamoto, Masao Nagasaki.
Abstract
The Tohoku Medical Megabank Organization constructed a reference panel (referred to as the 1KJPN panel) containing >20 million single nucleotide polymorphisms (SNPs) from the whole-genome sequence data of 1070 Japanese individuals. The 1KJPN panel contains the largest number of haplotypes of Japanese ancestry to date. Here, using the 1KJPN panel, we designed a novel custom-made SNP array, named the Japonica array, which is suitable for whole-genome imputation in Japanese individuals. The array contains 659,253 SNPs, including tag SNPs for imputation, SNPs on the Y chromosome and in the mitochondrial genome, and SNPs related to previously reported genome-wide association studies and pharmacogenomics. The Japonica array provides better imputation performance for Japanese individuals than existing commercially available SNP arrays, with both the 1KJPN panel and the International 1000 Genomes Project panel. For common SNPs (minor allele frequency (MAF) >5%), the genomic coverage of the Japonica array (r²>0.8) was 96.9%; that is, almost all common SNPs were covered by this array. The coverage of low-frequency SNPs (0.5%<MAF⩽5%) was lower, at 67.2%, but nonetheless higher than that of the existing arrays. In addition, we confirmed the high genotyping quality of the Japonica array using 288 samples from 1KJPN: the average call rate was 99.7%, and the average concordance rate with genotypes obtained by a high-throughput sequencer was 99.7%. As demonstrated in this study, creating custom-made SNP arrays based on a population-specific reference panel is a practical way to facilitate further association studies through genome-wide genotype imputation.
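The abstract reports an average call rate and concordance rate without defining them. A minimal Python sketch, assuming the common definitions (call rate = fraction of attempted genotypes that produced a call; concordance = fraction of sites called both on the array and by sequencing at which the two genotypes agree) and a hypothetical encoding of genotypes as 0/1/2 allele counts with `None` for a no-call; the study's exact computation may differ.

```python
from typing import Optional, Sequence

# Hypothetical encoding: genotypes as allele counts 0, 1, 2; None marks a no-call.
Genotype = Optional[int]


def call_rate(calls: Sequence[Genotype]) -> float:
    """Fraction of genotype attempts that produced a call."""
    if not calls:
        raise ValueError("no genotypes given")
    return sum(g is not None for g in calls) / len(calls)


def concordance_rate(array_calls: Sequence[Genotype],
                     seq_calls: Sequence[Genotype]) -> float:
    """Fraction of sites called on both platforms where the genotypes agree."""
    if len(array_calls) != len(seq_calls):
        raise ValueError("genotype vectors must have equal length")
    pairs = [(a, s) for a, s in zip(array_calls, seq_calls)
             if a is not None and s is not None]
    if not pairs:
        raise ValueError("no sites called on both platforms")
    return sum(a == s for a, s in pairs) / len(pairs)


array = [0, 1, 2, None, 1, 0, 2, 1]  # hypothetical array calls for one sample
wgs = [0, 1, 2, 1, 1, 0, 1, 1]       # hypothetical sequencing calls, same sites
print(call_rate(array))               # 0.875 (7 of 8 sites called)
print(concordance_rate(array, wgs))   # 6 of 7 jointly called sites agree
```

Excluding no-calls from the concordance denominator keeps the two metrics separate, so a sample with many failed calls is penalized in call rate rather than in concordance.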
Year: 2015 PMID: 26108142 PMCID: PMC4635170 DOI: 10.1038/jhg.2015.68
Source DB: PubMed Journal: J Hum Genet ISSN: 1434-5161 Impact factor: 3.172
Figure 1. Schematic illustration of tag SNP selection. (a) Flowchart of the tag SNP selection algorithm. Target SNPs were selected from the SNPs of the 1KJPN panel such that the MAF of each target SNP was ⩾0.5%. Tag SNPs were then progressively selected from the target SNPs according to the algorithm. (b) Schematic illustration of target SNPs and tag SNPs along a chromosomal region. R² is the LD measure, calculated as the squared correlation coefficient between the genotypes of a pair of SNPs. Note that the R² described here is distinct from the measure of imputation accuracy, r². MI is calculated between a pair of SNPs and reflects both the MAFs of the two SNPs and the strength of LD between them. LD, linkage disequilibrium; MAF, minor allele frequency; MI, mutual information; SNP, single nucleotide polymorphism.
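The Figure 1 caption names the two pairwise quantities used during tag selection (the LD measure R² and the mutual information MI) but does not spell out the selection rule. The Python sketch below computes R² as the squared Pearson correlation of 0/1/2 genotype vectors and MI (in bits) between the genotypes of two SNPs. For the selection step it substitutes a plain greedy R² ⩾ threshold tagger; this is a generic stand-in, not the authors' MI-based progressive algorithm, and the threshold of 0.8 and the SNP identifiers are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List, Sequence, Set


def ld_r2(x: Sequence[int], y: Sequence[int]) -> float:
    """LD measure R^2: squared Pearson correlation of two 0/1/2 genotype vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0  # a monomorphic SNP has no defined correlation; treat as no LD
    return cov * cov / (vx * vy)


def mutual_information(x: Sequence[int], y: Sequence[int]) -> float:
    """Mutual information (bits) between the genotypes of two SNPs."""
    n = len(x)
    px, py, pxy = Counter(x), Counter(y), Counter(zip(x, y))
    # p(a,b) / (p(a) p(b)) simplifies to c * n / (count(a) * count(b))
    return sum(
        (c / n) * math.log2(c * n / (px[a] * py[b]))
        for (a, b), c in pxy.items()
    )


def greedy_tags(genos: Dict[str, Sequence[int]], threshold: float = 0.8) -> List[str]:
    """Generic greedy tagger (NOT the paper's MI-based algorithm): repeatedly pick
    the SNP that tags the most still-untagged targets at R^2 >= threshold."""
    ids = list(genos)
    # Each SNP always tags itself, so every iteration removes at least one target.
    covers: Dict[str, Set[str]] = {
        s: {s} | {t for t in ids if ld_r2(genos[s], genos[t]) >= threshold}
        for s in ids
    }
    untagged: Set[str] = set(ids)
    tags: List[str] = []
    while untagged:
        best = max(ids, key=lambda s: len(covers[s] & untagged))
        tags.append(best)
        untagged -= covers[best]
    return tags


# Hypothetical genotypes (0/1/2 = minor-allele count) for four target SNPs.
genos = {
    "snp_a": [0, 1, 2, 1, 0, 2],
    "snp_b": [0, 1, 2, 1, 0, 2],  # identical to snp_a
    "snp_c": [2, 1, 0, 1, 2, 0],  # mirror image of snp_a: R^2 = 1
    "snp_d": [0, 0, 1, 0, 1, 0],  # uncorrelated with snp_a
}
print(greedy_tags(genos))  # ['snp_a', 'snp_d']
print(round(mutual_information(genos["snp_a"], genos["snp_b"]), 3))  # 1.585
```

Unlike R², MI depends on the allele frequencies of both SNPs (for identical SNPs it equals the genotype entropy), which is consistent with the caption's remark that MI reflects MAFs as well as LD strength. The all-versus-all comparison here is quadratic in the number of SNPs; a genome-scale tagger would presumably restrict comparisons to a sliding window.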
Category of SNPs on the Japonica array
| Category | No. of SNPs | Proportion of total |
|---|---|---|
| Tag SNPs (including X chromosome) | 638 269 | 96.8% |
| Pharmacogenomics markers | 2028 | 0.31% |
| Y chromosome | 275 | 0.04% |
| Mitochondria | 70 | 0.01% |
| NHGRI GWAS catalog | 10 798 | 1.64% |
| HLA | 3906 | 0.59% |
| Untaggable functional SNPs | 3990 | 0.61% |
| Total | 659 253 | — |
Abbreviations: GWAS, genome-wide association studies; SNP, single nucleotide polymorphism.
Some SNPs belong to more than one category.
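Each proportion in the category table is the category count divided by the 659,253 SNPs on the array, and because some SNPs belong to more than one category the counts need not sum to that total. A short Python check of this arithmetic, using only the counts from the table:

```python
# Counts from the "Category of SNPs on the Japonica array" table.
categories = {
    "Tag SNPs (including X chromosome)": 638_269,
    "Pharmacogenomics markers": 2_028,
    "Y chromosome": 275,
    "Mitochondria": 70,
    "NHGRI GWAS catalog": 10_798,
    "HLA": 3_906,
    "Untaggable functional SNPs": 3_990,
}
TOTAL = 659_253  # distinct SNPs on the array

for name, count in categories.items():
    print(f"{name}: {100 * count / TOTAL:.2f}%")

# Category counts double-count SNPs that fall into several categories,
# so their sum exceeds the number of distinct SNPs on the array.
excess = sum(categories.values()) - TOTAL
print(excess)  # 83
```

The counts sum to 659,336, exceeding the total by 83. Assuming every SNP on the array falls into at least one listed category, this implies at most 83 SNPs are shared between categories.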
Comparison of the Japonica array with the existing SNP arrays
| Japonica array | 659 253 | 657 152 (99.7%) | 72.4% |
| HumanOmni2.5S | 2 391 739 | 1 422 455 (59.5%) | 71.4% |
| HumanOmniExpressExome | 930 717 | 638 494 (68.6%) | 61.2% |
| Axiom Genome-wide ASI1 | 627 781 | 527 859 (88.9%) | 60.0% |
Abbreviation: SNP, single nucleotide polymorphism.
Figure 2. Improvement in imputation accuracy with the Japonica array. Comparison of the imputation accuracy of different SNP arrays using the 1KJPN panel (a and b) and of the imputation accuracy of the Japonica array using different reference panels (c and d). Imputation was performed for 131 individuals (ToMMo131, independent of the 1070 individuals in the 1KJPN panel) using the 1KJPN panel. The average r² values are plotted against MAF (a and c). The fraction of SNPs whose genotypes were imputed above a given r² threshold (x-axis), relative to the total number of SNPs in the reference panel (genomic coverage), is plotted (b and d), with solid and dashed lines for common and low-frequency SNPs, respectively. r² is the squared correlation coefficient between the imputed genotype and the genotype obtained by whole-genome sequencing. MAF, minor allele frequency; SNP, single nucleotide polymorphism.
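The Figure 2 caption defines three quantities: per-SNP imputation accuracy r², the average r² as a function of MAF, and genomic coverage (the fraction of reference-panel SNPs imputed above an r² threshold). The minimal Python sketch below computes all three under stated assumptions: imputed genotypes are represented as dosages (expected allele counts between 0 and 2), sequencing genotypes as 0/1/2, and the MAF bins follow the abstract's low-frequency (0.5%<MAF⩽5%) and common (MAF>5%) definitions. The function names and example values are hypothetical; this is not the authors' pipeline.

```python
from typing import List, Sequence


def imputation_r2(dosage: Sequence[float], truth: Sequence[int]) -> float:
    """r^2: squared Pearson correlation of imputed dosages with WGS genotypes."""
    n = len(truth)
    md, mt = sum(dosage) / n, sum(truth) / n
    cov = sum((d - md) * (t - mt) for d, t in zip(dosage, truth))
    vd = sum((d - md) ** 2 for d in dosage)
    vt = sum((t - mt) ** 2 for t in truth)
    return 0.0 if vd == 0 or vt == 0 else cov * cov / (vd * vt)


def mean_r2_by_maf(r2: Sequence[float], maf: Sequence[float],
                   edges: Sequence[float]) -> List[float]:
    """Average r^2 of the SNPs whose MAF falls in each bin (lo, hi]."""
    means = []
    for lo, hi in zip(edges, edges[1:]):
        vals = [r for r, m in zip(r2, maf) if lo < m <= hi]
        means.append(sum(vals) / len(vals) if vals else float("nan"))
    return means


def genomic_coverage(r2: Sequence[float], n_reference_snps: int,
                     threshold: float = 0.8) -> float:
    """Fraction of reference-panel SNPs imputed with r^2 > threshold.
    SNPs that could not be imputed count only in the denominator."""
    return sum(r > threshold for r in r2) / n_reference_snps


truth = [0, 1, 2, 1, 0, 2]               # hypothetical WGS genotypes, one SNP
dosage = [0.1, 0.9, 1.8, 1.2, 0.0, 2.0]  # hypothetical imputed dosages
print(round(imputation_r2(dosage, truth), 3))  # 0.978

# Per-SNP r^2 and MAF for four hypothetical imputed SNPs;
# bins are low-frequency (0.5%, 5%] and common (5%, 50%].
r2 = [0.9, 0.7, 0.95, 0.99]
maf = [0.01, 0.03, 0.2, 0.4]
print([round(v, 3) for v in mean_r2_by_maf(r2, maf, [0.005, 0.05, 0.5])])  # [0.8, 0.97]
print(genomic_coverage(r2, n_reference_snps=5))  # 0.6
```

Sweeping `threshold` from 0 to 1 and plotting `genomic_coverage` separately for common and low-frequency SNPs would give curves of the kind described for panels b and d.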
Number of imputed genotypes
| Japonica array | 1 214 767 | 2 077 383 | 4 944 610 |
| HumanOmni2.5S | 1 051 158 | 1 969 616 | 4 946 935 |
| HumanOmniExpressExome | 1 104 194 | 1 854 752 | 4 876 863 |
| Axiom Genome-wide ASI1 | 1 092 543 | 1 836 323 | 4 787 601 |
Abbreviation: SNP, single nucleotide polymorphism.
Values are the number of SNPs with r²>0.8.