Seong-Keun Yoo, Chang-Uk Kim, Hie Lim Kim, Sungjae Kim, Jong-Yeon Shin, Namcheol Kim, Joshua Sung Woo Yang, Kwok-Wai Lo, Belong Cho, Fumihiko Matsuda, Stephan C Schuster, Changhoon Kim, Jong-Il Kim, Jeong-Sun Seo.
Abstract
Here, we present the Northeast Asian Reference Database (NARD), which includes whole-genome sequencing data of 1779 individuals from Korea, Mongolia, Japan, China, and Hong Kong. NARD captures the genetic diversity of Korean (n = 850) and Mongolian (n = 384) ancestries that were not present in the 1000 Genomes Project Phase 3 (1KGP3). We combined and re-phased the genotypes from NARD and 1KGP3 to construct a union set of haplotypes. This approach established a robust imputation reference panel for Northeast Asians, which yields higher imputation accuracy for rare and low-frequency variants than the existing panels. The NARD imputation panel is available at https://nard.macrogen.com/.
Keywords: East Asians; Genotype imputation; Northeast Asians; Reference panel; Whole-genome sequencing
Year: 2019 PMID: 31640730 PMCID: PMC6805399 DOI: 10.1186/s13073-019-0677-z
Source DB: PubMed Journal: Genome Med ISSN: 1756-994X Impact factor: 11.117
Total number of variants in 1779 individuals by MAF and functional category
| Type | Frequency (a) | Number of variants | Coding: silent/nonframeshift | Coding: missense/frameshift | Coding: stoploss/stopgain | Coding: unknown | Non-coding: intronic | Non-coding: intergenic | Non-coding: splicing | Non-coding: UTR | Non-coding: ncRNA |
|---|---|---|---|---|---|---|---|---|---|---|---|
| SNP | Singleton | 17,811,366 | 86,804 | 146,480 | 3722 | 2690 | 6,842,300 | 9,370,754 | 2110 | 247,422 | 1,109,084 |
| SNP | Rare | 13,673,626 | 54,642 | 87,791 | 1658 | 1917 | 5,270,353 | 7,248,270 | 1363 | 164,492 | 843,140 |
| SNP | Low | 3,430,315 | 12,753 | 15,710 | 232 | 428 | 1,299,727 | 1,851,373 | 245 | 38,673 | 211,174 |
| SNP | Common | 5,727,339 | 17,886 | 15,981 | 151 | 729 | 2,049,372 | 3,228,994 | 159 | 53,221 | 360,846 |
| SNP | Total | 40,642,646 | 172,085 | 265,962 | 5763 | 5764 | 15,461,752 | 21,699,391 | 3877 | 503,808 | 2,524,244 |
| Indel | Singleton | 1,402,707 | 3191 | 5068 | 157 | 129 | 558,772 | 717,182 | 517 | 27,748 | 89,943 |
| Indel | Rare | 1,376,996 | 2733 | 2884 | 127 | 127 | 544,183 | 717,045 | 217 | 22,047 | 87,633 |
| Indel | Low | 452,337 | 634 | 827 | 37 | 37 | 173,946 | 241,506 | 61 | 6444 | 28,845 |
| Indel | Common | 569,436 | 422 | 369 | 18 | 89 | 207,132 | 317,135 | 145 | 7157 | 36,969 |
| Indel | Total | 3,801,476 | 6980 | 9148 | 339 | 382 | 1,484,033 | 1,992,868 | 940 | 63,396 | 243,390 |

(a) Rare, MAF < 0.5%; low, 0.5% ≤ MAF < 5%; common, MAF ≥ 5%
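The frequency classes in the footnote can be illustrated with a minimal sketch. The function name `classify_frequency` is hypothetical, and two points are assumptions not stated in the source: a singleton is taken to be a variant whose minor allele is observed exactly once, and that check takes precedence over the "rare" class.

```python
def classify_frequency(alt_count: int, total_alleles: int) -> str:
    """Assign a variant to a frequency class from its allele counts.

    Thresholds follow the table footnote (rare < 0.5%, low < 5%, common >= 5%).
    Assumption: "singleton" means the minor allele was seen exactly once.
    """
    if total_alleles <= 0 or not 0 < alt_count < total_alleles:
        raise ValueError("variant must be polymorphic in the sample")

    # The minor allele is whichever allele is less frequent.
    minor_count = min(alt_count, total_alleles - alt_count)
    if minor_count == 1:
        return "singleton"

    maf = minor_count / total_alleles
    if maf < 0.005:
        return "rare"
    if maf < 0.05:
        return "low"
    return "common"


# 1779 diploid individuals carry 2 * 1779 = 3558 alleles per autosomal site.
TOTAL_ALLELES = 2 * 1779
for count in (1, 10, 100, 500):
    print(count, classify_frequency(count, TOTAL_ALLELES))
```

With 3558 alleles, a single observation corresponds to a MAF of roughly 0.03%, so under this reading singletons are the lowest-frequency subset split out from the rare class.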
Fig. 1 Ancestry composition of 1779 individuals in the NARD. a PCA of global populations from the NARD and 1KGP3. AFR, AMR, EAS, EUR, and SAS denote Africans, Americans, East Asians, Europeans, and South Asians, respectively. b PCA of Northeast and Southeast Asians from the NARD and 1KGP3. Japanese in Tokyo from the 1KGP3 were combined into JPN. CHN from the NARD were categorized into CHB and CHS. c Population substructure of Northeast and Southeast Asians with five ancestral components inferred by the ADMIXTURE algorithm
Fig. 2 Imputation performance evaluation. a Imputation accuracy assessment using the five different reference panels. The pseudo-GWAS panel of 97 KOR was used for the imputation. The x-axis represents the MAF in 850 KOR individuals from the NARD. The y-axis represents the aggregated R² values of SNPs, which were calculated from the true genotypes and the imputed dosages. Only SNPs that were imputed across all panels were used for the aggregation of R² values. b Number of imputed SNPs as a function of the estimated imputation accuracy and the type of imputation panel. This result was generated based on the R² values that were estimated by Minimac3
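The accuracy measure in panel a can be sketched as a squared Pearson correlation between true genotypes (0, 1, or 2 copies of the alternate allele) and imputed dosages. The caption does not say how the values were aggregated, so the sketch below assumes one common reading: all genotype/dosage pairs of the SNPs in a MAF bin are pooled and a single R² is computed per bin. The function names and the toy data are hypothetical.

```python
from bisect import bisect_right
from collections import defaultdict


def squared_pearson(xs, ys):
    """Return the squared Pearson correlation of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    if sxx == 0 or syy == 0:
        return float("nan")  # correlation is undefined without variation
    return (sxy * sxy) / (sxx * syy)


def aggregated_r2_by_bin(snps, bin_edges):
    """Pool genotype/dosage pairs per MAF bin and compute one R² per bin.

    snps: iterable of (maf, true_genotypes, imputed_dosages).
    bin_edges: ascending MAF thresholds; a SNP with maf < bin_edges[0]
    lands in bin 0, and so on (assumed binning, not taken from the paper).
    """
    pooled = defaultdict(lambda: ([], []))
    for maf, truth, dosage in snps:
        if len(truth) != len(dosage):
            raise ValueError("genotypes and dosages must align per sample")
        idx = bisect_right(bin_edges, maf)
        pooled[idx][0].extend(truth)
        pooled[idx][1].extend(dosage)
    return {idx: squared_pearson(t, d) for idx, (t, d) in sorted(pooled.items())}


# Toy data: one rare SNP imputed imperfectly, one common SNP imputed exactly.
toy_snps = [
    (0.001, [0, 0, 1, 0], [0.1, 0.0, 0.9, 0.0]),
    (0.200, [0, 1, 2, 1], [0.0, 1.0, 2.0, 1.0]),
]
print(aggregated_r2_by_bin(toy_snps, bin_edges=[0.005, 0.05]))
```

Pooling weights each SNP by the variance of its genotypes; averaging per-SNP R² values instead is another plausible aggregation and can give different numbers.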
Fig. 3 Variant interpretation using the NARD. a MAF differences of SNPs shared between the NARD and gnomAD. The y-axis denotes the MAF of SNPs in worldwide populations (ALL) or EAS from the gnomAD. Color represents the MAF of SNPs in 1779 Northeast Asians from the NARD. b Number of uncommon (MAF < 5%) protein-altering variants (missense, nonsense, frameshift, and splicing variants) after filtration using the gnomAD with/without NARD. Variant catalogue from the gnomAD (exome) was applied. ***P < 0.0001 by two-tailed Mann-Whitney U test (compared with gnomAD-EAS + NARD)
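The filtration in panel b can be pictured with a small sketch: a candidate variant is kept as uncommon only if its MAF is below 5% in every frequency table consulted, so adding a second table can only shrink the retained set. The caption does not give the exact filter logic, so the function `filter_uncommon`, the treatment of variants absent from a table (assumed MAF of 0), and the toy frequencies are all illustrative assumptions.

```python
def filter_uncommon(variants, frequency_tables, threshold=0.05):
    """Keep variants whose MAF is below the threshold in every table.

    variants: iterable of variant identifiers.
    frequency_tables: list of dicts mapping identifier -> MAF.
    Assumption: a variant missing from a table is treated as MAF 0 there.
    """
    kept = []
    for variant in variants:
        if all(table.get(variant, 0.0) < threshold for table in frequency_tables):
            kept.append(variant)
    return kept


# Hypothetical frequencies: v2 is uncommon in the first table only.
candidates = ["v1", "v2", "v3"]
table_a = {"v1": 0.01, "v2": 0.02, "v3": 0.30}
table_b = {"v2": 0.12}

print(filter_uncommon(candidates, [table_a]))           # first table alone
print(filter_uncommon(candidates, [table_a, table_b]))  # both tables
```

In this toy case the second table removes `v2`, which mirrors the idea that a population-matched catalogue can flag variants that look uncommon in a broader one.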