Sungwon Jeon, Youngjune Bhak, Yeonsong Choi, Yeonsu Jeon, Seunghoon Kim, Jaeyoung Jang, Jinho Jang, Asta Blazyte, Changjae Kim, Yeonkyung Kim, Jungae Shim, Nayeong Kim, Yeo Jin Kim, Seung Gu Park, Jungeun Kim, Yun Sung Cho, Yeshin Park, Hak-Min Kim, Byoung-Chul Kim, Neung-Hwa Park, Eun-Seok Shin, Byung Chul Kim, Dan Bolser, Andrea Manica, Jeremy S Edwards, George Church, Semin Lee, Jong Bhak.
Abstract
We present the initial phase of the Korean Genome Project (Korea1K), including 1094 whole genomes (sequenced at an average depth of 31×), along with data on 79 quantitative clinical traits. We identified 39 million single-nucleotide variants and indels, of which half were singletons or doubletons, and detected Korean-specific patterns based on several types of genomic variation. A genome-wide association study illustrated the power of whole-genome sequences for analyzing clinical traits, identifying nine candidate alleles that were more significant than those previously reported from the same linkage disequilibrium blocks. In addition, Korea1K, used as a reference, showed better imputation accuracy for Koreans than the 1KGP panel. As proof of utility, germline variants in cancer samples could be filtered out more effectively when the Korea1K variome was used as a panel of normals than when non-Korean variome sets were used. Overall, this study shows that Korea1K can be a useful genotypic and phenotypic resource for clinical and ethnogenetic studies.
Year: 2020 PMID: 32766443 PMCID: PMC7385432 DOI: 10.1126/sciadv.aaz7835
Source DB: PubMed Journal: Sci Adv ISSN: 2375-2548 Impact factor: 14.136
Fig. 1. Variant statistics and discovery rate of novel variants.
(A) Number of variants in the Korea1K dataset in all autosomal regions categorized on the basis of allele frequencies (AFs). Singleton, allele count = 1; doubleton, allele count = 2; rare, allele count of >2 and allele frequency of ≤0.01; common, allele frequency of >0.01 and allele frequency of ≤0.05; and very common, allele frequency of >0.05. (B) The number of novel variants as a function of unrelated Korean genome samples.
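The frequency categories in panel (A) can be written as a small decision function. The following is a minimal sketch, not the authors' pipeline: the function name `classify_variant` and the `allele_number` argument (total called alleles at a site) are hypothetical, and the example value 2 × 1094 assumes every genome is called at the site.

```python
def classify_variant(allele_count: int, allele_number: int) -> str:
    """Assign one variant to a Fig. 1A frequency category.

    allele_count:  observed copies of the alternative allele (must be >= 1).
    allele_number: total called alleles at the site (hypothetical input,
                   e.g. 2 * number of diploid samples).
    """
    if allele_count < 1 or allele_number < allele_count:
        raise ValueError("need 1 <= allele_count <= allele_number")

    # Count-based categories take precedence over frequency-based ones.
    if allele_count == 1:
        return "singleton"
    if allele_count == 2:
        return "doubleton"

    allele_frequency = allele_count / allele_number
    if allele_frequency <= 0.01:
        return "rare"          # allele count > 2 and AF <= 0.01
    if allele_frequency <= 0.05:
        return "common"        # 0.01 < AF <= 0.05
    return "very common"       # AF > 0.05


# Assumed example: 1094 diploid genomes, all called at the site.
ALLELE_NUMBER = 2 * 1094
labels = [classify_variant(ac, ALLELE_NUMBER) for ac in (1, 2, 10, 50, 500)]
print(labels)
```

Checking the allele count before the frequency matters: with a large enough sample, a singleton or doubleton would otherwise fall into the "rare" bin.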
Fig. 2. Comparison with other populations.
Results of PCA of Korea1K and the 1KGP set of (A) worldwide populations and (B) East Asian samples. (C) The number of TE insertions with significantly different allele frequencies between the Korea1K set and each population. (D) The proportion of differential TE insertions. Colors indicate TE subtypes. Population abbreviations follow the 1KGP population codes (ACB, African Caribbean; ASW, African Ancestry in Southwest USA; BEB, Bengali; CDX, Dai Chinese; CEU, Utah residents with Northern and Western European ancestry; CHB, Han Chinese; CHS, Southern Han Chinese; CLM, Colombian; ESN, Esan; FIN, Finnish; GBR, British; GIH, Gujarati; GWD, Gambian Mandinka; IBS, Iberian; ITU, Telugu; JPT, Japanese; KHV, Kinh Vietnamese; LWK, Luhya; MSL, Mende; MXL, Mexican Ancestry; PEL, Peruvian; PJL, Punjabi; PUR, Puerto Rican; STU, Tamil; TSI, Toscani; and YRI, Yoruba).
Fig. 3. Manhattan plot of the reported loci via a GWAS.
Each color indicates a different clinical trait. The most significant reported markers in the loci are denoted with triangles. The dashed line indicates the threshold for genome-wide significance (7.5 × 10^-9). The dotted line indicates the threshold for study-wide significance (9.5 × 10^-11).
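The two thresholds are numerically consistent with a Bonferroni-style correction of the genome-wide threshold (7.5e-9) across the 79 traits mentioned in the abstract. The caption does not state this derivation, so the sketch below treats it as an assumption; the function name `significance_level` and the use of a strict `<` comparison are also illustrative choices.

```python
GENOME_WIDE = 7.5e-9   # genome-wide threshold given in the caption
N_TRAITS = 79          # number of quantitative clinical traits (abstract)

# Assumption: study-wide threshold = genome-wide threshold / number of traits.
STUDY_WIDE = GENOME_WIDE / N_TRAITS


def significance_level(p_value: float) -> str:
    """Label a P value against the two thresholds drawn in the plot."""
    if p_value < STUDY_WIDE:
        return "study-wide"
    if p_value < GENOME_WIDE:
        return "genome-wide"
    return "not significant"


print(f"study-wide threshold: {STUDY_WIDE:.1e}")
# Two P values taken from the table of index variants:
print(significance_level(1.83e-42))  # rs28362459
print(significance_level(3.45e-9))   # rs35289817
```

Under this reading, a variant such as rs35289817 (P = 3.45e-9) clears the dashed genome-wide line but not the dotted study-wide line.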
List of traits with index variants located in previously reported loci.
Highlighted rows indicate unreported variants with higher significance values, located in the same linkage disequilibrium block with reported variants.
| Carbohydrate | chr19 | 5,844,781 | rs28362459 | | 1.83 × 10^-42 | 0.341 |
| Total bilirubin | chr2 | 233,762,816 | rs28946889 | | 1.85 × 10^-23 | 0.439 |
| Lactate | chr12 | 7,437,350 | rs200382222 | | 1.40 × 10^-21 | 0.186 |
| Lipoprotein A | chr6 | 160,596,331 | rs73596816 | | 1.31 × 10^-19 | 0.038 |
| Uric acid | chr11 | 64,593,747 | rs121907892 | | 7.94 × 10^-15 | 0.013 |
| Direct bilirubin | chr2 | 233,762,816 | rs28946889 | | 6.43 × 10^-14 | 0.439 |
| Lipoprotein A | chr6 | 160,607,693 | rs41269888 | | 4.30 × 10^-13 | 0.454 |
| Amylase | chr1 | 103,348,267 | rs878863022 | N/A | 1.01 × 10^-12 | 0.476 |
| Carcinoembryonic | chr9 | 133,257,129 | rs2073823 | | 2.53 × 10^-11 | 0.228 |
| Total bilirubin | chr2 | 233,708,761 | rs7583278 | | 2.89 × 10^-11 | 0.100 |
| Neutral fat | chr11 | 116,792,991 | rs662799 | | 4.22 × 10^-10 | 0.315 |
| Lipoprotein A | chr6 | 160,703,093 | rs35289817 | | 3.45 × 10^-9 | 0.203 |
Fig. 4. Imputation performance evaluation.
The x axis indicates the alternative (Alt) allele frequency in the Korea1K set. The y axis represents the aggregated R² values of SNVs. Only SNVs present in the imputed results of all panels were used.
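The caption does not define how the R² values were aggregated. One common approach, sketched here as an assumption, pools the true genotypes and imputed dosages of all SNVs that fall in the same Alt allele frequency bin and computes a single squared Pearson correlation per bin. The function names, the input layout, and the bin edges are all hypothetical.

```python
from collections import defaultdict


def pearson_r2(x, y):
    """Squared Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return float("nan")  # undefined when either side has no variance
    return (sxy * sxy) / (sxx * syy)


def aggregated_r2_by_bin(variants, bin_edges):
    """Pool genotypes per Alt-AF bin and return one R² per non-empty bin.

    variants:  iterable of (alt_af, true_genotypes, imputed_dosages), where
               genotypes are 0/1/2 and dosages are floats in [0, 2].
    bin_edges: increasing edges; a variant belongs to (lo, hi] if lo < af <= hi.
    """
    pooled = defaultdict(lambda: ([], []))
    for alt_af, truth, dosage in variants:
        for lo, hi in zip(bin_edges, bin_edges[1:]):
            if lo < alt_af <= hi:
                pooled[(lo, hi)][0].extend(truth)
                pooled[(lo, hi)][1].extend(dosage)
                break
    return {b: pearson_r2(t, d) for b, (t, d) in pooled.items()}


# Toy data (invented for illustration): a perfectly imputed SNV and a noisy one.
toy = [
    (0.03, [0, 1, 2, 0], [0.0, 1.0, 2.0, 0.0]),
    (0.30, [0, 1, 2, 1], [0.2, 0.9, 1.5, 1.4]),
]
result = aggregated_r2_by_bin(toy, bin_edges=[0.0, 0.05, 0.5])
print(result)
```

Restricting the input to SNVs imputed by every panel, as the caption describes, keeps the per-bin comparison between panels on the same set of sites.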
Fig. 5. Performance of the variant classification using different panels of normals.
(A) Accuracy (ACC) of classification. (B) Matthews correlation coefficient (MCC) values. (C) Germline recovery rate. The x axis indicates the reference panel used and the allele frequency cutoff, joined by an underscore. EAS, SAS, AMR, EUR, and AFR indicate East Asian, South Asian, American, European, and African populations in 1KGP, respectively.
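ACC and MCC have standard definitions in terms of a 2 × 2 confusion matrix, which the sketch below implements. Two points are assumptions rather than statements from the source: the positive class is taken to be "germline variant flagged by the panel of normals", and the germline recovery rate is taken to be the fraction of true germline variants that were flagged (TP / (TP + FN)). The function name and the example counts are invented for illustration.

```python
import math


def classification_metrics(tp: int, tn: int, fp: int, fn: int) -> dict:
    """Compute ACC, MCC, and an assumed germline recovery rate.

    Positive class (assumption): a germline variant flagged by the panel.
    tp/fn: true germline variants flagged / missed.
    tn/fp: somatic variants kept / wrongly flagged as germline.
    """
    total = tp + tn + fp + fn
    acc = (tp + tn) / total

    # Standard MCC; conventionally set to 0 when the denominator is 0.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0

    # Assumed definition of "germline recovery rate": recall on germline calls.
    recovery = tp / (tp + fn) if (tp + fn) else float("nan")
    return {"ACC": acc, "MCC": mcc, "germline_recovery": recovery}


# Invented counts, for illustration only.
metrics = classification_metrics(tp=90, tn=80, fp=20, fn=10)
print(metrics)
```

MCC is often reported alongside ACC because it stays informative when the germline and somatic classes are very unequal in size, a situation in which accuracy alone can look high for a trivial classifier.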