Min Ou, Henry Chi-Ming Leung, Amy Wing-Sze Leung, Ho-Ming Luk, Bin Yan, Chi-Man Liu, Tony Ming-For Tong, Myth Tsz-Shun Mok, Wallace Ming-Yuen Ko, Wai-Chun Law, Tak-Wah Lam, Ivan Fai-Man Lo, Ruibang Luo.
Abstract
HKG is the first fully accessible variant database for Hong Kong Cantonese, constructed from 205 novel whole-exome sequencing datasets. There has long been a research gap in the understanding of the genetic architecture of southern Chinese subgroups, including Hong Kong Cantonese. HKG detected 196 325 high-quality variants, 5.93% of which were novel, and 25 472 variants were unique to HKG when compared with three Chinese populations sampled from 1000 Genomes (CHN). PCA illustrates the uniqueness of HKG relative to CHN, and the admixture study estimated the ancestral composition of HKG and CHN, showing a gradient from north to south that is consistent with their geographical distribution. ClinVar, CIViC and PharmGKB annotated 599 clinically significant variants and 360 putative loss-of-function variants, substantiating our understanding of population characteristics for future medical development. Among the novel variants, 96.57% were singletons and 6.85% were of high impact. Because HKG represents Hong Kong Cantonese well, we demonstrated better variant imputation when HKG data were added to the reference panel, thus filling the data gap in southern Chinese and facilitating the regional and global development of population genetics.
Year: 2022 PMID: 35156024 PMCID: PMC8826781 DOI: 10.1093/nargab/lqac005
Source DB: PubMed Journal: NAR Genom Bioinform ISSN: 2631-9268
Figure 1. Variant compositions of HKG samples. (A) Comparison of variants in HKG and gnomAD under different allele counts. (B) Number of variants as a function of the number of individuals. (C) Percentage of novel variants and known variants according to MAF categories. Singleton, AC = 1; doubleton, AC = 2; rare, AC > 2 and MAF ≤ 0.01; common, 0.01 < MAF ≤ 0.05; and very common, MAF > 0.05. (D) Percentage of singletons among different variant types and impacts.
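The MAF categories defined in the Figure 1 caption can be written as a small binning function. This is a minimal sketch and not the authors' code: the function name `maf_category` is hypothetical, AC is assumed to be the alternate allele count, and the example assumes every genotype is called at a diploid site, so that 205 samples give 410 alleles.

```python
def maf_category(ac: int, an: int) -> str:
    """Bin a variant using the categories from the Figure 1 caption.

    ac: allele count of the variant allele (assumed: alternate allele).
    an: total number of called alleles at the site.
    """
    if an <= 0 or not 1 <= ac <= an:
        raise ValueError("expected 1 <= ac <= an")
    # Singletons and doubletons are defined by allele count alone.
    if ac == 1:
        return "singleton"
    if ac == 2:
        return "doubleton"
    af = ac / an
    maf = min(af, 1.0 - af)  # minor allele frequency
    if maf <= 0.01:
        return "rare"          # AC > 2 and MAF <= 0.01
    if maf <= 0.05:
        return "common"        # 0.01 < MAF <= 0.05
    return "very common"       # MAF > 0.05


if __name__ == "__main__":
    # Assumption: 205 diploid samples, no missing genotypes.
    AN = 2 * 205
    for ac in (1, 2, 4, 5, 21):
        print(ac, maf_category(ac, AN))
```

Under that assumption only allele counts of 3 and 4 fall in the rare bin, since 5/410 already exceeds 0.01.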
The count of variants in HKG by function impact and MAF
| Variant consequence annotation | Function impact | Total | SNPs/Insertions/Deletions | Singletons | Doubletons | AC > 2 and MAF ≤ 0.01 | 0.01 < MAF ≤ 0.02 | 0.02 < MAF ≤ 0.05 | 0.05 < MAF ≤ 0.1 | MAF > 0.1 |
|---|---|---|---|---|---|---|---|---|---|---|
| Frameshift variant | HIGH | 1994 | 0/717/1277 | 1213 | 233 | 126 | 101 | 90 | 64 | 167 |
| Stop gained | HIGH | 1374 | 1344/26/4 | 976 | 112 | 85 | 51 | 34 | 27 | 89 |
| Splice donor variant | HIGH | 364 | 293/8/63 | 254 | 35 | 28 | 16 | 10 | 10 | 11 |
| Splice acceptor variant | HIGH | 247 | 213/9/25 | 168 | 24 | 15 | 9 | 9 | 5 | 17 |
| Start lost | HIGH | 160 | 148/0/12 | 112 | 7 | 9 | 5 | 5 | 4 | 18 |
| Stop lost | HIGH | 69 | 66/0/3 | 36 | 6 | 3 | 4 | 3 | 0 | 17 |
| Missense variant | MODERATE | 83362 | 83362/0/0 | 47036 | 7956 | 5367 | 3751 | 3807 | 2725 | 12720 |
| Inframe deletion | MODERATE | 2092 | 0/0/2092 | 899 | 289 | 225 | 172 | 172 | 97 | 238 |
| Inframe insertion | MODERATE | 1006 | 0/1006/0 | 449 | 109 | 80 | 67 | 74 | 41 | 186 |
| Protein altering variant | MODERATE | 10 | 0/10/0 | 7 | 3 | 0 | 0 | 0 | 0 | 0 |
| Synonymous variant | LOW | 61496 | 61496/0/0 | 27876 | 5346 | 4038 | 3142 | 3518 | 2825 | 14751 |
| Splice region variant | LOW | 2348 | 2011/148/189 | 1025 | 201 | 154 | 141 | 160 | 119 | 548 |
| Stop retained variant | LOW | 52 | 46/0/6 | 28 | 0 | 4 | 6 | 3 | 1 | 10 |
| Incomplete terminal codon variant | LOW | 2 | 2/0/0 | 0 | 1 | 0 | 0 | 0 | 0 | 1 |
| Intron variant | MODIFIER | 17295 | 15246/931/1118 | 7486 | 1616 | 1130 | 974 | 1091 | 934 | 4064 |
| Downstream gene variant | MODIFIER | 8359 | 7719/249/391 | 4194 | 769 | 540 | 412 | 498 | 370 | 1576 |
| Upstream gene variant | MODIFIER | 5749 | 5275/171/303 | 2792 | 551 | 385 | 313 | 255 | 249 | 1204 |
| Non-coding transcript exon variant | MODIFIER | 5534 | 5174/149/211 | 2176 | 499 | 343 | 334 | 371 | 330 | 1481 |
| 3 prime UTR variant | MODIFIER | 2739 | 2445/100/194 | 1243 | 262 | 193 | 132 | 157 | 134 | 618 |
| 5 prime UTR variant | MODIFIER | 1475 | 1283/101/91 | 698 | 118 | 113 | 79 | 68 | 62 | 337 |
| Intergenic variant | MODIFIER | 433 | 223/74/136 | 108 | 33 | 37 | 29 | 53 | 42 | 131 |
| Regulatory region variant | MODIFIER | 85 | 59/6/20 | 23 | 5 | 4 | 9 | 10 | 2 | 32 |
| Mature miRNA Variant | MODIFIER | 60 | 52/1/7 | 30 | 7 | 1 | 4 | 2 | 1 | 15 |
| TF binding site variant | MODIFIER | 13 | 5/3/5 | 2 | 1 | 0 | 3 | 1 | 1 | 5 |
| Coding sequence variant | MODIFIER | 7 | 4/0/3 | 1 | 1 | 0 | 0 | 0 | 1 | 4 |
| Total | | 196325 | 186466/3709/6150 | 98832 | 18184 | 12880 | 9754 | 10391 | 8044 | 38240 |
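The table can be summarized by impact class, for example to see how the share of singletons varies between classes. The sketch below only re-aggregates the Total and Singletons columns copied from the table. It is an illustration, not part of the authors' pipeline, and the helper name `singleton_share_by_impact` is hypothetical.

```python
from collections import defaultdict

# (consequence, impact, total, singletons), copied from the table above.
ROWS = [
    ("Frameshift variant", "HIGH", 1994, 1213),
    ("Stop gained", "HIGH", 1374, 976),
    ("Splice donor variant", "HIGH", 364, 254),
    ("Splice acceptor variant", "HIGH", 247, 168),
    ("Start lost", "HIGH", 160, 112),
    ("Stop lost", "HIGH", 69, 36),
    ("Missense variant", "MODERATE", 83362, 47036),
    ("Inframe deletion", "MODERATE", 2092, 899),
    ("Inframe insertion", "MODERATE", 1006, 449),
    ("Protein altering variant", "MODERATE", 10, 7),
    ("Synonymous variant", "LOW", 61496, 27876),
    ("Splice region variant", "LOW", 2348, 1025),
    ("Stop retained variant", "LOW", 52, 28),
    ("Incomplete terminal codon variant", "LOW", 2, 0),
    ("Intron variant", "MODIFIER", 17295, 7486),
    ("Downstream gene variant", "MODIFIER", 8359, 4194),
    ("Upstream gene variant", "MODIFIER", 5749, 2792),
    ("Non-coding transcript exon variant", "MODIFIER", 5534, 2176),
    ("3 prime UTR variant", "MODIFIER", 2739, 1243),
    ("5 prime UTR variant", "MODIFIER", 1475, 698),
    ("Intergenic variant", "MODIFIER", 433, 108),
    ("Regulatory region variant", "MODIFIER", 85, 23),
    ("Mature miRNA variant", "MODIFIER", 60, 30),
    ("TF binding site variant", "MODIFIER", 13, 2),
    ("Coding sequence variant", "MODIFIER", 7, 1),
]


def singleton_share_by_impact(rows):
    """Return {impact: (total, singletons, singleton_fraction)}."""
    totals = defaultdict(int)
    singles = defaultdict(int)
    for _name, impact, total, singletons in rows:
        totals[impact] += total
        singles[impact] += singletons
    return {k: (totals[k], singles[k], singles[k] / totals[k]) for k in totals}


if __name__ == "__main__":
    for impact, (total, single, share) in singleton_share_by_impact(ROWS).items():
        print(f"{impact:9s} total={total:6d} singletons={single:6d} share={share:.1%}")
```

The per-class totals add up to the 196325 variants and 98832 singletons in the table's total row, which is a quick consistency check on the transcription.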
Figure 2. Comparison of variants among HKG and other populations. (A) Venn diagram of variants in HKG and three Chinese populations of 1KGP (CHS, CHB and CDX). CHB: Han Chinese in Beijing, China; CHS: southern Han Chinese; CDX: Chinese Dai in Xishuangbanna, China. (B) PCA of HKG and CHS, CHB and CDX. (C) ADMIXTURE analysis of HKG samples with East Asian and South Asian samples in 1KGP (K ranges from 2 to 7). Number of ancestries K = 5 best fits the model. Different colors represent different ancestry components. JPT: Japanese in Tokyo, Japan; KHV: Kinh in Ho Chi Minh City, Vietnam; GIH: Gujarati Indian from Houston, Texas; PJL: Punjabi from Lahore, Pakistan; BEB: Bengali from Bangladesh; STU: Sri Lankan Tamil from the UK; ITU: Indian Telugu from the UK.
Figure 3. Analyses of the novel HKG variants. (A) Proportion of known and novel variants according to consequences. (B) Pathogenicity scores of novel variants and of all variants in HKG. (C) Venn diagram of novel variants, rare LoF variants and high-impact variants. (D) Significantly enriched GO terms and KEGG pathways in the 731 genes carrying the novel high-impact variants of HKG.
Figure 4. Validation of HKG variants by imputation and correlation analysis. (A) Imputation testing using two reference panels: 1KGP and 1KGP + HKG. The average Info scores ± standard deviation error were based on 22 chromosomes. ** indicates a significant difference at P < 0.01 (Student's t-test). (B) Correlation analysis using AFs of variants in HKG and NARD_HK. (C) Correlation analysis using AFs of variants in HKG and the actionable pharmacogenetic variants reported by Yu et al.