Sangmoon Lee, Jihae Seo, Jinman Park, Jae-Yong Nam, Ahyoung Choi, Jason S Ignatius, Robert D Bjornson, Jong-Hee Chae, In-Jin Jang, Sanghyuk Lee, Woong-Yang Park, Daehyun Baek, Murim Choi.
Abstract
Despite efforts to interrogate human genome variation through large-scale databases, systematic preference toward populations of Caucasian descent has resulted in an unintended reduction of power in studying non-Caucasians. Here we report a compilation of coding variants from 1,055 healthy Korean individuals (KOVA; Korean Variant Archive). The samples were sequenced to a mean depth of 75x, yielding 101 singleton variants per individual. Population genetics analysis demonstrates that the Korean population is a distinct ethnic group comparable to other discrete ethnic groups in Africa and Europe, providing a rationale for such independent genomic datasets. Indeed, KOVA conferred 22.8% increased variant filtering power in addition to the Exome Aggregation Consortium (ExAC) when used on Korean exomes. Functional assessment of nonsynonymous variants supported the presence of purifying selection in Koreans. Analysis of copy number variants detected 5.2 deletions and 10.3 amplifications per individual, with an increased fraction of novel variants among smaller and rarer copy number variable segments. We also report a list of germline variants that are associated with increased tumor susceptibility. This catalog can function as a critical addition to the pre-existing variant databases in pursuing genetic studies of Korean individuals.
Year: 2017 PMID: 28655895 PMCID: PMC5487339 DOI: 10.1038/s41598-017-04642-4
Source DB: PubMed Journal: Sci Rep ISSN: 2045-2322 Impact factor: 4.379
Figure 1. Population profile of KOVA. (a) Distribution of variant minor allele frequencies (MAFs). (b) Variant increment patterns as the number of participants increases. (c) Venn diagram of coding variant comparisons among KOVA, Japanese population, and UK10K[6, 8]. Numbers and proportions of novel variants (i.e. not in dbSNP build 142) in each area are shown in parentheses. (d) Principal component analysis of KOVA and East Asian populations from the 1000 Genomes Project (left panel) and corresponding geographical locations (right panel). The map image was modified from Openclipart with permission. (e and f) Gene-level F_ST between KOVA and (e) East Asian, European, and African populations, and (f) Chinese, Japanese, and Southeast Asian populations. Each dot indicates a gene (see Methods) and percentage values beneath population names denote the proportion of dots that fell in each sector. (g) Network plot depicting pairwise fixation index (F_ST) of multiple population groups including KOVA, which is represented as a red node. A thicker line indicates a smaller F_ST and thus a closer relationship. Positions of the nodes are arbitrarily arranged to roughly reflect geographical location. Each subpopulation of (h) EAS including KOVA, (i) EUR, and (j) AFR was drawn separately. 1000GP: 1000 Genomes Project; AFR: African excluding Americans of African Ancestry in southwestern USA and African Caribbeans in Barbados; CDX: Chinese Dai in Xishuangbanna, China; CEU: Utah Residents (CEPH) with Northern and Western Ancestry; CHB: Han Chinese in Beijing, China; CHS: Southern Han Chinese; EAS: East Asian; ESN: Esan in Nigeria; EUR: European; FIN: Finnish in Finland; GBR: British in England and Scotland; GWD: Gambian in Western Divisions in the Gambia; IBS: Iberian Population in Spain; JPT: Japanese in Tokyo; KHV: Kinh in Ho Chi Minh City, Vietnam; LWK: Luhya in Webuye, Kenya; MSL: Mende in Sierra Leone; SEAsia: CDX and KHV; TSI: Toscani in Italia; YRI: Yoruba in Ibadan, Nigeria.
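The fixation index shown in panels (e) to (j) compares allele frequencies between populations. The estimator used by the authors is not stated in this record, so the following is only a minimal sketch of one textbook form, Wright's heterozygosity-based index for two equally weighted populations, pooled over the variant sites of a gene as a ratio of sums. The function name `fst_two_pops` and the example frequencies are hypothetical and are not taken from KOVA.

```python
def fst_two_pops(freqs_a, freqs_b):
    """Heterozygosity-based fixation index for two populations.

    freqs_a, freqs_b: per-site alternate allele frequencies (0..1) for the
    same biallelic sites, e.g. all coding variants within one gene.
    Assumption: both populations are weighted equally and sites are pooled
    as a ratio of sums; this may differ from the estimator in the paper.
    """
    numerator = 0.0
    denominator = 0.0
    for p1, p2 in zip(freqs_a, freqs_b):
        p_bar = (p1 + p2) / 2                      # pooled allele frequency
        h_total = 2 * p_bar * (1 - p_bar)          # expected heterozygosity, pooled
        h_within = (2 * p1 * (1 - p1) + 2 * p2 * (1 - p2)) / 2  # mean within-population
        numerator += h_total - h_within
        denominator += h_total
    # Sites monomorphic in both populations carry no information.
    return numerator / denominator if denominator > 0 else 0.0


# Hypothetical frequencies for three sites in one gene.
example = fst_two_pops([0.20, 0.05, 0.50], [0.40, 0.05, 0.45])
print(f"gene-level index: {example:.4f}")
```

Identical frequencies give 0 and a fixed difference (0 versus 1) gives 1, which matches the caption's reading that a smaller value indicates a closer relationship.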
Summary of Exonic Variants in KOVA.
| Types | Total | Known (dbSNP147) | Novel |
|---|---|---|---|
| Nonsynonymous SNV | 33,868 | 28,310 | 5,558 |
| Synonymous SNV | 27,481 | 24,821 | 2,660 |
| Frameshift deletion | 734 | 409 | 325 |
| Frameshift insertion | 298 | 175 | 123 |
| Inframe deletion | 556 | 435 | 121 |
| Inframe insertion | 122 | 96 | 26 |
| Stop gain | 552 | 369 | 183 |
| Stop loss | 44 | 33 | 11 |
| Unknown | 773 | 655 | 118 |
| Total Coding | 64,428 | 55,303 | 9,125 |
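The table can be checked and summarized with a few lines of arithmetic. The sketch below copies the known and novel counts from the table, recomputes the column totals, and derives the novel fraction per category and the overall nonsynonymous-to-synonymous (NS/S) ratio. The variable and function names are illustrative only.

```python
# Counts copied from the table: category -> (known in dbSNP, novel).
counts = {
    "Nonsynonymous SNV": (28310, 5558),
    "Synonymous SNV": (24821, 2660),
    "Frameshift deletion": (409, 325),
    "Frameshift insertion": (175, 123),
    "Inframe deletion": (435, 121),
    "Inframe insertion": (96, 26),
    "Stop gain": (369, 183),
    "Stop loss": (33, 11),
    "Unknown": (655, 118),
}


def novel_fraction(known, novel):
    """Share of variants in a category that are absent from dbSNP."""
    return novel / (known + novel)


# Column totals, to compare against the "Total Coding" row.
total_known = sum(known for known, _ in counts.values())
total_novel = sum(novel for _, novel in counts.values())

fractions = {name: novel_fraction(k, n) for name, (k, n) in counts.items()}

# Overall NS/S ratio across all allele frequencies.
ns_s_ratio = sum(counts["Nonsynonymous SNV"]) / sum(counts["Synonymous SNV"])

# List categories from the highest to the lowest novel fraction.
for name, frac in sorted(fractions.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:22s} {frac:6.1%}")
print(f"Total coding: {total_known + total_novel:,} variants, "
      f"{total_novel / (total_known + total_novel):.1%} novel")
print(f"NS/S ratio: {ns_s_ratio:.2f}")
```

The recomputed totals agree with the "Total Coding" row (55,303 known and 9,125 novel), which gives an overall novel fraction of about 14.2% and an NS/S ratio of about 1.23.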
Figure 2. Functional analysis of KOVA coding variants. (a) Numbers of novel and known variants categorized by function. The overlaid plot shows the size distribution of indels, with the blue bar indicating multiples of three bases. (b) Nonsynonymous to synonymous SNV (NS/S) ratio by variant allele frequencies. (c) SIFT score and (d) Scaled C-score (CADD) by allele frequencies. (e) Degree of amino acid conservation of variant residues by allele frequencies, measured as the fraction of species whose orthologous protein carries a different amino acid from the human protein. (f) Relative position of loss-of-function (LoF) variants on the protein. Solid, dotted, and dash-dot lines in c-e indicate median, upper, and lower quantiles, respectively.
Figure 3. Copy number variations in KOVA. (a) Distribution of KOVA CNV sizes. (b) Frequency of CNVs by number of events in KOVA. (c) Highly polymorphic copy number genes in KOVA. Genes are sorted by frequency. (d) Copy number genotype profiles of SIGLEC14 and SIGLEC5. (e) Frequency of SIGLEC14 deletion allele in worldwide populations from DGV. AFR: African; AMR: Mexican, native American, North American, and South American; ASN: Asian; EUR: European.
Figure 4. Cancer susceptibility variant distributions in KOVA. Potentially deleterious SNV MAFs extracted from (a) lung adenocarcinoma and (b) stomach adenocarcinoma tumor-paired normal sets or other public databases were plotted. LUAD: lung adenocarcinoma; STAD: stomach adenocarcinoma.