Wenqian Zhang, Joe Meehan, Zhenqiang Su, Hui Wen Ng, Mao Shu, Heng Luo, Weigong Ge, Roger Perkins, Weida Tong, Huixiao Hong.
Abstract
BACKGROUND: Due to a significant decline in the costs of next-generation sequencing, it has become possible to decipher the genetic architecture of a population by sequencing a large number of individuals to deep coverage. The Korean Personal Genomes Project (KPGP) recently sequenced 35 Korean genomes at high coverage on the Illumina HiSeq platform and made the deep sequencing data publicly available, giving the scientific community an opportunity to decipher the genetic architecture of the Korean population.
Year: 2014 PMID: 25350283 PMCID: PMC4251052 DOI: 10.1186/1471-2105-15-S11-S6
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Figure 1. Study design and workflow. Whole-genome sequencing data from 35 individuals were mapped to the human reference genome, and SNVs were called with two pipelines. The SNVs called by both pipelines were taken to represent the Korean population and were compared with two references, the 1000 Genomes Project (1KGP) and HapMap, to identify Korean only SNVs and SNVs shared with other populations. The shared SNVs were then annotated. The Korean only SNVs were divided into two subgroups (SNV-1 and SNV-35) according to their occurrence in the Korean population, and both subgroups were annotated. Non-synonymous SNVs were identified, and the corresponding genes were used in enrichment and association analyses.
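The workflow in Figure 1 is essentially a series of set operations on variant calls. The sketch below illustrates that logic under stated assumptions: each SNV is keyed by a hypothetical (chromosome, position, ref, alt) tuple, the call sets are toy data, and the function names are illustrative rather than the authors' code. Real call sets would first need to be normalized (same reference build, same allele representation) for such comparisons to be meaningful.

```python
from typing import Dict, Set, Tuple

# Hypothetical variant key: (chromosome, position, reference allele, alternative allele).
Variant = Tuple[str, int, str, str]


def korean_snvs(pipeline_a: Set[Variant], pipeline_b: Set[Variant]) -> Set[Variant]:
    """Keep only the SNVs called by both pipelines (the overlapping calls)."""
    return pipeline_a & pipeline_b


def split_by_reference(
    korean: Set[Variant], kgp: Set[Variant], hapmap: Set[Variant]
) -> Dict[str, Set[Variant]]:
    """Partition Korean SNVs by whether they appear in 1KGP and/or HapMap."""
    known = kgp | hapmap
    return {
        "korean_only": korean - known,
        "shared_1kgp_or_hapmap": korean & known,
        "shared_1kgp": korean & kgp,
        "shared_1kgp_and_hapmap": korean & kgp & hapmap,
    }


# Toy call sets standing in for the two pipelines and the two references.
pipeline_a = {("1", 100, "A", "G"), ("1", 200, "C", "T"), ("2", 50, "G", "A")}
pipeline_b = {("1", 100, "A", "G"), ("2", 50, "G", "A"), ("3", 10, "T", "C")}
kgp = {("1", 100, "A", "G")}
hapmap = {("1", 100, "A", "G"), ("1", 200, "C", "T")}

korean = korean_snvs(pipeline_a, pipeline_b)
groups = split_by_reference(korean, kgp, hapmap)
```

Here ("1", 200, "C", "T") is dropped because only one pipeline called it, and ("2", 50, "G", "A") ends up Korean only because neither reference contains it.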
Figure 2. Frequency distribution of Korean SNVs and Korean only SNVs. The distribution of SNVs by their occurrence (the number of the 35 individuals in which an SNV was observed) was calculated and plotted for Korean SNVs (a) and Korean only SNVs (b). The x-axis indicates the occurrence and the y-axis gives the corresponding percentage of SNVs.
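A distribution like the one in Figure 2 can be computed by first counting, for each SNV, how many samples carry it, and then converting the number of SNVs at each occurrence value into a percentage. This is a minimal sketch assuming a hypothetical mapping from sample ID to the set of SNV IDs observed in that sample; it does not distinguish heterozygous from homozygous carriers.

```python
from collections import Counter
from typing import Dict, Set


def occurrence_counts(carriers: Dict[str, Set[str]]) -> Counter:
    """Count how many samples carry each SNV.

    `carriers` maps a (hypothetical) sample ID to the set of SNV IDs found in it.
    """
    counts: Counter = Counter()
    for snvs in carriers.values():
        counts.update(snvs)  # a set, so each SNV is counted at most once per sample
    return counts


def frequency_distribution(counts: Counter) -> Dict[int, float]:
    """Percentage of SNVs observed in exactly k samples, keyed by k."""
    snvs_per_occurrence = Counter(counts.values())
    total = sum(snvs_per_occurrence.values())
    return {k: 100.0 * n / total for k, n in sorted(snvs_per_occurrence.items())}


carriers = {
    "S1": {"snvA", "snvB", "snvC"},
    "S2": {"snvA", "snvB"},
    "S3": {"snvA"},
    "S4": {"snvA", "snvD"},
}
distribution = frequency_distribution(occurrence_counts(carriers))
# x-axis: occurrence k; y-axis: distribution[k] (% of SNVs)
```

With 35 samples, the same counts also give the subgroups used later: SNVs with occurrence at least 1 (SNV-1) and SNVs with occurrence exactly 35 (SNV-35).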
Figure 3. Mutation spectrum of Korean SNVs. Base substitution types are shown on the y-axis and the number of SNVs in each category on the x-axis for (a) Korean SNVs, (b) Korean only SNVs (SNV-1) and (c) Korean only SNVs (SNV-35). Percentages in parentheses give each category's proportion of all SNVs in the group. SNVs with multiple alternative alleles were assigned an uncertain mutation type.
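The spectrum in Figure 3 amounts to tallying REF>ALT substitution types and setting multi-allelic sites aside as uncertain. The sketch below follows that rule on toy records. As an assumption, it keeps each substitution exactly as written (for example, C>T and G>A are counted separately); the caption does not say whether strand-complementary types were merged.

```python
from collections import Counter
from typing import Dict, List, Tuple


def substitution_type(ref: str, alts: List[str]) -> str:
    """Return 'REF>ALT' for a biallelic SNV, or 'uncertain' if there are several ALT alleles."""
    if len(alts) != 1:
        return "uncertain"
    return f"{ref}>{alts[0]}"


def mutation_spectrum(snvs: List[Tuple[str, List[str]]]) -> Dict[str, Tuple[int, float]]:
    """Map each substitution type to (count, percentage of all SNVs in the group)."""
    counts = Counter(substitution_type(ref, alts) for ref, alts in snvs)
    total = sum(counts.values())
    return {kind: (n, 100.0 * n / total) for kind, n in counts.most_common()}


# Toy (REF, [ALT, ...]) records; the last one is multi-allelic.
records = [("C", ["T"]), ("G", ["A"]), ("C", ["T"]), ("A", ["G", "T"])]
spectrum = mutation_spectrum(records)
```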
Figure 4. Annotation of Korean SNVs. SNVs were annotated with VEP [36] for each group: SNV-1 (red bars), SNV-35 (purple bars), SNVs shared between Korean SNVs and 1KGP or HapMap (cyan bars), SNVs shared between Korean SNVs and 1KGP (dark purple bars), and SNVs shared among Korean SNVs, 1KGP and HapMap (dark yellow bars). The x-axis gives the number of SNVs on a log2 scale and the y-axis gives the annotation terms.
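Bars like those in Figure 4 come from counting SNVs per annotation term and log2-transforming the counts. A minimal sketch, assuming one comma-separated consequence string per SNV (for example, the union of VEP's Sequence Ontology terms across transcripts); the input strings are hypothetical. Because an SNV can carry several terms, it is counted once under each.

```python
import math
from collections import Counter
from typing import Dict, List


def tally_consequences(consequence_fields: List[str]) -> Counter:
    """Count SNVs per consequence term.

    Each string holds the comma-separated terms for one SNV, so an SNV with
    several consequences is counted once under each distinct term.
    """
    tally: Counter = Counter()
    for field in consequence_fields:
        tally.update(set(field.split(",")))
    return tally


def log2_counts(tally: Counter) -> Dict[str, float]:
    """log2-transform the per-term counts for plotting."""
    return {term: math.log2(n) for term, n in tally.items() if n > 0}


fields = [
    "missense_variant,splice_region_variant",
    "intron_variant",
    "intron_variant,non_coding_transcript_variant",
    "missense_variant",
]
tally = tally_consequences(fields)
bars = log2_counts(tally)
```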
Annotations of Korean only SNVs and shared SNVs
| Category | Consequence term | SNV-1¹ | SNV-35² | Shared with 1KGP & HapMap | Shared with 1KGP | Shared with 1KGP or HapMap |
|---|---|---|---|---|---|---|
| CDS³ and splicing regions | Synonymous Variant | 3,587 | 29 | 13,978 | 25,988 | 26,030 |
| | Missense Variant | 6,620 | 40 | 14,147 | 26,835 | 26,909 |
| | Stop Gained Variant | 173 | 1 | 129 | 343 | 348 |
| | Stop Lost Variant | 20 | 0 | 70 | 105 | 106 |
| | Stop Retained Variant | 5 | 0 | 22 | 45 | 45 |
| | Initiator Codon Variant | 18 | 0 | 62 | 104 | 104 |
| | Incomplete Terminal Codon Variant | 4 | 0 | 8 | 16 | 16 |
| | Coding Sequence Variant | 5 | 0 | 15 | 29 | 29 |
| | Splice Donor Variant | 162 | 0 | 226 | 557 | 562 |
| | Splice Acceptor Variant | 95 | 0 | 165 | 357 | 359 |
| | Splice Region Variant | 1,496 | 18 | 3,983 | 8,298 | 8,312 |
| Regulatory regions and regions adjacent to the CDS | 5' UTR⁴ Variant | 3,358 | 22 | 8,462 | 19,526 | 19,568 |
| | 3' UTR Variant | 13,638 | 99 | 37,145 | 76,397 | 76,547 |
| | Regulatory Region Variant | 117,783 | 1,186 | 267,822 | 649,464 | 650,799 |
| | TF Binding Site Variant | 1,645 | 11 | 2,511 | 6,913 | 6,927 |
| | Upstream Gene Variant | 185,499 | 2,010 | 343,491 | 989,954 | 991,984 |
| | Downstream Gene Variant | 192,188 | 1,999 | 370,173 | 1,028,139 | 1,030,243 |
| Non-coding regions | NMD⁵ Transcript Variant | 127,138 | 634 | 287,027 | 779,516 | 780,572 |
| | Mature miRNA Variant | 48 | 1 | 20 | 146 | 146 |
| | Noncoding Exon Variant | 28,539 | 292 | 62,513 | 143,214 | 143,617 |
| | Noncoding Transcript Variant | 376,646 | 4,590 | 815,372 | 2,191,354 | 2,195,528 |
| | Intron Variant | 602,464 | 5,272 | 1,379,647 | 3,703,303 | 3,708,842 |
| | Intergenic Variant | 462,916 | 5,264 | 1,030,973 | 2,876,064 | 2,879,598 |
| Total SNVs | | 1,213,613 | 12,640 | 2,677,812 | 7,331,322 | 7,342,113 |
Values are numbers of SNVs.
¹SNV-1: SNVs occurring in at least 1 of the 35 samples.
²SNV-35: SNVs occurring in all 35 samples.
³CDS: coding DNA sequence.
⁴UTR: untranslated region.
⁵NMD: nonsense-mediated decay.
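The annotation table reports counts. To express a row as a share of its group, divide it by that group's Total SNVs; the sketch below does this for a few SNV-1 rows copied from the table (the dictionary and function names are illustrative). Because VEP can assign more than one consequence term to the same SNV, the rows are not mutually exclusive: the SNV-1 intron, intergenic and noncoding-transcript rows alone already exceed the SNV-1 total, so these percentages are not expected to sum to 100.

```python
from typing import Dict

# Counts copied from the SNV-1 column of the annotation table.
snv1_total = 1_213_613
snv1_counts = {
    "Missense Variant": 6_620,
    "Synonymous Variant": 3_587,
    "Stop Gained Variant": 173,
    "Intron Variant": 602_464,
}


def percent_of_total(counts: Dict[str, int], total: int) -> Dict[str, float]:
    """Express each consequence count as a percentage of all SNVs in the group."""
    return {term: 100.0 * n / total for term, n in counts.items()}


pct = percent_of_total(snv1_counts, snv1_total)
for term, value in pct.items():
    print(f"{term}: {value:.2f}%")
```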
Figure 5. Distribution of the number of Korean SNVs per gene. The number of SNVs per gene was calculated for SNV-1 (a), SNV-35 (b), SNV-1/ns (c) and SNV-35/ns (d), where /ns denotes the non-synonymous SNVs in each subgroup. The one or two top-ranked genes are labeled. The x-axis shows SNVs per gene on a log10 scale and the y-axis shows the number of genes.
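A histogram like Figure 5 can be built by counting SNVs per gene, binning genes by log10 of that count, and picking the top-ranked genes for labels. A minimal sketch, assuming a hypothetical mapping that assigns each SNV to exactly one gene (real SNVs can overlap several genes) and whole-number log10 bins chosen only for illustration; the gene names are placeholders.

```python
import math
from collections import Counter
from typing import Dict, List


def snvs_per_gene(snv_to_gene: Dict[str, str]) -> Counter:
    """Count SNVs per gene from a (hypothetical) SNV-ID -> gene-symbol mapping."""
    return Counter(snv_to_gene.values())


def genes_per_log10_bin(per_gene: Counter) -> Counter:
    """Histogram: number of genes in each whole-number log10(SNVs per gene) bin."""
    return Counter(math.floor(math.log10(n)) for n in per_gene.values())


def top_genes(per_gene: Counter, k: int = 2) -> List[str]:
    """The k genes carrying the most SNVs (candidates for labeling)."""
    return [gene for gene, _ in per_gene.most_common(k)]


mapping = {f"x{i}": "GENE_X" for i in range(12)}
mapping.update({f"y{i}": "GENE_Y" for i in range(3)})
mapping["z0"] = "GENE_Z"

per_gene = snvs_per_gene(mapping)
histogram = genes_per_log10_bin(per_gene)
labels = top_genes(per_gene)
```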
Top disease terms associated with non-synonymous SNVs in SNV-1 and SNV-35.
| Korean only SNVs | Order | Disease term | Gene count | % of all genes in the category | P value (raw) | P value (adjusted) |
|---|---|---|---|---|---|---|
| SNV-1/ns | 1 | Adhesion | 241 | 37.249 | 2.28E-62 | 3.44E-59 |
| | 2 | Disease Susceptibility | 238 | 28.848 | 1.60E-39 | 1.21E-36 |
| | 3 | Genetic Predisposition to Disease | 231 | 28.589 | 1.04E-37 | 5.23E-35 |
| | 4 | Myocardial Infarction | 90 | 37.190 | 4.95E-24 | 1.87E-21 |
| | 5 | Urologic Diseases | 98 | 34.386 | 4.48E-23 | 1.13E-20 |
| | 6 | Subarachnoid Hemorrhage | 54 | 51.923 | 4.30E-23 | 1.13E-20 |
| | 7 | Metabolic Diseases | 162 | 26.471 | 6.55E-23 | 1.41E-20 |
| | 8 | Kidney Diseases | 95 | 34.672 | 1.04E-22 | 1.74E-20 |
| | 9 | Skin and Connective Tissue Diseases | 137 | 28.482 | 1.04E-22 | 1.74E-20 |
| | 10 | Nervous System Diseases | 176 | 25.360 | 1.48E-22 | 2.23E-20 |
| SNV-35/ns | 1 | Nelson syndrome | 3 | 0.446 | 0.0102 | 0.0102 |
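The tables report raw and adjusted P values without naming the multiple-testing correction. Purely as an illustrative assumption, the sketch below implements the Benjamini-Hochberg step-up adjustment, which is widely used for enrichment results; it should not be read as the method the authors used. Note that the adjustment depends on the total number of terms tested, so it must be applied to every tested term, not only to the top-ranked rows shown in a table.

```python
from typing import List


def benjamini_hochberg(p_values: List[float]) -> List[float]:
    """Benjamini-Hochberg adjusted P values, returned in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # indices, smallest P first
    adjusted = [0.0] * m
    running_min = 1.0  # adjusted values are capped at 1
    # Walk from the largest P value down so the adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


adjusted = benjamini_hochberg([0.01, 0.04, 0.03, 0.005])
```

Terms with tied raw P values receive the same adjusted value, and a term can share an adjusted value with a neighbour whose raw P value differs, because of the monotonicity step.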
Top drug terms associated with non-synonymous SNVs in SNV-1 and SNV-35.
| Korean only SNVs | Order | Drug term | Gene count | % of all genes in the category | P value (raw) | P value (adjusted) |
|---|---|---|---|---|---|---|
| SNV-1/ns | 1 | adenosine | 133 | 27.883 | 3.47E-21 | 1.24E-18 |
| | 2 | adenosine triphosphate | 87 | 29.097 | 1.46E-15 | 2.61E-13 |
| | 3 | immune globulin | 145 | 23.237 | 2.42E-15 | 2.88E-13 |
| | 4 | hydroxyurea | 30 | 46.154 | 9.01E-12 | 8.04E-10 |
| | 5 | glutathione | 84 | 24.633 | 7.75E-11 | 4.61E-09 |
| | 6 | phosphoric acid | 50 | 31.447 | 6.76E-11 | 4.61E-09 |
| | 7 | heparin | 55 | 29.255 | 1.72E-10 | 8.77E-09 |
| | 8 | bupropion | 38 | 35.185 | 3.41E-10 | 1.52E-08 |
| | 9 | rosuvastatin | 43 | 32.090 | 6.95E-10 | 2.76E-08 |
| | 10 | calcium chloride | 27 | 40.909 | 2.70E-09 | 9.64E-08 |
| SNV-35/ns | 1 | mannitol | 2 | 10 | 8.23E-05 | 0.0001 |
| | 2 | niflumic acid | 2 | 10 | 8.23E-05 | 0.0001 |
| | 3 | adenosine monophosphate | 2 | 1.961 | 0.0022 | 0.0022 |
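The section does not state how the enrichment P values in these tables were computed. One common choice for over-representation analysis is a one-sided hypergeometric test, and the sketch below implements it exactly with integer binomial coefficients as an assumed, illustrative method. The parameter names (N, K, n, k) and the example numbers are hypothetical.

```python
from math import comb


def hypergeom_enrichment_p(k: int, n: int, K: int, N: int) -> float:
    """One-sided P(X >= k) for X ~ Hypergeometric(N, K, n).

    N: genes in the background
    K: background genes annotated to the term
    n: genes in the input list (e.g. genes with non-synonymous SNVs)
    k: input genes annotated to the term
    """
    if not (0 <= K <= N and 0 <= n <= N):
        raise ValueError("require 0 <= K <= N and 0 <= n <= N")
    upper = min(n, K)
    # Sum the probabilities of observing k, k+1, ..., upper annotated genes.
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / comb(N, n)


# Toy example: 10 background genes, 3 annotated, 2 in the list, both annotated.
p = hypergeom_enrichment_p(k=2, n=2, K=3, N=10)
```

Python's arbitrary-precision integers keep the binomial coefficients exact even for genome-scale backgrounds, and the final division is correctly rounded, so very small P values such as those in the tables do not lose precision before the last step.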