Gaurav Thareja, Sumi Elsa John, Prashantha Hebbar, Kazem Behbehani, Thangavel Alphonse Thanaraj, Osama Alsmadi.
Abstract
BACKGROUND: The 1000 Genomes Project paved the way for sequencing diverse human populations. New genome projects are being established to sequence underrepresented populations, which helps in understanding human genetic diversity. The Kuwait Genome Project is an initiative to sequence individual genomes from the three subgroups of the Kuwaiti population, namely Saudi Arabian tribe, "tent-dwelling" Bedouin, and Persian, which attribute their ancestry to different regions of the Arabian Peninsula and to modern-day Iran (West Asia). These subgroups are in line with settlement history and are confirmed by genetic studies. In this work, we report the whole genome sequence of a Kuwaiti native from the Persian subgroup at >37X coverage.
Year: 2015 PMID: 25765185 PMCID: PMC4336699 DOI: 10.1186/s12864-015-1233-x
Source DB: PubMed Journal: BMC Genomics ISSN: 1471-2164 Impact factor: 3.969
Figure 1. Size distribution for indels in coding and non-coding regions.
Classification of the identified coding SNPs based on their location in exonic regions and their effects on protein sequences
| Class type | | | | |
|---|---|---|---|---|
| Init Codon | 16 | 0 | 3 | 1 |
| Nonsyn SNV | 9197 | 193 | | |
| Splicing | 68 | 2 | 44 | |
| Stopgain | 63 | 4 | 2 | 1 |
| Stoploss | 10 | 0 | | |
| Synonymous | 10405 | 108 | | |
| Unknown | 440 | 1 | 70 | 1 |
| Del | - | - | 84 | 3 |
| FrameShift Del | - | - | 42 | 8 |
| FrameShift Ins | - | - | 32 | 2 |
| Ins | - | - | 70 | 5 |
Legends to the class types:
Splicing, Variant affects a nucleotide that is in a splicing region of a coding transcript.
Init Codon, Variant changes the start codon.
Frameshift Ins, An insertion that causes a shift in the codon reading frame.
Frameshift Del, A deletion that causes a shift in the codon reading frame.
Frameshift Sub, A substitution that causes a shift in the codon reading frame.
Stopgain, Variant causes a stop codon to be created at the variant site.
Stoploss, Variant changes a stop codon to something else.
Ins, An insertion that does not cause a frameshift.
Del, A deletion that does not cause a frameshift.
Sub, A substitution that does not cause a frameshift.
Nonsyn SNV, A single nucleotide variant that changes the amino acid produced by a codon.
Synonymous, A variant affecting 1 or more nucleotides that does not change the amino acid sequence.
Unknown, A problem was found with the protein coding sequence; see Invalid Transcripts.
SNP & Variation Suite v8.1 (SVS) [Bozeman, MT: Golden Helix, Inc] was used for classifications.
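The insertion and deletion classes in the legend come down to a length test: an indel shifts the codon reading frame exactly when its net length change is not a multiple of three. The following is a minimal Python sketch of that rule only. The function name `classify_coding_indel` and the VCF-style allele strings are illustrative assumptions, and the sketch does not reproduce how SVS performs its classification.

```python
def classify_coding_indel(ref: str, alt: str) -> str:
    """Label a coding indel with the class names used in the legend.

    Hypothetical helper (not part of SVS): `ref` and `alt` are the
    reference and alternate allele strings, VCF-style.
    """
    delta = len(alt) - len(ref)
    if delta == 0:
        raise ValueError("equal-length alleles are substitutions, not indels")
    kind = "Ins" if delta > 0 else "Del"
    # The reading frame is preserved only when the net length change
    # is a whole number of codons (a multiple of 3 bases).
    if abs(delta) % 3 == 0:
        return kind
    return f"Frameshift {kind}"


print(classify_coding_indel("A", "ACGT"))  # 3 inserted bases -> Ins
print(classify_coding_indel("ACG", "A"))   # 2 deleted bases -> Frameshift Del
```

A real annotation tool also has to check that the variant lies inside a coding exon and handle splice-site overlap, which this sketch leaves out.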
Genotype-phenotype associations in the case of 28 SNPs from the set of ‘known’ deleterious SNPs
| SNP [genomic position] [effect] | Gene [OMIM ID] | Zygosity | Associated trait(s) | Reference (PMID) |
|---|---|---|---|---|
| rs1801133 [1:g.11856378G > A] [Ala263Val] | MTHFR [607093] | het | Homocysteine levels | 23824729 |
| rs676210 [2:g.21231524G > A] [Pro2739Leu] | APOB [107730] | hom | Triglyceride (TG) response to fenofibrate treatment for hypertriglyceridemia; LDL (oxidized); Lipid metabolism phenotypes | 23247145 |
| rs6756629 [2:g.44065090G > A] [Arg50Cys] | ABCG5 [605459] | het | Cholesterol, total, LDL cholesterol | 19060911 |
| rs16891982 [5:g.33951693C > G] [Leu374Phe] | SLC45A2 [606202] | het | Skin pigmentation, Hair color, Eye color | 17999355 |
| rs2043112 [5:g.38955796G > A] [Ser837Phe] | RICTOR [609022] | het | Obesity-related traits | 23251661 |
| rs30187 [5:g.96124330T > C] [Lys528Arg] | ERAP1 [606832] | het | Ankylosing spondylitis | 21743469 |
| rs33980500 [6:g.111913262C > T] [Asp10Asn] | TRAF3IP2 [607043] | het | Psoriatic arthritis, Psoriasis | 20953186 |
| rs7076156 [10:g.64415184A > G] [Thr62Ala] | ZNF365 [607818] | het | Crohn’s disease | 22412388 |
| rs5006884 [11:g.5373251C > T] [Leu172Phe] | OR51B6 (paralog of OR51E1, MIM: *611267) | het | Fetal hemoglobin levels | 20018918 |
| rs11042023 [11:g.8662516T > C] [His322Arg] | TRIM66 [612000] | het | Obesity | 23563607 |
| rs2306029 [11:g.46893108T > C] [Ser1554Gly] | LRP4 [604270] | het | D-dimer levels | 21502573 |
| rs11230563 [11:g.60776209C > T] [Arg225Trp] | CD6 [186720] | het | Inflammatory bowel disease | 23128233 |
| rs6591182 [11:g.65349756T > G] [Val538Gly] | EHBP1L1 | het | Non-alcoholic fatty liver disease histology (lobular) | 20708005 |
| rs1042602 [11:g.88911696C > A] [Ser192Tyr] | TYR [606933] | het | Skin pigmentation, Freckles | 17999355 |
| rs1126809 [11:g.89017961G > A] [Arg402Gln] | TYR [606933] | het | Tanning, Sunburns | 23548203 |
| rs3213764 [12:g.14587301A > G] [Lys530Arg] | ATF7IP [613644] | het | Prostate-specific antigen levels | 23359319 |
| rs4149056 [12:g.21331549T > C] [Val174Ala] | SLCO1B1 [604843] | het | Sex hormone-binding globulin levels, Bilirubin levels, Response to statin therapy | 22829776 |
| rs883079 [12:g.114793240C > T] [3' UTR variant] | TBX5 [601620] | het | Ventricular conduction | 21076409 |
| rs17730281 [15:g.53907948G > A] [Leu829Phe] | WDR72 [613214] | het | Renal function-related traits (BUN) | 22797727 |
| rs12968116 [18:g.55322502C > T] [Arg952Gln] | ATP8B1 [602397] | het | Liver enzyme levels (gamma-glutamyl transferase) | 22001757 |
| rs2304256 [19:g.10475652C > A] [Val362Phe] | TYK2 [176941] | hom | Type 1 diabetes, Type 1 diabetes autoantibodies | 21829393 |
| rs2108622 [19:g.15990431C > T] [Val433Met] | CYP4F2 [604426] | het | Acenocoumarol maintenance dosage, Vitamin E levels, Metabolite levels, Warfarin maintenance dose, Response to Vitamin E supplementation | 19578179 |
| rs8100241 [19:g.17392894G > A] [Ala20Thr] | ANKLE1 | het | Breast cancer | 22976474 |
| rs2363956 [19:g.17394124T > G] [Leu173Trp] | ANKLE1 | het | Ovarian cancer | 20852633 |
| rs1434579 [19:g.44932972C > T] [Gly662Arg] | ZNF229 (paralog of ZNF224, MIM: *194555) | het | Tuberculosis | 20694014 |
| rs2303759 [19:g.49869051T > G] [Met34Arg] | DKKL1 [605418] | het | Multiple sclerosis | 21833088 |
| rs1799990 [20:g.4680251A > G] [Met129Val] | PRNP [176640] | hom | Long-term memory | 19081515 |
| rs738409 [22:g.44324727C > G] [Ile148Met] | PNPLA3 [609567] | het | Liver enzyme levels (alanine transaminase), Nonalcoholic fatty liver disease | 22001757 |
Genotype-phenotype associations are recorded in the NHGRI GWAS Catalog for only these 28 of the 2123 identified ‘known’ deleterious SNPs.
Figure 2. Intergenome distances between the KWP1 genome and individuals from continental populations. (A) Nearest-neighbor tree based on variant positions shared between the KWP1 samples and individuals from intercontinental populations. (B) Nearest-neighbor tree based on variant positions that are associated with OMIM disease genes and shared between the KWP1 samples and individuals from intercontinental populations.
Classification of identified structural variations
| Type | Total | | |
|---|---|---|---|
| Deletions | 7645 | 7190 (94.05%) | 2969 (38.84%) |
| Duplications | 1697 | 1575 (92.81%) | 212 (12.49%) |
| Insertions | 585 | 514 (87.86%) | 362 (61.88%) |
| Inversions | 135 | 104 (77.04%) | 26 (19.26%) |
| Translocations | 1076 | 900 (83.64%) | 710 (65.98%) |
A detected structural variation is defined to be ‘known’ if at least 50% of the detected variation (e.g. deletion) overlaps with a known variation.
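The 50% criterion above can be written as a short interval test. The Python sketch below is only an illustration under stated assumptions: intervals are half-open `(start, end)` pairs on the same chromosome and of the same variation type, the overlap is measured against one known variation at a time rather than the union of several, and the names `overlap_fraction` and `is_known` are hypothetical rather than taken from the study's pipeline.

```python
def overlap_fraction(detected, known):
    """Fraction of the detected interval covered by one known interval.

    Both arguments are half-open (start, end) tuples, assumed to lie on
    the same chromosome and to describe the same variation type.
    """
    d_start, d_end = detected
    k_start, k_end = known
    if d_end <= d_start:
        raise ValueError("detected interval must have positive length")
    shared = max(0, min(d_end, k_end) - max(d_start, k_start))
    return shared / (d_end - d_start)


def is_known(detected, known_intervals, min_fraction=0.5):
    """True if any single known interval covers at least `min_fraction`
    of the detected interval (0.5 mirrors the 50% rule in the text)."""
    return any(overlap_fraction(detected, k) >= min_fraction
               for k in known_intervals)


# A 100-bp detected deletion, half of which lies inside a known deletion.
print(is_known((100, 200), [(150, 400)]))  # True: 50 of 100 bp overlap
print(is_known((100, 200), [(160, 400)]))  # False: only 40 of 100 bp overlap
```

The fraction is taken relative to the detected variation only, so a small detected deletion nested inside a much larger known one still counts as known; a reciprocal-overlap rule would be stricter.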
Figure 3. Illustration of discordance in SNP calls between the deep sequencing experiment and genome-wide genotyping using bead chip arrays. rs6552934 is taken as an example. (A) Sequencing data call the GG genotype. (B) Bead chip data call the AG genotype.
Figure 4. Impact of novel SNPs and indels in the vicinity of SNPs typed on the bead chip on genotype calling. (A) The example SNP rs3899654 has a CT call and a novel heterozygous 2-bp deletion upstream of the variant in the KWP1 genome. (B) The typed marker on the bead chip leads to an inconsistent genotype call of CC.
Figure 5. Count of novel SNPs and indels in the KWP1 genome around typed common markers on bead chips.
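Counting novel variants around a typed marker amounts to a range query on sorted positions. The Python sketch below shows one way to do it with binary search. The flank size is not stated here, so `flank` is a free parameter, and `count_nearby` is a hypothetical helper rather than the authors' method.

```python
from bisect import bisect_left, bisect_right


def count_nearby(marker_pos, sorted_variant_positions, flank):
    """Count variants within `flank` bases on either side of a marker.

    `sorted_variant_positions` must be an ascending list of positions on
    the marker's chromosome. `flank` is an assumed window half-width.
    """
    lo = bisect_left(sorted_variant_positions, marker_pos - flank)
    hi = bisect_right(sorted_variant_positions, marker_pos + flank)
    return hi - lo


novel = [90, 98, 105, 200]  # toy positions, not data from the study
print(count_nearby(100, novel, flank=5))   # 2 (positions 98 and 105)
print(count_nearby(100, novel, flank=10))  # 3 (positions 90, 98 and 105)
```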
Figure 6. Summary of the analysis of genomes from the Kuwaiti subgroup of Persian ancestry. Tracks (from outer to inner): Karyotype of Human Genome; Density (in every window of 1 Mb) of ‘known’ SNPs (i.e. annotated in dbSNP 138); Density of ‘novel’ SNPs (i.e. not annotated in dbSNP 138); Density of ‘known’ indels; Density of ‘novel’ indels; Density of Long Deletions; Density of Long Insertions; Density of Inversions; Density of Duplications; Links representing intra- and inter-chromosomal translocations. The image was generated using Circos [48].
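The density tracks described in the caption count variants in consecutive 1 Mb windows. A minimal Python sketch of that binning is given below. It assumes 1-based positions and non-overlapping windows, and `window_density` is a hypothetical helper rather than the code used to prepare the Circos input.

```python
from collections import Counter

WINDOW = 1_000_000  # 1 Mb, the window size named in the caption


def window_density(variants, window=WINDOW):
    """Count variants per (chromosome, window index).

    `variants` is an iterable of (chromosome, 1-based position) pairs.
    Window 0 covers positions 1..window, window 1 the next block, etc.
    """
    counts = Counter()
    for chrom, pos in variants:
        counts[(chrom, (pos - 1) // window)] += 1
    return counts


# Four positions taken from the genotype-phenotype table above.
snps = [("11", 88911696), ("11", 89017961),
        ("19", 17392894), ("19", 17394124)]
density = window_density(snps)
print(density[("19", 17)])  # 2: both chr19 SNPs fall in the same 1 Mb window
print(density[("11", 88)])  # 1
```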