Osama Alsmadi, Sumi E John, Gaurav Thareja, Prashantha Hebbar, Dinu Antony, Kazem Behbehani, Thangavel Alphonse Thanaraj.
Abstract
The population of the State of Kuwait comprises three genetic subgroups of inferred Persian, Saudi Arabian tribe and Bedouin ancestry. The Saudi Arabian tribe subgroup traces its origin to the Najd region of Saudi Arabia. By sequencing two whole genomes and thirteen exomes from this subgroup at high coverage (>40X), we identify 4,950,724 Single Nucleotide Polymorphisms (SNPs), 515,802 indels and 39,762 structural variations. Of the identified variants, 10,098 (8.3%) exomic SNPs, 139,923 (2.9%) non-exomic SNPs, 5,256 (54.3%) exomic indels, and 374,959 (74.08%) non-exomic indels are 'novel'. Up to 8,070 (79.9%) of the reported novel biallelic exomic SNPs occur at low frequency (minor allele frequency <5%). We observe 5,462 known and 1,004 novel potentially deleterious nonsynonymous SNPs. Allele frequencies of common SNPs from the 15 exomes are significantly correlated with those from genotype data of a larger cohort of 48 individuals (Pearson correlation coefficient, 0.91; p < 2.2×10⁻¹⁶). A set of 2,485 SNPs shows significantly different allele frequencies when compared to populations from other continents. Two notable variants with high-frequency risk alleles in this subgroup are a nonsynonymous deleterious SNP (rs2108622 [19:g.15990431C>T] from the CYP4F2 gene [MIM:*604426]) associated with the warfarin dosage levels [MIM:#122700] required to elicit a normal anticoagulant response, and a 3′ UTR SNP (rs6151429 [22:g.51063477T>C] from the ARSA gene [MIM:*607574]) associated with Metachromatic Leukodystrophy [MIM:#250100]. The Hemoglobin Riyadh variant (first identified in a Saudi Arabian woman) is observed in the exome data. The mitochondrial haplogroup profiles of the 15 individuals are consistent with the haplogroup diversity seen in Saudi Arabian natives, who are believed to have received substantial gene flow from Africa and eastern provenance.
We present the first genome resource imperative for designing future genetic studies in the Saudi Arabian tribe subgroup. The full-length genome sequences and the identified variants are available at ftp://dgr.dasmaninstitute.org and http://dgr.dasmaninstitute.org/DGR/gb.html.
Year: 2014 PMID: 24896259 PMCID: PMC4045902 DOI: 10.1371/journal.pone.0099069
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Classification of the identified biallelic SNPs based on genome annotation.
| Class | UE_SNP_Known | UE_SNP_Novel | UE_Indel_Known | UE_Indel_Novel | UW_SNP_Known | UW_SNP_Novel | UW_Indel_Known | UW_Indel_Novel |
| Coding | 48377 | 3988 | 328 | 461 | 819 | 30 | 10 | 30 |
| Coding, Splicing | 6 | 3 | 0 | 0 | 2 | 0 | 0 | 0 |
| Downstream | 26 | 2 | 3 | 4 | 27059 | 800 | 1011 | 2623 |
| Downstream, Upstream | 3 | 1 | 1 | 0 | 834 | 27 | 31 | 57 |
| Intergenic | 183 | 17 | 4 | 8 | 2894110 | 87303 | 77233 | 222130 |
| Intronic | 262 | 25 | 16 | 39 | 1733328 | 50399 | 51898 | 143952 |
| NCExonic | 8769 | 796 | 342 | 306 | 3684 | 105 | 75 | 218 |
| NCSplicing | 1 | 0 | 1 | 0 | 44 | 4 | 2 | 4 |
| Splicing | 9 | 0 | 7 | 11 | 69 | 4 | 3 | 23 |
| 3′ UTR | 45838 | 4386 | 3409 | 3735 | 3025 | 110 | 118 | 347 |
| 3′ UTR, 5′ UTR | 3 | 1 | 0 | 0 | 11 | 0 | 1 | 1 |
| 5′ UTR | 7772 | 851 | 306 | 353 | 883 | 40 | 17 | 51 |
| Upstream | 38 | 4 | 2 | 1 | 24626 | 839 | 766 | 2025 |
Legends to the class types:
Coding - Variant is in the coding exonic region of a protein coding transcript.
Splicing - Variant affects a nucleotide that is in a splicing region of a coding transcript.
Downstream - Variant is within 1000 bp of the transcript stop site on the 3′ side.
Upstream - Variant is within 1000 bp of the transcript start site on the 5′ side.
Intergenic - Variant does not interact with any gene transcripts.
Intronic - Variant lies within an intron.
NCSplicing - Variant affects a nucleotide that is in a splicing region of a non-coding transcript.
NCExonic - Variant is in an exon for a non-coding transcript.
5′ UTR (UTR5) - Variant is in an exon of a coding transcript but is on the 5′ side of the start codon.
3′ UTR (UTR3) - Variant is in an exon of a coding transcript but is on the 3′ side of the stop codon.
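The novel fraction quoted in the abstract for exomic SNPs can be checked against the UE SNP columns of the table above. The Python sketch below is only an illustration: it sums the UE_SNP_Known and UE_SNP_Novel counts as transcribed from the table and takes their ratio. The names `UE_SNP` and `novel_fraction` are made up for this example, and because the table covers biallelic SNPs only, the novel total need not equal the 10,098 given in the abstract.

```python
# (known, novel) counts per class, transcribed from the UE_SNP_Known and
# UE_SNP_Novel columns of the classification table above.
UE_SNP = {
    "Coding": (48377, 3988),
    "Coding, Splicing": (6, 3),
    "Downstream": (26, 2),
    "Downstream, Upstream": (3, 1),
    "Intergenic": (183, 17),
    "Intronic": (262, 25),
    "NCExonic": (8769, 796),
    "NCSplicing": (1, 0),
    "Splicing": (9, 0),
    "3' UTR": (45838, 4386),
    "3' UTR, 5' UTR": (3, 1),
    "5' UTR": (7772, 851),
    "Upstream": (38, 4),
}


def novel_fraction(counts):
    """Return (known total, novel total, novel / (known + novel))."""
    known = sum(k for k, _ in counts.values())
    novel = sum(n for _, n in counts.values())
    return known, novel, novel / (known + novel)


known, novel, frac = novel_fraction(UE_SNP)
print(f"known={known} novel={novel} novel fraction={frac:.1%}")
```

With these counts the ratio comes out at roughly 8.3%, in line with the percentage reported in the abstract for exomic SNPs.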
Figure 1. Repeat composition as seen in deletion variants identified in the two whole genome sequences of Saudi Arabian tribe ancestry.
Two regions with highly biased (as compared to general distribution) repeat compositions are seen in the ranges of 300–400 bp and 6–7 kb in length, as insertion polymorphisms of short interspersed nuclear elements (SINE) and long interspersed nuclear elements (LINE), respectively.
Figure 2. Comparison of allele frequencies of the deleterious nonsynonymous SNPs identified in the exome data set with those in the data set of genotypes from 48 samples.
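A comparison of this kind pairs, for each SNP, the allele frequency in the exome data set with the frequency in the genotype data set, and the agreement can be summarised with a Pearson correlation coefficient, as the abstract does for common SNPs. The following is a minimal Python sketch of that calculation. The function name `pearson_r` and the frequency values are hypothetical placeholders for illustration; they are not data from the study and do not reproduce the reported coefficient.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)


# Hypothetical per-SNP allele frequencies (NOT values from the paper):
# one entry per SNP, same SNP order in both lists.
exome_af = [0.10, 0.25, 0.40, 0.55, 0.70]
genotype_af = [0.12, 0.22, 0.43, 0.50, 0.74]

r = pearson_r(exome_af, genotype_af)
print(f"Pearson r = {r:.3f}")
```

The significance test behind the reported p-value is not covered here; it would need the actual number of SNPs compared.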
Figure 3. Intergenome distances between the KWS genomes and individuals from continental populations.
(a) Nearest neighbor tree based on variant positions shared between the KWS samples and individuals from intercontinental populations. (b) Intergenome comparisons based on variant positions that are associated with OMIM disease genes and are shared between the KWS samples and individuals from intercontinental populations.
Figure 4. Venn diagram depicting the number of SNPs having significant difference in allele frequencies between the KWS group and other continent populations from the 1000 Genomes Project (Fst > 0.25 and q-value < 0.05).
Markers denoting causal variants for OMIM diseases and showing significant differences in risk allele frequencies between the KWS and continental populations.
| SNP_ID (risk allele); gene name (MIM) | HGVS | MIM_ID | Phenotype | Risk allele frequency (AFR) | Risk allele frequency (AMR) | Risk allele frequency (ASN) | Risk allele frequency (EUR) | Risk allele frequency (KWS) | Allele frequency in a larger data set of 63 Kuwaiti natives of Saudi Arabian tribe ancestry | Reference |
| rs1042114 (G); OPRD1 (*165195) | 1:g.29138975G>T | #103780 | Alcohol Dependence | 0.037 | 0.086 | 0 | 0.131 | 0.667 | 0.105 | Zhang H et al. |
| rs1049254 (G); CYBA (+608508) | 16:g.88709828A>G | +608508 | Reactive Oxygen Species Generation | 0.831 | 0.646 | 0.773 | 0.62 | 0.133 | | Bedard K et al. |
| rs1800742 (A); TSC1 (*605284) | 16:g.2110805G>A | #191100 | Tuberous Sclerosis-1 | 0 | 0.011 | 0 | 0.004 | 0.133 | | Jones AC et al. |
| rs1801483 (A); GCGR (*138033) | 17:g.79767715G>A | #125853 | Diabetes Mellitus, Noninsulin-Dependent | 0 | 0.003 | 0 | 0.012 | 0.133 | | Hager J et al. |
| rs2020912 (C); MSH6 (*600678) | 2:g.48027755T>C | #614350 | Colorectal Cancer, Hereditary Nonpolyposis, Type 5 | 0 | 0 | 0 | 0.015 | 0.167 | | Wu Y et al. |
| rs2108622 (T); CYP4F2 (*604426) | 19:g.15990431C>T | #122700 | Coumarin Resistance/Warfarin resistance | 0.085 | 0.285 | 0.206 | 0.273 | 0.7 | 0.532 | Caldwell MD et al. |
| rs2814778 (C); DARC (*613665) | 1:g.159174683T>C | #110700; #611162 | Duffy Blood Group System; protection against Plasmodium Vivax | 0.943 | 0.069 | 0 | 0.003 | 0.4 | | Reich D et al. |
| rs6151429 (C); ARSA (*607574) | 22:g.51063477T>C | #250100 | Metachromatic Leukodystrophy (also called Arylsulfatase A deficiency) | 0 | 0.041 | 0.019 | 0.081 | 0.533 | 0.325 | Regis S et al. |
| rs7076156 (G); ZNF365 (607818) | 10:g.64415184A>G | 605990 | Nephrolithiasis, Uric Acid, Susceptibility To | 0.974 | 0.815 | 0.913 | 0.734 | 0.3 | 0.508 | Gianfrancesco F et al. |
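To give a feel for the Fst > 0.25 threshold quoted for Figure 4, the sketch below computes a simple two-population Fst from a pair of risk allele frequencies in the table above. The text does not say which Fst estimator was used, so this is an assumption: it applies Wright's definition, Fst = (H_T - H_S) / H_T, for a biallelic locus with both populations weighted equally, with no sample-size correction and no q-value step. The function name `fst_two_pops` is made up for the example.

```python
def fst_two_pops(p1, p2):
    """Wright's Fst for one biallelic SNP and two equally weighted populations.

    p1 and p2 are the frequencies of the same allele in each population.
    This is an assumed, simplified estimator, not necessarily the one
    used to produce Figure 4.
    """
    for p in (p1, p2):
        if not 0.0 <= p <= 1.0:
            raise ValueError("allele frequencies must lie in [0, 1]")
    p_bar = (p1 + p2) / 2
    h_total = 2 * p_bar * (1 - p_bar)  # expected heterozygosity, pooled
    if h_total == 0:
        return 0.0  # allele fixed or absent in both populations
    h_within = (2 * p1 * (1 - p1) + 2 * p2 * (1 - p2)) / 2
    return (h_total - h_within) / h_total


# rs2108622 (T) risk allele frequencies as listed in the table: KWS vs AFR.
kws, afr = 0.7, 0.085
fst = fst_two_pops(kws, afr)
print(f"Fst(KWS, AFR) = {fst:.3f}")
```

Under this simplified estimator the KWS versus AFR pair for rs2108622 gives a value of about 0.40, which is above the 0.25 threshold.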
Figure 5. Distribution of total number of variants (SNPs and indels) upon step-wise addition of exomes, and the distribution of number of new variants added per exome.
(Coefficient of determination (R²) > 0.99 for all fitted curves.)
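The step-wise addition described for Figure 5 can be sketched as a running union of per-exome variant sets: after each exome is added, record the size of the union and the number of variants not seen before. The Python below is a minimal illustration under that assumption. The function name `accumulation` and the toy variant keys are hypothetical, and the curve fitting behind the reported R² is not reproduced.

```python
def accumulation(exomes):
    """Return (cumulative totals, new variants per exome) for an ordered
    list of per-exome variant sets."""
    seen = set()
    totals, new_counts = [], []
    for variants in exomes:
        new = set(variants) - seen  # variants not observed in earlier exomes
        seen |= new
        new_counts.append(len(new))
        totals.append(len(seen))
    return totals, new_counts


# Hypothetical toy data: each variant keyed as "chrom:pos:ref>alt".
toy_exomes = [
    {"1:100:A>G", "1:200:C>T", "2:50:G>A"},
    {"1:100:A>G", "2:50:G>A", "3:75:T>C"},
    {"1:200:C>T", "3:75:T>C"},
]

totals, new_counts = accumulation(toy_exomes)
print(totals, new_counts)
```

The per-exome counts depend on the order in which exomes are added, so an analysis of this kind may average over several orderings; whether that was done here is not stated.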
Figure 6. Phylogenetic tree of the observed HVS1 segments among the 15 participants together with those observed by Abu Amero [46] in Saudi Arabian natives.
Kuwaiti samples are labeled as KWS. Green triangles denote samples from the Central region of Saudi Arabia; blue triangles denote samples from the Southern region; red triangles denote samples from the Western region; black triangles denote samples from the Northern region; cyan triangles denote samples whose region is not known.
Figure 7. Summary of analysis of genomes from Kuwait subgroup of Saudi Arabian tribe ancestry.
Tracks (from outer to inner): Karyotype of Human Genome; Density (in every window of 1 Mb) of ‘known’ SNPs (i.e. annotated in dbSNP 137) from the UE data set; Density of ‘novel’ SNPs (i.e. not annotated in dbSNP 137) from the UE data set; Density of ‘known’ indels from the UE data set; Density of ‘novel’ indels from the UE data set; Density of ‘known’ SNPs from the UW data set; Density of ‘novel’ SNPs from the UW data set; Density of ‘known’ indels from the UW data set; Density of ‘novel’ indels from the UW data set; Density of long indels; Density of duplications, inversions and tandem duplications; Links representing intra- and inter-chromosomal translocations. The image was generated using Circos [71].