Juan L Rodriguez-Flores, Jennifer Fuller, Neil R Hackett, Jacqueline Salit, Joel A Malek, Eman Al-Dous, Lotfi Chouchane, Mahmoud Zirie, Amin Jayoussi, Mai A Mahmoud, Ronald G Crystal, Jason G Mezey.
Abstract
The Qatari population, located at the Arabian migration crossroads of Africa and Eurasia, comprises Bedouin, Persian and African genetic subgroups. By deep exome sequencing of only 7 Qataris, including individuals from each subgroup, we identified 2,750 nonsynonymous SNPs predicted to be deleterious, many of which are linked to human health or lie in genes linked to human health. Many of these SNPs had a significantly elevated deleterious allele frequency in Qataris compared with other populations worldwide. Despite the small sample size, SNP allele frequency was highly correlated with that in a larger Qatari sample. Together, the data demonstrate that exome sequencing of only a small number of individuals can reveal genetic variants with potential health consequences in understudied populations.
Year: 2012 PMID: 23139751 PMCID: PMC3490971 DOI: 10.1371/journal.pone.0047614
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Figure 1. Functional classification of single nucleotide polymorphism (SNP) sites in seven Qatari exomes.
Genotypes for 126,924 SNPs in target exons ±500 bp were confidently called as ref/ref, ref/alt or alt/alt (where ref = GRCh37 reference allele and alt = non-reference allele) using GATK [33]. They were classified using databases of SNP function: NCBI dbSNP build 134, the SIFT online webserver [34] and the GATK VariantAnnotator function [33]. The panels are bar plots of the following:

A. SNPs observed in ≥1 of 1,099 exomes (QE7 and 1000 G).
B. SNPs observed in ≥1 of 14 QE7 alleles.
C. SNPs with significantly higher or lower frequency in QE7 than in at least one population.
D. The subset of significantly higher or lower SNPs in genes with a health-related role (OMIM [12], HGMD [37], PharmGKB [38] or HUGE [39]).

In all four plots, the x-axis lists the functional categories (noncoding, coding, silent, missense, splice, nonsense) and the y-axis gives the number of SNPs. Of the missense SNPs polymorphic in the 1,099 exomes (QE7 and 1000 G), 20,857 (52%) were predicted deleterious by SIFT [34] or PolyPhen2 [35]. Of these, 2,750 were polymorphic in QE7 (≥1 of 14 alleles; Table 1). A further 1,853 missense SNPs were predicted deleterious by SIFT [34] or PolyPhen2 [35], were polymorphic in the 1,099 exomes (QE7 + 1000 G) and were significantly higher or lower in QE7. Of these, 510 were relevant to health (see Table 2). Red = predicted deleterious SNPs.
Potentially Deleterious Missense Coding SNPs in the Qatari Genome Identified by Exome Sequencing.
| Functional classification | Potentially deleterious alternate allele observed in ≥1 of 14 QE7 alleles, n (%) | Potentially deleterious alternate allele observed in ≥6 of 14 QE7 alleles, n (%) |
| Total potentially deleterious missense coding SNPs | 2,750 (100%) | 339 (100%) |
| Neither the SNP nor its gene previously linked to human health | 1,969 (72%) | 240 (71%) |
| In a gene previously linked to human health, but a different SNP than previously reported | 650 (24%) | 74 (22%) |
| The gene and SNP have been previously linked to human health | 131 (5%) | 25 (7%) |
To identify potentially deleterious, health-linked missense SNPs in Qatar, we took the 2,750 SNPs whose predicted potentially deleterious alternate allele was observed in QE7. These were subdivided by alternate allele count (≥1 of 14 or ≥6 of 14 alleles) and by functional category.
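With 14 sequenced alleles, the two thresholds correspond to QE7 alternate allele frequencies of at least 1/14 ≈ 0.07 and at least 6/14 ≈ 0.43. The sketch below estimates how likely a SNP is to pass each threshold, given its population frequency. It assumes the 14 alleles behave as independent binomial draws from the population, that is, unrelated individuals in Hardy-Weinberg equilibrium. The source does not state this assumption. The helper name `prob_observed_at_least` and the example frequencies are hypothetical.

```python
from math import comb


def prob_observed_at_least(k: int, p: float, n_alleles: int = 14) -> float:
    """P(X >= k) for X ~ Binomial(n_alleles, p).

    Hypothetical helper: assumes the sampled alleles are independent draws
    from a population whose alternate allele frequency is p.
    """
    return sum(
        comb(n_alleles, i) * p**i * (1 - p) ** (n_alleles - i)
        for i in range(k, n_alleles + 1)
    )


if __name__ == "__main__":
    for p in (0.01, 0.05, 0.10, 0.25, 0.50):
        print(
            f"p = {p:.2f}: P(>=1 of 14) = {prob_observed_at_least(1, p):.3f}, "
            f"P(>=6 of 14) = {prob_observed_at_least(6, p):.3f}"
        )
```

Under these assumptions, a variant at 1% population frequency appears at least once in 14 alleles only about 13% of the time (1 − 0.99^14). The ≥1 of 14 set is therefore weighted toward common variants.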
To identify potentially deleterious SNPs of medical interest in Qatar, the 2,750 SNPs predicted to be potentially deleterious were subclassified into 3 groups. The grouping depended on whether the gene or SNP had previously been linked to a health-related phenotype in four major databases of disease and metabolism SNPs (OMIM [12], HGMD [37], PharmGKB [38] and HUGE [39]). The table rows are:

1st row: total number of potentially deleterious SNPs.
2nd row: potentially deleterious SNPs in genes where no SNP has previously been associated with a phenotype relevant to human health.
3rd row: SNPs in genes linked to human health, where the SNP itself has not previously been tested for a phenotypic effect.
4th row: potentially deleterious SNPs where both the specific SNP and the gene have been reported to be health-linked.

SNPs in the 4th row (previously identified) are not counted in the 3rd row (in a gene, but not SNP, previously linked).
1st column: the 2,750 SNPs where the potentially deleterious alternate allele was observed at least once in QE7 (≥1 of 14). These represent 2% of the sites confidently genotyped in QE7 and are subdivided by the health-linked classification described above. 2nd column: the 339 potentially deleterious alleles observed at least 6 times in QE7 (12% of the 2,750), subdivided in the same way. For each column, percentages are based on the total in the first row of that column.
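The row logic and the column percentages can be sketched as below. The lookup sets are hypothetical stand-ins for queries against OMIM, HGMD, PharmGKB and HuGE. The sketch also assumes that a SNP with a reported health link implies its gene is health-linked. Checking the SNP-level link first is what keeps the 4th-row SNPs out of the 3rd row.

```python
from typing import Dict

# Hypothetical stand-ins for the OMIM/HGMD/PharmGKB/HuGE lookups.
HEALTH_LINKED_SNPS = {"rs1801282", "rs7493"}      # specific SNPs with a reported link
HEALTH_LINKED_GENES = {"PPARG", "PON2", "HMCN1"}  # genes with any reported link

NOT_LINKED = "neither SNP nor gene previously linked"
GENE_ONLY = "gene linked, different SNP"
GENE_AND_SNP = "gene and SNP previously linked"


def health_category(gene: str, rsid: str) -> str:
    # Assumption: a health-linked SNP implies a health-linked gene, so the
    # SNP-level test runs first and such SNPs never fall into GENE_ONLY.
    if rsid in HEALTH_LINKED_SNPS:
        return GENE_AND_SNP
    if gene in HEALTH_LINKED_GENES:
        return GENE_ONLY
    return NOT_LINKED


def column_percentages(counts: Dict[str, int]) -> Dict[str, int]:
    # Percentages relative to the column total (the table's first row).
    total = sum(counts.values())
    return {category: round(100 * n / total) for category, n in counts.items()}


if __name__ == "__main__":
    # Counts copied from the table above.
    at_least_1 = {NOT_LINKED: 1969, GENE_ONLY: 650, GENE_AND_SNP: 131}
    at_least_6 = {NOT_LINKED: 240, GENE_ONLY: 74, GENE_AND_SNP: 25}
    print(column_percentages(at_least_1))
    print(column_percentages(at_least_6))
    # Share of confidently genotyped sites, and the >=6 subset of the 2,750.
    print(round(100 * 2750 / 126924), round(100 * 339 / 2750))
```

The recomputed percentages (72/24/5 and 71/22/7) match the table. The last line reproduces the 2% and 12% quoted in the notes.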
Affymetrix Microarray Validation of Qatari Exome Potentially Deleterious SNPs Where the SNP and Gene Have Been Previously Linked to Human Health.
| Gene symbol | Gene name | Amino acid substitution | dbSNP rsID | Chromosome | Position | Ref>alt | Risk allele | Health-associated phenotype | QE7 alt allele frequency (exome sequencing, n = 14 alleles) | QA149 alt allele frequency (Affymetrix genotyping, n = 298 alleles) | QA149 risk allele frequency |
| PPARG | Peroxisome proliferator-activated receptor gamma | Pro12Ala | rs1801282 | 3 | 12393125 | C>G | C | Type 2 diabetes | 0.07 | 0.05 | 0.95 |
| PON2 | Paraoxonase 2 | Ser311Cys | rs7493 | 7 | 95034775 | G>C | G | Coronary artery disease | 0.29 | 0.28 | 0.72 |
| NAT2 | N-acetyltransferase | Arg197Gln | rs1799930 | 8 | 18258103 | G>A | A | Slow metabolism of clonazepam | 0.14 | 0.23 | 0.23 |
| MTR | 5-methyltetrahydrofolate-homocysteine methyltransferase | Asp473Gly | rs1805087 | 1 | 237048500 | A>G | G | Cardiovascular disease | 0.29 | 0.22 | 0.22 |
| NQO1 | NAD(P)H dehydrogenase, quinone 1 | Pro187Ser | rs1800566 | 16 | 69745145 | G>A | A | Increased risk of benzene poisoning, colorectal cancer, poor survival in breast cancer | 0.14 | 0.21 | 0.21 |
| ULK4 | Unc-51-like kinase 4 | Lys569Arg | rs3774372 | 3 | 41877414 | T>C | C | Diastolic blood pressure | 0.57 | 0.19 | 0.19 |
| CDC6 | Cell division cycle 6 | Val441Ile | rs13706 | 17 | 38457151 | G>A | A | Rate of decline in ex-smokers with COPD | 0.29 | 0.14 | 0.14 |
| PARP1 | Poly (ADP-ribose) polymerase 1 | Val762Ala | rs1136410 | 1 | 226555302 | A>G | G | Increased risk of cancer in Asians, decreased risk of cancer in Europeans | 0.14 | 0.13 | 0.13 |
| BDNF | Brain-derived neurotrophic factor | Val66Met | rs6265 | 11 | 27679916 | C>T | T | Anxiety, depression disorders, motor function | 0.07 | 0.11 | 0.11 |
| PPP1R3A | Protein phosphatase 1, regulatory subunit 3A | Asp905Tyr | rs1799999 | 7 | 113518434 | C>A | A | Type 2 diabetes, insulin resistance | 0.14 | 0.10 | 0.10 |
Analysis of the 14 QE7 alleles identified 131 missense coding SNPs where both the SNP and the gene have previously been linked to human health (Table 1, 4th row). To validate this observation in a larger group of Qataris, the Affymetrix Genome-Wide SNP Array 5.0 was used to genotype an independent group of 149 Qataris (QA149, 298 alleles). Of the 2,750 potentially deleterious missense SNPs identified in at least 1 of the 14 QE7 alleles, 131 were on the microarray. Of these, 49 were in genes linked to human health, including 16 where both the gene and the SNP are linked to human health. Of these 16, 10 are listed as examples of missense SNPs linked to human health.
Gene symbol and name obtained from the Consensus Coding Sequence (CCDS) NCBI database [32], amino acid substitution position and residues obtained from dbSNP when available, otherwise SIFT online webserver [34]. Transcript position and amino acid substitution were verified to be consistent with the literature.
SNP information includes the amino acid substitution, the dbSNP build 134 rsID if available, the chromosome, the position in the GRCh37 human reference genome assembly, and the reference and alternate alleles in QE7. Ref = reference; alt = alternate.
Phenotype information from OMIM [12], HGMD [37], PharmGKB [38] or HUGE [39] database.
For more details and references, see Details S1.
Shown is the alternate allele frequency determined by exome sequencing in QE7 individuals.
Shown is the risk allele frequency in the validation set of QA149 individuals (n = 149 Qatari, 298 alleles). Failed genotypes are accounted for in the allele frequency. For statistical comparisons of the QE7 and QA149 allele frequencies, see Figure 2.
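The frequency columns can be sketched as follows. The source says failed genotypes are "accounted for" but does not give the rule, so this sketch assumes they are excluded from the denominator. Genotypes use a hypothetical coding of alternate-allele counts: 0 = ref/ref, 1 = ref/alt, 2 = alt/alt and None = failed call. For a biallelic SNP, the risk allele frequency equals the alternate allele frequency when the risk allele is the alternate allele. Otherwise it is the complement. In the PPARG row, for example, the risk allele C is the reference allele, so an alt frequency of 0.05 corresponds to a risk frequency of 0.95.

```python
from typing import Optional, Sequence


def alt_allele_frequency(genotypes: Sequence[Optional[int]]) -> float:
    """Alternate allele frequency from diploid calls coded 0/1/2 (alt-allele count).

    None marks a failed genotype; assumption: failed calls are dropped from
    the denominator rather than imputed.
    """
    called = [g for g in genotypes if g is not None]
    if not called:
        raise ValueError("no successful genotype calls")
    return sum(called) / (2 * len(called))


def risk_allele_frequency(alt_freq: float, risk_is_alt: bool) -> float:
    # For a biallelic SNP the reference and alternate frequencies sum to 1.
    return alt_freq if risk_is_alt else 1.0 - alt_freq


if __name__ == "__main__":
    print(alt_allele_frequency([0, 1, 2, None]))             # 3 alt alleles / 6 called
    print(round(risk_allele_frequency(0.05, risk_is_alt=False), 2))
```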
Figure 2. Validation of allele frequency for potentially deleterious nonsynonymous missense SNPs observed in n = 7 Qatari exomes, using Affymetrix 5.0 array genotyping of n = 149 Qataris or TaqMan genotyping of n = 86 Qataris (n = 82 overlapping).
A. Validation against Affymetrix 5.0 SNP microarray genotyping. To confirm the allele frequency estimates for the Qatari population based on QE7 (n = 14 alleles), we compared two frequencies for potentially deleterious SNPs observed in at least 1 of 14 (7%) QE7 alleles. The first was the QE7 allele frequency. The second was the allele frequency in QA149 (n = 298 alleles), determined using Affymetrix 5.0 SNP microarrays. Of the 2,750 potentially deleterious nonsynonymous SNPs identified in QE7, 149 had probes on the Affymetrix 5.0 array. The panel plots the QE7 allele frequency (x-axis) against the QA149 allele frequency (y-axis) for 131 SNPs. The other 18 SNPs on the array were excluded because partially missing genotypes prevented validation of their QE7 allele frequency.

B. Validation of allele frequency for potentially deleterious nonsynonymous missense SNPs significantly higher or lower in Qatari exomes, using TaqMan genotyping of n = 86 Qataris. This panel confirms the QE7-based allele frequency estimates (n = 14 alleles) for the deleterious SNPs in Table 4. For SNPs observed in at least 1 of 14 (7%) QE7 alleles, the QE7 allele frequency was compared with the allele frequency in QT86 (n = 172 alleles), determined using TaqMan. The panel plots the QE7 allele frequency (x-axis) against the QT86 allele frequency (y-axis).
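The caption does not say which statistic summarizes the agreement between QE7 and the validation cohorts. Pearson's correlation coefficient is one common choice and is assumed here. The sketch uses only the ten rows of the Affymetrix validation table above, so it is not expected to reproduce any statistic computed on the full 131-SNP comparison in panel A. The function name `pearson_r` is hypothetical.

```python
from math import sqrt
from typing import Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / sqrt(sxx * syy)


if __name__ == "__main__":
    # Alt allele frequencies for the ten listed SNPs (PPARG ... PPP1R3A);
    # illustrative only, not the 131-SNP set shown in panel A.
    qe7 = [0.07, 0.29, 0.14, 0.29, 0.14, 0.57, 0.29, 0.14, 0.07, 0.14]
    qa149 = [0.05, 0.28, 0.23, 0.22, 0.21, 0.19, 0.14, 0.13, 0.11, 0.10]
    print(f"r = {pearson_r(qe7, qa149):.2f} over {len(qe7)} SNPs")
```

Because QE7 frequencies can only take values k/14, a single allele moves a QE7 estimate by about 0.07. Correlation over many SNPs is therefore more informative than agreement at any single SNP.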
Predicted Deleterious SNPs in Known Health-associated Genes Enriched in Qatari Exomes Compared to Worldwide Populations and Validated by TaqMan PCR in a Larger Qatari Population.
| Gene symbol | Gene name | Amino acid substitution | dbSNP rsID | Chr | Position | Ref>alt | Risk allele | Health-associated phenotype | QE7 alt allele frequency (exome sequencing, n = 14 alleles) | QT86 alt allele frequency (TaqMan genotyping, n = 172 alleles) | 1000 Genomes Europe | 1000 Genomes Asia | 1000 Genomes Africa | 1000 Genomes America |
| BMP4 | Bone morphogenetic protein IV | Val152Ala | rs17563 | 14 | 54417522 | A>G | G | Osteoporosis | 0.64 | 0.41 | 0.56 | 0.27 | 0.19 | 0.34 |
| ZNF229 | Zinc finger protein 229 | Gly662Arg | rs1434579 | 19 | 44932972 | C>T | T | Resistance to tuberculosis | 0.57 | 0.21 | 0.36 | 0.32 | 0.03 | 0.22 |
| ULK4 | UNC-51-like kinase 4 | Lys569Arg | rs3774372 | 3 | 41877414 | T>C | C | Diastolic blood pressure | 0.57 | 0.34 | 0.20 | 0.15 | 0.23 | 0.15 |
| AKAP13 | A kinase anchor protein 13 | Gly624Val | rs745191 | 15 | 86123170 | G>T | A | Familial breast cancer | 0.43 | 0.27 | 0.31 | 0.02 | 0.01 | 0.19 |
| FMO2 | Flavin monooxygenase II | Ser195Leu | rs2020862 | 1 | 171168584 | C>T | C | Pulmonary disease | 0.71 | 0.51 | 0.25 | 0.48 | 0.39 | 0.38 |
| COL4A3 | Collagen type IV, alpha-3 | Asp326Tyr | rs55703767 | 2 | 228121101 | G>T | G | Keratoconus | 0.43 | 0.21 | 0.23 | 0.11 | 0.01 | 0.14 |
| UTS2 | Urotensin II | Thr21Met | rs228648 | 1 | 7913430 | G>A | G | Type 2 diabetes | 0.79 | 0.71 | 0.58 | 0.32 | 0.46 | 0.44 |
| ACAT2 | Acetyl-CoA acetyltransferase 2 | Lys211Arg | rs25683 | 6 | 160196343 | A>G | * | Plasma lipid levels | 0.64 | 0.48 | 0.56 | 0.15 | 0.14 | 0.49 |
| TTC37 | Tetratricopeptide repeat domain 37 | Arg1296Ser | rs2303650 | 5 | 94826655 | C>A | * | Trichohepatoenteric syndrome | 0.71 | 0.28 | 0.17 | 0.13 | 0.38 | 0.24 |
| PDZRN4 | PDZ domain containing ring finger 4 | Gly171Ser | rs285584 | 12 | 41946539 | G>A | * | Multiple sclerosis | 0.57 | 0.22 | 0.16 | 0.11 | 0.24 | 0.14 |
Analysis of the QE7 exomes (n = 14 alleles) identified 1,841 predicted deleterious SNPs that were observed in at least 1 of 14 QE7 alleles. Their prevalence differed significantly from that in the continental populations represented in the 1000 Genomes Project. Of these, 135 were also observed in at least 6 of 14 QE7 alleles. Of these 135, 39 fell into one of two categories: SNPs previously identified within a health-linked gene (n = 9), or SNPs in a gene previously linked to human health that differ from the previously reported SNP (n = 30). This table lists 10 of these SNPs. They were chosen because they are linked to diseases relevant to Qatar and there is literature supporting their link to human health. The first 7 are from the category "previously identified within a health-linked gene". The last 3 are from the category "a gene previously linked to human health, but a different SNP than previously reported". These 10 SNPs were validated by TaqMan PCR in an independent group of 86 Qataris (QT86, 172 alleles). Of these, 82 overlapped with QA149 and 4 did not.
Gene symbol and name obtained from the Consensus Coding Sequence (CCDS) NCBI database [45], amino acid substitution position and residues obtained from dbSNP when available, otherwise SIFT online webserver [46].
SNP information includes the amino acid substitution, the dbSNP build 134 rsID if available, the chromosome, the position in the GRCh37 human reference genome assembly, and the reference and alternate alleles in QE7. Ref = reference; alt = alternate; * = risk allele could not be determined.
Phenotype information from OMIM [12], HGMD [37], PharmGKB [38] or HUGE [39] database. Shown is the risk allele, the health-associated phenotype and the reference(s).
For more details and references see Details S1.
Shown is the alternate allele frequency determined by TaqMan in QE7 individuals; no genotypes discordant with the QE7 exome sequences were observed.
Shown is the alternate allele frequency in the validation set of QT86 individuals (n = 86 Qatari, 172 alleles). Failed genotypes are accounted for in the allele frequency. For statistical comparisons of the QE7 and QT86 allele frequencies, see Figure 2.
Shown is the alternate allele frequency in the 1000 Genomes Phase 1 population samples: n = 1,092 (379 Europeans, 286 Asians, 185 Africans and 242 Americans; 2,184 alleles). For details of the populations included in the continental allele frequency estimates, see Methods S1.
Affymetrix Microarray Validation of Qatari Exome Predicted Deleterious SNPs in Genes Linked to Human Health, but a Different SNP than Previously Reported.
| Gene symbol | Gene name | Amino acid substitution | dbSNP rsID | Chromosome | Position | Ref>alt | Health-associated phenotype | QE7 alt allele frequency (exome sequencing, n = 14 alleles) | QA149 alt allele frequency (Affymetrix genotyping, n = 298 alleles) |
| HMCN1 | Hemicentin | Gln4437Arg | rs10911825 | 1 | 186101539 | A>G | Age-related macular degeneration | 0.50 | 0.43 |
| IKBKAP | Inhibitor of kappa light polypeptide gene enhancer in B-cells, kinase complex-associated protein | Cys1072Ser | rs3204145 | 9 | 111651620 | A>T | Familial dysautonomia | 0.14 | 0.27 |
| VSX1 | Visual system homeobox gene 1 | Arg217His | rs6138482 | 20 | 25059442 | C>T | Keratoconus, polymorphous corneal dystrophy | 0.21 | 0.19 |
| EVC | Ellis-van Creveld syndrome | Gln74Pro | rs2291157 | 4 | 5721021 | A>C | Ellis-van Creveld syndrome; Weyers acrodental dysostosis | 0.07 | 0.17 |
| SGCG | Sarcoglycan, gamma | Arg116His | rs17314986 | 13 | 23824818 | G>A | Muscular dystrophy, limb-girdle, type 2C | 0.07 | 0.16 |
| SACS | Sacsin | Asn232Lys | rs2031640 | 13 | 23930055 | A>T | Spastic ataxia, Charlevoix-Saguenay type | 0.07 | 0.11 |
| OSMR | Oncostatin M-specific receptor | Glu527Lys | rs10941412 | 5 | 38919158 | G>A | Primary cutaneous amyloidosis | 0.14 | 0.10 |
| ARHGEF10 | Rho guanine nucleotide exchange factor 10 | Ser980Ala | rs17683288 | 8 | 1877480 | T>G | Slow nerve conduction velocity, autosomal dominant | 0.14 | 0.09 |
| CACNA1S | Calcium channel, voltage-dependent, L type, alpha-1S | Arg1539Cys | rs3850625 | 1 | 201016296 | G>A | Hypokalemic periodic paralysis, thyrotoxic periodic paralysis, malignant hyperthermia | 0.07 | 0.08 |
| RSPH4A | Radial spoke head 4, Chlamydomonas, homolog A | Asn627His | rs9488991 | 6 | 116951678 | A>C | Ciliary dyskinesia, primary | 0.07 | 0.05 |
Analysis of the 14 QE7 alleles identified 650 missense coding SNPs in genes previously linked to human health, where the missense SNP differs from the one previously reported (Table 1, row 3). To validate this observation in a larger group of Qataris, the Affymetrix Genome-Wide SNP Array 5.0 was used to genotype an independent group of 149 Qataris (QA149, 298 alleles). Of the 2,750 potentially deleterious missense SNPs identified in at least 1 of the 14 QE7 alleles, 131 were on the microarray. Of these, 49 were in genes linked to human health, including 33 where the gene is linked to human health but the reported link was for a different SNP. Of these 33, 10 are listed as examples, chosen because they are extensively documented in the OMIM database (3).
Gene symbol and name were obtained from the Consensus Coding Sequence (CCDS) NCBI database [32]. Amino acid substitution position and residues were obtained from dbSNP when available, otherwise from the SIFT online webserver [34]. Transcript position and amino acid substitution were verified to be consistent with the literature.
SNP information includes the amino acid substitution, the dbSNP build 134 rsID if available, the chromosome, the position in the GRCh37 human reference genome assembly, and the reference and alternate alleles in QE7. Ref = reference; alt = alternate.
Phenotype information from OMIM [12], HGMD [37], PharmGKB [38] or HUGE [39] database.
See Details S1.
Shown is the alternate allele frequency determined by exome sequencing in QE7 individuals.
Shown is the alternate allele frequency in the validation set of QA149 individuals (n = 149 Qatari, 298 alleles). Failed genotypes are accounted for in the allele frequency. For statistical comparisons of the QE7 and QA149 allele frequencies, see Figure 2.
Figure 3. Principal component analysis (PCA) validation of exome genotypes for the QE7 individuals.
To verify the overall quality of the genotype call set, the seven Qatari exomes were compared using SMARTPCA [14] with 1,092 individuals from four continents (1000 Genomes Project October 2011 Integrated Phase 1 Variant Set Release). The comparison used 18,865 SNPs that segregate in both QE7 and 1000 Genomes and are present in dbSNP build 134. The plot shows PC1 (x-axis) against PC2 (y-axis). Individuals are color-coded by continent of origin (European = red, Asian = green, African = blue, American = grey, Qatar = orange). The clustering of the Qatari individuals was verified to be consistent with our prior report [9]: Q1 individuals cluster near Europeans, Q2 individuals lie between Q1 and Asians, and Q3 individuals lie between Q1 and Africans.
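The core computation behind a genotype PCA can be sketched in plain Python. This is a simplified illustration and not SMARTPCA itself, which adds features such as outlier removal. Each SNP column is centred and scaled by sqrt(p(1 − p)), roughly following the normalization described for EIGENSOFT. The top eigenvectors of the individuals-by-individuals matrix M Mᵀ / m are then found by power iteration with deflation. The function names and toy genotypes are hypothetical. This version is O(n²m) and only suits small examples.

```python
import random
from math import sqrt
from typing import List

Matrix = List[List[float]]


def normalize_genotypes(G: List[List[int]]) -> Matrix:
    """Centre and scale an individuals x SNPs matrix of alt-allele counts (0/1/2).

    Each SNP is centred on its mean and divided by sqrt(p(1 - p)) with
    p = mean / 2; SNPs fixed in the sample are dropped.
    """
    n, m = len(G), len(G[0])
    columns = []
    for j in range(m):
        col = [G[i][j] for i in range(n)]
        mean = sum(col) / n
        p = mean / 2
        if p <= 0 or p >= 1:
            continue  # fixed SNP: carries no information
        sd = sqrt(p * (1 - p))
        columns.append([(g - mean) / sd for g in col])
    if not columns:
        raise ValueError("no polymorphic SNPs")
    return [list(row) for row in zip(*columns)]  # back to individuals x SNPs


def top_components(M: Matrix, k: int = 2, iters: int = 200, seed: int = 0) -> Matrix:
    """Top-k unit eigenvectors of X = M M^T / m via power iteration with deflation."""
    n, m = len(M), len(M[0])
    X = [[sum(a * b for a, b in zip(M[i], M[j])) / m for j in range(n)] for i in range(n)]
    rng = random.Random(seed)
    components = []
    for _ in range(k):
        v = [rng.uniform(-1.0, 1.0) for _ in range(n)]
        for _ in range(iters):
            w = [sum(X[i][j] * v[j] for j in range(n)) for i in range(n)]
            norm = sqrt(sum(x * x for x in w))
            if norm == 0.0:
                v = w  # remaining matrix is zero: no further component
                break
            v = [x / norm for x in w]
        eigenvalue = sum(v[i] * X[i][j] * v[j] for i in range(n) for j in range(n))
        components.append(v)
        # Deflate so the next pass converges to the next-largest component.
        X = [[X[i][j] - eigenvalue * v[i] * v[j] for j in range(n)] for i in range(n)]
    return components


if __name__ == "__main__":
    # Toy data: two hypothetical groups with opposite genotypes at four SNPs.
    G = [[0, 0, 2, 2, 1]] * 3 + [[2, 2, 0, 0, 1]] * 3
    pc1, pc2 = top_components(normalize_genotypes(G), k=2)
    print([round(x, 2) for x in pc1])
```

In the toy data, PC1 places the two groups on opposite sides of zero. The PCA figure relies on the same kind of separation when it places the Qatari subgroups relative to the continental clusters.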
Figure 4. Identification of autosomal exome SNPs in the QE7 Qatari individuals with an allele frequency distinct from at least one continent (Europe, Asia, Africa, the Americas), estimated from the exomes vs 1000 Genomes.
A. Illustration of threshold selection. The panel plots the fixation index (Fst; x-axis) against -log10(q-value) (y-axis) from a binomial test of all SNPs assessed versus each continent. Colors: red = QE7 vs Europeans, green = QE7 vs Asians, blue = QE7 vs Africans, tan = QE7 vs Americans. The threshold selected for identifying enriched SNPs is marked: Fst >0.25 [43] and FDR <0.05 [44].

B–E. Heat maps of the false discovery rate [44] for enrichment of a higher or lower than expected number of alternate alleles, tested on 126,924 exome SNPs. Each map plots the allele counts in the 7 Qatari exomes (y-axis) against the alternate allele frequency in a 1000 Genomes continental population (x-axis). The maps combine the FDR and Fst thresholds for all SNPs: enriched = red = Fst >0.25 and FDR <0.05; not enriched = blue = Fst <0.25 or FDR >0.05; white = no observations. B. Qataris vs Europeans (EUR). C. Qataris vs Asians (ASN). D. Qataris vs Africans (AFR). E. Qataris vs Americans (AMR).

F. Venn diagram of 25,803 SNPs enriched in Qataris vs at least one continent.
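The per-continent enrichment call in panel A can be sketched under stated assumptions. The caption specifies neither the binomial test's sidedness nor the Fst estimator. This sketch therefore makes three choices. First, it uses an exact two-sided binomial test of the QE7 alternate allele count (out of 14) against the 1000 Genomes continental frequency. Second, it converts p-values to Benjamini-Hochberg q-values. Third, it computes a simple two-population Fst of the form (H_T − H_S)/H_T from the two allele frequencies. Only the thresholds, Fst >0.25 and FDR <0.05, come from the source. All function names and example SNPs are hypothetical.

```python
from math import comb
from typing import List, Sequence, Tuple


def binom_two_sided_p(k: int, n: int, p: float) -> float:
    """Exact two-sided binomial p-value: total probability of outcomes no more
    likely than the observed count k (small relative tolerance for ties)."""
    probs = [comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(n + 1)]
    cutoff = probs[k] * (1 + 1e-7)
    return min(1.0, sum(q for q in probs if q <= cutoff))


def bh_qvalues(pvals: Sequence[float]) -> List[float]:
    """Benjamini-Hochberg adjusted p-values (q-values), returned in input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    q = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest p-value down
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        q[i] = running_min
    return q


def fst_two_pop(p1: float, p2: float) -> float:
    """Assumed estimator: Fst = (H_T - H_S) / H_T for two equally weighted populations."""
    p_bar = (p1 + p2) / 2
    h_t = 2 * p_bar * (1 - p_bar)
    if h_t == 0:
        return 0.0
    h_s = (2 * p1 * (1 - p1) + 2 * p2 * (1 - p2)) / 2
    return (h_t - h_s) / h_t


def enriched_snps(
    records: Sequence[Tuple[str, int, float]],
    n_alleles: int = 14,
    fst_min: float = 0.25,
    fdr_max: float = 0.05,
) -> List[str]:
    """records: (snp_id, QE7 alt allele count, continental alt allele frequency)
    for ONE continent; returns the SNPs passing both thresholds."""
    pvals = [binom_two_sided_p(k, n_alleles, p) for _, k, p in records]
    qvals = bh_qvalues(pvals)
    return [
        snp
        for (snp, k, p), q in zip(records, qvals)
        if fst_two_pop(k / n_alleles, p) > fst_min and q < fdr_max
    ]


if __name__ == "__main__":
    # Hypothetical SNPs: one strongly shifted, one close to the continental frequency.
    demo = [("snp_a", 14, 0.01), ("snp_b", 1, 0.07)]
    print(enriched_snps(demo))
```

Within one continent, a SNP's p-value depends only on its pair (QE7 count, continental frequency), and BH q-values preserve the ordering of p-values. Enrichment calls can therefore be laid out on a grid of those two values, as the heat maps in panels B–E do.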