Manuel A Rivas, Brandon E Avila, Jukka Koskela, Hailiang Huang, Christine Stevens, Matti Pirinen, Talin Haritunians, Benjamin M Neale, Mitja Kurki, Andrea Ganna, Daniel Graham, Benjamin Glaser, Inga Peter, Gil Atzmon, Nir Barzilai, Adam P Levine, Elena Schiff, Nikolas Pontikos, Ben Weisburd, Monkol Lek, Konrad J Karczewski, Jonathan Bloom, Eric V Minikel, Britt-Sabina Petersen, Laurent Beaugerie, Philippe Seksik, Jacques Cosnes, Stefan Schreiber, Bernd Bokemeyer, Johannes Bethge, Graham Heap, Tariq Ahmad, Vincent Plagnol, Anthony W Segal, Stephan Targan, Dan Turner, Paivi Saavalainen, Martti Farkkila, Kimmo Kontula, Aarno Palotie, Steven R Brant, Richard H Duerr, Mark S Silverberg, John D Rioux, Rinse K Weersma, Andre Franke, Luke Jostins, Carl A Anderson, Jeffrey C Barrett, Daniel G MacArthur, Chaim Jalas, Harry Sokol, Ramnik J Xavier, Ann Pulver, Judy H Cho, Dermot P B McGovern, Mark J Daly.
Abstract
As part of a broader collaborative network of exome sequencing studies, we developed a jointly called data set of 5,685 Ashkenazi Jewish (AJ) exomes. We make publicly available a resource of site and allele frequencies, which should serve as a reference for medical genetics in the Ashkenazim (hosted in part at https://ibd.broadinstitute.org, also available in gnomAD at http://gnomad.broadinstitute.org). We estimate that 34% of protein-coding alleles present in the Ashkenazi Jewish population at frequencies greater than 0.2% are significantly more frequent (mean 15-fold) than their maximum frequency observed in other reference populations. Arising via a well-described founder effect approximately 30 generations ago, this catalog of enriched alleles can contribute to differences in genetic risk and overall prevalence of diseases between populations. As validation we document 48 AJ-enriched protein-altering alleles that overlap with "pathogenic" ClinVar alleles (table available at https://github.com/macarthur-lab/clinvar/blob/master/output/clinvar.tsv), including those that account for 10-100 fold differences in prevalence between AJ and non-AJ populations of some rare diseases, especially recessive conditions, including Gaucher disease (GBA, p.Asn409Ser, 8-fold enrichment); Canavan disease (ASPA, p.Glu285Ala, 12-fold enrichment); and Tay-Sachs disease (HEXA, c.1421+1G>C, 27-fold enrichment; p.Tyr427IlefsTer5, 12-fold enrichment). We next sought to use this catalog, of well-established relevance to Mendelian disease, to explore Crohn's disease, a common disease with an estimated two- to four-fold excess prevalence in AJ.
We specifically attempt to evaluate whether strong-acting rare alleles, particularly protein-truncating or otherwise large effect-size alleles, enriched by the same founder effect, contribute excess genetic risk to Crohn's disease (CD) in AJ, and find that ten rare genetic risk factors in NOD2 and LRRK2, including several novel contributing alleles, are enriched in AJ (p < 0.005) and show evidence of association to CD. Independently, we find that genome-wide common variant risk defined by GWAS shows a strong difference between AJ and non-AJ European control population samples (0.97 s.d. higher, p < 10^-16). Taken together, the results suggest coordinated selection in the AJ population for higher CD risk alleles in general. The results and approach illustrate the value of exome sequencing data in case-control studies along with reference data sets like ExAC (sites VCF available via FTP at ftp.broadinstitute.org/pub/ExAC_release/release0.3/) to pinpoint genetic variation that contributes to variable disease predisposition across populations.
Year: 2018 PMID: 29795570 PMCID: PMC5967709 DOI: 10.1371/journal.pgen.1007329
Source DB: PubMed Journal: PLoS Genet ISSN: 1553-7390 Impact factor: 5.917
Fig 1. Enrichment of alleles discovered in AJ exome sequencing project.
A) Histogram of the estimated log enrichment statistic, defined as the log of the bias-corrected odds ratio comparing the allele frequency in the AJ population to the maximum allele frequency estimated from the NFE, AFR, and AMR populations in ExAC. For each histogram bin we show a bar plot of the expected number of alleles belonging to the two groups we analyzed: 1) enriched (green) and 2) not enriched (white). B) Bar plots of the estimated percentage of alleles belonging to the two groups we analyzed for all protein-coding (ALL), synonymous (SYN), protein-altering (PRA), and protein-truncating variants (PTV). An estimated 34% of protein-coding alleles observed in AJ have a mean shift of 15-fold increased odds of the alternate allele compared to other reference populations. This observation is supported by the property that, compared to intergenic variants, coding variants tend to be younger for a given frequency, and the more pathogenic a variant, the younger it is; such variants therefore tend to be population specific[13].
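The enrichment statistic in panel A can be illustrated with a minimal sketch. The caption does not say which bias correction was used, so the code below assumes the Haldane-Anscombe correction (adding 0.5 to every cell of the 2×2 allele-count table); the function name `log_enrichment` and all counts are hypothetical and chosen only for illustration.

```python
import math

def log_enrichment(aj_alt, aj_total, ref_counts):
    """Log odds ratio of the alternate allele in AJ versus the reference
    population with the highest allele frequency.

    aj_alt, aj_total: alternate-allele count and total allele count in AJ.
    ref_counts: dict mapping population label -> (alt_count, total_count).
    The +0.5 added to each cell is the Haldane-Anscombe correction, an
    ASSUMED stand-in for the bias correction mentioned in the caption.
    """
    # Pick the reference population with the maximum allele frequency.
    ref_alt, ref_total = max(ref_counts.values(), key=lambda c: c[0] / c[1])
    a = aj_alt + 0.5                # AJ alternate alleles
    b = aj_total - aj_alt + 0.5     # AJ reference alleles
    c = ref_alt + 0.5               # comparison-population alternate alleles
    d = ref_total - ref_alt + 0.5   # comparison-population reference alleles
    return math.log((a * d) / (b * c))

# Hypothetical counts, for illustration only.
example = log_enrichment(
    aj_alt=40,
    aj_total=10000,
    ref_counts={"NFE": (6, 60000), "AFR": (0, 10000), "AMR": (1, 11000)},
)
print(round(math.exp(example), 1))  # odds ratio on the natural scale
```

The correction keeps the statistic finite when an allele is absent from every reference population, which matters here because many of the most enriched alleles have a maximum reference frequency near zero.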
Forty-eight ClinVar “pathogenic” alleles enriched in AJ.
HGVS and Gene are the allele nomenclature in ClinVar and the gene symbol, respectively. Enrichment odds ratio corresponds to the bias-corrected comparison of allele frequency in AJ (AJ AF) to the maximum frequency among three population groups (Max ExAC AF): 1) NFE; 2) AMR; and 3) AFR. Curated trait is based on the trait description in the Online Mendelian Inheritance in Man (OMIM) and is independent of effect size as a Crohn’s risk allele. Inheritance corresponds to the inheritance description in OMIM (AR: autosomal recessive, AD: autosomal dominant, risk factor: not specified genetic risk factor). Alleles are sorted in decreasing order by AJ AF.
| Variant | HGVS | Gene | Enrichment Odds Ratio | AJ AF | Max ExAC AF | Curated Traits | Inheritance |
|---|---|---|---|---|---|---|---|
| | p.Val726Ala | | 26.08 | 0.0416 | 0.0017 | Familial Mediterranean fever | AR |
| | p.Gly87Val | | 3.51 | 0.0414 | 0.0122 | Hyperglycinuria | AD |
| | p.Asn409Ser | | 11.16 | 0.0296 | 0.0027 | Susceptibility to Lewy body dementia, Gaucher’s disease, Susceptibility to late onset Parkinson’s disease | AR |
| | p.Phe301Leu | | 47.17 | 0.0273 | 0.0006 | Hereditary factor XI deficiency | AR |
| | p.Leu56Argfs | | 39.19 | 0.0199 | 0.0005 | Autosomal recessive deafness | AR |
| | p.Glu135Ter | | 28.20 | 0.0195 | 0.0007 | Factor XI deficiency | AR |
| | p.Arg49Cys | | 16.12 | 0.0189 | 0.0012 | Salivary peroxidase | AR |
| | c.2204+6T>C | | 45.22 | 0.0168 | 0.0004 | Familial dysautonomia | AR |
| | p.Tyr427IlefsTer5 | | 19.14 | 0.0122 | 0.00064 | Tay-Sachs disease | AR |
| | p.Arg4192His | | 13.63 | 0.0106 | 0.0008 | Retinitis pigmentosa | AR |
| | p.Ser428Phe | | 50.06 | 0.0103 | 0.0002 | Hereditary cancer, multiple types | Risk factor |
| | p.Glu315del | | 29.28 | 0.0101 | 0.0003 | Primary hyperoxaluria | AR |
| | p.Trp1282Ter | | 23.64 | 0.0085 | 0.0004 | Cystic fibrosis | AR |
| | c.3992-9G>A | | 40.62 | 0.0076 | 0.0002 | Hyperinsulinemic hypoglycemia | AR, AD |
| | p.Glu285Ala | | 40.36 | 0.0076 | 0.0002 | Canavan disease | AR |
| | c.101+1G>A | | 26.11 | 0.0074 | 0.0003 | Achromatopsia | AR |
| | p.Ser1982Argfs | | 27.57 | 0.0069 | 0.0003 | Hereditary cancer, multiple types | Risk factor |
| | c.456+4A>T | | 42.75 | 0.0069 | 0.0002 | Fanconi anemia | AR |
| | p.Phe390Ilefs | | 32.62 | 0.0067 | 0.0002 | Limb-girdle muscular dystrophy-dystroglycanopathy | AR |
| | p.Gly2019Ser | | 20.64 | 0.0064 | 0.0003 | Parkinson’s disease | Risk factor |
| | p.Arg83Cys | | 11.04 | 0.0062 | 0.0006 | Glycogen storage disease | AR |
| | p.Lys42Glu | | 64.83 | 0.0051 | 0.0001 | Retinitis pigmentosa | AR |
| | p.Asn48Lys | | 46.26 | 0.0051 | 0.0001 | Usher syndrome | AR |
| | p.Ile293Profs | | 25.75 | 0.0048 | 0.0002 | Ciliary dyskinesia without situs inversus | AR |
| | p.Arg183Pro | | 29.42 | 0.0046 | 0.0002 | Maple syrup disease | AR |
| | p.Arg245Ter | | 26.58 | 0.0046 | 0.0002 | Usher syndrome | AR |
| | p.Gly229Cys | | 26.55 | 0.0046 | 0.0002 | Maple syrup disease | AR |
| | c.1421+1G>C | | 52.65 | 0.0044 | 0.0001 | Tay-Sachs disease | AR |
| | p.Arg311Gln | | 9.86 | 0.0042 | 0.0004 | Enhanced s-cone syndrome | AR |
| | p.Gln225Ter | | 129.41 | 0.0041 | 0.0000 | Ehlers-Danlos syndrome, dermatosparaxis type | AR |
| | p.Ala612Thr | | 12.48 | 0.0039 | 0.0003 | Early-onset sarcoidosis | Risk factor |
| | p.Arg498Leu | | 41.53 | 0.0039 | 0.0001 | Niemann-Pick disease | AR |
| | p.Arg73Leu | | 27.77 | 0.0039 | 0.0001 | Joubert syndrome | AR |
| | p.Lys414ThrfsTer7 | | 78.34 | 0.0037 | 0.0000 | Carnitine palmitoyltransferase II deficiency | AR |
| | p.Phe448Leu | | 78.35 | 0.0037 | 0.0000 | Carnitine palmitoyltransferase II deficiency | AR |
| | p.Arg283Gln | | 9.79 | 0.0037 | 0.0004 | Spermatogenic failure | AR |
| | p.Val54Leu | | 47.03 | 0.0037 | 0.0001 | Hypomyelinating leukodystrophy | AR |
| | p.Arg119Ter | | 20.03 | 0.0034 | 0.0002 | Peroxisome biogenesis disorder | AR |
| | p.Cys845Gly | | 190.98 | 0.0030 | 0.0000 | Hypomyelinating leukodystrophy | AR |
| | p.Gln279Ter | | 29.25 | 0.0028 | 0.0001 | Leber congenital amaurosis | AR |
| | c.406-2A>G | | 21.93 | 0.0028 | 0.0001 | Mucolipidosis | AR |
| | p.Arg632Pro | | 29.37 | 0.0028 | 0.0001 | Retinitis pigmentosa | AR |
| | p.Glu23Valfs | | 10.04 | 0.0025 | 0.0003 | Hereditary cancer, multiple types | Risk factor |
| | p.Gly865Ter | | 40.38 | 0.0025 | 0.0001 | Abetalipoproteinaemia | AR |
| | p.Gly557Arg | | 29.36 | 0.0023 | 0.0001 | Achromatopsia | AR |
| | p.Glu375Lys | | 26.44 | 0.0021 | 0.0001 | Maple syrup disease | AR |
| | p.Gln1756Profs | | 8.80 | 0.0021 | 0.0002 | Hereditary cancer, multiple types | Risk factor |
| | p.Gly287Val | | 22.01 | 0.0021 | 0.0001 | Primary hyperoxaluria | AR |
Fig 2. Q-Q plots of enriched alleles.
Q-Q plots of Crohn’s disease association for AJ enriched A) protein-altering (protein-truncating and missense) and B) synonymous alleles in GWAS regions; and AJ enriched C) protein-altering and D) synonymous alleles outside of GWAS regions. For each Q-Q plot variants with a corresponding p-value less than or equal to a threshold where expected number of false discoveries is equal to one are annotated. The black dashed line is y = x, and the grey shapes show 95% confidence interval under the null.
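The annotation rule in this caption can be sketched in a few lines. Under the null hypothesis the expected number of p-values at or below a cut-off t among m tests is m·t, so the cut-off at which one false discovery is expected is 1/m. The function names and the (i − 0.5)/m plotting position below are assumptions made for illustration, not details taken from the paper.

```python
import math

def annotation_threshold(m):
    """p-value cut-off at which the expected number of null false
    discoveries among m independent tests equals one (m * t = 1)."""
    return 1.0 / m

def qq_points(p_values):
    """Return (expected, observed) pairs of -log10(p), most significant first.

    Assumes every p-value is strictly positive. The i-th smallest of m
    null p-values is placed at (i - 0.5) / m, a common plotting position
    (an assumption; other conventions such as i / (m + 1) also exist).
    """
    m = len(p_values)
    ordered = sorted(p_values)
    return [
        (-math.log10((i - 0.5) / m), -math.log10(p))
        for i, p in enumerate(ordered, start=1)
    ]

def annotated(p_values):
    """p-values that fall at or below the one-expected-false-discovery cut-off."""
    t = annotation_threshold(len(p_values))
    return [p for p in p_values if p <= t]

# Hypothetical p-values, for illustration only.
pvals = [0.62, 0.0004, 0.31, 0.048, 0.9, 0.17, 0.75, 0.02, 0.44, 0.55]
points = qq_points(pvals)
flagged = annotated(pvals)
```

With ten tests the cut-off is 0.1, so three of the hypothetical p-values would be annotated; points lying above the y = x line are more significant than expected under the null.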
Fig 3. AJ individuals have higher CD polygenic risk score than NJ controls.
NJ: non-Jewish; AJ: Ashkenazi Jewish; CD: Crohn’s disease; PRS: polygenic risk score. A) Density plot of CD polygenic risk scores in 454 AJ (green) and 35,007 NJ (purple) controls. AJ controls have higher CD polygenic risk score than NJ controls (0.97 s.d. higher, p < 10^-16). B) Density plot of CD polygenic risk scores in 1,938 AJ (green) and 20,652 NJ (purple) CD cases (0.54 s.d. higher, p < 10^-16). For both density plots the scores have been scaled to NJ controls, thus resulting in an NJ control PRS density of mean equal to 0 and variance equal to 1 (see Online Methods). C) CD-associated variants ranked in decreasing order by estimated contribution to the differences in genetic risk between AJ and NJ. Associated variants with estimated contribution greater than or equal to 0.01, computed as 2 × log(odds ratio) × (AJ frequency − NJ frequency), assuming additive effects on the log scale, are highlighted in green. Associated variants with estimated contribution less than or equal to -0.01 are highlighted in purple. Forward slashes represent a break in variants highlighted.
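The two calculations described in this caption, the per-variant contribution in panel C and the scaling to NJ controls in panels A and B, can be sketched as follows. This is an illustration under stated assumptions: the logarithm is taken to be the natural log, the function names are hypothetical, and the odds ratios, frequencies, and scores are invented example values rather than data from the study.

```python
import math
from statistics import mean, pstdev

def contribution(odds_ratio, aj_freq, nj_freq):
    """Per-variant contribution to the AJ-NJ difference in genetic risk:
    2 * log(odds ratio) * (AJ frequency - NJ frequency).
    The factor 2 reflects two allele copies per person under additive
    effects on the log-odds scale. Natural log is ASSUMED."""
    return 2.0 * math.log(odds_ratio) * (aj_freq - nj_freq)

def scale_to_reference(scores, reference_scores):
    """Standardise scores so the reference group (here NJ controls)
    has mean 0 and variance 1."""
    mu = mean(reference_scores)
    sd = pstdev(reference_scores)  # population s.d., so reference variance is exactly 1
    return [(s - mu) / sd for s in scores]

# Hypothetical variants: (odds ratio, AJ frequency, NJ frequency).
variants = [(1.5, 0.30, 0.25), (0.8, 0.40, 0.30), (1.1, 0.20, 0.20)]
contributions = [contribution(*v) for v in variants]

# Colour rule from the caption: >= 0.01 green, <= -0.01 purple.
colours = [
    "green" if c >= 0.01 else "purple" if c <= -0.01 else "none"
    for c in contributions
]

# Hypothetical raw polygenic risk scores.
nj_controls = [1.0, 2.0, 3.0, 4.0, 5.0]
aj_controls = [3.5, 4.5, 5.5]
aj_scaled = scale_to_reference(aj_controls, nj_controls)
shift_in_sd = mean(aj_scaled)  # difference in means, in NJ-control s.d. units
```

A risk allele (odds ratio above 1) that is more frequent in AJ gives a positive contribution, while a protective allele (odds ratio below 1) that is more frequent in AJ gives a negative one, which is why the second hypothetical variant is coloured purple.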