Shai Carmi, Ken Y Hui, Ethan Kochav, Xinmin Liu, James Xue, Fillan Grady, Saurav Guha, Kinnari Upadhyay, Dan Ben-Avraham, Semanti Mukherjee, B Monica Bowen, Tinu Thomas, Joseph Vijai, Marc Cruts, Guy Froyen, Diether Lambrechts, Stéphane Plaisance, Christine Van Broeckhoven, Philip Van Damme, Herwig Van Marck, Nir Barzilai, Ariel Darvasi, Kenneth Offit, Susan Bressman, Laurie J Ozelius, Inga Peter, Judy H Cho, Harry Ostrer, Gil Atzmon, Lorraine N Clark, Todd Lencz, Itsik Pe'er.
Abstract
The Ashkenazi Jewish (AJ) population is a genetic isolate close to European and Middle Eastern groups, with genetic diversity patterns conducive to disease mapping. Here we report high-depth sequencing of 128 complete genomes of AJ controls. Compared with European samples, our AJ panel has 47% more novel variants per genome and is eightfold more effective at filtering benign variants out of AJ clinical genomes. Our panel improves imputation accuracy for AJ SNP arrays by 28%, and covers at least one haplotype in ≈ 67% of any AJ genome with long, identical-by-descent segments. Reconstruction of recent AJ history from such segments confirms a recent bottleneck of merely ≈ 350 individuals. Modelling of ancient histories for AJ and European populations using their joint allele frequency spectrum determines AJ to be an even admixture of European and likely Middle Eastern origins. We date the split between the two ancestral populations to ≈ 12-25 Kyr, suggesting a predominantly Near Eastern source for the repopulation of Europe after the Last Glacial Maximum.
Year: 2014 PMID: 25203624 PMCID: PMC4164776 DOI: 10.1038/ncomms5835
Source DB: PubMed Journal: Nat Commun ISSN: 2041-1723 Impact factor: 14.919
Figure 1. Novel variants discovered in Ashkenazi Jewish and Flemish genomes.
(a) Variant counts (all and heterozygous; left) and fraction novel (right) per genome in the Ashkenazi Jewish (AJ) and Flemish (FL) cohorts (corresponding to ≈80% of the raw variants remaining after QC and cohort merging; Supplementary Note 2; error bars represent s.d.). (b) Efficiency of filtering all novel variants detected in an AJ personal genome, measured by counting those that remain new after filtering such a genome against either FL or AJ panels of a matched size (n=26) or our complete AJ panel (n=127). Left: all novel variants; right: non-synonymous novel variants. Error bars represent s.d. (c) The number of newly discovered segregating sites in AJ and FL versus the number of already sequenced individuals in each cohort (markers). Dashed and solid lines are expectations based on either a constant size or a bottleneck and growth model (bn/growth), respectively, fitted to each population separately (Supplementary Note 3). The inset magnifies the region (0, 10).
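The constant-size expectation in panel (c) can be illustrated with a minimal sketch based on Watterson's classical result, E[S] = θ·H(n−1) for n sampled chromosomes. This is an assumption about the general form of such a curve, not the fitting procedure of Supplementary Note 3; the function names and the default `theta` (a scaled per-sequence mutation rate) are hypothetical.

```python
def harmonic_number(m):
    """Return H_m = 1 + 1/2 + ... + 1/m (H_0 = 0)."""
    return sum(1.0 / i for i in range(1, m + 1))


def expected_segregating_sites(n_individuals, theta=1.0):
    """Expected segregating sites among 2 * n_individuals chromosomes.

    Watterson's result for a constant-size population:
    E[S] = theta * H_{n-1}, where n is the number of sampled chromosomes.
    theta is a hypothetical scaled mutation rate for the whole sequence.
    """
    if n_individuals < 1:
        return 0.0
    n_chromosomes = 2 * n_individuals
    return theta * harmonic_number(n_chromosomes - 1)


def expected_new_sites(k, theta=1.0):
    """Expected sites first seen when the k-th diploid genome is added."""
    if k < 1:
        raise ValueError("k must be at least 1")
    return (expected_segregating_sites(k, theta)
            - expected_segregating_sites(k - 1, theta))


if __name__ == "__main__":
    # Discovery slows as more genomes are sequenced.
    for k in (1, 2, 5, 10, 25):
        print(k, expected_new_sites(k))
```

Under this model each additional genome contributes roughly θ/k new sites, so the curve flattens quickly. A bottleneck and growth model changes that shape, which is what the solid lines in the panel capture.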
Figure 2. Utility of the AJ reference panel in IBD-based and traditional imputation.
(a) The distribution, over all pairs of individuals, of the fraction of the genome shared IBD (segment lengths >3 cM) either within AJ, within FL or between AJ and FL. (b) The average fraction of a genome (in AJ and CEU) where at least one haplotype is covered by segments shared with a population-matched panel. Data points (markers) were fit to (lines), where c is the average coverage and n is the number of individuals in the panel (Supplementary Note 4). (c) The aggregate r² (over the AJ study genomes) between the true and the imputed dosages versus the minor allele frequency, when imputing an AJ genome using a reference panel consisting of either AJ or CEU genomes.
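One common reading of "aggregate r²" in panel (c) is a single squared Pearson correlation computed over all genotypes pooled within a minor allele frequency bin. The sketch below follows that reading, which is an assumption rather than the paper's documented procedure; the function names, the bin convention (lo, hi] and the input layout (one row of dosages per variant) are hypothetical.

```python
import math


def pooled_r2(x, y):
    """Squared Pearson correlation of two equal-length sequences."""
    n = len(x)
    if n < 2:
        return math.nan
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    if sxx == 0 or syy == 0:
        return math.nan  # correlation undefined without variation
    return sxy * sxy / (sxx * syy)


def aggregate_r2_by_maf(true_dosages, imputed_dosages, mafs, bin_edges):
    """Return one pooled r^2 per MAF bin (lo, hi].

    true_dosages / imputed_dosages: one row per variant, one value per sample.
    mafs: minor allele frequency of each variant.
    """
    results = []
    for lo, hi in zip(bin_edges[:-1], bin_edges[1:]):
        xs, ys = [], []
        for true_row, imputed_row, maf in zip(true_dosages, imputed_dosages, mafs):
            if lo < maf <= hi:
                xs.extend(true_row)
                ys.extend(imputed_row)
        results.append(pooled_r2(xs, ys))
    return results


if __name__ == "__main__":
    # Toy data: two variants, four samples (values are illustrative only).
    true_d = [[0, 1, 2, 1], [0, 0, 1, 0]]
    imputed_d = [[0.1, 0.9, 1.8, 1.2], [0.0, 0.1, 0.7, 0.0]]
    print(aggregate_r2_by_maf(true_d, imputed_d, [0.5, 0.125], [0.0, 0.2, 0.5]))
```

Pooling before correlating gives rare variants a stable estimate even when any single variant has too few minor alleles for a meaningful per-site r².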
Figure 3. The AFS and the lengths of shared segments.
(a) The (normalized) minor allele frequency spectrum in AJ and FL, shown as counts in subsets of n=25 genomes in each cohort. The green line corresponds to the expectation in a constant-size population (Wright–Fisher), and bars represent deviations in AJ and FL. The inset shows the spectra of alleles private to each population. (b) A heat map of the joint (minor) allele frequency spectrum of AJ and FL (lower left triangle) compared with the expected joint AFS, had population labels been random (upper right triangle; ref. 33). (c) The average fraction of the genome found in shared segments versus the segment length (AJ only; circles), along with the best fit to a recent bottleneck and growth model (solid blue line; Fig. 4) and the expectation in a constant-size population with the same total sharing (dashed green line).
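The constant-size (Wright–Fisher) expectation in panel (a) follows from the standard neutral result that the number of sites with i derived copies is proportional to 1/i. A minimal sketch of the corresponding normalized minor allele spectrum is given below; the function name is hypothetical, and treating n=25 diploid genomes as 50 chromosomes is an assumption about how the panel was computed.

```python
def expected_folded_afs(n_chromosomes):
    """Normalized expected minor allele frequency spectrum, constant size.

    Entry i-1 is the expected fraction of segregating sites whose minor
    allele appears i times, for i = 1 .. n // 2. Folding merges derived
    allele counts i and n - i; when i == n - i the class is counted once.
    """
    n = n_chromosomes
    if n < 2:
        raise ValueError("need at least two chromosomes")
    weights = []
    for i in range(1, n // 2 + 1):
        if i == n - i:
            weights.append(1.0 / i)
        else:
            weights.append(1.0 / i + 1.0 / (n - i))
    total = sum(weights)
    return [w / total for w in weights]


if __name__ == "__main__":
    # 25 diploid genomes are assumed to correspond to 50 chromosomes.
    spectrum = expected_folded_afs(50)
    for count, fraction in enumerate(spectrum[:5], start=1):
        print(count, fraction)
```

Deviations of an observed spectrum from this baseline, such as an excess of rare alleles, are what the bars in the panel display.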
Figure 4. A reconstruction of the AJ and FL demographic history.
The upper part of the diagram shows the reconstruction of the ancient history by fitting the joint AFS (Fig. 3b) using ∂a∂i (ref. 26) and using a mutation rate of 1.44 × 10⁻⁸ per generation per bp. The lower diagram shows the recent AJ history, reconstructed by fitting the IBD length decay pattern (Fig. 3c). The wide arrow represents an admixture event; all effective population sizes (horizontal arrows) are in number of diploid individuals; all times were computed assuming 25 years per generation. Confidence intervals are provided in Supplementary Tables 6 and 7.
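The mutation rate and generation time quoted in the caption are what turn scaled model parameters into individuals and years. The sketch below assumes the usual ∂a∂i convention (θ = 4·N_ref·μ·L, times in units of 2·N_ref generations, sizes relative to N_ref); the function names and the example inputs for θ, sequence length and scaled time are hypothetical and are not values from the paper.

```python
MU_PER_BP_PER_GENERATION = 1.44e-8  # mutation rate quoted in the caption
YEARS_PER_GENERATION = 25           # generation time quoted in the caption


def reference_size(theta, sequence_length_bp, mu=MU_PER_BP_PER_GENERATION):
    """Reference effective size N_ref from theta = 4 * N_ref * mu * L."""
    return theta / (4.0 * mu * sequence_length_bp)


def size_in_individuals(relative_size, n_ref):
    """Convert a size given relative to N_ref into diploid individuals."""
    return relative_size * n_ref


def time_in_years(scaled_time, n_ref, years_per_generation=YEARS_PER_GENERATION):
    """Convert a time in units of 2 * N_ref generations into years."""
    return scaled_time * 2.0 * n_ref * years_per_generation


if __name__ == "__main__":
    # Hypothetical inputs chosen only to illustrate the arithmetic.
    n_ref = reference_size(theta=576.0, sequence_length_bp=1e6)
    print(n_ref)
    print(size_in_individuals(0.5, n_ref))
    print(time_in_years(0.01, n_ref))
```

Because both conversions scale with 1/μ, a different assumed mutation rate would shift every size and date in the diagram proportionally.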