Swapan Mallick1,2,3, Heng Li2, Mark Lipson1, Iain Mathieson1, Melissa Gymrek2,4,5,6, Fernando Racimo7, Mengyao Zhao1,2,3, Niru Chennagiri1,2,3, Susanne Nordenfelt1,2,3, Arti Tandon1,2, Pontus Skoglund1,2, Iosif Lazaridis1,2, Sriram Sankararaman1,2, Qiaomei Fu1,2,8, Nadin Rohland1,2, Gabriel Renaud9, Yaniv Erlich6,10,11, Thomas Willems6,12, Carla Gallo13, Jeffrey P Spence14, Yun S Song15,16,17, Giovanni Poletti13, Francois Balloux18, George van Driem19, Peter de Knijff20, Irene Gallego Romero21,22, Aashish R Jha23, Doron M Behar24, Claudio M Bravi25, Cristian Capelli26, Tor Hervig27, Andres Moreno-Estrada28, Olga L Posukh29,30, Elena Balanovska31, Oleg Balanovsky31,32,33, Sena Karachanak-Yankova34, Hovhannes Sahakyan24,35, Draga Toncheva34, Levon Yepiskoposyan35, Chris Tyler-Smith36, Yali Xue36, M Syafiq Abdullah37, Andres Ruiz-Linares38, Cynthia M Beall39, Anna Di Rienzo23, Choongwon Jeong23, Elena B Starikovskaya40, Ene Metspalu24,41, Jüri Parik24, Richard Villems24,41,42, Brenna M Henn43, Ugur Hodoglugil44, Robert Mahley45, Antti Sajantila46, George Stamatoyannopoulos47, Joseph T S Wee48, Rita Khusainova49,50, Elza Khusnutdinova49,50, Sergey Litvinov24,49,50, George Ayodo51, David Comas52, Michael F Hammer53, Toomas Kivisild24,54, William Klitz6, Cheryl A Winkler55, Damian Labuda56, Michael Bamshad57, Lynn B Jorde58, Sarah A Tishkoff59, W Scott Watkins60, Mait Metspalu24, Stanislav Dryomov40,61, Rem Sukernik40,62, Lalji Singh63, Kumarasamy Thangaraj63, Svante Pääbo9, Janet Kelso9, Nick Patterson2, David Reich1,2,3.
Abstract
Here we report the Simons Genome Diversity Project data set: high quality genomes from 300 individuals from 142 diverse populations. These genomes include at least 5.8 million base pairs that are not present in the human reference genome. Our analysis reveals key features of the landscape of human genome variation, including that the rate of accumulation of mutations has accelerated by about 5% in non-Africans compared to Africans since divergence. We show that the ancestors of some pairs of present-day human populations were substantially separated by 100,000 years ago, well before the archaeologically attested onset of behavioural modernity. We also demonstrate that indigenous Australians, New Guineans and Andamanese do not derive substantial ancestry from an early dispersal of modern humans; instead, their modern human ancestry is consistent with coming from the same source as that of other non-Africans.
Year: 2016 PMID: 27654912 PMCID: PMC5161557 DOI: 10.1038/nature18964
Source DB: PubMed Journal: Nature ISSN: 0028-0836 Impact factor: 49.962
Extended Data Figure 1. Heatmap of the fraction of heterozygous sites missed in the 1000 Genomes Project
For each sample, we examine all heterozygous sites passing filter level 1 and compute the fraction included as known polymorphisms in the 1000 Genomes Project; the heatmap shows the complementary fraction of sites that are missed.
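The per-sample quantity in this heatmap can be sketched as follows. This is an illustration under assumptions. Both the heterozygous calls and the 1000 Genomes catalog are keyed by (chromosome, position), and allele identity is ignored. The function name `fraction_missed` is hypothetical.

```python
from typing import Iterable, Tuple

Site = Tuple[str, int]  # (chromosome, position); allele identity is ignored in this sketch


def fraction_missed(het_sites: Iterable[Site], known_sites: Iterable[Site]) -> float:
    """Fraction of a sample's heterozygous sites absent from a catalog of known polymorphisms."""
    het = set(het_sites)
    if not het:
        raise ValueError("sample has no heterozygous sites")
    missed = het - set(known_sites)
    return len(missed) / len(het)


# Toy example: one of two heterozygous sites is absent from the catalog.
sample_hets = [("1", 100), ("1", 200)]
catalog = {("1", 100), ("2", 50)}
print(fraction_missed(sample_hets, catalog))  # prints 0.5
```

The fraction included as known polymorphisms is simply one minus this value.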
Extended Data Figure 2. Worldwide variation in human short tandem repeats
A: Mean STR length is reported as the average of the length difference (in base pairs) from the GRCh37 reference for each genotype. Bubble area scales with the number of calls compared at each point. B and C: The first two principal components from principal component analysis of tetranucleotide and homopolymer genotypes, respectively. Colors represent the region of origin of each sample. D: Pairwise FST values between populations computed using only SNPs versus using combined SNP+STR loci. E: Block jackknife standard errors for the SNP versus SNP+STR FST analysis. The red dashed lines give the best-fit lines, described by the formulas in red. The black dashed line denotes the diagonal.
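Panel E reports block jackknife standard errors. Below is a minimal delete-one-block jackknife for a ratio-of-sums statistic, such as an FST estimator written as a sum of per-site numerators over a sum of per-site denominators. The sketch makes three assumptions. Blocks arrive as pre-summed numerators and denominators. Every block is weighted equally. The name `block_jackknife` is hypothetical. The weighting actually used for the figure is not described here.

```python
import math
from typing import Sequence, Tuple


def block_jackknife(block_num: Sequence[float], block_den: Sequence[float]) -> Tuple[float, float]:
    """Estimate and delete-one-block jackknife standard error of sum(num) / sum(den).

    Each block contributes the summed numerator and denominator of its sites;
    all blocks receive equal weight (an assumption of this sketch).
    """
    n = len(block_num)
    if n < 2 or n != len(block_den):
        raise ValueError("need at least two blocks with matching numerators and denominators")
    total_num, total_den = sum(block_num), sum(block_den)
    estimate = total_num / total_den
    # Recompute the statistic with each block left out in turn.
    loo = [(total_num - a) / (total_den - b) for a, b in zip(block_num, block_den)]
    loo_mean = sum(loo) / n
    variance = (n - 1) / n * sum((x - loo_mean) ** 2 for x in loo)
    return estimate, math.sqrt(variance)


est, se = block_jackknife([1.0, 2.0, 3.0, 4.0], [1.0, 1.0, 1.0, 1.0])
print(est, se)
```

There is a useful sanity check. When all block denominators are equal, the jackknife standard error reduces to the ordinary standard error of the mean of the per-block values.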
Extended Data Figure 3. ADMIXTURE analysis
We carried out unsupervised ADMIXTURE 1.23[9,44] analysis of the 300 SGDP individuals in 20 replicates with randomly chosen initial seeds, varying the number of ancestral populations from K=2 to K=12 and using the default 5-fold cross-validation (--cv flag). We used genotypes of at least filter level 1 and restricted analysis to sites where at least two individuals carried the variant allele (singleton variants are non-informative for population clustering). After further restricting to sites with at least 99% completeness and performing linkage-disequilibrium-based pruning in PLINK 1.9[45,46] with parameters (--indep-pairwise 1000 100 0.2), a total of 482,515 single nucleotide polymorphisms remained. This figure shows the highest-likelihood replicate for each value of K. Log likelihood increases monotonically with K, while K=5 minimizes cross-validation error (not shown). The solution at K=5 corresponds to major continental groups (Sub-Saharan Africans, Oceanians, East Asians, Native Americans, and West Eurasians), but we show the full range of K here because the other solutions illustrate finer-scale population structure that may be useful to users of the data.
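The two site filters described above, and the choice of K by cross-validation error, can be sketched as follows. The following are all hypothetical: the genotype encoding (alternative-allele counts 0, 1, 2, with None for missing), the function names and the example error values. LD pruning and the ADMIXTURE runs themselves are left to PLINK and ADMIXTURE.

```python
from typing import Dict, Optional, Sequence


def passes_site_filters(
    genotypes: Sequence[Optional[int]],
    min_carriers: int = 2,
    min_completeness: float = 0.99,
) -> bool:
    """Keep a site if >= 99% of individuals are called and >= 2 carry the variant allele."""
    if not genotypes:
        return False
    called = [g for g in genotypes if g is not None]
    completeness = len(called) / len(genotypes)
    # Count individuals carrying the variant (heterozygous or homozygous), not allele copies,
    # so a variant seen in only one individual is dropped.
    carriers = sum(1 for g in called if g > 0)
    return completeness >= min_completeness and carriers >= min_carriers


def best_k(cv_error_by_k: Dict[int, float]) -> int:
    """Return the number of ancestral populations K with the lowest cross-validation error."""
    return min(cv_error_by_k, key=cv_error_by_k.get)


print(passes_site_filters([0, 1, 0, 2]))     # True: fully called, two carriers
print(passes_site_filters([0, 1, 0, 0]))     # False: only one carrier
print(best_k({4: 0.52, 5: 0.51, 6: 0.515}))  # 5 (made-up error values)
```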
Extended Data Figure 4. Principal component analysis and neighbor-joining tree
A: Principal component analysis. B: Neighbor-joining tree based on FST values for all populations with at least two samples.
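Panel B applies neighbor joining to pairwise FST values. Below is a compact pure-Python sketch of the standard neighbor-joining algorithm on a dictionary of pairwise distances. The input format, the function name and the toy distances are assumptions. Real analyses would typically rely on an established phylogenetics package.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List


def neighbor_joining(names: List[str], dist: Dict[FrozenSet[str], float]) -> str:
    """Build a neighbor-joining tree and return it as a Newick string.

    dist maps frozenset({a, b}) to the distance between taxa a and b (e.g. pairwise FST).
    """
    nodes = list(names)
    d = dict(dist)

    def D(a: str, b: str) -> float:
        return d[frozenset((a, b))]

    while len(nodes) > 2:
        n = len(nodes)
        r = {a: sum(D(a, b) for b in nodes if b != a) for a in nodes}
        # Join the pair minimizing the Q criterion (ties broken by input order).
        f, g = min(combinations(nodes, 2),
                   key=lambda p: (n - 2) * D(p[0], p[1]) - r[p[0]] - r[p[1]])
        len_f = D(f, g) / 2 + (r[f] - r[g]) / (2 * (n - 2))
        len_g = D(f, g) - len_f
        u = f"({f}:{len_f:g},{g}:{len_g:g})"  # new internal node, labelled by its subtree
        for k in nodes:
            if k not in (f, g):
                d[frozenset((u, k))] = (D(f, k) + D(g, k) - D(f, g)) / 2
        nodes = [k for k in nodes if k not in (f, g)] + [u]

    # NJ trees are unrooted; splitting the last edge in half only serves the Newick output.
    a, b = nodes
    half = D(a, b) / 2
    return f"({a}:{half:g},{b}:{half:g});"


toy = {frozenset(p): v for p, v in [
    (("A", "B"), 3.0), (("A", "C"), 5.0), (("A", "D"), 6.0),
    (("B", "C"), 6.0), (("B", "D"), 7.0), (("C", "D"), 7.0),
]}
print(neighbor_joining(["A", "B", "C", "D"], toy))  # ((A:1,B:2):0.5,(C:3,D:4):0.5);
```

The toy distances are additive on the tree ((A:1,B:2):1,(C:3,D:4)). Neighbor joining recovers that topology and those branch lengths exactly.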
Figure 1. Genetic variation in the SGDP
A: Neighbor-joining tree of relationships based on pairwise divergence. B: Plot of autosomal heterozygosity against the X-to-autosome heterozygosity ratio, showing the reduction in this ratio in non-Africans and Pygmies. C: Estimate of Neanderthal ancestry with a heatmap scale of 0–3%. D: Estimate of Denisovan ancestry with a heatmap scale of 0–0.5% to bring out subtle differences in mainland Eurasia (Oceanian groups with as much as 5% Denisovan ancestry are saturated in bright red).
Figure 2. Cross-coalescence rates and effective population sizes for selected population pairs
A–C: Cross-coalescence rates as a function of time in thousands of years ago (kya) estimated using MSMC, with four haplotypes per pair. In each subfigure legend, we give the point estimate of the date at which 25%, 50% and 75% of lineages in the pair of populations have coalesced into a common ancestral population. We generated these plots using data phased with the 1000 Genomes reference panel (method PS1 described in supplementary information section 9), but only show pairs of populations for which the cross-coalescence rates are relatively insensitive to the phasing approach. A: Selected African cross-coalescence rates. B: Central African rainforest hunter-gatherer cross-coalescence rates. C: Ancient non-African cross-coalescence rates. D–F: Effective population sizes inferred using PSMC, using one diploid genome per population, for the same populations used in A–C.
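MSMC summarizes the separation of two populations with the relative cross-coalescence rate, 2·λ12/(λ11 + λ22). Here λ11 and λ22 are the within-population coalescence rates and λ12 is the cross-population rate in each time segment. The sketch below computes this quantity and reads off the first time segment in which it reaches a given level. It rests on three assumptions:

* Rates are piecewise constant on segments given by their left time boundaries, ordered from present to past.
* The 25%, 50% and 75% dates in the legends correspond to the relative rate first reaching 0.25, 0.5 and 0.75.
* Converting MSMC's scaled times to years requires a mutation rate and generation time, which are supplied here as hypothetical parameters.

```python
from typing import List, Optional, Sequence


def relative_ccr(within_1: Sequence[float], within_2: Sequence[float],
                 cross: Sequence[float]) -> List[float]:
    """Relative cross-coalescence rate 2*cross / (within_1 + within_2) per time segment."""
    return [2 * c / (a + b) for a, b, c in zip(within_1, within_2, cross)]


def first_time_reaching(left_times: Sequence[float], rccr: Sequence[float],
                        level: float) -> Optional[float]:
    """Left boundary of the first segment (going back in time) with relative rate >= level."""
    for t, r in zip(left_times, rccr):
        if r >= level:
            return t
    return None  # level never reached within the inferred time range


def scaled_time_to_years(t_scaled: float, mu: float, generation_years: float) -> float:
    """Convert scaled time to years, given a per-generation, per-base mutation rate mu."""
    return t_scaled / mu * generation_years
```

Going back in time, the relative rate rises from near 0 (populations fully separated) toward 1 (a single ancestral population). So the 25% date precedes the 50% and 75% dates in the sense of being more recent.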
Figure 3. Present-day populations have negligible ancestry from an early dispersal of modern humans out of Africa
Best-fitting admixture graph model of relationships among Australians, New Guineans, Andamanese and other diverse populations. Present-day populations are shown in blue, ancient samples in red, and select inferred ancestral nodes in green. Dotted lines indicate admixture events, all of which involve archaic humans. All f-statistic relationships are accurately fit to within 2.1 standard errors. (Inset) Results of adding putative early dispersal admixture to the graph model for different assumptions about when the early lineage split off. We specify the split time in terms of the genetic drift above the "Non-African" node, with 0.01 units of drift representing on the order of ten thousand years. The (approximate) model likelihood is maximized with zero early dispersal ancestry, and no more than a few percent is consistent with the data.
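The fit criterion above compares each observed f-statistic with the value the graph predicts, measured in units of its standard error. The minimal sketch below takes per-SNP allele frequencies as input and uses the simple f4 estimator: the mean of (a − b)(c − d) over SNPs. The function names are hypothetical. In practice the standard errors would come from a block jackknife.

```python
from typing import Sequence


def f4(pa: Sequence[float], pb: Sequence[float],
       pc: Sequence[float], pd: Sequence[float]) -> float:
    """f4(A, B; C, D): mean over SNPs of (a - b) * (c - d) for allele frequencies a, b, c, d."""
    terms = [(a - b) * (c - d) for a, b, c, d in zip(pa, pb, pc, pd)]
    if not terms:
        raise ValueError("no SNPs")
    return sum(terms) / len(terms)


def worst_residual_z(observed: Sequence[float], fitted: Sequence[float],
                     se: Sequence[float]) -> float:
    """Largest |observed - fitted| / SE across statistics; a good fit keeps this small."""
    return max(abs(o - f) / s for o, f, s in zip(observed, fitted, se))
```

Under this reading, "fit to within 2.1 standard errors" means that `worst_residual_z` over all fitted statistics is at most 2.1.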
Fewer accumulated mutations in Africans than in non-Africans.
| Population A | Population B | All autosomes: D×100 | All autosomes: Z | Chromosome X: D×100 | Chromosome X: Z | Lowest B: D×100 | Lowest B: Z | Highest B: D×100 | Highest B: Z |
|---|---|---|---|---|---|---|---|---|---|
| Khoesan | Oceania | −0.35 | −8.2 | −0.70 | −2.7 | −0.68 | −6.4 | −0.14 | −1.7 |
| Africa | America | −0.33 | −9.4 | −0.73 | −2.8 | −0.65 | −7.3 | −0.18 | −2.6 |
| Khoesan | WestEurasia | −0.30 | −7.5 | −0.68 | −3.1 | −0.63 | −6.3 | −0.17 | −2.1 |
| Africa | Oceania | −0.29 | −8.5 | −0.66 | −3.2 | −0.55 | −6.6 | −0.07 | −1.0 |
| Africa | WestEurasia | −0.25 | −8.5 | −0.66 | −3.1 | −0.49 | −6.4 | −0.11 | −1.8 |
| Khoesan | SouthAsia | −0.24 | −6.0 | −0.56 | −2.7 | −0.61 | −6.3 | −0.11 | −1.4 |
| Africa | EastAsia | −0.20 | −6.6 | −0.65 | −2.5 | −0.42 | −5.2 | −0.10 | −1.5 |
| Africa | CentralAsiaSiberia | −0.20 | −6.2 | −0.55 | −2.2 | −0.48 | −6.3 | −0.05 | −0.7 |
| Pygmy | WestEurasia | −0.19 | −4.8 | −0.46 | −1.4 | −0.43 | −4.6 | −0.04 | −0.5 |
| Africa | SouthAsia | −0.18 | −6.4 | −0.50 | −2.0 | −0.46 | −6.3 | −0.03 | −0.5 |
| CentralAsiaSiberia | Oceania | −0.13 | −3.9 | −0.15 | −0.6 | −0.09 | −1.1 | −0.03 | −0.4 |
| Pygmy | SouthAsia | −0.13 | −3.3 | −0.38 | −1.1 | −0.38 | −4.2 | 0.02 | 0.2 |
| EastAsia | Oceania | −0.13 | −4.1 | 0.00 | 0.0 | −0.17 | −2.1 | 0.04 | 0.6 |
| Khoesan | Pygmy | −0.10 | −2.6 | −0.14 | −0.4 | −0.16 | −1.6 | −0.12 | −1.5 |
| SouthAsia | WestEurasia | −0.08 | −4.3 | −0.20 | −1.2 | −0.05 | −1.0 | −0.10 | −2.7 |
| CentralAsiaSiberia | WestEurasia | −0.06 | −2.2 | −0.16 | −0.8 | −0.01 | −0.2 | −0.09 | −1.6 |
| EastAsia | WestEurasia | −0.06 | −2.1 | −0.00 | −0.0 | −0.08 | −1.0 | −0.02 | −0.3 |
| CentralAsiaSiberia | EastAsia | −0.00 | −0.2 | −0.18 | −1.1 | 0.07 | 1.2 | −0.08 | −1.8 |
| Africa | Pygmy | −0.00 | −0.1 | −0.06 | −0.2 | 0.03 | 0.4 | −0.06 | −0.8 |
| EastAsia | SouthAsia | 0.02 | 0.7 | 0.22 | 1.7 | −0.04 | −0.7 | 0.08 | 1.7 |
| CentralAsiaSiberia | SouthAsia | 0.02 | 0.7 | 0.05 | 0.3 | 0.02 | 0.4 | −0.00 | −0.0 |
| America | Oceania | 0.03 | 0.9 | 0.11 | 0.4 | 0.10 | 1.1 | 0.13 | 1.7 |
| Oceania | WestEurasia | 0.08 | 2.3 | −0.03 | −0.1 | 0.10 | 1.1 | −0.04 | −0.6 |
| Africa | Khoesan | 0.10 | 2.9 | 0.17 | 0.7 | 0.23 | 2.6 | 0.07 | 1.0 |
| America | WestEurasia | 0.11 | 3.6 | 0.11 | 0.4 | 0.19 | 2.2 | 0.08 | 1.3 |
| CentralAsiaSiberia | Pygmy | 0.14 | 3.4 | 0.32 | 0.9 | 0.43 | 4.5 | −0.04 | −0.4 |
| Oceania | SouthAsia | 0.14 | 4.8 | 0.22 | 0.9 | 0.13 | 1.7 | 0.04 | 0.7 |
| EastAsia | Pygmy | 0.15 | 3.6 | 0.49 | 1.4 | 0.37 | 3.9 | 0.04 | 0.5 |
| America | EastAsia | 0.18 | 5.9 | 0.09 | 0.3 | 0.28 | 3.6 | 0.11 | 1.8 |
| America | CentralAsiaSiberia | 0.18 | 6.2 | 0.34 | 1.7 | 0.23 | 2.9 | 0.18 | 3.1 |
| America | SouthAsia | 0.18 | 6.4 | 0.34 | 1.5 | 0.22 | 3.0 | 0.18 | 3.1 |
| Oceania | Pygmy | 0.24 | 5.4 | 0.46 | 1.3 | 0.45 | 4.6 | 0.02 | 0.2 |
| CentralAsiaSiberia | Khoesan | 0.25 | 6.0 | 0.57 | 2.9 | 0.64 | 6.3 | 0.09 | 1.1 |
| EastAsia | Khoesan | 0.25 | 6.2 | 0.68 | 3.2 | 0.59 | 5.9 | 0.14 | 1.7 |
| America | Pygmy | 0.26 | 5.9 | 0.58 | 1.6 | 0.58 | 5.7 | 0.09 | 1.0 |
| America | Khoesan | 0.37 | 8.7 | 0.76 | 3.3 | 0.77 | 7.3 | 0.22 | 2.5 |
We compute a statistic D(Population A, Population B, Chimp), measuring the difference in the rate of matching to chimpanzee in Population A compared to Population B. For all the autosomes, we observe highly significant signals (3.3<|Z|<9.4) of excess mismatching to chimpanzee in non-Africans compared to Africans, using a standard error from a block jackknife. We highlight |D|>0.002 in blue, and |Z|>3 in yellow. The deviations from zero are greatest in subsets of the genome where the time since two populations split comprises a relatively larger fraction of the total genetic divergence time between them; this is the direction expected if the rate of mutation accumulation changed after the populations diverged. Using all the autosomes as a baseline, a least-squares fit indicates that the deviations are 2.2 times as high on chromosome X, 2.0 times as high in the quintile with the lowest B-statistic (closest to functionally important regions), and 0.43 times as high in the quintile with the highest B-statistic (furthest from functional regions).
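The 2.2, 2.0 and 0.43 summaries come from a least-squares fit of each subset's D values against the all-autosome values. One simple form of such a fit, a regression through the origin, is sketched here. The text does not say whether the original fit included an intercept or weighted pairs by their standard errors, so this form is an assumption. The example uses the first five rows of the table above, and no specific output is claimed for it.

```python
from typing import Sequence


def slope_through_origin(x: Sequence[float], y: Sequence[float]) -> float:
    """Least-squares slope b minimizing sum((y - b*x)^2), with no intercept."""
    sxx = sum(v * v for v in x)
    if sxx == 0:
        raise ValueError("x must not be all zeros")
    return sum(a * b for a, b in zip(x, y)) / sxx


# D×100 values for the first five rows of the table (Khoesan/Oceania to Africa/WestEurasia).
all_autosomes = [-0.35, -0.33, -0.30, -0.29, -0.25]
chromosome_x = [-0.70, -0.73, -0.68, -0.66, -0.66]
print(slope_through_origin(all_autosomes, chromosome_x))
```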
Extended Data Figure 5. Fewer accumulated mutations in Africans than in non-Africans confirmed by mapping to chimpanzee
We compute a statistic D(Population A, Population B, Chimp), measuring the difference in the rate of matching to chimpanzee in Population A compared to Population B. The excess mismatching to chimpanzee in non-Africans is still seen when we restrict to the male X chromosome, which eliminates possible effects of differences in heterozygosity across populations, and map to the chimpanzee genome, which is phylogenetically symmetrically related to all present-day humans. In 78 randomly chosen pairs of males with Population A = African and Population B = non-African, transversion substitutions show no consistent skew from zero, but transition substitutions do.
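Below is a minimal sketch of the statistic for haploid calls (as on the male X chromosome), split by substitution class. It is an illustration under stated assumptions rather than the paper's exact definition. D is taken as (n_A − n_B)/(n_A + n_B). Here n_A counts sites at which A carries a non-chimpanzee allele and B carries the chimpanzee allele; n_B counts the reverse. With this sign convention, negative values mean Population B mismatches chimpanzee more often, which matches the signs in the table above. The function names and the sequence encoding are hypothetical.

```python
from typing import Optional

TRANSITIONS = {frozenset("AG"), frozenset("CT")}  # purine<->purine, pyrimidine<->pyrimidine


def is_transition(x: str, y: str) -> bool:
    return frozenset((x, y)) in TRANSITIONS


def chimp_d(seq_a: str, seq_b: str, chimp: str, subset: Optional[str] = None) -> float:
    """D(A, B, Chimp) from aligned haploid sequences; subset is None, 'transition' or 'transversion'."""
    if subset not in (None, "transition", "transversion"):
        raise ValueError("unknown subset")
    n_a = n_b = 0
    for a, b, c in zip(seq_a, seq_b, chimp):
        if a == b or "N" in (a, b, c):
            continue  # uninformative or missing
        if subset is not None and is_transition(a, b) != (subset == "transition"):
            continue  # wrong substitution class
        if a != c and b == c:
            n_a += 1  # A carries the non-chimpanzee allele
        elif b != c and a == c:
            n_b += 1  # B carries the non-chimpanzee allele
        # sites where neither matches chimpanzee are skipped
    if n_a + n_b == 0:
        raise ValueError("no informative sites")
    return (n_a - n_b) / (n_a + n_b)


chimp = "ACGTA"
pop_a = "ACGTC"  # one transversion (A->C) away from chimpanzee at the last site
pop_b = "GCGCA"  # two transitions (A->G, T->C) away from chimpanzee
print(chimp_d(pop_a, pop_b, chimp))  # -0.3333333333333333
```

In this toy case the transition-only statistic is -1.0 and the transversion-only statistic is 1.0. This shows how a skew can be confined to one substitution class.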
Extended Data Figure 6. 3P-CLR scan for positive selection
The red line denotes the 99.9% quantile cutoff. The genes in the top 5 regions are labeled. A: Scan for selection on the San terminal branch. B: Scan for selection on the non-San terminal branch. C: Scan for selection on the ancestral modern human branch.
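Flagging windows above a 99.9% quantile cutoff can be sketched as follows. The following are assumptions: the quantile method (nearest rank here), the (chromosome, start, score) window representation and the function names. The 3P-CLR scores themselves are taken as given, and merging adjacent windows into regions is omitted.

```python
import math
from typing import List, Sequence, Tuple

Window = Tuple[str, int, float]  # (chromosome, start, score)


def quantile_cutoff(scores: Sequence[float], q: float = 0.999) -> float:
    """Nearest-rank empirical quantile: smallest score with at least a fraction q of scores <= it."""
    if not scores or not 0 < q <= 1:
        raise ValueError("need scores and 0 < q <= 1")
    ordered = sorted(scores)
    return ordered[math.ceil(q * len(ordered)) - 1]


def windows_above(windows: Sequence[Window], q: float = 0.999) -> List[Window]:
    """Windows whose score exceeds the quantile cutoff, highest score first."""
    cutoff = quantile_cutoff([w[2] for w in windows], q)
    return sorted((w for w in windows if w[2] > cutoff), key=lambda w: w[2], reverse=True)
```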
Extended Data Figure 7. Scan for genomic locations where the great majority of present-day humans share a recent common ancestor
We carried out PSMC analysis on 40 pairs of haploid genomes chosen to sample some of the most deeply divergent present-day human lineages. We recorded the time since the most recent common ancestor (TMRCA) at each position, and rescaled to obtain an estimate of absolute time (Supplementary Information section 12). A: Distribution across the genome of the fraction of TMRCAs below specified date cutoffs. For the 100 kya cutoff, the maximum fraction observed anywhere in the genome is 68%. B: Distribution across the genome of the date T at which specified fractions of sample pairs are inferred to have a TMRCA less than T. C: Percentile points of the cumulative distribution function of B.
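Panels A and B summarize the TMRCAs across the 40 pairs at each genomic position. The sketch below computes both summaries for a single position. It assumes the TMRCAs have already been rescaled to absolute time. For panel B it uses a nearest-rank definition with "at most T" rather than strictly "less than T". The function names are hypothetical.

```python
import math
from typing import Sequence


def fraction_below(tmrcas: Sequence[float], cutoff: float) -> float:
    """Panel A: fraction of sample pairs whose TMRCA at this position is below the cutoff."""
    if not tmrcas:
        raise ValueError("no TMRCAs")
    return sum(t < cutoff for t in tmrcas) / len(tmrcas)


def date_for_fraction(tmrcas: Sequence[float], fraction: float) -> float:
    """Panel B: smallest observed T such that at least `fraction` of pairs have TMRCA <= T."""
    if not tmrcas or not 0 < fraction <= 1:
        raise ValueError("need TMRCAs and 0 < fraction <= 1")
    ordered = sorted(tmrcas)
    return ordered[math.ceil(fraction * len(ordered)) - 1]


# Toy TMRCAs (in thousands of years) for four pairs at one position.
toy = [50.0, 80.0, 120.0, 300.0]
print(fraction_below(toy, 100.0))    # 0.5
print(date_for_fraction(toy, 0.75))  # 120.0
```

Scanning `fraction_below` along the genome with a 100 kya cutoff gives the kind of profile summarized in panel A. The caption reports that the maximum of that profile is 68%.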