Anders Eriksson1, Andrea Manica2. 1. Evolutionary Ecology Group, Department of Zoology, University of Cambridge, Cambridge, United Kingdom; Integrative Systems Biology Laboratory, King Abdullah University of Science and Technology (KAUST), Thuwal 23955-6900, Kingdom of Saudi Arabia. 2. Evolutionary Ecology Group, Department of Zoology, University of Cambridge, Cambridge, United Kingdom. E-mail: aje44@cam.ac.uk; am315@cam.ac.uk.
Abstract
Distinguishing between hybridization and population structure in the ancestral species is a key challenge in our understanding of how permeable species boundaries are to gene flow. The doubly conditioned frequency spectrum (dcfs) has been argued to be a powerful metric to discriminate between these two explanations, and it was used to argue for hybridization between Neandertal and anatomically modern humans. The shape of the observed dcfs for these two species cannot be reproduced by a model that represents ancient population structure in Africa with two populations, while adding hybridization produces realistic shapes. In this letter, we show that this result is a consequence of the spatial coarseness of the demographic model and that a spatially structured stepping stone model can generate realistic dcfs without hybridization. This result highlights how inferences on hybridization between recently diverged species can be strongly affected by the choice of how population structure is represented in the underlying demographic model. We also conclude that the dcfs has limited power in distinguishing between the signals left by hybridization and ancient structure.
Hybridization between different species can play a major role in evolution, both by bringing novel adaptations into species and by acting as a barrier to their divergence (Seehausen 2004; Abbott et al. 2013). However, detecting hybridization from genetic data can be challenging, as it requires distinguishing actual gene flow after the species split from shared variation that was present in the ancestral species (Abbott et al. 2013; Smith and Kronforst 2013; Sousa and Hey 2013). This problem is particularly challenging when considering hybridization among recently diverged species, where past population structure in the ancestral species can leave genetic signatures that are almost identical to those left by hybridization (Green et al. 2010; Eriksson and Manica 2012; Lowery et al. 2013).
The challenges of distinguishing between actual hybridization and ancient population structure have been highlighted by the recent publication of Neandertal genomes (Green et al. 2010; Prüfer et al. 2013). The main finding coming out of the first analysis of the draft sequence of the Neandertal genome (Green et al. 2010) was that populations of anatomically modern humans (AMHs) differed in their genetic similarity to Neandertal. Specifically, modern Europeans and Asians were significantly more genetically similar to this hominin than Africans (Green et al. 2010). Patterson’s D statistic (SOM 15 in Green et al. 2010) is arguably the best-known approach to quantify this pattern. This statistic is based on a panel of four individuals and focuses on biallelic sites where either the Eurasian or the African matches the Neandertal (but not both) and where the Neandertal is different from the chimp. D is calculated as the fraction of such sites where the Eurasian genome matches the Neandertal minus the fraction where the African genome matches the Neandertal.
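The definition of D given above can be illustrated with a minimal sketch. It assumes one allele per individual per site and a simple tuple layout; the function name `patterson_d` and the input format are hypothetical choices made for illustration and are not taken from Green et al. (2010).

```python
def patterson_d(sites):
    """Patterson's D from (african, eurasian, neandertal, chimp) allele tuples.

    Assumption: each individual contributes a single allele per site.
    """
    eurasian_matches = 0
    african_matches = 0
    for african, eurasian, neandertal, chimp in sites:
        # Keep only biallelic sites where the Neandertal differs from the chimp.
        if neandertal == chimp or len({african, eurasian, neandertal, chimp}) != 2:
            continue
        eurasian_match = eurasian == neandertal
        african_match = african == neandertal
        # Exactly one of the two modern humans must match the Neandertal.
        if eurasian_match == african_match:
            continue
        if eurasian_match:
            eurasian_matches += 1
        else:
            african_matches += 1
    informative = eurasian_matches + african_matches
    if informative == 0:
        raise ValueError("no informative sites")
    # Fraction of Eurasian matches minus fraction of African matches.
    return (eurasian_matches - african_matches) / informative


# Toy input (made up for illustration): two sites where only the Eurasian
# matches the Neandertal, one where only the African does, three uninformative.
example_sites = [
    ("A", "G", "G", "A"),
    ("A", "G", "G", "A"),
    ("G", "A", "G", "A"),
    ("A", "A", "G", "A"),
    ("G", "G", "G", "A"),
    ("A", "A", "A", "A"),
]
example_d = patterson_d(example_sites)
```

With equal numbers of the two site classes the statistic is zero, which is the expectation under incomplete lineage sorting alone in a simple tree-like model.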
In a simple four-population model without hybridization, we expect Eurasian and African genomes to have the same probability of matching the Neandertal through incomplete lineage sorting, but hybridization between Neandertal and one of the modern human populations would give rise to an imbalance (Green et al. 2010). An analysis using Patterson’s D revealed that the observed values for Neandertal were more extreme than expected by chance, and this was taken as evidence for hybridization (Green et al. 2010). This test has been used in a number of other taxa, such as primates (Prüfer et al. 2012), flycatchers (Rheindt et al. 2013), and Heliconius butterflies (Martin et al. 2013). However, a problem in interpreting Patterson’s D is that ancestral population structure can produce patterns indistinguishable from hybridization (Durand et al. 2011). In the case of Neandertal, a spatially structured model with realistic demographic parameters can produce D values identical to the ones measured from real genomes, even in the absence of hybridization (Eriksson and Manica 2012).
In an attempt to increase the power to detect hybridization, Yang et al. (2012) focused on the frequency distribution of Neandertal alleles in Eurasian populations at biallelic loci where Neandertal differs from the chimpanzee reference genome and modern-day Africans have the chimp allele. These loci have been called “doubly conditioned,” as they need to have the same allele in a modern African genome and the chimp genome (first condition) but to differ between chimp and Neandertal genomes (second condition; see fig. 1a for a schematic representation). Such loci should, in principle, be enriched for mutations that occurred in the Neandertal line and subsequently entered the human line through hybridization, and their relative frequency (the doubly conditioned frequency spectrum, dcfs, shown in fig. 1b) should be an informative measure of the strength of hybridization. Yang et al. (2012) showed that a population genetics model that represents ancient structure in Africa with two populations (see fig. 2a and b for a graphical representation of this model) predicts a deficit of rare doubly conditioned alleles (e.g., of frequency one in the sample) compared to the frequencies estimated from real data. Adding hybridization to such a model, however, restored the appropriate shape of the doubly conditioned allele frequency spectrum. Thus, the dcfs seems to be an informative metric to distinguish between hybridization and ancient population structure, and this result has been taken as a confirmation of hybridization between Neandertal and AMHs (e.g., Sankararaman et al. 2012).
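The two conditions that define a doubly conditioned locus, and the resulting spectrum, can be sketched as follows. This is an illustration under stated assumptions rather than the procedure of Yang et al. (2012): the function name `dcfs`, the tuple layout, and the choice to drop loci where the Neandertal allele is absent from the Eurasian panel are all assumptions made here.

```python
from collections import Counter


def dcfs(loci, panel_size):
    """Relative abundance of doubly conditioned loci by Neandertal-allele count.

    Each locus is (chimp, neandertal, african, eurasian_alleles), where
    eurasian_alleles is a sequence of length panel_size.
    Returns a list whose entry k-1 is the share of loci with count k.
    """
    counts = Counter()
    for chimp, neandertal, african, eurasians in loci:
        if len(eurasians) != panel_size:
            raise ValueError("unexpected Eurasian panel size")
        # Second condition: chimp and Neandertal carry different alleles.
        if chimp == neandertal:
            continue
        # First condition: the African genome carries the chimp allele.
        if african != chimp:
            continue
        # Biallelic loci only: every Eurasian allele is one of the two states.
        if any(allele not in (chimp, neandertal) for allele in eurasians):
            continue
        k = sum(1 for allele in eurasians if allele == neandertal)
        if k == 0:
            # Assumption: loci without the Neandertal allele in the panel
            # do not contribute to the spectrum.
            continue
        counts[k] += 1
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no doubly conditioned loci")
    return [counts[k] / total for k in range(1, panel_size + 1)]


# Toy loci (made up for illustration) with a Eurasian panel of four alleles.
example_loci = [
    ("A", "G", "A", "AAAG"),  # count 1
    ("A", "G", "A", "AGAG"),  # count 2
    ("A", "G", "A", "GAAA"),  # count 1
    ("A", "G", "G", "GGAA"),  # African carries the Neandertal allele: skipped
    ("A", "A", "A", "AAAA"),  # chimp equals Neandertal: skipped
    ("A", "G", "A", "AAAA"),  # Neandertal allele absent in panel: skipped
    ("C", "T", "C", "TTTT"),  # count 4
]
example_spectrum = dcfs(example_loci, panel_size=4)
```

In this representation, a deficit of rare doubly conditioned alleles shows up as a low first entry of the returned list.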
Fig. 1.
(a) A schematic representation of how the sample frequency of the Neandertal allele of a doubly conditioned locus is calculated. A locus is doubly conditioned if chimp and Neandertal have different alleles (shown in blue and red, respectively), and the ancestral chimp (blue) allele is found in Africa. The frequency of the Neandertal (red) allele is then estimated in the Eurasian panel: in this example, the frequency is 3. (b) Observed dcfs (the dcfs depicts the relative abundance of doubly conditioned loci with different derived allele frequencies), as estimated by Yang et al. (2012). Photographs from Wikipedia Commons, taken by T. Lersch, T. Evanson, W. Warby, Dyor, P. Neo, J. Montrasio, Y. Picq, and Fae.
Fig. 2.
(a) Schematic representation of the “two-population model” in tree format. The ancestor of Neandertal and AMHs is structured into two populations. Neandertal splits from one of these two populations. The two populations keep exchanging migrants as they become AMHs, until that exchange decreases (but does not stop) when one population (the descendant of the parent population of Neandertal) leaves Africa to colonize Eurasia. (b) Block representation of the “two-population model,” where each block represents a population. (c) Schematic representation of the spatially structured model used in our analysis. The ancestor of Neandertal and AMHs is represented by a chain of interconnected populations with migration rate m0 (rather than just two as in the other model). The chain is separated into two when Neandertal speciates 320 kya, without any change in demographic parameters. Eventually, the African range becomes AMH at tmodern, when its demography changes and the migration rate becomes m. At texit, AMHs expand into Eurasia from the demes that were closest to the Neandertal range (note that the separation between Africa and Eurasia is generated by the range expansion and not by a change in migration rates, which stay at m throughout the AMH range).
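The chain of interconnected populations in panel (c), and its separation into two ranges, can be pictured as a nearest-neighbour migration matrix. The sketch below is only illustrative: the function name, the number of demes, and the numerical rate are hypothetical and are not the fitted values of the model.

```python
def stepping_stone_matrix(n_demes, rate, barrier_after=None):
    """Migration matrix for a linear chain of demes.

    Adjacent demes exchange migrants at `rate`; all other entries are zero.
    If `barrier_after` is given, the link between deme `barrier_after` and
    deme `barrier_after + 1` is removed, splitting the chain into two ranges
    while leaving every other rate unchanged.
    """
    matrix = [[0.0] * n_demes for _ in range(n_demes)]
    for i in range(n_demes - 1):
        if barrier_after is not None and i == barrier_after:
            continue  # no gene flow across the split
        matrix[i][i + 1] = rate
        matrix[i + 1][i] = rate
    return matrix


# Hypothetical values: six demes, an arbitrary rate standing in for m0.
ancestral_chain = stepping_stone_matrix(6, 0.01)
# After the split, demes 0-1 and demes 2-5 form two separate ranges.
split_chain = stepping_stone_matrix(6, 0.01, barrier_after=1)
```

Because only neighbouring demes are connected, lineages sampled far apart along the chain need many migration steps before they can coalesce.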
However, it remains to be determined whether the dcfs can distinguish between hybridization and ancient structure when a spatially structured model with multiple populations is used instead of Yang et al.’s representation of ancient structure in the whole African continent with only two populations. Such spatially structured models better capture the global genetic clines in within-population genetic diversity observed in AMHs (Prugnolle et al. 2005; Ramachandran et al. 2005). Here we use the same spatially structured stepping stone model as previously presented in Eriksson and Manica (2012) to explore the properties of the dcfs with a fine-scale representation of ancient structure (fig. 2c, see supplementary material S1, Supplementary Material online, for details). Realistic demographic parameters were obtained by fitting the stepping stone model to match worldwide patterns of spatial differentiation among modern populations, and were then further subsetted to the parameter combinations that predicted D between Africans and Europeans to be within 0.0020 units of the observed value of 0.0457. This simple spatial model, which does not include any hybridization, predicts frequency spectra of doubly conditioned alleles (the dcfs) that are in line with observed values (gray lines and shaded ranges in fig. 3a), matching closely the empirical proportion of rare alleles (giving R² = 99.2% for the best fit). Some demographic parameter combinations give rise to a slight excess of very common alleles, but a large number of combinations fit the observed dcfs almost perfectly (ten examples are shown as gray lines in fig. 3a; see SOM for details). 
This spatially explicit model (which has eight free parameters) provides a fit that is comparable (R² = 99.2% vs. R² = 99.7%) to the admixture model in Yang et al. (2012) (which has nine free parameters; blue line in fig. 3a). It is also considerably better than the best model fit for ancient population structure presented in Yang et al. (2012), which has R² = 93.7% (green line in fig. 3a). The large proportion of rare doubly conditioned alleles in our spatially structured model is a consequence of deep splits in gene genealogies, with old, relatively rare lineages being preserved by the fine-grained spatial structure in the model (fig. 3b). In other words, the presence of multiple (spatially structured) populations within Africa prevents lineages from coalescing too quickly, thereby allowing a few European lineages to merge back with Neandertal before meeting any African lineage. In many cases, such lineages are only represented by one or two individuals, giving an excess of rare doubly conditioned loci.
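The subsetting on D and the comparison of fits by R² can be sketched as below. The observed D and the tolerance come from the text. The definition of R² as the usual coefficient of determination is an assumption, since the exact formula is not stated here, and all spectra and candidate names are made-up illustrative numbers, not results.

```python
OBSERVED_D = 0.0457   # observed value quoted in the text
D_TOLERANCE = 0.0020  # allowed distance from the observed value


def r_squared(observed, predicted):
    """Coefficient of determination, 1 - SS_res / SS_tot (assumed definition)."""
    mean_observed = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_observed) ** 2 for o in observed)
    return 1 - ss_res / ss_tot


def best_fit(observed_dcfs, candidates):
    """Keep candidates whose predicted D is close to the observed D,
    then return the one whose predicted dcfs has the highest R²."""
    kept = [c for c in candidates if abs(c["D"] - OBSERVED_D) <= D_TOLERANCE]
    return max(kept, key=lambda c: r_squared(observed_dcfs, c["dcfs"]), default=None)


# Illustrative numbers only: a five-entry spectrum and three parameter sets.
toy_observed = [0.5, 0.2, 0.1, 0.1, 0.1]
toy_candidates = [
    {"name": "a", "D": 0.0460, "dcfs": [0.5, 0.2, 0.1, 0.1, 0.1]},
    {"name": "b", "D": 0.0500, "dcfs": [0.5, 0.2, 0.1, 0.1, 0.1]},  # D too far off
    {"name": "c", "D": 0.0450, "dcfs": [0.4, 0.3, 0.1, 0.1, 0.1]},
]
toy_best = best_fit(toy_observed, toy_candidates)
```

Candidate "b" reproduces the toy spectrum exactly but is discarded first because its D falls outside the tolerance, which mirrors the order of the two steps described in the text.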
Fig. 3.
(a) Doubly conditioned frequency spectrum of Neandertal alleles in five Europeans. Circles represent the empirical dcfs observed in the data by Yang et al. (2012), and the colored bars show the distribution predicted by our spatially structured model of ancient population structure. The shaded lines show predictions for ten different parameter combinations among the good fits. For comparison, we show Yang et al.’s best model of ancient population structure (green line) and admixture (blue line). In contrast to simple demographic models, our spatial model correctly captures the relative abundance of rare alleles (frequencies of 1 and 2 in the sample). (b) Schematic representation of how spatial structure occasionally prevents a Eurasian lineage (in red) from coalescing back with other Eurasian and African lineages (in blue), generating a rare doubly conditioned locus. The key mutation generating the Neandertal-like allele is highlighted by a red star. Note that time on the Neandertal branch was compressed to make room for the out-of-Africa expansion.
It is beyond the scope of this short letter to provide a formal test for alternative hybridization scenarios with Neandertal. Population structure affects a number of aspects of the similarities between Eurasians and Neandertal. For example, the degree of matching between ancient and derived SNPs in candidate regions for hybridization (SOM 17 in Green et al. [2010]) can be reproduced by a spatial model analogous to the one presented in this letter, without any hybridization (Eriksson and Manica 2012). A number of studies, including the first analyses of two new Neandertal genomes (Prüfer et al. 2013), provide an intricate picture of possible hybridization events among a number of hominins. Possibly the clearest analysis pointing to hybridization is the dating of the Neandertal gene flow into modern humans based on linkage disequilibrium patterns (Sankararaman et al. 2012). 
However, such dates are based on the same demographic representation used in Yang et al. (2012). Thus, it will be interesting to see whether linkage disequilibrium patterns are affected by different spatial representations of population structure.
In general, the very different results obtained by a model that represents genetic structure in Africa with two populations (Yang et al. 2012) versus our spatially structured model highlight the importance of the coarseness at which space is described. When investigating hybridization, especially in the case of recently diverged species, metrics have been devised to focus the power of the analysis on the key signals that would be expected from hybridization. However, spatial structuring of populations can easily mimic such signals. No matter how sophisticated the metrics are, the properties of different demographic models should be explored, in particular how robust the analysis is to the spatial scale of demographic processes.
Authors: Sohini Ramachandran; Omkar Deshpande; Charles C Roseman; Noah A Rosenberg; Marcus W Feldman; L Luca Cavalli-Sforza Journal: Proc Natl Acad Sci U S A Date: 2005-10-21 Impact factor: 11.205
Authors: Johannes Krause; Adrian W Briggs; Tomislav Maricic; Udo Stenzel; Martin Kircher; Nick Patterson; Richard E Green; Heng Li; Weiwei Zhai; Markus Hsi-Yang Fritz; Nancy F Hansen; Eric Y Durand; Anna-Sapfo Malaspinas; Jeffrey D Jensen; Tomas Marques-Bonet; Can Alkan; Kay Prüfer; Matthias Meyer; Hernán A Burbano; Jeffrey M Good; Rigo Schultz; Ayinuer Aximu-Petri; Anne Butthof; Barbara Höber; Barbara Höffner; Madlen Siegemund; Antje Weihmann; Chad Nusbaum; Eric S Lander; Carsten Russ; Nathaniel Novod; Jason Affourtit; Michael Egholm; Christine Verna; Pavao Rudan; Dejana Brajkovic; Željko Kucan; Ivan Gušic; Vladimir B Doronichev; Liubov V Golovanova; Carles Lalueza-Fox; Marco de la Rasilla; Javier Fortea; Antonio Rosas; Ralf W Schmitz; Philip L F Johnson; Evan E Eichler; Daniel Falush; Ewan Birney; James C Mullikin; Montgomery Slatkin; Rasmus Nielsen; Janet Kelso; Michael Lachmann; David Reich; Svante Pääbo Journal: Science Date: 2010-05-07 Impact factor: 47.728
Authors: Robert K Lowery; Gabriel Uribe; Eric B Jimenez; Mark A Weiss; Kristian J Herrera; Maria Regueiro; Rene J Herrera Journal: Gene Date: 2013-07-19 Impact factor: 3.688
Authors: Simon H Martin; Kanchon K Dasmahapatra; Nicola J Nadeau; Camilo Salazar; James R Walters; Fraser Simpson; Mark Blaxter; Andrea Manica; James Mallet; Chris D Jiggins Journal: Genome Res Date: 2013-09-17 Impact factor: 9.043
Authors: Kay Prüfer; Fernando Racimo; Nick Patterson; Flora Jay; Sriram Sankararaman; Susanna Sawyer; Anja Heinze; Gabriel Renaud; Peter H Sudmant; Cesare de Filippo; Heng Li; Swapan Mallick; Michael Dannemann; Qiaomei Fu; Martin Kircher; Martin Kuhlwilm; Michael Lachmann; Matthias Meyer; Matthias Ongyerth; Michael Siebauer; Christoph Theunert; Arti Tandon; Priya Moorjani; Joseph Pickrell; James C Mullikin; Samuel H Vohr; Richard E Green; Ines Hellmann; Philip L F Johnson; Hélène Blanche; Howard Cann; Jacob O Kitzman; Jay Shendure; Evan E Eichler; Ed S Lein; Trygve E Bakken; Liubov V Golovanova; Vladimir B Doronichev; Michael V Shunkov; Anatoli P Derevianko; Bence Viola; Montgomery Slatkin; David Reich; Janet Kelso; Svante Pääbo Journal: Nature Date: 2013-12-18 Impact factor: 49.962
Authors: Clio Der Sarkissian; Morten E Allentoft; María C Ávila-Arcos; Ross Barnett; Paula F Campos; Enrico Cappellini; Luca Ermini; Ruth Fernández; Rute da Fonseca; Aurélien Ginolhac; Anders J Hansen; Hákon Jónsson; Thorfinn Korneliussen; Ashot Margaryan; Michael D Martin; J Víctor Moreno-Mayar; Maanasa Raghavan; Morten Rasmussen; Marcela Sandoval Velasco; Hannes Schroeder; Mikkel Schubert; Andaine Seguin-Orlando; Nathan Wales; M Thomas P Gilbert; Eske Willerslev; Ludovic Orlando Journal: Philos Trans R Soc Lond B Biol Sci Date: 2015-01-19 Impact factor: 6.237
Authors: Deepti Gurdasani; Tommy Carstensen; Segun Fatumo; Guanjie Chen; Chris S Franklin; Javier Prado-Martinez; Heleen Bouman; Federico Abascal; Marc Haber; Ioanna Tachmazidou; Iain Mathieson; Kenneth Ekoru; Marianne K DeGorter; Rebecca N Nsubuga; Chris Finan; Eleanor Wheeler; Li Chen; David N Cooper; Stephan Schiffels; Yuan Chen; Graham R S Ritchie; Martin O Pollard; Mary D Fortune; Alex J Mentzer; Erik Garrison; Anders Bergström; Konstantinos Hatzikotoulas; Adebowale Adeyemo; Ayo Doumatey; Heather Elding; Louise V Wain; Georg Ehret; Paul L Auer; Charles L Kooperberg; Alexander P Reiner; Nora Franceschini; Dermot Maher; Stephen B Montgomery; Carl Kadie; Chris Widmer; Yali Xue; Janet Seeley; Gershim Asiki; Anatoli Kamali; Elizabeth H Young; Cristina Pomilla; Nicole Soranzo; Eleftheria Zeggini; Fraser Pirie; Andrew P Morris; David Heckerman; Chris Tyler-Smith; Ayesha A Motala; Charles Rotimi; Pontiano Kaleebu; Inês Barroso; Manj S Sandhu Journal: Cell Date: 2019-10-31 Impact factor: 41.582