Christelle Fraïsse (1,2,3), Camille Roux (4), Pierre-Alexandre Gagnaire (1), Jonathan Romiguier (1), Nicolas Faivre (1,2), John J Welch (2), Nicolas Bierne (1,2).
Abstract
Genome-scale diversity data are increasingly available in a variety of biological systems and can be used to reconstruct the past evolutionary history of species divergence. However, extracting the full demographic information from these data is not trivial and requires inferential methods that account for the diversity of coalescent histories throughout the genome. Here, we evaluate the potential and limitations of one such approach. We reexamine a well-known system of mussel sister species, using the joint site frequency spectrum (jSFS) of synonymous mutations, computed from either exome capture or RNA-seq data, in an Approximate Bayesian Computation (ABC) framework. We first assess the best sampling strategy (number of individuals, loci, and bins in the jSFS) and show that model selection is robust to variation in the number of individuals and loci. In contrast, different binning choices when summarizing the jSFS strongly affect the results: including classes of low- and high-frequency shared polymorphisms can more effectively reveal recent migration events. We then take advantage of the flexibility of ABC to compare more realistic models of speciation, including variation in migration rates through time (i.e., periodic connectivity) and across genes (i.e., genome-wide heterogeneity in migration rates). We show that these models were consistently selected as the most probable, suggesting that mussels have experienced a complex history of gene flow during divergence and that the species boundary is semi-permeable. Our work provides a comprehensive evaluation of ABC demographic inference in mussels based on the coding jSFS and supplies guidelines for employing different sequencing techniques and sampling strategies. We emphasize, perhaps surprisingly, that inferences are less limited by the volume of data than by the way in which they are analyzed.
Keywords: Approximate Bayesian Computation; Demographic inferences; Joint site frequency spectrum; Mytilus edulis; Next-generation sequencing
Year: 2018 PMID: 30083438 PMCID: PMC6071616 DOI: 10.7717/peerj.5198
Source DB: PubMed Journal: PeerJ ISSN: 2167-8359 Impact factor: 2.984
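The ABC model-choice step summarized in the abstract can be illustrated with a minimal rejection-ABC sketch. It assumes a precomputed reference table of simulations (a model label plus a vector of summary statistics, such as binned jSFS counts) and equal prior probabilities for all models; the function name, the standard-deviation scaling, and the tolerance value are illustrative assumptions, and the study itself may have used a different ABC variant (for example, one with regression adjustment). Each model's posterior probability is estimated as its share among the simulations closest to the observed statistics.

```python
import math
from collections import Counter


def abc_model_posterior(sim_models, sim_stats, obs_stats, tolerance=0.01):
    """Rejection-ABC estimate of posterior model probabilities (hypothetical helper).

    sim_models: one model label per simulation.
    sim_stats: one summary-statistic vector per simulation (e.g. binned jSFS).
    obs_stats: observed summary-statistic vector of the same length.
    tolerance: fraction of simulations, closest to the data, that are accepted.
    Assumes equal prior probability for each model.
    """
    k = len(obs_stats)
    n_sim = len(sim_stats)
    # Scale each statistic by its standard deviation across simulations so that
    # no single jSFS bin dominates the Euclidean distance.
    scales = []
    for j in range(k):
        col = [s[j] for s in sim_stats]
        mean = sum(col) / n_sim
        sd = math.sqrt(sum((x - mean) ** 2 for x in col) / n_sim)
        scales.append(sd if sd > 0 else 1.0)
    dists = [
        math.sqrt(sum(((s[j] - obs_stats[j]) / scales[j]) ** 2 for j in range(k)))
        for s in sim_stats
    ]
    n_keep = max(1, int(round(tolerance * n_sim)))
    # Keep the n_keep simulations closest to the observed statistics.
    kept = sorted(range(n_sim), key=lambda i: dists[i])[:n_keep]
    counts = Counter(sim_models[i] for i in kept)
    return {m: counts.get(m, 0) / n_keep for m in sorted(set(sim_models))}
```

Because the estimate is a proportion of accepted simulations, changing how the jSFS is binned changes the statistic vector, and therefore which simulations are accepted and which model is favored.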
Figure 1. Models of speciation.
Six classes of scenarios with different temporal patterns of migration are compared (A); for those including migration, two versions are depicted, assuming either homogeneity (“homo”) or heterogeneity (“hetero”) of the effective migration rate across the genome (B). All scenarios assume that an ancestral population of effective size N_A split T_split generations ago into two populations of constant sizes N_1 and N_2. At the two extremes, divergence occurs in allopatry (SI, strict isolation) or under continuous migration (IM, isolation with migration). Whenever gene flow occurs, migration proceeds at a constant rate M_12 from population 1 to population 2 and M_21 in the opposite direction. The ancient migration (AM) and periodic ancient migration (PAM) scenarios both assume that populations started diverging in the presence of gene flow; they then experienced a single period of isolation, T_iso, in the AM model, whereas gene flow was intermittent in the PAM model. In the secondary contact (SC) and periodic secondary contact (PSC) scenarios, populations first diverged in the absence of gene flow; this was followed by a single period of secondary contact, T_sc, in the SC model, whereas gene flow was intermittent in the PSC model.
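The model set in Figure 1 can be enumerated programmatically, which makes the count of 11 candidate models explicit: SI has no migration and therefore no homo/hetero variant, while each of the five other patterns comes in both versions. The sketch also encodes when gene flow is allowed in the single-period scenarios, under the assumptions that time t is measured in generations before the present and that T_iso and T_sc are durations ending at the present. The periodic scenarios (PAM, PSC) are left out because their on/off schedule is not specified here, and all function names are hypothetical.

```python
from itertools import product

# Temporal migration patterns with a homo/hetero version (Figure 1 A and B).
MIGRATION_PATTERNS = ["IM", "AM", "PAM", "SC", "PSC"]
GENOMIC_VARIANTS = ["homo", "hetero"]


def model_set():
    """The candidate models: SI plus every pattern x genomic-variant pair."""
    return ["SI"] + [f"{p} {v}" for p, v in product(MIGRATION_PATTERNS, GENOMIC_VARIANTS)]


def migration_active(pattern, t, t_split, t_iso=None, t_sc=None):
    """Whether gene flow is allowed t generations before the present (0 <= t < t_split).

    Only the single-period patterns are covered; PAM and PSC would need an
    explicit on/off schedule, which is an assumption left open here.
    """
    if not 0 <= t < t_split:
        raise ValueError("t must lie between the present (0) and the split")
    if pattern == "SI":
        return False
    if pattern == "IM":
        return True
    if pattern == "AM":
        # Migration after the split, then isolation during the last t_iso generations.
        return t >= t_iso
    if pattern == "SC":
        # Isolation after the split, then contact during the last t_sc generations.
        return t < t_sc
    raise ValueError(f"unsupported pattern: {pattern}")
```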
Sampling design.
| Technique | M. edulis population | Locality | n | M. galloprovincialis population | Locality | n | Outgroup population | Locality | n |
|---|---|---|---|---|---|---|---|---|---|
| Exome capture | North Sea | Wadden Sea, Holland | 4 | Brittany | Roscoff, France | 4 | Europe | Tvärminne, Finland | 4 |
| | Bay of Biscay | Lupin/Fouras, France | 4 | Mediterranean Sea | Sète, France | 4 | | | |
| RNA-seq | North Sea | Barfleur, France | 2 | Brittany | Roscoff, France | 2 | USA | Seattle, USA | 1 |
| | Bay of Biscay | La Tremblade, France | 2 | Mediterranean Sea | Sète, France | 2 | | | |
Summary statistics (mscalc).
| Technique | n | nlocus | nSNP | S | Sf | Sx1 | Sx2 | Ss |
|---|---|---|---|---|---|---|---|---|
| Exome capture | 2 | 516 | 3,993 | 7.738 | 0.322 | 3.124 | 3.3 | 0.992 |
| | 4 | 557 | 5,092 | 9.142 | 0.097 | 3.555 | 3.896 | 1.594 |
| | 8 | 512 | 5,000 | 9.766 | 0.025 | 3.504 | 4.258 | 1.979 |
| RNA-seq | 2 | 2,147 | 17,275 | 8.046 | 0.81 | 2.966 | 2.809 | 1.462 |
| | 4 | 1,842 | 17,902 | 9.719 | 0.507 | 3.344 | 3.368 | 2.501 |
Notes:
Technique: RNA sequencing (“RNA-seq”) vs. exome enrichment sequencing (“exome capture”); n: number of individuals analyzed in each species; nlocus: total number of loci; nSNP: total number of polymorphic sites. The following statistics were calculated for each locus, and their average and standard deviation across all loci are given. S: number of polymorphic sites in the interspecific alignment (S = Sf + Sx1 + Sx2 + Ss); Sf: number of fixed differences; Sx1, Sx2: number of polymorphic sites exclusive to species 1 and species 2, respectively; Ss: number of shared polymorphic sites; π: average number of pairwise differences (Tajima, 1983); θ: Watterson’s θ (Watterson, 1975); D: Tajima’s D (Tajima, 1989a, 1989b); FST = 1 − πS/πT: level of species differentiation, where πS is the average pairwise nucleotide diversity within species and πT is the total pairwise nucleotide diversity of the pooled sample across species; div: total interspecific divergence; netdiv: net molecular divergence measured at synonymous positions.
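The per-locus counts (S, Sf, Sx1, Sx2, Ss) and FST = 1 − πS/πT can be sketched for a single locus as follows. The sketch assumes bi-allelic sites summarized as counts (c1, c2) of the same allele among the n1 and n2 sampled allele copies of species 1 and 2, uses the unbiased per-site pairwise diversity 2c(n − c)/(n(n − 1)), and sums diversities over sites before taking the ratio. These are illustrative choices, not a description of how mscalc computes the statistics, and the function names are hypothetical.

```python
def wakeley_hey_counts(sites, n1, n2):
    """Classify the bi-allelic sites of one locus into the four Wakeley-Hey classes.

    sites: list of (c1, c2) counts of the same allele in species 1 and 2.
    n1, n2: numbers of sampled allele copies in each species.
    """
    counts = {"Sf": 0, "Sx1": 0, "Sx2": 0, "Ss": 0}
    for c1, c2 in sites:
        poly1, poly2 = 0 < c1 < n1, 0 < c2 < n2
        if poly1 and poly2:
            counts["Ss"] += 1
        elif poly1:
            counts["Sx1"] += 1
        elif poly2:
            counts["Sx2"] += 1
        elif (c1 == 0) != (c2 == 0):
            # Monomorphic in both species, but for different alleles.
            counts["Sf"] += 1
        # Sites fixed for the same allele in both species are not SNPs.
    counts["S"] = counts["Sf"] + counts["Sx1"] + counts["Sx2"] + counts["Ss"]
    return counts


def pairwise_diversity(c, n):
    """Probability that two distinct sampled allele copies differ at a site."""
    return 2.0 * c * (n - c) / (n * (n - 1))


def fst(sites, n1, n2):
    """FST = 1 - piS/piT, with diversities summed over the sites of the locus."""
    pi_s = sum((pairwise_diversity(c1, n1) + pairwise_diversity(c2, n2)) / 2 for c1, c2 in sites)
    pi_t = sum(pairwise_diversity(c1 + c2, n1 + n2) for c1, c2 in sites)
    if pi_t == 0:
        raise ValueError("no variable sites in the pooled sample")
    return 1.0 - pi_s / pi_t
```

A fixed difference gives FST = 1, whereas a site with identical intermediate frequencies in both species can give a slightly negative value with this estimator.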
Figure 2. Decomposition of the unfolded joint site frequency spectrum for n = 2 individuals (i.e., four alleles) in each species.
The number within each cell gives the density of sites with the corresponding counts of derived alleles in species 1 (M. edulis, x axis) and species 2 (M. galloprovincialis, y axis). Only sites showing two distinct alleles in the interspecific alignment were considered; hence the cells {0; 0} and {4; 4} have been masked. The total number of polymorphic sites is 3,993 SNPs (“exome capture” data). (A) Decomposition of the jSFS into four classes of polymorphism without an outgroup sequence (i.e., the Wakeley–Hey classes): fixed differences (black), private polymorphisms in species 1 (blue) or species 2 (red), and shared polymorphisms (green). (B) Decomposition of the jSFS into seven classes of polymorphism using the sequenced outgroup. Among differentially fixed sites, the derived allele can be fixed in species 1 (black) or in species 2 (gray). An exclusive polymorphism can result from a recent mutation specific to species 1 (blue) or species 2 (red), but it can also result from an ancestral mutation that has become fixed in species 2 while remaining polymorphic in species 1 (cyan), or vice versa (orange). Shared polymorphisms are shown in green. (C) Decomposition of the jSFS into 23 classes of polymorphism, in which singletons and doubletons in each species were included as new classes. For n = 2, this is the full spectrum (the 5 × 5 cells minus the two masked cells).
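The binning in panels (A) and (B) amounts to mapping each jSFS cell to a class label and summing the cells of each class. The sketch below assumes per-site derived-allele counts (i, j) for species 1 and 2, with n1 and n2 the numbers of sampled allele copies (four for two diploid individuals); it masks the {0; 0} and {n1; n2} cells and sums cells into the seven outgroup-based classes or their four Wakeley–Hey counterparts. Class and function names are hypothetical, and the 23-class binning of panel (C) is not reproduced.

```python
from collections import Counter


def joint_sfs(derived_counts, n1, n2):
    """Unfolded jSFS: cell [i][j] counts sites with i derived alleles in species 1 and j in species 2."""
    jsfs = [[0] * (n2 + 1) for _ in range(n1 + 1)]
    for i, j in derived_counts:
        jsfs[i][j] += 1
    # Mask cells that are not variable in the interspecific alignment.
    jsfs[0][0] = 0
    jsfs[n1][n2] = 0
    return jsfs


def seven_class_label(i, j, n1, n2):
    """Class of cell (i, j) in the outgroup-based decomposition; None for masked cells."""
    poly1, poly2 = 0 < i < n1, 0 < j < n2
    if poly1 and poly2:
        return "shared"
    if poly1:
        # Derived allele absent from species 2 (recent) or fixed in species 2 (ancestral).
        return "private_1" if j == 0 else "poly1_fixed2"
    if poly2:
        return "private_2" if i == 0 else "poly2_fixed1"
    if (i, j) == (n1, 0):
        return "fixed_1"
    if (i, j) == (0, n2):
        return "fixed_2"
    return None


# Without an outgroup, the seven classes collapse into the four Wakeley-Hey classes.
TO_FOUR = {
    "fixed_1": "fixed", "fixed_2": "fixed",
    "private_1": "exclusive_1", "poly1_fixed2": "exclusive_1",
    "private_2": "exclusive_2", "poly2_fixed1": "exclusive_2",
    "shared": "shared",
}


def bin_jsfs(jsfs, four_classes=False):
    """Sum jSFS cells into the seven (or four) polymorphism classes."""
    n1, n2 = len(jsfs) - 1, len(jsfs[0]) - 1
    totals = Counter()
    for i in range(n1 + 1):
        for j in range(n2 + 1):
            label = seven_class_label(i, j, n1, n2)
            if label is not None:
                totals[TO_FOUR[label] if four_classes else label] += jsfs[i][j]
    return dict(totals)
```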
Posterior probabilities of the speciation models.
(A) 11 models
| Technique | Statistics | Best model (n = 2) | PP | BF1/2 | BF1/3 | Best model (n = 4) | PP | BF1/2 | BF1/3 | Best model (n = 8) | PP | BF1/2 | BF1/3 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Exome capture | jsfs = 4 | PAM homo | 0.360 | 1.45 | 1.87 | PAM homo | 0.340 | 1.33 | 1.70 | PAM homo | 0.334 | 1.14 | 2.15 |
| | jsfs = 7 | PAM homo | 0.309 | 1.26 | 1.34 | PAM homo | 0.250 | 1.17 | 1.21 | PAM homo | 0.337 | 1.33 | 2.15 |
| | jsfs = 23 | PSC hetero | 0.385 | 1.21 | 1.53 | PSC hetero | 0.363 | 1.09 | 1.59 | PSC hetero | 0.609 | 2.51 | 4.61 |
| RNA-seq | jsfs = 4 | PAM hetero | 0.227 | 1.14 | 1.60 | PAM hetero | 0.180 | 1.08 | 1.14 | – | – | – | – |
| | jsfs = 7 | PAM hetero | 0.323 | 1.62 | 2.27 | PAM hetero | 0.339 | 1.43 | 2.15 | – | – | – | – |
| | jsfs = 23 | PSC hetero | 0.414 | 1.07 | 2.60 | PSC hetero | 0.346 | 1.37 | 1.47 | – | – | – | – |

(B) Homo vs. hetero for the best model
| Technique | Statistics | Best version (n = 2) | PP | BF1/2 | Best version (n = 4) | PP | BF1/2 | Best version (n = 8) | PP | BF1/2 |
|---|---|---|---|---|---|---|---|---|---|---|
| Exome capture | jsfs = 4 | Homo | 0.557 | 1.26 | Homo | 0.538 | 1.16 | Homo | 0.509 | 1.04 |
| | jsfs = 7 | Homo | 0.695 | 2.27 | Homo | 0.519 | 1.08 | Homo | 0.551 | 1.23 |
| | jsfs = 23 | Hetero | 0.990 | 99 | Hetero | 0.982 | 54 | Hetero | 1.000 | NA |
| RNA-seq | jsfs = 4 | Hetero | 0.556 | 1.25 | Hetero | 0.575 | 1.35 | – | – | – |
| | jsfs = 7 | Hetero | 0.733 | 2.75 | Homo | 0.520 | 1.08 | – | – | – |
| | jsfs = 23 | Hetero | 0.818 | 4.49 | Hetero | 0.997 | 323.33 | – | – | – |
Notes:
n: number of individuals analyzed in each species; technique: RNA sequencing (“RNA-seq”) vs. exome enrichment sequencing (“exome capture”); statistics: jsfs = 4 (four classes), jsfs = 7 (seven classes), jsfs = 23 (23 classes); PP: posterior probability; BF1/2: Bayes factor defined as PP(best model)/PP(second-best model); BF1/3: Bayes factor defined as PP(best model)/PP(third-best model). The 11 models are SI, IM homo, IM hetero, AM homo, AM hetero, PAM homo, PAM hetero, SC homo, SC hetero, PSC homo, and PSC hetero. In (B), BF1/2 compares the homo and hetero versions of the best model. Colors match Fig. 1.
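The Bayes factors in this table are ratios of posterior probabilities, which equal Bayes factors when all compared models have the same prior probability. A minimal sketch, assuming a dictionary that maps each model to its posterior probability (the function name is hypothetical), ranks the models and returns BF1/2 and BF1/3, with None standing in for NA when a denominator is zero:

```python
def bayes_factors(posteriors):
    """Return the best model, its PP, and BF1/2 and BF1/3 from posterior probabilities.

    posteriors: dict mapping model name -> posterior probability.
    Assumes equal prior probabilities, so PP ratios are Bayes factors.
    A ratio is None (reported as NA) when the compared model is missing or has PP = 0.
    """
    ranked = sorted(posteriors.items(), key=lambda item: item[1], reverse=True)
    best_name, best_pp = ranked[0]

    def ratio(rank):
        if rank >= len(ranked) or ranked[rank][1] == 0:
            return None
        return best_pp / ranked[rank][1]

    return {"best": best_name, "PP": best_pp, "BF1/2": ratio(1), "BF1/3": ratio(2)}
```

With only two competing models, as in panel (B), BF1/2 reduces to PP/(1 − PP); for example, PP = 0.990 gives 0.990/0.010 = 99, the value reported for exome capture with jsfs = 23.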