Antoine Fraimout1, Vincent Debat1, Simon Fellous2, Ruth A Hufbauer2,3, Julien Foucaud2, Pierre Pudlo4, Jean-Michel Marin5, Donald K Price6, Julien Cattel7, Xiao Chen8, Marindia Deprá9, Pierre François Duyck10, Christelle Guedot11, Marc Kenis12, Masahito T Kimura13, Gregory Loeb14, Anne Loiseau2, Isabel Martinez-Sañudo15, Marta Pascual16, Maxi Polihronakis Richmond17, Peter Shearer18, Nadia Singh19, Koichiro Tamura20, Anne Xuéreb2, Jinping Zhang21, Arnaud Estoup2.
Abstract
Deciphering invasion routes from molecular data is crucial to understanding biological invasions, including identifying bottlenecks in population size and admixture among distinct populations. Here, we unravel the invasion routes of the invasive pest Drosophila suzukii using a multi-locus microsatellite dataset (25 loci on 23 worldwide sampling locations). To do this, we use approximate Bayesian computation (ABC), which has improved the reconstruction of invasion routes, but can be computationally expensive. We use our study to illustrate the use of a new, more efficient, ABC method, ABC random forest (ABC-RF) and compare it to a standard ABC method (ABC-LDA). We find that Japan emerges as the most probable source of the earliest recorded invasion into Hawaii. Southeast China and Hawaii together are the most probable sources of populations in western North America, which then in turn served as sources for those in eastern North America. European populations are genetically more homogeneous than North American populations, and their most probable source is northeast China, with evidence of limited gene flow from the eastern US as well. All introduced populations passed through bottlenecks, and analyses reveal five distinct admixture events. These findings can inform hypotheses concerning how this species evolved between different and independent source and invasive populations. Methodological comparisons indicate that ABC-RF and ABC-LDA show concordant results if ABC-LDA is based on a large number of simulated datasets but that ABC-RF out-performs ABC-LDA when using a comparable and more manageable number of simulated datasets, especially when analyzing complex introduction scenarios.
Keywords: Drosophila suzukii; approximate Bayesian computation; invasion routes; population genetics; random forest
Year: 2017 PMID: 28122970 PMCID: PMC5400373 DOI: 10.1093/molbev/msx050
Source DB: PubMed Journal: Mol Biol Evol ISSN: 0737-4038 Impact factor: 16.240
Formulation of the Model Choice Analyses that were Carried Out Successively to Reconstruct Invasion Routes of D. suzukii Using ABC.
| Model Choice Analysis | Number of Compared Scenarios | Tackled Question | Potential source genetic group | Focal populations |
|---|---|---|---|---|
| 1a | 3 | What are the origins of western US populations? | Asia, Hawaii | US-Wat |
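| 1b | 3 | What are the origins of western US populations? | Asia, Hawaii | US-Sok |
| 1c | 3 | What are the origins of western US populations? | Asia, Hawaii | US-SD |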
| 1d | 7 | What are the relations among western US populations and their extra-continental sources? | Asia, Hawaii, US-Wat, US-Sok, US-SD | US-Wat + US-Sok + US-SD |
| 2a | 6 | What are the origins of eastern US populations? | Asia, Hawaii, western US | eastern US |
| 2b | 6 | What are the origins of European populations? | Asia, Hawaii, western US | Europe |
| 3a | 10 | Is there asymmetrical gene flow from Europe to eastern US? | Asia, Hawaii, western US, Europe | eastern US |
| 3b | 10 | Is there asymmetrical gene flow from eastern US to Europe? | Asia, Hawaii, western US, eastern US | Europe |
| 4 | 2 | Does the admixture with eastern US genes in northern Europe result from a secondary Asian introduction? | Asia, Hawaii, western US, eastern US, southern Europe | northern Europe |
| 5a | 21 | What are the origins of the Brazilian population? | Asia, Hawaii, western US, eastern US, southern Europe, northern Europe | Brazil |
| 5b | 21 | What are the origins of La Reunion population? | Asia, Hawaii, western US, eastern US, southern Europe, northern Europe | La Reunion |
Note: Each numbered analysis is a comparison of a certain number of scenarios by ABC model choice. We summarize each analysis by stating the question that it addresses. A detailed verbal description of each compared scenario is given in supplementary table S2, Supplementary Material online. The 11 analyses are nested in the sense that each subsequent analysis uses the result obtained from the previous one. For example, analysis 2a (“What are the origins of eastern US populations?”) capitalizes on the history inferred for western US populations in analysis 1d. “Potential source genetic group” lists all the potential source populations considered when identifying the origin of the focal population (i.e., the target).
Results of Model Choice Analyses Using ABC-RF and ABC-LDA.
| Analysis | Total Number of Scenarios | Number of Sum. Stats. | Prior error rate: ABC-RF (s.d.) | Prior error rate: ABC-LDA (large reference table) | Prior error rate: ABC-LDA (small reference table) | Posterior probability of the best model: ABC-RF (s.d.) | Posterior probability of the best model: ABC-LDA (large reference table) [CI] | Origin of the focal population using either ABC-RF or ABC-LDA (i.e., best model) |
|---|---|---|---|---|---|---|---|---|
| 1a | 3 | 39 | 0.100 (± 0.002) | 0.075 | 0.085 | 1.000 (± 0.001) | 1.000 [1.000, 1.000] | Asia + Hawaii |
| 1b | 3 | 39 | 0.094 (± 0.001) | 0.070 | 0.091 | 0.989 (± 0.005) | 0.999 [0.999, 0.999] | Asia + Hawaii |
| 1c | 3 | 39 | 0.096 (± 0.001) | 0.079 | 0.087 | 0.998 (± 0.002) | 0.999 [0.999, 0.999] | Asia + Hawaii |
| 1d | 7 | 130 | 0.327 (± 0.001) | 0.274 | 0.368 | 0.690 (± 0.018) | 0.741 [0.714, 0.767] | Asia + Hawaii |
| 2a | 6 | 130 | 0.232 (± 0.001) | 0.203 | 0.241 | 0.779 (± 0.019) | 0.788 [0.764, 0.813] | Western US |
| 2b | 6 | 130 | 0.232 (± 0.001) | 0.179 | 0.393 | 0.716 (± 0.013) | 0.604 [0.592, 0.616] | Asia |
| 3a | 10 | 204 | 0.328 (± 0.001) | 0.265 | 0.364 | 0.744 (± 0.020) | 0.844 [0.820, 0.868] | Western US |
| 3b | 10 | 204 | 0.408 (± 0.001) | 0.358 | 0.423 | 0.510 (± 0.014) | 0.409 [0.391, 0.426] | Asia |
| 4 | 2 | 301 | 0.121 (± 0.001) | 0.111 | 0.212 | 1.000 (± 0.009) | 0.999 [0.999, 0.999] | Southern Europe + eastern US |
| 5a | 21 | 424 | 0.294 (± 0.001) | 0.230 | 0.423 | 0.631 (± 0.034) | 0.812 [0.788, 0.837] | Western US + eastern US |
| 5b | 21 | 424 | 0.298 (± 0.001) | 0.242 | 0.428 | 0.500 (± 0.027) | 0.290 [0.256, 0.323] | Northern Europe + southern Europe |
Note: All model choice analyses were carried out using prior set 1 (supplementary table S8, Supplementary Material online) and a single set of sample sites representative of the pre-defined genetic groups: JP-Tok for the native Asian group, US-Haw for the Hawaiian group, US-Wat, US-Sok and US-SD for the western US group, US-NC for the eastern US group, IT-Tre for the southern Europe group, GE-Dos for the northern Europe group, BR-PA for the Brazil group, and FR-Reu for the La Réunion group (fig. 1). Datasets were summarized using the whole set of summary statistics proposed by DIYABC (Cornuet et al. 2014). The total number of summary statistics (Number of Sum. Stats.) and the total number of compared scenarios (Total Number of Scenarios) are indicated for each analysis. Prior error rates and posterior probabilities of the best model chosen using ABC-RF were averaged over 10 replicate analyses. ABC-RF and ABC-LDA yielded the same best model for all analyses; it is given in the column “Origin of the focal population using either ABC-RF or ABC-LDA (i.e., best model)”. Admixture between two source populations is represented by a “+” sign. ABC-LDA posterior probabilities of the best models were estimated using “large reference tables” with 500,000 simulated datasets per scenario. ABC-LDA prior error rates were computed using reference tables of two different sizes: “large reference table” (i.e., 500,000 simulated datasets per scenario) and “small reference table” (i.e., 10,000 simulated datasets per scenario, as for ABC-RF analyses). s.d. stands for standard deviation over 10 replicate analyses and CI for 95% confidence interval computed following Cornuet et al. (2008).
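The ideas of a reference table and a prior error rate can be illustrated with a minimal sketch. It is neither ABC-RF nor ABC-LDA: a plain k-nearest-neighbour vote is swapped in as the classifier so that the example stays short and dependency-free. The simulator, the two scenarios, the summary statistics and all function names are hypothetical. The prior error rate is approximated here as the leave-one-out misclassification rate over datasets simulated from the prior.

```python
import random
from collections import Counter


def simulate(scenario, rng):
    """Toy simulator (hypothetical): returns two summary statistics."""
    shift = rng.uniform(0.0, 1.0)            # nuisance parameter drawn from its prior
    centre = 0.0 if scenario == 0 else 3.0   # the two scenarios differ in location
    return [rng.gauss(centre + shift, 1.0), rng.gauss(centre - shift, 1.0)]


def build_reference_table(n_per_scenario, n_scenarios, rng):
    """Simulate the same number of datasets under every scenario."""
    return [(scenario, simulate(scenario, rng))
            for scenario in range(n_scenarios)
            for _ in range(n_per_scenario)]


def choose_model(stats, table, k, skip=None):
    """Return (best scenario, vote share) among the k nearest simulations."""
    dists = sorted(
        (sum((a - b) ** 2 for a, b in zip(stats, sim)), label)
        for i, (label, sim) in enumerate(table)
        if i != skip                          # skip=None keeps every row
    )
    votes = Counter(label for _, label in dists[:k])
    best, count = votes.most_common(1)[0]
    return best, count / k


def prior_error_rate(table, k):
    """Leave-one-out misclassification rate over the reference table."""
    errors = sum(
        choose_model(stats, table, k, skip=i)[0] != label
        for i, (label, stats) in enumerate(table)
    )
    return errors / len(table)


rng = random.Random(1)                        # fixed seed for reproducibility
table = build_reference_table(100, 2, rng)    # 100 simulated datasets per scenario
best, share = choose_model([3.2, 2.9], table, k=25)  # hypothetical observed statistics
error = prior_error_rate(table, k=25)
print(best, share, error)
```

The vote share is only a crude stand-in for a posterior probability. The sketch does show why the size of the reference table matters: every extra simulated dataset adds cost to both model choice and error estimation.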
Fig. 1. Worldwide invasion scenario of D. suzukii inferred from microsatellite data and date of first observation. Map and schematic showing sample sites and the invasion routes taken by D. suzukii, as reconstructed by ABC-RF (Pudlo et al. 2016) on a total of 685 individuals from 23 geographic locations genotyped at 25 microsatellite loci (see Results and Methods for details). The native range is in dark gray, and the invasive range is in light gray (cf. delimitation from fig. 1 in Asplen et al. 2015). The year in which D. suzukii was first observed at each sample site is indicated in italics. The 23 geographical locations that were sampled are represented by circles (native range), and squares, diamonds and triangles (introduced range). Squares indicate populations that experienced weak bottlenecks (i.e., median value of bottleneck severity < 0.12, see main text and table 3), diamonds indicate moderate bottlenecks (0.12 < bottleneck severity < 0.22) and triangles indicate strong bottlenecks (i.e., bottleneck severity > 0.3). The colors of the symbols for the sample sites and of the arrows between them correspond to the different genetic groups obtained using the clustering method BAPS (supplementary appendix S1, Supplementary Material online). The arrows indicate the most probable invasion pathways. A1–A5 indicate five separate admixture events between different sources. O1–O3 indicate the most probable sources within the native range for the primary introduction events. A1 = Hawaii + southeast China; A2 = Watsonville (western US) + Hawaii; A3 = southern Europe + eastern US; A4 = western US + eastern US; A5 = southern Europe + northern Europe. O1 = Japan; O2 = southeast China; O3 = northeast China.
Bottleneck Severity in Invasive Populations of D. suzukii.
| Introduction Type | Sample Site | Prior 1: Mean | Prior 1: Median | Prior 1: Mode | Prior 1: q5% | Prior 1: q95% | Prior 2: Mean | Prior 2: Median | Prior 2: Mode | Prior 2: q5% | Prior 2: q95% |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Extra continental | US-Haw | 0.517 | 0.500 | 0.495 | 0.326 | 0.755 | 0.490 | 0.484 | 0.477 | 0.382 | 0.615 |
| Extra continental | US-Wat | 0.181 | 0.138 | 0.114 | 0.041 | 0.448 | 0.143 | 0.127 | 0.118 | 0.059 | 0.275 |
| Extra continental | IT-Tre | 0.268 | 0.179 | 0.171 | 0.059 | 0.837 | 0.236 | 0.189 | 0.172 | 0.101 | 0.553 |
| Extra continental | BR-PA | 0.158 | 0.126 | 0.104 | 0.020 | 0.407 | 0.208 | 0.193 | 0.172 | 0.067 | 0.399 |
| Extra continental | FR-Reu | 0.259 | 0.215 | 0.186 | 0.051 | 0.626 | 0.245 | 0.214 | 0.190 | 0.069 | 0.534 |
| Intra-continental | US-SD | 0.135 | 0.117 | 0.106 | 0.039 | 0.283 | 0.117 | 0.104 | 0.091 | 0.053 | 0.224 |
| Intra-continental | US-NC | 0.089 | 0.067 | 0.061 | 0.015 | 0.230 | 0.111 | 0.098 | 0.090 | 0.046 | 0.218 |
| Extra + intra continental | US-Sok | 0.116 | 0.099 | 0.088 | 0.027 | 0.250 | 0.119 | 0.106 | 0.099 | 0.052 | 0.229 |
| Extra + intra continental | GE-Dos | 0.189 | 0.177 | 0.155 | 0.061 | 0.356 | 0.205 | 0.189 | 0.171 | 0.088 | 0.372 |
| Prior values | | 0.628 | 0.199 | NA | 0.021 | 1.904 | 0.517 | 0.500 | NA | 0.326 | 0.754 |
Note: Extra-continental introductions correspond to long-distance introductions from a source located on a different continent from the focal population, intra-continental introductions correspond to introductions from a source located on the same continent as the focal population, and extra + intra continental introductions correspond to a combination of the two types of sources. Mean, median and mode estimates, as well as bounds of 90% credibility intervals (q5% and q95%), are indicated for each bottleneck severity parameter. We roughly classified the estimated bottleneck severity values into three classes (represented here by the three shades of gray): weak (i.e., median value of bottleneck severity < 0.12, in light gray), moderate (0.12 < bottleneck severity < 0.22, in gray), and strong (i.e., bottleneck severity > 0.3, in dark gray). The set of sample sites used for the ABC estimations presented here includes: (i) for the native area: Japan (JP-Tok + JP-Sap), South-East China (CN-Nin) and North-East China (CN-Lan + CN-Lia), and (ii) for the invaded range: US-Wat, US-Sok and US-SD for western US, US-NC for eastern US, IT-Tre for southern Europe, GE-Dos for northern Europe, BR-PA for South America (Brazil) and FR-Reu for La Réunion island. Code names of the sample sites are the same as in fig. 1 and in supplementary table S1, Supplementary Material online, in which bottleneck severity classes are also given for each sample site. See supplementary table S6, Supplementary Material online for results on a different set of representative sample sites.
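The three-class rule stated in the note can be written out as a short sketch. The function name and the "unclassified" label are assumptions made for illustration. The thresholds are those quoted in the note, which leaves boundary values and the range 0.22 to 0.3 unassigned. The medians are the Prior 1 values from the table.

```python
def classify_bottleneck(median_severity):
    """Map a median bottleneck severity onto the classes quoted in the note."""
    if median_severity < 0.12:
        return "weak"
    if 0.12 < median_severity < 0.22:
        return "moderate"
    if median_severity > 0.3:
        return "strong"
    # The note does not assign a class to boundary values or to 0.22-0.3.
    return "unclassified"


# Median bottleneck severity under Prior 1, copied from the table.
PRIOR1_MEDIANS = {
    "US-Haw": 0.500, "US-Wat": 0.138, "IT-Tre": 0.179,
    "BR-PA": 0.126, "FR-Reu": 0.215, "US-SD": 0.117,
    "US-NC": 0.067, "US-Sok": 0.099, "GE-Dos": 0.177,
}

classes = {site: classify_bottleneck(m) for site, m in PRIOR1_MEDIANS.items()}
for site, label in classes.items():
    print(f"{site}: {label}")
```

Under this reading, US-Haw is the only site in the strong class, and US-SD, US-NC and US-Sok fall in the weak class.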
Admixed or non-admixed origin of D. suzukii in Europe inferred for each sample site. Potential source populations include (among others) the Asian native range (represented by the Japanese sample site JP-Tok) and the invasive genetic group from eastern US, represented by one of the four invasive sample sites collected in this area (fig. 1). Four independent replicate ABC-RF treatments corresponding to analysis 3b (table 2) were hence carried out for each targeted European sample site, each using one of the four eastern US sample sites. The treatments labeled 1, 2, 3, and 4 in the pies of the figure were carried out with the sample sites US-NC, US-Wis, US-Gen, and US-Col, respectively. A pie quarter in blue indicates that the best scenario corresponds to a single introduction event from Asia. A pie quarter in red indicates that the best scenario corresponds to an admixture event between Asia and eastern US. Dates in italics correspond to the dates of first record at the European sample sites.
Posterior Distributions of Admixture Rates for the Five Admixture Events Inferred for the Final Worldwide Invasion Scenario Described in figure 1.
| Prior set | Admixture Event | Admixture Rate (gene fraction from pop x) | Mean | Median | Mode | q5% | q95% |
|---|---|---|---|---|---|---|---|
| Prior 1 | A1 (US-Wat = US-Haw + CN-Nin) | | 0.759 | 0.761 | 0.774 | 0.659 | 0.854 |
| Prior 1 | A2 (US-Sok = US-Haw + US-Wat) | | 0.237 | 0.230 | 0.227 | 0.092 | 0.400 |
| Prior 1 | A3 (GE-Dos = IT-Tre + US-NC) | | 0.286 | 0.278 | 0.266 | 0.112 | 0.481 |
| Prior 1 | A4 (BR-PA = US-NC + US-SD) | | 0.467 | 0.461 | 0.442 | 0.187 | 0.754 |
| Prior 1 | A5 (FR-Reu = IT-Tre + GE-Dos) | | 0.388 | 0.380 | 0.426 | 0.132 | 0.670 |
| Prior 2 | A1 (US-Wat = US-Haw + CN-Nin) | | 0.761 | 0.762 | 0.766 | 0.684 | 0.833 |
| Prior 2 | A2 (US-Sok = US-Haw + US-Wat) | | 0.224 | 0.219 | 0.199 | 0.114 | 0.350 |
| Prior 2 | A3 (GE-Dos = IT-Tre + US-NC) | | 0.312 | 0.306 | 0.284 | 0.152 | 0.487 |
| Prior 2 | A4 (BR-PA = US-NC + US-SD) | | 0.422 | 0.418 | 0.429 | 0.247 | 0.613 |
| Prior 2 | A5 (FR-Reu = IT-Tre + GE-Dos) | | 0.370 | 0.368 | 0.356 | 0.178 | 0.569 |
Note: Admixture events are denoted as in figure 1. Each admixture rate parameter r points to the name of the admixed site and corresponds to the fraction of genes originating from the source site in parentheses (a fraction 1 − r of genes originates from the other source site). Mean, median, and mode estimates, as well as bounds of 90% credibility intervals (q5% and q95%), are indicated for each admixture parameter. Estimations assuming prior set 1 and prior set 2 (supplementary table S8, Supplementary Material online) are provided. The set of representative sample sites used for the ABC estimations presented here is the same as in table 3. Code names of the population sites are the same as in fig. 1 and supplementary table S1, Supplementary Material online. See supplementary table S7, Supplementary Material online for results on a different set of representative sample sites.
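The meaning of an admixture rate r can be made concrete with a small sketch. It assumes the usual mixing rule: immediately after the admixture event, and before any drift, each allele frequency in the admixed population is r times its frequency in source x plus (1 − r) times its frequency in the other source. The function name and the allele frequencies are hypothetical. The value r = 0.761 is the Prior 1 median for event A1.

```python
def admixed_frequencies(r, freqs_x, freqs_other):
    """Expected allele frequencies just after admixture at rate r from source x."""
    if not 0.0 <= r <= 1.0:
        raise ValueError("admixture rate must lie in [0, 1]")
    alleles = sorted(set(freqs_x) | set(freqs_other))
    # Alleles absent from a source have frequency 0 there.
    return {
        allele: r * freqs_x.get(allele, 0.0) + (1.0 - r) * freqs_other.get(allele, 0.0)
        for allele in alleles
    }


# Hypothetical allele frequencies at one microsatellite locus (allele sizes in bp).
source_x = {"120": 0.5, "124": 0.5}
source_other = {"124": 0.2, "128": 0.8}

mixed = admixed_frequencies(0.761, source_x, source_other)
for allele, freq in mixed.items():
    print(allele, round(freq, 4))
```

With these made-up inputs, the allele private to source x keeps most of its frequency, and the allele private to the other source is diluted to roughly a quarter of its original value.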