Dora Henriques¹,², Melanie Parejo³,⁴, Alain Vignal⁵, David Wragg⁶, Andreas Wallberg⁷, Matthew T Webster⁷, M Alice Pinto¹.
Abstract
The most important managed pollinator, the honeybee (Apis mellifera L.), has been subject to a growing number of threats. In western Europe, one such threat is the large-scale introduction of commercial strains (C-lineage ancestry), which is leading to introgressive hybridization and even the local extinction of native honeybee populations (M-lineage ancestry). Here, we developed reduced assays of highly informative SNPs from 176 whole genomes to estimate C-lineage introgression in the most diverse and evolutionarily complex subspecies in Europe, the Iberian honeybee (Apis mellifera iberiensis). We first evaluated how sample size, and sampling from a geographically restricted area, affect the number of highly informative SNPs. We showed that the number of fixed SNPs (FST = 1) is inflated when the sample size is small (N ≤ 10) and when sampling captures only a small fraction of a population's genetic diversity. These results underscore the importance of a representative sample when developing reliable reduced SNP assays for organisms with complex genetic patterns. We used a training data set to design four independent SNP assays based on pairwise FST between Iberian and C-lineage honeybees. The designed assays, which were validated on holdout and simulated hybrid data sets, proved highly accurate and can readily be used to monitor populations, in a time- and cost-effective manner, not only in the native range of A. m. iberiensis in Iberia but also in its introduced range in the Balearic Islands, Macaronesia and South America. Although our approach used the Iberian honeybee as a model system, it is valuable in a wide range of scenarios for monitoring and conserving potentially hybridized domestic and wildlife populations.
Keywords: Apis mellifera iberiensis; fixation index; informative SNPs; reduced SNP assays
Year: 2018 PMID: 30151039 PMCID: PMC6099811 DOI: 10.1111/eva.12623
Source DB: PubMed Journal: Evol Appl ISSN: 1752-4571 Impact factor: 5.183
Figure 1. Geographic locations of the 176 whole‐genome sequenced individuals. The Iberian honeybees are distributed across three transects: Atlantic (AT; N = 31), Central (CT; N = 61) and Mediterranean (MT; N = 25). Each dot represents a single colony and apiary.
Figure 2. Diagram depicting the phases of development of the four reduced SNP assays (M1, M2, M3 and M4), using as a baseline whole‐genome sequence data from 117 Apis mellifera iberiensis (IHB) and 59 C‐lineage (C) honeybees.
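The abstract states that the assays were validated on simulated hybrid data sets, but this record does not describe how the hybrids were generated. As a minimal sketch under an assumed model (not necessarily the authors'), a genotype with a known C-lineage proportion q can be simulated by drawing each allele copy from the C-lineage pool with probability q and from the Iberian pool otherwise. The function name `simulate_hybrid` and its parameters are hypothetical.

```python
import random
from typing import List, Sequence


def simulate_hybrid(freq_iberian: Sequence[float],
                    freq_c: Sequence[float],
                    q: float,
                    rng: random.Random) -> List[int]:
    """Simulate one diploid genotype for an individual with C-lineage ancestry q.

    Hypothetical model: each of the two allele copies at every SNP comes from
    the C-lineage pool with probability q (otherwise from the Iberian pool) and
    carries the counted allele with that pool's frequency. Returns 0, 1 or 2
    copies of the counted allele per SNP.
    """
    if not 0.0 <= q <= 1.0:
        raise ValueError("q must lie in [0, 1]")
    genotype = []
    for p_iberian, p_c in zip(freq_iberian, freq_c):
        copies = 0
        for _ in range(2):  # two allele copies in a diploid worker
            p = p_c if rng.random() < q else p_iberian
            if rng.random() < p:
                copies += 1
        genotype.append(copies)
    return genotype


if __name__ == "__main__":
    rng = random.Random(1)
    # Fully diagnostic toy SNPs: allele absent in Iberian, fixed in C-lineage.
    g = simulate_hybrid([0.0] * 5000, [1.0] * 5000, q=0.25, rng=rng)
    print(sum(g) / (2 * len(g)))  # close to 0.25
```

With fully diagnostic SNPs the simulated allele frequency tracks q closely; with realistic, partly shared frequencies the signal per SNP is weaker, which is why highly differentiated SNPs are preferred.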
Sample sizes of training and holdout data sets for each population
| Population | Training set | Holdout set | Total |
|---|---|---|---|
| A. m. iberiensis | 88 | 29 | 117 |
| C‐lineage | 44 (23 + 21) | 15 (5 + 10) | 59 (28 + 31) |
| Total | 132 | 44 | 176 |
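The counts above are consistent with splitting each population roughly 75%/25% into training and holdout sets (round(0.75 × 117) = 88 and round(0.75 × 59) = 44). The sketch below assumes that fraction and a random shuffle within each population; `stratified_split` is a hypothetical helper, and the actual procedure may differ (the C-lineage sub-groups, 23 of 28 and 21 of 31, were evidently not each split at exactly 75%).

```python
import random
from typing import Dict, List, Tuple


def stratified_split(samples_by_pop: Dict[str, List[str]],
                     train_fraction: float = 0.75,
                     seed: int = 0) -> Tuple[Dict[str, List[str]], Dict[str, List[str]]]:
    """Split each population independently into training and holdout sets."""
    rng = random.Random(seed)
    train: Dict[str, List[str]] = {}
    holdout: Dict[str, List[str]] = {}
    for pop, ids in samples_by_pop.items():
        shuffled = ids[:]  # copy so the caller's list is not reordered
        rng.shuffle(shuffled)
        n_train = round(train_fraction * len(shuffled))
        train[pop] = shuffled[:n_train]
        holdout[pop] = shuffled[n_train:]
    return train, holdout


if __name__ == "__main__":
    pops = {
        "A. m. iberiensis": [f"IHB{i:03d}" for i in range(117)],  # hypothetical IDs
        "C-lineage": [f"C{i:02d}" for i in range(59)],
    }
    train, holdout = stratified_split(pops)
    for pop in pops:
        print(pop, len(train[pop]), len(holdout[pop]))  # 88/29 and 44/15
```

Splitting within each population keeps the holdout set's ancestry composition close to the training set's, so holdout accuracy is not driven by an accidental imbalance between lineages.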
Population differentiation estimated from average genome‐wide FST
| Population |
|
| C‐lineage ( |
|---|---|---|---|
|
| 0.540 | 0.549 | 0.532 |
|
| 0.061 |
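Per-SNP FST between A. m. iberiensis and the C-lineage is the criterion from which the assays were selected. This record does not state which estimator was used, so the sketch below assumes Hudson's FST in the form given by Bhatia et al. (2013), averaged genome-wide as a ratio of averages, and then ranks SNPs by their per-SNP value. The function names are hypothetical, and the real selection (Figure 2) included steps not reproduced here. Note that a SNP with a fixed difference between the two samples gets FST = 1.

```python
from typing import List, Sequence, Tuple


def hudson_components(p1: float, p2: float, n1: int, n2: int) -> Tuple[float, float]:
    """Numerator and denominator of Hudson's F_ST for one biallelic SNP.

    p1, p2: allele frequencies in the two samples;
    n1, n2: numbers of sampled allele copies (each > 1).
    """
    num = (p1 - p2) ** 2 - p1 * (1 - p1) / (n1 - 1) - p2 * (1 - p2) / (n2 - 1)
    den = p1 * (1 - p2) + p2 * (1 - p1)
    return num, den


def rank_snps_by_fst(freq1: Sequence[float], freq2: Sequence[float],
                     n1: int, n2: int, top_k: int) -> Tuple[float, List[int]]:
    """Return (genome-wide F_ST, indices of the top_k SNPs by per-SNP F_ST)."""
    per_snp = []
    total_num = total_den = 0.0
    for i, (p1, p2) in enumerate(zip(freq1, freq2)):
        num, den = hudson_components(p1, p2, n1, n2)
        total_num += num
        total_den += den
        if den > 0:  # monomorphic for the same allele in both samples: undefined, skip
            per_snp.append((num / den, i))
    genome_wide = total_num / total_den if total_den > 0 else float("nan")
    per_snp.sort(key=lambda t: (-t[0], t[1]))  # highest F_ST first, ties by position
    return genome_wide, [i for _, i in per_snp[:top_k]]


if __name__ == "__main__":
    # Toy allele frequencies for four SNPs (hypothetical), 100 allele copies per population.
    gw, top = rank_snps_by_fst([1.0, 0.9, 0.5, 0.0], [0.0, 0.1, 0.5, 0.0], 100, 100, top_k=2)
    print(round(gw, 3), top)  # 0.704 [0, 1]
```

Averaging numerators and denominators separately (rather than averaging per-SNP ratios) keeps low-diversity SNPs from dominating the genome-wide value.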
Fixed SNPs and 95% confidence interval (CI) estimated from random subsets of variable sample size (five replicates each) of Apis mellifera iberiensis, and statistics for the FST values (from the complete data set) of the false‐positive fixed SNPs
| Sample size subset | Mean number of fixed SNPs (±95% CI) | Mean number of false‐positive fixed SNPs | Mean % of false‐positive fixed SNPs with an FST ≤ 0.95 | Mean minimum FST |
|---|---|---|---|---|
| 5 | 25,428 (±1,184) | 14,337 | 33.9 | 0.084 |
| 10 | 18,878 (±354) | 7,787 | 14 | 0.334 |
| 25 | 15,700 (±127) | 4,609 | 3.4 | 0.695 |
| 50 | 13,784 (±282) | 2,693 | 0.3 | 0.880 |
| 75 | 12,480 (±306) | 1,389 | 0.1 | 0.942 |
| 100 | 11,736 (±165) | 645 | 0 | 0.970 |
Calculated by subtracting the 11,091 fixed SNPs estimated for the complete A. m. iberiensis data set (N = 117), all of which have FST = 1, from the number of fixed SNPs estimated for each sample size subset.
Calculated by retrieving, from the complete A. m. iberiensis data set, the FST values of the false positives and computing the percentage with an FST ≤ 0.95.
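The inflation in the table follows from simple sampling: a SNP whose focal allele has true frequency p appears fixed in a sample of N diploid individuals with probability p^(2N), which is non-negligible for small N. The toy simulation below illustrates this under hypothetical allele frequencies, assuming binomial sampling from a single unstructured population and a C-lineage reference fixed for the alternative allele. It is not the authors' procedure, which drew random subsets (five replicates each) from the 117 sequenced individuals; `count_apparently_fixed` is a hypothetical name.

```python
import random
from typing import Sequence


def count_apparently_fixed(true_freqs: Sequence[float],
                           n_diploid: int,
                           rng: random.Random) -> int:
    """Count SNPs that look fixed in a sample of n_diploid individuals.

    A SNP looks fixed when all 2 * n_diploid sampled allele copies carry the
    focal allele; with true frequency p this happens with probability p**(2N).
    """
    n_alleles = 2 * n_diploid
    return sum(
        all(rng.random() < p for _ in range(n_alleles))
        for p in true_freqs
    )


if __name__ == "__main__":
    rng = random.Random(2018)
    # Hypothetical true frequencies of the focal allele in the Iberian population
    # (assumed absent from the C-lineage reference, so "fixed" means F_ST = 1).
    freqs = [1.0] * 500 + [0.95] * 2000 + [0.8] * 2500
    truly_fixed = sum(p == 1.0 for p in freqs)
    for n in (5, 10, 25, 50, 100):
        apparent = count_apparently_fixed(freqs, n, rng)
        # False positives = apparent fixed SNPs minus truly fixed SNPs.
        print(n, apparent, apparent - truly_fixed)
```

As N grows, only SNPs with true frequency very close to 1 still slip through, mirroring the table's rising minimum FST among false positives.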
Fixed SNPs estimated from geographical subsets of Apis mellifera iberiensis, and statistics for the FST values (from the complete data set) of the false‐positive fixed SNPs
| Geographical subset | Number of fixed SNPs | Number of false‐positive fixed SNPs | % of false‐positive fixed SNPs with an FST ≤ 0.95 | Minimum FST |
|---|---|---|---|---|
| PT | 17,738 | 6,647 | 20.2 | 0.275 |
| CT | 15,009 | 3,918 | 13.7 | 0.700 |
| MT | 15,384 | 4,293 | 11.8 | 0.676 |
| IP | 15,371 | 4,280 | 10.4 | 0.763 |
PT, Portugal; CT, Central transect; MT, Mediterranean transect; IP, Iberian Peninsula.
Calculated by subtracting the 11,091 fixed SNPs estimated for the complete A. m. iberiensis data set (N = 117), all of which have FST = 1, from the number of fixed SNPs estimated for each geographical subset.
Calculated by retrieving, from the complete A. m. iberiensis data set, the FST values of the false positives and computing the percentage with an FST ≤ 0.95.
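The two footnotes describe a short computation. Because a SNP fixed in the complete data set (FST = 1) remains fixed in any subset of those individuals, the subtraction in the first footnote counts exactly the SNPs that are fixed in the subset but have FST < 1 in the full data set. The sketch below computes the three summary columns from per-SNP FST values; the function name, the dictionary input and the floating-point tolerance are assumptions for illustration.

```python
import math
from typing import Dict, Iterable, Tuple


def false_positive_stats(subset_fixed: Iterable[str],
                         full_fst: Dict[str, float],
                         threshold: float = 0.95,
                         tol: float = 1e-9) -> Tuple[int, float, float]:
    """Summarize SNPs that are fixed in a subset but not in the full data set.

    subset_fixed: IDs of SNPs with F_ST = 1 in the subset.
    full_fst: F_ST of every SNP estimated from the complete data set.
    Returns (number of false positives, % of them with full-data
    F_ST <= threshold, minimum full-data F_ST among them).
    """
    values = [full_fst[snp] for snp in subset_fixed if full_fst[snp] < 1.0 - tol]
    if not values:
        return 0, 0.0, math.nan
    pct_low = 100.0 * sum(v <= threshold for v in values) / len(values)
    return len(values), pct_low, min(values)


if __name__ == "__main__":
    # Hypothetical full-data F_ST for five SNPs that are all fixed in a subset.
    full = {"s1": 1.0, "s2": 0.97, "s3": 0.90, "s4": 0.60, "s5": 1.0}
    print(false_positive_stats(full.keys(), full))
```

The count returned equals the number of subset-fixed SNPs minus those fixed in the full data (here 5 − 2 = 3), which is the subtraction the footnote describes.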
Figure 3. Chromosome map showing the positions of the SNPs in the four reduced assays (M1–M4)
Performance of the reduced (M1–M4) and random (R1–R4) SNP assays in estimating C‐lineage introgression (Q‐values) in the holdout and simulated data sets, compared with the whole‐genome data set
| Assay | # of SNPs | Pearson's r (i) | Standard error (ii) | Mean error (iii) | # Ind. error >0.05 (iv) | Max error (v) | % Mean accuracy (vi) | Precision (vii) | Pure classified as hybrid (viii) | Hybrid classified as pure (viii) |
|---|---|---|---|---|---|---|---|---|---|---|
| M1 | 37 | 0.975 (0.958–0.985) | 0.046 | 0.026 | 12 | 0.189 | 97.42 | 0.043 | 0 | 0 |
| R1 | 37 | 0.949 (0.915–0.970) | 0.069 | 0.043 | 20 | 0.296 | 95.71 | 0.062 | 1 | 3 |
| M2 | 38 | 0.956 (0.927–0.974) | 0.046 | 0.041 | 20 | 0.200 | 95.93 | 0.053 | 1 | 0 |
| R2 | 38 | 0.967 (0.945–0.981) | 0.075 | 0.037 | 20 | 0.192 | 96.34 | 0.047 | 3 | 1 |
| M3 | 40 | 0.978 (0.964–0.987) | 0.048 | 0.028 | 13 | 0.150 | 97.24 | 0.038 | 0 | 0 |
| R3 | 40 | 0.933 (0.888–0.960) | 0.067 | 0.05 | 14 | 0.279 | 95.04 | 0.069 | 1 | 1 |
| M4 | 38 | 0.982 (0.969–0.989) | 0.044 | 0.026 | 13 | 0.137 | 97.41 | 0.036 | 1 | 0 |
| R4 | 38 | 0.925 (0.876–0.955) | 0.062 | 0.053 | 22 | 0.316 | 94.71 | 0.069 | 3 | 1 |
| M3 + M4 | 78 | 0.988 (0.979–0.993) | 0.04 | 0.018 | 9 | 0.139 | 98.18 | 0.030 | 0 | 0 |
| R3 + R4 | 78 | 0.967 (0.945–0.981) | 0.051 | 0.034 | 13 | 0.201 | 96.62 | 0.049 | 1 | 0 |
| M1 + M3 + M4 | 115 | 0.987 (0.979–0.993) | 0.037 | 0.018 | 8 | 0.147 | 98.15 | 0.030 | 0 | 0 |
| R1 + R3 + R4 | 115 | 0.976 (0.959–0.986) | 0.046 | 0.03 | 16 | 0.155 | 97.01 | 0.041 | 0 | 1 |
| M1 + M2 + M3 + M4 | 153 | 0.986 (0.977–0.992) | 0.003 | 0.02 | 9 | 0.140 | 98.02 | 0.031 | 0 | 0 |
| R1 + R2 + R3 + R4 | 153 | 0.981 (0.967–0.989) | 0.042 | 0.027 | 14 | 0.150 | 97.35 | 0.037 | 0 | 1 |
(i) Pearson's correlation coefficient r; (ii) mean standard error estimated by ADMIXTURE from 200 bootstrap replicates; (iii) mean error, calculated as the mean absolute difference between the Q‐values estimated with the assay and with the whole‐genome data set; (iv) number of individuals with an error >0.05; (v) maximum error; (vi) mean accuracy, calculated as 100% minus the mean absolute error expressed as a percentage; (vii) precision, defined as the standard deviation of the absolute error; (viii) number of misclassified individuals (Q‐value threshold of 0.05).
Figure 4. Accuracy of single and combined reduced (M1–M4) and random (R1–R4) SNP assays. Each box spans the first to third quartiles, and the median accuracy is marked by a bold vertical line within the box. Outliers are indicated by circles. Random assays consistently have a larger interquartile range than the corresponding reduced assays.
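A box plot like Figure 4 summarizes each assay's per-individual accuracies by quartiles, and the caption's comparison rests on the interquartile range (IQR). The sketch below computes these statistics, assuming linearly interpolated quartiles and the common Tukey rule that points more than 1.5 × IQR beyond the box are drawn as outliers. The software and rule used for the figure are not stated in this record, and `box_stats` is a hypothetical name.

```python
import statistics
from typing import Dict, List, Sequence, Union


def box_stats(values: Sequence[float]) -> Dict[str, Union[float, List[float]]]:
    """Quartiles, IQR and Tukey outliers (points beyond 1.5 x IQR from the box)."""
    # method="inclusive" interpolates linearly between order statistics
    # (the same rule as R's default quantile type 7).
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    outliers = [v for v in values if v < low_fence or v > high_fence]
    return {"q1": q1, "median": median, "q3": q3, "iqr": iqr, "outliers": outliers}


if __name__ == "__main__":
    # Hypothetical per-individual accuracies (%) for a reduced and a random assay.
    reduced = [99.1, 98.7, 97.9, 98.5, 96.0, 99.4, 98.9, 97.5]
    random_assay = [99.0, 95.2, 97.1, 93.8, 98.6, 96.4, 94.9, 99.3]
    print(box_stats(reduced)["iqr"], box_stats(random_assay)["iqr"])
```

A wider IQR for a random assay means its accuracy varies more from individual to individual, even when its median is close to that of the corresponding reduced assay.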