A Jacobs, M De Noia, K Praebel, Ø Kanstad-Hanssen, M Paterno, D Jackson, P McGinnity, A Sturm, K R Elmer, M S Llewellyn.
Abstract
Caligid sea lice represent a significant threat to salmonid aquaculture worldwide. Population genetic analyses have consistently shown minimal genetic structure in North Atlantic Lepeophtheirus salmonis, frustrating efforts to track louse populations and improve targeted control measures. The aim of this study was to test the power of reduced representation library sequencing (IIb-RAD sequencing) coupled with random forest machine learning algorithms to define markers for fine-scale discrimination of louse populations. We identified 1286 robustly supported SNPs among four L. salmonis populations from Ireland, Scotland and Northern Norway. Only weak global structure was observed based on the full SNP dataset. The application of a random forest machine learning algorithm identified 98 discriminatory SNPs that dramatically improved population assignment, increased global genetic structure and resulted in significant genetic population differentiation. A large proportion of SNPs found to be under directional selection were also identified as highly discriminatory. Our data suggest that it is possible to discriminate between nearby L. salmonis populations given suitable marker selection approaches, and that such differences might have an adaptive basis. We discuss these data in light of sea lice adaptation to anthropogenic and environmental pressures as well as novel approaches to track and predict sea louse dispersal.
Year: 2018 PMID: 29352185 PMCID: PMC5775277 DOI: 10.1038/s41598-018-19323-z
Source DB: PubMed Journal: Sci Rep ISSN: 2045-2322 Impact factor: 4.379
Figure 1. Population structuring in L. salmonis based on the full SNP dataset. (A) Map showing the sampling sites of all four populations across the North-East Atlantic: Finnkirka (NF), Loch Duart (LD), Kenmare Bay (SWI) and Kilkieran Bay (KB). (B) DAPC plot of the first and second linear discriminant axes based on the full SNP dataset, explaining a total of 81.4% of the total variation. (C) Membership probability plot showing the population assignment probability for each individual. Shapefiles (for maps, rivers and lakes) were downloaded from Natural Earth (http://www.naturalearthdata.com/downloads/) and plotted in R. All data and software are open source.
Summary table.
| Population | N | Mean Coverage | Ho | He | Gis | π |
|---|---|---|---|---|---|---|
| KB | 13 | 19.7 | 0.278 | 0.304 | 0.086 | 0.3025986 |
| SWI | 14 | 18.6 | 0.265 | 0.304 | 0.128 | 0.3019599 |
| LD | 11 | 19.8 | 0.258 | 0.298 | 0.132 | 0.2952346 |
| NF | 12 | 20.2 | 0.267 | 0.312 | 0.143 | 0.3096919 |
Summary of sample sizes, mean sequencing coverage per individual and summary statistics, namely observed heterozygosity (Ho), expected heterozygosity (He), inbreeding coefficient (Gis) and genetic diversity (π).
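As a rough consistency check on the summary table, the inbreeding coefficient can be approximated from the rounded heterozygosities as Gis ≈ 1 − Ho/He. The sketch below assumes this ratio-of-means form; the reported Gis values were presumably computed from unrounded per-locus data, so small discrepancies in the third decimal are expected. The function name is illustrative.

```python
# Ho, He and reported Gis copied from the summary table (rounded values).
summary = {
    "KB": (0.278, 0.304, 0.086),
    "SWI": (0.265, 0.304, 0.128),
    "LD": (0.258, 0.298, 0.132),
    "NF": (0.267, 0.312, 0.143),
}


def approx_gis(ho, he):
    """Approximate inbreeding coefficient from mean heterozygosities."""
    return 1.0 - ho / he


for pop, (ho, he, gis_reported) in summary.items():
    gis = approx_gis(ho, he)
    print(f"{pop}: 1 - Ho/He = {gis:.3f} (reported Gis = {gis_reported})")
```

All four populations show a heterozygote deficit (Ho < He), which is what the positive Gis values in the table express.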
AMOVA results showing the global population structure.
| Dataset | Source of variation | Nested in | % variation | F-stat | F-value | P-value |
|---|---|---|---|---|---|---|
| Full SNP dataset | Within Ind. | — | 88.4 | F_it | 0.116 | — |
| Full SNP dataset | Among Ind. | Population | 9.8 | F_is | 0.1 | p < 0.0001 |
| Full SNP dataset | Among Pop. | — | 1.8 | F_sc | 0.018 | p < 0.0001 |
| Discriminatory SNPs | Within Ind. | — | 78.9 | F_it | 0.211 | — |
| Discriminatory SNPs | Among Ind. | Population | 11.3 | F_is | 0.125 | p < 0.0001 |
| Discriminatory SNPs | Among Pop. | — | 9.8 | F_sc | 0.098 | p < 0.0001 |
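The AMOVA values can be cross-checked against the standard hierarchical relationship among F-statistics, (1 − F_it) = (1 − F_is)(1 − F_pop), where F_pop stands for the among-population statistic (labelled F_sc in the table). The following sketch uses only the rounded table values; the dictionary keys and function name are illustrative.

```python
# F-statistics and within-individual % variation copied from the AMOVA table.
amova = {
    "Full SNP dataset": {"F_it": 0.116, "F_is": 0.1, "F_pop": 0.018, "within_pct": 88.4},
    "Discriminatory SNPs": {"F_it": 0.211, "F_is": 0.125, "F_pop": 0.098, "within_pct": 78.9},
}


def implied_f_it(f_is, f_pop):
    """F_it implied by the nested levels: 1 - (1 - F_is) * (1 - F_pop)."""
    return 1.0 - (1.0 - f_is) * (1.0 - f_pop)


for name, row in amova.items():
    implied = implied_f_it(row["F_is"], row["F_pop"])
    within = 100.0 * (1.0 - row["F_it"])
    print(f"{name}: implied F_it = {implied:.3f}, reported = {row['F_it']}; "
          f"within-individual variation = {within:.1f}%")
```

Under this check the reported values are internally consistent, and the among-population component rises from 1.8% to 9.8% of the variation when only the discriminatory SNPs are used.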
Figure 2. Detecting discriminatory loci using random forest and signals of selection. (A) Plot showing the results of the backwards purging approach, with the number of SNPs per subset plotted against the out-of-bag (OOB) error rate for each subset. The black line shows the smoothed estimates with 95% confidence intervals (grey area). The two red dotted lines show the range of subsets (93–101 SNPs) with the lowest OOB error rate. (B) The inset shows the initial distribution of scaled importance values for each SNP before the backwards purging. The grey dotted line shows the importance threshold for the subset of SNPs used for backwards purging. (C) FST outlier analysis results showing individual SNP loci and 5% (blue line) and 95% (red line) confidence intervals. Outlier loci potentially under positive selection are plotted in red and those potentially under balancing selection in blue. Squares mark FST outlier loci that were also detected as highly discriminatory using random forest, and triangles mark those that are not shared. The significant outlier detected using BayeScan is labelled ‘Locus 3621’.
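The backwards purging idea in panel (A) can be sketched as follows: fit a random forest, record its OOB error, drop the least important SNPs, and repeat. This is a minimal illustration using scikit-learn on synthetic genotypes, not the authors' pipeline; the drop fraction, tree count, stopping size and simulated data are all assumptions made for the example.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier


def backward_purge(X, y, drop_fraction=0.2, min_snps=5, n_trees=100, seed=0):
    """Iteratively drop the least important SNPs, tracking OOB error.

    Returns a list of (snp_indices, oob_error) tuples, one per subset.
    """
    kept = np.arange(X.shape[1])
    history = []
    while True:
        forest = RandomForestClassifier(
            n_estimators=n_trees, oob_score=True, random_state=seed
        )
        forest.fit(X[:, kept], y)
        history.append((kept.copy(), 1.0 - forest.oob_score_))
        if len(kept) <= min_snps:
            break
        # Rank SNPs by importance and keep all but the weakest fraction.
        n_drop = max(1, int(len(kept) * drop_fraction))
        n_keep = max(min_snps, len(kept) - n_drop)
        order = np.argsort(forest.feature_importances_)[::-1]
        kept = kept[np.sort(order[:n_keep])]
    return history


# Synthetic genotypes (0/1/2 allele counts): 4 populations x 12 individuals,
# 60 SNPs, of which only the first 5 differ in frequency among populations.
rng = np.random.default_rng(1)
n_pops, n_per_pop, n_snps, n_informative = 4, 12, 60, 5
y = np.repeat(np.arange(n_pops), n_per_pop)
freqs = np.full((n_pops, n_snps), 0.3)
freqs[:, :n_informative] = rng.uniform(0.05, 0.95, size=(n_pops, n_informative))
X = rng.binomial(2, freqs[y])

history = backward_purge(X, y)
best_snps, best_error = min(history, key=lambda item: item[1])
print(f"lowest OOB error {best_error:.3f} with {len(best_snps)} SNPs")
```

Plotting subset size against OOB error from `history` gives a curve of the kind shown in panel (A); the subset with the lowest error plays the role of the discriminatory SNP panel.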
Figure 3. Population structure and population assignment in L. salmonis using discriminatory random forest loci. (A) DAPC plot of the first and second linear discriminant axes based on 98 highly discriminatory SNPs, explaining a total of 74.3% of the total variation. (B) Membership probability plot showing the population assignment probability for each individual. Each individual was correctly assigned to its sampling site. (C) Heatmap showing pairwise Fst between sampling sites based on the full SNP dataset (below diagonal) and on the highly discriminatory SNP subset (above diagonal). Significant Fst values (inside each square) with P < 0.05 are highlighted in bold.
Outlier SNPs identified using the different approaches (Lositan, BayeScan and Random Forest) and annotation.
| Locus ID | Contig_position | LG | Fst (Lositan) | Lositan | BayeScan | RF | Annotation |
|---|---|---|---|---|---|---|---|
| 38173 | lsalatl2s740_42780 | 4 | 0.320968 | Yes | Putative | Yes | – |
| 3621 | lsalatl2s1185_140991 | 1 | 0.50726 | Yes | Yes | Yes | – |
| 40396 | lsalatl2s80_965936 | 1 | 0.20663 | Yes | No | Yes | PSA2 |
| 41679 | lsalatl2s85_1109389 | 4 | 0.181416 | Yes | No | No | — |
| 42860 | lsalatl2s907_144760 | — | 0.203467 | Yes | No | No | — |
| 4355 | lsalatl2s122_618061 | 7 | 0.207882 | Yes | No | No | — |
| 6832 | lsalatl2s139_1380660 | 1 | 0.199199 | Yes | No | No | — |
| 8287 | lsalatl2s14_555303 | 1 | 0.377272 | Yes | Putative | Yes | unchar. |
| 8241 | lsalatl2s14_1020918 | 1 | 0.217428 | Yes | No | Yes | unchar. |
| 9674 | lsalatl2s163_163880 | 14 | 0.201546 | Yes | No | No | — |
| 15099 | lsalatl2s228_333839 | 1 | 0.241913 | Yes | No | Yes | — |
| 1623 | lsalatl2s10843_736 | — | 0.212014 | Yes | No | Yes | — |
| 21928 | lsalatl2s3387_1782 | — | 0.185878 | Yes | No | Yes | — |
| 25024 | lsalatl2s39_920686 | 6 | 0.216516 | Yes | No | Yes | — |
| 26383 | lsalatl2s429_103294 | 14 | 0.175808 | Yes | No | No | unchar. |
| 29942 | lsalatl2s514_325267 | 14 | 0.164932 | Yes | No | No | — |
| 30716 | lsalatl2s535_184954 | 14 | 0.230334 | Yes | No | Yes | — |
| 2652 | lsalatl2s1135_117353 | 1 | 0.183753 | Yes | No | Yes | — |
| 30805 | lsalatl2s538_341294 | 6 | 0.20138 | Yes | No | Yes | unchar. |
Legend: RF stands for random forest; ‘Yes’ in the RF column marks SNPs that were detected using the random forest approach. ‘unchar.’ marks annotated genes that have not been characterized.
Overview of sample origins and population assignment success for the full SNP dataset and the random forest (RF) subset for the Besnier et al. (2014) dataset[13].
| Region | Populations | Assignment % (Full) | Assignment % (RF) |
|---|---|---|---|
| Canada | C857, C858 | 93.75 | 94.79 |
| Ireland | I852, I853 | 97.92 | 97.92 |
| Faroe | F850, F851 | 94.74 | 97.90 |
| Shetland | S855, S856 | 98.96 | 100.00 |
| Southern Norway | N813, N854 | 90.53 | 91.58 |
| Northern Norway | N837, N849 | 90.63 | 95.83 |
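To make the comparison in the table easier to read, the per-region change in assignment success can be computed directly from the reported percentages. The sketch below takes simple unweighted means across regions, which is an assumption: per-region sample sizes are not given here, so these means are not necessarily the overall assignment rates.

```python
# (full dataset %, random forest subset %) copied from the assignment table.
assignment = {
    "Canada": (93.75, 94.79),
    "Ireland": (97.92, 97.92),
    "Faroe": (94.74, 97.90),
    "Shetland": (98.96, 100.00),
    "Southern Norway": (90.53, 91.58),
    "Northern Norway": (90.63, 95.83),
}

# Gain in percentage points when using the RF subset instead of all SNPs.
gains = {region: round(rf - full, 2) for region, (full, rf) in assignment.items()}
mean_full = sum(full for full, _ in assignment.values()) / len(assignment)
mean_rf = sum(rf for _, rf in assignment.values()) / len(assignment)

for region, gain in gains.items():
    print(f"{region}: {gain:+.2f} percentage points")
print(f"unweighted mean: {mean_full:.2f}% (full) vs {mean_rf:.2f}% (RF)")
```

No region loses assignment accuracy with the RF subset, and the largest gain is in Northern Norway.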