Florianne Marandel¹, Grégory Charrier², Jean-Baptiste Lamy³, Sabrina Le Cam²,³, Pascal Lorance¹, Verena M Trenkel¹.
Abstract
Effective population size (Ne) is a key parameter in population genetics. However, Ne remains challenging to estimate for natural populations, as several factors are likely to bias estimates. These factors include sampling design, sequencing method, and data filtering. Issues inherent to the restriction site-associated DNA sequencing (RADseq) protocol include missing data and the choice of SNP selection criteria (e.g., minimum minor allele frequency, number of SNPs). To evaluate the potential impact of SNP selection criteria on Ne estimates obtained with the linkage disequilibrium (LD) method, we used RADseq data for a nonmodel species, the thornback ray. In this data set, the inbreeding coefficient FIS was positively correlated with the amount of missing data, implying that data were not missing at random. The precision of Ne estimates decreased with the number of SNPs. Mean Ne estimates (averaged across 50 random data sets, each with 2,000 SNPs) ranged between 237 and 1,784. Increasing the percentage of missing data from 25% to 50% increased Ne estimates by 82% to 120%, while increasing the minor allele frequency (MAF) threshold from 0.01 to 0.1 decreased estimates by 71% to 75%. Considering these effects is important when interpreting RADseq-derived estimates of effective population size in empirical studies.
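The LD method estimates Ne from the correlation between unlinked loci that exceeds what finite sampling alone would produce. The Python sketch below is a simplified illustration, not NeEstimator's implementation. It assumes genotypes coded as 0/1/2 allele counts with `None` for missing calls, uses the squared Pearson correlation of genotype codes as a stand-in for the composite LD measure, averages r² over all locus pairs, and applies the sample-size bias correction of Waples (2006); check the coefficients against the NeEstimator documentation before relying on them. All function names are hypothetical.

```python
# Simplified LD-based Ne sketch (hypothetical helpers, not NeEstimator):
# Pearson r^2 of 0/1/2 genotype codes, mean over pairs, Waples (2006) correction.
import math
from itertools import combinations


def pair_r2(x, y):
    """Squared Pearson correlation of two loci's genotype codes, using only
    individuals called at both loci (None = missing).
    Returns (r2, n_shared); r2 is None when undefined."""
    pairs = [(a, b) for a, b in zip(x, y) if a is not None and b is not None]
    n = len(pairs)
    if n < 2:
        return None, n
    mx = sum(a for a, _ in pairs) / n
    my = sum(b for _, b in pairs) / n
    sxy = sum((a - mx) * (b - my) for a, b in pairs)
    sxx = sum((a - mx) ** 2 for a, _ in pairs)
    syy = sum((b - my) ** 2 for _, b in pairs)
    if sxx == 0 or syy == 0:
        return None, n  # a locus is monomorphic among the shared individuals
    return sxy * sxy / (sxx * syy), n


def expected_r2_sample(s):
    """Expected r^2 produced by finite sample size S alone (Waples 2006)."""
    if s >= 30:
        return 1.0 / s + 3.19 / s ** 2
    return 0.0018 + 0.907 / s + 4.44 / s ** 2


def ne_from_r2_prime(r2_prime, s):
    """Bias-corrected Ne from the drift signal r2' = r2 - E(r2_sample)."""
    if r2_prime <= 0:
        return math.inf  # no LD beyond sampling noise: estimate is unbounded
    # Clamping the root argument at 0 is our own guard for very large r2'.
    if s >= 30:
        root = math.sqrt(max(1 / 9 - 2.76 * r2_prime, 0.0))
        return (1 / 3 + root) / (2 * r2_prime)
    root = math.sqrt(max(0.308 ** 2 - 2.08 * r2_prime, 0.0))
    return (0.308 + root) / (2 * r2_prime)


def ld_ne(loci):
    """Ne from a list of loci (each a list of genotypes), assuming all loci
    are unlinked; uses mean r2 over pairs and mean shared sample size S."""
    r2s, sizes = [], []
    for x, y in combinations(loci, 2):
        r2, n = pair_r2(x, y)
        if r2 is not None:
            r2s.append(r2)
            sizes.append(n)
    if not r2s:
        raise ValueError("no informative locus pairs")
    s = sum(sizes) / len(sizes)
    r2_prime = sum(r2s) / len(r2s) - expected_r2_sample(s)
    return ne_from_r2_prime(r2_prime, s)
```

For example, with S = 100 individuals and r2' = 1/300, `ne_from_r2_prime` returns roughly 98, close to the first-order approximation Ne ≈ 1/(3r2'). The pairwise loop grows quadratically with the number of SNPs, so it is only practical for small illustrative data sets in pure Python.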
Keywords: NeEstimator; RADseq; effective population size; linkage disequilibrium; skates and rays
Year: 2020 PMID: 32128126 PMCID: PMC7042749 DOI: 10.1002/ece3.6016
Source DB: PubMed Journal: Ecol Evol ISSN: 2045-7758 Impact factor: 2.912
Figure 1. Thornback ray (Raja clavata).
Figure 2. Sampling locations of thornback rays in the Bay of Biscay. The number of individuals sampled at each location is proportional to the bubble area.
Number of SNPs available for different data selection thresholds for minor allele frequency (MAF) and missing data
| Max. missing data (%) | MAF ≥ 0.01 | MAF ≥ 0.02 | MAF ≥ 0.05 | MAF ≥ 0.1 |
|---|---|---|---|---|
| 25 | 4,816 | 3,849 | 2,374 | 1,549 |
| 30 | 7,072 | 5,718 | 3,497 | 2,238 |
| 35 | 9,388 | 7,620 | 4,751 | 3,030 |
| 40 | 11,913 | 9,682 | 5,979 | 3,754 |
| 45 | 14,782 | 11,958 | 7,368 | 4,566 |
| 50 | 17,842 | 14,315 | 8,788 | 5,401 |
Figure 3. Estimated miscall rate of SNPs as a function of the mean read depth of each SNP in the raw data set for thornback ray in the Bay of Biscay. The color scale indicates the number of data points (number of individuals × number of SNPs).
Figure 4. (a) Histogram of minor allele frequencies of SNPs with ≤50% missing data. (b) Histogram of percent missing data for SNPs with minor allele frequency ≥0.01. (c) Percentage of missing SNPs per individual for SNPs with ≤50% missing data and minor allele frequency ≥0.01.
Figure 5. Relationship between the missing data threshold and the inbreeding coefficient of selected SNPs (minor allele frequency ≥0.01; percent missing data ≤50%) for thornback ray in the Bay of Biscay.
Figure 6. Relationship between Ne estimates and the number of SNPs for thornback ray in the Bay of Biscay (minor allele frequency ≥0.01; percent missing data ≤25%). The white line is the mean of 50 random data sets and the shaded area is the central 90% percentile band.
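The mean line and shaded band can be sketched as a resampling loop. Draw random SNP subsets without replacement, apply an Ne estimator to each, and summarize the results by their mean and their 5th and 95th percentiles. This assumes `estimator` is any callable that returns a finite Ne for a list of SNPs, such as an LD-based estimator. Infinite estimates would need separate handling before averaging. The function names are hypothetical.

```python
# Sketch: replicate Ne estimates on random SNP subsets, then mean and 90% band.
# `estimator` is a hypothetical callable mapping a list of SNPs to one Ne value.
import random
from statistics import fmean, quantiles


def replicate_estimates(snps, n_snps, n_reps, estimator, seed=None):
    """Apply `estimator` to n_reps random subsets of n_snps SNPs each
    (sampled without replacement within a subset)."""
    rng = random.Random(seed)
    return [estimator(rng.sample(snps, n_snps)) for _ in range(n_reps)]


def mean_and_90_band(values):
    """Mean and central 90% band (5th and 95th percentiles)."""
    cuts = quantiles(values, n=20, method="inclusive")  # 19 cut points in 5% steps
    return fmean(values), (cuts[0], cuts[-1])
```

For one point on the curve, a call such as `mean_and_90_band(replicate_estimates(snp_list, 2000, 50, my_estimator))` would give the mean and band, where `snp_list` and `my_estimator` are placeholders.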
Figure 7. Relationship between Ne estimates and the missing data percentage threshold for different minor allele frequency thresholds for thornback ray in the Bay of Biscay. Continuous lines are mean values for 50 random data sets with 2,000 SNPs, and shaded areas are central 90% percentile bands.
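The percentage changes quoted in the abstract compare mean Ne between two threshold settings, for example a missing-data threshold of 25% versus 50% at a fixed MAF threshold. A small Python sketch of that arithmetic, assuming replicate estimates are stored in a dictionary keyed by (missing-data %, MAF threshold). The key layout and function names are hypothetical.

```python
# Sketch: percent change in mean Ne between threshold settings.
# Assumed layout: {(missing_pct, maf_min): [Ne per random data set]}.
from statistics import fmean


def percent_change_in_mean_ne(replicates, from_key, to_key):
    """Percent change in mean Ne when moving from one setting to another."""
    before = fmean(replicates[from_key])
    after = fmean(replicates[to_key])
    return 100 * (after - before) / before


def missing_effect_by_maf(replicates, maf_mins, low_pct=25, high_pct=50):
    """Percent change in mean Ne when the missing-data threshold is raised
    from low_pct to high_pct, computed separately for each MAF threshold."""
    return {maf: percent_change_in_mean_ne(replicates, (low_pct, maf), (high_pct, maf))
            for maf in maf_mins}
```

For instance, a mean Ne rising from 100 to 210 corresponds to a 110% increase, and a fall from 100 to 50 to a change of −50%.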
Analysis of variance testing the effects of threshold values for percent missing data (NA) and minimum minor allele frequency (MAF) on log-transformed effective population size (Ne) estimates
| Name | df | MS | F | p |
|---|---|---|---|---|
| NA | 5 | 12.21 | 1,590.25 | <.001 |
| MAF | 3 | 101.45 | 13,215.66 | <.001 |
| NA:MAF | 15 | 0.08 | 10.73 | <.001 |
| Residuals | 1,176 | 0.01 | | |
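For a balanced, fully crossed design like this one (six missing-data thresholds × four MAF thresholds, with 50 random data sets per combination), the two-way ANOVA sums of squares can be computed directly from cell means. The stdlib-only Python sketch below assumes the response values are already log-transformed and that every cell holds the same number of replicates. It reports df, mean squares, and F, but not p-values, which would require an F-distribution routine such as SciPy's. The function name and the generic factor labels "A" and "B" are hypothetical.

```python
# Sketch: balanced two-way ANOVA with interaction, computed from cell means.
# Assumes cells = {(a_level, b_level): [log-transformed values]}, equal sizes.
from statistics import fmean


def two_way_anova(cells):
    a_levels = sorted({a for a, _ in cells})
    b_levels = sorted({b for _, b in cells})
    na, nb = len(a_levels), len(b_levels)
    r = len(next(iter(cells.values())))
    if len(cells) != na * nb or any(len(v) != r for v in cells.values()):
        raise ValueError("design must be balanced and fully crossed")

    grand = fmean(x for v in cells.values() for x in v)
    cell_mean = {k: fmean(v) for k, v in cells.items()}
    a_mean = {a: fmean(cell_mean[(a, b)] for b in b_levels) for a in a_levels}
    b_mean = {b: fmean(cell_mean[(a, b)] for a in a_levels) for b in b_levels}

    ss = {
        "A": nb * r * sum((m - grand) ** 2 for m in a_mean.values()),
        "B": na * r * sum((m - grand) ** 2 for m in b_mean.values()),
        "A:B": r * sum((cell_mean[(a, b)] - a_mean[a] - b_mean[b] + grand) ** 2
                       for a in a_levels for b in b_levels),
        "Residuals": sum((x - cell_mean[k]) ** 2
                         for k, v in cells.items() for x in v),
    }
    df = {"A": na - 1, "B": nb - 1, "A:B": (na - 1) * (nb - 1),
          "Residuals": na * nb * (r - 1)}
    ms = {k: ss[k] / df[k] for k in ss}
    f = {k: ms[k] / ms["Residuals"] for k in ("A", "B", "A:B")}
    return df, ms, f
```

With 6 × 4 cells of 50 replicates, the degrees of freedom are 5, 3, 15, and 24 × 49 = 1,176, which matches the df column of the table.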
Figure 8. Relationship between Ne estimates and sample size for thornback ray in the Bay of Biscay (minor allele frequency ≥0.01; percent missing data ≤25%). The continuous white line is the mean value for 50 random data sets with 2,000 SNPs, and shaded areas are central 90% percentile bands. The black dotted line is the fitted model, whose asymptote is plotted as a continuous horizontal black line.
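The caption does not state the form of the fitted model. Purely as an illustration, the Python sketch below assumes a saturating curve Ne(S) = a·S/(b + S), whose asymptote is a as sample size S grows. It fits this curve by ordinary least squares on the linearized form 1/Ne = 1/a + (b/a)(1/S). The model form, the linearization, and the function name are assumptions, not the authors' method. The linearization also reweights errors, so a direct nonlinear least-squares fit would usually be preferable for noisy estimates.

```python
# Sketch: fit a hypothetical saturating model Ne(S) = a*S/(b + S) by OLS
# on 1/Ne versus 1/S; returns the asymptote a and half-saturation size b.

def fit_saturating_curve(sample_sizes, ne_means):
    xs = [1 / s for s in sample_sizes]
    ys = [1 / ne for ne in ne_means]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx            # estimates b / a
    intercept = my - slope * mx  # estimates 1 / a
    if intercept <= 0:
        raise ValueError("no finite positive asymptote for these data")
    a = 1 / intercept
    return a, slope * a
```

On exact data generated from a = 1000 and b = 50, for example at S = 10, 20, 40, 80, and 160, the fit recovers both parameters.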