Gilbert Michael Macbeth, Damien Broderick, Rik C Buckworth, Jennifer R Ovenden.
Abstract
Estimates of genetic effective population size (Ne) using molecular markers are a potentially useful tool for the management of endangered through to commercial species. However, pitfalls are predicted when the effective size is large because estimates require large numbers of samples from wild populations for statistical validity. Our simulations showed that linkage disequilibrium estimates of Ne up to 10,000 with finite confidence limits can be achieved with sample sizes of approximately 5000. This number was deduced from empirical allele frequencies of seven polymorphic microsatellite loci in a commercially harvested fisheries species, the narrow-barred Spanish mackerel (Scomberomorus commerson). As expected, the smallest SD of Ne estimates occurred when low-frequency alleles were excluded. Additional simulations indicated that the linkage disequilibrium method was sensitive to small numbers of genotypes from cryptic species or conspecific immigrants. A correspondence analysis algorithm was developed to detect and remove outlier genotypes that may have been inadvertently sampled from cryptic species or nonbreeding immigrants from genetically separate populations. Simulations demonstrated the value of this approach in Spanish mackerel data. When putative immigrants were removed from the empirical data, 95% of the Ne estimates from jackknife resampling were greater than 24,000.
Keywords: bias; correspondence analysis; effective population size; nontarget populations; outliers
Year: 2013 PMID: 23550119 PMCID: PMC3618357 DOI: 10.1534/g3.112.005124
Source DB: PubMed Journal: G3 (Bethesda) ISSN: 2160-1836 Impact factor: 3.154
Count of AjBk pairs within genotypes created from parental gametes at locus A and B, where j* (or k*) is not allele j (or k)
| Male Gametes (rows) / Female Gametes (columns) | Aj Bk | Aj Bk* | Aj* Bk | Aj* Bk* |
|---|---|---|---|---|
| Aj Bk | 2 | 1 | 1 | 1 |
| Aj Bk* | 1 | 0 | 1# | 0 |
| Aj* Bk | 1 | 1# | 0 | 0 |
| Aj* Bk* | 1 | 0 | 0 | 0 |
The ‘#’ indicates where AjBk combinations occur in genotypes but not gametes.
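The counts in this table can be reproduced with a short sketch. The Python below is illustrative only: the gamete labels, the function names, and the counting rule (the number of AjBk pairs that can be formed in a genotype without reusing an allele) are assumptions chosen because they match every cell of the table, not a method stated in the source.

```python
# Hypothetical labels: "j" and "k" are the focal alleles at loci A and B;
# "j*" and "k*" stand for any other allele at the same locus.
GAMETES = [("j", "k"), ("j", "k*"), ("j*", "k"), ("j*", "k*")]


def count_ajbk_pairs(male, female):
    """Return (genotype_count, gametic_count) of AjBk pairs for one offspring."""
    n_aj = (male[0] == "j") + (female[0] == "j")  # copies of allele j at locus A
    n_bk = (male[1] == "k") + (female[1] == "k")  # copies of allele k at locus B
    genotype_count = min(n_aj, n_bk)  # pairs formed without reusing an allele
    gametic_count = (male == ("j", "k")) + (female == ("j", "k"))
    return genotype_count, gametic_count


def build_table():
    """Rows are male gametes, columns are female gametes.

    A '#' marks cells where the genotype shows more AjBk pairs than the
    two parental gametes actually carried.
    """
    table = []
    for male in GAMETES:
        row = []
        for female in GAMETES:
            geno, gam = count_ajbk_pairs(male, female)
            row.append(f"{geno}#" if geno > gam else str(geno))
        table.append(row)
    return table


for row in build_table():
    print(" ".join(row))
```

The two flagged cells are the double heterozygotes in which allele j and allele k arrive on different gametes, which is why genotype data alone cannot separate gametic from non-gametic associations.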
Locus and allele frequency summary
| Locus | S | Na | Maximum Allele Frequency | Alleles With Frequency Greater Than 0.10 | Alleles With Frequency Between 0.01 and 0.001 | Alleles With Frequency Less Than 0.001 |
|---|---|---|---|---|---|---|
| SCA30 | 5210 | 36 | 0.178 | 2 | 17 | 8 |
| SM3 | 5206 | 32 | 0.183 | 4 | 8 | 13 |
| SM37 | 4611 | 37 | 0.127 | 2 | 16 | 9 |
| SCA47 | 4781 | 27 | 0.486 | 3 | 4 | 14 |
| SCA49 | 4829 | 25 | 0.248 | 5 | 5 | 8 |
| 90RTE | 5266 | 24 | 0.735 | 1 | 6 | 11 |
| SCA8 | 5139 | 38 | 0.216 | 4 | 12 | 11 |
Sample size at each locus (S) and number of alleles (Na) for microsatellite loci used to genotype S. commerson with the maximum frequency and number of alleles within loci having frequencies less than or greater than the range shown.
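As a rough illustration of how the per-locus columns above could be tallied from a list of allele frequencies, consider the sketch below. The function name, the dictionary keys, the treatment of the bin boundaries, and the example frequencies are all assumptions made for illustration; none of them come from the source.

```python
def summarize_locus(allele_freqs):
    """Summarise one locus from its allele frequencies (which sum to about 1).

    Bin boundaries are treated as inclusive for the middle bin; the source
    does not say how exact boundary values were handled.
    """
    return {
        "Na": len(allele_freqs),                      # number of alleles
        "max_freq": max(allele_freqs),                # most common allele
        "gt_0.10": sum(f > 0.10 for f in allele_freqs),
        "0.001_to_0.01": sum(0.001 <= f <= 0.01 for f in allele_freqs),
        "lt_0.001": sum(f < 0.001 for f in allele_freqs),
    }


# Made-up frequencies for a single hypothetical locus (not study data).
example = [0.5, 0.3, 0.15, 0.04, 0.005, 0.0045, 0.0005]
print(summarize_locus(example))
```

Alleles between 0.10 and 0.01 fall outside the three tabulated bins, which is why the bin counts in a row need not add up to Na.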
Estimates of LDNE effective population size in S. commerson
| Pcrit | 0.05 | 0.02 | 0.01 | 0.001 | 0.0005 | 0.0001 | 0.0000 |
|---|---|---|---|---|---|---|---|
| Ne estimate | −40,163 | −799,447 | 79,842 | 17,503 | 3584 | 503 | 418 |
| Lower 95% CI | 19,595 | 24,728 | 22,209 | 12,759 | 3290 | 489 | 406 |
| Upper 95% CI | Infinite | Infinite | Infinite | 27,158 | 3921 | 517 | 428 |
Ne estimates at different Pcrit thresholds with the upper and lower 95% confidence intervals.
Negative estimates indicate a large undefined Ne.
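Why a negative estimate signals a large, undefined Ne can be seen from a simplified form of the linkage disequilibrium estimator. The sketch below assumes the textbook approximation E(r²) ≈ 1/(3Ne) + 1/S for unlinked loci and omits the bias corrections that LDNE applies, so it will not reproduce the tabulated values. The function name and the r² inputs are hypothetical.

```python
import math


def ne_from_r2(mean_r2, sample_size):
    """Simplified LD estimate of Ne: 1 / (3 * (r2 - 1/S)).

    Assumes unlinked loci and random mating. This is a teaching
    approximation, not the bias-corrected estimator used by LDNE.
    """
    drift_r2 = mean_r2 - 1.0 / sample_size  # subtract the sampling contribution
    if drift_r2 == 0:
        return math.inf
    return 1.0 / (3.0 * drift_r2)


# With S = 5000 the sampling term 1/S is 0.0002 (illustrative inputs only).
for r2 in (0.0003, 0.00021, 0.00019):
    print(r2, round(ne_from_r2(r2, 5000)))
```

When the observed r² falls below the amount expected from sampling alone, the denominator turns negative and so does the estimate. For Ne near 10,000 the drift signal 1/(3Ne) is only about 0.000033, so small sampling fluctuations in r² are enough to push the estimate to very large or negative values.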
Figure 1 Frequency of 10,000 Ne estimates when simulating a population size of N = 3000 at different Pcrit values.
Figure 2 Frequency of 10,000 Ne estimates when simulating a population size of N = 10,000 at different Pcrit values. Ne estimates less than 20,000 or greater than 40,000 were pooled, and their frequencies are shown at the x-axis limits of each graph.
Estimates of Ne in S. commerson after CA iterations
| CA Iteration (Removed) | 0.05 | 0.02 | 0.01 | 0.001 | 0.0005 | 0.0001 | 0.0000 |
|---|---|---|---|---|---|---|---|
| 0 (0) | −40,163 | −799,447 | 79,842 | 17,503 | 3584 | 503 | 418 |
| 1 (33) | −32,062 | −117,650 | 90,318 | 112,421 | 55,074 | 4968 | 5051 |
| 2 (38) | −33,926 | −114,426 | 91,549 | 104,569 | 53,546 | 8082 | 7947 |
| 3 (51) | −34,571 | −104,127 | 93,996 | 105,937 | 48,611 | 8838 | 9495 |
| 4 (60) | −37,447 | −99,305 | 86,818 | 113,630 | 51,105 | 133,636 | 171,370 |
| 5 (90) | −38,487 | −86,051 | 89,982 | 302,878 | −448,815 | −51,226 | −36,471 |
| 6 (119) | −35,678 | −76,242 | 120,453 | 302,946 | −146,528 | −38,189 | −30,685 |
| 7 (153) | −38,909 | −75,672 | 101,714 | 610,512 | −69,972 | −16,082 | −16,082 |
| 8 (170) | −32,038 | −65,015 | 296,541 | −795,394 | −58,191 | −14,132 | −14,132 |
| 9 (174) | −32,371 | −67,105 | 550,582 | −420,513 | −48,637 | −14,059 | −14,059 |
The removal of putative outliers from nine sequential CA iterations with the cumulative number of genotypes removed indicated in brackets and the following estimates of Ne at different Pcrit thresholds. CA, correspondence analysis.
Negative estimates indicate a large undefined Ne.
Lower 95% confidence interval of Ne from S. commerson genotypes
| CA Iteration (Removed) | 0.05 | 0.02 | 0.01 | 0.001 | 0.0005 | 0.0001 | 0.0000 |
|---|---|---|---|---|---|---|---|
| 0 (0) | 19,595 | 24,728 | 22,209 | 12,759 | 3290 | 489 | 406 |
| 1 (33) | 22,540 | 30,509 | 22,943 | 26,461 | 17,594 | 1988 | 2046 |
| 2 (38) | 21,571 | 30,713 | 23,011 | 26,119 | 17,498 | 2849 | 2913 |
| 3 (51) | 21,232 | 31,541 | 23,144 | 33,737 | 25,337 | 7606 | 8131 |
| 4 (60) | 20,110 | 31,970 | 22,720 | 26,879 | 16,904 | 16,696 | 14,799 |
| 5 (90) | 19,615 | 33,487 | 22,809 | 42,238 | 60,094 | −271,390 | −83,353 |
| 6 (119) | 20,379 | 35,118 | 24,305 | 29,804 | 53,307 | −98,902 | −59,311 |
| 7 (153) | 19,174 | 34,947 | 23,471 | 31,098 | 80,748 | −35,452 | −35,453 |
| 8 (170) | 21,646 | 37,832 | 27,703 | 36,446 | 151,392 | −23,066 | −23,066 |
| 9 (174) | 21,445 | 37,064 | 28,922 | 35,858 | −615,338 | −23,260 | −23,260 |
The removal of putative outliers from nine CA iterations with the cumulative number of genotypes removed indicated in brackets and the following estimates of the lower 95% confidence interval of Ne at different Pcrit thresholds. CA, correspondence analysis.
Negative estimates indicate a large undefined Ne.
Effect on S. commerson Ne estimates of adding nontarget species
| Gray Mackerel Genotypes Added | 0.05 | 0.02 | 0.01 | 0.001 | 0.0005 | 0.0001 | 0.0000 |
|---|---|---|---|---|---|---|---|
| 0 | −32,371 | −67,105 | 550,582 | −420,513 | −48,637 | −14,059 | −14,059 |
| 1 | −32,382 | −67,686 | 566,612 | −410,564 | −48,310 | 1303 | 1303 |
| 2 | −32,315 | −67,583 | 719,220 | −356,551 | −47,594 | 1031 | 1031 |
| 4 | −35,620 | −70,777 | 159,027 | −966,684 | −50,839 | 1138 | 1138 |
| 8 | −36,871 | −79,371 | 95,957 | 206,370 | 3930 | 1179 | 1179 |
| 16 | −37,624 | −94,247 | 43,218 | 2030 | 1088 | 1238 | 1238 |
| 32 | −45,964 | −1,040,355 | 16,140 | 1104 | 985 | 1233 | 1233 |
| 64 | 626,218 | 5420 | 2896 | 700 | 776 | 974 | 1014 |
| 100 | 23,439 | 5946 | 813 | 553 | 654 | 806 | 862 |
| 200 | 2189 | 418 | 233 | 376 | 455 | 547 | 620 |
Starting with S. commerson data with 174 outliers removed by nine CA iterations, Ne estimates at different Pcrit thresholds were determined after progressive addition of gray mackerel (S. semifasciatus) genotypes. CA, correspondence analysis.
Negative estimates indicate a large undefined Ne.
Harmonic mean of Ne before and after outlier genotypes were removed
| Generations | Before Outlier Genotypes Removed: No Immigrants | Before Outlier Genotypes Removed: With Immigrants | After Outlier Genotypes Removed: No Immigrants | After Outlier Genotypes Removed: With Immigrants |
|---|---|---|---|---|
| 100 | 9896 | 6236 | 13,911 | 17,100 |
| 200 | 10,543 | 3037 | 11,947 | 13,973 |
| 500 | 10,029 | 1282 | 11,151 | 11,558 |
| 1000 | 97,734 | 571 | 10,548 | 11,049 |
| 2000 | 11,834 | 176 | 12,359 | 12,295 |
| 100 | 10,732 | 11,096 | 10,841 | 11,267 |
| 200 | 10,557 | 10,932 | 10,670 | 11,094 |
| 500 | 10,211 | 9420 | 10,217 | 10,003 |
| 1000 | 9595 | 7629 | 9691 | 9736 |
| 2000 | 10,407 | 4456 | 10,508 | 10,564 |
Harmonic mean of Ne at two Pcrit thresholds in simulated populations with N = 10,000 and sample size S = 5413 containing no immigrants or with 100 genotypes drawn from a single immigrant population. The immigrants are from populations diverging after a different number of generations from a common population. The harmonic mean in each column was based on n separate estimates before and after outlier genotypes were removed using the CA algorithm. CA, correspondence analysis.
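For readers unfamiliar with the summary statistic used in this table, a minimal sketch of a harmonic mean over replicate Ne estimates follows. How the original analysis treated negative or infinite estimates is not stated here, so the choices below (an infinite estimate contributes a reciprocal of zero, a negative estimate contributes a negative reciprocal) are assumptions, as are the function name and the example values.

```python
import math


def harmonic_mean_ne(estimates):
    """Harmonic mean of Ne estimates: n / sum(1 / Ne).

    Assumption: infinite estimates contribute 0 to the reciprocal sum and
    negative estimates contribute a negative reciprocal.
    """
    reciprocals = [0.0 if math.isinf(x) else 1.0 / x for x in estimates]
    total = sum(reciprocals)
    if total == 0:
        return math.inf
    return len(estimates) / total


# Illustrative values only, not taken from the simulations.
print(harmonic_mean_ne([5000, 20000]))
print(harmonic_mean_ne([8000, 12000, math.inf]))
```

Averaging on the reciprocal scale keeps the summary finite even when some replicates return an infinite estimate, whereas an arithmetic mean of the same values would not be defined.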