Frank L Forcino, Lindsey R Leighton, Pamela Twerdy, James F Cahill.
Abstract
Community ecologists commonly perform multivariate techniques (e.g., ordination, cluster analysis) to assess patterns and gradients of taxonomic variation. A critical requirement for a meaningful statistical analysis is accurate information on the taxa found within an ecological sample. However, oversampling (too many individuals counted per sample) also comes at a cost, particularly for ecological systems in which identification and quantification are substantially more resource consuming than the field expedition itself. In such systems, increasing the sample size will eventually yield diminishing returns in improving any pattern or gradient revealed by the data, while costs continue to rise. Here, we examine 396 datasets: 44 previously published and 352 created datasets. Using meta-analytic and simulation-based approaches, we seek (1) to determine the minimal sample sizes required to produce robust multivariate statistical results in abundance-based community ecology research, and (2) to determine the dataset parameters (i.e., evenness, number of taxa, number of samples) that require larger sample sizes, regardless of resource availability. We found that in the 44 previously published and the 220 created datasets with randomly chosen abundances, a conservative estimate of a sample size of 58 produced the same multivariate results as all larger sample sizes. However, this minimal number varies as a function of evenness: increased evenness resulted in increased minimal sample sizes. Sample sizes as small as 58 individuals are sufficient for a broad range of multivariate abundance-based research. In cases when resource availability is the limiting factor for conducting a project (e.g., small university, time to conduct the research project), statistically viable results can still be obtained with less of an investment.
Year: 2015 PMID: 26058066 PMCID: PMC4461312 DOI: 10.1371/journal.pone.0128379
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
A list of the 44 previously published datasets (some of the citations contain multiple datasets) including original characteristics of the complete dataset.
| Citation | Median Sample Size | Number of Samples | Number of Taxa | Mean Evenness | Environment | Primary Taxonomic Group | Geographic Location |
|---|---|---|---|---|---|---|---|
| Beehler 1983 [ | 97 | 8 | 31 | 0.72 | Forest | Birds and plants | Papua New Guinea |
| Arthur et al. 1976 [ | 85 | 38 | 17 | 0.74 | Lake | Parasites | Yukon, Canada |
| Cause et al. 2011 [ | 20 | 43 | 53 | 0.74 | Subtidal marine | Parasites | Dumont d’Urville Sea (East Antarctica) |
| Wong et al. 2004 [ | 24812 | 12 | 13 | 0.46 | Fresh water streams | Invertebrates | Kent, UK, and Mississippi, USA |
| VanNimwegen et al. 2008 [ | 75 | 4 | 7 | 0.69 | Grasslands | Prairie dogs | Kansas, USA |
| Ieno and Bastido 1998 [ | 853 | 7 | 13 | 0.75 | Benthic marine | Bivalves and polychaetes | Samborombon Bay, Argentina |
| Kinnunen and Tiainen 1999 [ | 147 | 40 | 7 | 0.59 | Farmland | Beetles | Finland |
| Nicolaidou et al. 2006 [ | 890 | 18 | 48 | 0.64 | Benthic lagoon | Bivalves | Ionian Sea, Greece |
| Arai and Mudry 1983 [ | 114 | 17 | 53 | 0.83 | River | Fish and parasites | British Columbia, Canada |
| Peres 1997 [ | 110 | 12 | 12 | 0.94 | Forest | Primates | Brazil |
| Dahle et al. 1998 [ | 944 | 15 | 421 | 0.70 | Benthic brackish | Marine invertebrates | Pechora Sea, Russia |
| Repecka and Mileriene 1991 [ | 511 | 19 | 20 | 0.95 | Marine | Fish | Kursia Bay, Lithuania |
| Hughes and Thomas 1971 [ | 94 | 16 | 16 | 0.69 | Benthic Estuary | Bivalves | Prince Edward Island, Canada |
| Hughes and Thomas 1971 [ | 76 | 21 | 18 | 0.67 | Benthic Estuary | Bivalves | Prince Edward Island, Canada |
| Hughes and Thomas 1971[ | 235.5 | 14 | 14 | 0.51 | Benthic Estuary | Bivalves | Prince Edward Island, Canada |
| Hughes and Thomas 1971[ | 648 | 51 | 51 | 0.72 | Benthic Estuary | Bivalves | Prince Edward Island, Canada |
| Ryu et al. 2011 [ | 4939 | 7 | 36 | 0.53 | Benthic marine to brackish | Benthic animals | Incheon North Harbor, Korea |
| Skrodowski and Porowski 2000 [ | 210 | 25 | 22 | 0.73 | Pine forest | Beetles | Poland |
| Snow and Snow 1971 [ | 146 | 13 | 65 | 0.76 | Neotropical forest | Birds | Trinidad |
| Snow and Snow 1988 [ | 234 | 7 | 12 | 0.50 | Mixed terrestrial | Birds and plants | England |
| Snow and Snow 1971[ | 1674 | 9 | 35 | 0.70 | Neotropical forest | Birds | Trinidad |
| Ulrich and Zalewski 2006 [ | 145 | 11 | 17 | 0.76 | Lake Islands | Beetles | Multiple |
| Dechitar 1972 [ | 338 | 31 | 144 | 0.93 | Lake | Parasites | Ontario, Canada |
| Anderson et al. 2011 [ | 850.5 | 42 | 39 | 0.52 | Northern mixed prairie | Grassland plants | Montana, USA |
| Anderson et al. 2011[ | 9.5 | 10 | 6 | 0.80 | Northern mixed prairie | Grassland plants | Montana, USA |
| Anderson et al. 2011[ | 261 | 25 | 15 | 0.50 | Northern mixed prairie | Grassland plants | Montana, USA |
| Anderson et al. 2011[ | 52 | 29 | 14 | 0.50 | Northern mixed prairie | Grassland plants | Montana, USA |
| Anderson et al. 2011[ | 29 | 42 | 31 | 0.55 | Northern mixed prairie | Grassland plants | Montana, USA |
| Anderson et al. 2011[ | 35 | 39 | 17 | 0.44 | Northern mixed prairie | Grassland plants | Montana, USA |
| Anderson et al. 2011[ | 29 | 42 | 41 | 0.63 | Northern mixed prairie | Grassland plants | Montana, USA |
| Anderson et al. 2011[ | 118 | 37 | 46 | 0.50 | Northern mixed prairie | Grassland plants | Montana, USA |
| Anderson et al. 2011[ | 30 | 37 | 43 | 0.67 | Northern mixed prairie | Grassland plants | Montana, USA |
| Anderson et al. 2011[ | 248 | 41 | 46 | 0.60 | Northern mixed prairie | Grassland plants | Montana, USA |
| Anderson et al. 2011[ | 573 | 42 | 37 | 0.47 | Northern mixed prairie | Grassland plants | Montana, USA |
| Anderson et al. 2011[ | 53 | 41 | 27 | 0.71 | Northern mixed prairie | Grassland plants | Montana, USA |
| Anderson et al. 2011[ | 20 | 41 | 30 | 0.72 | Northern mixed prairie | Grassland plants | Montana, USA |
| Miller et al. 2011 [ | 6431 | 68 | 117 | 0.66 | Marine | Fish | Pacific coast, USA |
| Petraitis et al. 2009 [ | 301 | 60 | 3 | 0.67 | Intertidal | Bivalves and algae | Maine, USA |
| Ramesh et al. 2010 [ | 132 | 95 | 334 | 0.77 | Tropical terrestrial | Plants | Karnataka, India |
| Stevens et al. 2011 [ | 33 | 280 | 155 | 0.90 | Grasslands | Plants and bryophytes | Atlantic coast, Europe |
| Stevens et al. 2011 [ | 51 | 40 | 100 | 0.93 | Grasslands | Plants and bryophytes | Atlantic coast, Europe |
| Stevens et al. 2011 [ | 52 | 445 | 355 | 0.95 | Grasslands | Plants and bryophytes | Atlantic coast, Europe |
| Ulrich and Gotelli 2010 [ | 248 | 6 | 25 | 0.77 | River | Fish | British Columbia, Canada |
| Ulrich and Gotelli 2010 [ | 495 | 8 | 99 | 0.88 | Lake Islands | Beetles | Multiple |
The environment refers to the broadest environment from which samples were collected. The primary taxonomic group is the broadest category covering the most abundant groups in the dataset. This list is meant to show the diversity of the types of data included in the analysis.
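The Mean Evenness column summarizes how equally individuals are distributed among taxa within each sample. The record does not say which evenness index was used, so the following is only a minimal sketch assuming Pielou's J (Shannon entropy divided by the log of observed richness). The function names and the example counts are hypothetical and are not taken from the datasets above.

```python
from math import log

def pielou_evenness(counts):
    """Pielou's J for one sample: Shannon H' divided by ln(S).

    Assumption: S is the number of taxa actually present in the sample.
    """
    present = [c for c in counts if c > 0]
    richness = len(present)
    if richness < 2:
        raise ValueError("evenness is undefined for fewer than two taxa")
    total = sum(present)
    shannon = -sum((c / total) * log(c / total) for c in present)
    return shannon / log(richness)

def mean_evenness(samples):
    """Average Pielou's J over all samples of a dataset."""
    values = [pielou_evenness(s) for s in samples]
    return sum(values) / len(values)

# Hypothetical samples: one perfectly even, one dominated by a single taxon.
even_sample = [10, 10, 10, 10]
uneven_sample = [97, 1, 1, 1]
print(pielou_evenness(even_sample), pielou_evenness(uneven_sample))
```

Values near 1 indicate that taxa are about equally abundant, and values near 0 indicate dominance by one or a few taxa.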
A list of the 220 created datasets with simulated abundance structure along with the characteristics of the complete dataset prior to subsampling.
| Datasets | Number of samples | Number of taxa | Gradient Size | Median sample size |
|---|---|---|---|---|
| 1–10 | 15 | 15 | 1000 | 8418 |
| 11–20 | 15 | 15 | 5000 | 41528 |
| 21–30 | 20 | 20 | 100 | 178 |
| 31–40 | 20 | 20 | 5000 | 58014 |
| 41–50 | 20 | 30 | 100 | 278 |
| 51–60 | 20 | 40 | 100 | 256 |
| 61–70 | 20 | 40 | 100 | 356 |
| 71–80 | 20 | 50 | 100 | 255 |
| 81–90 | 20 | 50 | 100 | 493 |
| 91–100 | 25 | 100 | 100 | 936 |
| 101–110 | 30 | 20 | 100 | 194 |
| 111–120 | 30 | 60 | 100 | 376 |
| 121–130 | 40 | 20 | 100 | 169 |
| 131–140 | 50 | 20 | 100 | 181 |
| 141–150 | 50 | 50 | 100 | 474 |
| 151–160 | 50 | 50 | 5000 | 151434 |
| 161–170 | 50 | 75 | 100 | 761 |
| 171–180 | 50 | 100 | 100 | 982 |
| 181–190 | 50 | 200 | 100 | 2037 |
| 191–200 | 75 | 50 | 100 | 484 |
| 201–210 | 100 | 50 | 100 | 468 |
| 211–220 | 200 | 50 | 100 | 462 |
This list is meant to show how the datasets were structured and how they differ from one another. The number of samples, number of taxa, and gradient size were controlled in the simulation. The median sample size was an output of the randomized simulation, although it was influenced by the controlled parameters.
Fig 1. (a) Mantel test and (b) PROTEST comparisons for each of the five median subsample sizes for each of the 44 previously published datasets. Each point (black circle) represents the mean, plus and minus one standard deviation, of the (a) R-statistic and (b) m2-values for the 1000 subsamples of one dataset at one sample size. There are five points for each dataset, one for each of the subsample sizes.
Fig 2. (a) Mantel test and (b) PROTEST comparisons for each of the five median subsample sizes for each of the 220 created datasets with randomly selected abundance structures. Each point (black circle) represents the mean, plus and minus one standard deviation, of the (a) R-statistic and (b) m2-values for the 1000 subsamples of one dataset at one sample size. There are five points for each dataset, one for each of the subsample sizes.
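The subsample-and-compare procedure behind these figures can be illustrated with a short sketch. It draws a fixed number of individuals from each sample without replacement, recomputes the pairwise dissimilarities, and correlates them with those of the complete dataset (the Mantel R-statistic). Several details here are assumptions rather than the authors' stated method: the Bray-Curtis dissimilarity, the function names, the example community matrix, and the use of 100 replicates in the demonstration instead of the 1000 subsamples reported in the captions. The permutation-based significance test and the PROTEST comparison are not implemented.

```python
import random
from itertools import combinations
from math import sqrt

def subsample(counts, n, rng):
    """Draw n individuals without replacement from one sample's taxon counts."""
    pool = [taxon for taxon, c in enumerate(counts) for _ in range(c)]
    if n >= len(pool):
        return list(counts)  # sample is already at or below the target size
    out = [0] * len(counts)
    for taxon in rng.sample(pool, n):
        out[taxon] += 1
    return out

def bray_curtis(a, b):
    """Bray-Curtis dissimilarity between two count vectors (assumed metric)."""
    total = sum(a) + sum(b)
    return sum(abs(x - y) for x, y in zip(a, b)) / total if total else 0.0

def pairwise(matrix):
    """Dissimilarities for every pair of samples, in a fixed order."""
    return [bray_curtis(matrix[i], matrix[j])
            for i, j in combinations(range(len(matrix)), 2)]

def pearson(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def mantel_r(full, n, rng):
    """Mantel R between the full dataset and one subsample of size n."""
    sub = [subsample(row, n, rng) for row in full]
    return pearson(pairwise(full), pairwise(sub))

# Hypothetical community matrix: rows are samples, columns are taxa.
community = [
    [120, 60, 15, 5],
    [80, 90, 20, 10],
    [10, 40, 100, 50],
    [5, 15, 70, 110],
]
rng = random.Random(0)
replicates = [mantel_r(community, 50, rng) for _ in range(100)]
print(sum(replicates) / len(replicates))
```

An R-statistic close to 1 means the subsampled data preserve the between-sample structure of the complete dataset.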
A list of the results from the various tests of significance used to determine if there were differences in groupings of goodness-of-fit statistics at a sample size of 50.
| Type of test | Groups being tested | p-value |
|---|---|---|
| T-test | High and low evenness dataset R-statistics | p < 0.001 |
| T-test | High and low evenness datasets m2-values | p = 0.003 |
| T-test | R-statistics for the datasets with 5 samples and 10 samples | p = 0.03 |
| T-test | m2-values for the datasets with 5 samples and 10 samples | p < 0.001 |
| ANOVA | R-statistics for the datasets with a richness of 10, 20, and 50 | p = 0.006 |
| Bonferroni corrected T-test | R-statistics for the datasets with a richness of 20 and 50 | p = 0.009 |
| Bonferroni corrected T-test | R-statistics for the datasets with a richness of 10 and 20 | p = 0.053 |
| Bonferroni corrected T-test | R-statistics for the datasets with a richness of 10 and 50 | p = 0.94 |
| ANOVA | m2-values for the datasets with a richness of 10, 20, and 50 | p = 0.53 |
| T-test | R-statistics for the datasets with a richness of 10 and 50 (mixed evenness datasets) | p = 0.09 |
| T-test | m2-values for the datasets with a richness of 10 and 50 (mixed evenness datasets) | p = 0.04 |
These tests were conducted on the 132 created datasets with selected abundance structures because those datasets resulted in the greatest amount of variation among sample sizes. Because of this variation, we used these tests to determine if certain parameters would require larger sample sizes.
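For the pairwise richness comparisons that follow the ANOVA, the table reports Bonferroni corrected t-tests. As a minimal sketch of that correction, assuming the standard form in which each raw p-value is multiplied by the number of comparisons and capped at 1: the raw p-values below are hypothetical and are not the values behind the table.

```python
def bonferroni(p_values):
    """Return Bonferroni-adjusted p-values (raw p times number of tests, capped at 1)."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]

# Hypothetical raw p-values for three pairwise comparisons.
raw = [0.01, 0.04, 0.5]
adjusted = bonferroni(raw)
print(adjusted)
```

With three comparisons, a raw p-value must fall below roughly 0.0167 to remain significant at the 0.05 level after adjustment.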
Fig 3. (a) Mantel test and (b) PROTEST comparisons for each of the five median subsample sizes for each of the 132 created datasets with selected abundance structures. Each point (black circle) represents the mean, plus and minus one standard deviation, of the (a) R-statistic and (b) m2-values for the 1000 subsamples of one dataset at one sample size. There are five points for each dataset, one for each of the subsample sizes.
A list of the minimum and maximum goodness-of-fit statistics for the different parameters for the 132 created datasets with selected abundance structures.
| Dataset group | Mantel Test (Min) | Mantel Test (Max) | PROTEST (Min) | PROTEST (Max) |
|---|---|---|---|---|
| All 132 | 0.53 | 0.98 | 0.56 | 0.98 |
| High Evenness | 0.53 | 0.92 | 0.65 | 0.98 |
| Low Evenness | 0.74 | 0.97 | 0.56 | 0.91 |
| Mixed Evenness | 0.92 | 0.98 | 0.62 | 0.98 |
| 5 Samples | 0.53 | 0.98 | 0.70 | 0.98 |
| 10 Samples | 0.64 | 0.97 | 0.56 | 0.98 |
| Richness = 10 | 0.58 | 0.98 | 0.65 | 0.98 |
| Richness = 20 | 0.53 | 0.97 | 0.56 | 0.97 |
| Richness = 50 | 0.59 | 0.97 | 0.75 | 0.97 |
| Richness = 10 (mixed evenness) | 0.92 | 0.98 | 0.65 | 0.98 |
| Richness = 50 (mixed evenness) | 0.92 | 0.97 | 0.89 | 0.98 |
These goodness-of-fit statistics are for the subsample size of 50.