Trond Reitan, Anders Nielsen.
Abstract
Studies in ecology often describe observed variation in a certain ecological phenomenon using environmental explanatory variables. A common problem is that the numerical nature of the ecological phenomenon does not always fit the assumptions underlying traditional statistical tests. A textbook example comes from pollination ecology, where flower visits are normally reported as frequencies: number of visits per flower per unit time. Using visitation frequencies in statistical analyses comes with two major caveats: their error distribution is unknown, and they do not include all the information found in the data; 10 flower visits in 20 flowers is treated the same as 100 visits in 200 flowers. We simulated datasets with various "flower visitation distributions" over various numbers of flowers observed (exposure) and with different types of effects inducing variation in the data. The datasets were then analyzed first with the traditional approach using number of visits per flower, and then with count data models. The analysis of count data gave a much better chance of detecting effects than the traditionally used frequency approach. We conclude that if the data structure, statistical analyses and interpretations of results are mixed up, valuable information can be lost.
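The abstract's "10 visits in 20 flowers versus 100 visits in 200 flowers" example can be made concrete with a small worked calculation. The Python sketch below is an illustration, not the authors' analysis: it assumes visits are Poisson-distributed with a mean proportional to the number of flowers observed, so the plug-in standard error of the estimated rate is sqrt(visits)/flowers. Both datasets give the same frequency of 0.5 visits per flower, but the rate is estimated about three times more precisely from 200 flowers; a frequency-only analysis discards that difference. The function name `poisson_rate_summary` is hypothetical.

```python
import math


def poisson_rate_summary(visits: int, flowers: int) -> tuple[float, float]:
    """Return (visits per flower, approximate standard error of that rate).

    Assumption (illustrative only): visits ~ Poisson(rate * flowers), so
    Var(visits) = rate * flowers and the plug-in SE of the estimated rate
    is sqrt(visits) / flowers.
    """
    rate = visits / flowers
    se = math.sqrt(visits) / flowers
    return rate, se


for visits, flowers in [(10, 20), (100, 200)]:
    rate, se = poisson_rate_summary(visits, flowers)
    print(f"{visits} visits / {flowers} flowers: rate = {rate:.2f}, SE = {se:.3f}")

# prints:
# 10 visits / 20 flowers: rate = 0.50, SE = 0.158
# 100 visits / 200 flowers: rate = 0.50, SE = 0.050
```

A count model that keeps the number of flowers as exposure (for example as a log offset) retains this precision information, whereas the frequency alone does not.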
Year: 2016 PMID: 26872136 PMCID: PMC4752487 DOI: 10.1371/journal.pone.0149129
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
False negative rate (type II error, upper panel) and false positive rate (type I error, lower panel) for the analyses of 10000 simulated datasets in each case.
The datasets were generated using three different distributions (Poisson, negative binomial and lognormal-Poisson). The best model (Poisson, negative binomial or zero-inflated negative binomial for count data) with and without the effect was identified, and the two were compared using BML. Datasets with an effect were simulated so as to yield approximately 10% false negatives for the count data analysis. For each distribution, the cells give the percentage of false negatives/positives for each data type (count or frequency), followed by the ratio between them (frequency score divided by count data score). A ratio > 1 favors the count data approach, while a ratio < 1 favors the frequency approach; the latter are indicated in bold. Ratios larger than 4 in favor of count data are indicated in bold italics.
| Effect type | Poisson | | | Neg. binomial (k = 10) | | | Lognormal-Poisson (sd = 0.36) | | |
|---|---|---|---|---|---|---|---|---|---|
| | Count | Freq. | Ratio | Count | Freq. | Ratio | Count | Freq. | Ratio |
| *False negative rate (type II error)* | | | | | | | | | |
| Fixed categorical | 13.2% | 34.8% | 2.6 | 10.2% | 24.2% | 2.4 | 10.4% | 23.2% | 2.2 |
| Fixed linear | 9.9% | 40.9% | | 9.0% | 21.4% | 2.4 | 9.4% | 19.6% | 2.1 |
| Random intercept | 6.1% | 19.6% | 3.2 | 5.0% | 10.8% | 2.2 | 4.3% | 7.7% | 1.8 |
| Random slope | 9.0% | 22.0% | 2.4 | 10.6% | 8.4% | | 10.2% | 7.2% | |
| *False positive rate (type I error)* | | | | | | | | | |
| Fixed categorical | 0.62% | 3.0% | | 1.2% | 3.2% | 2.7 | 1.5% | 3.3% | 3.2 |
| Fixed linear | 1.8% | 4.9% | 2.7 | 3.8% | 5.2% | 1.4 | 4.1% | 5.6% | 1.4 |
| Random intercept | 2.0% | 2.7% | 1.4 | 1.9% | 2.2% | 1.2 | 2.0% | 2.7% | 1.4 |
| Random slope | 1.5% | 11.1% | | 1.5% | 9.5% | | 1.7% | 7.7% | |
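The caption defines the ratio as the frequency-approach error rate divided by the count-data error rate. Several ratio values are missing from the table above, so the Python sketch below recomputes them from the rounded percentages as a worked example of that definition and of the caption's thresholds (> 1 favors count data, > 4 strongly favors count data, < 1 favors frequency data). Because the inputs are rounded, these recomputed ratios are approximations and may differ slightly from the values the authors computed; the helper names `error_ratio` and `interpret` are hypothetical.

```python
def error_ratio(freq_pct: float, count_pct: float) -> float:
    """Frequency-approach error rate divided by count-data error rate."""
    return freq_pct / count_pct


def interpret(ratio: float) -> str:
    """Classify a ratio using the thresholds stated in the table caption."""
    if ratio > 4:
        return "strongly favors count data"
    if ratio > 1:
        return "favors count data"
    if ratio < 1:
        return "favors frequency data"
    return "no difference"


# (cell label, count-data %, frequency %) for cells whose ratio is missing;
# FN = false negative rate, FP = false positive rate.
MISSING_CELLS = [
    ("FN, fixed linear, Poisson", 9.9, 40.9),
    ("FN, random slope, neg. binomial", 10.6, 8.4),
    ("FN, random slope, lognormal-Poisson", 10.2, 7.2),
    ("FP, fixed categorical, Poisson", 0.62, 3.0),
    ("FP, random slope, Poisson", 1.5, 11.1),
    ("FP, random slope, neg. binomial", 1.5, 9.5),
    ("FP, random slope, lognormal-Poisson", 1.7, 7.7),
]

for label, count_pct, freq_pct in MISSING_CELLS:
    r = error_ratio(freq_pct, count_pct)
    print(f"{label}: ratio ~ {r:.1f} ({interpret(r)})")
```

On these rounded inputs, the two random-slope false-negative cells for the overdispersed distributions come out below 1, and the remaining cells come out above 4, which matches the caption's note that such cells were highlighted.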
Fig 1. ROC curves (false positive rate against true positive rate) for the different combinations of effect type and data distribution.
Solid and dashed lines show the relationship for count data and frequency data, respectively.
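An ROC curve of this kind is traced by sweeping a decision threshold over a detection score and recording, at each threshold, the fraction of effect datasets flagged (true positive rate) and the fraction of no-effect datasets flagged (false positive rate). The Python sketch below assumes each analysed dataset yields a single score measuring evidence for an effect; the exact statistic behind the paper's BML comparison is not given here, so the scores are made-up placeholders and `roc_points` is a hypothetical helper.

```python
def roc_points(pos_scores, neg_scores):
    """Return (FPR, TPR) pairs from sweeping a threshold from high to low.

    A dataset is called 'effect detected' when its score >= threshold.
    pos_scores: datasets simulated WITH an effect.
    neg_scores: datasets simulated WITHOUT an effect.
    """
    thresholds = sorted(set(pos_scores) | set(neg_scores), reverse=True)
    points = [(0.0, 0.0)]  # threshold above every score: nothing flagged
    for t in thresholds:
        tpr = sum(s >= t for s in pos_scores) / len(pos_scores)
        fpr = sum(s >= t for s in neg_scores) / len(neg_scores)
        points.append((fpr, tpr))
    return points


# Hypothetical evidence scores, not taken from the paper.
POS_SCORES = [3.1, 2.4, 1.8, 0.9]
NEG_SCORES = [1.2, 0.4, -0.3, -1.0]

for fpr, tpr in roc_points(POS_SCORES, NEG_SCORES):
    print(f"FPR = {fpr:.2f}, TPR = {tpr:.2f}")
```

Running the same sweep separately on scores from the count-data and frequency-data analyses would give one curve per approach, which is what the solid and dashed lines compare.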
Area under the curve (AUC) for the different combinations of effect type and distribution.
In the analysis, the best model (Poisson, negative binomial or zero-inflated negative binomial for count data) with and without the effect was identified, and the two were compared using BML. Delta AUC is also given to illustrate the difference between the models fitted to the two data types (counts and frequencies). Note that the analyses using count data always perform better than those using frequency data, although in some cases the difference is marginal, e.g. for random slope models fitted to datasets drawn from the negative binomial or lognormal-Poisson distribution.
| Effect type | Poisson | | Neg. binomial (k = 10) | | Lognormal-Poisson (sd = 0.36) | |
|---|---|---|---|---|---|---|
| | Count | Freq. | Count | Freq. | Count | Freq. |
| Fixed categorical | 99.5% | 92.9% | 99.5% | 95.9% | 99.2% | 96.3% |
| Fixed linear | 98.6% | 89.1% | 98.4% | 95.0% | 98.3% | 95.3% |
| Random intercept | 97.6% | 93.1% | 98.7% | 96.3% | 98.8% | 97.2% |
| Random slope | 98.3% | 90.4% | 97.6% | 96.8% | 97.9% | 97.5% |
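The AUC equals the probability that a randomly chosen dataset with an effect scores higher than a randomly chosen dataset without one (ties counted as one half); this Mann-Whitney form is the same as the trapezoidal area under the empirical ROC curve. The Python sketch below computes it for made-up scores, then computes delta AUC from the table values, assuming delta AUC is defined as count-data AUC minus frequency-data AUC in percentage points (the caption does not state the sign convention). The names `auc_mann_whitney` and `AUC_TABLE` are hypothetical.

```python
def auc_mann_whitney(pos_scores, neg_scores):
    """AUC as P(random positive outscores random negative), ties count 1/2."""
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


# Hypothetical evidence scores, not taken from the paper.
pos = [3.1, 2.4, 1.8, 0.9]
neg = [1.2, 0.4, -0.3, -1.0]
print(f"AUC = {auc_mann_whitney(pos, neg):.4f}")  # prints AUC = 0.9375 (15 of 16 pairs)

# AUC (%) from the table above: effect type -> (count data, frequency data).
AUC_TABLE = {
    "Poisson": {
        "Fixed categorical": (99.5, 92.9), "Fixed linear": (98.6, 89.1),
        "Random intercept": (97.6, 93.1), "Random slope": (98.3, 90.4),
    },
    "Neg. binomial": {
        "Fixed categorical": (99.5, 95.9), "Fixed linear": (98.4, 95.0),
        "Random intercept": (98.7, 96.3), "Random slope": (97.6, 96.8),
    },
    "Lognormal-Poisson": {
        "Fixed categorical": (99.2, 96.3), "Fixed linear": (98.3, 95.3),
        "Random intercept": (98.8, 97.2), "Random slope": (97.9, 97.5),
    },
}

for dist, rows in AUC_TABLE.items():
    for effect, (count_auc, freq_auc) in rows.items():
        # Assumed sign convention: positive means count data did better.
        print(f"{dist:18s} {effect:18s} delta AUC = {count_auc - freq_auc:4.1f}")
```

Every delta is positive, and the smallest values (0.8 and 0.4 points) are the random-slope cells for the negative binomial and lognormal-Poisson distributions, consistent with the caption's remark.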