Aleksi Kallio
Abstract
Assessing the significance of patterns in presence-absence data is an important question in ecological data analysis, e.g., when studying nestedness. Significance testing is commonly performed with fixed-fixed models, which permute the data while preserving the row and column sums. The manuscript examines the properties of fixed-fixed models and points out how their strict constraints can lead to limited randomizability. It then considers relaxing the row and column sum constraints of the fixed-fixed models, and presents Rasch models as an alternative with relaxed constraints and sound statistical properties. The models are compared on presence-absence data and, surprisingly, the fixed-fixed models are observed to produce unreasonably optimistic measures of statistical significance, giving interesting insight into the practical effects of limited randomizability.
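To make "limited randomizability" concrete, the sketch below counts checkerboard units (the 2 × 2 patterns of Fig 1 that a swap can flip) in a 0/1 matrix stored as a list of rows. The helper name `checkerboard_count` and the counting shortcut are our own illustration, not code from the manuscript. For each pair of rows, every column where only the first row has a 1 combines with every column where only the second row has a 1 to form exactly one checkerboard unit. A perfectly nested matrix has none, so no swap applies to it and a fixed-fixed swap chain started from it cannot move.

```python
from itertools import combinations

def checkerboard_count(matrix):
    """Count 2x2 checkerboard units in a 0/1 matrix given as a list of rows."""
    total = 0
    for upper, lower in combinations(matrix, 2):
        # Columns with a 1 only in the upper row, and only in the lower row.
        only_upper = sum(1 for u, l in zip(upper, lower) if u == 1 and l == 0)
        only_lower = sum(1 for u, l in zip(upper, lower) if u == 0 and l == 1)
        # Each (only_upper, only_lower) column pair is one checkerboard unit.
        total += only_upper * only_lower
    return total

nested = [[1, 1, 1],
          [1, 1, 0],
          [1, 0, 0]]
print(checkerboard_count(nested))                  # 0: no swap can change it
print(checkerboard_count([[1, 0, 1], [0, 1, 0]]))  # 2
```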
Year: 2016 PMID: 27812126 PMCID: PMC5094661 DOI: 10.1371/journal.pone.0165456
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Fig 1. Checkerboard units of 2 × 2 values, with ones shown in black and zeros in white.
The schematic shows how swapping the elements inside a unit switches between the two unit types without changing the row and column (margin) sums.
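A minimal sketch of the swap in Fig 1, assuming the matrix is a list of row lists and a unit is addressed by row indices `i`, `k` and column indices `j`, `l` (all names here are hypothetical). Flipping every cell of a checkerboard unit turns one unit type into the other, and because each affected row and column loses one 1 and gains one 1, all margins stay the same.

```python
def is_checkerboard(m, i, k, j, l):
    """True if rows i, k and columns j, l form a checkerboard unit."""
    sub = (m[i][j], m[i][l], m[k][j], m[k][l])
    return sub in ((1, 0, 0, 1), (0, 1, 1, 0))

def swap_unit(m, i, k, j, l):
    """Flip a checkerboard unit in place; row and column sums are unchanged."""
    if not is_checkerboard(m, i, k, j, l):
        raise ValueError("cells do not form a checkerboard unit")
    for r, c in ((i, j), (i, l), (k, j), (k, l)):
        m[r][c] = 1 - m[r][c]

def margins(m):
    """Row sums and column sums of a 0/1 matrix."""
    return [sum(row) for row in m], [sum(col) for col in zip(*m)]

m = [[1, 0, 1],
     [0, 1, 1]]
before = margins(m)
swap_unit(m, 0, 1, 0, 1)
print(m)                     # [[0, 1, 1], [1, 0, 1]]
print(margins(m) == before)  # True
```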
Fig 2. Count of correlations during sequential swaps.
Each line shows the fluctuation of the correlation count during a single chain of swaps. Solid lines show swap chains that start from the original dataset on the left and terminate in result datasets on the right (marked with “o”). Dashed lines show chains of swaps for 10 Rasch-randomized datasets (marked with “x”). Correlation counts were recorded every 10 attempted swaps. For readability, lines are smoothed with a moving window of 50 counts.
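The chains in Fig 2 can be sketched as follows, under stated assumptions: each attempt picks two random rows and two random columns and swaps them only if they form a checkerboard unit, the statistic is recorded every 10 attempts (successful or not), and the trace is smoothed with a moving average over 50 recorded values. The correlation count is not defined in this excerpt, so the statistic is passed in as a function and the checkerboard count serves only as a stand-in. Whether the original smoothing window is trailing or centered is not stated; a trailing window is assumed here, and all function names are hypothetical.

```python
import random
from itertools import combinations

def checkerboard_count(m):
    """Stand-in statistic: number of 2x2 checkerboard units in m."""
    total = 0
    for upper, lower in combinations(m, 2):
        only_upper = sum(u > l for u, l in zip(upper, lower))
        only_lower = sum(u < l for u, l in zip(upper, lower))
        total += only_upper * only_lower
    return total

def swap_chain(m, n_attempts, statistic, record_every=10, rng=random):
    """Attempt random swaps on a copy of m, recording statistic periodically."""
    m = [row[:] for row in m]
    n_rows, n_cols = len(m), len(m[0])
    trace = []
    for attempt in range(1, n_attempts + 1):
        i, k = rng.sample(range(n_rows), 2)
        j, l = rng.sample(range(n_cols), 2)
        # Swap only if rows i, k and columns j, l form a checkerboard unit.
        if m[i][j] == m[k][l] and m[i][l] == m[k][j] and m[i][j] != m[i][l]:
            m[i][j], m[i][l] = m[i][l], m[i][j]
            m[k][j], m[k][l] = m[k][l], m[k][j]
        if attempt % record_every == 0:
            trace.append(statistic(m))
    return m, trace

def moving_average(values, window=50):
    """Trailing moving average; early points average over fewer values."""
    out = []
    for idx in range(len(values)):
        chunk = values[max(0, idx - window + 1): idx + 1]
        out.append(sum(chunk) / len(chunk))
    return out

rng = random.Random(0)
original = [[1, 0, 1, 0],
            [0, 1, 0, 1],
            [1, 1, 0, 0],
            [0, 0, 1, 1]]
result, trace = swap_chain(original, 1000, checkerboard_count, rng=rng)
smoothed = moving_average(trace, window=50)
```

Recording after a fixed number of *attempted* swaps, rather than successful ones, mirrors the caption's wording and keeps the x-axis comparable across chains with different acceptance rates.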
p-values of Kolmogorov-Smirnov tests between pairs of distributions.
The significance threshold is 0.017.
| Distribution pair | p-value |
|---|---|
| Rasch vs. swapped Rasch | |
| Swapped vs. swapped Rasch | |
| Swapped vs. Rasch | |
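The threshold of 0.017 is consistent with a Bonferroni correction of 0.05 over the three pairwise comparisons (0.05 / 3 ≈ 0.0167); that reading is our assumption. The sketch below computes the two-sample Kolmogorov-Smirnov statistic in pure Python and an asymptotic p-value from the Kolmogorov series with the small-sample correction given in Numerical Recipes. A library routine such as `scipy.stats.ks_2samp` would normally be preferred; the hand-written version only keeps the example dependency-free, and the function names are hypothetical.

```python
import math
from itertools import combinations

def ks_statistic(x, y):
    """Two-sample KS statistic: largest gap between the empirical CDFs."""
    x, y = sorted(x), sorted(y)
    n, m = len(x), len(y)
    i = j = 0
    d = 0.0
    while i < n and j < m:
        t = min(x[i], y[j])
        # Step both empirical CDFs past every value equal to t (handles ties).
        while i < n and x[i] == t:
            i += 1
        while j < m and y[j] == t:
            j += 1
        d = max(d, abs(i / n - j / m))
    return d

def ks_pvalue(d, n, m):
    """Asymptotic two-sided p-value for the KS statistic d."""
    ne = n * m / (n + m)
    lam = (math.sqrt(ne) + 0.12 + 0.11 / math.sqrt(ne)) * d
    if lam < 0.3:
        return 1.0  # the exact tail probability is within about 1e-5 of 1 here
    series = sum(2 * (-1) ** (k - 1) * math.exp(-2 * k * k * lam * lam)
                 for k in range(1, 101))
    return min(1.0, max(0.0, series))

def pairwise_ks(samples, alpha=0.05):
    """Test every pair of named samples against a Bonferroni threshold."""
    pairs = list(combinations(samples, 2))
    threshold = alpha / len(pairs)
    results = {}
    for a, b in pairs:
        d = ks_statistic(samples[a], samples[b])
        p = ks_pvalue(d, len(samples[a]), len(samples[b]))
        results[(a, b)] = (p, p < threshold)
    return threshold, results

samples = {"a": list(range(50)), "b": list(range(50)), "c": list(range(100, 150))}
threshold, results = pairwise_ks(samples)
```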
Statistics for the original datasets and for datasets randomized under the fixed-fixed null model and the Rasch null model.
For randomized data, the median statistic over all randomizations is reported. p-values are defined as the empirical probability of observing, in the randomized data, a statistic at least as extreme as in the original data. For each dataset, the dimensions are given together with the fill ratio, i.e., the fraction of matrix cells with value 1.
Dataset NOW MN5

| Statistic | Original | Fixed-fixed median | Fixed-fixed p | Rasch median | Rasch p |
|---|---|---|---|---|---|
| Checkerboard unit count | 54728 | 53634.5 | 1 | 47403 | 0.982 |
| Correlation count (pos) | 23 | 3 | 0.001 | 9 | 0.013 |
| Correlation count (neg) | 0 | 0 | 1 | 0 | 1 |
| Correlation count (both) | 23 | 3 | 0.001 | 9 | 0.013 |
| Clustering error (k = 2) | 331.53 | 342.61 | 0.001 | 329.58 | 0.562 |
| Clustering error (k = 5) | 262.64 | 292.54 | 0.001 | 276 | 0.108 |
| Clustering error (k = 10) | 202.45 | 242 | 0.001 | 225.86 | 0.01 |
Dataset NOW

| Statistic | Original | Fixed-fixed median | Fixed-fixed p | Rasch median | Rasch p |
|---|---|---|---|---|---|
| Checkerboard unit count | 55385342 | 53819460.5 | 1 | 52790464.5 | 0.998 |
| Correlation count (pos) | 2355 | 233 | 0.001 | 370 | 0.001 |
| Correlation count (neg) | 7 | 0 | 0.001 | 0 | 0.001 |
| Correlation count (both) | 2362 | 232 | 0.001 | 370 | 0.001 |
| Clustering error (k = 2) | 10269.63 | 10521.75 | 0.001 | 10459 | 0.018 |
| Clustering error (k = 5) | 9085.61 | 10040.86 | 0.001 | 9998.54 | 0.001 |
| Clustering error (k = 10) | 8209.21 | 9669.95 | 0.001 | 9637.17 | 0.001 |
Dataset Vanuatu

| Statistic | Original | Fixed-fixed median | Fixed-fixed p | Rasch median | Rasch p |
|---|---|---|---|---|---|
| Checkerboard unit count | 14702 | 14065 | 1 | 11685.5 | 0.99 |
| Correlation count (pos) | 25 | 7 | 0.001 | 12 | 0.042 |
| Correlation count (neg) | 0 | 0 | 1 | 0 | 1 |
| Correlation count (both) | 21 | 5 | 0.001 | 10 | 0.054 |
| Clustering error (k = 2) | 215.37 | 216.49 | 0.108 | 208.96 | 0.831 |
| Clustering error (k = 5) | 154.88 | 163.92 | 0.001 | 153.7 | 0.581 |
| Clustering error (k = 10) | 110.33 | 125.94 | 0.001 | 116.16 | 0.159 |
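The summary quantities in these tables can be computed as in the sketch below. Two details are assumptions rather than statements from the source: the p-value uses the common (k + 1) / (n + 1) convention, under which 0.001 is the smallest attainable value with 999 randomizations, and the direction of "at least as extreme" is a parameter, since the tables report small p-values both when the original correlation count lies above the randomized median and when the original clustering error lies below it. The function names are hypothetical.

```python
import statistics

def fill_ratio(matrix):
    """Fraction of matrix cells equal to 1."""
    cells = [v for row in matrix for v in row]
    return sum(cells) / len(cells)

def empirical_p(observed, randomized, tail):
    """Empirical p-value with the (k + 1) / (n + 1) convention (an assumption).

    tail="greater": randomized values >= observed count as extreme.
    tail="less":    randomized values <= observed count as extreme.
    """
    if tail == "greater":
        k = sum(r >= observed for r in randomized)
    elif tail == "less":
        k = sum(r <= observed for r in randomized)
    else:
        raise ValueError("tail must be 'greater' or 'less'")
    return (k + 1) / (len(randomized) + 1)

def summarize(observed, randomized, tail):
    """Median randomized statistic and empirical p-value, as in the tables."""
    return statistics.median(randomized), empirical_p(observed, randomized, tail)

randomized = list(range(999))
print(summarize(1000, randomized, "greater"))  # (499, 0.001)
print(summarize(1000, randomized, "less"))     # (499, 1.0)
print(fill_ratio([[1, 0], [1, 1]]))            # 0.75
```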
Summary of characteristics of fixed-fixed constraints and stochastic constraints (Rasch).
| | Fixed-fixed constraints | Stochastic constraints |
|---|---|---|
| Maximum entropy model | sequential swaps | Rasch |
| Convergence | not known | trivial |
| Noise tolerance | tolerant | tolerant |
| Limitations | nested data | none known |
| Conservative | no | yes |
| Applications | empirical | empirical and analytical |
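The "trivial" convergence of the Rasch model follows from how it is sampled: once the cell probabilities are fitted, each randomized matrix is drawn cell by cell from independent Bernoulli distributions, so there is no Markov chain that needs to mix. The sketch below assumes the common logistic parametrization P(x_ij = 1) = 1 / (1 + exp(-(a_i + b_j))) and fits it by plain gradient ascent on the log-likelihood. At the optimum the expected row and column sums equal the observed ones, which is the sense in which the constraints are stochastic. The function names, step size and iteration count are illustrative choices, not the manuscript's fitting procedure. For rows or columns that are all zeros or all ones (and for some nested data) no finite optimum exists; the sketch then simply stops after `n_iter` steps with probabilities pushed toward 0 or 1.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fit_rasch(x, n_iter=2000, lr=0.5):
    """Fit cell probabilities p_ij = sigmoid(a_i + b_j) to a 0/1 matrix x."""
    x = np.asarray(x, dtype=float)
    n_rows, n_cols = x.shape
    a = np.zeros(n_rows)  # row parameters
    b = np.zeros(n_cols)  # column parameters
    row_sums, col_sums = x.sum(axis=1), x.sum(axis=0)
    for _ in range(n_iter):
        p = sigmoid(a[:, None] + b[None, :])
        # Log-likelihood gradients: observed minus expected margins.
        a += lr * (row_sums - p.sum(axis=1)) / n_cols
        b += lr * (col_sums - p.sum(axis=0)) / n_rows
    return sigmoid(a[:, None] + b[None, :])

def sample_rasch(p, rng):
    """Draw one randomized matrix with independent Bernoulli(p_ij) cells."""
    return (rng.random(p.shape) < p).astype(int)

x = [[1, 1, 0],
     [1, 0, 1],
     [0, 1, 0]]
p = fit_rasch(x)
rng = np.random.default_rng(0)
randomized = [sample_rasch(p, rng) for _ in range(999)]
```

Unlike swap randomization, individual Rasch samples do not reproduce the observed margins exactly; only their expectations match.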