Abstract
The utility of fossils in evolutionary contexts depends on their accurate placement in phylogenetic frameworks, yet intrinsic and widespread missing data make this problematic. The complex taphonomic processes occurring during fossilization can make it difficult to distinguish absence from non-preservation, especially in the case of exceptionally preserved soft-tissue fossils: is a particular morphological character (e.g., appendage, tentacle, or nerve) missing from a fossil because it was never there (phylogenetic absence), or because it simply was not preserved (taphonomic loss)? The effects of missing data have not been tested with respect to the interpretation of non-present anatomy, nor with respect to directional shifts and biases in affinity. Here, complete taxa, both simulated and empirical, are subjected to data loss through the replacement of present entries (1s) with either missing (?s) or absent (0s) entries. Both kinds of replacement cause taxa to drift down trees, from their original position toward the root. Absolute thresholds at which downshift is significant are extremely low for introduced absences (two entries replaced, 6% of present characters). The opposite threshold in empirical fossil taxa is also found to be low; replacing two absent entries with presences causes fossil taxa to drift up trees. As such, only a few instances of non-preserved characters interpreted as absences will cause fossil organisms to be erroneously interpreted as more primitive than they were in life. This observed sensitivity to the coding of non-present morphology presents a problem for all evolutionary studies that attempt to use fossils to reconstruct rates of evolution or unlock sequences of morphological change. Stem-ward slippage, whereby fossilization processes cause organisms to appear artificially primitive, appears to be a ubiquitous and problematic phenomenon inherent to missing data, even when no decay biases exist.
Absent characters therefore require explicit justification and taphonomic frameworks to support their interpretation.
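The data-loss step described in the abstract (replacing present entries with missing or absent entries) can be illustrated with a minimal sketch. This is not the authors' code: the function names `degrade_taxon` and `percent_of_present`, the string encoding of a taxon's character row, and the random choice of which entries to replace are all assumptions made for illustration. It covers only the matrix manipulation; the phylogenetic re-analysis that measures taxon shift is not shown.

```python
import random


def degrade_taxon(row, n_replace, replacement, seed=None):
    """Return a copy of `row` with `n_replace` present entries ('1')
    replaced by `replacement`, which must be '0' (absent) or '?' (missing).

    Hypothetical helper: `row` is one taxon's characters as a string of
    '0', '1', and '?'. Entries to replace are chosen at random (an assumption).
    """
    if replacement not in ("0", "?"):
        raise ValueError("replacement must be '0' or '?'")
    present = [i for i, state in enumerate(row) if state == "1"]
    if n_replace > len(present):
        raise ValueError("not enough present entries to replace")
    rng = random.Random(seed)
    chosen = rng.sample(present, n_replace)
    degraded = list(row)
    for i in chosen:
        degraded[i] = replacement
    return "".join(degraded)


def percent_of_present(row, n_replace):
    """Express `n_replace` as a percentage of the taxon's present entries."""
    n_present = row.count("1")
    if n_present == 0:
        raise ValueError("taxon has no present entries")
    return 100.0 * n_replace / n_present


# Illustrative row, not data from the study.
taxon = "1101001110"
as_absent = degrade_taxon(taxon, 2, "0", seed=1)   # introduced absences
as_missing = degrade_taxon(taxon, 2, "?", seed=1)  # introduced missing data
```

Under these assumptions, each degraded row would then be swapped into the matrix in place of the complete taxon and the tree re-estimated, so that the taxon's new position can be compared with its original one.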
Keywords: Missing data; paleontology; phylogeny; soft-bodied; stem-group; taphonomy
Year: 2014 PMID: 25432893 PMCID: PMC4380037 DOI: 10.1093/sysbio/syu093
Source DB: PubMed Journal: Syst Biol ISSN: 1063-5157 Impact factor: 15.683
Median thresholds for significant taxon shift by data set
| Downward movement | | | | Introduced absences (1s to 0s) | | | Introduced missing (1s to ?s) | | |
|---|---|---|---|---|---|---|---|---|---|
| | Data set dimensions | | | | Median replacements | | | Median replacements | |
| Data set, extant | Characters | Taxa | Taxa for thresholds | % reaching thresholds | No. of entries | Percent (%) | % reaching thresholds | No. of entries | Percent (%) |
| Simulated ( | 100 | 20 | 733 | 99.7 | 2 | 6 | 85.3 | 10 | 50 |
| | 70 | 20 | 19 | 100 | 1 | 4 | 84.2 | 9 | 51 |
| | 71 | 23 | 21 | 100 | 1 | 3 | 100 | 2 | 10 |
| | 427 | 69 | 66 | 100 | 13 | 21 | 31.8 | 13 | 21 |
| | 71 | 34 | 31 | 96.7 | 2 | 10 | 77.4 | 1 | 5 |
| | 83 | 64 | 62 | 100 | 2 | 22 | 72.6 | 2 | 18 |
| Total empirical | | | 199 | 99.5 | 2 | 11 | 63.2 | 3 | 13 |
| Including non-threshold | | | | | | | | 14 | 51 |
Figure. Schematic representation of the stages of the experimental workflow used to investigate the effects of different types of data manipulation (introduced missing data and introduced absences) on the positions of taxa.
Figure. Changing positions of taxa following different strategies of character replacement. Lines represent data set averages. Absences as presence for extinct taxa only.
Figure. Distribution of linear fit coefficients for percent missing data against average taxon shift for data sets (40 simulated data sets, 5 empirical data sets, and 4 extinct-only data sets).
Figure. Histogram for distribution of thresholds for significant shift of taxa (absolute number of entries replaced). Null represents those taxa that do not reach a threshold.