Kai Puolamäki, Mikael Fortelius, Heikki Mannila.
Abstract
Given a collection of fossil sites with data about the taxa that occur in each site, the task in biochronology is to find good estimates for the ages or ordering of sites. We describe a full probabilistic model for fossil data. The parameters of the model are natural: the ordering of the sites, the origination and extinction times for each taxon, and the probabilities of different types of errors. We show that the posterior distributions of these parameters can be estimated reliably by using Markov chain Monte Carlo techniques. The posterior distributions of the model parameters can be used to answer many different questions about the data, including seriation (finding the best ordering of the sites) and outlier detection. We demonstrate the usefulness of the model and estimation method on synthetic data and on real data on large late Cenozoic mammals. For example, for sites with a large number of occurrences of common genera, our methods give orderings whose correlation with geochronologic ages is 0.95.
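The kind of model described above can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes a single false-negative rate `c` and a single false-positive rate `r` shared by all taxa, integer origination and extinction positions `a[m]` and `b[m]` on the site ordering, and a uniform prior over orderings. It also samples only the ordering while holding the other parameters fixed, whereas the full model estimates all of them. The function names are hypothetical.

```python
import math
import random


def log_likelihood(X, pi, a, b, c, r):
    """Log-likelihood of a binary site-by-taxon matrix under a simplified,
    hypothetical seriation model (not the paper's exact model).

    X[n][m]    = 1 if taxon m is observed at site n, else 0.
    pi[n]      = position of site n in the ordering (0 .. N-1).
    a[m], b[m] = first and last position at which taxon m is alive.
    c          = probability that an alive taxon is NOT observed (false negative).
    r          = probability that a taxon that is not alive IS observed (false positive).
    """
    ll = 0.0
    for n, row in enumerate(X):
        for m, x in enumerate(row):
            alive = a[m] <= pi[n] <= b[m]
            p_one = (1.0 - c) if alive else r  # P(X[n][m] = 1)
            ll += math.log(p_one if x else 1.0 - p_one)
    return ll


def metropolis_orderings(X, a, b, c, r, steps, seed=0):
    """Metropolis sampler over orderings with a, b, c, r held fixed.

    The proposal swaps the positions of two random sites. It is symmetric,
    so under a uniform prior on orderings the acceptance probability is
    min(1, likelihood ratio).
    """
    rng = random.Random(seed)
    N = len(X)
    pi = list(range(N))  # start from the identity ordering
    ll = log_likelihood(X, pi, a, b, c, r)
    samples = []
    for _ in range(steps):
        i, j = rng.sample(range(N), 2)
        pi[i], pi[j] = pi[j], pi[i]  # propose a swap
        ll_new = log_likelihood(X, pi, a, b, c, r)
        if rng.random() < math.exp(min(0.0, ll_new - ll)):
            ll = ll_new  # accept
        else:
            pi[i], pi[j] = pi[j], pi[i]  # reject: undo the swap
        samples.append(list(pi))  # store a copy of the current ordering
    return samples
```

For example, `metropolis_orderings(X, a, b, 0.1, 0.1, steps=1000)` returns 1000 orderings. Summaries such as the posterior probability that one site precedes another can be computed from them.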
Year: 2006 PMID: 16477311 PMCID: PMC1361357 DOI: 10.1371/journal.pcbi.0020006
Source DB: PubMed Journal: PLoS Comput Biol ISSN: 1553-734X Impact factor: 4.475
Results for Artificially Generated Datasets
Results on the Large Mammal Dataset
Figure 1. The Pair-Order Matrix O(π(i) < π(j)) between Sites for Dataset n = 10, n = 10 from the Eight Chains with the Best Likelihood
Black denotes probability one, and white denotes probability zero. For most pairs the probability is close to zero or one, but within some blocks of sites many different orderings have high probability.
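A pair-order matrix like this can be estimated directly from sampled orderings. O(π(i) < π(j)) is approximated by the fraction of samples in which site i precedes site j. The following is a minimal sketch. It assumes each sample is a list `pi` with `pi[n]` the position of site n. Samples from several chains, such as the eight chains used here, could simply be pooled into one list. The function name is hypothetical.

```python
def pair_order_matrix(samples):
    """Return O with O[i][j] = fraction of sampled orderings in which site i
    precedes site j, a Monte Carlo estimate of P(pi(i) < pi(j))."""
    N = len(samples[0])
    O = [[0.0] * N for _ in range(N)]
    for pi in samples:
        for i in range(N):
            for j in range(N):
                if pi[i] < pi[j]:
                    O[i][j] += 1.0
    S = len(samples)
    return [[v / S for v in row] for row in O]
```

By construction O[i][j] + O[j][i] = 1 for i ≠ j, and the diagonal is zero.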
Figure 2. The Data Matrix for the Dataset with n = 10 and n = 10
Top: the data matrix, with sites ordered by E{π(n)} and genera by E{a}. Middle: probability that genus m is alive at site n in the dataset specified by n = 10 and n = 10. Bottom: probability that a one in the data matrix is false. Black denotes probability one, and white denotes probability zero.
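The quantities in this caption can also be estimated from MCMC output. The sketch below is minimal and rests on a hypothetical storage format: each posterior sample is a tuple `(pi, a, b)` holding the site positions and the taxon origination and extinction positions on a common position scale. Under that assumption, it orders sites by their posterior mean position, orders genera by their posterior mean origination position, and estimates the probability that each genus is alive at each site. The function name is hypothetical.

```python
def posterior_summaries(samples):
    """samples: list of (pi, a, b) tuples (hypothetical format).

    Returns (site_order, genus_order, P), where sites are sorted by the
    posterior mean of pi[n], genera by the posterior mean of a[m], and
    P[n][m] is the fraction of samples in which a[m] <= pi[n] <= b[m],
    i.e. in which genus m is alive at site n.
    """
    S = len(samples)
    N = len(samples[0][0])
    M = len(samples[0][1])
    mean_pos = [0.0] * N
    mean_a = [0.0] * M
    P = [[0.0] * M for _ in range(N)]
    for pi, a, b in samples:
        for m in range(M):
            mean_a[m] += a[m] / S
        for n in range(N):
            mean_pos[n] += pi[n] / S
            for m in range(M):
                if a[m] <= pi[n] <= b[m]:
                    P[n][m] += 1.0 / S
    site_order = sorted(range(N), key=lambda n: mean_pos[n])
    genus_order = sorted(range(M), key=lambda m: mean_a[m])
    return site_order, genus_order, P
```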
Parameters of Our Model, with Prior Distributions