Wenbiao Hu, Rebecca A O'Leary, Kerrie Mengersen, Samantha Low Choy.
Abstract
BACKGROUND: Classification and regression tree (CART) models are tree-based exploratory data analysis methods that have proved useful for identifying and estimating complex hierarchical relationships in ecological and medical contexts. In this paper, a Bayesian CART model is described and applied to modelling cryptosporidiosis infection in Queensland, Australia.
Year: 2011 PMID: 21909377 PMCID: PMC3166077 DOI: 10.1371/journal.pone.0023903
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Figure 1. The best tree identified from Bayesian regression trees.
At each terminal node the mean (μ) and number of individuals (n) are displayed.
Confusion (loss) matrix: classification of observed versus predicted presences (‘Yes’) and absences (‘No’) from the Bayesian CART model.
| Predicted \ Observed | Yes | No | Total |
|---|---|---|---|
| Yes | a (true) | b (false) | a+b |
| No | c (false) | d (true) | c+d |
| Total | a+c | b+d | N |
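The sensitivity, specificity and accuracy reported in the tables below can be read off this matrix. A minimal Python sketch, assuming the standard definitions (sensitivity = a/(a+c), specificity = d/(b+d), accuracy = (a+d)/N) with rows as predicted and columns as observed; the `Confusion` class, the `confusion_metrics` function and the example counts are hypothetical, for illustration only.

```python
from dataclasses import dataclass


@dataclass
class Confusion:
    a: int  # predicted Yes, observed Yes (true positive)
    b: int  # predicted Yes, observed No (false positive)
    c: int  # predicted No, observed Yes (false negative)
    d: int  # predicted No, observed No (true negative)


def confusion_metrics(m: Confusion) -> dict[str, float]:
    """Sensitivity, specificity and accuracy from the 2x2 matrix."""
    n = m.a + m.b + m.c + m.d
    return {
        "sensitivity": m.a / (m.a + m.c),  # share of observed presences predicted 'Yes'
        "specificity": m.d / (m.b + m.d),  # share of observed absences predicted 'No'
        "accuracy": (m.a + m.d) / n,       # share of all cases classified correctly
    }


# Hypothetical counts, not taken from the paper
print(confusion_metrics(Confusion(a=40, b=20, c=10, d=30)))
# → {'sensitivity': 0.8, 'specificity': 0.6, 'accuracy': 0.7}
```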
Figure 2. The observed spatial distribution of SEIFA, temperature, rainfall and annual average incidence rates of cryptosporidiosis.
Top five of the 16 best Bayesian classification trees (selected on sensitivity, specificity, accuracy and deviance).
| Tree | Sens (training) | Spec (training) | Post (training) | Dev (training) | Sens (validation) | Spec (validation) | Post (validation) | Dev (validation) | Size (K) |
|---|---|---|---|---|---|---|---|---|---|
| 1 | 0.776 | 0.527 | −406.08 | 807.78 | 0.825 | 0.513 | −93.94 | 183.51 | 8 |
| 2 | 0.783 | 0.502 | −405.74 | 807.32 | 0.825 | 0.491 | −93.65 | 183.15 | 9 |
| 3 | 0.789 | 0.501 | −420.28 | 836.20 | 0.800 | 0.496 | −100.59 | 196.82 | 8 |
| 4 | 0.783 | 0.538 | −417.91 | 831.44 | 0.775 | 0.531 | −103.44 | 202.52 | 8 |
| 5 | 0.783 | 0.517 | −409.40 | 814.44 | 0.750 | 0.482 | −101.63 | 198.92 | 11 |
The table shows sensitivity (Sens), specificity (Spec), log posterior (Post) and deviance (Dev) for both the training and validation datasets, together with tree size (K, the number of terminal nodes).
Figure 3. The best tree identified from Bayesian classification trees.
At each terminal node, the predicted category is shown as pres (presence) or abs (absence). The pair of numbers directly below it, written a/b (e.g. 16/0), gives the number of observed absences (a) and presences (b) classified into that node.
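The per-node a/b counts are enough to label each node and to summarise the tree's fit by a deviance. A minimal sketch, assuming the usual multinomial deviance for classification trees, D = −2 Σ_k Σ_c n_kc log(n_kc / n_k) with 0·log 0 taken as 0, and a simple majority-class rule for the node label; the paper's exact deviance and labelling rule are not reproduced here. The functions `node_label` and `tree_deviance` are hypothetical, and of the node counts used, only 16/0 comes from the caption.

```python
import math


def node_label(absences: int, presences: int) -> str:
    """Majority-class label for a terminal node (ties go to 'abs', an arbitrary choice)."""
    return "pres" if presences > absences else "abs"


def tree_deviance(nodes: list[tuple[int, int]]) -> float:
    """Multinomial deviance summed over terminal nodes given (absences, presences) counts."""
    dev = 0.0
    for absences, presences in nodes:
        n = absences + presences
        for count in (absences, presences):
            if count > 0:  # 0 * log(0) is treated as 0
                dev -= 2.0 * count * math.log(count / n)
    return dev


# (absences a, presences b) per terminal node; only 16/0 is from Figure 3
nodes = [(16, 0), (10, 30), (25, 5)]
print([node_label(a, b) for a, b in nodes])  # → ['abs', 'pres', 'abs']
print(round(tree_deviance(nodes), 2))
```

A pure node such as 16/0 contributes nothing to the deviance, so trees whose terminal nodes separate presences from absences cleanly have lower deviance.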
Quantiles of sensitivity, specificity and log posterior for training and validation datasets over all accepted trees, for Bayesian classification trees.
| Dataset | Measure | 2.5% | 50% | 97.5% |
|---|---|---|---|---|
| Training | Sensitivity | 0.081 | 0.466 | 0.938 |
| | Specificity | 0.108 | 0.638 | 0.976 |
| | Log posterior | −441.580 | −414.580 | −394.710 |
| Validation | Sensitivity | 0.050 | 0.475 | 0.950 |
| | Specificity | 0.124 | 0.646 | 0.987 |
| | Log posterior | −109.860 | −100.090 | −91.965 |
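The 2.5%, 50% and 97.5% quantiles summarise each measure across all trees accepted by the sampler. A minimal sketch using the standard library's `statistics.quantiles`; the input values are simulated stand-ins rather than the paper's MCMC output, the function `summary_quantiles` is hypothetical, and the interpolation method may differ from the one used in the paper.

```python
import random
import statistics


def summary_quantiles(values: list[float]) -> tuple[float, float, float]:
    """Return the 2.5%, 50% and 97.5% quantiles of a sample of accepted-tree statistics."""
    # n=40 gives 39 cut points at 1/40, 2/40, ..., 39/40, i.e. 2.5%, 5%, ..., 97.5%
    cuts = statistics.quantiles(values, n=40, method="inclusive")
    return cuts[0], cuts[19], cuts[38]


# Simulated sensitivities for 1,000 hypothetical accepted trees (not the paper's data)
random.seed(1)
sens = [random.betavariate(5, 5) for _ in range(1000)]
lo, med, hi = summary_quantiles(sens)
print(f"2.5%: {lo:.3f}  50%: {med:.3f}  97.5%: {hi:.3f}")
```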
Quantiles of log residual sums of squares (RSS), deviance and log posterior for training and validation datasets over all accepted trees, for Bayesian regression trees.
| Dataset | Measure | 2.5% | 50% | 97.5% |
|---|---|---|---|---|
| Training | Log RSS | −55.446 | −51.261 | −49.960 |
| | Deviance | 10.213 | 21.224 | 40.428 |
| | Log posterior | −21.284 | −12.823 | −12.478 |
| Validation | Log RSS | −61.689 | −57.727 | −55.864 |
| | Deviance | 8.597 | 14.823 | 28.864 |
| | Log posterior | −17.232 | −10.846 | −9.879 |
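In a regression tree each terminal node predicts the mean (μ) of the observations it contains, as displayed in Figure 1, so the residual sum of squares is the total within-node sum of squared deviations. A minimal sketch of RSS and its logarithm, assuming observations have already been assigned to terminal nodes; the function `tree_rss`, the node assignments and the response values are hypothetical, and the scale of the response used in the paper is not given here.

```python
import math
from collections import defaultdict


def tree_rss(node_ids: list[int], y: list[float]) -> float:
    """Residual sum of squares when each terminal node predicts its own mean."""
    groups: dict[int, list[float]] = defaultdict(list)
    for node, value in zip(node_ids, y):
        groups[node].append(value)
    rss = 0.0
    for values in groups.values():
        mu = sum(values) / len(values)  # node mean: the tree's prediction for this node
        rss += sum((v - mu) ** 2 for v in values)
    return rss


# Hypothetical terminal-node assignments and responses
node_ids = [1, 1, 1, 2, 2, 3]
y = [0.10, 0.20, 0.30, 1.00, 1.40, 0.50]
rss = tree_rss(node_ids, y)
print(round(rss, 4), round(math.log(rss), 3))
```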
Changes (%) in relative risks with 95% credible intervals from Bayesian spatiotemporal CAR models of cryptosporidiosis in Queensland, Australia.
| Variable | Posterior mean | SD | MC error | RR (95% CI) |
|---|---|---|---|---|
| | 0.1046 | 0.0440 | <0.01 | 1.11 (1.02–1.21) |
| | −0.0003 | 0.0025 | <0.01 | 1.00 (0.99–1.01) |
| | 0.0003 | 0.0009 | <0.01 | 1.00 (0.99–1.01) |
| | 0.0005 | 0.0004 | <0.01 | 1.00 (0.99–1.01) |
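The RR column appears consistent with exponentiating each coefficient's posterior mean, RR = exp(β). As a rough check, assuming an approximately normal posterior, exp(mean ± 1.96 SD) reproduces the first row. The function `rr_from_posterior` is hypothetical, and the published intervals were presumably taken from posterior quantiles, so this normal approximation is only a sanity check.

```python
import math


def rr_from_posterior(mean: float, sd: float, z: float = 1.96) -> tuple[float, float, float]:
    """Relative risk exp(beta) with a normal-approximation 95% interval on the log scale."""
    return math.exp(mean), math.exp(mean - z * sd), math.exp(mean + z * sd)


rr, lower, upper = rr_from_posterior(0.1046, 0.0440)  # first row of the table
print(f"{rr:.2f} ({lower:.2f}–{upper:.2f})")  # → 1.11 (1.02–1.21)
```

Expressed as a percentage change, (RR − 1) × 100 is about 11% for the first row. The approximation need not match the remaining rows exactly, because their intervals are narrow enough that rounding and the method of computing the interval matter.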