Aeriel Belk, Zhenjiang Zech Xu, David O Carter, Aaron Lynne, Sibyl Bucheli, Rob Knight, Jessica L Metcalf.
Abstract
Death investigations often include an effort to establish the postmortem interval (PMI) in cases in which the time of death is uncertain. The postmortem interval can lead to the identification of the deceased and the validation of witness statements and suspect alibis. Recent research has demonstrated that microbes provide an accurate clock that starts at death and relies on ecological change in the microbial communities that normally inhabit a body and its surrounding environment. Here, we explore how to build the most robust Random Forest regression models for prediction of PMI by testing models built on different sample types (gravesoil, skin of the torso, skin of the head), gene markers (16S ribosomal RNA (rRNA), 18S rRNA, internal transcribed spacer regions (ITS)), and taxonomic levels (sequence variants, species, genus, etc.). We also tested whether particular suites of indicator microbes were informative across different datasets. Generally, results indicate that the most accurate models for predicting PMI were built using gravesoil and skin data using the 16S rRNA genetic marker at the taxonomic level of phyla. Additionally, several phyla consistently contributed highly to model accuracy and may be candidate indicators of PMI.
Keywords: Random Forest regression; decomposition; microbiome; postmortem interval
Year: 2018 PMID: 29462950 PMCID: PMC5852600 DOI: 10.3390/genes9020104
Source DB: PubMed Journal: Genes (Basel) ISSN: 2073-4425 Impact factor: 4.096
A summary of all studies included in the meta-analysis. Studies were obtained from the QIITA open source microbiome study management platform [22].
| QIITA Study Number | QIITA Study Name | Our Study Name | Shorthand Name | Prep Number | Marker | Trim Length | OTU Table Type | Number of Days Sampled |
|---|---|---|---|---|---|---|---|---|
| 714 | A microbial clock provides an accurate estimate of the postmortem interval in a mouse model system | Mouse Decomposition 1 | mdc1 | 769 | 16S | 90 bp | reference-hit.biom | 48 |
| 1889 | A microbial clock provides an accurate estimate of the postmortem interval in a mouse model system—18S | Mouse Decomposition 1 | mdc1 | 1204 | 18S | 90 bp | all.biom | 48 |
| 10141 | Metcalf microbial community assembly and metabolic function during mammalian corpse decomposition | Mouse Decomposition 2 | mdc2 | 1265 | 16S | 90 bp | reference-hit.biom | 70 |
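| 10141 | Metcalf microbial community assembly and metabolic function during mammalian corpse decomposition | Mouse Decomposition 2 | mdc2 | 1038 | 18S | 90 bp | all.biom | 70 |
| 10141 | Metcalf microbial community assembly and metabolic function during mammalian corpse decomposition | Mouse Decomposition 2 | mdc2 | 345 | ITS | 100 bp | all.biom | 70 |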
| 10142 | Metcalf microbial community assembly and metabolic function during mammalian corpse decomposition Sam Houston State University (SHSU) winter | SHSU Winter | shsu_winter | 333 | 16S | 90 bp | reference-hit.biom | 132 |
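| 10142 | Metcalf microbial community assembly and metabolic function during mammalian corpse decomposition Sam Houston State University (SHSU) winter | SHSU Winter | shsu_winter | 1166 | 18S | 90 bp | all.biom | 132 |
| 10142 | Metcalf microbial community assembly and metabolic function during mammalian corpse decomposition Sam Houston State University (SHSU) winter | SHSU Winter | shsu_winter | 335 | ITS | 100 bp | all.biom | 132 |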
| 10143 | Metcalf microbial community assembly and metabolic function during mammalian corpse decomposition Sam Houston State University (SHSU) April 2012 exp. | SHSU Spring | shsu_spring | 1107 | 16S | 90 bp | reference-hit.biom | 82 |
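| 10143 | Metcalf microbial community assembly and metabolic function during mammalian corpse decomposition Sam Houston State University (SHSU) April 2012 exp. | SHSU Spring | shsu_spring | 1109 | 18S | 90 bp | all.biom | 82 |
| 10143 | Metcalf microbial community assembly and metabolic function during mammalian corpse decomposition Sam Houston State University (SHSU) April 2012 exp. | SHSU Spring | shsu_spring | 1110 | ITS | 100 bp | all.biom | 82 |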
All studies were downloaded as deblur processed tables along with the corresponding metadata information. Different table types and trim lengths were selected based on the availability and the marker type. 16S: 16S ribosomal RNA; 18S: 18S ribosomal RNA; ITS: internal transcribed spacer regions; OTU: operational taxonomic units.
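The downloaded tables hold counts per sequence variant, so models at coarser taxonomic levels need those counts summed within each taxon first. The following is a minimal sketch of that collapsing step, not the authors' pipeline. The function name `collapse_to_rank`, the semicolon-delimited lineage strings, and the example counts are all hypothetical.

```python
from collections import defaultdict

# Assumed rank order for semicolon-delimited lineage strings (hypothetical format).
RANKS = ["kingdom", "phylum", "class", "order", "family", "genus", "species"]


def collapse_to_rank(counts, lineages, rank):
    """Sum per-feature counts into groups sharing a lineage down to `rank`."""
    depth = RANKS.index(rank) + 1
    collapsed = defaultdict(int)
    for feature_id, count in counts.items():
        parts = [p.strip() for p in lineages[feature_id].split(";")]
        # Features not classified down to the requested rank are pooled together.
        key = "; ".join(parts[:depth]) if len(parts) >= depth else "Unclassified"
        collapsed[key] += count
    return dict(collapsed)


# Made-up counts for one sample, keyed by sequence variant ID.
sample_counts = {"sv1": 10, "sv2": 5, "sv3": 7}
sample_lineages = {
    "sv1": "Bacteria; Firmicutes; Clostridia",
    "sv2": "Bacteria; Firmicutes; Bacilli",
    "sv3": "Bacteria; Proteobacteria",
}

phylum_counts = collapse_to_rank(sample_counts, sample_lineages, "phylum")
class_counts = collapse_to_rank(sample_counts, sample_lineages, "class")
```

Collapsing to a coarser rank reduces the number of features the model sees, which is one plausible reason error can change between the sequence variant and phylum columns of the tables that follow.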
A comparison of the mean absolute error (MAE) of models built using data from each gene marker (16S rRNA, 18S rRNA, ITS) for each sample type (soil, skin_torso, skin_head).
| Genomic Marker | Study Name | Sample Type | Sequence Variants | Species Level | Genus Level | Family Level | Order Level | Class Level | Phylum Level |
|---|---|---|---|---|---|---|---|---|---|
| 16S | mdc1 | soil | 5.068 | 4.528 | 4.439 | 4.574 | 4.596 | 4.308 | 4.565 |
| 16S | mdc1 | skin_torso | 4.602 | 3.744 | 3.577 | 3.353 | 3.889 | 4.377 | 4.070 |
| 16S | mdc1 | skin_head | 4.272 | 3.816 | 3.816 | 3.747 | 3.442 | 4.672 | |
| 16S | mdc2 | soil | 2.571 | 1.943 | 1.955 | 1.911 | 2.062 | 1.971 | |
| 16S | mdc2 | skin_torso | 3.357 | 2.926 | 2.898 | 2.783 | 2.826 | 2.942 | 2.856 |
| 16S | mdc2 | skin_head | 3.001 | 2.383 | 2.379 | 2.340 | 2.467 | 2.369 | 2.405 |
| 16S | shsu_spring | soil | 5.225 | 3.594 | 3.632 | 3.660 | 3.966 | 3.868 | 3.877 |
| 16S | shsu_spring | skin_torso | 4.303 | 3.830 | 3.807 | 4.106 | 4.343 | 4.311 | 4.022 |
| 16S | shsu_spring | skin_head | 3.890 | 3.506 | 3.385 | 3.577 | 3.342 | 3.006 | |
| 16S | shsu_winter | soil | 4.985 | 3.922 | 3.980 | 3.947 | 3.848 | 4.026 | 3.783 |
| 16S | shsu_winter | skin_torso | 5.237 | 4.543 | 4.483 | 4.385 | 3.970 | 3.704 | |
| 18S | mdc1 | soil | 4.370 | 3.125 | 3.072 | 3.135 | 2.813 | 2.942 | 2.733 |
| 18S | mdc1 | skin_torso | 4.333 | 3.821 | 3.447 | 3.030 | 3.549 | 4.521 | |
| 18S | mdc1 | skin_head | 4.744 | 4.583 | 4.138 | 4.616 | 4.251 | 3.775 | 4.657 |
| 18S | mdc2 | soil | 3.505 | 3.237 | 3.208 | 3.107 | 3.221 | 3.330 | |
| 18S | mdc2 | skin_torso | 3.907 | 3.870 | 3.856 | 3.676 | 3.910 | 3.867 | 3.704 |
| 18S | mdc2 | skin_head | 3.772 | 3.761 | 3.575 | 3.725 | 3.665 | 3.819 | 3.912 |
| 18S | shsu_spring | soil | 5.486 | 4.654 | 4.459 | 4.283 | 3.837 | 3.400 | |
| 18S | shsu_spring | skin_torso | 5.457 | 4.654 | 5.196 | 5.404 | 5.264 | 5.754 | 5.974 |
| 18S | shsu_spring | skin_head | 4.645 | 4.571 | 4.370 | 5.148 | 5.028 | 4.763 | 5.218 |
| 18S | shsu_winter | soil | 5.239 | 4.429 | 4.442 | 4.239 | 4.042 | 3.504 | |
| 18S | shsu_winter | skin_torso | 5.141 | 4.880 | 4.721 | 4.962 | 5.028 | 4.660 | 4.604 |
| ITS | mdc2 | soil | 3.497 | 3.169 | 3.157 | 2.957 | 2.941 | 2.820 | 2.797 |
| ITS | mdc2 | skin_torso | 3.505 | 3.237 | 3.211 | 3.083 | 3.023 | 3.036 | |
| ITS | mdc2 | skin_head | 3.648 | 3.561 | 3.523 | 3.483 | 3.509 | 3.413 | 3.305 |
| ITS | shsu_spring | soil | 5.586 | 4.735 | 4.836 | 4.629 | 4.980 | 4.713 | |
| ITS | shsu_spring | skin_torso | 4.837 | 4.671 | 4.563 | 4.688 | 4.786 | 4.860 | 5.500 |
| ITS | shsu_spring | skin_head | 6.080 | 5.996 | 6.083 | 5.803 | 6.090 | 5.965 | 5.416 |
| ITS | shsu_winter | soil | 4.675 | 4.114 | 3.965 | 3.954 | 3.933 | 4.077 | |
| ITS | shsu_winter | skin_torso | 5.726 | 5.702 | 5.662 | 5.608 | 5.565 | 5.575 | 5.610 |
Data were collected from four studies (mouse decomposition 1 (mdc1), mouse decomposition 2 (mdc2), Sam Houston State University (SHSU) human April (shsu_spring), SHSU human February (shsu_winter)). The ITS marker was not sequenced for mdc1. Models were generated based on data from the first 25 days of decomposition, and the model with the best MAE (in days of decomposition) after parameter tuning was selected. The lowest error within each marker for each experiment is highlighted in bold, black text.
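As an illustration of the error metric reported in the table, here is a minimal sketch of restricting samples to the first 25 days and computing MAE in days. The helper names and the sample values are hypothetical and are not taken from the study.

```python
def subset_first_days(samples, max_day=25):
    """Keep (day, predicted_day) pairs sampled on or before `max_day`."""
    return [(day, pred) for day, pred in samples if day <= max_day]


def mean_absolute_error(actual, predicted):
    """Average of |actual - predicted| over paired observations."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


# Made-up (true day, predicted day) pairs; the day-40 sample falls outside the window.
samples = [(1, 2.0), (5, 4.0), (10, 13.0), (20, 17.0), (40, 30.0)]
kept = subset_first_days(samples)
actual_days = [day for day, _ in kept]
predicted_days = [pred for _, pred in kept]

mae_days = mean_absolute_error(actual_days, predicted_days)  # (1 + 1 + 3 + 3) / 4 = 2.0
```

An MAE of 2.0 here would mean the predictions miss the true postmortem interval by two days on average, which is how the table entries can be read.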
Figure 1. The mean absolute error (MAE) rates for Random Forest models trained to predict the postmortem interval (PMI). For each marker type (16S bacterial and archaeal ribosomal RNA (rRNA), 18S microbial eukaryote rRNA, internal transcribed spacer regions (ITS) fungal gene marker), models were generated for three sample types (skin_head, skin_torso, soil) from four studies (mouse decomposition 1 (mdc1), mouse decomposition 2 (mdc2), Sam Houston State University (SHSU) human April (shsu_spring), SHSU human February (shsu_winter)). Skin_head samples were not collected for shsu_winter. Datasets were subset to include only the first 25 sampling days. Though all marker types performed well, the 16S rRNA marker generally resulted in the most accurate PMI prediction models.
The MAE of models used in cross-experiment testing using accumulated degree days (ADD) with a minimum developmental threshold of 0 °C.
| Genomic Marker | Training Dataset | Sample Type | Sequence Variants MAE | Species Level MAE | Genus Level MAE | Family Level MAE | Order Level MAE | Class Level MAE | Phylum Level MAE |
|---|---|---|---|---|---|---|---|---|---|
| 16S | Spring | soil | 88.693 | 57.929 | 59.251 | 57.045 | 56.936 | 55.367 | |
| 16S | Spring | skin | 92.598 | 90.197 | 90.584 | 104.770 | 109.412 | 117.672 | 135.749 |
| 16S | Winter | soil | 109.482 | 91.406 | 91.295 | 91.857 | 91.025 | 88.849 | |
| 16S | Winter | skin | 120.764 | 129.695 | 130.763 | 122.418 | 124.701 | 123.043 | 108.737 |
| 18S | Spring | soil | 81.013 | 62.572 | 62.850 | 55.648 | 51.316 | 51.481 | |
| 18S | Spring | skin | 88.155 | 93.173 | 85.754 | 89.676 | 91.793 | 72.846 | 67.242 |
| 18S | Winter | soil | 96.145 | 82.780 | 75.628 | 72.228 | 67.725 | 71.757 | |
| 18S | Winter | skin | 111.004 | 110.772 | 101.268 | 101.222 | 101.524 | 107.409 | 105.248 |
| ITS | Spring | soil | 111.806 | 94.797 | 94.742 | 93.504 | 85.282 | 80.856 | |
| ITS | Spring | skin | 101.852 | 96.815 | 99.272 | 101.086 | 104.162 | 106.043 | 94.468 |
| ITS | Winter | soil | 104.775 | 99.360 | 96.392 | 96.604 | 93.564 | 87.709 | |
| ITS | Winter | skin | 114.027 | 110.865 | 107.026 | 113.294 | 117.302 | 115.937 | 87.274 |
Models were built using 16S rRNA marker human cadaver decomposition data from two seasons: spring and winter. Models were built on sequence variants data and family level, genus level, and species level taxonomy. Following model construction, the model was tested on the other dataset to evaluate the ability of the model to predict postmortem interval (PMI) beyond the original dataset. The lowest error for each marker within each cross-experiment test is in bold and black font.
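The cross-experiment errors above are expressed in accumulated degree days (ADD) rather than calendar days. A minimal sketch of one common way to compute ADD with a 0 °C minimum threshold follows. It assumes one mean temperature per day, which the source does not specify, and the function name and temperatures are hypothetical.

```python
def accumulated_degree_days(daily_mean_temps_c, threshold_c=0.0):
    """Sum daily mean temperatures above `threshold_c`.

    Assumption: each day contributes max(0, mean temperature - threshold),
    so days at or below the threshold add nothing.
    """
    return sum(max(0.0, temp - threshold_c) for temp in daily_mean_temps_c)


# Made-up daily mean temperatures in degrees Celsius.
temps_c = [10.0, 12.5, -2.0, 8.0]
add_value = accumulated_degree_days(temps_c)  # 10.0 + 12.5 + 0.0 + 8.0 = 30.5
```

Expressing time in ADD lets a model trained in one season be tested in another despite different temperatures, which is the comparison the table reports.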
Figure 2. Feature importance measures the contribution of each phylum to the PMI regression model built with the 16S rRNA genetic marker (results from the SHSU spring study). (A) Feature importance is correlated across the three sample types. Each scatter plot compares the feature importances of a pair of models built from different sample types. Each dot represents a phylum, and its x- and y-coordinates give its feature importance in the two models being compared. The Spearman correlation coefficients are 0.90 (head vs. torso), 0.84 (head vs. soil), and 0.93 (torso vs. soil), with p-values < 0.01. The histograms on the diagonal show that most phyla contribute little to the regression model for each sample type. (B) The ten phyla that are most informative for PMI prediction within each sample type. (C) The importance of the phyla to the regression models is highly correlated across the spring and winter seasons. Each dot represents the importance of a phylum in winter (y-axis) and in spring (x-axis). The correlation coefficients between winter and spring feature importances are 0.78 (soil) and 0.93 (torso), with p-values < 0.01. (D) Same plot as (C), except that the axes show feature importance ranks instead of scores.
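To make the correlation in panels (A), (C), and (D) concrete, here is a minimal sketch of a Spearman rank correlation between two sets of per-phylum feature importances. It ranks both lists (ties get the average rank) and takes the Pearson correlation of the ranks. The importance values are made up and the function names are hypothetical.

```python
def average_ranks(values):
    """Return 1-based ranks, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman correlation: Pearson correlation computed on the ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("inputs must have equal length of at least 2")
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    var_x = sum((a - mean_x) ** 2 for a in rx)
    var_y = sum((b - mean_y) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (var_x * var_y) ** 0.5


# Made-up importances for four phyla in two models (e.g. head vs. torso).
head_importance = [0.30, 0.20, 0.10, 0.05]
torso_importance = [0.25, 0.28, 0.08, 0.02]
rho = spearman(head_importance, torso_importance)
```

Because only ranks matter, the coefficient depends on whether the same phyla sit near the top in both models, not on the raw importance scores.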