Yu Zhang, Jennifer L Pechal, Carl J Schmidt, Heather R Jordan, Wesley W Wang, M Eric Benbow, Sing-Hoi Sze, Aaron M Tarone.
Abstract
BACKGROUND: The postmortem microbiome can provide valuable information to a death investigation and about the health of the once-living individual. Microbiome sequencing generally produces large, multi-dimensional datasets that can be difficult to analyze and interpret. Machine learning methods can help overcome this analytical challenge. However, different methods employ distinct strategies to handle complex datasets. It is unclear whether one method is more appropriate than others for modeling postmortem microbiomes and predicting attributes of interest in death investigations, which requires understanding how microbial communities change after death and how they may represent those of the once-living host. METHODS AND …
Year: 2019 PMID: 30986212 PMCID: PMC6464165 DOI: 10.1371/journal.pone.0213829
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Confusion matrices for the prediction of postmortem interval, event location, and manner of death from the microbiota of all anatomic locations (ears, eyes, nose, mouth, and rectum) using three machine learning methods: xgboost, random forest, and neural network.
Each cell gives the counts for the three methods in the order xgboost/random forest/neural network. Rows are predicted classes and columns are reference (observed) classes.
| Predictor Variable | Prediction / Reference | < 24 h | 25–48 h | 49–72 h | > 73 h |
|---|---|---|---|---|---|
| Postmortem interval | < 24 h | 296/300/285 | 78/63/71 | 19/28/22 | 07/06/05 |
| | 25–48 h | 79/76/78 | 271/288/268 | 11/18/20 | 11/26/15 |
| | 49–72 h | 04/03/13 | 02/02/08 | 34/20/20 | 00/01/03 |
| | > 73 h | 01/01/04 | 02/00/06 | 02/00/04 | 30/15/25 |
| Event location | | 75/65/69 | 03/00/19 | 05/00/07 | 03/00/07 |
| | | 31/42/30 | 580/586/544 | 25/39/22 | 14/19/09 |
| | | 05/05/05 | 05/03/22 | 75/67/72 | 11/06/06 |
| | | 01/00/08 | 01/00/04 | 01/00/05 | 12/13/18 |
| Manner of death | | 282/296/266 | 12/11/26 | 29/18/51 | 27/38/27 |
| | | 07/03/17 | 142/141/120 | 04/03/11 | 00/00/12 |
| | | 42/37/42 | 03/09/09 | 211/221/169 | 14/26/20 |
| | | 06/01/12 | 05/01/07 | 01/03/14 | 62/39/44 |
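Overall accuracy, as reported in the accuracy table, is the sum of a confusion matrix's diagonal divided by the total number of samples. The Python sketch below is a minimal illustration, not the study's code: it applies that calculation to the xgboost postmortem-interval counts from the table above (rows = predicted class, columns = reference class). The function name `overall_accuracy` is hypothetical.

```python
# xgboost, postmortem interval (rows = predicted, columns = reference), classes
# ordered < 24 h, 25-48 h, 49-72 h, > 73 h; counts copied from the table above.
XGB_PMI = [
    [296, 78, 19, 7],
    [79, 271, 11, 11],
    [4, 2, 34, 0],
    [1, 2, 2, 30],
]


def overall_accuracy(cm):
    """Fraction of samples on the diagonal (predicted class == reference class)."""
    correct = sum(cm[i][i] for i in range(len(cm)))
    total = sum(sum(row) for row in cm)
    return correct / total


print(round(overall_accuracy(XGB_PMI), 3))  # 0.745 (631 of 847 correct)
```

The same calculation on the random forest and neural network counts gives 623/847 ≈ 0.736 and 598/847 ≈ 0.706, matching the reported values.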
Accuracy and p-value obtained from 5-fold cross-validation for three machine learning methods (xgboost, random forest, and neural network) predicting postmortem interval, event location, and manner of death from the microbiota of all anatomic locations (ears, eyes, nose, mouth, and rectum).
| Predictor Variable | Performance Metric | xgboost | random forest | neural network |
|---|---|---|---|---|
| Postmortem interval | Accuracy | 0.745 | 0.736 | 0.706 |
| | p-value | < 2.0e-16 | < 2.0e-16 | < 2.0e-16 |
| Event location | Accuracy | 0.876 | 0.863 | 0.830 |
| | p-value | < 2.0e-16 | < 2.0e-16 | < 2.0e-16 |
| Manner of death | Accuracy | 0.823 | 0.823 | 0.707 |
| | p-value | < 2.2e-16 | < 2.0e-16 | < 2.0e-16 |
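The source does not state which test produced these p-values. One common choice, and the one reported by the `confusionMatrix` function of the R package caret, is a one-sided binomial test of whether accuracy exceeds the no-information rate (the proportion of the largest reference class). The Python sketch below assumes that test; the function names are hypothetical, and the binomial tail is summed in log space so that large binomial coefficients and tiny probabilities stay within floating-point range.

```python
import math


def binom_upper_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p), summed in log space."""
    log_terms = [
        math.lgamma(n + 1) - math.lgamma(i + 1) - math.lgamma(n - i + 1)
        + i * math.log(p) + (n - i) * math.log1p(-p)
        for i in range(k, n + 1)
    ]
    peak = max(log_terms)
    # Factor out the largest term so the remaining exponentials are <= 1.
    return math.exp(peak) * sum(math.exp(t - peak) for t in log_terms)


def accuracy_vs_nir_pvalue(correct, total, largest_class):
    """Assumed test: one-sided binomial test that accuracy > no-information rate."""
    nir = largest_class / total
    return binom_upper_tail(correct, total, nir)


# xgboost, postmortem interval: 631 of 847 correct; the largest reference
# class (< 24 h) has 380 samples, so the no-information rate is 380/847.
p = accuracy_vs_nir_pvalue(631, 847, 380)
print(p < 2.2e-16)  # True
```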
Evaluation statistics for three methods (xgboost, random forest, and neural network) for the prediction of postmortem interval, event location, and manner of death.
Each cell gives the values for the three methods in the order xgboost/random forest/neural network.
| Predictor Variable | Performance Metric | < 24 h | 25–48 h | 49–72 h | > 73 h |
|---|---|---|---|---|---|
| Postmortem interval | Sensitivity | 0.78/0.79/0.75 | 0.77/0.82/0.76 | 0.52/0.30/0.30 | 0.63/0.31/0.52 |
| | Specificity | 0.78/0.79/0.79 | 0.80/0.76/0.77 | 0.99/0.99/0.97 | 0.99/1.00/0.98 |
| | Pos Pred Value | 0.74/0.76/0.74 | 0.73/0.71/0.70 | 0.85/0.77/0.45 | 0.86/0.94/0.64 |
| | Neg Pred Value | 0.81/0.82/0.80 | 0.83/0.85/0.82 | 0.96/0.94/0.94 | 0.98/0.96/0.97 |
| | Prevalence | 0.45/0.45/0.45 | 0.42/0.42/0.42 | 0.08/0.08/0.08 | 0.06/0.06/0.06 |
| | Detection Rate | 0.35/0.35/0.34 | 0.32/0.34/0.32 | 0.04/0.02/0.02 | 0.04/0.02/0.03 |
| | Detection Prevalence | 0.47/0.47/0.45 | 0.44/0.48/0.45 | 0.05/0.03/0.05 | 0.04/0.02/0.04 |
| | Balanced Accuracy | 0.78/0.79/0.77 | 0.78/0.79/0.77 | 0.75/0.65/0.64 | 0.81/0.66/0.75 |
| Event location | Sensitivity | 0.67/0.58/0.62 | 0.98/0.99/0.92 | 0.71/0.63/0.68 | 0.30/0.33/0.45 |
| | Specificity | 0.99/1.00/0.96 | 0.73/0.61/0.76 | 0.97/0.98/0.96 | 1.00/1.00/0.98 |
| | Pos Pred Value | 0.87/0.97/0.68 | 0.89/0.85/0.90 | 0.78/0.83/0.69 | 0.80/1.00/0.51 |
| | Neg Pred Value | 0.95/0.94/0.94 | 0.95/0.98/0.81 | 0.96/0.95/0.95 | 0.97/0.97/0.97 |
| | Prevalence | 0.13/0.13/0.13 | 0.70/0.70/0.70 | 0.13/0.13/0.13 | 0.05/0.05/0.05 |
| | Detection Rate | 0.09/0.08/0.08 | 0.68/0.69/0.64 | 0.09/0.08/0.09 | 0.01/0.02/0.02 |
| | Detection Prevalence | 0.10/0.08/0.12 | 0.77/0.81/0.71 | 0.11/0.10/0.12 | 0.02/0.02/0.04 |
| | Balanced Accuracy | 0.83/0.79/0.79 | 0.86/0.80/0.84 | 0.84/0.81/0.82 | 0.65/0.66/0.71 |
| Manner of death | Sensitivity | 0.84/0.88/0.79 | 0.88/0.87/0.74 | 0.86/0.90/0.69 | 0.60/0.38/0.43 |
| | Specificity | 0.87/0.87/0.80 | 0.98/0.99/0.94 | 0.90/0.88/0.88 | 0.98/0.99/0.96 |
| | Pos Pred Value | 0.81/0.82/0.72 | 0.93/0.96/0.75 | 0.78/0.75/0.70 | 0.84/0.89/0.57 |
| | Neg Pred Value | 0.89/0.92/0.85 | 0.97/0.97/0.94 | 0.94/0.96/0.87 | 0.95/0.92/0.92 |
| | Prevalence | 0.40/0.40/0.40 | 0.19/0.19/0.19 | 0.29/0.29/0.29 | 0.12/0.12/0.12 |
| | Detection Rate | 0.33/0.35/0.31 | 0.17/0.17/0.14 | 0.25/0.26/0.20 | 0.07/0.05/0.05 |
| | Detection Prevalence | 0.41/0.43/0.44 | 0.18/0.17/0.19 | 0.32/0.35/0.28 | 0.09/0.05/0.09 |
| | Balanced Accuracy | 0.85/0.87/0.79 | 0.93/0.93/0.84 | 0.88/0.89/0.79 | 0.79/0.69/0.69 |
1 Sensitivity: the proportion of samples truly in the class that are correctly predicted as that class.
2 Specificity: the proportion of samples not in the class that are correctly predicted as not belonging to it.
3 Pos Pred Value: the proportion of samples predicted as the class that truly belong to it.
4 Neg Pred Value: the proportion of samples predicted as not belonging to the class that truly do not belong to it.
5 Prevalence: the proportion of all samples that truly belong to the class.
6 Detection Rate: the proportion of all samples that are correctly predicted as the class (true positives divided by the total number of samples).
7 Detection Prevalence: the proportion of all samples that are predicted as the class.
8 Balanced Accuracy: the mean of sensitivity and specificity for the class.
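The per-class statistics in the evaluation table are one-vs-rest quantities derived from the confusion matrices: one class is treated as "positive" and the other three as "negative". The Python sketch below takes detection rate as true positives divided by all samples and balanced accuracy as the mean of sensitivity and specificity; with those definitions it reproduces the xgboost < 24 h column for postmortem interval. `one_vs_rest_stats` is a hypothetical helper, and it assumes every class is both observed and predicted at least once so that no denominator is zero.

```python
# xgboost, postmortem interval (rows = predicted, columns = reference), classes
# ordered < 24 h, 25-48 h, 49-72 h, > 73 h; counts copied from the confusion-matrix table.
XGB_PMI = [
    [296, 78, 19, 7],
    [79, 271, 11, 11],
    [4, 2, 34, 0],
    [1, 2, 2, 30],
]


def one_vs_rest_stats(cm, k):
    """Per-class statistics for class index k of a square confusion matrix.

    Assumes rows are predicted classes and columns are reference classes.
    """
    total = sum(sum(row) for row in cm)
    tp = cm[k][k]
    predicted_k = sum(cm[k])               # row sum: samples predicted as k
    actual_k = sum(row[k] for row in cm)   # column sum: samples truly in k
    fp = predicted_k - tp
    fn = actual_k - tp
    tn = total - tp - fp - fn
    sensitivity = tp / actual_k
    specificity = tn / (tn + fp)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "pos_pred_value": tp / predicted_k,
        "neg_pred_value": tn / (tn + fn),
        "prevalence": actual_k / total,
        "detection_rate": tp / total,
        "detection_prevalence": predicted_k / total,
        "balanced_accuracy": (sensitivity + specificity) / 2,
    }


stats = one_vs_rest_stats(XGB_PMI, 0)  # class < 24 h
print({name: round(value, 2) for name, value in stats.items()})
# sensitivity 0.78, specificity 0.78, PPV 0.74, NPV 0.81, prevalence 0.45,
# detection rate 0.35, detection prevalence 0.47, balanced accuracy 0.78
```

Note how detection rate (0.35) differs from sensitivity (0.78): the former divides true positives by all 847 samples, the latter only by the 380 samples truly in the class.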
Fig 1. Common features among machine learning methods used to predict case attributes.
Venn diagram of the features shared among the top 100 features for predicting postmortem interval, event location, and manner of death across three models: xgboost (xg), random forest (rf), and neural network (nn). The features identified by all three methods are listed in S2–S5 Tables.
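A three-set Venn diagram like Fig 1 partitions the three top-feature lists into seven regions. The caption does not say how features were ranked, so the Python sketch below assumes each model supplies a per-feature importance score (for example, xgboost gain or random forest variable importance). The function names and the toy taxon scores are hypothetical, and k is reduced from 100 to 3 for readability.

```python
def top_k(importances, k=100):
    """Return the names of the k features with the highest importance score."""
    return sorted(importances, key=importances.get, reverse=True)[:k]


def venn_regions(top_xg, top_rf, top_nn):
    """Split three feature lists into the seven regions of a three-set Venn diagram."""
    xg, rf, nn = set(top_xg), set(top_rf), set(top_nn)
    return {
        "xg only": xg - rf - nn,
        "rf only": rf - xg - nn,
        "nn only": nn - xg - rf,
        "xg & rf only": (xg & rf) - nn,
        "xg & nn only": (xg & nn) - rf,
        "rf & nn only": (rf & nn) - xg,
        "all three": xg & rf & nn,
    }


# Hypothetical importance scores for a handful of taxa (not data from the study).
xg_imp = {"taxon_A": 0.9, "taxon_B": 0.5, "taxon_C": 0.4, "taxon_D": 0.1}
rf_imp = {"taxon_A": 0.7, "taxon_C": 0.6, "taxon_E": 0.3, "taxon_B": 0.05}
nn_imp = {"taxon_A": 0.8, "taxon_E": 0.2, "taxon_B": 0.15, "taxon_F": 0.1}

regions = venn_regions(top_k(xg_imp, 3), top_k(rf_imp, 3), top_k(nn_imp, 3))
print(sorted(regions["all three"]))  # ['taxon_A']
```

The "all three" region corresponds to the features the caption says are listed in the supplementary tables.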
Fig 2. Method prediction accuracy by number of anatomic areas.
Predictions of postmortem interval, event location, and manner of death using all combinations of swab sites. For each number of subareas (anatomic areas) included, results are shown for the most accurate model. xgboost, blue dashed line; random forest, orange short-dashed line; neural network, gray solid line.
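With five anatomic areas there are 2^5 − 1 = 31 non-empty combinations of swab sites, so evaluating every combination, as described for Fig 2, is computationally cheap. The Python sketch below keeps the best combination for each number of sites. The `evaluate` argument stands in for training and cross-validating a model on features from the chosen sites; it, `best_by_size`, and the toy per-site weights in the usage example are hypothetical, not values from the study.

```python
from itertools import combinations

SITES = ("ears", "eyes", "nose", "mouth", "rectum")


def best_by_size(evaluate, sites=SITES):
    """For each number of sites, return (accuracy, combination) of the best combination.

    `evaluate` maps a tuple of sites to an accuracy. Ties in accuracy are broken
    by comparing the site tuples, which is arbitrary.
    """
    best = {}
    for size in range(1, len(sites) + 1):
        best[size] = max((evaluate(combo), combo) for combo in combinations(sites, size))
    return best


# Hypothetical per-site contributions; a real `evaluate` would train and
# cross-validate a model on features from the chosen sites.
TOY_WEIGHTS = {"ears": 0.02, "eyes": 0.05, "nose": 0.04, "mouth": 0.06, "rectum": 0.03}


def toy_evaluate(combo):
    return 0.5 + sum(TOY_WEIGHTS[site] for site in combo)


for size, (acc, combo) in best_by_size(toy_evaluate).items():
    print(size, combo, round(acc, 2))
```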
Accuracy of each machine learning method as subareas are combined, for each predictor variable. The order of subareas (1 through 5) reflects the sequence in which a greedy algorithm added them to the model, each addition being chosen by the resulting accuracy.
“Subarea” refers to the microbiota from a specific anatomic area. The model with the highest accuracy within each method for each predictor variable is indicated with an asterisk (*).
| Predictor Variable | Machine Learning Method | Subarea 1 | Subarea 2 | Subarea 3 | Subarea 4 | Subarea 5 |
|---|---|---|---|---|---|---|
| Postmortem interval | xgboost | rectum | mouth | nose | eyes | ears* |
| | random forest | eyes | mouth | nose* | ears | rectum |
| | neural network | mouth | eyes | nose | ears* | rectum |
| Event location | xgboost | rectum | ears | mouth | nose | eyes* |
| | random forest | eyes | nose | rectum | ears | mouth* |
| | neural network | rectum* | ears | eyes | nose | mouth* |
| Manner of death | xgboost | eyes | mouth | nose | rectum | ears* |
| | random forest | eyes | ears | nose | rectum | mouth* |
| | neural network | eyes | ears | nose | mouth | rectum* |
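The subarea order in the table above comes from greedy forward selection: starting from no sites, each step adds whichever remaining site gives the highest accuracy, and the step with the best accuracy is marked. The Python sketch below illustrates that procedure; `greedy_order`, the tie-breaking rule, and the toy scoring function are hypothetical assumptions, not the study's implementation. Greedy selection evaluates at most 5 + 4 + 3 + 2 + 1 = 15 subsets instead of all 31, but it can miss the best combination that an exhaustive search would find.

```python
SITES = ("ears", "eyes", "nose", "mouth", "rectum")


def greedy_order(evaluate, sites=SITES):
    """Greedy forward selection over anatomic sites.

    At each step, add the remaining site whose inclusion gives the highest accuracy.
    Returns the list of (site_added, accuracy_after_adding) and the index of the
    step with the highest accuracy (the analogue of the asterisk in the table).
    """
    chosen, remaining, path = [], list(sites), []
    while remaining:
        # Ties in accuracy are broken by site name, which is arbitrary.
        acc, site = max((evaluate(tuple(chosen + [s])), s) for s in remaining)
        chosen.append(site)
        remaining.remove(site)
        path.append((site, acc))
    best_step = max(range(len(path)), key=lambda i: path[i][1])
    return path, best_step


# Hypothetical scoring: per-site contributions minus a penalty that grows with
# the number of sites, so accuracy peaks before all five are included.
TOY_WEIGHTS = {"ears": 0.02, "eyes": 0.05, "nose": 0.04, "mouth": 0.06, "rectum": 0.03}


def toy_evaluate(combo):
    return 0.5 + sum(TOY_WEIGHTS[s] for s in combo) - 0.01 * len(combo) ** 2


path, best_step = greedy_order(toy_evaluate)
print([site for site, _ in path])  # ['mouth', 'eyes', 'nose', 'rectum', 'ears']
print(path[best_step][0])  # eyes
```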