Pierre Miasnikof, Vasily Giannakeas, Mireille Gomes, Lukasz Aleksandrowicz, Alexander Y Shestopaloff, Dewan Alam, Stephen Tollman, Akram Samarikhalaj, Prabhat Jha.
Abstract
BACKGROUND: Verbal autopsies (VA) are increasingly used in low- and middle-income countries, where most deaths occur at home without medical attention and home deaths differ substantially from hospital deaths. Hence, there is no plausible "standard" against which VAs for home deaths may be validated. Previous studies have reported contradictory results on the performance of automated methods compared to physician-based classification of causes of death (CODs). We sought to compare the performance of the classic naive Bayes classifier (NBC) versus existing automated classifiers, using physician-based classification as the reference.
Year: 2015 PMID: 26607695 PMCID: PMC4660822 DOI: 10.1186/s12916-015-0521-2
Source DB: PubMed Journal: BMC Med ISSN: 1741-7015 Impact factor: 8.775
Description of datasets
| Variable | MDS Study | Agincourt Study | Matlab Study | PHMRC Study |
|---|---|---|---|---|
| Region | India | South Africa | Bangladesh | Multiple |
| Sample size | 12,225 | 5,823 | 3,270 | 2,064 |
| Ages | 1-59 months | 15-64 years | 20-64 years | 28 days – 11 years |
| Number of causes of death | 15 | 16 | 15 | 21 |
| Population | Community deaths | Community deaths | Community deaths | Hospital deaths |
| Cause of death physician classification | Dual, independent coding of VA records, disagreements resolved by reconciliation, and remaining cases by adjudication by a third physician | Dual, independent coding of VA records, disagreements resolved by third physician | Single coding of VA records followed by a second screening by another physician or experienced paramedic | Hospital certified cause of death, including clinical and diagnostic tests |
MDS Million Death Study, PHMRC Population Health Metrics Research Consortium, VA verbal autopsy
Comparison of NBC to other VA classifiers
| Feature | InterVA-4 | OTM<sup>a</sup> | NBC |
|---|---|---|---|
| Learns from training set | No | Yes | Yes |
| Uses Bayes rule | Yes | No | Yes |
| Uses naive assumption | Yes | Yes | Yes |
| Accounts for absence of symptom | No | No | Yes |
NBC naïve Bayes classifier, VA verbal autopsy, OTM open-source Tariff Method
<sup>a</sup> Our earlier publication demonstrates that the performance of our OTM is comparable to that of the original Tariff method [8]: for the top assigned cause, the OTM performed almost exactly as the original Tariff method did on the hospital-based dataset without the health care experience (HCE) variables, but less well than the same analysis with HCE variables. Note that results without HCE in the original Tariff publication were available only for the top assigned cause [7]. HCE variables capture any information that the respondent may know about the decedent’s experiences with health care.
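The three NBC properties listed in the table (learning from a training set, applying Bayes' rule under the naive independence assumption, and treating an absent symptom as evidence) can be illustrated with a minimal Bernoulli naive Bayes sketch. This is not the authors' implementation: the function names, the 0/1 symptom encoding, and the Laplace smoothing constant `alpha` are assumptions made for illustration only.

```python
import math
from collections import defaultdict

def train_nbc(records, labels, alpha=1.0):
    """Estimate log-priors and per-symptom presence probabilities per cause.

    records: list of equal-length 0/1 symptom lists (hypothetical encoding).
    labels:  list of cause-of-death names, one per record.
    alpha:   Laplace smoothing constant (an assumption, not from the paper).
    """
    n_symptoms = len(records[0])
    counts = defaultdict(lambda: [0] * n_symptoms)
    totals = defaultdict(int)
    for x, y in zip(records, labels):
        totals[y] += 1
        for j, v in enumerate(x):
            counts[y][j] += v
    n = len(records)
    model = {}
    for cause, total in totals.items():
        # Smoothed P(symptom j present | cause).
        p = [(counts[cause][j] + alpha) / (total + 2 * alpha)
             for j in range(n_symptoms)]
        model[cause] = (math.log(total / n), p)
    return model

def predict_nbc(model, x):
    """Return the cause with the highest posterior log-score for record x."""
    best_cause, best_score = None, -math.inf
    for cause, (log_prior, p) in model.items():
        score = log_prior
        for j, v in enumerate(x):
            # An absent symptom contributes log(1 - p), so absence counts too.
            score += math.log(p[j]) if v else math.log(1.0 - p[j])
        if score > best_score:
            best_cause, best_score = cause, score
    return best_cause

# Toy usage with two hypothetical symptoms and two causes.
toy_records = [[1, 0], [1, 0], [0, 1], [0, 1]]
toy_labels = ["pneumonia", "pneumonia", "diarrhoea", "diarrhoea"]
toy_model = train_nbc(toy_records, toy_labels)
```

Scoring in log space avoids numerical underflow when a questionnaire has many symptoms; the smoothing keeps a symptom never seen with a cause from driving that cause's probability to exactly zero.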
Mean overall sensitivity (and 95 % uncertainty intervals) on three datasets for 35 train/test iterations
| Study (training/testing sample size)<sup>a</sup> | NBC | OTM | InterVA-4<sup>b</sup> | Median, all three classifiers |
|---|---|---|---|---|
| MDS | 0.57 | 0.50 | 0.43 | 0.50 |
| (11,000/555)<sup>c</sup> | (0.57,0.58) | (0.50,0.51) | (0.40,0.45) | |
| Agincourt | 0.48 | 0.42 | 0.38 | 0.42 |
| (2,300/2,300) | (0.48,0.48) | (0.41,0.42) | (0.36,0.41) | |
| Matlab | 0.51 | 0.50 | 0.45 | 0.50 |
| (1,000/1,000) | (0.50,0.51) | (0.50,0.51) | (0.43,0.47) | |
| Median, all three datasets | 0.51 | 0.50 | 0.43 | 0.50 |
NBC naïve Bayes classifier, OTM open-source Tariff Method, VA verbal autopsy, MDS Million Death Study
<sup>a</sup> Training/testing sample size, with no training required for InterVA-4
<sup>b</sup> InterVA-4 was evaluated on a testing dataset of 50 randomly selected records out of 555 records, in each of the 35 iterations
<sup>c</sup> Sensitivity using 555/555 training/testing records from the MDS dataset was 0.55 (0.54, 0.55) for NBC and 0.49 (0.48, 0.50) for OTM
Specificity achieved by all automated classifiers across all datasets ranged from 0.96 to 0.97, and the largest uncertainty interval observed was (0.96,0.97)
Partial chance-corrected concordance (and 95 % uncertainty intervals) on three datasets for 35 train/test iterations
| Study (training/testing sample size)<sup>a</sup> | NBC | OTM | InterVA-4<sup>b</sup> | Median, all three classifiers |
|---|---|---|---|---|
| MDS | 0.54 | 0.47 | 0.39 | 0.47 |
| (11,000/555) | (0.54, 0.55) | (0.46,0.47) | (0.36,0.41) | |
| Agincourt | 0.43 | 0.38 | 0.34 | 0.38 |
| (2,300/2,300) | (0.44,0.45) | (0.37,0.38) | (0.32,0.37) | |
| Matlab | 0.47 | 0.47 | 0.41 | 0.47 |
| (1,000/1,000) | (0.47,0.49) | (0.46,0.47) | (0.39,0.43) | |
| Median all three datasets | 0.47 | 0.47 | 0.39 | 0.47 |
NBC naïve Bayes classifier, OTM open-source Tariff Method, VA verbal autopsy, MDS Million Death Study
<sup>a</sup> Training/testing sample size, with no training required for InterVA-4
<sup>b</sup> InterVA-4 was evaluated on a testing dataset of 50 randomly selected records out of 555 records, in each of the 35 iterations
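For readers unfamiliar with the metric, partial chance-corrected concordance is commonly defined in the VA literature as PCCC(k) = (C - k/N) / (1 - k/N), where C is the fraction of deaths whose reference cause appears among the top k assigned causes and N is the number of causes. The record above does not state the formula, so the sketch below assumes this standard definition; the function and parameter names are illustrative.

```python
def pccc(top_k_fraction, k, n_causes):
    """Partial chance-corrected concordance for the top k assigned causes.

    top_k_fraction: share of deaths whose reference cause is among the
                    top k causes assigned by the classifier.
    k:              number of top-ranked causes considered.
    n_causes:       number of causes on the cause list.
    """
    chance = k / n_causes  # concordance expected from random assignment
    return (top_k_fraction - chance) / (1.0 - chance)
```

Assuming k = 1 and taking the overall sensitivity as the top-cause concordance, `pccc(0.57, 1, 15)` gives roughly 0.54, in line with the NBC value reported for the MDS dataset.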
Number of instances with zero sensitivity for CODs
| Study (number of CODs) | NBC | OTM | InterVA-4 |
|---|---|---|---|
| MDS (15) | 2 | 10 | 6 |
| Agincourt (16) | 0 | 13 | 1 |
| Matlab (16) | 0 | 9 | 1 |
COD cause of death, NBC naïve Bayes classifier, OTM open-source Tariff Method, VA verbal autopsy, MDS Million Death Study
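A count of this kind can be obtained by computing sensitivity separately for each cause and tallying the causes for which no test death was classified correctly. The sketch below is one plausible way to do this, not the authors' code; it assumes parallel lists of reference and predicted causes and only considers causes that actually occur in the test set.

```python
from collections import Counter

def per_cause_sensitivity(true_causes, predicted_causes):
    """Return {cause: fraction of its deaths that were classified correctly}."""
    total = Counter(true_causes)
    correct = Counter(t for t, p in zip(true_causes, predicted_causes) if t == p)
    # Counter returns 0 for causes that were never predicted correctly.
    return {cause: correct[cause] / n for cause, n in total.items()}

def count_zero_sensitivity(true_causes, predicted_causes):
    """Count causes present in the test set that were never correctly assigned."""
    sens = per_cause_sensitivity(true_causes, predicted_causes)
    return sum(1 for s in sens.values() if s == 0.0)

# Toy usage with hypothetical cause labels.
toy_true = ["a", "a", "b", "c"]
toy_pred = ["a", "b", "a", "c"]
toy_sens = per_cause_sensitivity(toy_true, toy_pred)
```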
Fig. 1 The mean, minimum, and maximum CSMFs as reported by the three classifiers across datasets for (a) 15 causes using data from the Million Death Study, (b) 16 causes using data from the Agincourt study, and (c) 15 causes using data from the Matlab study. The MDS results use 11,000 training cases and 555 test cases. CSMF cause-specific mortality fraction
Mean CSMF accuracy (and 95 % uncertainty intervals) on three datasets for 35 train/test iterations
| Study (training/testing sample size)<sup>a</sup> | NBC | OTM | InterVA-4<sup>b</sup> | Median, all three classifiers |
|---|---|---|---|---|
| MDS | 0.88 | 0.57 | 0.71 | 0.71 |
| (11,000/555) | (0.87,0.88) | (0.56,0.57) | (0.69,0.73) | |
| Agincourt | 0.87 | 0.42 | 0.66 | 0.66 |
| (2,300/2,300) | (0.87,0.88) | (0.42,0.43) | (0.63,0.68) | |
| Matlab | 0.92 | 0.57 | 0.65 | 0.65 |
| (1,000/1,000) | (0.92,0.93) | (0.56,0.58) | (0.62,0.67) | |
| Median all three datasets | 0.88 | 0.57 | 0.66 | 0.66 |
CSMF cause-specific mortality fraction, NBC naïve Bayes classifier, OTM open-source Tariff Method, VA verbal autopsy, MDS Million Death Study
<sup>a</sup> Training/testing sample size, with no training required for InterVA-4
<sup>b</sup> InterVA-4 was evaluated on a testing dataset of 50 randomly selected records out of 555 records, in each of the 35 iterations
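CSMF accuracy is a population-level metric: it compares the distribution of assigned causes with the reference distribution rather than scoring individual deaths. The record above does not give the formula; the sketch below assumes the definition commonly used in the VA literature, 1 - Σ|true - predicted| / (2 (1 - min true CSMF)), and the function names are illustrative.

```python
from collections import Counter

def csmf(causes):
    """Return the fraction of deaths assigned to each cause."""
    counts = Counter(causes)
    n = len(causes)
    return {cause: c / n for cause, c in counts.items()}

def csmf_accuracy(true_csmf, pred_csmf):
    """CSMF accuracy: 1 minus the absolute CSMF error scaled by its worst case."""
    causes = set(true_csmf) | set(pred_csmf)
    abs_error = sum(abs(true_csmf.get(c, 0.0) - pred_csmf.get(c, 0.0))
                    for c in causes)
    # Worst case: every death is assigned to the rarest reference cause.
    worst_case = 2.0 * (1.0 - min(true_csmf.get(c, 0.0) for c in causes))
    return 1.0 - abs_error / worst_case

# Toy usage: reference fractions 0.5/0.3/0.2 versus predicted 0.4/0.4/0.2.
toy_true_csmf = csmf(["a"] * 5 + ["b"] * 3 + ["c"] * 2)
toy_pred_csmf = csmf(["a"] * 4 + ["b"] * 4 + ["c"] * 2)
```

Because errors on individual deaths can cancel out at the population level, a classifier can reach a high CSMF accuracy even when its individual-level sensitivity is modest, which is the pattern the tables show.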
Mean sensitivity (and 95 % uncertainty intervals) for various non-hospital deaths (MDS) and hospital deaths (PHMRC) train/test combinations for 35 train/test iterations
| Train-test combination | NBC | OTM |
|---|---|---|
| MDS-MDS | 0.61 | 0.54 |
| | (0.60,0.62) | (0.52,0.55) |
| PHMRC-MDS | 0.50 | 0.41 |
| | (0.49,0.51) | (0.40,0.42) |
| PHMRC-PHMRC | 0.46 | 0.40 |
| | (0.45,0.47) | (0.38,0.41) |
| MDS-PHMRC | 0.37 | 0.32 |
| | (0.36,0.39) | (0.31,0.34) |
Note: In each of the 35 iterations, we selected 400 records for training and 400 records for testing. MDS cases used in this table are non-hospital-based deaths, while PHMRC cases are hospital-based deaths. MDS Million Death Study, PHMRC Population Health Metrics Research Consortium