Serena Jeblee (1,2), Mireille Gomes (3,4), Prabhat Jha (3,4), Frank Rudzicz (5,6,7,8), Graeme Hirst (5,6)
Abstract
BACKGROUND: A verbal autopsy (VA) is a post-hoc written interview report of the symptoms preceding a person's death in cases where no official cause of death (CoD) was determined by a physician. Current leading automated VA coding methods primarily use structured data from VAs to assign a CoD category. We present a method to automatically determine CoD categories from VA free-text narratives alone.
Keywords: Cause of death; Computer-coded verbal autopsy (CCVA); Machine learning; Natural language processing; Physician-certified verbal autopsy (PCVA); Tariff method; Verbal autopsy
Year: 2019 PMID: 31288814 PMCID: PMC6617656 DOI: 10.1186/s12911-019-0841-9
Source DB: PubMed Journal: BMC Med Inform Decis Mak ISSN: 1472-6947 Impact factor: 2.796
CoD categories used for adult deaths (15–69 years) and child deaths (29 days–14 years)
| CoD category |
|---|
| Acute respiratory infections |
| Diarrhea |
| Pulmonary Tuberculosis |
| Other and unspecified infections |
| Neoplasms |
| Nutrition |
| Cardiovascular disease |
| Chronic respiratory disease |
| Liver cirrhosis |
| Other non-communicable diseases |
| Road and transport injuries |
| Other injuries |
| Ill-defined |
| Suicide |
| Maternal |
CoD categories used for neonatal deaths (<29 days)
| CoD category |
|---|
| Prematurity/low birth weight |
| Neonatal infections (not including tetanus) |
| Birth asphyxia/trauma |
| Ill-defined or cause unknown |
| Other (all other ICDs not included in above) |
Fig. 1 Adult CoD category distribution (15 categories)
Fig. 2 Child CoD category distribution (15 categories)
Fig. 3 Neonate CoD category distribution (5 categories)
Fig. 4 Adult CoD category distribution (48 WHO categories)
Fig. 5 Child CoD category distribution (39 WHO categories)
Fig. 6 Neonate CoD category distribution (17 WHO categories)
Description of datasets used. MDS: Million Death Study dataset, RCT: Randomized Controlled Trial dataset
| | MDS | RCT | MDS+RCT | Agincourt |
|---|---|---|---|---|
| Adult records (15–69 years) | 9,207 | 5,105 | 14,312 | 8,151 |
| Child records (29 days–14 years) | 1,717 | 255 | 1,972 | 1,674 |
| Neonatal records (<29 days) | 451 | 170 | 621 | 197 |
| Region | India | India (Gujarat, Punjab) | India | South Africa |
Two example narratives (adult deaths)
| Narrative | Physician-certified CoD category |
|---|---|
| Heart failure. The patient death due to breathlessness. The person suffering paralysis and stroke lost on year with chest pain very pressure after then person was head. | Cardiovascular disease |
| One day 13/03/01 he fell ill with some fever and chest pain who called the Doctor. On 15/03/01 the deceased was crying in the chest pain and high fever. We were ready to shift. The patient to the Hospital, some water came out from the deceased mouth and closed his eyes and passed away. | Acute respiratory infections |
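One of the four classifiers compared in the results tables is Naïve Bayes. As a rough illustration of how a free-text narrative can be mapped to a CoD category, here is a minimal bag-of-words multinomial Naïve Bayes sketch in pure Python. The tokenizer, the add-one smoothing, the class name `BagOfWordsNB`, and the toy training snippets are all assumptions made for illustration; this is not the feature extraction or implementation used in the study.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase and keep runs of letters (a deliberately simple tokenizer)."""
    return re.findall(r"[a-z]+", text.lower())


class BagOfWordsNB:
    """Hypothetical multinomial Naive Bayes over word counts with add-one smoothing."""

    def fit(self, narratives, labels):
        self.class_counts = Counter(labels)      # documents per category
        self.word_counts = defaultdict(Counter)  # category -> word -> count
        self.vocab = set()
        for text, label in zip(narratives, labels):
            tokens = tokenize(text)
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        # Total training tokens per category, used in the smoothed denominator.
        self.total_words = {c: sum(self.word_counts[c].values()) for c in self.class_counts}
        return self

    def predict(self, text):
        n_docs = sum(self.class_counts.values())
        v = len(self.vocab)
        best_label, best_score = None, -math.inf
        for label, n in self.class_counts.items():
            # Log prior plus smoothed log likelihood of each in-vocabulary token.
            score = math.log(n / n_docs)
            for tok in tokenize(text):
                if tok in self.vocab:
                    count = self.word_counts[label][tok]
                    score += math.log((count + 1) / (self.total_words[label] + v))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Toy snippets invented for illustration; not records from MDS, RCT or Agincourt.
train = [
    ("chest pain and breathlessness, history of heart attack", "Cardiovascular disease"),
    ("severe chest pain, stroke and paralysis", "Cardiovascular disease"),
    ("high fever and cough for one week", "Acute respiratory infections"),
    ("fever, cough and difficulty breathing", "Acute respiratory infections"),
    ("loose motions and vomiting for three days", "Diarrhea"),
]
model = BagOfWordsNB().fit([t for t, _ in train], [c for _, c in train])
print(model.predict("he had cough and high fever"))  # Acute respiratory infections
```

Tokens never seen in training are simply skipped rather than smoothed, which keeps the sketch short; real narratives like the two above would need whatever preprocessing the study actually applied.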
Mean scores on the combined MDS and RCT datasets for each of the four classifiers
| | Precision | Sensitivity | F1 | PCCC | CSMFA | CCCSMFA |
|---|---|---|---|---|---|---|
| Adult (15–69 years) | | | | | | |
| Naïve Bayes | .710 | .710 | .704 | .689 | .929 | .801 |
| Random forest | .733 | .730 | .728 | .711 | .948 | .854 |
| SVM | .746 | .737 | .740 | .718 | | |
| Neural network | | | | | | |
| Child (29 days–14 years) | | | | | | |
| Naïve Bayes | .647 | .595 | .608 | .565 | .851 | .585 |
| Random forest | .687 | .620 | .638 | .591 | .872 | .643 |
| SVM | .686 | .658 | .666 | .632 | | |
| Neural network | | | | | .904 | .733 |
| Neonate (<29 days) | | | | | | |
| Naïve Bayes | .507 | .516 | .493 | .376 | .826 | .509 |
| Random forest | .534 | .542 | .524 | .411 | .852 | .581 |
| SVM | .537 | .538 | .524 | .404 | | |
| Neural network | | | | | .825 | .507 |
Adult and child results classified into 15 categories; neonatal records into 5 categories. Bold indicates the best score in each column for each age group. PCCC: partially chance-corrected concordance, CSMFA: cause-specific mortality fraction (CSMF) accuracy, CCCSMFA: chance-corrected CSMFA
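The caption names the evaluation metrics without defining them. A minimal sketch of two of them follows, assuming the standard definitions used in verbal autopsy evaluation: CSMFA = 1 - Σ_j |CSMF_j(true) - CSMF_j(pred)| / (2(1 - min_j CSMF_j(true))), and single-cause PCCC = (C - 1/N) / (1 - 1/N), where C is the fraction of records assigned their true category and N is the number of categories. The function names are hypothetical, and how the study averaged scores across runs is not modeled.

```python
from collections import Counter


def csmf(labels, categories):
    """Cause-specific mortality fraction: share of records in each category."""
    counts = Counter(labels)
    return {c: counts[c] / len(labels) for c in categories}


def csmf_accuracy(true_labels, pred_labels, categories):
    """Population-level agreement between true and predicted cause fractions."""
    true_f = csmf(true_labels, categories)
    pred_f = csmf(pred_labels, categories)
    abs_error = sum(abs(true_f[c] - pred_f[c]) for c in categories)
    # Normalize by the largest possible error given the true distribution.
    return 1 - abs_error / (2 * (1 - min(true_f.values())))


def pccc_top1(true_labels, pred_labels, n_categories):
    """Individual-level concordance corrected for chance, single predicted cause."""
    hits = sum(t == p for t, p in zip(true_labels, pred_labels))
    concordance = hits / len(true_labels)
    chance = 1 / n_categories
    return (concordance - chance) / (1 - chance)


true_labels = ["A", "A", "B", "C"]
pred_labels = ["A", "B", "B", "B"]
print(round(csmf_accuracy(true_labels, pred_labels, ["A", "B", "C"]), 3))  # 0.333
print(round(pccc_top1(true_labels, pred_labels, 3), 3))  # 0.25
```

A perfect assignment scores 1 on both metrics. Note that the two can disagree: the toy predictions above get half the records right (PCCC 0.25) yet distort the cause fractions badly (CSMFA about 0.333).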
Fig. 7 Individual level results comparison (MDS+RCT)
Fig. 8 Population level results comparison (MDS+RCT)
Mean scores using WHO categories on the combined MDS and RCT datasets for each of the four classifiers
| | Precision | Sensitivity | F1 | PCCC | CSMFA | CCCSMFA |
|---|---|---|---|---|---|---|
| Adult (15–69 years) | | | | | | |
| Naïve Bayes | .591 | .593 | .580 | .583 | .869 | .643 |
| Random forest | .644 | .647 | .634 | .638 | .905 | .742 |
| SVM | | | | | | |
| Neural network | .630 | .654 | .620 | .646 | .840 | .567 |
| Child (29 days–14 years) | | | | | | |
| Naïve Bayes | .493 | .402 | .427 | .379 | .768 | .369 |
| Random forest | | .507 | .514 | .488 | | |
| SVM | .567 | | | | .796 | .446 |
| Neural network | .512 | .494 | .474 | .474 | .753 | .330 |
| Neonate (<29 days) | | | | | | |
| Naïve Bayes | .434 | .469 | .435 | .399 | .797 | .448 |
| Random forest | .424 | .455 | .426 | .384 | .798 | .450 |
| SVM | | | | | | |
| Neural network | .328 | .361 | .306 | .278 | .634 | .007 |
Adult: 48 categories, child: 39 categories, neonate: 17 categories. Bold indicates the best score in each column for each age group. PCCC: partially chance-corrected concordance, CSMFA: cause-specific mortality fraction (CSMF) accuracy, CCCSMFA: chance-corrected CSMFA
Mean scores on the Agincourt dataset
| | Precision | Sensitivity | F1 | PCCC | CSMFA | CCCSMFA |
|---|---|---|---|---|---|---|
| Adult (15–69 years) | | | | | | |
| Naïve Bayes | .517 | .517 | .513 | .481 | | |
| Random forest | .511 | .517 | .496 | .480 | .844 | .577 |
| SVM | .569 | .566 | .561 | .543 | .901 | .730 |
| Neural network | | | | | .918 | .777 |
| Child (29 days–14 years) | | | | | | |
| Naïve Bayes | .488 | .440 | .435 | .395 | .761 | .351 |
| Random forest | .521 | .502 | .487 | .463 | .816 | .501 |
| SVM | .535 | .518 | .512 | .479 | | |
| Neural network | | | | | .869 | .645 |
| Neonate (<29 days) | | | | | | |
| Naïve Bayes | | | | | .702 | .191 |
| Random forest | .409 | .496 | .427 | .366 | | |
| SVM | .387 | .417 | .371 | .266 | .693 | .165 |
| Neural network | .356 | .412 | .354 | .259 | .636 | .012 |
CCCSMFA was calculated using .632 as the mean CSMFA of random allocation, as suggested in [12]
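Because the chance correction is linear in CSMFA, it can be checked directly against the table. A minimal sketch, assuming CCCSMFA = (CSMFA - 0.632) / (1 - 0.632), the chance-corrected form in which .632 is the expected CSMFA of random allocation; the function name `chance_corrected_csmfa` is hypothetical. Under this assumption, the adult neural network CSMFA of .918 and the child Naïve Bayes CSMFA of .761 from the table above reproduce the reported CCCSMFA values of .777 and .351.

```python
def chance_corrected_csmfa(csmfa, random_mean=0.632):
    """Rescale CSMF accuracy so random allocation scores 0 and a perfect estimate scores 1."""
    return (csmfa - random_mean) / (1 - random_mean)


# CSMFA values taken from the Agincourt table above (15-category results).
for row, csmfa in [("Adult, neural network", 0.918), ("Child, Naive Bayes", 0.761)]:
    print(row, round(chance_corrected_csmfa(csmfa), 3))
# Adult, neural network 0.777
# Child, Naive Bayes 0.351
```

Values below .632 map to negative scores, meaning the classifier estimated cause fractions worse than random allocation would be expected to.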
Mean scores on the Agincourt dataset using the WHO categories
| | Precision | Sensitivity | F1 | PCCC | CSMFA | CCCSMFA |
|---|---|---|---|---|---|---|
| Adult (15–69 years) | | | | | | |
| Naïve Bayes | .433 | .448 | .431 | .432 | | |
| Random forest | .438 | .464 | .436 | .448 | .832 | .543 |
| SVM | | | | .490 | .857 | .612 |
| Neural network | .470 | .495 | .451 | .480 | .750 | .322 |
| Child (29 days–14 years) | | | | | | |
| Naïve Bayes | .378 | .388 | .370 | .360 | .793 | .437 |
| Random forest | .456 | .450 | .431 | .425 | .799 | .453 |
| SVM | | | | | | |
| Neural network | .388 | .428 | .374 | .402 | .667 | .095 |
| Neonate (<29 days) | | | | | | |
| Naïve Bayes | .276 | .384 | .305 | .296 | .610 | -.060 |
| Random forest | .292 | .369 | .314 | .279 | .673 | .111 |
| SVM | | | | | | |
| Neural network | .156 | .265 | .179 | .160 | .502 | -.353 |
CCCSMFA was calculated using .632 as the mean CSMFA of random allocation, as suggested in [12]