Abstract
Importance: Nonfatal gunshot injuries are the most common type of firearm injury, but where they most often occur remains unclear owing to data limitations. Natural language processing (NLP) can be applied to the narrative text of gunshot injury medical records to classify injury location and inform prevention efforts.

Objective: To examine the performance of NLP and machine learning models in predicting nonfatal gunshot injury locations and to generate new national estimates of the locations in which these injuries occur.

Design, Setting, and Participants: Cross-sectional study of data from the National Electronic Injury Surveillance System Firearm Injury Surveillance Study on nonfatal gunshot injuries that occurred in the US between January 1, 1993, and December 31, 2015. The unweighted sample included 59 025 gunshot injuries that were initially treated at emergency departments. Data were analyzed from June 1, 2019, to July 24, 2020.

Main Outcomes and Measures: The primary outcomes were classification of injury location and subsequent estimation of nonfatal gunshot injury location. NLP was used to generate 6 sets of predictors, and 4 machine learning models were trained to classify the missing locations: multinomial support vector machines, lasso regression, XgBoost gradient boosting, and feed-forward neural networks. For each of the 6 sets of NLP predictors, 70% of records with locations were randomly sampled to form the training set, and the remaining 30% composed the test set. The best-performing model was validated by comparing the predicted locations with those from existing national estimates of nonfatal gunshot injuries stratified by location and intent.
Year: 2020 PMID: 33052403 PMCID: PMC7557517 DOI: 10.1001/jamanetworkopen.2020.20664
Source DB: PubMed Journal: JAMA Netw Open ISSN: 2574-3805
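As a rough illustration of the workflow the abstract describes, the following Python sketch computes TF-IDF weights from narrative text and performs a random 70/30 split. The narratives, labels, function names, whitespace tokenizer, and random seed are all hypothetical; the study's actual preprocessing and software are not given in this record.

```python
import math
import random
from collections import Counter


def split_records(records, train_frac=0.7, seed=0):
    """Randomly split records into (train, test) lists."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * train_frac)
    return shuffled[:cut], shuffled[cut:]


def tfidf(docs):
    """Return one {term: weight} dict per document, weight = tf * log(N / df)."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: number of documents that contain each term.
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    weights = []
    for tokens in tokenized:
        counts = Counter(tokens)
        weights.append({
            term: (count / len(tokens)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights


# Made-up narratives and labels, for illustration only (not study data).
records = [
    ("pt shot in leg at home", "Home"),
    ("gsw to arm while cleaning gun at home", "Home"),
    ("shot in foot in bedroom", "Home"),
    ("pt shot in chest on street", "Street"),
    ("drive by shooting gsw to leg", "Street"),
    ("walking down street when shot in back", "Street"),
    ("gsw to hand at bar", "Public Place"),
    ("shot in thigh in store parking lot", "Public Place"),
    ("gsw to shoulder while hunting", "Other"),
    ("shot in leg at party", "Other"),
]

train, test = split_records(records)
train_weights = tfidf([text for text, _ in train])
```

A term that appears in every document gets a weight of zero under this formula, which is why TF-IDF tends to emphasize location-specific words over ubiquitous ones.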
Patient Demographic Characteristics, Injury Type, Location, and Intent
| Characteristic | Missing (n = 26 392), No. (%) | Test set (n = 9790), No. (%) | Train set (n = 22 843), No. (%) | Overall (n = 59 025), No. (%) |
|---|---|---|---|---|
| Patient sex | ||||
| Male | 23 915 (90.6) | 8615 (88.0) | 20 100 (88.0) | 52 630 (89.2) |
| Female | 2468 (9.4) | 1173 (12.0) | 2737 (12.0) | 6378 (10.8) |
| Patient disposition | ||||
| Treated/released | 10 240 (38.8) | 4166 (42.6) | 9823 (43.0) | 24 229 (41.0) |
| Transferred/released | 753 (2.9) | 368 (3.8) | 829 (3.6) | 1950 (3.3) |
| Transferred/hospital | 32 (0.1) | 11 (0.1) | 27 (0.1) | 70 (0.1) |
| Hospitalized | 14 944 (56.6) | 5063 (51.7) | 11 709 (51.3) | 31 716 (53.7) |
| Observation | 198 (0.8) | 145 (1.5) | 331 (1.4) | 674 (1.1) |
| Crime involved in injury | ||||
| Unknown | 19 368 (73.4) | 5170 (52.8) | 12 110 (53.0) | 36 648 (62.1) |
| Yes | 4178 (15.8) | 1975 (20.2) | 4622 (20.2) | 10 775 (18.3) |
| No | 2846 (10.8) | 2645 (27.0) | 6111 (26.8) | 11 602 (19.7) |
| Race of patient | ||||
| Not stated | 5992 (22.7) | 1621 (16.6) | 3740 (16.4) | 11 353 (19.2) |
| White | 4284 (16.2) | 2229 (22.8) | 5129 (22.5) | 11 642 (19.7) |
| Black | 12 983 (49.2) | 4890 (49.9) | 11 431 (50.0) | 29 304 (49.6) |
| Other | 3133 (11.9) | 1050 (10.7) | 2543 (11.1) | 6726 (11.4) |
| Age group levels, y | ||||
| 0-14 | 755 (2.9) | 485 (5.0) | 1037 (4.5) | 2277 (3.9) |
| 15-24 | 12 576 (47.7) | 4259 (43.5) | 10 202 (44.7) | 27 037 (45.8) |
| 25-34 | 7205 (27.3) | 2624 (26.8) | 5876 (25.7) | 15 705 (26.6) |
| 35-44 | 3375 (12.8) | 1179 (12.0) | 2898 (12.7) | 7452 (12.6) |
| 45-54 | 1414 (5.4) | 702 (7.2) | 1519 (6.6) | 3635 (6.2) |
| 55-65 | 575 (2.2) | 267 (2.7) | 682 (3.0) | 1524 (2.6) |
| >65 | 328 (1.2) | 221 (2.3) | 524 (2.3) | 1073 (1.8) |
| Primary body part affected | ||||
| Head/neck | 3593 (13.6) | 1449 (14.8) | 3368 (14.7) | 8410 (14.2) |
| Trunk | ||||
| Upper | 5399 (20.5) | 1589 (16.2) | 3714 (16.3) | 10 702 (18.1) |
| Lower | 4601 (17.4) | 1562 (16.0) | 3578 (15.7) | 9741 (16.5) |
| Arm/hand | 3722 (14.1) | 1571 (16.0) | 3759 (16.5) | 9052 (15.3) |
| Leg/foot | 8700 (33.0) | 3465 (35.4) | 8063 (35.3) | 20 228 (34.3) |
| Other | 258 (1.0) | 95 (1.0) | 225 (1.0) | 578 (1.0) |
| Stratum of hospital | ||||
| Small | 603 (2.3) | 536 (5.5) | 1235 (5.4) | 2374 (4.0) |
| Medium | 1599 (6.1) | 636 (6.5) | 1627 (7.1) | 3862 (6.5) |
| Large | 3749 (14.2) | 1429 (14.6) | 3357 (14.7) | 8535 (14.5) |
| Very large | 19 728 (74.7) | 6796 (69.4) | 15 790 (69.1) | 42 314 (71.7) |
| Children’s | 713 (2.7) | 393 (4.0) | 834 (3.7) | 1940 (3.3) |
| Incident intent | ||||
| Unknown | 4678 (17.7) | 479 (4.9) | 1239 (5.4) | 6396 (10.8) |
| Unintentional | 2602 (9.9) | 1490 (15.2) | 3342 (14.6) | 7434 (12.6) |
| Assault | 17 951 (68.0) | 7261 (74.2) | 16 887 (73.9) | 42 099 (71.3) |
| Suicide | 930 (3.5) | 453 (4.6) | 1127 (4.9) | 2510 (4.3) |
Unknown patient categories are omitted if they comprise less than 2% of the patient characteristic; therefore, the numbers may not sum to the total number listed in the heading.
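The column totals in the table header tie back to the 70/30 split described in the abstract. A small arithmetic check in Python, using only counts shown in the table (the variable names are illustrative):

```python
# Unweighted record counts taken from the table header.
n_missing = 26_392   # records with no recorded injury location
n_test = 9_790       # labelled records held out for testing
n_train = 22_843     # labelled records used for training
n_overall = 59_025   # all records in the unweighted sample

n_labelled = n_train + n_test
train_share = n_train / n_labelled      # share of labelled records used for training
missing_share = n_missing / n_overall   # share of all records lacking a location

print(f"labelled records: {n_labelled}")
print(f"training share of labelled records: {train_share:.1%}")
print(f"records missing a location: {missing_share:.1%}")
```

The three groups add up to the overall sample, the training share comes out at 70.0%, and about 44.7% of records lack a recorded location.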
Model Performance by Classifier
| Model | Precision: Home | Precision: Street | Precision: Public Place | Precision: Other | Recall: Home | Recall: Street | Recall: Public Place | Recall: Other | Accuracy | AUC |
|---|---|---|---|---|---|---|---|---|---|---|
| Lasso shared words | ||||||||||
| TF-IDF normalization | 0.819 | 0.792 | 0.726 | 0.732 | 0.815 | 0.876 | 0.639 | 0.478 | 0.787 | 0.805 |
| N-grams | 0.769 | 0.756 | 0.641 | 0.532 | 0.770 | 0.848 | 0.569 | 0.212 | 0.734 | 0.740 |
| Word embeddings | 0.700 | 0.707 | 0.611 | 0.437 | 0.720 | 0.819 | 0.489 | 0.133 | 0.685 | 0.679 |
| Lasso all words | ||||||||||
| TF-IDF normalization | 0.818 | 0.789 | 0.715 | 0.730 | 0.812 | 0.875 | 0.630 | 0.458 | 0.783 | 0.805 |
| N-grams | 0.771 | 0.755 | 0.631 | 0.527 | 0.769 | 0.850 | 0.562 | 0.197 | 0.732 | 0.737 |
| Word embeddings | 0.711 | 0.700 | 0.593 | 0.473 | 0.726 | 0.818 | 0.472 | 0.126 | 0.682 | 0.680 |
| SVM shared words | ||||||||||
| TF-IDF normalization | 0.778 | 0.781 | 0.740 | 0.740 | 0.810 | 0.869 | 0.599 | 0.442 | 0.772 | 0.766 |
| N-grams | 0.757 | 0.776 | 0.675 | 0.609 | 0.799 | 0.852 | 0.588 | 0.179 | 0.748 | 0.734 |
| Word embeddings | 0.739 | 0.749 | 0.684 | 0.605 | 0.769 | 0.858 | 0.553 | 0.173 | 0.733 | 0.697 |
| SVM all words | ||||||||||
| TF-IDF normalization | 0.761 | 0.755 | 0.710 | 0.615 | 0.791 | 0.858 | 0.580 | 0.204 | 0.747 | 0.716 |
| N-grams | 0.758 | 0.767 | 0.671 | 0.566 | 0.788 | 0.857 | 0.577 | 0.164 | 0.743 | 0.726 |
| Word embeddings | 0.744 | 0.740 | 0.661 | 0.609 | 0.766 | 0.862 | 0.527 | 0.148 | 0.726 | 0.692 |
| NN shared words | ||||||||||
| TF-IDF normalization | 0.626 | 0.656 | 0.607 | 0.436 | 0.685 | 0.816 | 0.343 | 0.080 | 0.637 | 0.621 |
| N-grams | 0.765 | 0.765 | 0.627 | 0.565 | 0.777 | 0.831 | 0.593 | 0.182 | 0.734 | 0.742 |
| Word embeddings | 0.709 | 0.716 | 0.607 | 0.467 | 0.718 | 0.823 | 0.509 | 0.142 | 0.690 | 0.693 |
| NN all words | ||||||||||
| TF-IDF normalization | 0.771 | 0.758 | 0.650 | 0.548 | 0.778 | 0.841 | 0.592 | 0.188 | 0.738 | 0.742 |
| N-grams | 0.767 | 0.760 | 0.616 | 0.587 | 0.769 | 0.838 | 0.581 | 0.166 | 0.730 | 0.737 |
| Word embeddings | 0.704 | 0.700 | 0.594 | 0.434 | 0.733 | 0.817 | 0.456 | 0.120 | 0.680 | 0.688 |
| XgBoost shared words | ||||||||||
| TF-IDF normalization | 0.787 | 0.613 | 0.831 | 0.677 | 0.575 | 0.949 | 0.381 | 0.493 | 0.680 | 0.849 |
| N-grams | 0.765 | 0.602 | 0.794 | 0.576 | 0.581 | 0.951 | 0.332 | 0.228 | 0.661 | 0.817 |
| Word embeddings | 0.626 | 0.656 | 0.607 | 0.436 | 0.685 | 0.816 | 0.343 | 0.080 | 0.637 | 0.695 |
| XgBoost all words | ||||||||||
| TF-IDF normalization | 0.763 | 0.610 | 0.824 | 0.567 | 0.584 | 0.947 | 0.362 | 0.279 | 0.669 | 0.816 |
| N-grams | 0.766 | 0.603 | 0.793 | 0.533 | 0.579 | 0.952 | 0.332 | 0.235 | 0.660 | 0.813 |
| Word embeddings | 0.630 | 0.647 | 0.580 | 0.448 | 0.682 | 0.830 | 0.296 | 0.086 | 0.632 | 0.703 |
Abbreviations: AUC, area under the curve; NN, neural network; SVM, support vector machines; TF-IDF, term frequency-inverse document frequency.
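The per-class precision and recall columns follow from a confusion matrix. The sketch below is a minimal illustration, assuming a made-up 4-class confusion matrix; the counts are not from the study, and the function and variable names are hypothetical.

```python
def per_class_metrics(confusion, labels):
    """Compute per-class precision/recall and overall accuracy.

    confusion[i][j] is the number of records whose true class is labels[i]
    and whose predicted class is labels[j].
    """
    total = sum(sum(row) for row in confusion)
    correct = sum(confusion[k][k] for k in range(len(labels)))
    metrics = {}
    for k, label in enumerate(labels):
        true_pos = confusion[k][k]
        predicted_k = sum(row[k] for row in confusion)  # column sum
        actual_k = sum(confusion[k])                    # row sum
        metrics[label] = {
            "precision": true_pos / predicted_k if predicted_k else 0.0,
            "recall": true_pos / actual_k if actual_k else 0.0,
        }
    return metrics, correct / total


labels = ["Home", "Street", "Public Place", "Other"]
# Hypothetical counts, 10 true records per class (illustration only).
confusion = [
    [8, 1, 1, 0],  # true Home
    [1, 9, 0, 0],  # true Street
    [2, 2, 6, 0],  # true Public Place
    [3, 3, 2, 2],  # true Other
]

metrics, accuracy = per_class_metrics(confusion, labels)
for label in labels:
    print(label, metrics[label])
print("accuracy", accuracy)
```

In this toy matrix the Other class has precision well above its recall, which is the same qualitative pattern seen in the Other columns of the table: the few records predicted as Other are mostly correct, but most true Other records are assigned elsewhere.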
Figure 1. Predictive Value of Specific Words From the Narrative Text Stratified by Gunshot Injury Location
Figure shows the predictive words and medical coder abbreviations from the Lasso regression model fitted on shared words with term frequency-inverse document frequency normalization. Some duplication was present because of words used by the coders (ie, “accidently” and “accidentally”).
Figure 2. Weighted Annual National Estimates of Nonfatal Gunshot Injury Locations With and Without Natural Language Processing Adjustment Stratified by Location
Error bars indicate the 95% CIs.
Intent and Victim Race and Sex by Location
| Location/Variable | Unadjusted NEISS-FISS, weighted mean annual estimate, No. (%) | NLP-adjusted NEISS-FISS, weighted mean annual estimate, No. (%) |
|---|---|---|
| Intent | ||
| Assault | 6066.9 (40.5) | 9841.3 (39.9) |
| Self-harm | 2065.3 (13.8) | 3331.7 (13.5) |
| Unintentional | 5846.8 (39.1) | 8888.9 (36.0) |
| Unknown | 988.7 (6.6) | 2606.6 (10.6) |
| Victim race | ||
| Black | 6288.0 (47) | 10 670.3 (42.7) |
| Not stated | 4284.9 (15.5) | 4315.7 (17.3) |
| Other | 3419.7 (14.1) | 3560.4 (14.3) |
| White | 10 540.9 (23.4) | 6423.8 (25.7) |
| Victim sex | ||
| Female | 2553.7 (16.9) | 3828.9 (15.4) |
| Male | 12 531.9 (83.1) | 21 007.0 (84.6) |
| Intent | ||
| Assault | 13 170.2 (92.5) | 25 427.1 (85.6) |
| Self-harm | 101.2 (0.7) | 172.4 (0.6) |
| Unintentional | 388.6 (2.7) | 792.7 (2.7) |
| Unknown | 580.1 (4.1) | 3327.0 (11.2) |
| Victim race | ||
| Black | 8566.8 (59.3) | 16 267.8 (55.2) |
| Not stated | 1655.3 (11.5) | 4335.7 (14.7) |
| Other | 2352.3 (16.3) | 5002.7 (17.0) |
| White | 1872.9 (13.0) | 3840.9 (13.0) |
| Victim sex | ||
| Female | 1219.9 (8.1) | 2440.0 (8.4) |
| Male | 13 221.3 (91.9) | 27 665.3 (91.6) |
| Intent | ||
| Assault | 6032.1 (79.8) | 8756.5 (78.7) |
| Self-harm | 133.3 (1.8) | 147.3 (1.3) |
| Unintentional | 1022.7 (13.5) | 1347.9 (12.1) |
| Unknown | 366.4 (4.9) | 874.1 (7.9) |
| Victim race | ||
| Black | 2909.1 (37.6) | 4802.3 (38.9) |
| Not stated | 1202.8 (15.6) | 2152.1 (17.4) |
| Other | 1582.0 (20.5) | 2446.7 (19.8) |
| White | 2038.5 (26.4) | 2951.8 (23.9) |
| Victim sex | ||
| Female | 847.6 (11.0) | 1160.4 (10.2) |
| Male | 6884.7 (89.0) | 10 202.5 (89.8) |
Abbreviations: NEISS-FISS, National Electronic Injury Surveillance System Firearm Injury Surveillance Study; NLP, natural language processing.
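Each percentage in the table is an estimate's share of its group total within a column. The sketch below recomputes the shares for the first Intent group in the NLP-adjusted column, using only values printed in the table; the helper name is hypothetical.

```python
def shares(estimates):
    """Return each estimate as a percentage of the group total (1 decimal place)."""
    total = sum(estimates.values())
    return {name: round(100 * value / total, 1) for name, value in estimates.items()}


# NLP-adjusted weighted mean annual estimates, first Intent group of the table.
nlp_adjusted_intent = {
    "Assault": 9841.3,
    "Self-harm": 3331.7,
    "Unintentional": 8888.9,
    "Unknown": 2606.6,
}

intent_shares = shares(nlp_adjusted_intent)
print(intent_shares)
```

The same check can be run on any other group to confirm that its percentages are consistent with the estimates and sum to roughly 100.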