Alina Arseniev-Koehler, Susan D Cochran, Vickie M Mays, Kai-Wei Chang, Jacob G Foster.
Abstract
Significance: We introduce an approach to identify latent topics in large-scale text data. Our approach integrates two prominent methods of computational text analysis: topic modeling and word embedding. We apply our approach to written narratives of violent death (e.g., suicides and homicides) in the National Violent Death Reporting System (NVDRS). Many of our topics reveal aspects of violent death not captured in existing classification schemes. We also extract gender bias in the topics themselves (e.g., a topic about long guns is particularly masculine). Our findings suggest new lines of research that could contribute to reducing suicides or homicides. Our methods are broadly applicable to text data and can unlock similar information in other administrative databases.
Keywords: gender; mortality surveillance; natural language processing; topic models; word embeddings
Year: 2022 PMID: 35239440 PMCID: PMC8915886 DOI: 10.1073/pnas.2108801119
Source DB: PubMed Journal: Proc Natl Acad Sci U S A ISSN: 0027-8424 Impact factor: 11.205
Sample topics within narratives of violent death
| Topic label | Seven most representative terms |
|---|---|
| Physical aggression | Tackled, lunged_toward, began_attacking, advanced_toward, attacked, slapped, intervened |
| Causal language | Sparked, preceded, triggered, precipitated, led, prompted, culminated |
| Preparation for death | Disposal, deeds, prepaid_funeral, burial, worldly, miscellaneous, pawning |
| Cleanliness | Unkempt, messy, disorganized, cluttered, dirty, tidy, filthy |
| Everything seemed fine | Fell_asleep, everything_seemed_fine, seemed_fine, wakes_up, ran_errands, ate_breakfast, watched_television |
| Suspicion and paranoia | Conspiring_against, plotting_against, restraining_order_filed_against, belittled, please_forgive, making_fun, reminded |
| Reclusive behavior and chronic illness | Recluse, heavy_drinker, very_ill, chronic_alcoholic, bedridden, reclusive, recovering_alcoholic |
Most representative terms are listed in order of highest to lowest cosine similarity to each topic’s atom vector. Topic labels are manually assigned. As part of preprocessing the narratives, we transformed commonly occurring phrases into single terms (29).
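The ranking described above, ordering terms by cosine similarity to a topic's atom vector, can be sketched roughly as follows. This is a minimal illustration, not the authors' code: the function name `top_terms`, the toy vocabulary, and the 3-dimensional vectors are all made up for the example.

```python
import numpy as np

def top_terms(atom, vocab, embeddings, k=7):
    """Return the k (term, similarity) pairs closest to `atom` by cosine similarity."""
    # Normalize the atom vector and every word vector to unit length,
    # so that a plain dot product equals cosine similarity.
    atom_unit = atom / np.linalg.norm(atom)
    emb_unit = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    sims = emb_unit @ atom_unit
    # Sort from highest to lowest similarity and keep the first k.
    order = np.argsort(-sims)[:k]
    return [(vocab[i], float(sims[i])) for i in order]

# Toy data (hypothetical; real embeddings would have hundreds of dimensions).
vocab = ["tackled", "slapped", "burial", "tidy"]
embeddings = np.array([
    [0.9, 0.1, 0.0],
    [0.8, 0.3, 0.1],
    [0.0, 0.9, 0.2],
    [0.1, 0.0, 1.0],
])
atom = np.array([1.0, 0.2, 0.0])  # hypothetical atom vector for one topic

ranked = top_terms(atom, vocab, embeddings, k=2)
print(ranked)
```

With `k=7`, the same call would produce a list in the format of the table's right-hand column.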
Fig. 1. Prevalence of 225 topics in narratives of 272,964 decedents of violent death, by manner of death. Each row represents the fraction of narratives with a given topic by manner of death, with rows standardized across all manners of death.
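One way to compute a topic-by-manner prevalence matrix of the kind shown in Fig. 1 is sketched below. The caption does not say how rows are standardized; the z-score used here is an assumption, and the function names and toy data are hypothetical.

```python
import numpy as np

def topic_prevalence_by_manner(has_topic, manner, manners):
    """Fraction of narratives containing each topic, per manner of death.

    has_topic: (n_narratives, n_topics) array of 0/1 indicators.
    manner:    length-n sequence of manner-of-death labels.
    Returns an (n_topics, n_manners) array.
    """
    has_topic = np.asarray(has_topic, dtype=float)
    manner = np.asarray(manner)
    cols = [has_topic[manner == m].mean(axis=0) for m in manners]
    return np.column_stack(cols)

def standardize_rows(prev):
    """ASSUMPTION: z-score each topic's row across manners of death."""
    mean = prev.mean(axis=1, keepdims=True)
    std = prev.std(axis=1, keepdims=True)
    # Guard against division by zero for topics with identical prevalence everywhere.
    return (prev - mean) / np.where(std == 0, 1.0, std)

# Toy data: 4 narratives, 2 topics (hypothetical).
has_topic = [[1, 0], [1, 1], [0, 1], [0, 1]]
manner = ["suicide", "suicide", "homicide", "homicide"]
manners = ["suicide", "homicide"]

prev = topic_prevalence_by_manner(has_topic, manner, manners)
z = standardize_rows(prev)
print(prev)
print(z)
```

Standardizing within rows makes rare and common topics comparable, since each row then shows only where a topic is relatively over- or under-represented.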
Fig. 2. Latent gendered meanings of topics vs. prevalence of topics in female vs. male decedents’ narratives. N = 225 topics. For clarity, labels are shown only for topics with high or low gender meanings or gender prevalence ratios; overlapping labels are removed. The y axis represents cosine similarity between a given topic and the gender dimension in semantic space. The x axis represents the ratio of female decedents’ narratives containing a given topic compared to narratives of male decedents.
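The two axes in Fig. 2 can be illustrated with a short sketch. The caption does not specify how the gender dimension is built. The code below assumes a common construction (averaging differences between paired feminine and masculine word vectors) and assumes positive values mean "more feminine". All names and toy vectors are hypothetical.

```python
import numpy as np

def cosine(u, v):
    """Cosine similarity between two vectors."""
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

def gender_dimension(female_vecs, male_vecs):
    """ASSUMPTION: mean of (unit feminine vector - unit masculine vector) over word pairs."""
    diffs = [f / np.linalg.norm(f) - m / np.linalg.norm(m)
             for f, m in zip(female_vecs, male_vecs)]
    return np.mean(diffs, axis=0)

def prevalence_ratio(has_topic, is_female):
    """Share of female decedents' narratives with the topic divided by the male share."""
    has_topic = np.asarray(has_topic, dtype=bool)
    is_female = np.asarray(is_female, dtype=bool)
    return float(has_topic[is_female].mean() / has_topic[~is_female].mean())

# Toy 2-dimensional vectors (hypothetical), e.g. one "woman"/"man" pair.
dim = gender_dimension([np.array([0.0, 1.0])], [np.array([1.0, 0.0])])
topic = np.array([1.0, 0.0])  # hypothetical topic atom vector

y_value = cosine(topic, dim)                       # y axis: latent gender meaning
x_value = prevalence_ratio([1, 0, 1, 1], [1, 1, 0, 0])  # x axis: female/male prevalence
print(y_value, x_value)
```

In this toy case the topic scores negative on the gender dimension (masculine under the assumed sign convention) and appears half as often in female decedents' narratives.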
Characteristics of violent deaths with two selected topics
| Characteristic | Rifles and shotguns topic: AOR (95% CI) | Sedative and pain medications topic: AOR (95% CI) |
|---|---|---|
| Female decedent* | 0.49 (0.48 to 0.51) | 2.52 (2.47 to 2.58) |
| Decedent race/ethnicity† | | |
| American Indian/Alaska Native, NH | 1.31 (1.20 to 1.42) | 0.46 (0.41 to 0.52) |
| Asian/Pacific Islander, NH | 0.48 (0.43 to 0.54) | 0.64 (0.59 to 0.70) |
| Black or African American, NH | 0.88 (0.85 to 0.91) | 0.54 (0.51 to 0.56) |
| Hispanic | 0.59 (0.56 to 0.62) | 0.63 (0.60 to 0.67) |
| Two or more races, NH | 1.01 (0.92 to 1.10) | 0.80 (0.73 to 0.88) |
| Unknown race, NH | 0.70 (0.56 to 0.87) | 0.70 (0.56 to 0.87) |
| Decedent age, y‡ | | |
| 20 to 29 | 0.96 (0.91 to 1.00) | 1.37 (1.29 to 1.46) |
| 30 to 39 | 0.90 (0.86 to 0.95) | 1.74 (1.64 to 1.85) |
| 40 to 49 | 0.93 (0.88 to 0.98) | 1.97 (1.86 to 2.10) |
| 50 to 59 | 1.03 (0.98 to 1.08) | 2.17 (2.04 to 2.30) |
| 60+ | 1.40 (1.33 to 1.47) | 1.68 (1.58 to 1.79) |
| Manner of death§ | | |
| Homicide | 0.79 (0.77 to 0.82) | 0.14 (0.13 to 0.15) |
| Legal intervention | 1.09 (1.01 to 1.17) | 0.22 (0.19 to 0.26) |
| Undetermined | 0.06 (0.06 to 0.07) | 2.01 (1.95 to 2.07) |
| Unintentional | 3.16 (2.84 to 3.51) | 0.13 (0.10 to 0.19) |
| Multiple decedents in incident¶ | 1.76 (1.68 to 1.84) | 0.40 (0.37 to 0.43) |
| Word count# | 1.00 (1.00 to 1.00) | 1.00 (1.00 to 1.00) |
N = 272,964 decedents. Topics are coded as present in any amount or not (1/0) in the narrative either of law enforcement reports or of medical examiner/coroner reports. AOR, adjusted odds ratio; NH, non-Hispanic.
*Referent = male.
†Referent = non-Hispanic White.
‡Referent = 12 to 19.
§Referent = suicide.
¶Referent = incidents with a single decedent.
#The combined word count of both narratives.
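As a reading aid for the table, the sketch below shows the binary topic coding described in the note and the conventional conversion from a logistic regression coefficient to an adjusted odds ratio with a Wald 95% interval. The source does not state how its intervals were computed, so the Wald formula is an assumption. The function names and input values are hypothetical.

```python
import math

def topic_present(le_weight, cme_weight):
    """Code a topic as 1 if it appears in any amount in either narrative, else 0.

    le_weight:  topic amount in the law enforcement narrative (hypothetical input).
    cme_weight: topic amount in the medical examiner/coroner narrative.
    """
    return int(le_weight > 0 or cme_weight > 0)

def adjusted_odds_ratio(beta, se, z=1.96):
    """Convert a logistic regression coefficient and its standard error
    into (AOR, lower bound, upper bound).

    ASSUMPTION: Wald interval, exp(beta +/- z * se).
    """
    return math.exp(beta), math.exp(beta - z * se), math.exp(beta + z * se)

# Hypothetical coefficient and standard error, not taken from the paper.
aor, lower, upper = adjusted_odds_ratio(beta=-0.7, se=0.02)
print(f"AOR {aor:.2f} (95% CI {lower:.2f} to {upper:.2f})")
print(topic_present(0.0, 0.15))
```

An AOR below 1 means lower odds of the topic appearing relative to the referent group, holding the other listed characteristics constant; an interval that excludes 1 corresponds to significance at the 5% level under this approximation.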