Morwenna Senior1, Matthias Burghart1, Rongqin Yu1, Andrey Kormilitzin1, Qiang Liu1, Nemanja Vaci1, Alejo Nevado-Holgado1, Smita Pandit2, Jakov Zlodre2, Seena Fazel1.
Abstract
BACKGROUND: The Oxford Mental Illness and Suicide tool (OxMIS) is a brief, scalable, freely available, structured risk assessment tool for assessing suicide risk in patients with severe mental illness (schizophrenia-spectrum disorders or bipolar disorder). OxMIS requires further external validation, but a lack of large-scale cohorts with the relevant variables makes this challenging. Electronic health records are a possible data source for external validation of risk prediction tools. However, much of the information they contain is held in free text and is not readily extractable. In this study, we examined the feasibility of identifying the suicide predictors needed to validate OxMIS in routinely collected electronic health records.
Keywords: OxMIS; bipolar disorder; electronic health records; feasibility; natural language processing; risk assessment; schizophrenia; suicide
Year: 2020 PMID: 32351413 PMCID: PMC7175991 DOI: 10.3389/fpsyt.2020.00268
Source DB: PubMed Journal: Front Psychiatry ISSN: 1664-0640 Impact factor: 4.157
Study 1 sample characteristics for patients with severe mental illness (n=57).
| Variable | Yes (%) | Missing (%) | Type of data field |
|---|---|---|---|
| | 34 (60%) | 0 | Structured |
| | 47 (10.8) | 0 | Structured |
| | 16 (28%) | 0 | Free-text |
| | 18 (32%) | 0 | Structured |
| | 18 (32%) | 0 | Structured |
| | 26 (46%) | 0 | Free-text |
| | Secondary: 26 (46%) | 15 (26%) | Free-text |
| | 2 (4%) | 4 (7%) | Free-text |
| | 1 (2%) | 0 | Free-text |
| | 51 (89%) | 0 | Free-text |
| | 19 (33%) | 0 | Free-text |
| | Inpatient: 3 (5%) | 0 | Structured |
| | ≤7 days: 7 (12%) | 0 | Structured |
| | ≤7: 22 (39%) | 0 | Structured |
| | 31 (54%) | 14 (25%) | Free-text |
| | 4 (7%) | 9 (16%) | Free-text |
| | 1 (2%) | 0 | Structured |
Variables were identified by manual review of electronic health records. Values are number of patients (%), unless stated otherwise. Percentages were calculated out of the total of 57 patients, including those for whom information was missing.
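The denominator rule in this note can be checked with a short sketch. It uses the row reported as 31 (54%) with 14 (25%) missing. The helper names `percent_of_total` and `percent_of_known` are illustrative assumptions, and the second function shows an alternative convention that the table does not use, included only for contrast.

```python
TOTAL_PATIENTS = 57  # Study 1 sample size, taken from the table caption


def percent_of_total(count: int, total: int = TOTAL_PATIENTS) -> int:
    """Percentage of the whole sample, with missing cases kept in the denominator."""
    return round(100 * count / total)


def percent_of_known(count: int, missing: int, total: int = TOTAL_PATIENTS) -> int:
    """Alternative convention (NOT used in the table): drop missing cases first."""
    return round(100 * count / (total - missing))


# Row reported in the table as 31 (54%) with 14 (25%) missing.
print(percent_of_total(31))      # 54, matches the "Yes (%)" column
print(percent_of_total(14))      # 25, matches the "Missing (%)" column
print(percent_of_known(31, 14))  # 72, what the figure would be if missing were excluded
```

Keeping missing cases in the denominator makes the reported percentages conservative: a variable with many missing values cannot appear more prevalent than the records actually document.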
Summary of annotated electronic health records documents used to train the named entity recognition model.
| Variable | Annotated text spans, Phase 1 | Annotated text spans, Phase 2 |
|---|---|---|
| History of violence | 391 | 350 |
| History of self-harm | 559 | 397 |
| Formal education | 174 | 200 |
| Medication | 1774 | 3860 |
| Benefits recipient | 188 | 195 |
| Drug/alcohol use disorder | 190 | 130 |
| (Parental) suicide | 19 | 77 |
| Psychiatric admission | 332 | 260 |
Text spans are words or word combinations that refer to the concept of interest (the variable), as selected by the manual annotator. The model was trained in two phases: first using GATE software and second using Prodigy—an active learning-based annotation tool. The annotated documents shown in this table constituted the “gold-standard” training dataset used in model development. EHR, electronic health record.
Figure 1. Illustrative examples of sentence classification by named entity recognition model.
Figure 2. Risk of suicide within 12 months according to OxMIS (Oxford Mental Illness and Suicide tool). Risk was calculated using variables manually extracted from electronic health records. Where variables were unknown, the risk calculator gave a range of risk scores (represented by lines). The line at 0.5 indicates an arbitrary cut-off for an increased risk level.
Named entity recognition model performance for concepts related to suicide predictors.
| Variable | Manually annotated | Correctly identified | Spurious | Missed | Precision | Recall | F1 |
|---|---|---|---|---|---|---|---|
| | 80 | 60 | 22 | 20 | 0.73 | 0.75 | 0.74 |
| | 90 | 78 | 26 | 12 | 0.75 | 0.87 | 0.80 |
| | 29 | 24 | 32 | 5 | 0.43 | 0.83 | 0.56 |
| | 719 | 692 | 128 | 27 | 0.84 | 0.96 | 0.90 |
| | 44 | 35 | 15 | 9 | 0.70 | 0.80 | 0.74 |
| | 28 | 17 | 13 | 11 | 0.57 | 0.61 | 0.59 |
| | 12 | 11 | 19 | 1 | 0.37 | 0.92 | 0.52 |
| | 53 | 36 | 28 | 17 | 0.56 | 0.68 | 0.62 |
| Overall (micro-average) | 1055 | 953 | 283 | 102 | 0.77 | 0.90 | 0.83 |
Numbers in the manually annotated, correctly identified, spurious, and missed columns are absolute counts of text spans related to each concept in the sample of free-text EHR documents used to assess the model. Spurious results are text spans identified by the model that were not annotated by the researcher (false positives). Missed results are annotated text spans that the model did not identify (false negatives). Micro-averaged figures for overall model performance are computed after pooling the text spans across all concepts. F1 is the harmonic mean of precision and recall and serves as a single measure of overall model performance. EHR, electronic health record.
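As a minimal sketch of how the precision, recall, and F1 columns follow from the span counts, the snippet below recomputes them from the correctly identified, spurious, and missed columns, then pools the counts to obtain the micro-averaged figures. The counts are copied from the table in row order. The `SpanCounts` type and the function names are illustrative assumptions, not the authors' code.

```python
from typing import Iterable, NamedTuple


class SpanCounts(NamedTuple):
    correct: int   # spans found by both annotator and model (true positives)
    spurious: int  # spans found only by the model (false positives)
    missed: int    # spans found only by the annotator (false negatives)


def precision(c: SpanCounts) -> float:
    return c.correct / (c.correct + c.spurious)


def recall(c: SpanCounts) -> float:
    return c.correct / (c.correct + c.missed)


def f1(c: SpanCounts) -> float:
    p, r = precision(c), recall(c)
    return 2 * p * r / (p + r)  # harmonic mean of precision and recall


def micro_average(rows: Iterable[SpanCounts]) -> SpanCounts:
    """Pool the raw counts across concepts before computing any ratio."""
    rows = list(rows)
    return SpanCounts(
        sum(r.correct for r in rows),
        sum(r.spurious for r in rows),
        sum(r.missed for r in rows),
    )


# Per-concept counts in the order the rows appear in the table.
ROWS = [
    SpanCounts(60, 22, 20),
    SpanCounts(78, 26, 12),
    SpanCounts(24, 32, 5),
    SpanCounts(692, 128, 27),
    SpanCounts(35, 15, 9),
    SpanCounts(17, 13, 11),
    SpanCounts(11, 19, 1),
    SpanCounts(36, 28, 17),
]

for row in ROWS:
    print(f"{precision(row):.2f} {recall(row):.2f} {f1(row):.2f}")

overall = micro_average(ROWS)
print(overall)  # SpanCounts(correct=953, spurious=283, missed=102)
print(f"{precision(overall):.2f} {recall(overall):.2f} {f1(overall):.2f}")  # 0.77 0.90 0.83
```

Because micro-averaging pools counts, the concept with by far the most spans (719 manually annotated) dominates the overall figures. An unweighted mean of the per-concept scores would be noticeably lower.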