Andrea C Fernandes1,2, Rina Dutta3,4, Sumithra Velupillai3,4, Jyoti Sanyal3,4, Robert Stewart3,4, David Chandran3,4.
Abstract
Research into suicide prevention has been hampered by methodological limitations such as low sample size and recall bias. Recently, Natural Language Processing (NLP) strategies have been used with Electronic Health Records to extract more information on suicidality from free-text notes as well as structured fields, which allows access to much larger cohorts than previously possible. This paper presents two novel NLP approaches: a rule-based approach to classify the presence of suicide ideation, and a hybrid machine learning and rule-based approach to identify suicide attempts in a psychiatric clinical database. Good performance of the two classifiers in the evaluation study suggests they can be used to accurately detect mentions of suicide ideation and attempt within free-text documents in this psychiatric database. The novelty of the two approaches lies in the malleability of each classifier should a need arise to refine performance or meet alternate classification requirements. The algorithms can also be adapted to fit the infrastructures of other clinical datasets, given sufficient knowledge of clinical recording practice, without dependency on medical codes or additional data extraction of known risk factors to predict suicidal behaviour.
Year: 2018 PMID: 29743531 PMCID: PMC5943451 DOI: 10.1038/s41598-018-25773-2
Source DB: PubMed Journal: Sci Rep ISSN: 2045-2322 Impact factor: 4.379
Figure 1. Schematic outline of classification tool development and evaluation study. Event notes are free-text fields, where day-to-day notes can be recorded in any layout or format. Correspondence notes are used to attach any formal correspondence between patient and clinical staff or between clinical staff members. These documents consist of letters and questionnaires recorded in Word documents (including any electronic questionnaires administered to the patient), PDF documents and any other documents related to the patient. “Suicid* ideat*” is the text pattern used to filter variations of the phrase “suicidal ideation” within each sentence in the clinical notes, where the asterisk denotes a wildcard allowing for any combination of letters in the word after the initial specified sequence.
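
The wildcard filter described in the Figure 1 caption can be approximated with a regular expression. The following is a minimal sketch, not the authors' implementation: it assumes that each asterisk matches any run of letters (possibly empty), that matching is case-insensitive, and that sentences can be split naively on terminal punctuation. The names `SUICIDAL_IDEATION`, `split_sentences` and `filter_sentences` are hypothetical.

```python
import re

# Assumption: "Suicid* ideat*" means the stems "suicid" and "ideat", each
# followed by any run of letters, separated by whitespace.
SUICIDAL_IDEATION = re.compile(r"\bsuicid[a-z]*\s+ideat[a-z]*", re.IGNORECASE)


def split_sentences(text):
    """Naive sentence splitter (an assumption; clinical text needs more care)."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def filter_sentences(text):
    """Return only the sentences that contain a variation of the phrase."""
    return [s for s in split_sentences(text) if SUICIDAL_IDEATION.search(s)]


note = (
    "Patient denies suicidal ideation. Mood is low. "
    "Reports Suicide ideations last week."
)
matches = filter_sentences(note)
```

A filter like this only selects candidate sentences; deciding whether a mention is affirmed or negated (as in "denies suicidal ideation") is left to the classifier itself.
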
Classifier Performance Results for Suicide Ideation (n = 500 instances).
| Classifier | Gold Standard: True Event | Gold Standard: Non-True Event |
|---|---|---|
| True Event | a = 265 | c = 24 |
| Non-True Event | b = 37 | d = 174 |
| Precision (PPV), a/(a + c) | 91.7% | |
| Recall (Sensitivity), a/(a + b) | 87.8% | |

Classifier Performance Results for Suicide Attempt after Post-Processing (n = 500).
| Classifier | Gold Standard: Suicide Attempt | Gold Standard: Not Suicide Attempt |
|---|---|---|
| Suicide Attempt | a = 381 | c = 79 |
| Not Suicide Attempt | b = 7 | d = 33 |
| Precision (PPV), a/(a + c) | 82.8% | |
| Recall (Sensitivity), a/(a + b) | 98.2% | |
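
The precision and recall definitions used in the two tables, a/(a + c) and a/(a + b), can be recomputed directly from the cell counts. The sketch below is illustrative only; the function name `precision_recall` and the variable names are hypothetical, and the cell labels follow the tables (a: true positives, b: false negatives, c: false positives).

```python
def precision_recall(a, b, c):
    """Compute precision and recall from confusion-matrix cells.

    a: classifier positive, gold standard positive (true positives)
    b: classifier negative, gold standard positive (false negatives)
    c: classifier positive, gold standard negative (false positives)
    """
    precision = a / (a + c)  # PPV
    recall = a / (a + b)     # sensitivity
    return precision, recall


# Cell counts copied from the two tables above.
ideation_precision, ideation_recall = precision_recall(a=265, b=37, c=24)
attempt_precision, attempt_recall = precision_recall(a=381, b=7, c=79)

for label, p, r in [
    ("Suicide ideation", ideation_precision, ideation_recall),
    ("Suicide attempt", attempt_precision, attempt_recall),
]:
    print(f"{label}: precision = {p:.3f}, recall = {r:.3f}")
```

Recomputing from the counts gives values that agree with the reported percentages to within about 0.1 percentage point.
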
Figure 2. Venn diagrams comparing patient numbers obtained when (a) using NLP to identify suicide ideation versus using Risk Assessment (structured) fields only; (b) using NLP to identify suicide attempts versus using Risk Assessment (structured) fields only; and (c) using NLP to identify suicide attempts versus using ICD-10 codes for suicide attempt only.