Riley Botelle, Vishal Bhavsar, Giouliana Kadra-Scalzo, Aurelie Mascio, Marcus V Williams, Angus Roberts, Sumithra Velupillai, Robert Stewart.
Abstract
OBJECTIVE: This paper evaluates the application of a natural language processing (NLP) model for extracting clinical text referring to interpersonal violence using electronic health records (EHRs) from a large mental healthcare provider.
Keywords: health informatics; mental health; psychiatry; public health
Year: 2022 PMID: 35172999 PMCID: PMC8852656 DOI: 10.1136/bmjopen-2021-052911
Source DB: PubMed Journal: BMJ Open ISSN: 2044-6055 Impact factor: 3.006
Figure 1. Process of annotation, development and evaluation of natural language processing (NLP) models.
Examples of text fragments, with keywords italicised, extracted for annotation in this study, alongside corresponding labels and assigned annotations
| Example of text fragment | Label | Annotation |
| ‘They were | Violence presence, victim | Affirmed |
| ‘Patient used to | Violence presence, perpetrator; physical, domestic | Affirmed |
| ‘Patient stabbed his roommate’ | Violence presence, perpetrator; physical, domestic | Affirmed |
| ‘Expressed a lot of interest in | Violence presence | Irrelevant |
| ‘No | Violence presence | Negated |
Figure 2. Flow chart of extract annotation process.
NLP model performance on the training and testing dataset (3771 text extracts), as well as on a blind test set with a 90% probability threshold (100 sentences), for the six labels
| Annotation label | Precision (training set, average over 10-fold cross-validation) | Recall (training set, average over 10-fold cross-validation) | F1 score (training set, average over 10-fold cross-validation) | F1 score (blind test set) |
| Violence presence | 93% | 93% | 93% | 95% |
| Patient status: perpetrator | 89% | 89% | 89% | 85% |
| Patient status: victim | 91% | 89% | 91% | 90% |
| Violence type: domestic | 94% | 94% | 94% | 93% |
| Violence type: physical | 91% | 92% | 91% | 98% |
| Violence type: sexual | 98% | 97% | 97% | 93% |
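For readers less familiar with the metrics in this table, the F1 score is conventionally the harmonic mean of precision and recall. The sketch below assumes that standard definition; the function name `f1_score` is illustrative and is not taken from the study's code. Because the training-set figures are averages over 10 folds, an averaged F1 need not equal the F1 computed from the averaged precision and recall, so small differences from the table are to be expected.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (the standard F1 definition)."""
    if precision + recall == 0:
        # Avoid division by zero when the model has no true positives.
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Violence presence row: precision 0.93 and recall 0.93.
print(round(f1_score(0.93, 0.93), 2))  # 0.93
```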
Proportion of each label in the training and testing dataset—affirmed or negated/irrelevant
| Annotation label | Affirmed, N (%) | Negated or irrelevant, N (%) | Total |
| Violence presence | 2199 (58) | 1572 (42) | 3771 |
| Patient status: perpetrator | 1350 (61) | 849 (39) | 2199 |
| Patient status: victim | 731 (33) | 1468 (67) | 2199 |
| Violence type: domestic | 723 (33) | 1476 (67) | 2199 |
| Violence type: physical | 1724 (78) | 475 (22) | 2199 |
| Violence type: sexual | 353 (16) | 1846 (84) | 2199 |
Each text extract was first annotated for the violence presence label; if this was affirmed, it was then annotated for the other labels relating to patient status and violence type (see figure 1 for further details). The denominator total for the violence presence label is therefore larger than those for the other labels.
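The two-stage denominators described in this note can be checked with a few lines of Python. This is a minimal sketch: the counts are copied from the table above, while the dictionary and the function name `affirmed_percent` are illustrative assumptions.

```python
# Counts copied from the label-proportion table above.
TOTAL_EXTRACTS = 3771
AFFIRMED = {
    "Violence presence": 2199,
    "Patient status: perpetrator": 1350,
    "Patient status: victim": 731,
    "Violence type: domestic": 723,
    "Violence type: physical": 1724,
    "Violence type: sexual": 353,
}


def affirmed_percent(label: str) -> int:
    """Percentage affirmed, using the denominator for the label's stage."""
    if label == "Violence presence":
        # Stage 1: every extract is annotated for violence presence.
        denominator = TOTAL_EXTRACTS
    else:
        # Stage 2: only extracts affirmed for violence presence are annotated.
        denominator = AFFIRMED["Violence presence"]
    return round(100 * AFFIRMED[label] / denominator)


for label, count in AFFIRMED.items():
    print(f"{label}: {count} ({affirmed_percent(label)}%)")
```

Run as written, the printed percentages match the "Affirmed, N (%)" column.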
Overlap of labels present in affirmed annotations, showing the number and percentage of annotations that shared different labels
| | Perpetrator, N (%) | Victim, N (%) | Sexual, N (%) | Physical, N (%) |
| Perpetrator | – | – | – | – |
| Victim | 113 (8.4) | – | – | – |
| Sexual | 150 (11.1) | 199 (27.2) | – | – |
| Physical | 1078 (79.9) | 616 (84.3) | 304 (86.1) | – |
| Domestic | 331 (24.5) | 318 (43.5) | 104 (29.5) | 593 (34.4) |
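The percentages in the overlap table appear to use the affirmed count of the column label as the denominator. This is an inference from the figures and is not stated in the source. A short sketch under that assumption, with affirmed counts taken from the label-proportion table and a hypothetical helper `overlap_percent`:

```python
# Affirmed counts per label, from the label-proportion table.
AFFIRMED_COUNTS = {
    "Perpetrator": 1350,
    "Victim": 731,
    "Sexual": 353,
    "Physical": 1724,
}


def overlap_percent(shared: int, column_label: str) -> float:
    """Share of the column label's affirmed annotations that also carry the row label."""
    return round(100 * shared / AFFIRMED_COUNTS[column_label], 1)


print(overlap_percent(113, "Perpetrator"))  # Victim row, Perpetrator column
print(overlap_percent(593, "Physical"))     # Domestic row, Physical column
```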
Kappa agreement between manually and automatically assigned categories in the training and testing set (3771 sentences)
| Annotation label | Model-to-annotator agreement |
| Violence presence | 98.1% |
| Patient status: perpetrator | 97.4% |
| Patient status: victim | 96.2% |
| Violence type: domestic | 98.7% |
| Violence type: physical | 98.3% |
| Violence type: sexual | 96.8% |
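The source does not say which kappa statistic was used. As an illustration only, the sketch below assumes Cohen's kappa for two raters (annotator and model), which corrects the observed agreement for the agreement expected by chance. The function name and the toy labels are hypothetical.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two equal-length sequences of categorical labels."""
    if len(rater_a) != len(rater_b) or len(rater_a) == 0:
        raise ValueError("Both raters must label the same non-empty set of items")
    n = len(rater_a)
    # Observed agreement: fraction of items given the same label by both raters.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: product of each rater's marginal label frequencies.
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if expected == 1:
        # Both raters used a single identical label throughout.
        return 1.0
    return (observed - expected) / (1 - expected)


# Toy example (not study data): the raters agree on 3 of 4 extracts.
manual = ["affirmed", "affirmed", "negated", "negated"]
model = ["affirmed", "negated", "negated", "negated"]
print(cohens_kappa(manual, model))  # 0.5
```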
Model error analysis on training and testing set (3771 sentences)
| Annotation label | False positives | False negatives | Total number of errors |
| Violence presence | 24 | 10 | 34 |
| Patient status: perpetrator | 35 | 12 | 47 |
| Patient status: victim | 5 | 40 | 45 |
| Violence type: domestic | 10 | 5 | 15 |
| Violence type: physical | 7 | 27 | 34 |
| Violence type: sexual | 11 | 10 | 21 |
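The error totals in this table are the sum of false positives and false negatives. The sketch below checks that arithmetic and reports what share of each label's errors were false positives. The figures are copied from the table; the names `ERRORS` and `inconsistent_rows` are illustrative.

```python
# (false positives, false negatives, reported total) per label, from the table above.
ERRORS = {
    "Violence presence": (24, 10, 34),
    "Patient status: perpetrator": (35, 12, 47),
    "Patient status: victim": (5, 40, 45),
    "Violence type: domestic": (10, 5, 15),
    "Violence type: physical": (7, 27, 34),
    "Violence type: sexual": (11, 10, 21),
}


def inconsistent_rows(errors):
    """Return the labels whose reported total differs from FP + FN."""
    return [label for label, (fp, fn, total) in errors.items() if fp + fn != total]


print(inconsistent_rows(ERRORS))  # an empty list means every row adds up
for label, (fp, fn, total) in ERRORS.items():
    print(f"{label}: {fp / total:.0%} of errors are false positives")
```

The split differs by label: most errors for the perpetrator label are false positives, while most errors for the victim and physical labels are false negatives.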
Characteristics of patients whose text extracts were annotated as part of this study
| Characteristic | Frequency, N (%) |
| Age (years) | |
| <20 | 167 (5.9) |
| 20 to <40 | 791 (27.9) |
| 40 to <60 | 1273 (45.0) |
| 60 to <80 | 458 (16.2) |
| ≥80 | 141 (5.0) |
| Missing | 2 (0.1) |
| Gender | |
| Male | 1216 (42.9) |
| Female | 1614 (57.0) |
| Missing | 2 (0.1) |
| Marital status | |
| Single | 1865 (65.9) |
| Married/Cohabiting | 344 (12.2) |
| Divorced/Separated | 262 (9.3) |
| Widowed | 85 (3.0) |
| Missing | 276 (9.8) |
| Ethnicity | |
| White | 1482 (52.3) |
| Black | 104 (3.7) |
| Asian | 160 (5.7) |
| Mixed | 885 (31.3) |
| Other | 90 (3.2) |
| Missing | 111 (3.9) |
| ICD-10 diagnosis | |
| F0–9: organic, including symptomatic, mental disorders | 185 (6.5) |
| F10–19: mental and behavioural disorders due to psychoactive substance use | 94 (3.3) |
| F20–29: schizophrenia, schizotypal and delusional disorders | 1031 (36.4) |
| F30–39: mood (affective) disorders | 451 (15.9) |
| F40–49: neurotic, stress-related and somatoform disorders | 203 (7.2) |
| F50–59: behavioural syndromes associated with physiological disturbances and physical factors | 18 (0.6) |
| F60–69: disorders of adult personality and behaviour | 236 (8.3) |
| F70–79: mental retardation | 53 (1.9) |
| F80–89: disorders of psychological development | 98 (3.5) |
| F90–99: behavioural and emotional disorders with onset usually occurring in childhood and adolescence and unspecified mental disorder | 211 (7.6) |
| No axis 1 diagnosis | 25 (0.9) |
| G: diseases of the nervous system; X: intentional self-harm, assault; or Z: factors influencing health status and contact with health services* | 163 (5.8) |
| Missing | 64 (2.3) |
| Total | 2832 (100) |
*ICD-10 categories G and X were combined with ICD-10 category Z due to small numbers of participants (n<10) in these categories, in order to limit identification of participants.
ICD, International Classification of Diseases.