Guy Divita, Kathleen Coale, Jonathan Camacho Maldonado, Rafael Jiménez Silva, Elizabeth Rasch.
Abstract
This paper describes the identification of body function (BF) mentions in clinical text from a large, national, heterogeneous corpus, and highlights the structural challenges that such text presents. BF mentions in clinical documents provide information on dysfunction or impairment in the function or structure of organs or organ systems. These mentions are embedded in highly formatted structures whose formats imply scoping boundaries that confound existing natural language processing segmentation and document decomposition techniques. The paper describes follow-up work that adapts a rule-based system, created using National Institutes of Health records, to a larger and more challenging corpus of Social Security Administration data. The results of these systems provide a baseline for future work on improving document decomposition techniques.
Keywords: ICF; body function; document decomposition; information extraction; natural language processing
Year: 2022 PMID: 36148210 PMCID: PMC9485548 DOI: 10.3389/fdgth.2022.914171
Source DB: PubMed Journal: Front Digit Health ISSN: 2673-253X
Prior work extracting ICF constructs.
| Authors | Entities extracted | Data | System components |
|---|---|---|---|
| Divita et al. | ICF codes: Strength (b730); Range of Motion (b710); Reflex (b750); Body location; Function qualifiers | Physical Therapy/Occupational Therapy notes, NIH Clinical Center Data (Biomedical Translational Research Informatics (BTRIS)) records | V3NLP Framework/Sophia/UIMA |
| Kukafka, Bales, Burkhardt and Friedman | ICF codes: b117 (intellectual functions); d420 (transferring oneself); d530 (toileting); d550 (eating); d5400 (putting on clothes) | Rehabilitation discharge summaries | MedLEE (Medical Language Extraction and Encoding) |
| Newman-Griffis and Fosler-Lussier | ICF codes: d410 (Changing basic body position); d415 (Maintaining a body position); d420 (Transferring oneself); d430 (Lifting and carrying objects); d435 (Moving objects with lower extremities); d440 (Fine hand use); d450 (Walking); d455 (Moving around); d460 (Moving around in different locations); d470 (Using transportation); d475 (Driving) | Physical activity reports from NIH Clinical Center Data (Biomedical Translational Research Informatics (BTRIS)) records | Automated-ICF-coding |
ICF, International Classification of Functioning, Disability and Health; NIH, National Institutes of Health.
Distribution of manual annotations in SSA records.
| Annotation type | Training count | Training mean per file | Training % of total annotations | Training Std | Testing count | Testing mean per file | Testing % of total annotations | Testing Std |
|---|---|---|---|---|---|---|---|---|
| Files | 357 | | | | 90 | | | |
| Annotations | 6,752 | 17.7 | | 20.5 | 1,907 | 19.7 | | 22.54 |
| BF mention | 1,541 | 4.3 | 22.9 | 4.6 | 464 | 5.1 | 24.3 | 5.8 |
| Strength | 641 | 1.8 | 9.5 | 2.8 | 167 | 1.8 | 8.8 | 3.1 |
| ROM | 872 | 2.4 | 12.9 | 4.2 | 261 | 2.9 | 13.7 | 4.7 |
| Reflex | 253 | 0.7 | 3.8 | 1.5 | 59 | 0.6 | 3.1 | 1.0 |
| Body location | 1,250 | 3.5 | 18.5 | 4.6 | 309 | 3.4 | 16.2 | 4.7 |
| Qualifiers | 1,745 | 4.9 | 25.8 | 5.7 | 514 | 5.7 | 27.0 | 6.6 |
| BF context | 387 | 1.1 | 5.3 | 1.5 | 133 | 1.5 | 7.0 | 1.8 |
| Possible BF | 63 | 0.2 | 0.9 | 0.6 | 27 | 0.3 | 1.4 | 0.8 |
| Chars per file | | 2,228.5 | | 655.1 | | 2,196.8 | | 566.0 |
| Lines per file | | 83.0 | | 31.1 | | 82.9 | | 28.5 |
| Tokens per file | | 389.2 | | 116.8 | | 387.0 | | 105.4 |
BF, Body function; SSA, Social Security Administration.
Figure 1. OCR and format challenges: page headers and footers.
Figure 2. OCR challenges: multicolumn text munged.
Figure 3. (A) Fully formed body function mentions. (B) Examples of body function mentions in the clinical text.
Figure 4. Body function context annotation.
Inter-rater agreement between two annotators.
| Round | Macro F1 | Body function F1 |
|---|---|---|
| 1 | 0.52 | 0.38 |
| 2 | 0.57 | 0.52 |
| 3 | 0.77 | 0.71 |
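As background on the metric, the sketch below shows one common way to compute a pairwise F1 between two annotators: one annotator is treated as the reference, and macro F1 is the unweighted mean of the per-label scores. The source does not state how agreements were counted, so the function names, labels, and counts here are hypothetical and purely illustrative.

```python
from typing import Dict, Tuple


def pairwise_f1(both: int, only_a: int, only_b: int) -> float:
    """F1 between two annotators, computed from agreement counts.

    both   : annotations marked by both annotators (agreements)
    only_a : annotations marked only by annotator A
    only_b : annotations marked only by annotator B
    """
    denom = 2 * both + only_a + only_b
    return 2 * both / denom if denom else 0.0


def macro_f1(counts: Dict[str, Tuple[int, int, int]]) -> float:
    """Unweighted mean of the per-label F1 scores."""
    scores = [pairwise_f1(*c) for c in counts.values()]
    return sum(scores) / len(scores) if scores else 0.0


# Hypothetical counts for illustration only; NOT taken from the study.
example_counts = {
    "Strength": (8, 2, 2),       # F1 = 16 / 20 = 0.8
    "ROM": (6, 4, 4),            # F1 = 12 / 20 = 0.6
    "Body location": (5, 5, 5),  # F1 = 10 / 20 = 0.5
}
example_macro = macro_f1(example_counts)
print(f"macro F1 = {example_macro:.3f}")
```

Because the formula is symmetric in `only_a` and `only_b`, swapping which annotator is treated as the reference does not change the score.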
Figure 5. Body function mention frequency training sample by page.
Figure 6. Body function mention frequency testing sample by page.
Figure 7. Body function mention frequency percentage.
Token-based body function evaluation on the test sample (results from prior work are in parentheses).
| Label | F-1 Score | Recall | Precision | Accuracy |
|---|---|---|---|---|
| BF mention | 0.5871 (0.61) | 0.7795 (0.94) | 0.4709 (0.45) | 0.4489 |
| Qualifiers | 0.6018 (0.56) | 0.7326 (0.85) | 0.5107 (0.42) | 0.5494 |
| Type | 0.5882 (0.63) | 0.6658 (0.88) | 0.5268 (0.49) | 0.5381 |
| Body location | 0.4249 (0.46) | 0.4941 (0.82) | 0.3727 (0.32) | 0.4113 |
BF, Body function.
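The F-1 scores in the token-based table can be related to the recall and precision columns through the standard harmonic-mean definition, F1 = 2PR / (P + R). The short sketch below recomputes F1 from the reported precision and recall; the helper name `f1_score` and the dictionary layout are illustrative choices, not part of the authors' system.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    total = precision + recall
    return 2 * precision * recall / total if total else 0.0


# (precision, recall, reported F-1) copied from the token-based table above.
token_rows = {
    "BF mention": (0.4709, 0.7795, 0.5871),
    "Qualifiers": (0.5107, 0.7326, 0.6018),
    "Type": (0.5268, 0.6658, 0.5882),
    "Body location": (0.3727, 0.4941, 0.4249),
}

for label, (p, r, reported) in token_rows.items():
    computed = f1_score(p, r)
    print(f"{label}: computed {computed:.4f}, reported {reported:.4f}")
```

The recomputed values agree with the reported F-1 scores to within rounding, which confirms that the table uses the balanced (beta = 1) F-measure.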
Entity-based body function evaluation (test sample).
| Label | F-1 Score | Recall | Precision |
|---|---|---|---|
| BF mention | 0.6682 | 0.8857 | 0.5364 |
| Qualifiers | 0.6327 | 0.7976 | 0.5242 |
| Type | 0.6257 | 0.7811 | 0.5218 |
| Body location | 0.4473 | 0.7281 | 0.3228 |
BF, Body function.
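Entity-based evaluation scores whole annotation spans rather than individual tokens. The source does not state the matching criterion, so the sketch below assumes one plausible convention: a predicted span counts as correct when it overlaps a gold span with the same label. The `Span` type, the function names, and the example spans are all hypothetical.

```python
from typing import List, Tuple

# (start offset, end offset exclusive, label); a hypothetical representation.
Span = Tuple[int, int, str]


def overlaps(a: Span, b: Span) -> bool:
    """True when two spans share a label and at least one character."""
    return a[2] == b[2] and a[0] < b[1] and b[0] < a[1]


def entity_prf(gold: List[Span], pred: List[Span]) -> Tuple[float, float, float]:
    """Precision, recall, and F1 under overlap matching (an assumption)."""
    matched_pred = sum(1 for p in pred if any(overlaps(p, g) for g in gold))
    matched_gold = sum(1 for g in gold if any(overlaps(g, p) for p in pred))
    precision = matched_pred / len(pred) if pred else 0.0
    recall = matched_gold / len(gold) if gold else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Hypothetical spans for illustration only; NOT taken from the corpus.
gold_spans = [(0, 10, "Strength"), (20, 30, "ROM")]
pred_spans = [(2, 8, "Strength"), (40, 50, "ROM"), (60, 70, "Reflex")]
precision, recall, f1 = entity_prf(gold_spans, pred_spans)
print(f"P={precision:.3f} R={recall:.3f} F1={f1:.3f}")
```

Under a stricter exact-boundary criterion the same predictions would score lower, so the choice of matching rule should be reported alongside any entity-level figures.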
Entity-based body function evaluation (training).
| Label | F-1 Score | Recall | Precision |
|---|---|---|---|
| BF mention | 0.6616 | 0.8920 | 0.5258 |
| Qualifiers | 0.6300 | 0.7983 | 0.5204 |
| Type | 0.5662 | 0.7313 | 0.4620 |
| Body location | 0.4797 | 0.7634 | 0.3497 |
BF, Body function.
Figure 8. Scoping error example.
Figure 9. Ambiguous section name, slot value structures.