Yukinori Mashima, Takashi Tamura, Jun Kunikata, Shinobu Tada, Akiko Yamada, Masatoshi Tanigawa, Akiko Hayakawa, Hirokazu Tanabe, Hideto Yokoi.
Abstract
Objective: In recent years, natural language processing (NLP) techniques have progressed, and their application in the medical field has been tested. However, the use of NLP to detect symptoms from medical progress notes written in Japanese remains limited. We aimed to use NLP to detect, from progress notes, 2 gastrointestinal symptoms that interfere with the continuation of chemotherapy (nausea/vomiting and diarrhea), and then to analyze the factors affecting NLP performance.

Materials and methods: In this study, 200 patients were randomly selected from 5277 patients who received intravenous injections of cytotoxic anticancer drugs at Kagawa University Hospital, Japan, between January 2011 and December 2018. We aimed to detect the first occurrence of nausea/vomiting (Group A) and diarrhea (Group B) using NLP. NLP performance was evaluated by its concordance with a review of the physicians' progress notes, which served as the gold standard.
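The detection target described above, the first occurrence of a symptom per patient, can be sketched roughly as follows. This is only an illustration under stated assumptions, not the study's system: `ProgressNote`, `first_occurrence`, and `mentions_diarrhea` are hypothetical names, the example notes are invented, and a bare substring check stands in for the actual NLP step.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Callable, Iterable, Optional


@dataclass(frozen=True)
class ProgressNote:
    """Hypothetical container for one time-stamped progress note."""
    written_at: datetime
    text: str


def first_occurrence(
    notes: Iterable[ProgressNote],
    is_positive: Callable[[str], bool],
) -> Optional[datetime]:
    """Return the timestamp of the earliest note flagged positive, or None."""
    hits = [note.written_at for note in notes if is_positive(note.text)]
    return min(hits) if hits else None


def mentions_diarrhea(text: str) -> bool:
    # Assumption: plain substring match; it ignores negation and past history.
    return "下痢" in text


# Invented notes, deliberately out of chronological order.
notes = [
    ProgressNote(datetime(2018, 1, 12, 9, 0), "下痢あり"),
    ProgressNote(datetime(2018, 1, 10, 9, 0), "食欲良好"),
    ProgressNote(datetime(2018, 1, 11, 15, 30), "昨夜から下痢"),
]
print(first_occurrence(notes, mentions_diarrhea))  # 2018-01-11 15:30:00
```

Passing the predicate as a parameter keeps the "earliest flagged note" logic separate from whatever text analysis decides that a note is positive.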
Keywords: Natural language processing; data mining; drug therapy; drug-related side effects and adverse reactions; electronic health records; pharmacovigilance; progress notes
Year: 2022 PMID: 35342285 PMCID: PMC8943584 DOI: 10.1177/11769351221085064
Source DB: PubMed Journal: Cancer Inform ISSN: 1176-9351
Figure 1. Dataset preparation. EMR, electronic medical record.
Figure 2. Processing for each progress note. AE, adverse event.
Background characteristics.
| | Group A for nausea/vomiting | Group B for diarrhea | P value |
|---|---|---|---|
| Age, years | 63 [50-70] | 65 [59-70] | n/a |
| Female | 65.9% (55.3-75.1) | 58.8% (48.2-68.7) | .43 |
| Outpatient | 49.4% (39.0-59.8) | 48.2% (37.9-58.7) | 1 |
| Cancer type | |||
| Gastrointestinal cancer | 23.5% (15.7-33.6) | 22.4% (14.7-32.4) | 1 |
| Pancreas and biliary cancer | 43.5% (33.5-54.1) | 47.1% (36.8-57.6) | .76 |
| Breast cancer | 24.7% (16.7-34.9) | 18.8% (11.8-28.5) | .46 |
| Ovarian cancer | 8.2% (3.8-16.3) | 11.8% (6.3-20.5) | .61 |
| Total number of anticancer drugs | 178 | 158 | n/a |
| Frequency of the top 5 drugs | |||
| Gemcitabine | 14.6% (10.1-20.6) | 19.0% (13.6-25.9) | .31 |
| Fluorouracil | 14.0% (9.6-20.0) | 10.1% (6.2-15.9) | .32 |
| Oxaliplatin | 12.9% (8.7-18.7) | 15.2% (10.4-21.7) | .64 |
| Cyclophosphamide | 11.8% (7.8-17.4) | 8.2% (4.8-13.7) | .37 |
| Paclitaxel | 9.0% (5.5-14.2) | 10.8% (6.7-16.6) | .59 |
| Observation period, days | 21 [14–22] | 21 [14–28] | n/a |
| Progress notes per patient | 5 [3–10] | 5 [3–10] | n/a |
| Characters per progress note | 422.5 [187.75–766] | 390 [181.25–675.5] | n/a |
| Total records in all progress notes, lines | 20 106 | 22 057 | n/a |
| Frequency of records containing dictionary word | |||
| As positive findings | 0.8% (0.6-0.9) | 0.3% (0.2-0.3) | <.001 |
| As negative findings | 1.1% (1.0-1.3) | 0.2% (0.2-0.3) | <.001 |
| As past history | 0.11% (0.07-0.17) | 0.03% (0.01-0.06) | .0011 |
Values are shown as a median [interquartile range] or percentage (95% confidence interval).
*P < .005; **P < .001; n/a, not applicable.
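The source does not say how the 95% confidence intervals for percentages were computed. One common choice is the Wilson score interval, sketched below. The function name `wilson_interval` is hypothetical, and the counts in the usage line (56 of 85) are assumptions inferred from the reported 65.9% and from the per-group totals of the performance table, not figures stated in the text.

```python
from math import sqrt
from statistics import NormalDist
from typing import Tuple


def wilson_interval(successes: int, n: int, confidence: float = 0.95) -> Tuple[float, float]:
    """Wilson score interval for a binomial proportion."""
    if n <= 0 or not 0 <= successes <= n:
        raise ValueError("need n > 0 and 0 <= successes <= n")
    # Two-sided critical value of the standard normal distribution.
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    p = successes / n
    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    half_width = z * sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return center - half_width, center + half_width


# Assumed counts: 56 women among 85 patients in Group A.
low, high = wilson_interval(56, 85)
print(f"{56 / 85:.1%} ({low:.1%}-{high:.1%})")  # 65.9% (55.3%-75.1%)
```

Under those assumed counts the result agrees with the reported 65.9% (55.3-75.1), which suggests, but does not prove, that a Wilson-type interval was used.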
NLP system performance.
| Outcome | GS positive: pattern diagram | GS positive: Group A | GS positive: Group B | GS negative: pattern diagram | GS negative: Group A | GS negative: Group B | Total: Group A | Total: Group B | P value |
|---|---|---|---|---|---|---|---|---|---|
| Matched | a) | 23 | 13 | b) | 48 | 70 | 71 | 83 | .0027 |
| Misdetection | c) | 6 | 0 | d) | 7 | 1 | 13 | 1 | .0012 |
| Overlooked | e) | 1 | 1 | n/a | n/a | n/a | 1 | 1 | 1 |
Abbreviations: AE, adverse event; GS, gold standard; n/a, not applicable.
Arrows indicate the timeline; circles mark the timing defined as ‘GS positive’ in the progress note review, and inverted triangles mark the timing output as ‘AE-occurred’ by the NLP system. Values are shown as number of cases and percentage (95% confidence interval). a) Correct detection: the system output the same date and time as GS positive; b) True negative: the system output matched GS negative; c) Early detection: the system output a date and time earlier than GS positive; d) False positive: the system detected an AE incorrectly when the GS was negative; e) Delayed detection or false negative: the system output a date and time later than GS positive, or did not detect the AE despite the GS being positive.
*P < .005.
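The five outcome patterns a) to e) defined in the table notes can be expressed as a small decision function. This is a sketch under the assumption that each case reduces to two optional timestamps (gold standard and NLP output); `Outcome` and `classify_case` are hypothetical names, not part of the study's system.

```python
from datetime import datetime
from enum import Enum
from typing import Optional


class Outcome(Enum):
    CORRECT_DETECTION = "a) matched: correct detection"
    TRUE_NEGATIVE = "b) matched: true negative"
    EARLY_DETECTION = "c) misdetection: early detection"
    FALSE_POSITIVE = "d) misdetection: false positive"
    DELAYED_OR_FALSE_NEGATIVE = "e) overlooked: delayed detection or false negative"


def classify_case(gs_time: Optional[datetime], nlp_time: Optional[datetime]) -> Outcome:
    """Compare the gold-standard (GS) timing with the NLP output timing.

    None means 'GS negative' for gs_time and 'no AE output' for nlp_time.
    """
    if gs_time is None:
        return Outcome.TRUE_NEGATIVE if nlp_time is None else Outcome.FALSE_POSITIVE
    if nlp_time is None or nlp_time > gs_time:
        return Outcome.DELAYED_OR_FALSE_NEGATIVE
    if nlp_time < gs_time:
        return Outcome.EARLY_DETECTION
    return Outcome.CORRECT_DETECTION


gs = datetime(2018, 1, 11, 15, 30)
print(classify_case(gs, datetime(2018, 1, 10, 9, 0)).value)  # c) misdetection: early detection
```

Note that early detection counts as a misdetection and delayed detection as overlooked, so only an exact match on date and time is scored as correct for GS-positive cases.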
Error analysis.
| Error type | Symptom to be detected | Error cases: misdetection | Error cases: overlooked | Example: original Japanese text (English translation) |
|---|---|---|---|---|
| Negative findings | Nausea/vomiting | 8 | n/a | 吐き気は (Nausea) |
| Negative findings | Diarrhea | 1 | n/a | 口腔粘膜炎・ (Oral mucositis) |
| Past history | Nausea/vomiting | 4 | n/a | 10/14 に吐いた。 (Vomited on 10/14.) |
| Past history | Diarrhea | 0 | n/a | n/a |
| Others§ | Nausea/vomiting | 1 | 1 | 便が出そうで出ないのが (Feeling that stool is about to pass but does not) |
| Others§ | Diarrhea | 0 | 1 | 便が軟らかい (Stool is soft) |
†The spelling variant of negation could not be recognized; ‡The spelling variant of negation could not be recognized, and the dependency structure analysis for the parallel structure was incorrect; §Other errors were due to word-sense ambiguity; ||The Japanese word ‘気持ち悪い’ can mean either nausea or discomfort; n/a, not applicable.
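The error types above (negative findings, past history, spelling variants of negation) can be illustrated with a deliberately simple rule-based classifier. This is not the study's method, which involved dependency structure analysis; it is a keyword sketch in which the negation cues, the date pattern used as a past-history cue, the term list, and the name `classify_record` are all assumptions. Unicode NFKC normalization is used here as one way to fold width variants such as half-width katakana.

```python
import re
import unicodedata
from typing import Optional, Sequence

# Assumed term list: 嘔吐 and 悪心 appear in the source; the others are illustrative.
NAUSEA_VOMITING_TERMS = ("嘔吐", "悪心", "吐き気", "吐いた")
# Hypothetical negation cues, written in their normalized (full-width kana) form.
NEGATION_CUES = ("なし", "無し", "ナシ", "ない", "無い")
# Assumption: a month/day mention such as 10/14 signals a past event.
DATE_PATTERN = re.compile(r"\d{1,2}/\d{1,2}")


def classify_record(text: str, terms: Sequence[str]) -> Optional[str]:
    """Return 'positive', 'negative', 'past_history', or None if no term matches."""
    # NFKC folds half-width katakana and full-width digits into canonical forms.
    normalized = unicodedata.normalize("NFKC", text)
    if not any(term in normalized for term in terms):
        return None
    if DATE_PATTERN.search(normalized):
        return "past_history"
    if any(cue in normalized for cue in NEGATION_CUES):
        return "negative"
    return "positive"


print(classify_record("悪心ﾅｼ", NAUSEA_VOMITING_TERMS))           # negative
print(classify_record("10/14 に吐いた。", NAUSEA_VOMITING_TERMS))  # past_history
print(classify_record("嘔吐あり", NAUSEA_VOMITING_TERMS))          # positive
```

A cue list like this still misses variants it does not enumerate and cannot resolve word-sense ambiguity, which is consistent with the kinds of errors listed in the table.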
Analysis of GS negatives.
| GS negative | True negative: Group A (n = 19) | True negative: Group B (n = 11) | P value | False positive: Group A (n = 7) | False positive: Group B (n = 1) | P value |
|---|---|---|---|---|---|---|
| Total records containing dictionary word, lines | 3024 | 2198 | n/a | 2641 | 612 | n/a |
| As negative findings | 1.9% (1.4 to 2.4) | 1.3% (0.9 to 1.9) | 0.15 | 1.3% (0.9 to 1.8) | 1.0% (0.4 to 2.2) | 0.68 |
| As past history | 0.07% (0.00 to 0.26) | 0 | n/a | 0.04% (-0.02 to 0.24) | 0 | n/a |
Abbreviations: AE, adverse event; GS, gold standard; n/a, not applicable.
†True negative: the system output matched GS negative; ‡False positive: the system detected an AE incorrectly when the GS was negative.
Figure 3. Word frequencies of AE-related words. AE-related words consist of one morpheme and were counted if they were understood to relate to each symptom. Black bars indicate the frequencies of nausea/vomiting-related words in Group A, and the white bar indicates the frequency of diarrhea-related words in Group B. Values shown above the bars indicate the frequency of each word. For example, ‘嘔吐’ means vomiting, ‘悪心’ means nausea, and ‘下痢’ means diarrhea. The words shown include onomatopoeia, Japanese-specific expressions, and misspellings.
Abbreviation: AE, adverse event.
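A frequency count of AE-related words like the one summarized in Figure 3 could be approximated as below. This is a rough sketch: it counts raw substring occurrences rather than morphemes, `count_terms` is a hypothetical helper, and the sample notes are invented for illustration.

```python
from collections import Counter
from typing import Iterable, Sequence


def count_terms(notes: Iterable[str], terms: Sequence[str]) -> Counter:
    """Count non-overlapping substring occurrences of each term across all notes."""
    counts = Counter({term: 0 for term in terms})
    for text in notes:
        for term in terms:
            # Assumption: substring counting stands in for morpheme-level counting.
            counts[term] += text.count(term)
    return counts


# Invented example notes; the three terms are the ones named in the caption.
sample_notes = ["嘔吐あり。嘔吐後に悪心軽減。", "下痢なし", "食欲良好"]
frequencies = count_terms(sample_notes, ["嘔吐", "悪心", "下痢"])
for term, count in frequencies.most_common():
    print(term, count)
```

Initializing every term to zero keeps terms that never occur visible in the result, which matters when comparing dictionaries across groups.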