Adam N Berman1, David W Biery1, Curtis Ginder2, Olivia L Hulme2, Daniel Marcusa2, Orly Leiva2, Wanda Y Wu1, Nicholas Cardin3, Jon Hainer4, Deepak L Bhatt1, Marcelo F Di Carli1,4, Alexander Turchin3, Ron Blankstein1,4.
Abstract
OBJECTIVE: Accurate ascertainment of comorbidities is paramount in clinical research. While manual adjudication is labor-intensive and expensive, the adoption of electronic health records enables computational analysis of free-text documentation using natural language processing (NLP) tools. HYPOTHESIS: We sought to develop highly accurate NLP modules to assess for the presence of five key cardiovascular comorbidities in a large electronic health record system.
Keywords: cardiovascular comorbidities; natural language processing
Year: 2021 PMID: 34347314 PMCID: PMC8428009 DOI: 10.1002/clc.23687
Source DB: PubMed Journal: Clin Cardiol ISSN: 0160-9289 Impact factor: 3.287
FIGURE 1. Note selection process. Schematic overview of the note selection process for manual adjudication of the five diagnostic concepts targeted for NLP development. Each training set and the test set contained 200 unique notes, with the same proportion of outpatient notes and hospital discharge summaries. The NLP designer was blinded to the gold‐standard test set adjudication.
Cohen's Kappa on the adjudication of the 200 test set notes
| Module | Cohen's Kappa |
|---|---|
| Hypertension | 0.97 |
| Dyslipidemia | 1.00 |
| Diabetes | 0.96 |
| Coronary artery disease | 0.97 |
| Stroke/TIA | 0.98 |
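The kappa values above measure agreement between adjudicators beyond what chance alone would produce. As a minimal sketch of that calculation, assuming two adjudicators each assign one label per note, the following computes Cohen's kappa from paired labels. The function name and the ten example labels are hypothetical and are not taken from the study data.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters labelling the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must label the same non-empty set of items")
    n = len(rater_a)
    # Observed agreement: fraction of items given the same label by both raters.
    p_observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: product of the raters' marginal label frequencies.
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    p_chance = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if p_chance == 1.0:
        return 1.0  # both raters used one identical label throughout
    return (p_observed - p_chance) / (1.0 - p_chance)


# Hypothetical labels for ten notes (1 = concept present, 0 = absent).
adjudicator_1 = [1, 1, 0, 0, 1, 0, 1, 0, 0, 1]
adjudicator_2 = [1, 1, 0, 0, 1, 0, 1, 0, 1, 1]
kappa = cohens_kappa(adjudicator_1, adjudicator_2)
print(round(kappa, 2))
```

In this toy example the raters agree on nine of ten notes (observed agreement 0.9) and chance agreement is 0.5, so kappa is 0.8.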
Unique note‐level and sentence‐level positive references for each diagnostic concept in the 200 test set notes
| Module | Note‐level references | Sentence‐level references |
|---|---|---|
| Hypertension | 82 | 212 |
| Dyslipidemia | 68 | 169 |
| Diabetes | 29 | 128 |
| Coronary artery disease | 54 | 217 |
| Stroke/TIA | 41 | 168 |
FIGURE 2. Example adjudication of a multi‐layered diagnostic sentence. Example sentence and associated REDCap form showing how adjudicators were instructed to record all available classification information when a sentence carries multi‐layered diagnostic information. For the sentence “Mr. Smith has a history of CAD s/p MI in 2018 requiring 2 stents to his LAD,” adjudicators would click “unspecified CAD,” “myocardial infarction,” and “revascularization” to capture all available data points.
FIGURE 3. Schematic of building NLP phrase structures. Schematic of building phrase structures to capture diagnostic concepts using defined word classes. This example sentence referencing the placement of stents in a coronary artery would then resolve to an output indicating that the patient had a coronary revascularization procedure.
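The idea of resolving a sentence to a diagnostic concept through word classes and phrase structures can be illustrated with a small rule-based sketch. Everything below is an assumption made for illustration: the word class names, their vocabularies, the single phrase structure, and the `max_gap` parameter are hypothetical and do not reproduce the study's actual NLP modules or software.

```python
import re

# Hypothetical word classes; the study's real vocabularies are not given here.
WORD_CLASSES = {
    "DEVICE": {"stent", "stents"},
    "CORONARY_VESSEL": {"lad", "rca", "lcx", "coronary"},
}

# A phrase structure: an ordered sequence of word classes and its output concept.
PHRASE_STRUCTURES = [
    (("DEVICE", "CORONARY_VESSEL"), "coronary revascularization"),
]


def classify_token(token):
    """Return the word class containing this token, or None."""
    for name, words in WORD_CLASSES.items():
        if token in words:
            return name
    return None


def structure_matches(tagged, pattern, max_gap):
    """True if the classes in `pattern` occur in order, with at most
    `max_gap` other tokens between consecutive matched tokens."""
    def search(start, pattern_pos, prev_pos):
        if pattern_pos == len(pattern):
            return True
        for k in range(start, len(tagged)):
            pos, cls = tagged[k]
            if prev_pos is not None and pos - prev_pos - 1 > max_gap:
                break  # next candidate is too far from the previous match
            if cls == pattern[pattern_pos] and search(k + 1, pattern_pos + 1, pos):
                return True
        return False
    return search(0, 0, None)


def match_concepts(sentence, max_gap=3):
    """Return the output concepts whose phrase structure matches the sentence."""
    tokens = re.findall(r"[a-z0-9]+", sentence.lower())
    tagged = [(i, classify_token(t)) for i, t in enumerate(tokens)]
    tagged = [(i, c) for i, c in tagged if c is not None]
    return [concept for pattern, concept in PHRASE_STRUCTURES
            if structure_matches(tagged, pattern, max_gap)]


example = "Mr. Smith has a history of CAD s/p MI in 2018 requiring 2 stents to his LAD"
print(match_concepts(example))
```

This sketch omits negation, family history, and other context handling that a clinical NLP module would need; it only shows how ordered word classes can resolve to a single output concept.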
Performance characteristics of each of the five modules (values are percentages, with confidence intervals in parentheses)
| Module | Level | Sensitivity (original) | Sensitivity (corrected) | Specificity (original) | Specificity (corrected) | PPV (original) | PPV (corrected) |
|---|---|---|---|---|---|---|---|
| Hypertension | Note level | 97.5 (91.3–99.7) | 97.5 (91.3–99.7) | 99.2 (95.4–100) | 99.2 (95.4–100) | 98.7 (93.1–100) | 98.7 (93.1–100) |
| Hypertension | Sentence level | 96.2 (92.7–98.4) | 96.3 (92.9–98.4) | NA | NA | 98.1 (95.2–99.5) | 98.1 (95.3–99.5) |
| Dyslipidemia | Note level | 97.1 (89.8–99.6) | 97.1 (89.9–99.7) | 100 (97.2–100) | 100 (97.2–100) | 100 (94.6–100) | 100 (94.6–100) |
| Dyslipidemia | Sentence level | 94.7 (90.1–97.5) | 94.8 (90.4–97.6) | NA | NA | 99.4 (96.6–100) | 99.4 (96.6–100) |
| Diabetes | Note level | 100 (88.1–100) | 100 (88.1–100) | 98.2 (95.0–99.6) | 98.2 (95.0–99.6) | 90.6 (75.0–98.0) | 90.6 (75.0–98.0) |
| Diabetes | Sentence level | 90.6 (84.2–95.1) | 90.8 (84.4–95.1) | NA | NA | 95.1 (89.6–98.2) | 95.2 (89.8–98.2) |
| Coronary artery disease | Note level | 98.2 (90.1–100) | 98.2 (90.1–100) | 94.5 (89.5–97.6) | 94.5 (89.5–97.6) | 86.9 (75.8–94.2) | 86.9 (75.8–94.2) |
| Coronary artery disease | Sentence level | 88.5 (83.5–92.4) | 88.7 (83.8–92.5) | NA | NA | 93.2 (88.9–96.2) | 93.3 (89.1–96.3) |
| Stroke/TIA | Note level | 95.1 (83.5–99.4) | 95.1 (83.5–99.4) | 98.7 (95.5–99.8) | 98.7 (95.5–99.8) | 95.1 (83.5–99.4) | 95.1 (83.5–99.4) |
| Stroke/TIA | Sentence level | 85.7 (79.5–90.6) | 86.1 (80.0–90.9) | NA | NA | 94.1 (89.1–97.3) | 94.3 (89.4–97.4) |
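Sensitivity, specificity, and PPV in the table above are proportions of true and false positives and negatives, each reported with an interval. The sketch below shows one way to compute them with an exact (Clopper-Pearson) binomial interval. It rests on two assumptions: that the reported intervals are exact 95% binomial intervals (the source does not name the method), and that the example counts, back-calculated here from the note-level row reporting sensitivity 100 (88.1–100), specificity 98.2, and PPV 90.6 in a 200-note test set, are correct. The function names are hypothetical.

```python
from math import comb


def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def _solve_increasing(f, target, iterations=60):
    """Bisection on [0, 1] for f(p) = target, where f increases with p."""
    lo, hi = 0.0, 1.0
    for _ in range(iterations):
        mid = (lo + hi) / 2
        if f(mid) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def exact_ci(k, n, alpha=0.05):
    """Clopper-Pearson interval for k successes out of n trials."""
    # Lower bound: p at which P(X >= k) equals alpha / 2.
    lower = 0.0 if k == 0 else _solve_increasing(
        lambda p: 1 - binom_cdf(k - 1, n, p), alpha / 2)
    # Upper bound: p at which P(X <= k) equals alpha / 2.
    upper = 1.0 if k == n else _solve_increasing(
        lambda p: 1 - binom_cdf(k, n, p), 1 - alpha / 2)
    return lower, upper


def diagnostic_metrics(tp, fp, fn, tn):
    """Point estimate and exact CI for sensitivity, specificity, and PPV.
    Assumes every denominator is non-zero."""
    return {
        "sensitivity": (tp / (tp + fn), exact_ci(tp, tp + fn)),
        "specificity": (tn / (tn + fp), exact_ci(tn, tn + fp)),
        "ppv": (tp / (tp + fp), exact_ci(tp, tp + fp)),
    }


# Back-calculated, illustrative counts (an assumption, not published counts).
metrics = diagnostic_metrics(tp=29, fp=3, fn=0, tn=168)
for name, (estimate, (low, high)) in metrics.items():
    print(f"{name}: {estimate:.1%} ({low:.1%} to {high:.1%})")
```

With 29 of 29 positives detected, the lower bound of the exact interval solves p^29 = 0.025, which gives about 88.1% and agrees with the reported interval. The bisection solver keeps the sketch dependency-free; a statistics library's binomial interval routine would normally be preferable.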
Performance characteristics of NLP sub‐categories (values are percentages, with confidence intervals in parentheses)
| Level | Sensitivity (original) | Sensitivity (corrected) | Specificity (original) | Specificity (corrected) | PPV (original) | PPV (corrected) |
|---|---|---|---|---|---|---|
| Note level | 100 (76.8–100) | 100 (76.8–100) | 100 (98.0–100) | 100 (98.0–100) | 100 (76.8–100) | 100 (76.8–100) |
| Sentence level | 96.6 (82.2–99.9) | 96.6 (82.2–99.9) | NA | NA | 100 (87.7–100) | 100 (87.7–100) |
| | | | | | | |
| Note level | 100 (85.2–100) | 100 (85.2–100) | 98.3 (95.1–99.6) | 98.3 (95.1–99.6) | 88.5 (69.8–97.6) | 88.5 (69.8–97.6) |
| Sentence level | 88 (80.0–93.6) | 88.2 (80.4–93.8) | NA | NA | 92.6 (85.4–97.0) | 92.8 (85.7–97.0) |
| | | | | | | |
| Note level | 97.9 (88.7–100) | 97.9 (88.7–100) | 96.7 (92.5–98.9) | 96.7 (92.5–98.9) | 90.2 (78.6–96.7) | 90.2 (78.6–96.7) |
| Sentence level | 85.6 (77.9–91.4) | 86.1 (78.6–91.7) | NA | NA | 91.8 (85.0–96.2) | 92.1 (85.5–96.3) |
| | | | | | | |
| Note level | 82.6 (61.2–95.1) | 82.6 (61.2–95.1) | 98.9 (96.0–99.9) | 98.9 (96.0–99.9) | 90.5 (69.6–98.8) | 90.5 (69.6–98.8) |
| Sentence level | 86 (72.1–94.7) | 86 (72.1–94.7) | NA | NA | 92.5 (79.6–98.4) | 92.5 (79.6–98.4) |
| | | | | | | |
| Note level | 95.8 (78.9–99.9) | 95.8 (78.9–99.9) | 98.3 (95.1–99.6) | 98.3 (95.1–99.6) | 88.5 (69.8–97.6) | 88.5 (69.8–97.6) |
| Sentence level | 87.8 (78.2–94.3) | 87.8 (78.2–94.3) | NA | NA | 94.2 (85.8–98.4) | 94.2 (85.8–98.4) |
| | | | | | | |
| Note level | 95.0 (75.1–99.9) | 95.2 (76.2–99.9) | 98.3 (95.2–99.7) | 98.3 (95.2–99.7) | 86.4 (65.1–97.1) | 87.0 (66.4–97.2) |
| Sentence level | 96.7 (88.7–99.6) | 96.8 (88.8–99.6) | NA | NA | 88.1 (77.8–94.7) | 88.2 (78.1–94.8) |
| | | | | | | |
| Note level | 80.0 (44.4–97.5) | 80.0 (44.4–97.5) | 100.0 (98.1–100) | 100.0 (98.1–100) | 100.0 (63.1–100) | 100.0 (63.1–100) |
| Sentence level | 70.6 (44.0–89.7) | 73.7 (48.8–90.9) | NA | NA | 92.3 (64.0–99.8) | 93.3 (68.1–99.8) |
| | | | | | | |
| Note level | 100.0 (86.8–100) | 100.0 (87.2–100) | 98.9 (95.9–99.9) | 98.8 (95.9–99.9) | 92.9 (76.5–99.1) | 93.1 (77.2–99.2) |
| Sentence level | 90.2 (79.8–96.3) | 90.3 (80.1–96.4) | NA | NA | 84.6 (73.5–92.4) | 84.9 (73.9–92.5) |
| | | | | | | |
| Note level | 100.0 (59.0–100) | 100.0 (59.0–100) | 98.4 (95.5–99.7) | 98.4 (95.5–99.7) | 70.0 (34.8–93.3) | 70.0 (34.8–93.3) |
| Sentence level | 100.0 (73.5–100) | 100.0 (75.3–100) | NA | NA | 75.0 (47.6–92.7) | 76.5 (50.1–93.2) |