Eva S Klappe, Florentien J P van Putten, Nicolette F de Keizer, Ronald Cornet.
Abstract
BACKGROUND: Accurate, coded problem lists are valuable for data reuse, including clinical decision support and research. However, healthcare providers frequently modify coded diagnoses by including or removing common contextual properties in free-text diagnosis descriptions: uncertainty (suspected glaucoma), laterality (left glaucoma) and temporality (glaucoma 2002). These contextual properties could cause a difference in meaning between underlying diagnosis codes and modified descriptions, inhibiting data reuse. We therefore aimed to develop and evaluate an algorithm to identify these contextual properties.
Keywords: Electronic health record; Problem list; Problem-oriented medical record; Reuse of clinical data; Rule-based algorithm development; Single-center and multicenter validation
Year: 2021 PMID: 33827555 PMCID: PMC8028823 DOI: 10.1186/s12911-021-01477-y
Source DB: PubMed Journal: BMC Med Inform Decis Mak ISSN: 1472-6947 Impact factor: 2.796
Characteristics of the single-center and multicenter dataset
| Dataset/characteristics | Single-center dataset: Amsterdam UMC | Multicenter dataset: five Dutch hospitals |
|---|---|---|
| Total records available, n | 288,935 | 1,035,059 |
| Modified descriptions, n(%) | 73,280 (25.4) | 175,210 (16.9) |
| Time period | 1-1-2017–31-12-2017 | 28-4-2018–29-5-2019 |
| Medical specialties, n | 37 | 62 original; 41 after clustering |
| Usage for this study | Development and internal validation of algorithm | Multicenter validation of algorithm and to measure the frequency of types of contextual properties |
Fig. 1 Algorithm regular expressions and corresponding categories. The rectangles in the first and second column contain the regular expressions. The rectangles in the third column contain the properties that result from the inclusion of the regular expressions. (1) Modified descriptions can be classified according to multiple contextual properties (uncertainty, laterality and/or temporality)
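The rule-based idea described in the figure caption can be sketched as follows. This is a minimal illustration only: the English patterns and the `classify` helper are hypothetical and are not the study's regular expressions, which are the ones shown in Fig. 1. The fourth property, removal of uncertainty, is left out because it presumably requires comparing the modified description with the original diagnosis description.

```python
import re

# Hypothetical English patterns, for illustration only. They are NOT the
# regular expressions used in the study (those are shown in Fig. 1).
PATTERNS = {
    "uncertainty": re.compile(r"\b(suspected|possible|probable)\b", re.IGNORECASE),
    "laterality": re.compile(r"\b(left|right|bilateral)\b", re.IGNORECASE),
    "temporality": re.compile(r"\b(19|20)\d{2}\b"),  # a four-digit year
}


def classify(description):
    """Return a sorted list of every property whose pattern matches.

    A description can match several properties at once, mirroring the note
    that modified descriptions can fall into multiple categories.
    """
    return sorted(
        name for name, pattern in PATTERNS.items() if pattern.search(description)
    )


# The three examples from the abstract, plus a combined description.
for text in ["suspected glaucoma", "left glaucoma", "glaucoma 2002",
             "suspected left glaucoma 2002"]:
    print(text, "->", classify(text))
```

Returning a list rather than a single label keeps the multi-label behaviour explicit; a description with no match simply yields an empty list.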
Interrater reliability and kappa scores for the internal (n = 980) and multicenter (n = 996) validation set
| Property | Interrater reliability (%)^a, internal validation set (n = 980) | Interrater reliability (%)^a, multicenter validation set (n = 996) | Kappa score, internal validation set (n = 980) | Kappa score, multicenter validation set (n = 996) |
|---|---|---|---|---|
| Laterality | 98.0 (NESK = 245, NFJP = 233) | 98.1 (NESK = 288, NFJP = 293) | 0.94 | 0.95 |
| Temporality | 97.7 (NESK = 163, NFJP = 157) | 98.1 (NESK = 96, NFJP = 85) | 0.91 | 0.88 |
| Uncertainty | 96.1 (NESK = 135, NFJP = 107) | 97.5 (NESK = 98, NFJP = 87) | 0.82 | 0.85 |
| Removal of uncertainty | 98.3 (NESK = 11, NFJP = 14) | 99.3 (NESK = 8, NFJP = 11) | 0.36 | 0.63 |
Ordered by descending kappa score in the internal validation set
^a NESK is the number of records assigned to the corresponding property by annotator ESK, and NFJP is the number of records assigned to it by annotator FJP
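The table reports both raw agreement and Cohen's kappa, and the two can diverge sharply for rare properties (removal of uncertainty shows 98.3% agreement but a kappa of 0.36). The sketch below computes kappa from a 2x2 agreement table between two annotators. The function name and all counts are hypothetical and chosen only to illustrate the effect; they are not the study's annotation data.

```python
def cohen_kappa(both_yes, only_a, only_b, both_no):
    """Cohen's kappa for two annotators making a yes/no decision per record.

    both_yes / both_no: records on which the annotators agree.
    only_a / only_b: records marked "yes" by only one of the annotators.
    """
    total = both_yes + only_a + only_b + both_no
    observed = (both_yes + both_no) / total            # raw agreement
    a_yes = (both_yes + only_a) / total                # annotator A's "yes" rate
    b_yes = (both_yes + only_b) / total                # annotator B's "yes" rate
    # Agreement expected by chance, given each annotator's marginal rates.
    expected = a_yes * b_yes + (1 - a_yes) * (1 - b_yes)
    return (observed - expected) / (1 - expected)


# Hypothetical common property: 70% raw agreement.
print(round(cohen_kappa(20, 5, 10, 15), 2))   # 0.4

# Hypothetical rare property: 98.5% raw agreement, yet kappa stays low
# because almost all of that agreement is expected by chance.
print(round(cohen_kappa(5, 5, 10, 980), 2))   # 0.39
```

When nearly every record is a "no" for both annotators, chance agreement is already close to 100%, so a handful of disagreements on the few "yes" records pulls kappa down.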
Actual prevalence, recall, specificity and precision of all four properties in the internal (n = 980) and multicenter validation set (n = 996)
| Property | Actual prevalence, % (95% CI): internal validation set | Actual prevalence, % (95% CI): multicenter validation set | Actual prevalence, % (95% CI): weighted mean | Recall, % (95% CI): internal validation set | Recall, % (95% CI): multicenter validation set | Recall, % (95% CI): weighted mean | Specificity, % (95% CI): internal validation set | Specificity, % (95% CI): multicenter validation set | Specificity, % (95% CI): weighted mean | Precision, % (95% CI): internal validation set | Precision, % (95% CI): multicenter validation set | Precision, % (95% CI): weighted mean |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Laterality^a | 25.3 (22.6–28.2) | 29.2 (26.4–32.1) | 27.4 (24.7–30.1) | 100 (98.5–100) | 99.7 (98.1–100) | 99.9 (99.4–100) | 99.0 (98.1–99.6) | 97.2 (95.7–98.2) | 98.6 (97.9–99.2) | 97.3 (94.4–98.9) | 93.5 (90.2–96.0) | 96.1 (95.3–97.0) |
| Temporality | 16.9 (14.6–19.4) | 9.4 (7.7–11.4) | 12.4 (7.1–17.7) | 97.6 (93.9–99.3) | 96.8 (91.0–99.3) | 97.4 (95.3–99.4) | 99.5 (98.7–99.9) | 98.9 (98.0–99.5) | 99.3 (98.9–99.7) | 97.6 (93.9–99.3) | 90.1 (82.5–95.1) | 96.0 (95.2–96.9) |
| Uncertainty | 11.9 (10.0–14.1) | 10.2 (8.4–12.3) | 11.0 (9.8–12.2) | 99.1 (95.3–100) | 86.3 (78.0–92.3) | 98.1 (96.1–100) | 98.5 (97.4–99.2) | 98.8 (97.8–99.4) | 98.6 (98.1–99.2) | 89.9 (83.4–94.5) | 88.9 (81.0–94.3) | 89.4 (88.1–90.8) |
| Removal of uncertainty^a | 0.9 (0.4–1.7) | 1.4 (0.8–2.3) | 1.2 (0.8–1.6) | 100 (66.4–100) | 57.1 (28.9–82.3) | 90.4 (78.5–100) | 99.4 (98.7–99.8) | 99.9 (99.4–100) | 99.8 (99.6–100) | 60.0 (32.3–83.7) | 88.9 (51.8–99.7) | 80.6 (78.9–82.2) |
The numbers are calculated using the reference standard
^a Removal of uncertainty and laterality both contained zeros in the confusion matrix
Error analysis of false positives (FP) and false negatives (FN) in the internal validation set (n = 980)
| Property | Total = FP + FN | Missing terms, n (%) | Simple extension, n (%) | Outside framework^a, n (%) | Annotation / implementation, n (%) |
|---|---|---|---|---|---|
| Laterality | 7 = 7 + 0 | 0 (0.0) | 6 (85.7) | 0 (0.0) | 1 (14.3) |
| Temporality | 8 = 4 + 4 | 4 (50.0) | 3 (37.5) | 1 (12.5) | 0 (0.0) |
| Uncertainty | 14 = 13 + 1 | 2 (14.3) | 8 (57.1) | 4 (28.6) | 0 (0.0) |
| Removal of uncertainty | 6 = 6 + 0 | 0 (0.0) | 5 (83.3) | 0 (0.0) | 1 (16.7) |
| Total | 35 = 30 + 5 | 7 (20.0) | 22 (62.8) | 5 (14.3) | 1 (2.9) |
^a Outside framework consisted only of errors that related to term order
Error analysis of false positives (FP) and false negatives (FN) in the multicenter validation set (n = 996)
| Property | Total = FP + FN | Missing terms, n (%) | Simple extension, n (%) | Outside framework^a, n (%) | Annotation / implementation, n (%) |
|---|---|---|---|---|---|
| Laterality | 21 = 20 + 1 | 1 (4.8) | 19 (90.4) | 0 (0.0) | 1 (4.8) |
| Temporality | 13 = 10 + 3 | 3 (23.1) | 7 (53.8) | 1 (7.7) | 2 (15.4) |
| Uncertainty | 25 = 11 + 14 | 6 (24.0) | 7 (28.0) | 8 (32.0) | 4 (16.0) |
| Removal of uncertainty | 7 = 1 + 6 | 3 (42.9) | 1 (14.3) | 0 (0.0) | 3 (42.8) |
| Total | 66 = 42 + 24 | 16 (24.3) | 34 (51.5) | 9 (13.6) | 7 (10.6) |
^a Outside framework consisted only of errors that related to term order
Number and percentages of the properties identified by the algorithm in the modified descriptions per dataset
| Property | Amsterdam UMC dataset (n = 73,280): N, apparent | Amsterdam UMC dataset: apparent prevalence % (95% CI) | Multicenter dataset (n = 175,210): N, apparent | Multicenter dataset: apparent prevalence % (95% CI) | Amsterdam UMC dataset (n = 73,280): N, actual | Amsterdam UMC dataset: actual prevalence % (95% CI) | Multicenter dataset (n = 175,210): N, actual | Multicenter dataset: actual prevalence % (95% CI) |
|---|---|---|---|---|---|---|---|---|
| Laterality | 17,656 | 24.1 (23.8–24.4) | 54,081 | 30.9 (30.7–31.1) | 17,934 | 24.5 (24.2–24.8) | 54,931 | 31.4 (31.1–31.6) |
| Uncertainty | 11,834 | 16.1 (15.9–16.4) | 16,779 | 9.6 (9.4–9.7) | 12,235 | 16.7 (16.4–17.0) | 17,347 | 9.9 (9.5–10.3) |
| Temporality | 10,758 | 14.7 (14.4–14.9) | 16,582 | 9.5 (9.3–9.6) | 11,130 | 15.2 (14.9–15.4) | 17,156 | 9.8 (9.7–9.9) |
| Removal of uncertainty | 677 | 0.9 (0.9–1.0) | 1,991 | 1.1 (1.1–1.2) | 751 | 1.0 (1.0–1.1) | 2,208 | 1.3 (1.2–1.3) |
Ordered by descending percentages of contextual properties
Original names of the specialties and the new names of the specialties
| Original names | New names |
|---|---|
| Allergology | Allergology |
| Anesthesiology | Anesthesiology |
| Pharmacists | Pharmacists |
| Audiology | Audiology |
| Audiological centres | Audiology |
| Cardiology | Cardiology |
| Cardio pulmonary surgery | Thoracic surgery |
| Thoracic surgery | Thoracic surgery |
| Surgery (Dutch: chirurgie) | Surgery |
| Dermatology | Dermatology |
| Diabetes Nurses | Specialized nurses (diabetes, emergency care, oncology, ostomy and unspecified) |
| Dieticians | Dieticians |
| Occupational therapists | Occupational and physio therapists |
| Physio therapists | Occupational and physio therapists |
| Gastroenterology | Gastric, intestinal and liver diseases |
| Geriatrics | Geriatrics |
| Specialized nurse, diabetes | Specialized nurses (diabetes, emergency care, oncology, ostomy and unspecified) |
| Specialized nurse, oncology | Specialized nurses (diabetes, emergency care, oncology, ostomy and unspecified) |
| Specialized nurse, ostomy | Specialized nurses (diabetes, emergency care, oncology, ostomy and unspecified) |
| Gynecology / obstetrics | Obstetrics and Gynaecology |
| Surgery (Dutch: heelkunde) | Surgery |
| Intensive care medicine | Intensive care medicine |
| Internal medicine | Internal medicine |
| Oral surgery | Oral and maxillo-facial surgery |
| Ear, nose, throat (ENT) | Ear, nose, throat (ENT) |
| Pediatrics | Pediatrics |
| Clinical genetics | Clinical genetics |
| Lung diseases | Lung diseases |
| Gastric, intestinal and liver diseases | Gastric, intestinal and liver diseases |
| Medical microbiology | Medical microbiology |
| Oral and maxillo-facial surgery | Oral and maxillo-facial surgery |
| Neurosurgery | Neurosurgery |
| Neurology | Neurology |
| Obstetrics and gynecology | Obstetrics, gynecology and women's diseases |
| Oncology nurses | Specialized nurses (diabetes, emergency care, oncology, ostomy and unspecified) |
| Ophthalmology | Ophthalmology |
| Optometrist | Ophthalmology |
| Orthopedics | Orthopedics |
| Orthoptist | Ophthalmology |
| Others: general practitioner | Others: general practitioner |
| Others: emergency care | Emergency care |
| Others: sports medicine | Sports medicine |
| Others | Others |
| Physician assistant | Physician assistant |
| Plastic surgery | Plastic surgery |
| Psychiatry | Psychiatry |
| Psychologists | Psychologists |
| Psychologists, unspecified | Psychologists |
| Radiology | Radiology |
| Radiotherapy | Radiotherapy |
| Rheumatology | Rheumatology |
| Rehabilitation | Rehabilitation |
| Emergency care | Emergency care |
| Sports medicine | Sports medicine |
| Dentists: general practitioner | Dentists |
| Dentist specialists oral diseases and oral surgery | Oral and maxillo-facial surgery (Dutch: MKA) |
| Urology | Urology |
| Obstetricians, authorized for ultrasound | Obstetricians (authorized for ultrasound and unspecified) |
| Obstetricians, unspecified | Obstetricians (authorized for ultrasound and unspecified) |
| Specialized nurse, emergency care | Specialized nurses (diabetes, emergency care, oncology, ostomy and unspecified) |
| Nurse, unspecified | Specialized nurses (diabetes, emergency care, oncology, ostomy and unspecified) |
| Women’s diseases | Obstetrics, gynecology and women's diseases |
Confusion matrix example
| Outcome of the algorithm | Gold standard (manually annotated datasets): Yes | Gold standard (manually annotated datasets): No |
|---|---|---|
| Yes | True Positive (TP) | False Positive (FP) |
| No | False Negative (FN) | True Negative (TN) |
The uncertainty confusion matrix of the internal validation set
| Uncertainty | Gold standard, annotators predicted: Yes | Gold standard, annotators predicted: No |
|---|---|---|
| Algorithm predicted: Yes | TP = 116 | FP = 13 |
| Algorithm predicted: No | FN = 1 | TN = 850 |
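The prevalence, recall, specificity and precision reported for a property follow directly from its confusion matrix. As a worked example, the sketch below recomputes them from the four cells of the uncertainty matrix above; the `metrics` helper is an illustrative name, and confidence intervals are not covered.

```python
def metrics(tp, fp, fn, tn):
    """Standard metrics derived from the four cells of a 2x2 confusion matrix."""
    total = tp + fp + fn + tn
    return {
        "prevalence": (tp + fn) / total,   # share of records that truly have the property
        "recall": tp / (tp + fn),          # also called sensitivity
        "specificity": tn / (tn + fp),
        "precision": tp / (tp + fp),       # also called positive predictive value
    }


# Cells of the uncertainty confusion matrix, internal validation set (n = 980).
result = metrics(tp=116, fp=13, fn=1, tn=850)
for name, value in result.items():
    print(f"{name}: {100 * value:.1f}%")
# prevalence: 11.9%
# recall: 99.1%
# specificity: 98.5%
# precision: 89.9%
```

A cell count of zero in a denominator (for example no predicted positives) would raise a `ZeroDivisionError` here; a production version would need to handle that case explicitly.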
The laterality confusion matrix of the internal validation set
| Laterality | Gold standard, annotators predicted: Yes | Gold standard, annotators predicted: No |
|---|---|---|
| Algorithm predicted: Yes | TP = 248 | FP = 7 |
| Algorithm predicted: No | FN = 0 | TN = 725 |
The temporality confusion matrix of the internal validation set
| Temporality | Gold standard, annotators predicted: Yes | Gold standard, annotators predicted: No |
|---|---|---|
| Algorithm predicted: Yes | TP = 162 | FP = 4 |
| Algorithm predicted: No | FN = 4 | TN = 810 |
The removal of uncertainty confusion matrix of the internal validation set
| Removal of uncertainty | Gold standard, annotators predicted: Yes | Gold standard, annotators predicted: No |
|---|---|---|
| Algorithm predicted: Yes | TP = 9 | FP = 6 |
| Algorithm predicted: No | FN = 0 | TN = 965 |
The uncertainty confusion matrix of the multicenter validation set
| Uncertainty | Gold standard, annotators predicted: Yes | Gold standard, annotators predicted: No |
|---|---|---|
| Algorithm predicted: Yes | TP = 88 | FP = 11 |
| Algorithm predicted: No | FN = 14 | TN = 883 |
The laterality confusion matrix of the multicenter validation set
| Laterality | Gold standard, annotators predicted: Yes | Gold standard, annotators predicted: No |
|---|---|---|
| Algorithm predicted: Yes | TP = 290 | FP = 20 |
| Algorithm predicted: No | FN = 1 | TN = 685 |
The temporality confusion matrix of the multicenter validation set
| Temporality | Gold standard, annotators predicted: Yes | Gold standard, annotators predicted: No |
|---|---|---|
| Algorithm predicted: Yes | TP = 91 | FP = 10 |
| Algorithm predicted: No | FN = 3 | TN = 892 |
The removal of uncertainty confusion matrix of the multicenter validation set
| Removal of uncertainty | Gold standard, annotators predicted: Yes | Gold standard, annotators predicted: No |
|---|---|---|
| Algorithm predicted: Yes | TP = 8 | FP = 1 |
| Algorithm predicted: No | FN = 6 | TN = 981 |
Specialties and properties in n and % in the multicenter dataset (n = 175,210). Ordered by descending number of modified descriptions. Shown is the actual prevalence, which was determined using the Rogan–Gladen estimator
| Specialties | Modified descriptions, n | Uncertainty, n (%) | Laterality, n (%) | Temporality, n (%) | Removal of uncertainty, n (%) |
|---|---|---|---|---|---|
| Total | 175,210 | 17,347 (9.9) | 54,931 (31.4) | 17,156 (9.8) | 2208 (1.3) |
| Internal medicine | 30,457 | 2326 (7.6) | 4254 (14.0) | 4822 (15.8) | 562 (1.8) |
| Pediatrics | 18,776 | 1818 (9.7) | 2018 (10.7) | 1475 (7.9) | 238 (1.3) |
| Ophthalmology | 16,901 | 332 (2.0) | 13,471 (79.7) | 605 (3.6) | 253 (1.5) |
| Neurology | 12,375 | 1990 (16.1) | 4451 (36.0) | 798 (6.4) | 145 (1.2) |
| Cardiology | 10,766 | 460 (4.3) | 983 (9.1) | 351 (3.3) | 232 (2.2) |
| Surgery | 9418 | 298 (3.2) | 4626 (49.1) | 1255 (13.3) | 25 (0.3) |
| Orthopedics | 8279 | 192 (2.3) | 7355 (88.8) | 1005 (12.1) | 1 (0.01) |
| Emergency care | 7796 | 235 (3.0) | 3304 (42.4) | 2357 (30.2) | 21 (0.3) |
| Ear, nose, throat (ENT) | 7349 | 652 (8.9) | 2908 (39.6) | 366 (5.0) | 8 (0.1) |
| Lung diseases | 6196 | 611 (9.9) | 676 (10.9) | 298 (4.8) | 80 (1.3) |
| Other | 6019 | 288 (4.8) | 1591 (26.4) | 725 (12.0) | 36 (0.6) |
| Clinical genetics | 5655 | 5287 (93.5) | 238 (4.2) | 2 (0.04) | 27 (0.5) |
| Obstetrics, gynecology and women's diseases | 4914 | 226 (4.6) | 315 (6.4) | 185 (3.8) | 114 (2.3) |
| Anesthesiology | 3767 | 192 (5.1) | 640 (17.0) | 456 (12.1) | 8 (0.2) |
| Rheumatology | 3158 | 243 (7.7) | 435 (13.8) | 353 (11.2) | 35 (1.1) |
| Gastric, intestinal and liver diseases | 3043 | 197 (6.5) | 174 (5.7) | 529 (17.4) | 33 (1.1) |
| Plastic surgery | 2918 | 64 (2.2) | 1636 (56.1) | 44 (1.5) | 1 (0.03) |
| Geriatrics | 2889 | 359 (12.4) | 650 (22.5) | 118 (4.1) | 25 (0.9) |
| Neurosurgery | 2135 | 256 (12.0) | 879 (41.2) | 270 (12.6) | 19 (0.9) |
| Dermatology | 2047 | 397 (19.4) | 427 (20.9) | 48 (2.3) | 10 (0.5) |
| Radiotherapy | 1856 | 83 (4.5) | 909 (49.0) | 53 (2.9) | 6 (0.3) |
| Urology | 1306 | 35 (2.7) | 486 (37.2) | 58 (4.4) | 49 (3.8) |
| Others: general practitioner | 1223 | 42 (3.4) | 222 (18.2) | 19 (1.6) | 4 (0.3) |
| Audiology | 1127 | 4 (0.4) | 133 (11.8) | 0 (0.0) | 0 (0.0) |
| Thoracic surgery | 1065 | 68 (6.4) | 141 (13.2) | 87 (8.2) | 3 (0.3) |
| Specialized nurses (diabetes, emergency care, oncology, ostomy and unspecified) | 749 | 12 (1.6) | 299 (39.9) | 146 (19.5) | 5 (0.7) |
| Rehabilitation | 724 | 23 (3.2) | 453 (62.6) | 102 (14.1) | 0 (0.0) |
| Sports medicine | 638 | 0 (0.0) | 31 (4.9) | 0 (0.0) | 0 (0.0) |
| Psychiatry | 486 | 23 (4.7) | 27 (5.6) | 36 (7.4) | 0 (0.0) |
| Obstetricians (authorized for ultrasound and unspecified) | 354 | 21 (5.9) | 9 (2.5) | 9 (2.5) | 1 (0.3) |
| Physician assistant | 225 | 10 (4.4) | 160 (71.1) | 0 (0.0) | 0 (0.0) |
| Oral and maxillo-facial surgery | 207 | 1 (0.5) | 101 (48.8) | 4 (3.4) | 1 (0.9) |
| Intensive care | 138 | 9 (6.5) | 24 (17.4) | 3 (2.2) | 0 (0.0) |
| Radiology | 69 | 4 (5.8) | 35 (50.7) | 2 (2.9) | 0 (0.0) |
| Medical microbiology | 59 | 6 (10.2) | 1 (1.7) | 0 (0.0) | 49 (83.1) |
| Allergology | 55 | 14 (25.5) | 5 (9.1) | 0 (0.0) | 0 (0.0) |
| Dentists | 53 | 1 (1.9) | 3 (5.7) | 1 (1.9) | 0 (0.0) |
| Occupational and physio therapists | 11 | 0 (0.0) | 7 (63.6) | 0 (0.0) | 0 (0.0) |
| Psychologists | 5 | 0 (0.0) | 1 (20.0) | 0 (0.0) | 0 (0.0) |
| Pharmacists | 1 | 0 (0.0) | 0 (0.0) | 0 (0.0) | 0 (0.0) |
| Dieticians | 1 | 0 (0.0) | 0 (0.0) | 0 (0.0) | 0 (0.0) |
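The caption of this table names the Rogan–Gladen estimator, which corrects an apparent prevalence (the share flagged by an imperfect classifier) using the classifier's sensitivity and specificity. The sketch below shows the estimator in its standard textbook form. The function name and the input values are illustrative assumptions; the example is not meant to reproduce the figures in the tables, and the source does not state which sensitivity and specificity values were used.

```python
def rogan_gladen(apparent, sensitivity, specificity):
    """Standard Rogan-Gladen estimate of true prevalence.

    All arguments are proportions in [0, 1]. The estimate is
    (apparent + specificity - 1) / (sensitivity + specificity - 1),
    clipped to [0, 1] because sampling noise can push it outside that range.
    """
    denominator = sensitivity + specificity - 1
    if denominator <= 0:
        # The classifier is no better than chance; the estimator is undefined.
        raise ValueError("sensitivity + specificity must exceed 1")
    estimate = (apparent + specificity - 1) / denominator
    return min(1.0, max(0.0, estimate))


# Illustrative values only (not taken from the study):
# 30% of records flagged, sensitivity 90%, specificity 95%.
print(f"{rogan_gladen(0.30, 0.90, 0.95):.3f}")  # 0.294
```

With a perfect classifier (sensitivity and specificity both 1) the estimate equals the apparent prevalence, which is a quick sanity check on any implementation.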