Jingzhi Yu1, Jennifer A Pacheco2, Anika S Ghosh2, Yuan Luo2, Chunhua Weng3, Ning Shang3, Barbara Benoit4, David S Carrell5, Robert J Carroll6, Ozan Dikilitas7, Robert R Freimuth8, Vivian S Gainer4, Hakon Hakonarson9, George Hripcsak3, Iftikhar J Kullo7, Frank Mentch9, Shawn N Murphy4, Peggy L Peissig10, Andrea H Ramirez6, Nephi Walton11, Wei-Qi Wei6, Luke V Rasmussen12.
Abstract
INTRODUCTION: Currently, one of the commonly used methods for disseminating electronic health record (EHR)-based phenotype algorithms is providing a narrative description of the algorithm logic, often accompanied by flowcharts. A challenge with this mode of dissemination is the potential for under-specification in the algorithm definition, which leads to ambiguity and vagueness.
Keywords: Algorithm: Natural Language; Ambiguity; Electronic Health Records (EHR); Phenotyping; Under-Specification; Vagueness
Year: 2022 PMID: 35090449 PMCID: PMC8796627 DOI: 10.1186/s12911-022-01759-z
Source DB: PubMed Journal: BMC Med Inform Decis Mak ISSN: 1472-6947 Impact factor: 2.796
Fig. 1 Example of raising issues of vagueness and under-specification in the PheKB database, from the Chronic Kidney Disease phenotype. https://phekb.org/phenotype/chronic-kidney-disease
Counts of vagueness and under-specification in narrative phenotype algorithms
| Code | Category | Sub-category | Description | Total instances | Phenotype count (%) |
|---|---|---|---|---|---|
| 1.1 | Definition of variable | Attributes of variable | Under-specification in attributes (min, max, etc.) of a variable | 47 | 13 (68.4%) |
| 1.1.1.a | Time point | Temporal entity | Under-specification of the time anchor or point of reference for a certain criterion | 22 | 11 (57.9%) |
| 1.1.1.b | Time point | Temporal interval | Under-specification of the time range to search when looking for a certain criterion (diagnosis, medication, lab, etc.) | 6 | 5 (26.3%) |
| 1.1.2.a | Threshold | Missing threshold | Vagueness or under-specification for a criterion in the phenotype algorithm | 2 | 2 (10.5%) |
| 1.1.2.b | Threshold | Quantifying qualitative terms | Vagueness or under-specification in the qualitative term describing a criterion (e.g., chronic, young, old, severe, negative, positive) and lacking quantitative values | 1 | 1 (5.3%) |
| 1.1.2.c | Threshold | Units | The units associated with the numeric value (e.g., mg/dL) are not specified | 2 | 1 (5.3%) |
| 1.2 | Definition of variable | Alternatives to missing data | Request for instructions when data elements not available | 6 | 5 (26.3%) |
| 1.3 | Definition of variable | Code/acronym/term definition | Under-specification regarding acronyms, variables or codes. This could be related to: 1. Local and unique codes 2. Coding/terminology system (including use of base codes) 3. Vague terminology/codes | 28 | 11 (57.9%) |
| 1.4 | Definition of variable | Location in EHR | Under-specification regarding how or where certain criteria/variables should be obtained within the EHR | 10 | 6 (31.6%) |
| 2.1 | Data dictionary | Data delivery | Under-specification regarding how the data dictionaries should be structured and how they should be delivered to the site | 3 | 2 (10.5%) |
| 2.2 | Data dictionary | Information inclusion | Under-specification regarding what results should be included in the data dictionary | 31 | 10 (52.6%) |
| 2.3 | Data dictionary | Results presentation and formatting | Under-specification regarding the formatting of the results in the data dictionary. This may include numeric formatting (e.g., number of decimal places), or granularity of units (e.g., date of birth vs. age) | 27 | 8 (42.1%) |
| 3.1 | Logic | Discordant logic | Discrepancy between the written description and the flow chart or the procedures in the flowchart | 17 | 8 (42.1%) |
| 3.2 | Logic | Missing rationale or context | Under-specification in the rationale and/or context of the phenotype for its appropriate application | 11 | 8 (42.1%) |
| 3.3 | Logic | Population criteria | Vagueness and under-specification in the criteria differences between the case and control or other cohort definitions | 20 | 11 (57.9%) |
A total of 304 instances were found across 253 comments (a single comment could exhibit more than one category). Sub-codes are more specific and considered distinct from a higher-level code. "Total instances" denotes the aggregate count of unique instances of under-specification found across all phenotypes.
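The percentages in the "Phenotype count (%)" column are consistent with a denominator of 19 phenotypes (for example, 13/19 = 68.4%). That denominator is inferred from the table and is not stated in the text here. Under that assumption, a minimal Python sketch recomputes the prevalence of each code and picks out the categories that occur in more than 50% of phenotypes. The names `phenotype_counts`, `N_PHENOTYPES`, and `prevalence` are illustrative only.

```python
# Phenotype counts per code, copied from the "Phenotype count (%)" column.
phenotype_counts = {
    "1.1": 13, "1.1.1.a": 11, "1.1.1.b": 5,
    "1.1.2.a": 2, "1.1.2.b": 1, "1.1.2.c": 1,
    "1.2": 5, "1.3": 11, "1.4": 6,
    "2.1": 2, "2.2": 10, "2.3": 8,
    "3.1": 8, "3.2": 8, "3.3": 11,
}

# Assumption: 19 phenotypes in total, inferred from the reported
# percentages (e.g. 13 / 19 = 68.4%); not stated explicitly in the text.
N_PHENOTYPES = 19


def prevalence(count, total=N_PHENOTYPES):
    """Percentage of phenotypes exhibiting a code, rounded to one decimal."""
    return round(100 * count / total, 1)


# Codes that appear in more than half of the phenotypes.
prevalent = {
    code: prevalence(n)
    for code, n in phenotype_counts.items()
    if n / N_PHENOTYPES > 0.5
}

for code, pct in sorted(prevalent.items()):
    print(f"{code}: {pct}%")
```

With these inputs, the selected codes are 1.1, 1.1.1.a, 1.3, 2.2, and 3.3, which are the five categories illustrated in the examples table.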
Fig. 2 Categories of under-specification and other common issues identified in narrative phenotype algorithms
Examples of under-specification in categories with prevalence in over 50% of narrative phenotype algorithms
| Code | Category | Sub-category | Examples |
|---|---|---|---|
| 1.1 | Definition of variable | Attributes of variable | 1. For Bilirubin, do we need to collect total bilirubin, [conjugated], [unconjugated], or all 2. By critical care, do you mean emergency department and/or other "critical" departments, & if so, which types? intensive care, and/or some type of cardiac critical care? |
| 1.1.1.a | Time point | Temporal entity | 1. TPN Dx are only excluded if the [sic] occur in the 365 days before first NAFLD Diagnosis code? 2. Which date selected if there are multiple CPTs on multiple dates? What is the definition of the 1st MACE event? |
| 1.3 | Definition of variable | Code/acronym/term definition | 1. Clarification on use of “3 digit” ICD code 2. Are LOINC codes available for MRSA culture tests? |
| 3.3 | Logic | Population criteria | 1. Case 1 & 2 criteria are "AND" criteria, i.e., all 3 criteria must be met? 2. How can we define a case who satisfies the criteria defined for both case 1 and 2? |
| 2.2 | Data dictionary | Information inclusion | 1. Data dictionary indicates that you want height, weight, and BMI as repeated measures. Should the user include all such codes? 2. Do you only want the encounters (LOS) that only have height or weight? |