Huan Mo, William K Thompson, Luke V Rasmussen, Jennifer A Pacheco, Guoqian Jiang, Richard Kiefer, Qian Zhu, Jie Xu, Enid Montague, David S Carrell, Todd Lingren, Frank D Mentch, Yizhao Ni, Firas H Wehbe, Peggy L Peissig, Gerard Tromp, Eric B Larson, Christopher G Chute, Jyotishman Pathak, Joshua C Denny, Peter Speltz, Abel N Kho, Gail P Jarvik, Cosmin A Bejan, Marc S Williams, Kenneth Borthwick, Terrie E Kitchner, Dan M Roden, Paul A Harris.
Abstract
BACKGROUND: Electronic health records (EHRs) are increasingly used for clinical and translational research through the creation of phenotype algorithms. Currently, phenotype algorithms are most commonly represented as noncomputable descriptive documents and knowledge artifacts that detail the protocols for querying diagnoses, symptoms, procedures, medications, and/or text-driven medical concepts, and are primarily meant for human comprehension. We present desiderata for developing a computable phenotype representation model (PheRM).
Keywords: computable representation; data models; electronic health records; phenotype algorithms; phenotype standardization
Year: 2015 PMID: 26342218 PMCID: PMC4639716 DOI: 10.1093/jamia/ocv112
Source DB: PubMed Journal: J Am Med Inform Assoc ISSN: 1067-5027 Impact factor: 4.497
Figure 1: Phenotype algorithm for identifying type 2 diabetes mellitus (T2DM) from electronic medical records (EMR or EHR). T1DM: type 1 diabetes mellitus; Dx: diagnoses, defined as recorded using International Classification of Diseases, 9th Revision (ICD-9) codes; med: medication; physcn: physicians; Rx: prescriptions. More details can be found in the appendix and on PheKB.org.
A list of desiderata

Recommendations for clinical data representation to support phenotyping
1. Structure clinical data into queryable forms.
2. Recommend use of a common data model, but also support customization for the variability and availability of EHR data among sites.

Recommendations for phenotype representation models
3. Support both human-readable and computable representations.
4. Implement set operations and relational algebra.
5. Represent phenotype criteria with structured rules.
6. Support defining temporal relations between events.
7. Use standardized terminologies, ontologies, and facilitate reuse of value sets.
8. Define representations for text searching and natural language processing.
9. Provide interfaces for external software algorithms.
10. Maintain backward compatibility.
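As an illustration of how desiderata 4 to 6 might fit together, the following is a minimal sketch, not the PheRM itself. The `Event` record, the helper functions, the patient identifiers, and the codes `DX_A`, `DX_B`, and `MED_X` are all hypothetical placeholders introduced here; they are not real ICD-9 or medication codes and do not come from the article.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass(frozen=True)
class Event:
    patient_id: str
    kind: str   # hypothetical labels: "dx" (diagnosis) or "med" (medication)
    code: str   # placeholder codes, not real ICD-9 or drug codes
    when: date


def patients_with(events, kind, codes, min_count=1):
    """Structured count rule (D5): patients with >= min_count matching events."""
    counts = {}
    for e in events:
        if e.kind == kind and e.code in codes:
            counts[e.patient_id] = counts.get(e.patient_id, 0) + 1
    return {pid for pid, n in counts.items() if n >= min_count}


def patients_with_sequence(events, first, second, within):
    """Temporal relation (D6): `second` occurs 0..within after `first`."""
    matched = set()
    for a in events:
        if (a.kind, a.code) != first:
            continue
        for b in events:
            same_patient = b.patient_id == a.patient_id
            if same_patient and (b.kind, b.code) == second:
                if timedelta(0) <= b.when - a.when <= within:
                    matched.add(a.patient_id)
    return matched


# Made-up example data for three patients.
EVENTS = [
    Event("p1", "dx", "DX_A", date(2014, 1, 10)),
    Event("p1", "dx", "DX_A", date(2014, 3, 1)),
    Event("p1", "med", "MED_X", date(2014, 3, 15)),
    Event("p2", "dx", "DX_A", date(2014, 2, 1)),
    Event("p3", "dx", "DX_A", date(2014, 1, 5)),
    Event("p3", "dx", "DX_A", date(2014, 2, 5)),
    Event("p3", "dx", "DX_B", date(2014, 2, 10)),
    Event("p3", "med", "MED_X", date(2013, 12, 1)),
]

# Set operations (D4): include by count rule, exclude by set difference,
# then intersect with the patients satisfying the temporal relation.
included = patients_with(EVENTS, "dx", {"DX_A"}, min_count=2)
excluded = patients_with(EVENTS, "dx", {"DX_B"})
treated = patients_with_sequence(
    EVENTS, ("dx", "DX_A"), ("med", "MED_X"), timedelta(days=90)
)
cases = (included - excluded) & treated
print(sorted(cases))
```

Representing each criterion as a function that returns a set of patient identifiers keeps inclusion, exclusion, and temporal constraints composable through ordinary set algebra, which is one plausible reading of what desideratum 4 asks for.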
Features of selected algorithms available on PheKB
| Algorithms | Data elements | Challenges informing desiderata (a) |
|---|---|---|
| Atrial fibrillation | CPT, ICD-9, ECG reports | Text-based queries (complex regular expressions; D8); using specific clinical documents (ECG reports; D2) |
| Cardiac conduction | CPT, ICD-9, laboratories, medications, ECG reports, PL | Sequential timeline of events (D6); NLP tasks: note section identification, concept extraction (D8); numeric readings (length of QRS interval) extracted from text-based ECG reports (D1) |
| Cataract | CPT, ICD-9, medications, clinical documents, ophthalmology image documents (handwritten) | Complex exclusions for control group (D4); complex rule model (D5); concept extraction with NLP (D8); handwritten document recognition |
| Clopidogrel poor metabolizers | CPT, ICD-9, laboratories, medications, H&P (with PMH), PL | Sequential events (D6); patient follow-up requirements (D5, 6); NLP concept extraction (D8) |
| Crohn’s disease | ICD-9, medications, clinical documents, pathology reports | Keyword search (D8); multiple groups (D4, 5); close relationship with ulcerative colitis |
| Dementia | ICD-9, medication | Code counts (D5) |
| Diabetic retinopathy | CPT, ICD-9, medications, PL, encounter with specialists | Initial population is from another algorithm (D4); concept extraction and negation detection with NLP (D8) |
| Drug-induced liver injury | ICD-9, medications, laboratories | Concept extraction with NLP (D8); complex rule model (D5); complex temporality (D6) |
| Height | ICD-9, laboratories, medications, height, age | Complex temporality (D6); event selection (D4, 6) |
| HDL | ICD-9, laboratories, medications | Identification of the first occurrence of events (D5); complex rule model with temporality (D5, 6) |
| Hypothyroidism | CPT, ICD-9, laboratories, medications, clinical documents | Selection and exclusion of events (D4); follow-up requirements for control (D6) |
| Lipids | ICD-9, laboratories, medications | Event selection (D4, D6) |
| Multiple sclerosis | ICD-9, medications, PL, H&P, discharge summaries, other notes | Keyword search (D8); different levels of certainty (multiple groups, D4, 5) |
| Peripheral arterial disease | CPT, ICD-9, laboratories, medications, clinical notes, radiological reports | Multivariable logistic regression model (scoring, D5); extraction of ankle-brachial index from free text (D9); keyword extraction (D8) |
| RBC indices | CPT, ICD-9, laboratories, medications | Event selection and exclusion (D4, 6) |
| Rheumatoid arthritis | ICD-9, medications, clinical notes | Concept extraction (D8); ambiguity of abbreviations (i.e., “RA”; D8); logistic regression |
| Severe early childhood obesity | ICD-9, medications, vital signs, age | Event selection (D4); BMI calculation and mapping to age-appropriate percentiles (D9) |
| Type 2 diabetes mellitus | ICD-9, laboratories, medications | Complex nested Boolean logic (D5) |
| Warfarin dose and response | Medications, laboratories, notes from anticoagulation clinics | Dosage extraction with NLP (D1, 8, 9); temporality of sequential events (D6) |
| WBC indices | CPT, ICD-9, laboratories, medications | Complex selection and exclusion of events (D4–6) |

(a) D1–D10 in parentheses represent the desiderata elements corresponding to each challenge. All phenotype algorithms benefit from D1, D2, and D7.
BMI: body mass index; CPT: current procedural terminology; ECG: electrocardiogram; HDL: high-density lipoprotein; H&P: history and physical examination (notes); ICD-9: International Classification of Diseases, 9th Revision; NLP: natural language processing; PL: problem list; PMH: past medical history; QRS: the QRS complex which indicates ventricular depolarization in ECG; RA: rheumatoid arthritis; RBC: red blood cells; WBC: white blood cells.
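Several of the challenges in the table involve pulling a numeric reading out of report text, such as the QRS interval in the cardiac conduction algorithm (D1, D8). The sketch below shows one way a text-based query of that kind could look. The report wording, the pattern `QRS_PATTERN`, and the function `extract_qrs_ms` are assumptions made for illustration; the actual expressions used by the PheKB algorithms are not given here and are likely more elaborate.

```python
import re

# Hypothetical pattern: "QRS", an optional "duration"/"interval" label,
# an optional ":" or "=", then a 2-3 digit value in ms or msec.
QRS_PATTERN = re.compile(
    r"\bQRS(?:\s+(?:duration|interval))?\s*[:=]?\s*(\d{2,3})\s*(?:ms|msec)\b",
    re.IGNORECASE,
)


def extract_qrs_ms(report_text):
    """Return the first QRS value (in ms) found in the text, or None."""
    match = QRS_PATTERN.search(report_text)
    return int(match.group(1)) if match else None


# Made-up report snippets, not taken from any real record.
reports = [
    "Sinus rhythm. QRS duration: 112 ms. QT 400 ms.",
    "qrs interval = 140ms",
    "No intervals reported.",
]
values = [extract_qrs_ms(text) for text in reports]
print(values)
```

Turning the extracted value into a structured, queryable field is what lets later rule-based criteria (for example a threshold on QRS length) treat it like any other laboratory result.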
Figure 2: Schematic of desiderata for computable phenotype electronic health record-driven phenotyping. Numerals 1–9 in the figure correspond to Desiderata 1–9 (Desideratum 10 is not depicted in this figure).