Literature DB >> 26342218

Desiderata for computable representations of electronic health records-driven phenotype algorithms.

Huan Mo1, William K Thompson2, Luke V Rasmussen3, Jennifer A Pacheco4, Guoqian Jiang5, Richard Kiefer5, Qian Zhu6, Jie Xu7, Enid Montague7, David S Carrell8, Todd Lingren9, Frank D Mentch10, Yizhao Ni9, Firas H Wehbe3, Peggy L Peissig11, Gerard Tromp12, Eric B Larson8, Christopher G Chute13, Jyotishman Pathak14, Joshua C Denny15, Peter Speltz1, Abel N Kho7, Gail P Jarvik16, Cosmin A Bejan1, Marc S Williams17, Kenneth Borthwick18, Terrie E Kitchner11, Dan M Roden19, Paul A Harris1.   

Abstract

BACKGROUND: Electronic health records (EHRs) are increasingly used for clinical and translational research through the creation of phenotype algorithms. Currently, phenotype algorithms are most commonly represented as noncomputable descriptive documents and knowledge artifacts that detail the protocols for querying diagnoses, symptoms, procedures, medications, and/or text-driven medical concepts, and are primarily meant for human comprehension. We present desiderata for developing a computable phenotype representation model (PheRM).
METHODS: A team of clinicians and informaticians reviewed common features for multisite phenotype algorithms published in PheKB.org and existing phenotype representation platforms. We also evaluated well-known diagnostic criteria and clinical decision-making guidelines to encompass a broader category of algorithms.
RESULTS: We propose 10 desired characteristics for a flexible, computable PheRM: (1) structure clinical data into queryable forms; (2) recommend use of a common data model, but also support customization for the variability and availability of EHR data among sites; (3) support both human-readable and computable representations of phenotype algorithms; (4) implement set operations and relational algebra for modeling phenotype algorithms; (5) represent phenotype criteria with structured rules; (6) support defining temporal relations between events; (7) use standardized terminologies and ontologies, and facilitate reuse of value sets; (8) define representations for text searching and natural language processing; (9) provide interfaces for external software algorithms; and (10) maintain backward compatibility.
CONCLUSION: A computable PheRM is needed for true phenotype portability and reliability across different EHR products and healthcare systems. These desiderata are a guide to inform the establishment and evolution of EHR phenotype algorithm authoring platforms and languages.
© The Author 2015. Published by Oxford University Press on behalf of the American Medical Informatics Association.

Keywords:  computable representation; data models; electronic health records; phenotype algorithms; phenotype standardization

Year:  2015        PMID: 26342218      PMCID: PMC4639716          DOI: 10.1093/jamia/ocv112

Source DB:  PubMed          Journal:  J Am Med Inform Assoc        ISSN: 1067-5027            Impact factor:   4.497


INTRODUCTION

Electronic health records (EHRs) are increasingly used for clinical and translational research through the creation of phenotype algorithms, consisting of structured selection criteria designed to produce research-quality phenotypes. These algorithms operate on diverse classes of EHR data to select individuals with given traits (e.g., identifying records for continuous trait analyses or marking records as a case, a control, or neither for given conditions). Examples include identifying patients with hypothyroidism matched to hypothyroidism-free controls, evaluating cardiac conduction duration in electrocardiograms of “heart-healthy” individuals, and determining medication responses. Typically, these algorithms define the workflow for querying clinical data regarding diagnoses, procedures, medications, laboratory or radiology reports, and other EHR data, and can require natural language processing (NLP) or text mining. Multi-site studies have shown that these algorithms are often portable between sites. Currently, most phenotype algorithms are recorded as human-readable descriptive text documents that can be shared via knowledge bases such as the Phenotype KnowledgeBase (PheKB, http://phekb.org) and PhenotypePortal (http://phenotypeportal.org). Algorithms described via text and flowcharts (such as the type 2 diabetes mellitus [T2DM] algorithm shown in Figure 1 and the Desiderata section) are often ambiguous and require human translation to computable formats: implementation at each institution requires human experts to interpret the algorithm and translate it into executable operations and queries. This situation has hampered cross-institutional collaboration.
Figure 1:

Phenotype algorithm for identifying type 2 diabetes mellitus (T2DM) from electronic medical records (EMR or EHR). T1DM: type 1 diabetes mellitus; Dx: diagnoses, defined as recorded using International Classification of Diseases, 9th Revision (ICD-9) codes; med: medication; physcn: physicians; Rx: prescriptions. More details can be found in the appendix and on PheKB.org.

To enable cross-site phenotype execution, we suggest two needed initiatives: (1) creation of a common phenotype representation model (PheRM) as a computable representation of phenotype algorithms and (2) development of infrastructure to allow standards-based authoring and execution of PheRM-based algorithms for a variety of EHR systems. In this paper, we leveraged our experiences with the Electronic Medical Records and Genomics (eMERGE) Network, Pharmacogenomics Research Network (PGRN), Strategic Health IT Advanced Research Project (SHARP), and the National Patient-Centered Clinical Research Network (PCORnet) to propose desiderata for PheRM (Table 1).
Table 1:

A list of desiderata

Recommendations for clinical data representation to support phenotyping

1. Structure clinical data into queryable forms.

2. Recommend use of a common data model, but also support customization for the variability and availability of EHR data among sites.

Recommendations for phenotype representation models

3. Support both human-readable and computable representations.

4. Implement set operations and relational algebra.

5. Represent phenotype criteria with structured rules.

6. Support defining temporal relations between events.

7. Use standardized terminologies and ontologies, and facilitate reuse of value sets.

8. Define representations for text searching and natural language processing.

9. Provide interfaces for external software algorithms.

10. Maintain backward compatibility.

BACKGROUND

With the implementation of Meaningful Use (MU), EHRs have been increasing in ubiquity, functionality, and comprehensiveness. One recent advance has been the coupling of DNA bio-repositories to EHR data to enable genomic discoveries. In particular, the eMERGE network, a large-scale, multi-site network of 11 academic medical centers, has been at the forefront of mining biobank resources (both EHRs and associated DNA samples) for genomic medicine. Identification of research subjects from patient populations using phenotype algorithms is the starting point for these projects. Data components in phenotyping may include the full range of clinical data stored in the EHR, such as demographics, vital signs, laboratory tests, medications, diagnoses, procedures, and other documentation. However, each EHR can have a different data model. One approach to facilitating research interoperability among different sites has been the Observational Health Data Sciences and Informatics (OHDSI) program, which has built on the Observational Medical Outcomes Partnership (OMOP) common data model (CDM). This CDM provides a standardized data interface for a vibrant ecosystem of healthcare big-data analyses (http://omop.org/OSCAR), including tools, web applications, and application program interfaces. Similarly, PCORnet and the Informatics for Integrating Biology and the Bedside (i2b2)-based Shared Health Research Information Network are advancing common data models among their groups. These CDMs typically cover more focused, common data elements to enable a broad range of queries. Phenotype algorithms are typically developed in an iterative fashion, with expert review for validation, as rule-based models, but can also utilize machine learning methods. The efficacy of a phenotype algorithm is usually measured with information retrieval metrics, such as sensitivity, specificity, positive predictive value, and F-measure.
At present, most existing phenotype algorithms are expressed in pseudo-code and are not directly executable, because there are no widely adopted standards or underlying data structures. Thus, implementation requires human experts to translate descriptive algorithms from documents into ad hoc queries in local EHR research repositories, a process that is prone to inconsistencies and errors. One of the major efforts to establish a standard language for a related task is the Quality Data Model (QDM) from the National Quality Forum, which was designed to represent electronic clinical quality measures. QDM has been shown to be capable of representing many phenotype algorithms from PheKB. Systems such as i2b2, SHARP, and Eureka! Clinical Analytics all have internal data and algorithm representations, some of which may be shared across sites. In general, these systems provide graphical interfaces that can standardize queries, but complex scoring metrics, counting rules, and nested temporal references or sequencing of events, such as those found in many eMERGE algorithms, often exceed their capability. Phenotype algorithms have adopted a variety of logical and computational modalities. Modalities (e.g., scoring rules, counting rules) adopted in clinical diagnostic criteria have potential application in phenotype algorithms. In addition, machine learning and statistical model-based phenotype algorithms have been increasingly reported. Most current phenotype algorithms use both structured and unstructured EHR elements. Structured EHR data usually include demographic information (e.g., age, sex, race, death), billing codes (e.g., International Classification of Diseases, 9th Revision [ICD-9] and Current Procedural Terminology codes), most laboratory tests, vital signs, medications, and more.
Unstructured EHR elements usually include clinical notes (e.g., history and physical examinations, progress notes, discharge summaries, nursing notes), some non-billing medical problems and most family history elements, some medications records and refills, diagnostic reports (e.g., radiology, microbiology, pathology), and more.

METHODS

A group of clinicians and informaticians reviewed 21 eMERGE phenotype algorithms (Table 2) and several authoring tools (Measure Authoring Tool [www.emeasuretool.cms.gov], i2b2, Eureka!, PhenotypePortal, the Vanderbilt Synthetic Derivative, and the Marshfield Personalized Medicine Research Project interface) for common features. These phenotyping algorithms were of varying complexity and included both disease and drug-response phenotypes from the eMERGE and Pharmacogenomics of Very Large Populations (PGPop) networks. We also evaluated the ability to represent selected well-known diagnostic criteria (e.g., the Duke criteria for infective endocarditis and the CHADS2 criteria for anticoagulation therapy in atrial fibrillation [AF]) as potential phenotypes (see Supplementary Appendix Part 2). After proposal by a smaller team of investigators, the desiderata were evaluated and refined by all authors, who included investigators from eMERGE, PGRN, PGPop, SHARPn, PCORnet, and the HMO Research Network.
Table 2:

Features of selected algorithms available on PheKB

Algorithm | Data elements | Challenges informing desiderata (a)

Atrial fibrillation | CPT, ICD-9, ECG reports | Text-based queries (complex regular expressions; D8); using specific clinical documents (ECG reports; D2)

Cardiac conduction | CPT, ICD-9, laboratories, medications, ECG reports, PL | Sequential timeline of events (D6); NLP tasks: note section identification, concept extraction (D8); numeric readings (length of QRS interval) extracted from text-based ECG reports (D1)

Cataract | CPT, ICD-9, medications, clinical documents, ophthalmology image documents (handwritten) | Complex exclusions for control group (D4); complex rule model (D5); concept extraction with NLP (D8); handwritten document recognition

Clopidogrel poor metabolizers | CPT, ICD-9, laboratories, medications, H&P (with PMH), PL | Sequential events (D6); patient follow-up requirements (D5, 6); NLP concept extraction (D8)

Crohn’s disease | ICD-9, medications, clinical documents, pathology reports | Keyword search (D8); multiple groups (D4, 5); close relationship with ulcerative colitis

Dementia | ICD-9, medications | Code counts (D5)

Diabetic retinopathy | CPT, ICD-9, medications, PL, encounters with specialists | Initial population is from another algorithm (D4); concept extraction and negation detection with NLP (D8)

Drug-induced liver injury | ICD-9, medications, laboratories | Concept extraction with NLP (D8); complex rule model (D5); complex temporality (D6)

Height | ICD-9, laboratories, medications, height, age | Complex temporality (D6); event selection (D4, 6)

HDL | ICD-9, laboratories, medications | Identification of the first occurrence of events (D5); complex rule model with temporality (D5, 6)

Hypothyroidism | CPT, ICD-9, laboratories, medications, clinical documents | Selection and exclusion of events (D4); follow-up requirements for controls (D6)

Lipids | ICD-9, laboratories, medications | Event selection (D4, 6)

Multiple sclerosis | ICD-9, medications, PL, H&P, discharge summaries, other notes | Keyword search (D8); different levels of certainty (multiple groups; D4, 5)

Peripheral arterial disease | CPT, ICD-9, laboratories, medications, clinical notes, radiology reports | Multivariable logistic regression model (scoring; D5); extraction of ankle-brachial index from free text (D9); keyword extraction (D8)

RBC indices | CPT, ICD-9, laboratories, medications | Event selection and exclusion (D4, 6)

Rheumatoid arthritis | ICD-9, medications, clinical notes | Concept extraction (D8); ambiguity of abbreviations (e.g., “RA”; D8); logistic regression (D9)

Severe early childhood obesity | ICD-9, medications, vital signs, age | Event selection (D4); BMI calculation mapped to age-appropriate percentiles (D9)

Type 2 diabetes mellitus | ICD-9, laboratories, medications | Complex nested Boolean logic (D5)

Warfarin dose and response | Medications, laboratories, notes from anticoagulation clinics | Dosage extraction with NLP (D1, 8, 9); temporality of sequential events (D6)

WBC indices | CPT, ICD-9, laboratories, medications | Complex selection and exclusion of events (D4-6)

(a) D1–D10 in parentheses represent the desiderata elements corresponding to each challenge. All phenotype algorithms benefit from D1, D2, and D7.

BMI: body mass index; CPT: current procedural terminology; ECG: electrocardiogram; HDL: high-density lipoprotein; H&P: history and physical examination (notes); ICD-9: International Classification of Diseases, 9th Revision; NLP: natural language processing; PL: problem list; PMH: past medical history; QRS: the QRS complex which indicates ventricular depolarization in ECG; RA: rheumatoid arthritis; RBC: red blood cells; WBC: white blood cells.

DESIDERATA

Based on our review, we propose the following desiderata for PheRM and its software implementation (see Figure 2 and Table 1). We acknowledge that phenotyping is not a standalone practice; instead, it is closely coupled with EHR infrastructure and clinical practice. Therefore, we have included recommendations from the phenotyping community to the EHR development community (Desiderata 1 and 2) as well as recommendations regarding PheRM itself (Desiderata 3–10).
Figure 2:

Schematic of desiderata for computable phenotype electronic health record-driven phenotyping. Numerals 1–9 in the figure correspond to Desiderata 1–9 (Desideratum 10 is not depicted in this Figure).


Recommendations for clinical data representation to support phenotyping

1. Structure clinical data into queryable forms

In practice, clinical data are structured to support efficient retrieval of all clinical information for an individual patient. Phenotyping, by contrast, requires population-wide searches for individuals with similar characteristics (e.g., elevated LDL for a hyperlipidemia phenotype). Relational databases have been widely used for data storage as part of enterprise data warehouse solutions. To further facilitate querying, where possible, clinical data stored in such data warehouses should be atomized (as first normal form), such as storing a blood pressure as a systolic reading and a diastolic reading. Precalculating commonly derived observations (e.g., periods of drug exposure, as implemented in the OMOP drug era model) also facilitates more efficient querying. Currently available documents are mostly poorly structured and require information extraction or indexing to make them queryable.
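The atomization step above can be sketched in a few lines. This is a minimal, illustrative parser (the function name and dictionary keys are our own, not from any standard) that splits a composite blood-pressure string into the two queryable fields first normal form calls for:

```python
import re

def atomize_blood_pressure(raw: str):
    """Split a composite reading like '120/80' into queryable atomic fields.

    Returns a dict with separate systolic and diastolic integers (the
    first-normal-form shape recommended above), or None if unparseable.
    Real EHR feeds would need far more validation than this sketch.
    """
    m = re.fullmatch(r"\s*(\d{2,3})\s*/\s*(\d{2,3})\s*", raw)
    if not m:
        return None
    return {"systolic_mmHg": int(m.group(1)), "diastolic_mmHg": int(m.group(2))}
```

Once stored this way, a population-wide query such as "all patients with systolic > 140" becomes a simple comparison rather than string parsing at query time.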

2. Recommend a common data model, but also support customization for the variability and availability of EHR data among sites

To achieve a common PheRM, a common EHR data representation should be implemented where possible. Huser and Cimino analyzed three public integrated data repositories (IDRs) and proposed desiderata for their common design patterns. Potential candidates for a CDM include the Clinical Information Modeling Initiative, the Mini-Sentinel Common Data Model (recommended by the US Food and Drug Administration, www.mini-sentinel.org), the i2b2 Star Schema, and the OMOP CDM. Additionally, the Institute of Medicine has recently initiated an effort to standardize structured capture of social and behavioral domains in the EHR. EHR implementations and systems are heterogeneous, and CDMs must have the flexibility to adapt to a variety of institutional IDRs. One challenge in standardization is the labeling and referencing of specific document types; many EHR sites may have specific but nonstandard documents that address a particular question. Custom approaches can generically circumvent this limitation. For example, the colon polyp phenotype in the eMERGE network used colonoscopy surgical and pathology reports, which are not yet labeled in a standard manner or mapped to CDMs in most of the IDR systems in the network. This algorithm separates the implementation into transportable tasks (e.g., concept extraction through NLP, grouping, extraction of covariates), implemented as a fully executable Konstanz Information Miner (KNIME) package, and institutional adaptation tasks (i.e., database querying for the proper document types). Creating a portable infrastructure that implements the algorithmic rules, so that the user need only build the “last mile” of the solution, can accelerate algorithm implementation across sites.
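The split between site-specific and transportable tasks can be sketched as follows. This is a hypothetical structure, not the actual KNIME/eMERGE interface: the only piece a site must implement is the document-fetching callable (the "last mile"), while the ordered portable steps travel unchanged between institutions:

```python
def run_phenotype(site_fetch_documents, portable_steps, patient_ids):
    """Run a phenotype pipeline split into site-specific and portable parts.

    site_fetch_documents: site-implemented callable returning the relevant
        local documents for one patient (the 'last mile').
    portable_steps: transportable functions (NLP, grouping, covariate
        extraction, ...) applied in order to each patient's data.
    Returns {patient_id: final_result}. All names here are illustrative.
    """
    results = {}
    for pid in patient_ids:
        value = site_fetch_documents(pid)   # institutional adaptation task
        for step in portable_steps:         # transportable tasks
            value = step(value)
        results[pid] = value
    return results
```

The design point is that the portable steps never touch site internals, so swapping institutions only means swapping the fetch callable.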

Recommendations for phenotype representation models

3. Support both human-readable and computable representations

The investigators and initiators of most phenotype projects are clinical experts, epidemiologists, geneticists, and other subject matter experts. Because phenotype representations are important communication tools among researchers of different expertise, they should support a human-readable format, or transformation to one, so that clinical experts can ensure medical accuracy. Additionally, phenotype algorithms should include clear scientific and clinical definitions to enable creation of gold standards for evaluation and to facilitate reuse. For example, one algorithm may allow any cause of hypothyroidism when evaluating treatment efficacy, while another may focus only on primary autoimmune hypothyroidism when evaluating genetic causes. It is strongly preferable that the human-readable and computable formats be computationally coupled, such that one can be automatically generated from the other; otherwise, the two risk becoming inconsistent. For example, QDM provides a transformation from machine-readable XML to human-readable HTML via automated Extensible Stylesheet Language Transformations.

4. Implement set operations and relational algebra

Phenotyping is a population-level process that includes intersection (e.g., patients billed with ICD-9 codes for T2DM and patients treated with oral hypoglycemic medications), union (e.g., patients treated with angiotensin-converting enzyme inhibitors or patients treated with angiotensin receptor blockers), and exclusion (e.g., patients who have diabetes but have never had retinopathy diagnosed). Relational algebra, from database theory, is a canonical model for such set operations. The capability to handle set operations, and seamless connections to rule-based models (see Desideratum 5), will directly affect the usability of phenotype algorithms. Virtually all phenotype algorithms explicitly exclude certain other conditions, exposures, or laboratory results, operating either at the patient level or on particular episode(s) of care. Such exclusions are common in control algorithms but are also present in many case algorithms. For example, the methotrexate toxicity algorithm excludes patients with known organic liver disease and, for cases, excludes episodes of liver function test elevation while the patient is taking leflunomide (another common rheumatoid arthritis medication with liver toxicity as a side effect).
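These operations map directly onto set algebra over patient identifiers. The sketch below (our own illustrative function, loosely styled after the T2DM example) combines intersection and difference in one expression:

```python
def t2dm_cases(dx_t2dm, on_hypoglycemics, dx_t1dm):
    """Set-algebra sketch of case selection: patients with a T2DM billing
    code AND a hypoglycemic medication record (intersection), EXCLUDING
    anyone with a type 1 diabetes diagnosis (set difference).
    Inputs are sets of patient identifiers; purely illustrative."""
    return (dx_t2dm & on_hypoglycemics) - dx_t1dm
```

In a relational engine the same logic would be expressed as joins and anti-joins, which is why relational algebra is a natural substrate for a PheRM.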

5. Represent phenotype criteria with structured rules

PheRM should support structured, rule-based logical representations, an approach that has been successfully adopted in QDM, OMOP Health Outcomes of Interest (HOI, http://omop.org/HOI), and JBoss® Drools-based phenotyping.

Nested logical structure

Phenotyping algorithms can involve multiple complex logical steps, integrating various operations (e.g., Boolean, comparative, aggregative, temporal). QDM supports complex, nested logical structures. Interface tools such as i2b2, on the other hand, may limit the number of nesting levels, although some allow users to reference prior patient sets to support more complex workflows.

Boolean

Boolean values can be generated by comparative or temporal operations (see below) or by set projection (i.e., a nonempty set maps to TRUE, an empty set to FALSE). Many phenotyping rules are Boolean operations. Common Boolean operators include AND, OR, and NOT (analogous to set intersection, union, and complement). For example, in the T2DM algorithm (Figure 1), every step generates a Boolean value for each patient, following a decision tree to determine whether the patient is a case or a control.
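A nested Boolean rule tree of this kind can be evaluated with a short recursive interpreter. The encoding below (tuples tagged "and"/"or"/"not" over callable leaf predicates) is a hypothetical schema of ours, meant only to show that arbitrary nesting is cheap to support:

```python
def evaluate(rule, patient):
    """Evaluate a nested Boolean rule tree against one patient record.

    Rule encoding (illustrative): ('and', r1, r2, ...), ('or', ...),
    ('not', r), or a callable leaf that returns a Boolean for the patient,
    mirroring the per-step Boolean values of the T2DM decision tree."""
    if callable(rule):
        return rule(patient)
    op, *args = rule
    if op == "and":
        return all(evaluate(r, patient) for r in args)
    if op == "or":
        return any(evaluate(r, patient) for r in args)
    if op == "not":
        return not evaluate(args[0], patient)
    raise ValueError(f"unknown operator: {op}")
```

Because the interpreter is recursive, nesting depth is unbounded, unlike interface tools that cap the number of nested levels.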

Comparative operations

In phenotyping, comparative operators can be used to threshold a variable (e.g., the numeric result of a laboratory test, such as a white blood cell count) or to compare one numeric variable to another (e.g., comparing the LDL value after statin treatment to the LDL value before treatment). In addition, important raw data are not always ready to be used directly from an EHR. For example, body mass index (BMI) often needs to be calculated from weight and height. Thus, supporting basic arithmetic functions will broaden the applicability of a PheRM. Rules to exclude nonbiologic values may also be needed, such as a BMI of 1000 kg/m2.
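The BMI example combines all three needs in a few lines: arithmetic on raw EHR values, a comparative plausibility rule, and rejection of nonbiologic results. The cutoffs below are illustrative placeholders, not clinical guidance:

```python
def bmi(weight_kg, height_m):
    """Derive BMI from raw EHR weight and height, rejecting nonbiologic
    values with a simple plausibility window (bounds are illustrative)."""
    value = weight_kg / (height_m ** 2)
    if not (10 <= value <= 100):  # e.g., excludes a BMI of 1000 kg/m2
        return None
    return round(value, 1)
```

Returning None rather than the implausible number keeps a bad unit conversion or data-entry error from silently entering downstream comparisons.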

Aggregative operations

Aggregative functions (e.g., COUNT, FIRST) bridge across different levels of clinical information (e.g., from events to patients). In addition, more complex counting and scoring rules should be implemented. In fact, these rules are extremely popular in clinical diagnostic criteria (see Supplementary Appendix Part 2), including the Modified Duke Criteria for diagnosis of infective endocarditis, the CHADS2 score for antithrombotic therapy in AF, or the 2013 guidelines for cholesterol management. In addition, most regression-based predictive models in phenotyping can be represented as a scoring system, such as an algorithm to find rheumatoid arthritis.
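The CHADS2 score cited above is a compact example of such a counting/scoring rule. A sketch of its standard point assignments (1 point each for congestive heart failure, hypertension, age 75 or older, and diabetes; 2 points for prior stroke or TIA):

```python
def chads2(chf, hypertension, age, diabetes, prior_stroke_or_tia):
    """Scoring-rule sketch: the CHADS2 score for stroke risk in AF.
    Boolean inputs for each condition; age in years."""
    score = int(chf) + int(hypertension) + int(age >= 75) + int(diabetes)
    score += 2 * int(prior_stroke_or_tia)
    return score
```

A PheRM that supports aggregation plus weighted sums can express this entire family of clinical criteria, including regression-based models recast as scoring systems.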

Negation

In phenotyping, negation has two meanings. It can be a negative assertion (e.g., “patient denies headache”), which can be extracted with NLP, or an empty set returned by an aggregation, which many computer languages (e.g., Perl, Python) treat as false, similar to exclusion (see Desideratum 5). These two interpretations can conflict and need to be distinguished. Care must be taken not to infer negation from values that are missing merely due to the variability of EHR systems.

6. Support defining temporal relations between events

Temporal relationships are widely used in phenotype algorithms, especially for studying responses to and side effects of medications. The first type is sequential clinical events, such as an algorithm to identify patients who have subsequent cardiovascular events while still on clopidogrel, which requires ordered and appropriately spaced sequences of ischemic and medication events computed from the timestamps of records. On the other hand, temporality can also be captured through narrative text, requiring advanced NLP to parse grammatical features (e.g., past tense of verbs) and relative temporal expressions (“five years ago,” “1980s,” or location within a “past medical history” section). This strategy has been tested in the 2012 i2b2 challenge and applied in a prior analysis of colorectal cancer screening and in the identification of methotrexate-induced liver toxicity. Frequently in an EHR, the true incident date for a disease is not defined even when using NLP, since it may precede the patient’s enrollment in the given clinic or hospital system.
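The first, timestamp-based type of temporal relation can be sketched directly. In the spirit of the clopidogrel example (parameter names and the minimum-gap convention are our own assumptions), the function keeps only events that fall inside a medication exposure window:

```python
from datetime import date, timedelta

def events_while_on_drug(events, drug_start, drug_stop, min_gap_days=1):
    """Temporal-relation sketch: ischemic events occurring after therapy
    starts (by at least min_gap_days, to enforce ordering and spacing)
    and before it stops, computed from record timestamps.
    events: list of date objects; all parameters illustrative."""
    earliest = drug_start + timedelta(days=min_gap_days)
    return [e for e in events if earliest <= e <= drug_stop]
```

A PheRM needs first-class constructs for exactly this kind of "event B within window relative to event A" predicate, rather than leaving it to ad hoc site code.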

7. Use standardized terminologies and ontologies, and facilitate reuse of value sets

To allow phenotype algorithms in PheRM to be supported in different EHR systems, the use of standardized terminologies is important. Many EHR systems employ local ad hoc terminologies, but the use of local terminology should be limited in PheRM, because it hinders the portability of algorithms. Both HL7 and the OMOP CDM recommend standardized coding systems for clinical terminology, such as ICD-9/10 for billed diagnoses, RxNorm for medications, Logical Observation Identifiers Names and Codes (LOINC) for laboratory tests, and Systematized Nomenclature of Medicine-Clinical Terms (SNOMED-CT) for describing medical conditions. Therefore, EHR databases should provide mappings between standardized terminology systems and their local systems. Phenotype algorithms and quality measures often enumerate lists of concepts to define a medical condition; these lists are conventionally called value sets, such as all the ICD-9 codes that define T2DM. Authoring these value sets requires expertise and manual curation, and such sets should be available for reuse by other investigators. To facilitate authoring, i2b2 uses the intrinsic hierarchical structure of medical ontologies to allow a user to select all concepts under the same semantic node. i2b2 also supports local ontologies for the convenience they offer within a site’s research domain. Broad pathophysiological groupings of ICD-9 codes have been developed for genetic and clinical research, including codes designed to enable phenome-wide association studies and groupings designed for the Agency for Healthcare Research and Quality Clinical Classifications Software. The same value set can often be reused across a variety of projects. For instance, the value set of all angiotensin-converting-enzyme inhibitors can be used in research projects on diabetic nephropathy, congestive heart failure, or adverse drug reactions.
Such information can be stored and managed in the Value Set Authority Center (provided by the National Library of Medicine, https://vsac.nlm.nih.gov/), and the Common Terminology Services 2 (an Object Management Group standard, http://www.omg.org/spec/CTS2/).
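The hierarchical selection that i2b2 exploits can be approximated for ICD-9 by a code-prefix test, since ICD-9 codes nest lexically under their chapter and category prefixes. A minimal sketch (function name and inputs are illustrative; real value-set expansion would walk the ontology graph, not just match prefixes):

```python
def expand_value_set(ontology_codes, root_prefix):
    """Value-set authoring sketch: select every code under one semantic
    node. For ICD-9 this reduces approximately to a prefix test, e.g.,
    all '250.x' codes for diabetes mellitus. Illustrative only."""
    return sorted(c for c in ontology_codes if c.startswith(root_prefix))
```

The curated result is exactly the artifact worth publishing for reuse, e.g., via the Value Set Authority Center, so other investigators need not repeat the manual work.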

8. Define representations for text searching and NLP

Documentation of a detailed description of a patient’s clinical presentation and management in free text is indispensable in clinical care and in validating that a patient has a given disease. Clinical documents are commonly used for phenotype research. Text searching and NLP are major strategies to validate coded data or to define more granular phenotypes than is possible via structured data, such as subtypes of multiple sclerosis, physical exam findings, or the collection of all blood pressure measures. NLP-derived features have been widely applied in machine learning-based phenotype algorithms. A PheRM should therefore include representations for NLP and text searching. Recurring NLP patterns in phenotype algorithms have included: identifying targeted document types (e.g., colonoscopy reports), section location, concept identification, and negation and context filtering. Here, we propose that the PheRM should allow specification of inclusions or exclusions of elements based on: document type, section location, concept instances (with removal of non-patient and negated concepts), and keywords. In addition to NLP, keyword and regular-expression text searches have been applied widely in phenotype algorithms. For example, an AF algorithm includes a keyword search of electrocardiogram reports for different variants of phrasing for AF, such as “A-fib” and “Atr. Fibrillation.” With assistance from section separators and negation masks, text searching can achieve higher accuracy and faster execution (than comprehensive de novo NLP) for many phenotypes.
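The AF keyword search plus negation mask can be sketched with regular expressions. The patterns below are our own illustrations, not the published algorithm's, and the fixed-width negation window is deliberately crude:

```python
import re

# Match common phrasings of atrial fibrillation in ECG report text
# (e.g., "atrial fibrillation", "Atr. Fibrillation", "A-fib", "afib").
AF_PATTERN = re.compile(r"\b(atr(ial|\.)? ?fib(rillation)?|a-?fib)\b", re.I)
# Crude negation cue checked in a short window before the match.
NEGATION = re.compile(r"\bno (evidence of )?", re.I)

def mentions_af(report: str) -> bool:
    """True if the report affirmatively mentions AF; a negation phrase
    immediately preceding the match suppresses the hit. Illustrative."""
    m = AF_PATTERN.search(report)
    if not m:
        return False
    prefix = report[max(0, m.start() - 25):m.start()]
    return NEGATION.search(prefix) is None
```

Production algorithms would pair this with section separators (e.g., restricting the search to the rhythm interpretation section) and a real negation detector such as a NegEx-style rule set.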

9. Provide interfaces for external software algorithms

Development of phenotype algorithms is a rapidly evolving field, as are complementary computational algorithms and tools such as NLP and statistical models. For example, the severe childhood obesity algorithm requires age-appropriate BMI percentiles, which may require an external calculator and/or additional percentile data. Such dynamic tasks are difficult to represent or program with static languages (such as XML). The optimal method to "interface" with external software packages is likely to allow inclusion of new specifications of data elements that can be calculated externally to the phenotype algorithm. As a related endeavor, the eMERGE colon polyps algorithm was delivered as a standard executable KNIME workflow, with a simple Java Snippet unit connecting to a customized NLP package that parses the colonoscopy reports. The T2DM algorithm also has a KNIME workflow implementation available on PheKB.
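One way such an interface could look, sketched as a registry of pluggable calculator functions (the registry, the `evaluate` dispatcher, and the plain BMI formula are hypothetical stand-ins for a real percentile calculator):

```python
# Hypothetical registry of external calculators keyed by data-element name.
EXTERNAL_CALCULATORS = {}

def register(name):
    """Decorator that exposes a function as an external data-element calculator."""
    def wrap(fn):
        EXTERNAL_CALCULATORS[name] = fn
        return fn
    return wrap

@register("bmi")
def bmi(weight_kg, height_m):
    # Plain BMI; a real plug-in would return an age- and sex-specific percentile.
    return weight_kg / (height_m ** 2)

def evaluate(element, **kwargs):
    """Resolve a data element by dispatching to its registered calculator."""
    fn = EXTERNAL_CALCULATORS.get(element)
    if fn is None:
        raise KeyError(f"no external calculator registered for {element!r}")
    return fn(**kwargs)
```

New calculators (e.g., a CDC growth-chart percentile lookup) could then be registered without changing the phenotype algorithm's own representation.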

10. Maintain backward compatibility

A PheRM must be developed against existing EHR data, yet be robust enough to evolve and make use of new clinical data and standards. Unlike a quality measure, which focuses only on records from a limited and recent period, phenotype algorithms frequently use information dating back to the first day of EHR utilization in order to obtain enough data for statistical significance. This information usually comes from records spanning multiple distinct historical eras of EHR development and multiple generations of EHR client software and templates. An obvious example is the need to support both ICD9 and ICD10, as well as different historical versions of ICD9 (e.g., allergic bronchopulmonary aspergillosis was billed as "518.89" but has been billed as "518.6" since 1997). Since phenotype algorithms often examine historical data, such capabilities will still be required even after the United States formally adopts ICD10. Acknowledging that robust data normalization across EHRs (especially for historical data) is a difficult and as yet unachieved task, we recommend prioritizing PheRM functionality and support for the data elements that have been most widely used in previous phenotype algorithms: billing codes, RxNorm codes for medications, LOINC codes for laboratory tests, and diagnoses on problem lists. Progressive normalization of EHR data with CDMs may simplify backward compatibility.
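For the ABPA example, backward compatibility can be sketched by expanding a value set with its historical billing equivalents (the one-entry mapping below is illustrative; note that "518.89" was a broader catch-all code, so this expansion trades specificity for recall on pre-1997 records):

```python
# Illustrative mapping from a current code to its historical billing equivalents.
HISTORICAL_EQUIVALENTS = {"518.6": {"518.89"}}  # pre-1997 ABPA billing, per text

def expand_value_set(codes):
    """Return the value set together with historical equivalents of its codes."""
    expanded = set(codes)
    for code in codes:
        expanded |= HISTORICAL_EQUIVALENTS.get(code, set())
    return expanded
```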

An example: the desiderata applied to the T2DM phenotype algorithm

The T2DM algorithm first ascertains a T2DM diagnosis using grouped T2DM ICD-9 codes, use of oral hypoglycemic medications represented as grouped RxNorm codes (Desideratum 7), or multiple mentions of T2DM in clinical narratives (Desiderata 5 [a counting rule] and 8). It then differentiates T2DM from type 1 diabetes mellitus (T1DM) by excluding patients with T1DM ICD-9 codes (Desideratum 4 [exclusion]) and by requiring either the absence of insulin use or that oral medications precede insulin use (Desiderata 5 [aggregation function of first appearance] and 6); for some cases, it confirms the diabetes diagnosis with laboratory values. Its implementation and inter-institutional operation require support for the other listed desiderata (details in Supplementary Appendix A Part 1).
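The logic above can be sketched with set operations over per-patient features (the feature schema is hypothetical, and the two-mention threshold simply illustrates the counting rule):

```python
def t2dm_cases(patients):
    """patients: dict of id -> feature dict (illustrative schema, not PheKB's)."""
    # Ascertainment: diagnosis codes, oral hypoglycemics, or repeated mentions.
    ascertained = {
        pid for pid, f in patients.items()
        if f.get("t2dm_icd9")
        or f.get("oral_hypoglycemic")
        or f.get("t2dm_mentions", 0) >= 2            # counting rule (D5)
    }
    # Exclusion: T1DM codes, or insulin use not preceded by oral medications.
    excluded = {
        pid for pid, f in patients.items()
        if f.get("t1dm_icd9")                         # exclusion (D4)
        or (f.get("insulin_start") is not None
            and (f.get("oral_start") is None
                 or f["oral_start"] > f["insulin_start"]))  # temporal rule (D6)
    }
    return ascertained - excluded                     # set difference (D4)
```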

DISCUSSION

To develop these desiderata for a standardized PheRM, we investigated phenotyping modalities adopted in algorithms from the eMERGE, PGRN, SHARPn, and PGPop networks (Table 2), and evaluated popular clinical diagnostic and decision-making algorithms. We also investigated currently available phenotyping tools and found that these tools are evolving along with our proposed desiderata and are able to perform increasingly complex phenotype queries. As tests of the feasibility and sufficiency of these modalities, algorithms, and tools, the ongoing Phenotype Execution Modeling Architecture (PhEMA) collaboration (http://projectphema.org) has been actively implementing these desiderata and delivering phenotype workflows (Supplementary Appendix Part 3). Since phenotyping is a knowledge-intensive process based on a global evaluation of each patient, missing even a few features in a phenotyping platform or standard language will make it difficult to represent elementary algorithms. It is challenging to list all the technical requirements and details in one paper; thus, ongoing collaboration between developers of phenotype languages and tools and user communities (including both geneticists and clinicians) will be imperative. The desiderata (D1–10) we propose cover multiple domains: partnership with the evolution of EHR repositories (D1, D2); a balance between human-readable and computable representations (D3); common computational elements in phenotype algorithms (D4–D8); extensibility with external tools and modules (D9); and flexibility in accommodating different institutions and states of the art (D2, D10). While there are similarities between phenotype algorithms and healthcare-focused algorithms such as quality measures, eligibility criteria for clinical trials, and clinical decision support rules, the implementation of each differs.
For example, quality measures are often more focused on sensitivity, while phenotype algorithms for research studies, including EHR-based genomics studies, are typically more focused on positive predictive value. In addition, many phenotype algorithms use NLP and corroboration across different data elements, whereas quality measures and clinical decision support rely predominantly on structured data. For the purposes of this paper, "phenotype algorithm" typically refers to decision logic applied for EHR-based biomedical research. Nevertheless, we anticipate that most desiderata for phenotype algorithms may apply to other healthcare applications; for example, we have successfully translated the "last mile" solution in phenotyping (described in Desideratum 2) to electronic clinical quality measures. However, a formal evaluation across all categories of algorithms is outside the scope of this paper. Strategies for phenotyping are evolving with new informatics and data representation methods, new EHR data elements, and new medical knowledge, and a PheRM will need to be able to evolve continually with them. A persistent trend, however, has been the need to access detailed information, and its context, from a variety of sources. For example, "glaucoma" diagnosed or mentioned by an ophthalmologist (a matched specialist) provides much higher confidence than when mentioned in self-report or by non-ophthalmologist clinicians. In addition, a diagnosis is typically developed and confirmed by a clinician over time, so a standalone assertion in the medical record can be misleading. Computational reconstruction of clinical timelines, connecting diverse clinical elements using medical knowledge, may provide a more accurate capture of phenotypes.
For example, elevated liver function tests in a patient with rheumatoid arthritis can be a side effect of a medication, but may also result from a primary viral infection, heart failure, sepsis, or other causes. Designing computational medical knowledge maps to interrelate different information sources may improve phenotyping; examples include historical expert systems such as INTERNIST-1 and DXplain, and more recent data-driven resources such as the Side Effect Resource (SIDER) 2 and the MEDication-Indication resource (MEDI). Several limitations caution the interpretation of this work. First, these desiderata are based on the experience of the authors and on the algorithms and systems explored to date; a robust community-based phenotyping ecosystem will sustain the continuous evolution of these desiderata as knowledge and experience expand. Second, these desiderata mainly address knowledge-driven phenotyping approaches and do not yet address data-driven approaches, such as unsupervised or deep learning. Third, these desiderata were written during a period of rapid EHR evolution and adoption driven by Meaningful Use incentives; the availability of CDMs, standards, and data types will evolve as the field continues to mature. Fourth, the degree to which these desiderata will apply to international experiences with computable phenotypes is unclear. Finally, these desiderata may not be addressable by any single system, but rather represent an overarching set of goals for such work. Our mission is to create research-quality information from data gathered in a non-research enterprise. Clinically derived data come with the advantages of larger scale, reduced cost, repeated observations, and the ability to observe rare events. It is important to understand that this is not just about technology: efforts by clinicians to record quality data and use EHRs robustly enable greater secondary use potential.
We are optimistic that this endeavor will allow EHR-derived data to align with, and expand upon, the "research quality prospective data" obtained from clinical trials.

FUNDING

This work was funded primarily by R01 GM105688 from the National Institute of General Medical Sciences. Additional contribution came from the eMERGE Network sites funded by the National Human Genome Research Institute through the following grants: U01 HG006828 (Cincinnati Children’s Hospital Medical Center); U01-HG004610 and U01-HG006375 (Group Health Cooperative/University of Washington); U01-HG004608 (Marshfield Clinic); U01-HG04599 and U01-HG06379 (Mayo Clinic); U01-HG004609 and U01-HG006388 (Northwestern University); U01-HG006389 (Essentia Institute of Rural Health); U01-HG04603 and U01-HG006378 (Vanderbilt University); and U01-HG006385 (Vanderbilt University serving as the Coordinating Center). Additional support came from R01-LM010685 and R01 GM103859.

COMPETING INTEREST

None.

CONTRIBUTORS

J.C.D., J.P., and W.K.T. provided leadership for the project; H.M., J.C.D., J.P., W.K.T., J.A.P., L.V.R., C.G.C., and G.T. drafted desiderata elements; H.M. and J.C.D. drafted the manuscript; W.K.T., J.A.P., L.V.R., R.K., and H.M. led executability and adaptability studies; G.J. and R.K. led standardization and library development; J.X. and E.M. led the environmental scan and usability studies; Q.Z., J.A.P., and H.M. led algorithm modeling studies; L.V.R. and P.S. led authoring environment and modularization studies; F.W., A.N.K., and G.J. led clinical data modeling; W.K.T. and C.A.B. led NLP studies; all authors contributed expertise and edits.

SUPPLEMENTARY MATERIAL

Supplementary material is available online at http://jamia.oxfordjournals.org/.