Pascal S Brandt (1), Jennifer A Pacheco (2), Luke V Rasmussen (2)
1. Department of Biomedical Informatics and Medical Education, University of Washington, Seattle, Washington, USA.
2. Feinberg School of Medicine, Northwestern University, Chicago, Illinois, USA.
Abstract
OBJECTIVE: The objective of this study is to create a repository of computable, technology-agnostic phenotype definitions for the purposes of analysis and automatic cohort identification. MATERIALS AND METHODS: We selected phenotype definitions from PheKB and excluded definitions that did not use structured data or were not used in published research. We translated these definitions into the Clinical Quality Language (CQL) and Fast Healthcare Interoperability Resources (FHIR) and validated them using code review and automated tests. RESULTS: A total of 33 phenotype definitions met our inclusion criteria. We developed 40 CQL libraries, 231 value sets, and 347 test cases. To support these test cases, a total of 1624 FHIR resources were created as test data. DISCUSSION AND CONCLUSION: Although a number of challenges were encountered while translating the phenotypes into structured form, such as requiring specialized knowledge, or imprecise, ambiguous, and conflicting language, we have created a repository and a development environment that can be used for future research on computable phenotypes.
Sets of criteria that are used to identify cohorts of patients for clinical research
are referred to as phenotype definitions, or
phenotypes for brevity. Phenotype definitions must be executed
by implementers, often by manually translating textual descriptions of selection
criteria into executable code, in order to identify patient cohorts. The
heterogeneity of both the representation of the logic and the data model
that underlies the logic is a key barrier to evaluating phenotype implementations
to gain insight into the process of phenotype development.

Systems such as Informatics for Integrating Biology and the Bedside (i2b2) and the Observational
Health Data Sciences and Informatics (OHDSI) platform provide computable phenotypes that are
bound to their respective data models. These computable representations are
automatically translated into queries for a specific database system when the
phenotype is used for cohort identification. However, i2b2 and OHDSI phenotypes
cannot be directly shared between platforms, nor are they comparable without some
translation.

Within the electronic clinical quality measure space, Clinical Quality Language (CQL;
https://cql.hl7.org/) is used for
the representation of similar criteria but is technology-agnostic. This means that
it is not coupled to any specific software implementation. Furthermore, CQL has been
shown to be a feasible logical expression language for representing clinically
validated phenotype definitions. It supports a wide range of Boolean, temporal,
aggregate, and other operations. The language is data model independent but does
require the selection of a data model when writing CQL. It works out of the box with
Fast Healthcare Interoperability Resources (FHIR; https://hl7.org/FHIR/), which is widely used and has recently become
the legally required standard for clinical data exchange in the United States.
Additionally, the Common Data Model Harmonization project provides mappings from FHIR to many other
common healthcare data models, maximizing potential utilization of CQL-based
phenotype definitions.
OBJECTIVES
In this work, we developed a database of phenotype definitions represented in a
structured, unambiguous, computable, technology-agnostic standardized format. This
representation would allow automated computational analysis and cohort
identification against data platforms that support FHIR and CQL, or for which
another mapping exists. We additionally provide a suite of test cases that, together
with the provided testing configuration, can be used to validate the correctness of
the phenotype definitions. Targeted to informaticians and research data analysts, we
provide these phenotypes to the clinical research informatics community as an
initial repository of computable technology-agnostic phenotype definitions, which we
hope we and others will extend over time using the same methods, tools, and
development environment.
MATERIALS AND METHODS
Data set
We selected phenotype definitions from the PheKB phenotype repository, as it is the most
mature and widely used in the United States. PheKB was initiated in 2012 and has
had phenotypes contributed from many projects and collaborative groups, most
notably the electronic Medical Records and Genomics Network. The repository is continuously growing,
with over 100 phenotypes in various stages of development.

We extracted all public phenotype definitions available on May 22, 2020. We
automatically selected from the list of publicly available phenotypes those with
a status of “Final.” From this collection, we reviewed each
definition and excluded any phenotype entry that (1) did not make use of
structured data [ie, was entirely natural language processing (NLP)-based], (2)
was a generic repository for submitting data, or (3) was not used in a
published research study. These criteria were chosen to ensure that our final
data set consisted only of completed and clinically validated phenotype
definitions. For each phenotype that met these criteria, we downloaded all files
linked to the phenotype definition. Source code for this step of the process is
available on GitHub (https://github.com/PheMA/phekb-export).
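
The selection rules above can be read as a simple filter. The following is a minimal sketch rather than the code in the phekb-export repository: the `PhenotypeEntry` record and all of its field names are hypothetical, since the structure of the PheKB export is not described here.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class PhenotypeEntry:
    # Hypothetical record; the real PheKB export may use different fields.
    name: str
    status: str
    uses_structured_data: bool   # False if the definition is entirely NLP-based
    is_generic_repository: bool  # True for placeholder or data-submission entries
    has_publication: bool        # True if used in a published research study


def meets_inclusion_criteria(entry: PhenotypeEntry) -> bool:
    """Apply the status filter and the three exclusion criteria from the text."""
    return (
        entry.status == "Final"
        and entry.uses_structured_data
        and not entry.is_generic_repository
        and entry.has_publication
    )


def select_phenotypes(entries: List[PhenotypeEntry]) -> List[PhenotypeEntry]:
    """Return only the entries that satisfy every criterion, in input order."""
    return [entry for entry in entries if meets_inclusion_criteria(entry)]
```

Keeping each criterion as a separate Boolean field makes it straightforward to report how many entries were dropped at each step, as the selection figure in the Results does.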
Translation
Two authors (PSB and LVR) independently translated each of the phenotype
definitions using the available metadata and artifacts downloaded from PheKB.
One author was primarily responsible for the translation of each phenotype into
CQL and FHIR, but the authors were not entirely blinded. Group discussion
amongst all authors was used to confirm interpretation of phenotype definitions
that were ambiguous.

Standard terminologies such as the International Classification of Diseases
versions 9 (ICD-9) and 10 (ICD-10), Current Procedural Terminology (CPT),
Logical Observation Identifiers Names and Codes (LOINC), and RxNorm were used
for structured data and were represented using FHIR ValueSet
resources. These terminologies were usually specified in the phenotype
definitions, but where they were not, we used the recommended default
terminologies from the FHIR standard. We developed an open-source tool to
translate value sets in various formats into FHIR resources, and built an
interface to allow web-based interaction with the tool (https://github.com/PheMA/terminology-manager). The tool supports
converting comma-separated values (CSV) files, as well as concept sets exported from the OHDSI
platform, into ValueSet resources. It also supports searching,
inspecting, and importing value sets directly from the Value Set Authority
Center (VSAC), using
the VSAC FHIR server.

For each phenotype, we created a single CQL library that contained the logic
required to identify matching patients. Logic shared between phenotypes was
authored in shared libraries that were imported using the CQL
include declaration. The shared libraries were iteratively
refined and expanded (with already developed phenotypes refactored to
incorporate new library functions) during the course of the project.

We did not implement NLP logic, as there is no widely accepted standard
representation or implementation of this type of logic. To our knowledge, there
is currently no way to natively express NLP constructs using FHIR or CQL,
although this is an active area of research.

We adopted a number of conventions for the standards-based representation. First,
in this work, we only represent phenotype cases, and not controls, suspected
cases, or subtypes. Case definitions usually contain the most, and most varied,
criteria; thus, they serve as a good basis for analysis or extension. We adopted
the convention of creating a CQL statement in each library called
“Case,” which represents the entry point for evaluating the
phenotype definition. Additionally, unless explicitly stated otherwise, we
modeled drugs based on their RxNorm ingredient name and lab values based on the
highest ranked appropriate LOINC code.
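
To illustrate the value set conversion described earlier in this section, the sketch below turns CSV text into a FHIR ValueSet resource expressed as a Python dictionary. It is not the terminology-manager implementation: the column names (`system`, `code`, `display`) are an assumed CSV layout, and the two LOINC codes are illustrative examples rather than codes taken from the repository.

```python
import csv
import io
import json


def csv_to_value_set(csv_text: str, value_set_id: str, name: str) -> dict:
    """Group CSV rows by code system and wrap them in a FHIR ValueSet resource.

    Assumes a header row with the columns: system, code, display.
    """
    concepts_by_system = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        concepts_by_system.setdefault(row["system"], []).append(
            {"code": row["code"], "display": row["display"]}
        )
    return {
        "resourceType": "ValueSet",
        "id": value_set_id,
        "name": name,
        "status": "active",
        # One compose.include entry per code system, preserving first-seen order.
        "compose": {
            "include": [
                {"system": system, "concept": concepts}
                for system, concepts in concepts_by_system.items()
            ]
        },
    }


# Illustrative input; these rows are not taken from any phenotype definition.
EXAMPLE_CSV = """system,code,display
http://loinc.org,2345-7,Glucose [Mass/volume] in Serum or Plasma
http://loinc.org,2339-0,Glucose [Mass/volume] in Blood
"""

if __name__ == "__main__":
    value_set = csv_to_value_set(EXAMPLE_CSV, "example-glucose", "ExampleGlucose")
    print(json.dumps(value_set, indent=2))
```

Grouping by code system matters because a single FHIR ValueSet can draw codes from several terminologies, each listed under its own `compose.include` entry.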
Development environment
We made use of several open-source tools during the phenotype translation process
and published this development environment on GitHub (https://github.com/PheMA/phekb-phenotypes). We used Visual
Studio Code (https://code.visualstudio.com) as our primary CQL authoring
environment, and for syntax highlighting we used the
language-cql plugin (https://github.com/Jonnokc/Clinical-Quality-Language).

To translate CQL into the equivalent machine-readable representation, known as
the Expression Logical Model (ELM), we used the reference implementation of the
CQL to ELM translator (https://github.com/cqframework/clinical_quality_language). For
testing, we used the CQL Testing Framework (CTF) developed by the Agency for
Healthcare Research and Quality (https://github.com/AHRQ-CDS/CQL-Testing-Framework). The CTF
provides a mechanism to specify test data, which are materialized as FHIR
resources, using a simple YAML file. The CTF also provides a configurable test
runner, which can run a specific CQL library against the test data generated by
the YAML specification, and assert that the results match what is expected. This
allows authors to carefully create test data to make sure phenotypes correctly
identify potentially tricky edge and corner cases.
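
The idea of specifying test data compactly, materializing it as FHIR resources, and asserting on the result can be sketched as follows. This is a conceptual illustration only and does not use the CTF's file format or API: a Python dictionary stands in for the YAML file, the `case` function is a hand-written stand-in for a CQL "Case" statement, and every key and function name is hypothetical.

```python
def materialize(spec: dict) -> list:
    """Expand a compact test specification into FHIR-style resources."""
    patient_id = spec["name"].lower().replace(" ", "-")
    resources = [
        {"resourceType": "Patient", "id": patient_id, "birthDate": spec["birthDate"]}
    ]
    for index, item in enumerate(spec.get("data", [])):
        resource = dict(item)  # copy so the specification itself is not mutated
        resource["id"] = f"{patient_id}-{index}"
        resource["subject"] = {"reference": f"Patient/{patient_id}"}
        resources.append(resource)
    return resources


def case(resources: list, codes: set) -> bool:
    """Stand-in for phenotype logic: true if any Condition has a matching code."""
    for resource in resources:
        if resource["resourceType"] != "Condition":
            continue
        if any(c["code"] in codes for c in resource["code"]["coding"]):
            return True
    return False


def run_test(spec: dict, codes: set) -> bool:
    """Return True when the evaluated result matches the expected result."""
    return case(materialize(spec), codes) == spec["expected"]["Case"]


# Two illustrative test cases: one that should match and one that should not.
MATCHING = {
    "name": "Has Diagnosis",
    "birthDate": "1970-01-01",
    "data": [{"resourceType": "Condition", "code": {"coding": [{"code": "E11.9"}]}}],
    "expected": {"Case": True},
}
NON_MATCHING = {
    "name": "No Diagnosis",
    "birthDate": "1970-01-01",
    "data": [],
    "expected": {"Case": False},
}

if __name__ == "__main__":
    for spec in (MATCHING, NON_MATCHING):
        print(spec["name"], "passed" if run_test(spec, {"E11.9"}) else "failed")
```

Pairing each patient with an explicit expected value is what allows negative cases, such as a patient with no qualifying data, to be checked as carefully as positive ones.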
Validation
We used 2 methods to ensure that our CQL phenotype definitions were correctly
translated from the artifacts available in PheKB. First, each phenotype was
translated by a single author, and then verified using a code review process.
The primary author created a pull request on GitHub (a way of isolating code for
a given purpose, in this case representing a single phenotype definition), and
the second author reviewed the code to make sure it accurately represented the
phenotype definition as described in PheKB.

Second, we used an approach from software engineering called test-driven
development to ensure that our translations of phenotype logic and value sets
were correct. We made use of the CTF to implement this approach. In addition to
allowing the CQL author to express both test cases and FHIR data using YAML, the
CTF integrates with the Mocha JavaScript testing framework (https://mochajs.org) in order to
evaluate phenotype logic using the given data to assert that results produced
are correct. This evaluation is done using the ELM representation of the
phenotype, and the open-source JavaScript CQL engine (https://github.com/cqframework/cql-execution).

All tests were run automatically on each code commit to ensure no regressions
were introduced. The full development and validation pipeline is shown in Figure 1.
Figure 1.
Phenotype definition development and verification pipeline.
RESULTS
At the time of our extraction, there were a total of 71 publicly available phenotype
definitions in PheKB with a status of “Final.” We excluded 2
definitions that were not actually phenotypes. One was used as a placeholder to
publish new value sets, and one was the description of a risk model. We excluded 3
more that used only NLP criteria. Finally, from the remaining phenotypes, we
identified only those with associated publications. This selection process resulted
in 33 total phenotype definitions and is illustrated in Figure 2.
Figure 2.
Phenotype definition selection process.
We created a total of 40 CQL libraries (one for each of the 33 phenotypes, plus 7 helper
libraries), totaling 3327 lines of CQL code. A total of 231 value sets were
assembled, of which 216 were manually created and 15 were imported from VSAC. These
value sets consist of 17 948 individual codes, of which 13 340 are unique.
Additionally, 347 test cases were written that collectively contain 2044 test
assertions. To support these test cases, 347 patients, 96 encounters, 101
procedures, 335 medication orders, 385 conditions, and 360 observations were
manually created as FHIR resources using the CTF.
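
As a consistency check, the reported counts can be tallied directly. The short script below uses only figures stated in the text; the intermediate values it derives (66 definitions remaining before the publication check, and 4608 repeated code entries) are not reported by the authors and follow purely from subtraction.

```python
# Test data resources, as listed in the Results.
test_resources = {
    "patients": 347,
    "encounters": 96,
    "procedures": 101,
    "medication orders": 335,
    "conditions": 385,
    "observations": 360,
}
total_resources = sum(test_resources.values())  # matches the 1624 in the abstract

# Selection funnel: 71 "Final" definitions, minus 2 non-phenotypes and 3 NLP-only.
remaining_before_publication_check = 71 - 2 - 3  # derived value, not reported
included_phenotypes = 33

# Libraries and value sets.
total_libraries = included_phenotypes + 7  # one per phenotype plus helper libraries
total_value_sets = 216 + 15                # manually created plus imported from VSAC

# Code entries that repeat a code already present elsewhere (derived value).
repeated_code_entries = 17948 - 13340

if __name__ == "__main__":
    print(total_resources, total_libraries, total_value_sets)
```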
DISCUSSION
While building this repository, we observed numerous advantages to using popular
healthcare-specific standards such as FHIR and CQL. Advantages include convenient
conceptual models, mechanisms for verification, and the availability of tools,
documentation and expertise to provide assistance during development. This
repository will provide a collection of diverse, computable, verified, and
standardized phenotype definitions that will aid automated analysis.

In addition, we note other benefits beyond our primary objective. First, the methods
and tools we used are documented within the repository and can be adopted by other
researchers and developers. Also, our use of a logical representation that is
technology and data model independent may facilitate automated execution by allowing
implementation sites to implement their own data providers for existing phenotype
definitions. Similarly, the use of common standard terminologies may enable
automatic mapping during local execution.

During this work, we also experienced firsthand many of the challenges that face
phenotype implementers. We encountered numerous occurrences of ambiguity,
underspecification, and imprecise language. For example, the Clopidogrel Poor
Metabolizers phenotype uses the phrase “within 30 days,”
but does not specify whether the interval boundary should be inclusive, or
whether the 30 days before the event, after it, or both should be considered. In each case,
the primary CQL developer had to confer with the other authors in order to determine
the exact semantics of the phenotype definition. Even then, we would occasionally
rely on subjective decisions regarding the intent. This resulted in a considerable
slowdown in implementation. We note that a benefit of having translated the
narrative definition to CQL is that the up-front investment in time has removed the
ambiguities for all subsequent users.

Additionally, some phenotype definitions relied heavily on domain or tribal knowledge
not specified within the definition itself. This makes it difficult for
non-clinicians or healthcare outsiders to replicate research or use existing
phenotype definitions for new research. For example, the High-Density
Lipoproteins phenotype requires that a cohort member have at least one
“random glucose test,” but does not specify how these tests are to be
identified. We also encountered contradictory criteria definitions; for example, the
Bone Scan Utilization phenotype requires that a cohort member
be both >35 and ≥35 years old. The creation of this repository
demonstrates a step forward for these phenotypes. Although a formal representation
may not eliminate all these issues, it would require phenotype authors to be more
precise at the definition phase, which would reduce the cognitive load on
implementers.

Although we did not formally track the amount of time to implement each phenotype,
authoring a formal definition in CQL and providing confirmatory tests does require
an additional investment in time and resources for phenotype authors. Furthermore,
it requires specialized informatics knowledge that has its own learning curve.
However, we believe that the reduction in ambiguity benefits all phenotype authors
(informaticians and research data analysts), and that when a phenotype is planned
for reuse or broader dissemination, the time spent by the phenotype author in
formalizing its representation using CQL has a cumulative payback each time the
phenotype is reused.

This work demonstrates the realization of previous desiderata for computable
phenotypes, including supporting human and computable formats, set operations and
relational algebra, using well-defined temporal relationships, using standard
terminologies, and supporting standards-compliant interfaces to external
software. Given the
benefits of a concrete, unambiguous phenotype definition, we hope that phenotype
repository managers will encourage the inclusion of computable definitions, in
addition to providing APIs to allow integrating with their repositories.

We acknowledge the following limitations of this work. First, given the subjective
nature of interpreting narrative phenotype definitions, we cannot guarantee fidelity
to the intended definition. The only way to determine semantic correctness would
have been to reach out to the original phenotype definition authors, who may not
have a definitive answer (given elapsed time from when some phenotypes were
authored). Additionally, our CQL-based phenotype definitions were not clinically
validated on actual datasets, although they are derived from clinically valid
phenotype definitions.
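
The “within 30 days” ambiguity discussed above becomes concrete once the phrase has to be written as executable logic. The sketch below is an illustration only: the function and its `inclusive` and `direction` parameters are hypothetical, and it does not indicate which interpretation was chosen for any phenotype in the repository.

```python
from datetime import date


def within_days(event: date, index: date, days: int = 30,
                inclusive: bool = True, direction: str = "either") -> bool:
    """Test whether `event` falls within `days` of `index`.

    direction: "before" (event on or before index), "after" (event on or
    after index), or "either". inclusive controls whether an event exactly
    `days` away counts.
    """
    if direction not in ("before", "after", "either"):
        raise ValueError(f"unknown direction: {direction}")
    delta = (event - index).days  # negative when the event precedes the index
    if direction == "before" and delta > 0:
        return False
    if direction == "after" and delta < 0:
        return False
    distance = abs(delta)
    return distance <= days if inclusive else distance < days


if __name__ == "__main__":
    index = date(2020, 1, 1)
    boundary = date(2020, 1, 31)  # exactly 30 days after the index date
    earlier = date(2019, 12, 15)  # 17 days before the index date
    print(within_days(boundary, index, inclusive=True))     # True
    print(within_days(boundary, index, inclusive=False))    # False
    print(within_days(earlier, index, direction="after"))   # False
    print(within_days(earlier, index, direction="before"))  # True
```

The same narrative phrase yields different cohorts depending on these two choices, which is why a formal representation forces the decision to be made once and recorded.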
CONCLUSION
This repository of structured phenotype definitions provides clear definitions of
phenotype algorithms, represented in a format that facilitates automation of cohort
identification within supported data platforms. We believe that the provided data
set and development environment can be a resource for clinical informatics
practitioners and researchers who want to study phenotype definitions or identify
cohorts of patients for biomedical knowledge discovery.

In the future, we plan to evaluate the phenotype definitions in the translated
dataset. This includes evaluating a single phenotype definition (represented using
the standard proposed in this work) at 3 large academic medical centers, with
performance being evaluated using manual chart review.
FUNDING
This work was supported by the Fulbright Foreign Student Program and the South African
National Research Foundation (to PSB), and in part by NIH grant R01GM105688 and by the
NHGRI through grant U01HG011169 (Northwestern University; to LVR and JAP).
AUTHOR CONTRIBUTIONS
All authors helped select the set of phenotypes to translate. PSB and LVR developed
the representation standard, the development environment, and translated the PheKB
phenotypes into the standard format. JAP was consulted to help resolve ambiguities
during translation. PSB wrote the first draft of the manuscript, and all authors
helped refine and edit the final version.
CONFLICT OF INTEREST STATEMENT
PSB is a consultant for Commure, Inc. LVR and JAP have no competing interests to
disclose.
DATA AVAILABILITY
The raw data underlying this article are available in PheKB at https://phekb.org/, and the
translated phenotypes are available on GitHub at https://github.com/PheMA/phekb-phenotypes.