
A Framework to Support the Sharing and Reuse of Computable Phenotype Definitions Across Health Care Delivery and Clinical Research Applications.

Rachel L Richesson¹, Michelle M Smerek², C Blake Cameron³.

Abstract

INTRODUCTION: The ability to reproducibly identify clinically equivalent patient populations is critical to the vision of learning health care systems that implement and evaluate evidence-based treatments. The use of common or semantically equivalent phenotype definitions across research and health care use cases will support this aim. Currently, there is no single consolidated repository for computable phenotype definitions, making it difficult to find all definitions that already exist, and also hindering the sharing of definitions between user groups.
METHOD: Drawing from our experience in an academic medical center that supports a number of multisite research projects and quality improvement studies, we articulate a framework that will support the sharing of phenotype definitions across research and health care use cases, and highlight gaps and areas that need attention and collaborative solutions.
FRAMEWORK: An infrastructure for re-using computable phenotype definitions and sharing experience across health care delivery and clinical research applications includes: access to a collection of existing phenotype definitions, information to evaluate their appropriateness for particular applications, a knowledge base of implementation guidance, supporting tools that are user-friendly and intuitive, and a willingness to use them.
NEXT STEPS: We encourage prospective researchers and health administrators to re-use existing EHR-based condition definitions where appropriate and share their results with others to support a national culture of learning health care. There are a number of federally funded resources to support these activities, and research sponsors should encourage their use.


Keywords: Computable Phenotypes; Data Standards; Electronic Health Records; Learning Health Care Systems

Year: 2016 PMID: 27563686 PMCID: PMC4975566 DOI: 10.13063/2327-9214.1232

Source DB: PubMed Journal: EGEMS (Wash DC) ISSN: 2327-9214

Introduction

Computable phenotypes, or electronic health record (EHR)-based condition definitions, enable the identification of cohorts of patients with certain diseases or clinical profiles for disease management registries, quality improvement programs, evaluation studies, and interventional research. Regardless of the application, cohort identification requires queries of clinical data stores that are both valid and reproducible. Currently, there is no single consolidated repository for computable phenotypes, making it difficult to find all definitions that already exist and hindering the sharing of definitions between user groups. Health services researchers and quality assessment groups, such as the National Quality Forum (NQF), the National Committee for Quality Assurance (NCQA), the Centers for Medicare & Medicaid Services (CMS), and the Agency for Healthcare Research and Quality (AHRQ), provide computable phenotypes on a number of websites.1–5 In addition, researchers and registry developers create definitions using different design and evaluation methods. Because the definitional logic is often underspecified or unreported in scientific journals, it is not clear whether the findings reported in published research or quality improvement studies are comparable or relevant to clinical populations, which hinders the application of evidence-based medical and nursing care. We believe that a minimal set of well-constructed and explicit EHR-based phenotype definitions will create efficiencies for health care organizations that must support a growing number of data requests related to comparative effectiveness research (CER), quality improvement, and chronic disease management. We further believe that such a set will facilitate synergies between research and care delivery, enabling “learning health care” practices6 and ultimately improving patient outcomes.
A large-scale, multipurpose approach to sharing phenotype definitions will support the reuse of well-constructed and validated computable phenotypes, and will consequently reduce variation in the definitions used for the same condition. Drawing on our experience in an academic medical center that supports a number of multisite research projects, we articulate a framework that will support the sharing of phenotype definitions across research and health care use cases, and we highlight gaps or areas that need attention and collaborative solutions.

Background and Context

A “computable phenotype” is a definition of a condition, disease, characteristic, or clinical event that is based solely on data that can be processed by a computer.7 Computable phenotype definitions provide the specifications to identify populations of patients with conditions of interest, and can be combined with other criteria, such as age or other demographic information, to develop cohort populations for a variety of purposes. Quality monitoring organizations (such as NQF, NCQA, and AHRQ) create computable phenotype definitions for the development and monitoring of health care quality measures. A number of research networks have developed phenotype definitions to enable the use of EHR data for observational research (including comparative effectiveness studies) and interventional trials.8–11 Various multisite studies12,13 use these definitions to develop registries for drug safety surveillance14 or chronic disease management.15 There are numerous and distinct use cases for computable phenotypes in health care delivery (e.g., personalized medicine, guideline-based care, chronic disease management, and quality measurement) and in biomedical research (genomic, observational, CER, health services research, and interventional trials). These use cases span different scientific disciplines whose phenotype development efforts have so far been undertaken in isolation, without the benefit of cooperation. Further, there are no standards of practice that encourage the reuse of existing definitions or the use of common definitions for health care delivery and research. The lack of coordination of phenotype definitions among researchers, clinicians, and administrators has led to the unintentional proliferation of numerous definitions for many conditions and clinical profiles.
Because each definition applies different logic (e.g., various combinations of diagnosis or procedure codes, medications, or laboratory tests) to query EHR data, the resulting cohorts are often not directly comparable. It is unknown how much semantic variation in definitions actually exists, because definitional logic is often underspecified in research publications. A recent report on national trends in diabetes specifically lists several related conditions (including hypoglycemia, neuropathy, chronic kidney disease, peripheral vascular disease, cognitive decline, cancers, and even type 1 versus type 2 diabetes) whose prevalence could not be reported because of inconsistent EHR documentation and definitions across the United States.16 More concerning is the consequent likelihood that the research, patient care, and quality measurement communities are using different phenotype definitions for the same condition. The COPD Outcomes-based Network for Clinical Effectiveness & Research Translation (CONCERT) assessed 980 patients sampled from various EHR systems using a clinical phenotype definition for chronic obstructive pulmonary disease (COPD), and found that just over half of them met the criteria of the well-accepted research definition for the condition.13 Further, the patient populations retrieved by the clinical and research definitions for COPD had significantly different comorbidities and risk factors.13 This implies that disease management registries and quality improvement programs might be identifying populations that differ from those in which the evidence supporting their treatment strategies and interventions was developed. The “research informs practice informs research” cycle that is the essence of learning health care systems requires that the clinical features used to define research and patient populations be well understood and comparable.
Hence semantically equivalent phenotype definitions must be used to identify clinically equivalent populations. We believe that creating a centralized collection of explicitly defined computable phenotypes, with an accompanying knowledge base of development and validation documentation, is the first step toward consolidating effort and harmonizing definitions. Information, resources, and tools that facilitate the reuse of existing phenotypes will reduce the variation in phenotype definitions across all use cases, facilitate conversations between health care and research communities about how to compare definitions for different use cases, and ultimately lead to harmonization of definitions that will simplify and support the identification of clinically equivalent populations for research and health care purposes.

Framework Components

The reuse of phenotype definitions can be facilitated by their explicit representation and tools to support their evaluation and implementation in new applications. We propose that the deliberate and informed reuse of existing definitions will require four components: (1) searchable libraries of explicitly defined phenotype definitions; (2) supporting knowledge bases with information and methods; (3) tools to identify, evaluate, and implement existing phenotype definitions; and (4) motivated users and stakeholders to use them (Fig 1).

Figure 1.

Overview of Framework to Support the Reuse of Phenotype Definitions in Learning Health Care Systems

Searchable Libraries of Phenotype Definitions

The sharing of information about computable phenotype definitions will allow implementers to reuse appropriate existing definitions rather than creating their own. This requires access to an ample set of phenotype definitions, along with information that enables them to be evaluated and easily implemented. The ideal library should be indexed so that users can search by a number of different features, including the clinical condition, the data elements, the logic, the intended use case, limitations, and orientation toward precision, sensitivity, or specificity. Mo and colleagues call for a formal computable representation of phenotype definitions that will enable scalability by allowing definitions to be applied to different data systems.17 Their desiderata include the following: human-readable and computable forms, structured rules, formalisms for temporal relations, representations for text searching and natural language processing, and interfaces for external software algorithms. They endorse the use of standardized terminologies and ontologies, as well as the reuse of value sets. Additional information can be included in the library or underlying knowledge base to support users’ semantic understanding of a phenotype definition and to enable selection of the appropriate definition to identify patient cohorts with the intended clinical features. Therefore, the definitions in a phenotype library should include metadata or supporting information about each definition: its intended use, the clinical rationale or research justification for it, and data about its clinical and scientific validation in various health care settings.
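A library entry of the kind described above might be sketched as a simple record type plus a filter over its indexed features. The sketch is hypothetical: the class `PhenotypeRecord`, the function `search`, every field name, and the sample entries are illustrative assumptions, not the schema of PheKB or any other existing library.

```python
from dataclasses import dataclass, field


@dataclass
class PhenotypeRecord:
    """Hypothetical library entry: a definition's identity plus searchable metadata."""
    phenotype_id: str
    condition: str
    data_elements: list[str]  # e.g., "diagnosis", "lab", "medication", "clinical notes"
    intended_use: str         # e.g., "registry", "CER", "quality measure"
    orientation: str          # "sensitivity", "specificity", or "precision"
    rationale: str            # clinical rationale or research justification
    limitations: list[str] = field(default_factory=list)


def search(library, condition=None, intended_use=None, orientation=None,
           unavailable_element=None):
    """Return entries matching every supplied filter; a None filter matches anything."""
    results = []
    for rec in library:
        if condition and rec.condition.lower() != condition.lower():
            continue
        if intended_use and rec.intended_use != intended_use:
            continue
        if orientation and rec.orientation != orientation:
            continue
        # Skip definitions that depend on a data element the local site lacks.
        if unavailable_element and unavailable_element in rec.data_elements:
            continue
        results.append(rec)
    return results


# Illustrative entries only; these are not real published definitions.
library = [
    PhenotypeRecord("T2DM-001", "type 2 diabetes", ["diagnosis", "lab", "medication"],
                    "registry", "sensitivity", "Broad screening definition"),
    PhenotypeRecord("T2DM-002", "type 2 diabetes", ["diagnosis", "clinical notes"],
                    "CER", "specificity", "NLP-assisted case confirmation"),
]

hits = search(library, condition="Type 2 Diabetes", unavailable_element="clinical notes")
print([r.phenotype_id for r in hits])  # ['T2DM-001']
```

Filtering out definitions that need an unavailable data element corresponds to the situation in Scenario 1, where one site has no access to clinical notes.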
Such documentation matters because validation findings can be counterintuitive. For example, actual blood pressure measurements, even when available over long periods, did not contribute significantly to predictive models for hypertension control.18 Without clear supporting documentation, clinical subject matter experts may reject well-validated phenotype definitions as lacking face validity when they do not match their expectations or intuition. Clinical practice and disease definitions also change over time. Therefore, phenotype definitions in a phenotype library should reference the underlying clinical definitions or guidelines on which they are based, so that out-of-date legacy definitions can be identified. In addition, phenotype definitions should conform to existing required and emerging terminologies and standards for representing clinical data (e.g., SNOMED CT, LOINC, RxNorm, NDF-RT), as endorsed by the Office of the National Coordinator.19 Adherence to standards allows for a modular design that reduces development and implementation costs, particularly at scale, where multiple use cases for a standard may exist concurrently. Because phenotype definitions might perform differently when implemented in different patient populations and EHR systems, information about the performance of phenotypes in specific organizations should be collected from implementers and shared with future users. Implementation information is necessary to understand how standard definitions perform across diverse populations, heterogeneous organizations, and EHR systems. Specifically, information about the underlying population and the quality (i.e., completeness, accuracy, consistency) of the data used to validate a definition has important implications for interpreting the validation results. For example, if a test population had 50 percent missing data in one of the variables defining the phenotype, reporting that fact provides important context for interpreting the definition’s performance.
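Reporting data completeness alongside validation results could be as simple as computing the fraction of non-missing values for each defining variable. In this minimal sketch, the function `completeness`, the variable names, and the test records are hypothetical, and `None` stands for a missing value.

```python
def completeness(records, variables):
    """Fraction of records with a non-missing (non-None) value for each variable."""
    if not records:
        raise ValueError("no records to assess")
    n = len(records)
    return {v: sum(1 for r in records if r.get(v) is not None) / n for v in variables}


# Hypothetical test population; None marks a missing value.
test_population = [
    {"patient": 1, "hba1c": 7.2, "diagnosis": "type 2 diabetes"},
    {"patient": 2, "hba1c": None, "diagnosis": "type 2 diabetes"},
    {"patient": 3, "hba1c": 6.1, "diagnosis": None},
    {"patient": 4, "hba1c": None, "diagnosis": "type 2 diabetes"},
]

print(completeness(test_population, ["hba1c", "diagnosis"]))
# {'hba1c': 0.5, 'diagnosis': 0.75}
```

Here the HbA1c variable is only 50 percent complete, the kind of figure a library could store with each validation report so that future users can judge how much the reported performance depends on missing data.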
Similarly, testing the same phenotype definition in populations with high versus low prevalence of disease will yield different results. Recommendations for data quality assessment reporting in pragmatic trials20 and observational research21,22 can indicate which data quality dimensions (e.g., completeness, accuracy) might be most useful for evaluating a phenotype definition. To maximize socialization of and collaboration around shared phenotypes, the ideal phenotype library should support communication between phenotype developers and implementers. CMS employs a standardized approach for enabling users to post questions and share comments, and for maintaining quality measure definitions across multiple programs.23 Such a framework could be adapted for use with computable phenotype libraries. During early development, draft phenotype specifications could be posted in a library for public review to evaluate feasibility and refine use cases. During the validation phase, the testing methodology could be opened to public comment. Once definitions are validated, the library could facilitate communication of best practices and feedback. This would allow implementers to share their experiences implementing phenotype definitions in their local systems, and allow others to ask questions that inform the many practical decisions made when implementing abstract logic in local data systems. A collaborative or interactive component would also allow users to relate their experience implementing definitions in different vendor systems and in different patient populations. Over time, the library could collect data on usage and impact, and aggregate the published literature based on each phenotype. A record of projects that have used or endorsed different phenotype definitions can enhance understanding of phenotype intent and performance, and can assist potential implementers in selecting appropriate phenotypes.
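The effect of disease prevalence on validation results follows from Bayes’ rule: with sensitivity and specificity held fixed, the positive predictive value (PPV) of a definition falls as prevalence falls. The sketch below assumes illustrative values of 90 percent sensitivity and 95 percent specificity; these numbers are not drawn from any published phenotype.

```python
def ppv(sensitivity, specificity, prevalence):
    """Positive predictive value via Bayes' rule:
    PPV = sens * prev / (sens * prev + (1 - spec) * (1 - prev))."""
    true_pos = sensitivity * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    return true_pos / (true_pos + false_pos)


# Same assumed definition, validated in a high- and a low-prevalence population.
for prevalence in (0.20, 0.02):
    print(f"prevalence {prevalence:.0%}: PPV = {ppv(0.90, 0.95, prevalence):.3f}")
# prevalence 20%: PPV = 0.818
# prevalence 2%: PPV = 0.269
```

A definition that looks precise when validated in a high-prevalence setting can return mostly false positives in a general population, which is one reason the prevalence in the validation population is worth reporting with the results.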
Because phenotype definitions are dynamic, the library should define a phenotype life cycle and record each definition’s current life cycle stage and status (e.g., in development, draft, final), as well as its version number or date of last revision. Phenotypes could be marked as retired or archived when clinical practice changes or the underlying clinical definitions or data standards become out of date. The Phenotype Knowledge Base (PheKB)24 is a large and well-indexed portal for hosting computable phenotypes, though enhancements are needed to accommodate the information requirements above. PheKB includes human-readable definitions and, in some cases, machine-readable code, but the code is not fully executable across heterogeneous EHR systems. PheKB does have an interface for reporting contextual data and performance metrics of phenotype definitions, but a useful and usable display of these data is not yet standardized. Also, it is not known how widely PheKB is used outside of the Electronic Medical Records and Genomics (eMERGE) Network or the Pharmacogenetic Research Network, whose goals are to implement decision support around clinically actionable genetic variants for clinical conditions. While several National Institutes of Health (NIH) Collaboratory and National Patient-Centered Clinical Research Network (PCORnet) investigators have added their phenotype definitions to PheKB, increased uptake of PheKB by other research and clinical groups will require targeted marketing. Broader usage of PheKB might drive enhancements to the resource, but will also likely increase the number of user requirements.
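Life cycle, status, and version tracking of this kind could be modeled as a small state machine. The states mirror the statuses named in the text (in development, draft, final) plus retired; the allowed transitions, the rule that reopening a final definition creates a new version, and all class and field names are assumptions made for illustration.

```python
from dataclasses import dataclass, field
from datetime import date
from enum import Enum


class Status(Enum):
    IN_DEVELOPMENT = "in development"
    DRAFT = "draft"
    FINAL = "final"
    RETIRED = "retired"


# Assumed transition rules; an actual library would publish its own life cycle.
ALLOWED = {
    Status.IN_DEVELOPMENT: {Status.DRAFT},
    Status.DRAFT: {Status.IN_DEVELOPMENT, Status.FINAL},
    Status.FINAL: {Status.DRAFT, Status.RETIRED},
    Status.RETIRED: set(),  # retired definitions stay archived
}


@dataclass
class VersionedPhenotype:
    phenotype_id: str
    version: int = 1
    status: Status = Status.IN_DEVELOPMENT
    revised: date = field(default_factory=date.today)
    history: list = field(default_factory=list)  # (version, status, revised) tuples

    def transition(self, new_status, on=None):
        """Move to new_status if the life cycle allows it, recording the prior state."""
        if new_status not in ALLOWED[self.status]:
            raise ValueError(f"cannot move from {self.status.value} to {new_status.value}")
        self.history.append((self.version, self.status, self.revised))
        # Reopening a final definition for revision starts a new version.
        if self.status is Status.FINAL and new_status is Status.DRAFT:
            self.version += 1
        self.status = new_status
        self.revised = on or date.today()


p = VersionedPhenotype("COPD-001")
p.transition(Status.DRAFT, date(2016, 1, 4))
p.transition(Status.FINAL, date(2016, 3, 1))
p.transition(Status.DRAFT, date(2016, 9, 1))  # e.g., after a guideline change
print(p.version, p.status.value, p.revised)  # 2 draft 2016-09-01
```

Keeping the full history, rather than only the current status, lets users see which version a published study used even after the definition has been revised or retired.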
Several authoring tools exist, including the PheMA project,25 which provides generalizable computable representations and automated mapping tools.26 Other research networks, such as Observational Health Data Sciences and Informatics (OHDSI)27 and PCORnet,11 include dedicated phenotype working groups and internal inventories of phenotype definitions.

Knowledge Base of Information and Methods

Researchers and health care organizations need information about how to develop, evaluate, and implement phenotype definitions. The particular use case influences the nature of the phenotype definition and the system requirements. For example, in quality measurement, the purpose of the phenotype definition is to identify “bread and butter” instances of a particular condition; patients whose disease status is negative or indeterminate are excluded. By contrast, genomic research usually aims to reliably identify both cases and controls (negative cases). The phenotype definition must identify with reasonable certainty not only patients who have the condition (i.e., have adequate sensitivity), but also patients who clearly do not have the condition (i.e., have high specificity). Definitions used in disease management registries or population health promotion activities favor higher sensitivity at the cost of specificity, whereas CER requires higher specificity and precision. Guidance from different health care and research communities can inform users about important features and performance thresholds of phenotype definitions for different use cases. Information that clarifies data dependencies and implementation requirements is needed to facilitate the sharing of phenotypes across groups. For example, some definitions include natural language processing (NLP) components that might not be feasible for some target systems. The knowledge base can include methods and case studies from projects that have implemented the definitions in multiple organizations; their customizations and lessons learned can inform future users. Evidence-based practice guidelines that justify a definition’s logic, as well as the definition of the “gold standard” used to validate EHR-based phenotype definitions, should also be available.
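The use-case-specific performance requirements described above can be checked against validation results from a two-by-two comparison with a gold standard such as chart review. In this sketch the counts and the per-use-case minimum thresholds are hypothetical, and PPV stands in for precision; real thresholds would come from each community’s guidance.

```python
def validation_metrics(tp, fp, fn, tn):
    """Sensitivity, specificity, and PPV from counts against a gold standard."""
    return {
        "sensitivity": tp / (tp + fn),  # share of true cases the definition finds
        "specificity": tn / (tn + fp),  # share of non-cases it correctly excludes
        "ppv": tp / (tp + fp),          # share of flagged patients who are true cases
    }


# Hypothetical minimum performance per use case, for illustration only.
THRESHOLDS = {
    "registry": {"sensitivity": 0.90},
    "cer": {"specificity": 0.95, "ppv": 0.90},
    "genomic": {"sensitivity": 0.80, "specificity": 0.95},
}


def fits_use_case(metrics, use_case):
    """True if every metric required by the use case meets its minimum."""
    return all(metrics[name] >= minimum for name, minimum in THRESHOLDS[use_case].items())


# Hypothetical chart-review results for one definition.
m = validation_metrics(tp=90, fp=20, fn=10, tn=380)
print([use for use in THRESHOLDS if fits_use_case(m, use)])  # ['registry', 'genomic']
```

With these assumed numbers, the same definition would be acceptable for a registry or a genomic case-control study but not for CER, because its PPV (about 0.82) falls below the assumed CER minimum.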
Although PheKB does include some “knowledge” in the form of phenotype development methods and validation protocols, it is limited and not tailored for different phenotype users. Information for a broader range of use cases is needed. Rethinking Clinical Trials: The Living Textbook of Pragmatic Clinical Trials9 provides a model for disseminating information in the form of “lessons learned” and case studies, rather than as empirical research. Many other research-network websites and collaborative networks perform this function, but a central portal to the knowledge from various networks would support potential implementers from multiple domains.

Tools

Formal representations of computable phenotypes, mappings to reference coding systems and (common) information models, and executable code can support the implementation of definitions in different populations. Mo’s desiderata include recommendations for clinical data representation to support phenotyping.17 Specifically, they call for structuring clinical data into queryable forms and for using a common data model to accommodate the variability and availability of EHR data among sites. Because a number of different common data models are currently used in research networks,28–30 tools and platforms are needed to implement a given phenotype definition in different contexts. Knowledge, authoring tools, and vocabulary mapping tools to support these activities can also be made centrally available through a shared knowledge base31 or links to a code-sharing platform such as GitHub. Similarly, implementing these definitions requires terminology mappings (e.g., from drug class names in NDF-RT and medication sets in RxNorm to product codes in the National Drug Code (NDC) directory).32 Terminology integration resources, such as RxNorm, the Unified Medical Language System (UMLS), and the UMLS Terminology Services (UTS) tools, can benefit phenotype use cases in many networks. To be more broadly used, these tools should be centrally available, with supporting instructions for people from many different domains and levels of technical expertise.
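As a simplified illustration of terminology mapping and executable logic, the sketch below normalizes one site’s local diagnosis codes to ICD-10-CM codes through a crosswalk and then applies inclusion and exclusion value sets, mirroring the hypertension-without-CKD criteria in Scenario 2. The crosswalk, the local codes, the prefix-based value sets, and the patients are all hypothetical; a real definition would use curated value sets and would typically run as a query against a common data model rather than in memory.

```python
# Hypothetical crosswalk from one site's local diagnosis codes to ICD-10-CM codes.
LOCAL_TO_STANDARD = {"LOC-HTN": "I10", "LOC-CKD3": "N18.3"}

# Illustrative value sets as ICD-10-CM code prefixes (not a validated definition):
# I10 is essential (primary) hypertension; N18 is chronic kidney disease.
INCLUDE_HYPERTENSION = ("I10",)
EXCLUDE_CKD = ("N18",)


def normalize(codes, crosswalk):
    """Map local codes to standard codes; codes without a mapping pass through unchanged."""
    return {crosswalk.get(code, code) for code in codes}


def matches(codes, value_set):
    """True if any code falls under any prefix in the value set."""
    return any(code.startswith(prefix) for code in codes for prefix in value_set)


def in_cohort(patient_codes, crosswalk):
    """Apply the inclusion value set, then the exclusion value set."""
    codes = normalize(patient_codes, crosswalk)
    return matches(codes, INCLUDE_HYPERTENSION) and not matches(codes, EXCLUDE_CKD)


# Hypothetical patients with a mix of local and standard codes.
patients = {
    "p1": ["LOC-HTN"],
    "p2": ["I10", "LOC-CKD3"],
    "p3": ["E11.9"],
    "p4": ["I10"],
}

eligible = sorted(pid for pid, codes in patients.items() if in_cohort(codes, LOCAL_TO_STANDARD))
print(eligible, len(eligible))  # ['p1', 'p4'] 2
```

Separating the site-specific crosswalk from the shared value sets is the design point: the definition logic stays identical across sites, and only the mapping table changes locally.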
Specifically, tools are needed for the following uses: (1) searching for phenotype definitions that are endorsed or mandated; (2) browsing existing phenotypes to find potentially relevant ones that can be reused; (3) displaying relevant information to help potential implementers understand existing definitions and their strengths and limitations for particular uses; (4) implementing those definitions in local EHR systems with, e.g., executable code tailored to common data models, or mappings between coding systems; (5) developing new phenotype definitions if needed; and (6) reporting implementation results, along with characteristics of the test data sets, for others to view (Table 1).

Table 1.

Types of Tools and Functionality Required to Support the Sharing and Reuse of Computable Phenotype Definitions Across Health Care Delivery and Clinical Research Applications

FUNCTION	PURPOSE	EXAMPLE OR POTENTIAL TOOL
Search for phenotype definitions.	Identify validated or endorsed phenotype definitions.	PheKB
Browse for phenotype definitions.	Assess landscape.	PheKB
Display pertinent context information.	Aid potential implementers in assessing a definition’s fit for their use case.	needed*
Provide executable code in different formats (SQL, SAS, R, etc.) and crosswalks for mapping between different coding systems.	Implement phenotype definitions in heterogeneous systems.	PheKB, GitHub
Develop new phenotype definitions.	Create new definitions when existing ones aren’t a good fit.	PheMA,26 CALYPSO33
Display implementation results with characteristics of the data in which phenotypes were implemented.	Provide additional information users need to consider when determining whether a definition is a good fit for their use case.	needed*

Note: *This represents a gap where tooling is needed. We are not aware of existing tools that support this function.

We see gaps and unmet needs in all areas except phenotype development. At least two scalable authoring tools exist: PheMA26 with its execution support31 and OHDSI’s CALYPSO (Criteria Assessment Logic for Your Population Study in Observational data).33,34 Xu et al. provide a detailed inventory of other search and authoring tools.25 In addition to guided phenotype authoring tools based on the underlying model of the phenotype library, other tools could in theory support an “import and transformation” process that takes existing definitions developed locally (with local tools) and stores them in the central repository for others to access and use. The learning health system cannot exist on phenotypes alone. Any phenotype library would need to provide a service-based API that other computable clinical “services” could access in a standardized way, e.g., electronic clinical trial management tools that access existing phenotype definitions to define the inclusion and exclusion criteria for a research trial. A number of functional components, e.g., standard models and vocabulary services, would in turn be needed to fully support the reuse of phenotype definitions on a grand scale.

Motivated Users and Stakeholders

The sharing of definitions and experience will require deliberate action on the part of potential phenotype developers and implementers, and useful and intuitive tools can support this behavior. Aligning existing computable phenotypes with users’ needs will likely increase their uptake, as will engaging all stakeholders in the design and development of the phenotype resources and tools described in this framework. Additionally, a number of approaches can be used to motivate individuals to search for existing definitions and to share the outcomes of computable phenotype implementations. Possible approaches include creating incentives, increasing perceived benefit, establishing new social norms, or imposing policies or regulations.

Perceived Benefits and Value

Collaboration is fostered when the collaborators expect or perceive a beneficial outcome. The more beneficial or significant the outcome, the higher the participation and commitment level among collaborators will be. Wilcox et al. assert that the costs for sustaining research infrastructure can be covered if value can be created.35 Thus, clear demonstrations of reduced workload, reduced costs, or faster development resulting from the reuse of phenotype definitions might motivate potential users.

Incentives

Tangible incentives can be created through policy or legislation. Examples include quality reporting incentives (e.g., the CMS Physician Quality Reporting System and the financial rewards of the Meaningful Use program), and punitive consequences for noncompliance with Food and Drug Administration (FDA) reporting specifications. Although these types of incentives might be effective, they are time-consuming and expensive to achieve. Alternative incentives might derive from some sort of peer pressure from the scientific community to report phenotype definitions as part of the research protocol or study results reporting in publications, or rewards for such behavior from research sponsors or in academic promotion rubrics.

Shared Values and Principles

A set of agreed-upon assumptions and principles for research networks, sponsors, and health care regulators to adopt is the first step in addressing the complex challenges of reusing phenotype definitions. These should include a stated commitment to reproducible science and to the standardized reporting of phenotype definitions, use cases, and validation results. Additional principles could include an expectation that users of computable phenotypes will search for and consider existing definitions before creating their own. For conditions where a phenotype definition already exists, researchers should carefully consider whether the benefit of developing a new definition tailored to their specific use case outweighs the loss of interoperability. Other principles might include that phenotype definitions should be placed in the public domain, regardless of whether they were derived from federally funded research, national quality reporting incentive programs, or private ventures. Because phenotype definitions are developed for different purposes, populations, and settings, it is not feasible to define a single set of definitions that meets all research needs. However, the explicit documentation and sharing of phenotype definitions and supporting evidence will enable researchers to evaluate and select the best available definitions for their populations and research needs. While there may be research integrity risks associated with using data or methods without a full understanding of their limitations, a repository with information about the intent, maturity, and limitations of particular phenotype definitions can inform and empower potential users to use them appropriately and at their own discretion.

Vision of Shared Phenotype Definitions

A shared or common vision has been identified as an important success factor in collaborative projects. We provide such a vision in the form of two scenarios that might motivate pan-network or cross-use-case sharing of phenotype definitions (Box 1).

Scenario 1

An intervention specialist working for the Southeastern Diabetes Initiative (SEDI) wants to identify patients with type 2 diabetes across a number of health care providers in order to develop treatment programs and community interventions that will improve diabetes care. The specialist needs operational definitions for type 2 diabetes, as well as for a number of associated conditions such as hypertension and chronic kidney disease. She goes to a central phenotype library and finds definitions for each condition that are appropriate for broad population screening and that can be implemented at all the SEDI sites, including one with no capacity for accessing clinical notes. She shares a link to each selected phenotype definition, plus implementation guidance and appropriate code, with the data specialists at each SEDI site. Each site implements the definition and reports its results to the phenotype library. One SEDI site had problems with the code and reported this experience as well. The original developer of the phenotype contacted the SEDI site with a suggestion, which proved helpful and was therefore added to the knowledge base for other SEDI sites to access and review. Later, the study was published in a journal article that referenced the link to the computable phenotype logic and supporting implementation tools. Using these definitions and tools, a new group of researchers replicated the intervention in an urban population on the West Coast and published their findings. This scenario was enabled by the following: Searchable libraries of explicitly defined phenotype definitions; Supporting knowledge bases with information and methods; Supporting tools; and Users and stakeholders motivated to consider reusing existing definitions. The users realized the benefits of reuse and shared phenotype definitions.

Scenario 2

A clinician reviews the literature and finds a study of a new medical intervention for uncontrolled hypertension. She wants to implement it in a similar population in her clinic. The published article includes a narrative description of the inclusion and exclusion criteria (e.g., includes a diagnosis of hypertension and excludes chronic kidney disease), with hyperlinks to a public phenotype library that hosts the computable phenotype specifications for the intervention population. The clinician points her data analyst to the phenotype specifications and requests a data warehouse query to estimate the number of patients who might be eligible for the planned intervention. After obtaining the required institutional approvals, she implements the intervention and conducts a formal quality improvement study. She publishes that study and references a public link to the phenotype library and knowledge base for the specific computable phenotype definition logic and supporting implementation tools. Future implementers consult the library for implementation details rather than contacting this clinical investigator, leaving her more time to research and plan new chronic disease management interventions. This scenario was enabled by the following: Searchable libraries of explicitly defined phenotype definitions; Supporting knowledge bases with information and methods; Supporting tools; and Users and stakeholders motivated to consider reusing existing definitions. The users realized the benefits of reuse and shared phenotype definitions.

Communication, Marketing, and Engagement

Communicating and marketing a shared set of principles and a common vision might enhance the engagement, participation, and support of stakeholders from multiple organizations and domains. Communication campaigns could inform potential users about the availability of existing computable phenotypes and strengthen their perception that reusing an existing definition will save them work or give them a better, previously tested definition than they could develop alone. Professional societies and medical advocacy groups may choose to endorse and curate authoritative phenotypes as a complement to their guideline development activities. Further, examples of sharing behavior could be modeled and made visible, such as online exchanges in which investigators describe the challenges or observations they encountered in implementing particular definitions in certain settings.

Protection from Risks

Understanding the motivation for sharing requires understanding the fears or hesitations that research investigators or project implementers might have. Anecdotally, the perceived risks of sharing include concerns about publication, copyright, or inappropriate use. Phenotype developers might not consider their definition to be of broad interest, believing it too institution- or protocol-specific to be useful to others, or they may worry that it is not ready to share. These factors need to be researched and understood in order to create stronger alternative incentives or beliefs that will motivate phenotype developers to share their definitions.

Discussion

Computable phenotype definitions that are developed and represented in an explicit and standardized manner are necessary to ensure the consistency of clinical populations sampled for different purposes. The use of semantically equivalent phenotype definitions can enable the comparison of results across studies, and ensure that all patients can be reliably identified and offered evidence-based treatment options and opportunities for research. We do not suggest that a single definition per condition is feasible, nor that one definition per use case will necessarily be sufficient. However, we do suggest that some minimum set of definitions per condition can be identified to address the majority of use cases. It will be important to have resources and communication in place to ensure that the definitions are as accurate and scalable as possible, and that users can identify the definitions that are the best fit for their intended purposes. Within research networks, member investigators have a vested interest in maintaining the health of the network, and therefore are well incentivized to support policies, communication channels, and tools that enable and encourage the sharing and reuse of phenotype definitions within the network. Sharing phenotype definitions across networks or domains (e.g., from research to health care quality improvement) will be more challenging to motivate, as it involves multiple organizations and complex systems whose incentive structures may differ. Evidence-based methods that support the collaboration of diverse stakeholders to solve challenging problems in complex systems should be applied to support the sharing and standardization of computable phenotypes between health care and research. 
The lack of supporting theories and methods for complex cross-boundary collaborations36 illustrates a gap in learning health sciences that should be addressed.37 The learning health care paradigm will demand continuous development and refinement of new phenotypes to identify conditions of interest and to reflect changes in health care practice and EHR systems. Clinicians, health care administrators, investigators, and patients benefit from the use of explicitly defined and validated definitions for sampling, identification of potential research participants, and broader analyses using EHR data. Collaboration around the development of computable phenotypes for emerging diseases might expedite their investigation and help build consensus in professional society guidelines in a rapid learning environment, especially where consensus in professional societies is slow to emerge (e.g., the early years of HIV/AIDS) or changes over time (e.g., the new classification of the autism spectrum in the Diagnostic and Statistical Manual of Mental Disorders, Fifth Edition [DSM-5], which is not concordant with prior definitions). Similarly, standardized processes will be required to update and periodically revalidate definitions as knowledge of disease increases and as coded terminologies, EHRs, and patterns of health care delivery mature. Creating a culture of sharing, reusing, and harmonizing phenotype definitions will require changes in thinking and behavior, which can be encouraged by the following call to action for researchers and clinicians: (1) champion cultural changes and resource allocations that will enable the reuse of computable phenotype definitions where appropriate; (2) survey the landscape for existing and previously validated definitions that meet the particular need before creating a new definition; and (3) provide phenotype definition logic and implementation performance or validation results so that others can benefit from this knowledge.
The vision of shared phenotype definitions between research and health care activities will ultimately require governance structures to control the curation of phenotype knowledge, raising a number of questions that will need to be addressed. Who should be the guardians of such knowledge: a centrally controlled federal agency, a commercial entity, or both? What types of criteria should be used to accept a phenotype definition into the repository? Specifically, which gold standard evidence-based practice guideline sources are of sufficient quality to serve as a basis for a phenotype definition? The perceived benefits of shared phenotypes might drive funding or advocacy for developing and enhancing resources that support the sharing and reuse of computable phenotype definitions across health care delivery and clinical research applications, but measurable results or a demonstrated return on investment will be necessary to sustain these resources and to motivate widespread use in learning health systems. Financial models for phenotype contributors and users will also need to be explored. Ultimately, the vision of shared phenotype definitions will be realized only if the libraries, knowledge bases, tools, and processes are usable and useful, and if the sharing of definitions creates efficiency for research and health care teams, as well as a synergy between them that benefits patients, payors, and other stakeholders.

Conclusions and Call to Action

The implementation of learning health care systems is gaining momentum, and the ability to reproducibly identify clinically equivalent patient populations is critical to implementing and evaluating evidence-based treatments in health care systems. The use of common or semantically equivalent phenotype definitions across research and health care use cases can support this aim. A national infrastructure for reusing phenotype definitions and sharing experience across health care delivery and clinical research applications will reduce duplicate efforts and increase efficiencies. Both research and provider communities need access to a collection of existing definitions, information to evaluate their appropriateness for particular applications, a knowledge base of implementation guidance, supporting tools that are user-friendly and intuitive, and a willingness to use them. We encourage prospective researchers and health administrators to reuse existing EHR-based condition definitions where appropriate and to share their results with others to support a national culture of learning health care. A number of federally funded resources support these activities, and research sponsors should encourage their use.
