Martin Chapman1, Shahzad Mumtaz2, Luke V Rasmussen3, Andreas Karwath4, Georgios V Gkoutos4, Chuang Gao2, Dan Thayer5, Jennifer A Pacheco3, Helen Parkinson6, Rachel L Richesson7, Emily Jefferson2, Spiros Denaxas8, Vasa Curcin1.
Abstract
BACKGROUND: High-quality phenotype definitions are desirable to enable the extraction of patient cohorts from large electronic health record repositories and are characterized by properties such as portability, reproducibility, and validity. Phenotype libraries, where definitions are stored, have the potential to contribute significantly to the quality of the definitions they host. In this work, we present a set of desiderata for the design of a next-generation phenotype library that is able to ensure the quality of hosted definitions by combining the functionality currently offered by disparate tooling.
Keywords: EHR-based phenotyping; computable phenotype; electronic health records; phenotype library
Year: 2021 PMID: 34508578 PMCID: PMC8434766 DOI: 10.1093/gigascience/giab059
Source DB: PubMed Journal: Gigascience ISSN: 2047-217X Impact factor: 6.524
Figure 1: The stages of the phenotype definition lifecycle supported by a next-generation phenotype library.
Phenotype definition formats
| Format | Description | Example | Category |
|---|---|---|---|
| Code list | A set of codes that must exist in a patient’s health record in order to include them within a phenotype cohort | COVID-19 ICD-10 code “U07.1” | Rule-based |
| Simple data elements | Formalizing the relationship between code-based data elements using logical connectives | COVID-19 ICD-10 code “U07.1” AND ICD-11 code “RA01.0” | Rule-based |
| Complex data elements | Formalizing the relationship between complex data elements, such as those derived via NLP | Patient’s blood pressure reading >140 OR patient notes contain “high BP” | Rule-based |
| Temporal | Prefix rules with temporal qualifiers | Albumin levels increased by 25% over 6 hours, high blood pressure reading has to occur during hospitalization | Rule-based |
| Trained classifier | Use rule-based definitions as the basis for constructing a classifier for future (or additional) cohorts | A | Probabilistic |
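
The first three rule-based formats in the table can be illustrated with a minimal Python sketch. The `PatientRecord` class, its field names, and the `"ICD-10:..."` code prefixes are hypothetical and chosen only for illustration; treating a code list as "any listed code is present" is also an assumption, and the substring match is a crude stand-in for NLP.

```python
from dataclasses import dataclass, field


@dataclass
class PatientRecord:
    """Hypothetical, simplified patient record used only for illustration."""
    codes: set = field(default_factory=set)          # e.g. {"ICD-10:U07.1"}
    bp_readings: list = field(default_factory=list)  # blood pressure readings
    notes: str = ""                                  # free-text clinical notes


def code_list(record, codes):
    """Code list: assumed to match when any listed code is in the record."""
    return bool(record.codes & set(codes))


def simple_data_elements(record):
    """Simple data elements: two code-based elements joined by a logical AND."""
    return "ICD-10:U07.1" in record.codes and "ICD-11:RA01.0" in record.codes


def complex_data_elements(record):
    """Complex data elements: a measurement OR a keyword match on the notes."""
    high_reading = any(bp > 140 for bp in record.bp_readings)
    mentioned = "high bp" in record.notes.lower()  # stand-in for NLP output
    return high_reading or mentioned
```

Each function returns a boolean, so a cohort could be extracted by filtering a collection of records with the chosen rule.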
Figure 2: Python (executable) vs CQL (modelling) [21] representation of pharyngitis phenotype.
Figure 3: An example data provenance trace showing an update to a dementia phenotype, using the W3C PROV standard. The initial version of the phenotype (1) is updated by four edit activities (2), each of which modifies a component of the definition (e.g., record extract logic, diagnostic codes, previous history) (3), in order to generate a new version (4), and the process is linked with the author making these edits (5).
Phenotype validation mechanisms
| Mechanism | Description | Example |
|---|---|---|
| Disease registries | Compare the phenotype cohort with those present in the registry | Comparison of a diabetes phenotype cohort with those patients present in a diabetes registry (e.g., T1D exchange) |
| Chart review | Compare the phenotype cohort with the patients identified by manual review of medical records | Comparison with a diabetes gold standard, produced by double manual review of patient medical records |
| Cross-EHR concordance | Compare percentage of cases identified by a phenotype across different sources, and identify any overlap | Comparison of the percentage of patients identified by a diabetes phenotype in primary and secondary care EHRs, and the identification of any case overlap |
| Risk factors | Compare the magnitude of the phenotype cohort with standard risk calculations | Comparison with the output of a Cox hazards model |
| Prognosis | Compare the magnitude of the phenotype cohort with external prognosis models | Comparison with a survival analysis |
| Genetic associations | Compare whether the presence of a patient in a phenotype cohort is consistent with their genetic profile | A patient is more likely to be a valid member of a diabetes cohort if they have the HLA-DR3 gene |
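
Two of the mechanisms above, chart review and cross-EHR concordance, reduce to set comparisons over patient identifiers. The sketch below is one possible reading, not a prescribed method: the function names are hypothetical, and the choice of positive predictive value and sensitivity as the comparison metrics is an assumption, since the table only says that the cohorts are compared.

```python
def chart_review_metrics(phenotype_cohort, gold_standard):
    """Compare a phenotype cohort with a manually reviewed gold standard.

    PPV and sensitivity are assumed metrics; the source only says 'compare'.
    """
    cohort, gold = set(phenotype_cohort), set(gold_standard)
    true_positives = len(cohort & gold)
    ppv = true_positives / len(cohort) if cohort else 0.0
    sensitivity = true_positives / len(gold) if gold else 0.0
    return {"ppv": ppv, "sensitivity": sensitivity}


def cross_source_concordance(cases_a, cases_b, population_a, population_b):
    """Percentage of cases found in each source, plus the overlapping cases."""
    a, b = set(cases_a), set(cases_b)
    return {
        "percent_a": 100.0 * len(a) / population_a,
        "percent_b": 100.0 * len(b) / population_b,
        "overlap": a & b,  # patients identified in both sources
    }
```

For example, `cross_source_concordance` could be called with the cases a diabetes phenotype identifies in primary and secondary care, along with the size of each source population.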
Suggested library API functions with all requests made in, and responses returned in, YAML+Markdown/JSON/XML formats
| Function group | Function | User access level | Description |
|---|---|---|---|
| Search | Simple search | Public | A free text search, examining the entire contents of the portal and returning a list of phenotypes that match the search criteria |
| | Advanced search | Public | A free text search, examining specified sections of the portal (e.g., main content, just metadata) and returning a list of phenotypes that match the search criteria |
| Phenotype extraction | Extracting specific phenotype(s) | Public | Given a phenotype ID supplied by a user (or generated by the platform), the API returns the phenotype definition |
| | Extracting all phenotypes | Public | Return a full list of phenotypes |
| | Adding new phenotype(s) | Authorized users | Only authorized users should be allowed to submit either a single phenotype definition or a group of definitions |
| Updating a phenotype definition | Updating the contents of a specific phenotype | Authorized users | Each aspect of a phenotype definition (including constituent code lists, links to datasets where that phenotype appears, and other metadata) can be updated by passing a phenotype ID, the names of the fields to update, and their new values. Each update should be marked with a version number so that changes are recorded over time |
| | Updating a complete phenotype with multiple features | Authorized users | Update a phenotype's contents by passing a phenotype ID and submitting an updated phenotype definition file to replace the previous version for public view |
| | Submission of a new validation case study for an existing phenotype | Authorized users | Adding a new use case to validate an existing phenotype (identified by a phenotype ID) by passing a file |
| Deletion of a phenotype | Removing a phenotype from public view (soft delete) | Private to portal administrators | An administrator of the portal can hide a phenotype definition by providing a phenotype ID |
| | Removing a phenotype from the library (hard delete) | Private to portal administrators | An administrator of the portal can delete a phenotype definition entirely by providing a phenotype ID |
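
The access levels, versioning, and soft versus hard deletion described in the table can be sketched with a small in-memory stand-in. This is not the HDR UK library's API: the `PhenotypeLibrary` class, its method names, and the role strings are all hypothetical, and letting administrators also perform authorized-user operations is an assumption.

```python
class PhenotypeLibrary:
    """Hypothetical in-memory stand-in for the suggested API functions."""

    def __init__(self):
        # phenotype ID -> {"definition": dict, "version": int, "hidden": bool}
        self._store = {}

    @staticmethod
    def _require(role, allowed):
        if role not in allowed:
            raise PermissionError(f"role {role!r} may not perform this operation")

    def add(self, role, phenotype_id, definition):
        # Assumption: administrators count as authorized users.
        self._require(role, {"authorized", "admin"})
        self._store[phenotype_id] = {
            "definition": dict(definition), "version": 1, "hidden": False,
        }

    def get(self, phenotype_id):
        """Public extraction: hidden or unknown phenotypes return None."""
        entry = self._store.get(phenotype_id)
        if entry is None or entry["hidden"]:
            return None
        return entry["definition"]

    def search(self, text):
        """Public simple search: case-insensitive match over visible entries."""
        text = text.lower()
        return sorted(
            pid for pid, entry in self._store.items()
            if not entry["hidden"] and text in str(entry["definition"]).lower()
        )

    def update(self, role, phenotype_id, **fields):
        """Update named fields and bump the version number."""
        self._require(role, {"authorized", "admin"})
        entry = self._store[phenotype_id]
        entry["definition"].update(fields)
        entry["version"] += 1
        return entry["version"]

    def soft_delete(self, role, phenotype_id):
        """Hide from public view while keeping the stored definition."""
        self._require(role, {"admin"})
        self._store[phenotype_id]["hidden"] = True

    def hard_delete(self, role, phenotype_id):
        """Remove the definition from the library entirely."""
        self._require(role, {"admin"})
        del self._store[phenotype_id]
```

Keeping a `hidden` flag separate from removal is what distinguishes the two deletion rows: a soft-deleted phenotype disappears from `get` and `search` but can still be restored, whereas a hard delete discards the entry.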
Figure 4: Overview of the services that constitute the HDR UK phenotype library.
Figure 5: Metadata structure adopted by CALIBER (left) and PheKB (right).