Harry Hochheiser, Melissa Castine, David Harris, Guergana Savova, Rebecca S Jacobson.
Abstract
BACKGROUND: Standards, methods, and tools supporting the integration of clinical data and genomic information are an area of significant need and rapid growth in biomedical informatics. Integration of cancer clinical data and cancer genomic information poses unique challenges because of the high volume and complexity of clinical data, as well as the heterogeneity and instability of cancer genome data compared with germline data. Current information models of clinical and genomic data are not sufficiently expressive to represent individual observations and to aggregate those observations into longitudinal summaries over the course of cancer care. More expressive models are acutely needed to support the development of systems and tools for generating the so-called clinical "deep phenotype" of individual cancer patients, a process that remains almost entirely manual in cancer research and precision medicine.
Keywords: Cancer; Deep phenotyping; Information extraction; Information model
Year: 2016 PMID: 27629872 PMCID: PMC5024416 DOI: 10.1186/s12911-016-0358-4
Source DB: PubMed Journal: BMC Med Inform Decis Mak ISSN: 1472-6947 Impact factor: 2.796
Modeling requirements
| ID | Requirement | Description |
|---|---|---|
| R1 | Appropriate | Use accepted terminologies and vocabularies whenever possible |
| R2 | Cancer-specific content | Provide expressivity necessary to develop appropriately detailed descriptions of cancer treatment and progression |
| R3 | Available tooling | Align with existing APIs, schemata, validators, etc. |
| R4 | Community-driven modeling | Use community contributions and critiques to improve models |
| R5 | Compatibility with existing NLP | Facilitate interaction with existing NLP tools and type systems. |
| R6 | Combinations of structured and unstructured data | Support the combination of structured data represented in EMRs with unstructured details extracted from clinical texts. |
| R7 | Multi-level | Support summarization of data across multiple levels of abstraction, ranging from instances/mentions to documents, episodes (collections of records indicating a distinct phase in disease progression such as diagnosis or treatment), and high-level summaries of cancers and tumors. |
| R8 | Provenance | Preserve and expose linkages between abstracted models and source data |
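Requirements R6 and R8 can be illustrated together with a minimal Python sketch. It assumes a hypothetical `Observation` record (a plain dataclass, not a FHIR resource) that tags every fact with its origin: facts from structured EMR fields and facts extracted by NLP are merged into one list, and each merged fact keeps pointers to all of its sources. Treating "same patient, same concept, same date" as a duplicate is likewise an assumption made for illustration.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Dict, Iterable, List, Tuple


@dataclass
class Observation:
    """Hypothetical patient-level fact; not a FHIR resource definition."""
    patient_id: str
    concept: str                                       # vocabulary code or preferred term
    when: date
    sources: List[str] = field(default_factory=list)  # provenance pointers (R8)


def merge_observations(structured: Iterable[Observation],
                       extracted: Iterable[Observation]) -> List[Observation]:
    """Combine EMR-derived and NLP-derived observations (R6).

    Observations asserting the same concept for the same patient on the
    same date are collapsed into one, and their source lists are unioned,
    so every merged fact still links back to where it came from.
    """
    merged: Dict[Tuple[str, str, date], Observation] = {}
    for obs in list(structured) + list(extracted):
        key = (obs.patient_id, obs.concept, obs.when)
        if key not in merged:
            merged[key] = Observation(obs.patient_id, obs.concept, obs.when, [])
        for src in obs.sources:
            if src not in merged[key].sources:
                merged[key].sources.append(src)
    # Deterministic order: by patient, then date, then concept.
    return sorted(merged.values(), key=lambda o: (o.patient_id, o.when, o.concept))
```

Keeping provenance as a list rather than a single field lets a fact confirmed by both the EMR and a clinical note show both origins.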
Fig. 1. A schematic representation of the workflow used by the authors to generate the FHIR cancer models
Fig. 2. Classes used in cancer phenotype representations. Individual mentions extracted by NLP (Level 1) are instantiated as FHIR objects, which are collected in Compositions corresponding to individual documents (Level 2). These FHIR objects become events that are aggregated into distinct Episodes of care (Level 3) and eventually analyzed to form patient- and phenotype-level summaries (Level 4)
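The level structure in Fig. 2 can be sketched with plain Python dataclasses. The class names are hypothetical stand-ins rather than the FHIR resource API, Level 3 (episodes) is skipped for brevity, and the summary rule used here (a later document overrides an earlier one) is a toy assumption; the model described in the paper relies on richer abstraction rules.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Dict, List


@dataclass
class Mention:                      # Level 1: one NLP-extracted assertion
    doc_id: str
    doc_date: date
    attribute: str                  # e.g. "histologic_type"
    value: str                      # e.g. "ductal"


@dataclass
class Composition:                  # Level 2: all mentions from one document
    doc_id: str
    doc_date: date
    mentions: List[Mention] = field(default_factory=list)


@dataclass
class Summary:                      # Level 4: one value per attribute
    values: Dict[str, str]
    provenance: Dict[str, str]      # attribute -> doc_id that supplied the value


def build_compositions(mentions: List[Mention]) -> List[Composition]:
    """Group Level 1 mentions into one Level 2 composition per document."""
    by_doc: Dict[str, Composition] = {}
    for m in mentions:
        comp = by_doc.setdefault(m.doc_id, Composition(m.doc_id, m.doc_date))
        comp.mentions.append(m)
    return sorted(by_doc.values(), key=lambda c: c.doc_date)


def summarize(compositions: List[Composition]) -> Summary:
    """Toy rule (assumption): values from later documents replace earlier ones."""
    values: Dict[str, str] = {}
    provenance: Dict[str, str] = {}
    for comp in sorted(compositions, key=lambda c: c.doc_date):
        for m in comp.mentions:
            values[m.attribute] = m.value
            provenance[m.attribute] = comp.doc_id
    return Summary(values, provenance)
```

Recording which document supplied each summary value is one simple way to keep the link between abstracted and source data that requirement R8 asks for.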
Fig. 3. Example patient records and their representation as compositions
Fig. 4. Summarization of records from Fig. 3 into Episodes and a Patient/Phenotype Summary
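One way to approximate the episode step in Fig. 4 is to segment a patient's chronologically ordered documents into contiguous runs that share a phase of care. In this hedged Python sketch, the mapping from document type to phase (`PHASE_BY_DOC_TYPE`) is a hypothetical lookup table, and the actual model may assign episodes differently.

```python
from dataclasses import dataclass, field
from datetime import date
from itertools import groupby
from typing import List, Tuple

# Hypothetical mapping from document type to phase of care.
PHASE_BY_DOC_TYPE = {
    "radiology": "diagnosis",
    "pathology": "diagnosis",
    "oncology_note": "treatment",
    "infusion_record": "treatment",
}


@dataclass
class Episode:
    phase: str
    start: date
    end: date
    doc_ids: List[str] = field(default_factory=list)


def segment_episodes(docs: List[Tuple[str, date, str]]) -> List[Episode]:
    """Split (doc_id, date, doc_type) records into contiguous episodes.

    Documents whose type is missing from the mapping are labeled "other".
    itertools.groupby groups only *consecutive* items with the same key,
    so a return to an earlier phase starts a new episode.
    """
    ordered = sorted(docs, key=lambda d: d[1])
    episodes: List[Episode] = []
    for phase, group in groupby(ordered, key=lambda d: PHASE_BY_DOC_TYPE.get(d[2], "other")):
        run = list(group)
        episodes.append(Episode(phase, run[0][1], run[-1][1], [d[0] for d in run]))
    return episodes
```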
Attributes of (a) cancer and (b) tumor phenotypes

(a) Cancer phenotype

| Attribute | Description / examples |
|---|---|
| Cancer Type | carcinoma, sarcoma, etc. |
| Histologic Type | ductal, lobular, etc. |
| Tumor Extent | in-situ, invasive, etc. |
| Cancer Stage | Stage I, Stage IIA, etc. |
| T Classification | Primary tumor classification (pTis, T2a, etc.) |
| N Classification | Regional lymph node classification (pNx, N1, etc.) |
| M Classification | Distant metastasis classification (M0, M1, etc.) |
| Manifestations | Clinical and molecular classifications of the cancer (hypercalcemia, hypercoagulability, etc.) |

(b) Tumor phenotype

| Attribute | Description / examples |
|---|---|
| Cancer Type | by cell of origin (carcinoma, sarcoma, etc.) |
| Histologic Type | ductal, lobular, etc. |
| Tumor Extent | in-situ, invasive, etc. |
| Manifestations | Clinical and molecular classifications of the tumor (size, receptor status, Nottingham score, etc.) |
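The attributes in tables (a) and (b) map naturally onto record types. The Python sketch below chooses its own field names and uses a simplified TNM pattern (optional c/p/y prefixes; an assumption, not the full AJCC grammar) to split a compact string such as `pT2aN1M0` into the T, N, and M classifications listed above. Linking tumors to their cancer through a `tumors` list is likewise an illustrative assumption.

```python
import re
from dataclasses import dataclass, field
from typing import List, Optional

# Simplified TNM pattern (assumption): optional c/p/y prefixes;
# T in {is, X, 0-4} + optional a-d; N in {X, 0-3} + a-c; M in {X, 0-1} + a-c.
TNM_PATTERN = re.compile(
    r"(?P<t>[cpy]*T(?:is|[0-4Xx])[a-d]?)?"
    r"(?P<n>[cpy]*N[0-3Xx][a-c]?)?"
    r"(?P<m>[cpy]*M[01Xx][a-c]?)?"
)


@dataclass
class TumorPhenotype:
    cancer_type: Optional[str] = None        # by cell of origin, e.g. "carcinoma"
    histologic_type: Optional[str] = None    # e.g. "ductal"
    tumor_extent: Optional[str] = None       # e.g. "invasive"
    manifestations: List[str] = field(default_factory=list)  # size, receptor status, ...


@dataclass
class CancerPhenotype:
    cancer_type: Optional[str] = None
    histologic_type: Optional[str] = None
    tumor_extent: Optional[str] = None
    stage: Optional[str] = None              # e.g. "Stage IIA"
    t_classification: Optional[str] = None   # e.g. "pT2a"
    n_classification: Optional[str] = None   # e.g. "N1"
    m_classification: Optional[str] = None   # e.g. "M0"
    manifestations: List[str] = field(default_factory=list)
    tumors: List[TumorPhenotype] = field(default_factory=list)  # assumed link

    def set_tnm(self, text: str) -> bool:
        """Fill T/N/M from a compact string; leave fields unchanged and
        return False if the string does not parse."""
        compact = re.sub(r"\s+", "", text)
        match = TNM_PATTERN.fullmatch(compact)
        if match is None or not any(match.groupdict().values()):
            return False
        self.t_classification = match.group("t")
        self.n_classification = match.group("n")
        self.m_classification = match.group("m")
        return True
```

Usage: `CancerPhenotype(cancer_type="carcinoma").set_tnm("pT2a N1 M0")` fills the three classification fields with `"pT2a"`, `"N1"`, and `"M0"`.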
Fig. 5. An example abstraction rule and its expression in SWRL and Drools. Summarization rules convert assertions extracted from individual documents into higher-level summaries. (1) A subset of the upper levels of the information model showing key concepts in the representation of both instance and summary models. (2) A mapping of those concepts to levels in the information model. (3) A subset of the elements used in a Patient/Phenotype-level summary. (4) A graphical example of a rule taking instances (5) and transforming them into a summary representation (6). This rule specifies that the value of a FISH test takes precedence over the results of an IHC test. The rule is given in English (7), SWRL (8), and Drools (9)
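The English form of the Fig. 5 rule (a FISH result takes precedence over an IHC result) can be mirrored in a few lines of Python. This is an illustrative re-expression, not the SWRL or Drools rule from the figure; the `AssayResult` record, the numeric priority table, and the HER2 example values are assumptions.

```python
from dataclasses import dataclass
from typing import Iterable, Optional

# Assumed precedence table: a higher number wins when results disagree.
METHOD_PRIORITY = {"FISH": 2, "IHC": 1}


@dataclass(frozen=True)
class AssayResult:
    biomarker: str     # e.g. "HER2"
    method: str        # "FISH" or "IHC"
    value: str         # e.g. "positive", "negative", "equivocal"
    source_doc: str    # provenance back to the instance-level document


def summarize_biomarker(results: Iterable[AssayResult],
                        biomarker: str) -> Optional[AssayResult]:
    """Pick the summary-level result for one biomarker.

    Only FISH and IHC results are considered, and FISH outranks IHC.
    Among equally ranked results, max() returns the first one encountered.
    """
    candidates = [r for r in results
                  if r.biomarker == biomarker and r.method in METHOD_PRIORITY]
    if not candidates:
        return None
    return max(candidates, key=lambda r: METHOD_PRIORITY[r.method])
```

Returning the whole record rather than just its value keeps the `source_doc` pointer, so the summary remains traceable to the document that justified it.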
Sample competency questions
| Category | Description | Sample question |
|---|---|---|
| Clinical criteria | Find patients matching some desired criteria, independent of temporal relations | Which patients have had atypical endometriosis? |
| Event relations | Find patients who experience two or more clinically relevant events, related by a specified time interval. | Which patients were given chemotherapy within eight weeks of their death? |
| Stratification | Given two sets of patients similar in key respects, compare certain outcomes based on stratification of categorical values such as care, phenotype, etc. | What portion of BRCA patients with PALB2 were given PARP inhibitor therapy? |
| Triangulation | Some information cannot be interpreted on the basis of any one source. Integration of related information from multiple sources is required to develop full understanding. | Which patients had medications that were ordered (as per physician charts), but not administered (as per Medication Administration Records (MARs) or nursing records)? |
| Schema | What information is available on which patients? | For which patients do I have a valid date of death? |
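Two of the competency questions, the event-relation example and the triangulation example, can be phrased as small queries over summarized data. This Python sketch assumes hypothetical in-memory inputs (per-patient chemotherapy dates, dates of death, and sets of `(patient, order_id)` pairs for orders and administrations); a deployed system would run equivalent queries against the information model's store.

```python
from datetime import date, timedelta
from typing import Dict, List, Set, Tuple


def chemo_near_death(chemo_dates: Dict[str, List[date]],
                     death_dates: Dict[str, date],
                     weeks: int = 8) -> Set[str]:
    """Event relation: patients given chemotherapy within `weeks` weeks before death."""
    window = timedelta(weeks=weeks)
    hits: Set[str] = set()
    for patient, died in death_dates.items():
        if any(timedelta(0) <= died - given <= window
               for given in chemo_dates.get(patient, [])):
            hits.add(patient)
    return hits


def ordered_not_administered(orders: Set[Tuple[str, str]],
                             administered: Set[Tuple[str, str]]) -> Set[Tuple[str, str]]:
    """Triangulation: (patient, order_id) pairs found in physician orders
    but absent from medication administration or nursing records."""
    return orders - administered
```

Patients without a recorded date of death are never returned by `chemo_near_death`, which is where the "Schema" question about valid death dates becomes a prerequisite.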