EHR-based phenome wide association study in pancreatic cancer.

Tomasz Adamusiak, Mary Shimoyama

Abstract

BACKGROUND: Pancreatic cancer is one of the most common causes of cancer-related deaths in the United States; it is difficult to detect early and typically has a very poor prognosis. We present a novel method of large-scale clinical hypothesis generation based on a phenome-wide association study performed using Electronic Health Records (EHR) in a pancreatic cancer cohort.
METHODS: The study population consisted of 1,154 patients diagnosed with malignant neoplasm of pancreas seen at The Froedtert & The Medical College of Wisconsin academic medical center between 2004 and 2013. We evaluated death of a patient as the primary clinical outcome and tested its association with the phenome, which consisted of over 2.5 million structured clinical observations extracted from the EHR, including labs, medications, phenotypes, diseases and procedures. The individual observations were encoded in the EHR using 6,617 unique ICD-9, CPT-4, LOINC, and RxNorm codes. We remapped this initial code set into UMLS concepts and then expanded it hierarchically to support generalization, yielding the final set of 10,164 clinical concepts that formed the phenome. We then tested all pairwise associations between each of these 10,164 concepts and death as the primary outcome.
RESULTS: After correcting for multiple testing and folding back (generalizing) child concepts where appropriate, we found 231 concepts to be significantly associated with death in the study population.
CONCLUSIONS: With the abundance of structured EHR data, phenome wide association studies combined with knowledge engineering can be a viable method of rapid hypothesis generation.

Year:  2014        PMID: 25717392      PMCID: PMC4333703     

Source DB:  PubMed          Journal:  AMIA Jt Summits Transl Sci Proc


Introduction

The Health Information Technology for Economic and Clinical Health (HITECH) Act introduced the concept of Meaningful Use of information technology in health care. As part of this process, the legislation mandated the use of standard terminologies for electronic exchange of health information. Patient clinical records represent a largely untapped treasure trove of research information, which has only recently become more accessible thanks to the increasing adoption of Electronic Health Records and healthcare data standards. The need to integrate and exchange clinical data has long been recognized [1], but it was the HITECH Act that provided the final piece of the puzzle in terms of financial incentives.
A number of terminology standards are currently in use. LOINC (Logical Observation Identifiers Names and Codes) is a universal standard for identifying laboratory observations [2]. RxNorm is a standardized nomenclature for generic and branded drugs, as well as drug delivery devices. RxNorm provides normalized names for clinical drugs and links its names to many of the drug vocabularies commonly used in pharmacy management and drug interaction software [3]. The Healthcare Common Procedure Coding System (HCPCS), maintained by the Centers for Medicare & Medicaid Services (CMS), is a standardized coding system for describing items and services provided in the delivery of healthcare [4]. It incorporates Current Procedural Terminology (CPT), a coding system maintained by the American Medical Association (AMA) to identify medical services and procedures furnished by physicians and other health care professionals [5]. The International Classification of Diseases, Ninth Revision, Clinical Modification (ICD-9-CM) is an adaptation created by the U.S. National Center for Health Statistics (NCHS) and used in assigning diagnostic and procedure codes associated with inpatient, outpatient, and physician office utilization in the United States [6].
All these terminologies are integrated within the UMLS (Unified Medical Language System) maintained by the National Library of Medicine (NLM) [7]. The current state of the art in extracting actionable information from the EHR relies on large-scale text mining and NLP of clinical notes [8,9], focuses on a specific terminology within the EHR, e.g., ICD-9-CM [10,11], or examines a small, handcrafted subset of EHR variables [12]. Our approach is novel in that we analyzed the complete corpus of structured data within the EHR across all available terminology standards and used an existing knowledge base (UMLS) to expand and generalize the findings.

Methods

Extract, Load and Transform (ELT)

A Limited Data Set, as defined under the Health Insurance Portability and Accountability Act (HIPAA), was obtained from the Medical College of Wisconsin Clinical Research Data Warehouse for this analysis. The data extract was in the form of standard Epic Clarity tables for the subset of patients that had an encounter or a problem list code in the Malignant neoplasm of pancreas (ICD9:157) code subset:
Malignant neoplasm of pancreas
Malignant neoplasm of head of pancreas
Malignant neoplasm of body of pancreas
Malignant neoplasm of tail of pancreas
Malignant neoplasm of pancreatic duct
Malignant neoplasm of islets of langerhans
Malignant neoplasm of other specified sites of pancreas
Malignant neoplasm of pancreas, part unspecified
Data were loaded into our in-house clinical analytics portal (ClinMiner), which was used to dynamically translate between any of the underlying clinical terminologies and provided a consolidated view of the underlying patient data in a single UMLS perspective [13]. Drug information in the EHR was encoded using MediSpan terminology, one of the RxNorm sources, which facilitated its automatic translation into UMLS. Labs were encoded as orders using CPT-4 codes or using a fixed category from the CLARITY_COMPONENT lookup table. We manually mapped 130 tests from CLARITY_COMPONENT to LOINC, which provided coverage for over 97% of all lab observations (1 493 101 observations in total). The remaining ~3% of lab observations were left unmapped and excluded from further analysis. The source annotation space covered 6 617 unique ICD-9, CPT-4, LOINC, and RxNorm codes. This code set was then remapped into UMLS to facilitate further analysis, which resulted in 6 741 distinct UMLS CUIs (Concept Unique Identifiers).
This code set was then expanded across a limited set of is_a and selected other relationships (e.g., has_ingredient for RxNorm drugs) as an extension of a previously proposed method, similar to the parent-child analysis described by Grossmann et al. [14]. The expansion did not go beyond the original set of UMLS Metathesaurus semantic types of the expanded concepts, in order to exclude functional concepts from the analysis and to preserve the general meaning of the originating concept. Additionally, the UMLS traversal was limited to either the UMLS Metathesaurus itself or any of the following terminologies specific to Meaningful Use: RxNorm, NDF-RT, LOINC, SNOMED CT, HCPCS, and ICD-9-CM. This resulted in 18 038 concepts. Finally, we discarded 7 874 concepts that did not increase information content (i.e., were redundant in terms of partitioning of the underlying data) to reach the final ‘phenome’ of 10 164 concepts.
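
The expansion and pruning steps can be illustrated with a minimal Python sketch. It is not the ClinMiner implementation: the concept names, the `parents` map, and the redundancy rule (an added ancestor is dropped when it selects exactly the same patients as one of its children) are all assumptions made for illustration.

```python
from collections import defaultdict


def with_ancestors(concept, parents):
    """Return the concept plus every ancestor reachable through parent links."""
    seen = set()
    stack = [concept]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents.get(current, ()))
    return seen


def patients_by_concept(observations, parents):
    """Map each concept, observed or inherited, to the patients annotated with it."""
    result = defaultdict(set)
    for patient, concept in observations:
        for c in with_ancestors(concept, parents):
            result[c].add(patient)
    return dict(result)


def prune_redundant(concept_patients, parents, observed):
    """Drop added ancestors that partition the patients exactly like a child.

    Assumption: this is one possible reading of 'did not increase
    information content'; the source does not spell out the rule.
    """
    redundant = set()
    for child, child_parents in parents.items():
        if child not in concept_patients:
            continue
        for parent in child_parents:
            same_patients = concept_patients.get(parent) == concept_patients[child]
            if parent not in observed and same_patients:
                redundant.add(parent)
    return {c: p for c, p in concept_patients.items() if c not in redundant}


# Hypothetical toy hierarchy: child -> list of parents (e.g. is_a, has_ingredient).
parents = {
    "drug_a_tablet": ["drug_a"],
    "drug_a": ["drug_class"],
    "drug_b": ["drug_class"],
}
# Hypothetical (patient, concept) observations.
observations = [("p1", "drug_a_tablet"), ("p2", "drug_b")]
observed = {concept for _, concept in observations}

expanded = patients_by_concept(observations, parents)
phenome = prune_redundant(expanded, parents, observed)
print(sorted(phenome))
```

In this toy example `drug_a` is discarded because it selects the same patient as `drug_a_tablet`, while `drug_class` is kept because it groups patients that no single child concept covers.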

Statistical analysis

A chi-squared (χ²) test was used to test the significance of the associations. To correct for multiple testing we used a Bonferroni correction and tested at a level of p < 4.9 × 10⁻⁶ (0.05/10 164). Odds Ratio (OR) and Relative Risk (RR) were used to assess the effect size of associations found to be significant.
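
As a worked illustration of this procedure, the sketch below computes the chi-squared p-value, OR, and RR for a single 2x2 table using only the Python standard library. The counts are those of the Cimetidine row of Table 1 (17 exposed deceased, 6 exposed alive, 321 unexposed deceased, 810 unexposed alive, the assignment that reproduces the reported OR and RR). The use of Pearson's statistic without a continuity correction is an assumption, and the function names are illustrative.

```python
import math

N_CONCEPTS = 10_164        # size of the final phenome as reported in the abstract
ALPHA = 0.05 / N_CONCEPTS  # Bonferroni-corrected threshold, about 4.9e-6


def chi_squared(a, b, c, d):
    """Pearson chi-squared statistic for a 2x2 table (no continuity correction)."""
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))


def p_value(chi2):
    """Upper-tail probability of a chi-squared variable with 1 degree of freedom."""
    return math.erfc(math.sqrt(chi2 / 2.0))


def odds_ratio(a, b, c, d):
    """Odds of death among the exposed divided by odds among the unexposed."""
    return (a * d) / (b * c)


def relative_risk(a, b, c, d):
    """Risk of death among the exposed divided by risk among the unexposed."""
    return (a / (a + b)) / (c / (c + d))


# a: exposed deceased, b: exposed alive, c: unexposed deceased, d: unexposed alive
cimetidine = (17, 6, 321, 810)

p = p_value(chi_squared(*cimetidine))
print(f"p = {p:.2e}, significant after Bonferroni: {p < ALPHA}")
print(f"OR = {odds_ratio(*cimetidine):.2f}, RR = {relative_risk(*cimetidine):.2f}")
```

Under these assumptions the output is close to the p, OR, and RR reported for that row; small differences in the last digit can arise from rounding versus truncation.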

Results

A total of 713 concepts were found to be significantly associated with death in the study population. Where both a parent and its child concepts were found to be significant, the child concepts were removed to further generalize the results, and the final result set was thus reduced to 231 terms. A breakdown of all concepts by category and number of observations is shown in Figure 1.
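
The fold-back step can be sketched as follows. This is a minimal illustration with hypothetical concept names, and it assumes that a significant concept is removed whenever any of its ancestors (not only its direct parent) is also significant, which the text does not state explicitly.

```python
def ancestors(concept, parents):
    """Return all ancestors of a concept reachable through parent links."""
    seen = set()
    stack = list(parents.get(concept, ()))
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents.get(current, ()))
    return seen


def fold_back(significant, parents):
    """Keep only significant concepts that have no significant ancestor."""
    return {c for c in significant if not (ancestors(c, parents) & significant)}


# Hypothetical hierarchy: child -> list of parents.
parents = {
    "drug_a": ["drug_class"],
    "drug_b": ["drug_class"],
    "lab_x": ["lab_panel"],
}
# Hypothetical set of concepts that passed the significance threshold.
significant = {"drug_class", "drug_a", "lab_x"}

generalized = fold_back(significant, parents)
print(sorted(generalized))
```

Here `drug_a` is folded into its significant parent `drug_class`, while `lab_x` is kept because its parent `lab_panel` was not significant.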
Figure 1: Breakdown of all 10 164 concepts by semantic type category and proportion of observations annotated with a particular concept. Categories are defined as follows. Lab is any UMLS concept in the semantic type tree under A2.3.1 Clinical Attribute, A2.2.1 Laboratory or Test Result, or B1.3.1.1 Laboratory Procedure. Procedure is a UMLS concept that is in the B1.3.1 Health Care Activity branch but is not a B1.3.1.1 Laboratory Procedure. Problem is a concept with a semantic type under B2.2.1.2 Pathologic Function or A2.2.2 Sign or Symptom. Medication groups all concepts classified by the UMLS Semantic Network under either A1.4 Substance or A1.3.3 Clinical Drug. Finally, Other groups all remaining semantic types.

Most of the terms were positively correlated with death, and only the following 9 concepts were found to be associated with a lower relative risk of death in the study population:
Immunoassay for tumor antigen, quantitative; CA 125
Vitamin D; 25 hydroxy, includes fraction(s), if performed
Prealbumin measurement
Racial group
Benzoic acid or derivative
Iodine AND/OR iodine compound
Ionic iodinated contrast media
Triiodobenzoic Acids
sevoflurane Inhalant Solution
For practical reasons, only the top ten (five from each side) significant associations are shown in Table 1. The complete result set encompassing all 231 significant associations is available as supplementary materials at http://dx.doi.org/10.6084/m9.figshare.816958.
Table 1: Top ten events by effect size significantly associated with death in the study cohort.

| Label | CUI | Semantic Type | Exposed Deceased | Exposed Alive | Not Exposed Alive | Not Exposed Deceased | p | OR | RR |
|---|---|---|---|---|---|---|---|---|---|
| Increased Risk (RR > 1) | | | | | | | | | |
| Cytopathology, fluids, washings or brushings, except cervical or vaginal; smears with interpretation | C0374051 | Laboratory Procedure | 28 | 8 | 808 | 310 | 8.31 × 10⁻¹¹ | 9.12 | 2.80 |
| Cimetidine | C0008783 | Pharmacologic Substance | 17 | 6 | 810 | 321 | 2.03 × 10⁻⁶ | 7.14 | 2.60 |
| Hyposmolality and/or hyponatremia | C0020645 | Finding | 22 | 10 | 806 | 316 | 6.54 × 10⁻⁷ | 5.61 | 2.44 |
| Osmolality; blood | C0373690 | Laboratory Procedure | 43 | 25 | 791 | 295 | 2.29 × 10⁻¹⁰ | 4.61 | 2.32 |
| Haptoglobin; quantitative | C0373631 | Laboratory Procedure | 25 | 14 | 802 | 313 | 1.17 × 10⁻⁶ | 4.57 | 2.28 |
| Decreased Risk (RR < 1) | | | | | | | | | |
| sevoflurane Inhalant Solution | C1253873 | Clinical Drug | 7 | 85 | 731 | 331 | 1.90 × 10⁻⁶ | 0.18 | 0.24 |
| Triiodobenzoic Acids | C0041013 | Organic Chemical | 56 | 332 | 484 | 282 | 3.00 × 10⁻¹⁵ | 0.28 | 0.39 |
| Ionic iodinated contrast media | C0361904 | Indicator, Reagent, or Diagnostic Aid | 57 | 332 | 484 | 281 | 6.66 × 10⁻¹⁵ | 0.29 | 0.39 |
| Iodine AND/OR iodine compound | C0303013 | Inorganic Chemical; Pharmacologic Substance | 60 | 338 | 478 | 278 | 1.38 × 10⁻¹⁴ | 0.30 | 0.40 |
| Benzoic acid or derivative | C0578497 | Organic Chemical | 58 | 328 | 488 | 280 | 4.41 × 10⁻¹⁴ | 0.30 | 0.41 |

Abbreviations: CUI – Concept Unique Identifier; OR – Odds Ratio; RR – Relative Risk.

Discussion

As with any retrospective observational study, the primary limitation is the lack of a prospective control group, which means the results can be biased by an imbalanced design. It is also worth noting that correlation does not imply causation. For example, while cytopathology was found to be associated with an increased risk of death, this is more likely due to selection bias: patients with more advanced disease more frequently underwent the procedure as part of their diagnostic process. There are also limitations due to data incompleteness. For example, we looked only at deaths recorded in the EHR and did not include data from outside sources such as the National Death Index. On the other hand, retrospective designs have the advantage of observing real clinical practice.
We observed that the use of contrast media and of medical gases used to induce anesthesia was associated with a lower risk of death in the study population. This is consistent with a known association between hospital resource utilization and patient mortality [15,16]. Cimetidine, an H2 receptor antagonist, has a known off-label use as an anticancer drug [17–19]. Paradoxically, its use was associated with an increased risk of death in our study population. However, this subpopulation was also older than the rest of the cohort, and the increased mortality was likely due to a more advanced disease process. Without access to clinical notes, we can only speculate that this was perhaps part of an experimental treatment.
We see the potential to use this approach to automatically generate groupers or value sets of closely related concepts. These could be used both in the EHR, to alert the physician to other possibly relevant features of patient presentation, and on the research side, to make more informative patient cohort selections.
A major critique would be that we only looked at the association of the concept and not its value. Reducing each observation to its presence at the patient level also discards temporal and frequency information. On the other hand, including this information would increase the dimensionality of the analysis (cf. curse of dimensionality) and would not only require a more sophisticated statistical approach but could also suffer from lower statistical power. These are some of the challenges that we hope to address in future work.

Conclusions

Information contained in the EHR, combined with knowledge engineering, could be used as a viable method of rapid hypothesis generation, but it requires comprehensive validation.

References

1.  ICD-9-CM.

Authors:  R Finnegan
Journal:  J Am Med Rec Assoc       Date:  1986-07

2.  Implementation of RxNorm as a terminology mediation standard for exchanging pharmacy medication between federal agencies.

Authors:  Fola Parrish; Nhan Do; Omar Bouhaddou; Pradnya Warnekar
Journal:  AMIA Annu Symp Proc       Date:  2006

3.  Induction mortality and resource utilization in children treated for acute myeloid leukemia at free-standing pediatric hospitals in the United States.

Authors:  Marko Kavcic; Brian T Fisher; Yimei Li; Alix E Seif; Kari Torp; Dana M Walker; Yuan-Shung Huang; Grace E Lee; Sarah K Tasian; Marijana Vujkovic; Rochelle Bagatell; Richard Aplenc
Journal:  Cancer       Date:  2013-02-21       Impact factor: 6.860

4.  The Unified Medical Language System.

Authors:  D A Lindberg; B L Humphreys; A T McCray
Journal:  Methods Inf Med       Date:  1993-08       Impact factor: 2.176

5.  Identifying phenotypic signatures of neuropsychiatric disorders from electronic medical records.

Authors:  Svetlana Lyalina; Bethany Percha; Paea LePendu; Srinivasan V Iyer; Russ B Altman; Nigam H Shah
Journal:  J Am Med Inform Assoc       Date:  2013-08-16       Impact factor: 4.497

6.  Cimetidine: an anticancer drug? (Review)

Authors:  Martina Kubecova; Katarina Kolostova; Daniela Pinterova; Grzegorz Kacprzak; Vladimir Bobek
Journal:  Eur J Pharm Sci       Date:  2011-02-15       Impact factor: 4.384

7.  Cimetidine suppresses lung tumor growth in mice through proapoptosis of myeloid-derived suppressor cells.

Authors:  Yisheng Zheng; Meng Xu; Xiao Li; Jinpeng Jia; Kexing Fan; Guoxiang Lai
Journal:  Mol Immunol       Date:  2012-12-05       Impact factor: 4.407

8.  PheWAS: demonstrating the feasibility of a phenome-wide scan to discover gene-disease associations.

Authors:  Joshua C Denny; Marylyn D Ritchie; Melissa A Basford; Jill M Pulley; Lisa Bastarache; Kristin Brown-Gentry; Deede Wang; Dan R Masys; Dan M Roden; Dana C Crawford
Journal:  Bioinformatics       Date:  2010-03-24       Impact factor: 6.937

9.  Relationship between global end-diastolic volume and cardiac output in critically ill infants and children.

Authors:  Corrado Cecchetti; Riccardo Lubrano; Sebastian Cristaldi; Francesca Stoppa; Maria Antonietta Barbieri; Marco Elli; Raffaele Masciangelo; Daniela Perrotta; Elisabetta Travasso; Claudia Raggi; Marco Marano; Nicola Pirozzi
Journal:  Crit Care Med       Date:  2008-03       Impact factor: 7.598

10.  Correlating electronic health record concepts with healthcare process events.

Authors:  George Hripcsak; David J Albers
Journal:  J Am Med Inform Assoc       Date:  2013-08-23       Impact factor: 4.497
