Maxwell Salvatore (1), Lauren J Beesley (1), Lars G Fritsche (2), David Hanauer (3), Xu Shi (1), Alison M Mondul (4), Celeste Leigh Pearce (4), Bhramar Mukherjee (5)
1. Department of Biostatistics, University of Michigan School of Public Health, Ann Arbor, MI 48109, United States.
2. Department of Biostatistics, University of Michigan School of Public Health, Ann Arbor, MI 48109, United States; Rogel Cancer Center, University of Michigan Medicine, Ann Arbor, MI 48109, United States; Center for Statistical Genetics, University of Michigan School of Public Health, Ann Arbor, MI, United States.
3. Department of Pediatrics, University of Michigan Medical School, Ann Arbor, MI 48109, United States.
4. Department of Epidemiology, University of Michigan School of Public Health, Ann Arbor, MI 48109, United States.
5. Department of Biostatistics, University of Michigan School of Public Health, Ann Arbor, MI 48109, United States. Electronic address: bhramar@umich.edu.
Abstract
BACKGROUND: Traditional methods for disease risk prediction and assessment, such as diagnostic tests using serum, urine, blood, saliva, or imaging biomarkers, have been important for identifying high-risk individuals for many diseases, leading to early detection and improved survival. For pancreatic cancer, traditional screening methods have been largely unsuccessful in identifying high-risk individuals in advance of disease progression, which contributes to high mortality and poor survival. Electronic health records (EHR) linked to genetic profiles provide an opportunity to integrate multiple sources of patient information for risk prediction and stratification. We leverage a constellation of temporally associated diagnoses available in the EHR to construct a summary risk score, called a phenotype risk score (PheRS), for identifying individuals at high risk for pancreatic cancer. The proposed PheRS approach incorporates time relative to disease onset into the prediction framework. We combine and contrast the PheRS with a better-known measure of inherited susceptibility, the polygenic risk score (PRS), for prediction of pancreatic cancer.
METHODOLOGY: We first calculated pairwise, unadjusted associations between pancreatic cancer diagnosis and all other diagnoses across the medical phenome. We call these pairwise associations co-occurrences. After accounting for cross-phenotype correlations, the multivariable association estimates from a subset of relatively independent diagnoses were used to create a weighted-sum PheRS. We constructed time-restricted risk scores using data from 38,359 participants in the Michigan Genomics Initiative (MGI) based on the diagnoses contained in the EHR at 0, 1, 2, and 5 years prior to the target pancreatic cancer diagnosis. The predictive performance of the PheRS was evaluated in the UK Biobank (UKB). We tested the relative contribution of PheRS when added to a model containing a summary measure of inherited genetic susceptibility (PRS) plus other covariates: age, sex, smoking status, drinking status, and body mass index (BMI).
RESULTS: Our exploration of co-occurrence patterns identified expected associations while also revealing unexpected relationships that may warrant closer attention. Using only the pancreatic cancer PheRS at 5 years before the target diagnosis yielded an AUC of 0.60 (95% CI = [0.58, 0.62]) in UKB. A larger predictive model including PheRS, PRS, and the covariates at the 5-year threshold achieved an AUC of 0.74 (95% CI = [0.72, 0.76]) in UKB. We note that PheRS contributes independently in the joint model. Finally, scores at the top percentiles of the PheRS distribution demonstrated promise for risk stratification. Scores in the top 2% were 10.20 (95% CI = [9.34, 12.99]) times more likely to identify cases than those in the bottom 98% in UKB at the 5-year threshold prior to pancreatic cancer diagnosis.
CONCLUSIONS: We developed a framework for creating a time-restricted PheRS from EHR data for pancreatic cancer using the rich information content of a medical phenome. In addition to identifying hypothesis-generating associations for future research, this PheRS demonstrates a potentially important contribution in identifying high-risk individuals, even after adjusting for the pancreatic cancer PRS and other traditional epidemiologic covariates. The methods are generalizable to other phenotypic traits.
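The time-restricted, weighted-sum construction described in the methodology can be illustrated with a minimal sketch. This is not the authors' implementation. The code labels (`code_a`, `code_b`, `code_c`), the weights, the toy cohort, and the use of age in years to represent the timing of each diagnosis are all hypothetical, since the abstract does not list the selected diagnoses, their estimates, or how an index time was assigned to controls. The AUC here is a plain rank-based estimate without confidence intervals.

```python
from typing import Dict, Iterable, List, Set, Tuple

# Hypothetical weights. In the paper these would be multivariable association
# estimates for a subset of diagnoses; the abstract does not report them.
WEIGHTS: Dict[str, float] = {"code_a": 1.2, "code_b": 0.8, "code_c": -0.3}

Record = Tuple[str, float]  # (diagnosis code, age in years when it was recorded)


def restrict_to_window(records: Iterable[Record], index_age: float,
                       years_before: float) -> Set[str]:
    """Keep codes recorded at least `years_before` years before the index age."""
    cutoff = index_age - years_before
    return {code for code, age in records if age <= cutoff}


def phers(diagnoses: Set[str], weights: Dict[str, float]) -> float:
    """Weighted sum: add the weight of every weighted code the person has."""
    return sum(w for code, w in weights.items() if code in diagnoses)


def auc(scores: List[float], labels: List[int]) -> float:
    """Rank-based AUC: chance a case outscores a control, ties counting 0.5."""
    cases = [s for s, y in zip(scores, labels) if y == 1]
    controls = [s for s, y in zip(scores, labels) if y == 0]
    if not cases or not controls:
        raise ValueError("need at least one case and one control")
    wins = sum((c > k) + 0.5 * (c == k) for c in cases for k in controls)
    return wins / (len(cases) * len(controls))


if __name__ == "__main__":
    # Toy cohort (hypothetical): (records, index age, case indicator).
    cohort = [
        ([("code_a", 50.0), ("code_b", 58.5)], 60.0, 1),
        ([("code_a", 62.0)], 70.0, 1),
        ([("code_b", 40.0)], 55.0, 0),
        ([("code_c", 45.0)], 65.0, 0),
        ([], 50.0, 0),
    ]
    labels = [y for _, _, y in cohort]
    for years in (0, 1, 2, 5):  # thresholds named in the abstract
        scores = [phers(restrict_to_window(r, idx, years), WEIGHTS)
                  for r, idx, _ in cohort]
        print(f"{years}-year threshold: scores={scores}, AUC={auc(scores, labels):.2f}")
```

In this sketch, moving from the 0-year to the 5-year threshold drops any diagnosis recorded within five years of the index age, so the same person can receive a lower score at longer lead times. That is the intended effect of time restriction: the score reflects only information that would have been available that far in advance.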