The added value of genetic information in colorectal cancer risk prediction models: development and evaluation in the UK Biobank prospective cohort study.

Todd Smith, Marc J Gunter, Ioanna Tzoulaki, David C Muller.

Abstract

Colorectal cancer (CRC) risk prediction models could be used to risk-stratify the population to provide individually tailored screening provision. Using participants from the UK Biobank prospective cohort study, we evaluated whether the addition of a genetic risk score (GRS) could improve the performance of two previously validated models. Inclusion of the GRS did not appreciably improve discrimination of either model, and led to substantial miscalibration. Following recalibration the discrimination did not change, but good calibration for models incorporating the GRS was recovered. Comparing predictions between models with and without the GRS, 5% of participants or fewer changed their absolute risk by ±0.3% or more in either model. In summary, addition of a GRS did not meaningfully improve the performance of validated CRC-risk prediction models. At present, provision of genetic information is not useful for risk stratification for CRC.

Year: 2018 PMID: 30323197 PMCID: PMC6203780 DOI: 10.1038/s41416-018-0282-8

Source DB: PubMed Journal: Br J Cancer ISSN: 0007-0920 Impact factor: 7.640

Background

Colorectal cancer (CRC) is a substantial global health burden[1] and there is strong evidence that screening can reduce CRC mortality.[2-4] The efficacy of screening programmes may be enhanced by targeting screening and screening intensity to those at greatest risk.[5] Genome-wide association studies (GWAS) have identified over 40 independent loci unequivocally associated with the risk of CRC,[6] and there is increasing interest in developing genetic risk scores (GRS) for a personalised risk assessment.[7] To justify their use in clinical or population health practice, GRS must provide additional information over and above previously validated risk models.[8,9] Here, using data from the UK Biobank, we examined the predictive value of a GRS for CRC either alone or in combination with validated CRC-risk models.

Materials and methods

UK Biobank is a prospective cohort study of over 500,000 individuals,[10] of whom 488,377 are genotyped[11] (Supplementary methods). Two of the best-performing models (highest discrimination and good calibration) for the prediction of incident CRC,[12,13] identified from a systematic review and external validation study,[5] were applied using data collected at baseline (Supplementary Table 1). Taylor et al.[13] calculated predicted absolute risk by combining age-specific rates of CRC with estimated relative risks for different degrees of CRC family history. Wells et al.[12] used a Cox regression model including age, diabetes, multivitamin usage, family history of colon cancer, years of education, body mass index, alcohol intake, physical activity, non-steroidal anti-inflammatory drug usage, red meat intake, smoking and oestrogen use (women only). Details of model calibration are presented in the supplementary methods. We constructed a weighted GRS as a linear combination of 41 autosomal single nucleotide polymorphisms (SNPs), with the allele dosage of each SNP multiplied by its associated log odds ratio from previously published GWAS (Supplementary Table 2).[6] Model performance was evaluated in terms of calibration and discrimination. Calibration was visually assessed by plotting observed probability (calculated using the Kaplan–Meier estimator) against mean predicted probability by tenths of the predicted risk. We also assessed calibration of predicted relative risks by plotting the estimated hazard ratio (HR; estimated using flexible parametric survival models) as a function of model-predicted HRs. Discrimination was assessed using the C-statistic (with 1 representing a perfect ability to discriminate between those who will subsequently develop the outcome of interest, and 0.5 representing no better ability than chance).
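
The weighted GRS described above is a sum, over SNPs, of allele dosage multiplied by the log odds ratio for that SNP. The following is a minimal sketch of that calculation, not the authors' code. The function name `weighted_grs`, the three odds ratios and the two participants are hypothetical values chosen for illustration; the study used 41 SNPs with weights taken from published GWAS.

```python
import math


def weighted_grs(dosages, odds_ratios):
    """Weighted GRS: sum over SNPs of allele dosage x log(odds ratio)."""
    if len(dosages) != len(odds_ratios):
        raise ValueError("one odds ratio is needed per SNP")
    return sum(d * math.log(o) for d, o in zip(dosages, odds_ratios))


# Hypothetical per-allele odds ratios for three SNPs (illustration only).
example_odds_ratios = [1.10, 1.05, 1.20]

# Hypothetical risk-allele dosages (0, 1 or 2) for two participants.
participants = {"A": [0, 1, 2], "B": [2, 2, 0]}

scores = {pid: weighted_grs(d, example_odds_ratios)
          for pid, d in participants.items()}

# Mean-centre the scores so that the sample average is zero.
mean_score = sum(scores.values()) / len(scores)
centred = {pid: s - mean_score for pid, s in scores.items()}
```

Because the score is on the log odds scale, exponentiating the difference between two participants' scores gives the odds ratio that the published weights imply between them.
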
We assessed the performance of (i) the predicted probabilities of the base models, (ii) the GRS alone and (iii) the two combined. As age can itself strongly contribute to model performance, we additionally assessed discrimination of both models after removing the effect of age. To ensure comparable calibration between the published models and the models augmented with the GRS, we also fitted flexible parametric survival models.[14] We used two degree-of-freedom restricted cubic splines to model the baseline cumulative hazard of CRC, and included the overall predicted log HRs from the published models and the GRS as separate covariates. The fitted models were then used to predict 5-year absolute risks of CRC. Participants with missing data on any of the required covariates were excluded from the analysis, which led to a different number of available participants for each model. We conducted a sensitivity analysis including only those participants who could be used in both models to ensure that estimates of model performance were directly comparable. In a second sensitivity analysis we removed related participants by identifying pairs of individuals who were first- or second-degree relatives (kinship coefficient greater than 0.08),[15,16] and randomly dropping one member.
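
To make the interpretation of the C-statistic concrete, the sketch below computes it for a binary outcome as the proportion of case/non-case pairs in which the case has the higher predicted risk, with ties counted as one half. This is a simplification that ignores censoring and follow-up time, so it is not necessarily the estimator used in the study; the function name `c_statistic` and the example values are assumptions for illustration.

```python
def c_statistic(risks, events):
    """Share of case/non-case pairs ranked correctly (ties count 0.5).

    risks  -- predicted risks, one per participant
    events -- 1 if the participant developed the outcome, else 0
    """
    cases = [r for r, e in zip(risks, events) if e == 1]
    non_cases = [r for r, e in zip(risks, events) if e == 0]
    if not cases or not non_cases:
        raise ValueError("need at least one case and one non-case")
    concordant = 0.0
    for case_risk in cases:
        for other_risk in non_cases:
            if case_risk > other_risk:
                concordant += 1.0
            elif case_risk == other_risk:
                concordant += 0.5
    return concordant / (len(cases) * len(non_cases))


# Hypothetical predictions: both cases are ranked above both non-cases.
perfect = c_statistic([0.9, 0.8, 0.2, 0.1], [1, 1, 0, 0])
# A model that assigns everyone the same risk cannot discriminate.
uninformative = c_statistic([0.5, 0.5], [1, 0])
```

Here `perfect` equals 1 and `uninformative` equals 0.5, matching the two reference points given in the methods.
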

Results

The number of available participants was 361,543 for the Taylor et al.[13] model and 286,877 for the combined Wells et al.[12] model (Supplementary Figure 1), comprising 1623 and 1294 CRC cases, respectively. Comparison between those included and excluded for each model showed broadly comparable characteristics (Supplementary Table 3). The mean centred log GRS had a range of −2.022 to 2.411 and standard deviation of 0.495. It was weakly associated with self-reported family history of CRC, with a greater number of first degree relatives diagnosed with CRC associated with a higher GRS (Supplementary Table 4). In the sample used for the Wells et al.[12] model, the GRS alone provided modest discrimination for incident CRC (C-statistic 0.57, 95% CI: [0.55–0.58]), as it did for the sample used in the Taylor et al.[13] model (0.56 [0.55–0.58]). This is greater than the discrimination afforded by the Taylor et al.[13] model when excluding the effect of age (0.52 [0.51–0.53]), and comparable to that of the Wells et al.[12] model after the age coefficient had been removed (0.58 [0.57–0.60]). The subsequent combination of the GRS with the original models did not improve discrimination (Wells et al.[12] changed from 0.68 [0.67–0.69] to 0.69 [0.67–0.70], while for Taylor et al.[13] it changed from 0.67 [0.65–0.68] to 0.67 [0.66–0.68], Supplementary Table 5), and resulted in poor calibration with substantial over-estimation of risk for those in the upper tenth of predicted risk (Fig. 1a and Supplementary Figure 2). This miscalibration was evident even when considering only relative risks, with the GRS both alone and in combination with the published models implying relative risks far more extreme than those observed in these data (Supplementary Figures 3 and 4). 
On recalibration by fitting the predicted log-hazard ratios and GRS as covariates in the models, calibration of the models including the GRS was vastly improved, and comparable with that of the models excluding the GRS (Fig. 1a). There was little difference in discrimination performance of models in participants with and without a family history of CRC (Supplementary Table 5).
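
The calibration check by tenths of predicted risk can be sketched as follows. This is a simplified illustration that assumes complete follow-up and uses the raw event proportion in each group; the study instead estimated observed probabilities with the Kaplan–Meier estimator. The function name `calibration_by_tenths` and the toy data are hypothetical.

```python
def calibration_by_tenths(predicted, observed, groups=10):
    """Return (mean predicted risk, observed event proportion) per group.

    Participants are sorted by predicted risk and split into `groups`
    roughly equal-sized bins (tenths when groups=10).
    """
    if len(predicted) != len(observed):
        raise ValueError("predicted and observed must have equal length")
    pairs = sorted(zip(predicted, observed))
    n = len(pairs)
    table = []
    for g in range(groups):
        chunk = pairs[g * n // groups:(g + 1) * n // groups]
        if not chunk:
            continue  # fewer participants than groups
        mean_pred = sum(p for p, _ in chunk) / len(chunk)
        obs_prop = sum(o for _, o in chunk) / len(chunk)
        table.append((mean_pred, obs_prop))
    return table


# Toy data: a well-calibrated model with two risk levels.
toy_predicted = [0.1] * 10 + [0.9] * 10
toy_observed = [0] * 9 + [1] + [1] * 9 + [0]
toy_table = calibration_by_tenths(toy_predicted, toy_observed, groups=2)
```

For a well-calibrated model the two values in each row are close; over-estimation in the top group, as reported for the unrecalibrated GRS-augmented models, would show up as a mean predicted risk well above the observed proportion.
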

Fig. 1

a Calibration plots for the Taylor et al.[13] and Wells et al.[12] models in the UK Biobank. The original models were initially calibrated to the UK Biobank population and following this the genetic risk score (GRS) was combined with the model’s original coefficient(s). To ensure comparable calibration between models with and without the GRS, we then further recalibrated by fitting the predicted log hazard from the original model as a covariate in a flexible parametric survival model by itself, and with the addition of the GRS. b Change in the 5-year predicted probabilities (expressed as a percentage) of the recalibrated models after the addition of the genetic risk score. The x-axes are the predicted probabilities from the original models, and the y-axes are the difference in predicted probabilities between the GRS-augmented models and the original models. Histograms display the distribution of data along each axis. Note that the ranges of the axes differ between the two panels. The crowding of points close to the horizontal line at 0 on the y-axis illustrates that the addition of the GRS did not affect the predicted probabilities for the majority of participants.

The inclusion of the GRS in the recalibrated models did not result in a substantive change in the predicted probability for the majority of participants (Fig. 1b). For example, only 5% or fewer of participants had a change in predicted risk of 0.3 percentage points or greater (Supplementary Table 6). Sensitivity analyses restricted to participants available for inclusion in both models, as well as further restricting to unrelated individuals, did not substantially affect the discrimination or calibration (Supplementary Tables 7 and 8, Supplementary Figures 5 and 6).
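
The comparison of predicted risks with and without the GRS comes down to counting participants whose absolute risk moves by at least a chosen amount. The sketch below shows that calculation under stated assumptions: risks are expressed as proportions (so 0.3 percentage points is 0.003), and the function name `share_with_change` and the four example participants are hypothetical.

```python
def share_with_change(base_risk, augmented_risk, threshold=0.003):
    """Fraction of participants whose predicted absolute risk changes
    by at least `threshold` (0.003 = 0.3 percentage points)."""
    if len(base_risk) != len(augmented_risk):
        raise ValueError("risk lists must have equal length")
    changed = sum(
        1 for b, a in zip(base_risk, augmented_risk) if abs(a - b) >= threshold
    )
    return changed / len(base_risk)


# Hypothetical 5-year risks without and with the GRS for four participants.
without_grs = [0.010, 0.020, 0.005, 0.030]
with_grs = [0.011, 0.025, 0.005, 0.026]
share = share_with_change(without_grs, with_grs)
```

In this toy example two of the four participants move by more than 0.3 percentage points, so `share` is 0.5; in the study the corresponding figure was 5% or fewer.
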

Discussion

We examined the potential clinical utility of genetic information for CRC-risk prediction. In a large prospective cohort study, we showed that a GRS composed of 41 published, genome-wide significant SNPs for CRC has poor discriminatory ability on its own and does not meaningfully improve the discrimination of established models, nor does it strongly influence the predicted probabilities for the vast majority of participants. To our knowledge this is the first investigation of GRS-enhanced risk prediction models for CRC that has assessed both calibration and discrimination. Jeon et al.[7] reported that a risk model including both genetic and environmental risk scores had slightly better discrimination than a model including an environmental risk score alone, but they could not assess model calibration. They also estimated individual recommended “starting ages” for screening, which differed by up to 12 years for men and 14 years for women. These estimates depend critically on the calibration of the model: any over- or under-estimation of risk will lead to more extreme variation in recommended starting ages, purely as an artefact of the miscalibration. We found that calibration of model-predicted probabilities deteriorated substantially with the inclusion of the GRS. This could have been due to inclusion of both the GRS and family history in the models, but we found that family history was only weakly associated with the GRS. Further, the GRS itself was miscalibrated, and implied relative risks that vastly overestimated the magnitude of the relative risks observed in our study. This is due to a phenomenon sometimes labelled as the “winner’s curse” or “statistical significance filter”, whereby estimates that surpass some threshold for significance tend to be overestimates of the underlying parameter.
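
The “winner’s curse” can be illustrated with a small simulation. The sketch below is not an analysis from the study: the true effect, standard error, number of replicates and one-sided selection rule are all assumed values chosen only to show the mechanism. Many noisy estimates of the same log odds ratio are drawn, and only those passing a significance threshold are kept.

```python
import random
import statistics


def simulate_winners_curse(true_effect=0.05, std_error=0.03,
                           n_studies=10000, z_threshold=1.96, seed=0):
    """Compare the mean of all estimates with the mean of 'significant' ones.

    Each estimate is drawn from Normal(true_effect, std_error). An estimate
    is selected when its z-score exceeds z_threshold (one-sided, for
    simplicity). All defaults are assumed values for illustration.
    """
    rng = random.Random(seed)
    estimates = [rng.gauss(true_effect, std_error) for _ in range(n_studies)]
    selected = [e for e in estimates if e / std_error > z_threshold]
    return statistics.mean(estimates), statistics.mean(selected), len(selected)


mean_all, mean_selected, n_selected = simulate_winners_curse()
```

Every selected estimate must exceed `z_threshold * std_error`, which here is larger than the true effect, so `mean_selected` is necessarily above the true value even though `mean_all` is close to it. Weights taken from significance-filtered estimates therefore tend to imply relative risks that are too extreme.
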
Our finding underlines the importance of careful recalibration of those GRSs based on SNPs selected as highly statistically significant in GWAS, and the potential for miscalibration to affect the performance of models that do not assess or correct for it. This is particularly pertinent given that calibration is poorly reported in validation studies of risk prediction models and not commonly reported in GRS investigations, impairing the ability to assess the clinical usefulness of these models. Although the inclusion of the GRS did not meaningfully improve model discrimination overall, and did not substantially change the predicted probabilities for the vast majority of participants (for example, 95% of participants had a change in probability of less than 0.3 percentage points), provision of genetic information may have some utility in a two-step risk assessment. We found that the proportion of participants whose predicted risk increased or decreased by 0.3 percentage points or more after inclusion of the GRS was much higher among those who had an initial risk of 1% or greater. While these numbers are only for illustration, they demonstrate that the added value of a GRS for risk prediction will be greater if it is applied to those at higher initial risk, rather than an entire population. As larger studies are conducted, more risk loci will likely be discovered, and more complete genetic information can potentially be incorporated into risk models. It is possible that the discrimination will improve beyond that already afforded by established risk models. On the other hand, as study sizes increase they will predominantly identify rare variants or variants that are more weakly associated with risk, so the potential for improvement in genetic prediction with the inclusion of these variants may be limited.

In summary, inclusion of a GRS did not improve the performance of two previously validated CRC-risk prediction models. Any practical benefit of using the GRS for CRC prediction is likely to affect only people already predicted to be at high risk based on existing models.

Ethics approval and consent to participate

All participants provided written consent. UK Biobank has approval from the North West Multi-Centre Research Ethics Committee (MREC) and, in Scotland, the Community Health Index Advisory Group (CHIAG).

12 in total

1. What makes a good predictor?: the evidence applied to coronary artery calcium score.

Authors: John P A Ioannidis; Ioanna Tzoulaki
Journal: JAMA Date: 2010-04-28 Impact factor: 56.272

2. Colorectal cancer predicted risk online (CRC-PRO) calculator using data from the multi-ethnic cohort study.

Authors: Brian J Wells; Michael W Kattan; Gregory S Cooper; Leila Jackson; Siran Koroukian
Journal: J Am Board Fam Med Date: 2014 Jan-Feb Impact factor: 2.657

Review 3. Assessment of claims of improved prediction beyond the Framingham risk score.

Authors: Ioanna Tzoulaki; George Liberopoulos; John P A Ioannidis
Journal: JAMA Date: 2009-12-02 Impact factor: 56.272

Review 4. Genetic architecture of colorectal cancer.

Authors: Ulrike Peters; Stephanie Bien; Niha Zubair
Journal: Gut Date: 2015-07-17 Impact factor: 23.059

5. Genetic Predisposition to High Blood Pressure and Lifestyle Factors: Associations With Midlife Blood Pressure Levels and Cardiovascular Events.

Authors: Raha Pazoki; Abbas Dehghan; Evangelos Evangelou; Helen Warren; He Gao; Mark Caulfield; Paul Elliott; Ioanna Tzoulaki
Journal: Circulation Date: 2017-12-18 Impact factor: 29.690

6. UK biobank: an open access resource for identifying the causes of a wide range of complex diseases of middle and old age.

Authors: Cathie Sudlow; John Gallacher; Naomi Allen; Valerie Beral; Paul Burton; John Danesh; Paul Downey; Paul Elliott; Jane Green; Martin Landray; Bette Liu; Paul Matthews; Giok Ong; Jill Pell; Alan Silman; Alan Young; Tim Sprosen; Tim Peakman; Rory Collins
Journal: PLoS Med Date: 2015-03-31 Impact factor: 11.069

Review 7. Effect of screening sigmoidoscopy and screening colonoscopy on colorectal cancer incidence and mortality: systematic review and meta-analysis of randomised controlled trials and observational studies.

Authors: Hermann Brenner; Christian Stock; Michael Hoffmeister
Journal: BMJ Date: 2014-04-09

8. Long term effects of once-only flexible sigmoidoscopy screening after 17 years of follow-up: the UK Flexible Sigmoidoscopy Screening randomised controlled trial.

Authors: Wendy Atkin; Kate Wooldrage; D Maxwell Parkin; Ines Kralj-Hans; Eilidh MacRae; Urvi Shah; Stephen Duffy; Amanda J Cross
Journal: Lancet Date: 2017-02-22 Impact factor: 79.321

9. Determining Risk of Colorectal Cancer and Starting Age of Screening Based on Lifestyle, Environmental, and Genetic Factors.

Authors: Jihyoun Jeon; Mengmeng Du; Robert E Schoen; Michael Hoffmeister; Polly A Newcomb; Sonja I Berndt; Bette Caan; Peter T Campbell; Andrew T Chan; Jenny Chang-Claude; Graham G Giles; Jian Gong; Tabitha A Harrison; Jeroen R Huyghe; Eric J Jacobs; Li Li; Yi Lin; Loïc Le Marchand; John D Potter; Conghui Qu; Stephanie A Bien; Niha Zubair; Robert J Macinnis; Daniel D Buchanan; John L Hopper; Yin Cao; Reiko Nishihara; Gad Rennert; Martha L Slattery; Duncan C Thomas; Michael O Woods; Ross L Prentice; Stephen B Gruber; Yingye Zheng; Hermann Brenner; Richard B Hayes; Emily White; Ulrike Peters; Li Hsu
Journal: Gastroenterology Date: 2018-02-17 Impact factor: 33.883

10. Comparison of prognostic models to predict the occurrence of colorectal cancer in asymptomatic individuals: a systematic literature review and external validation in the EPIC and UK Biobank prospective cohort studies.

Authors: Todd Smith; David C Muller; Karel G M Moons; Amanda J Cross; Mattias Johansson; Pietro Ferrari; Guy Fagherazzi; Petra H M Peeters; Gianluca Severi; Anika Hüsing; Rudolf Kaaks; Anne Tjonneland; Anja Olsen; Kim Overvad; Catalina Bonet; Miguel Rodriguez-Barranco; Jose Maria Huerta; Aurelio Barricarte Gurrea; Kathryn E Bradbury; Antonia Trichopoulou; Christina Bamia; Philippos Orfanos; Domenico Palli; Valeria Pala; Paolo Vineis; Bas Bueno-de-Mesquita; Bodil Ohlsson; Sophia Harlid; Bethany Van Guelpen; Guri Skeie; Elisabete Weiderpass; Mazda Jenab; Neil Murphy; Elio Riboli; Marc J Gunter; Krasimira Jekova Aleksandrova; Ioanna Tzoulaki
Journal: Gut Date: 2018-04-03 Impact factor: 23.059
