Ashley Kieran Clift, Julia Hippisley-Cox, David Dodwell, Simon Lord, Mike Brady, Stavros Petrou, Gary S Collins.
Abstract
INTRODUCTION: Breast cancer is the most common cancer and the leading cause of cancer-related death in women worldwide. Risk prediction models may be useful to guide risk-reducing interventions (such as pharmacological agents) in women at increased risk, or to inform screening strategies for early detection.

METHODS AND ANALYSIS: The study will use data for women aged 20-90 years between 2000 and 2020 from QResearch, linked at the individual level to hospital episodes, cancer registry and death registry data. It will evaluate a set of modelling approaches to predict three outcomes: the risk of developing breast cancer within the next 10 years; the 'combined' risk of developing a breast cancer and then dying from it within 10 years; and the risk of breast cancer mortality within 10 years of diagnosis. Cox proportional hazards, competing risks, random survival forest, deep learning and XGBoost models will be explored. Models will be developed on the entire dataset, with 'apparent' performance reported, and internal-external cross-validation will be used to assess performance and geographical and temporal transportability (two 10-year time periods). Random effects meta-analysis will pool discrimination and calibration metric estimates from individual geographical units obtained from internal-external cross-validation. We will then externally validate the models in an independent dataset. Evaluation of performance heterogeneity will be conducted throughout, such as exploring performance across ethnic groups.

ETHICS AND DISSEMINATION: Ethics approval was granted by the QResearch scientific committee (reference number REC 18/EM/0400: OX129). The results will be written up for submission to peer-reviewed journals.

© Author(s) (or their employer(s)) 2022. Re-use permitted under CC BY. Published by BMJ.
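The pooling step described in the methods can be illustrated with a minimal sketch. It assumes the DerSimonian-Laird estimator of between-region variance and pooling of C-statistics on the logit scale; the abstract specifies neither, and the per-region values below are hypothetical numbers chosen only for illustration.

```python
import math

def logit(p):
    """Map a proportion in (0, 1) to the log-odds scale."""
    return math.log(p / (1.0 - p))

def inv_logit(x):
    """Map a log-odds value back to the (0, 1) scale."""
    return 1.0 / (1.0 + math.exp(-x))

def pool_random_effects(estimates, variances):
    """Pool per-region estimates with a DerSimonian-Laird random-effects model.

    Returns (pooled estimate, its standard error, between-region variance tau^2).
    The choice of estimator is an assumption of this sketch.
    """
    k = len(estimates)
    w = [1.0 / v for v in variances]  # inverse-variance (fixed-effect) weights
    fixed = sum(wi * yi for wi, yi in zip(w, estimates)) / sum(w)
    # Cochran's Q: weighted squared deviations from the fixed-effect mean
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, estimates))
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - (k - 1)) / c) if c > 0 else 0.0
    # Random-effects weights add tau^2 to each within-region variance
    w_star = [1.0 / (v + tau2) for v in variances]
    pooled = sum(wi * yi for wi, yi in zip(w_star, estimates)) / sum(w_star)
    se = math.sqrt(1.0 / sum(w_star))
    return pooled, se, tau2

# Hypothetical per-region C-statistics and standard errors (illustrative only).
c_stats = [0.74, 0.77, 0.75, 0.79]
std_errs = [0.010, 0.012, 0.009, 0.015]

# Delta method: Var(logit(c)) is approximately (se / (c * (1 - c)))^2
logits = [logit(c) for c in c_stats]
logit_vars = [(se / (c * (1.0 - c))) ** 2 for c, se in zip(c_stats, std_errs)]

pooled_logit, pooled_se, tau2 = pool_random_effects(logits, logit_vars)
pooled_c = inv_logit(pooled_logit)
```

A non-zero `tau2` indicates that performance varies between regions by more than sampling error alone would explain, which is the heterogeneity the protocol proposes to examine.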
Keywords: breast tumours; public health; statistics & research methods
Year: 2022 PMID: 35351695 PMCID: PMC8961149 DOI: 10.1136/bmjopen-2021-050828
Source DB: PubMed Journal: BMJ Open ISSN: 2044-6055 Impact factor: 2.692
Comparison of selected key existing risk prediction models for developing breast cancer
| Risk model, or study first author (year) | Study setting | Parameters | Risk trajectory modelled | Validation strategy used | Discrimination metrics | Calibration metrics |
| Tyrer-Cuzick model, also known as ‘IBIS’ (2004) | Model constructed from published data, informed by mathematical principles | Age, BRCA genotype, family history of breast cancer (including relationship and ages), menarche, age at first birth, menopausal status, atypical hyperplasia, lobular carcinoma in situ, height, BMI | Diagnosis of breast cancer within next 10 years | None | Not reported | Not reported |
| van Veen (2018) | UK-based cohort study, women aged 46–73 years attending screening centres (n=9363) | Tyrer-Cuzick model (TCM) | Diagnosis of breast cancer within next 10 years | Predetermined scoring system applied to cohort | AUC 0.58 (0.52 to 0.62) | O/E 1.50 (1.33 to 1.70) |
| ‘Gail model’ | US-based case–control study, women aged 50+ years (n=5998) | Age at menarche, age at first live birth, number of previous breast biopsies, number of first-degree relatives with breast cancer | Diagnosis of breast cancer within next 10, 20, and 30 years | None | Not reported | Not reported |
| Tice (2005) | US-based cohort study, women aged 35+ years (n=81 777) | Gail model | Diagnosis of breast cancer (study: median follow-up 5.1 years, no explicit horizon) | Apparent model performance | C-index 0.67 (0.65 to 0.68) | None |
| ‘BCSC model’ | US cohort study of women aged 35+ years undergoing mammography (n=1 095 484) | Age, ethnicity, family history of breast cancer, breast biopsy history, breast density category | Diagnosis of breast cancer within next 5 years | Split-sample validation | C-index 0.66 (0.65 to 0.67) | O/E 1.03 (0.99 to 1.06) |
| QCancer Breast, Hippisley-Cox (2015) | England-based primary care open cohort, women aged 25–84 years (n=3 318 258) | Age, BMI, deprivation, ethnicity, alcohol intake, family history of breast cancer, benign breast disease, OCP use, oestrogen-containing HRT use, manic depression/schizophrenia, previous blood cancer, previous lung cancer, previous ovarian cancer | Diagnosis of breast cancer within next 10 years | Split-sample validation | C-index 0.761 (0.758 to 0.765) | Calibration plots by tenth of predicted risk |
AUC, area under the receiver operating characteristic curve; BMI, body mass index; HRT, hormone replacement therapy; OCP, oral contraceptive pill; O/E, observed to expected ratio; PRS, Polygenic Risk Score.
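The two kinds of metric reported in the table can be sketched for the simplified case of a binary outcome with no censoring. This is an illustration only: the function names and toy data are hypothetical, and a C-index for time-to-event data (as in the cohort studies above) would additionally need to account for censored follow-up.

```python
def observed_expected_ratio(outcomes, predicted_risks):
    """O/E ratio: number of observed events divided by the sum of predicted risks.

    Values above 1 mean more events occurred than the model predicted
    (under-prediction); values below 1 mean over-prediction.
    """
    return sum(outcomes) / sum(predicted_risks)

def c_statistic(outcomes, predicted_risks):
    """Proportion of case/non-case pairs in which the case has the higher
    predicted risk, counting tied predictions as half-concordant.
    For a binary outcome this equals the AUC.
    """
    cases = [p for y, p in zip(outcomes, predicted_risks) if y == 1]
    non_cases = [p for y, p in zip(outcomes, predicted_risks) if y == 0]
    concordant = 0.0
    for p_case in cases:
        for p_non in non_cases:
            if p_case > p_non:
                concordant += 1.0
            elif p_case == p_non:
                concordant += 0.5
    return concordant / (len(cases) * len(non_cases))

# Toy data (hypothetical): 1 = diagnosed within the horizon, 0 = not diagnosed.
outcomes = [1, 0, 1, 0, 0, 1]
risks = [0.30, 0.10, 0.05, 0.20, 0.05, 0.40]

oe = observed_expected_ratio(outcomes, risks)
c = c_statistic(outcomes, risks)
```

Read this way, an O/E of 1.50 corresponds to 50% more observed diagnoses than the model expected, while a C-index near 0.5 indicates discrimination little better than chance.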
Summary of candidate predictor variables that will be considered in this study
| Variable class | Variables (and functional form) |
| Demographic variables | Age (continuous variable) |
| Lifestyle factors | Smoking status (categorical, and also continuous if number of cigarettes per day is available) |
| Comorbidities and medical history (all binary, unless otherwise specified) | Previous ovarian cancer |
| Family history | Recorded family history of gynaecological cancer |
| Medications (at least three prescriptions prior to cohort entry; binary categorical) | Antihypertensives |
| Reproductive history | Number of pregnancies (continuous or ordinal categorical) |
| Tumour characteristics (for diagnosed tumours) | Stage at diagnosis (ordinal categorical, I–IV) |
| Treatment variables (for diagnosed tumours) | Use of surgery |
As shown, some classes of variables will only be appropriate for inclusion in models for certain outcomes of interest, for example, the risk of death following a diagnosis of invasive breast cancer.
Figure 1. Representation of the planned internal-external cross-validation schema that will concomitantly assess geographical and temporal transportability of each developed model. This permits the use of the entire dataset to develop and assess the performance of models, while also evaluating performance heterogeneity. Period 1 comprises 1 January 2000–31 December 2009; period 2 comprises 1 January 2010–31 December 2020.
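One way to picture such a schema is the fold generator sketched below. It rests on an assumption not stated in the caption: that each fold fits the model on period 1 data from all regions except one, then evaluates it on period 2 data from the held-out region. The record fields `region` and `entry` are hypothetical names, and only the period boundaries are taken from the caption.

```python
from datetime import date

# Period boundaries as given in the figure caption.
PERIOD_1 = (date(2000, 1, 1), date(2009, 12, 31))
PERIOD_2 = (date(2010, 1, 1), date(2020, 12, 31))

def period_of(entry_date):
    """Return 1 or 2 for the period containing entry_date, else None."""
    if PERIOD_1[0] <= entry_date <= PERIOD_1[1]:
        return 1
    if PERIOD_2[0] <= entry_date <= PERIOD_2[1]:
        return 2
    return None

def iecv_folds(records):
    """Yield (held_out_region, development_rows, validation_rows).

    Assumed schema: develop on period 1 from every other region,
    validate on period 2 from the held-out region.
    """
    regions = sorted({r["region"] for r in records})
    for held_out in regions:
        development = [
            r for r in records
            if r["region"] != held_out and period_of(r["entry"]) == 1
        ]
        validation = [
            r for r in records
            if r["region"] == held_out and period_of(r["entry"]) == 2
        ]
        yield held_out, development, validation

# Hypothetical records: two regions, one cohort entry in each period.
records = [
    {"id": 1, "region": "A", "entry": date(2003, 5, 1)},
    {"id": 2, "region": "A", "entry": date(2015, 2, 1)},
    {"id": 3, "region": "B", "entry": date(2008, 9, 1)},
    {"id": 4, "region": "B", "entry": date(2012, 7, 1)},
]
folds = list(iecv_folds(records))
```

Each fold yields one set of performance estimates for the held-out region, and those per-region estimates are what a random effects meta-analysis would then pool.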