
Using cancer risk algorithms to improve risk estimates and referral decisions.

Olga Kostopoulou1, Kavleen Arora1, Bence Pálfi1.   

Abstract

Background: Cancer risk algorithms were introduced to clinical practice in the last decade, but they remain underused. We investigated whether General Practitioners (GPs) change their referral decisions in response to an unnamed algorithm, if decisions improve, and if changing decisions depends on having information about the algorithm and on whether GPs overestimated or underestimated risk.
Methods: 157 UK GPs were presented with 20 vignettes describing patients with possible colorectal cancer symptoms. GPs gave their risk estimates and inclination to refer. They then saw the risk score of an unnamed algorithm and could update their responses. Half of the sample was given information about the algorithm's derivation, validation, and accuracy. At the end, we measured their algorithm disposition. We analysed the data using multilevel regressions with random intercepts by GP and vignette.
Results: We find that, after receiving the algorithm's estimate, GPs' inclination to refer changes 26% of the time and their decisions switch entirely 3% of the time. Decisions become more consistent with the NICE 3% referral threshold (OR 1.45 [1.27, 1.65], p < .001). The algorithm's impact is greatest when GPs have underestimated risk. Information about the algorithm does not have a discernible effect on decisions but it results in a more positive GP disposition towards the algorithm. GPs' risk estimates become better calibrated over time, i.e., move closer to the algorithm.
Conclusions: Cancer risk algorithms have the potential to improve cancer referral decisions. Their use as learning tools to improve risk estimates is promising and should be further investigated.
© The Author(s) 2022.

Keywords:  Diagnosis; Health services

Year:  2022        PMID: 35603307      PMCID: PMC9053195          DOI: 10.1038/s43856-021-00069-1

Source DB:  PubMed          Journal:  Commun Med (Lond)        ISSN: 2730-664X


Introduction

Improving cancer outcomes in England is a national priority. In 2018, 55% of cancers were diagnosed at stages 1 and 2[1]. NHS England aims to raise this to 75% by 2028 by improving the early diagnosis of cancer[2]. General Practitioners (GPs) can use the 2-week-wait (2WW) referral pathway if they suspect cancer; the patient is then seen by a specialist within a target of two weeks. It was recently demonstrated that the 2WW pathway is effective in improving cancer outcomes: higher use of the pathway was associated with lower mortality for common cancers and lower odds of late-stage diagnosis[3]. However, large variability between practices in their use of the 2WW referral pathway[4] means that it may not fulfil its potential. The variability has partly been explained by the organisation of local health services[5] and partly by GP decision making[6,7]. Discriminating patients who should be referred on the 2WW pathway from those who do not need referral is difficult, especially where early cancers present with vague, non-specific symptoms that could easily be attributed to other conditions[8]. Using cancer risk calculators could improve cancer referral decision making[9]: they can help GPs identify at-risk patients and thus reduce diagnostic delay, while reassuring them about low-risk patients who do not require referral, thus avoiding overload of the healthcare system. Cancer risk calculators are algorithms that calculate the probability that a patient with symptoms has a current, undiagnosed cancer. QCancer[10] and RAT[11] are two established cancer risk calculators which have been integrated with the electronic health record in some parts of UK primary care.
Studies of the implementation of cancer risk calculators in clinical practice have had mixed results: a cohort study found an increase in the number of investigations ordered and cancers diagnosed after RATs were provided to primary care clinics[12]; a cluster randomised trial found no impact of GP education resources, which included RATs, on time to diagnosis[13]; and a qualitative study of GPs doing simulated consultations suggested distrust of QCancer when it conflicted with clinical judgement[14]. Indeed, despite these tools being available in primary care for almost a decade and their potential to improve the earlier diagnosis of cancer, they remain an underused resource[15]. The study reported here is the first in a planned series of studies aiming to investigate how cancer risk algorithms influence clinical risk assessments and referral decisions, and to identify ways to optimise their introduction and presentation. The study involved GPs responding to a series of clinical vignettes online. We investigated whether GPs change their referral decisions in response to an unnamed algorithm, if decisions improve, and what factors influence decision change. The factors that we investigated were the provision of information about the algorithm, and the position of the GPs’ initial risk estimates in relation to the algorithm, i.e., underestimation vs. overestimation of risk. GPs are not routinely informed about how algorithms that are introduced in their electronic health record have been elicited and validated and how accurate they are[16]. It is plausible to expect that such information would improve trust in the algorithm and lead to greater willingness to follow its advice and integrate its probabilities into one’s own risk assessment and referral decisions. 
We also expected that GPs would err on the side of caution, putting more importance on misses than false-positive referrals, and thus be less willing to change a referral decision when the algorithm suggested that the patient’s risk was lower (vs. higher) than what they had initially thought. Furthermore, we investigated GPs’ disposition towards the algorithm and associations with GP demographics, prior attitudes towards cancer risk calculators, and decision confidence. Only one of our hypotheses was confirmed: GPs were indeed more likely to change a referral decision if they had initially underestimated vs. overestimated risk. Having information about the algorithm’s derivation, validation and accuracy did not impact decisions. We measured a statistically significant improvement of decisions vis-à-vis the NICE 3% referral threshold. Finally, we observed that GPs’ risk estimates moved closer to the algorithm as the algorithm’s estimates were repeatedly presented to them over the series of vignettes. This is encouraging, as it suggests that such algorithms could be used to train clinicians to estimate cancer risk better.

Methods

Sample size

We powered the study to detect a small effect (f2 = 0.02) of the algorithm on referral decisions with alpha of 5% and power of 95% in a multiple linear regression. The G*Power software (v. 3.1.9.4) estimated that we would need at least 863 responses. To account for data clustering (each GP responding to 20 vignettes), we adjusted this number by the Design Effect (DE)[17]. This is calculated using the formula DE = 1 + (n–1)*ICC, where n is the cluster size (the 20 vignettes), and ICC is the intra-class correlation. We estimated the ICC from pilot data to be 0.088. Thus, DE = 2.68. We adjusted the number of participants required by multiplying the 863 required responses with the DE and dividing by the cluster size: (863*2.68)/20 = 116. Thus, we estimated that we needed to recruit a minimum of 116 GPs.
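The sample-size adjustment described above can be reproduced in a few lines. This is a minimal sketch using only the figures reported in the text (the 863 responses from G*Power are taken as given; the effect size, alpha and power enter only through that G*Power step):

```python
import math

# Figures reported in the text
responses_needed = 863   # G*Power estimate for f2 = 0.02, alpha = .05, power = .95
cluster_size = 20        # vignettes per GP (the cluster size n)
icc = 0.088              # intra-class correlation estimated from pilot data

# Design Effect for clustered responses: DE = 1 + (n - 1) * ICC
design_effect = 1 + (cluster_size - 1) * icc  # = 2.672 (reported as 2.68)

# Inflate the required responses by DE, then convert to a number of GPs
gps_needed = math.ceil(responses_needed * design_effect / cluster_size)

print(round(design_effect, 2))
print(gps_needed)  # 116 GPs, matching the minimum reported in the text
```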

Materials

We prepared 23 clinical vignettes, each having a different combination of risk factors, symptoms and signs related to colorectal cancer. To prepare the vignettes, we used QCancer (https://qcancer.org), which is publicly available, as is its underlying computer code. We selected from the range of risk factors and symptoms that QCancer uses and employed them in different combinations, aiming for clinical plausibility and a wide range of risk across vignettes. Vignette risk ranged from 0.58% to 57.23% (mean 14.10%, SD 18.97, median 4.18). We created some new vignettes, but the majority were modified from those used in a previous study by the lead author[7]. Each vignette described a hypothetical patient presenting in general practice. All vignettes started with a list of demographics and risk factors (name, sex, age, BMI, smoking and alcohol intake), followed by the presenting problem. One or more of the relevant risk factors and symptoms in QCancer (type 2 diabetes, family history of gastrointestinal cancer, weight loss, appetite loss, abdominal pain, rectal bleeding, change in bowel habit, constipation, and anaemia) were incorporated into the description. All the vignettes are presented in the Supplementary Methods. Three of the vignettes were used for familiarisation purposes and no data were collected on them. The remaining twenty were split into two sets of ten to be completed on two different days, to minimise fatigue. We made sure that the range, median, mean and standard deviation of risk estimates were almost identical in the two sets. We also counterbalanced the sets across participants, so that each set was completed first and second an equal number of times. All materials were presented online on the Qualtrics platform (qualtrics.com).

Procedure

Study approval was provided by the Health Research Authority (HRA) and Health & Care Research Wales (HCRW), REC reference 20/HRA/2418. An invitation email was sent to the 400 GPs in our database—a database of e-mails compiled by the lead author and consisting of participants in previous studies, all currently practising in England. The invitation email included a brief description of the study and outlined the benefits of participation: remuneration of £60, a completion certificate, and personalised feedback to use as evidence of continuous professional development (CPD) for their appraisal portfolio. Those interested in participating could follow a link in the email, which took them to an expression-of-interest form, where they could enter their NHS email address and GP practice code. After participants accessed the study site, they read an information sheet and provided consent online. They then completed demographics questions (age, gender, GP or GP trainee, year of GP qualification, and number of clinical sessions per week), and answered the following questions: “In general, how confident do you feel when assessing patients with symptoms that might indicate cancer?” (I always feel confident/I feel confident most of the time/I feel confident sometimes/I seldom feel confident). “Are you aware of any cancer risk algorithms that are being used in clinical practice to calculate a patient’s current risk of cancer (aka ‘cancer risk calculators’)?” (Yes/No). If they answered “yes”, they were then asked: “Are they available in the electronic health record that you use in your practice?” (Yes/No). If they answered “yes”, they were then asked to indicate which cancer risk algorithms were available in their practice, and they could choose one or more of the following options: RAT, QCancer, C the Signs, and Other. They were then asked how often they used these cancer risk algorithms (always/sometimes/never). 
Finally, all participants were asked to rate their attitude towards cancer risk calculators on a scale from "Very negative" (1) to "Very positive" (9). Half of the participants were randomly allocated to receive information about the study algorithm (Box 1, reproduced at the end of this section). Participants in the algorithm information group then responded to three questions gauging understanding and trust. Specifically, they were asked if the description of the algorithm made sense to them (Yes/No), if they would trust this algorithm's estimates (Definitely yes/Probably yes/Probably not/Definitely not), and if they would like to have an algorithm like this in their clinical practice (Definitely yes/Probably yes/Probably not/Definitely not).
All participants were then presented with the three practice vignettes in a random order. No data were collected at this stage and participants were informed of this. The aim of the practice vignettes was to familiarise participants with the task and help them calibrate their risk estimates, since GPs do not provide explicit cancer risk estimates on a routine basis. For this purpose, the practice vignettes represented three levels of risk of undiagnosed colorectal cancer: low (1%), medium (6%) and high (40%). The ten vignettes of the first set then followed in a random order. The procedure was exactly the same for all the vignettes, including the practice vignettes. Specifically, each vignette was followed by three questions:
1. "Out of 100 patients with the same risk factors and symptoms as this patient, how many, in your clinical judgement, are likely to have colorectal cancer? Please type in a whole number between 0 and 100." Responses could be typed in a box below the question.
2. "What is the narrowest range which you are almost certain contains your estimate above? Enter the lower and upper limits in the boxes below. Make sure that your estimate falls within this range." Respondents filled in the lower and upper limits in the following sentence: "I am almost certain that out of patients like this one, between ___ and ___ are likely to have colorectal cancer as yet undiagnosed."
3. "How likely is it that you would refer this patient on the 2WW pathway for suspected cancer?" Responses were given on a rating scale: 1 (Highly unlikely), 2 (Unlikely), 3 (Uncertain), 4 (Likely), 5 (Highly likely).
NB. Some words appeared in bold or italics on screen; that formatting is not reproduced here.
After these three questions were answered, the vignette was presented again, this time with the algorithmic estimate: "The algorithm estimates that ___ out of 100 patients presenting like this is/are likely to have colorectal cancer. Your estimate was ___ out of 100 (lower limit ___, upper limit ___). If you wish to revise your initial estimates, please do so below. If you wish to stick with your initial estimates, please re-enter them below." Participants were then invited to answer the same three questions as before.
Following completion of the first 10 vignettes, participants had the opportunity to give feedback on any aspect of the study in free text. Twenty-four hours after completing the first set of 10 vignettes, participants were automatically sent a link to the second set. The procedure in the second study session was the same as in the first session. Participants who had received information about the algorithm in the first session were presented with it again at the start of the second session.
After completing the second set of vignettes, all participants completed the Algorithm Disposition Questionnaire (ADQ). The ADQ consisted of seven statements:
1. I found the algorithm's risk estimates helpful.
2. I think that the algorithm's estimates were accurate.
3. I felt irritated when receiving the algorithm's estimates.
4. I was happy to receive the algorithm's estimates.
5. I was frustrated when receiving the algorithm's estimates.
6. I felt more confident in my referral decisions, having received the algorithm's estimates.
7. I feel appreciative having access to the algorithm's estimates.
Respondents indicated their agreement with each statement on 7-point scales: 1 (strongly disagree), 2 (disagree), 3 (slightly disagree), 4 (neither disagree nor agree), 5 (slightly agree), 6 (agree), 7 (strongly agree). Statements 3 and 5 were reverse-scored. Finally, all participants were given the opportunity to comment on any aspect of the study, if they wished. Data collection took place between 27th June 2020 and 23rd September 2020 (dates of first and last study completion).
Box 1. Information about the study algorithm, shown to half of the participants:
The algorithm is intended to be used as a decision aid, to support 2WW cancer referral decisions. It is not intended to determine those decisions. The algorithm was derived from a large cohort study of 2.5 million patients in the UK. The developers used data in the primary care record of cancer patients to estimate associations between risk factors, symptoms/signs and a subsequent cancer diagnosis. The algorithm estimates the probability that a patient has colorectal cancer, given his/her risk factors and presenting symptoms/signs; in other words, how many people out of 100 with the same risk factors and presenting symptoms/signs are likely to have colorectal cancer. A study that validated the algorithm on another large cohort of patients, a proportion of whom had colorectal cancer, found that the algorithm performed very well: it discriminated correctly between cancer and non-cancer patients approximately 90% of the time (i.e., produced higher risk estimates for cancer than for non-cancer patients).

Statistics and reproducibility

We aimed to measure the impact of our manipulation (algorithm information provided vs. not provided) and GPs’ over- vs. underestimation of risk on referral decisions. To this end, we created several variables, which we subsequently used in regression analyses.

Creation of variables

We created a dichotomous variable denoting the position of the GPs' initial (i.e., pre-algorithm) risk estimates in relation to QCancer: overestimation (1) vs. underestimation (0). We excluded responses where initial estimates matched QCancer. To measure changes in risk estimates, we subtracted the final from the initial estimate and signed the difference so that positive values indicated changes consistent with the algorithm (the final estimate was closer to the algorithm than the initial estimate) and negative values indicated changes inconsistent with the algorithm. Similarly, to measure changes in referral inclination, we subtracted the final from the initial response on the 1-5 scale and signed the difference so that positive values indicated changes consistent with the algorithm and negative values indicated changes inconsistent with it. For example, if the algorithm estimated a higher risk than the GP, who subsequently gave a higher value on the response scale, the raw difference of the two response values would be negative but the signed difference would be positive. We also created a simpler, dichotomous variable for referral inclination, indicating whether respondents moved from one point on the response scale to another: change (1) vs. no change (0). To determine whether referral decisions improved post-algorithm, we created two dichotomous variables: decision appropriateness (appropriate vs. not appropriate) and time of decision (pre- vs. post-algorithm). We defined decision appropriateness using the NICE risk threshold of 3% (https://www.nice.org.uk/guidance/ng12/evidence/full-guideline-pdf-2676000277). Therefore, if GPs indicated that they were either likely or highly likely to refer a vignette with a QCancer risk score ≥3%, the decision was classed as appropriate. Similarly, if they indicated that they were either unlikely or highly unlikely to refer a vignette with a QCancer risk score <3%, the decision was classed as appropriate. Otherwise, it was classed as inappropriate.
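The variable constructions above can be sketched in a few lines. This is an illustration with invented numbers; the function and variable names are ours, not taken from the study's analysis code:

```python
NICE_THRESHOLD = 3.0  # NICE NG12 referral threshold, in percent

def signed_change(initial, final, algorithm):
    """Difference between final and initial response, signed so that
    positive values mean the change was in the algorithm's direction."""
    raw = final - initial
    direction = 1 if algorithm >= initial else -1
    return raw * direction

def decision_appropriate(inclination, qcancer_risk):
    """Inclination on the 1-5 scale. Appropriate = (likely/highly likely
    AND risk >= 3%) or (unlikely/highly unlikely AND risk < 3%).
    Returns None for 'uncertain' (3): excluded, or classed inappropriate,
    depending on the analysis."""
    if inclination >= 4:
        return qcancer_risk >= NICE_THRESHOLD
    if inclination <= 2:
        return qcancer_risk < NICE_THRESHOLD
    return None

# A GP underestimated (10 vs. QCancer's 31) and revised upwards to 25:
print(signed_change(10, 25, 31))      # 15: consistent with the algorithm
print(decision_appropriate(4, 31.2))  # True: likely to refer, risk >= 3%
```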

Regression models

All regression models were multilevel with random intercepts by GP and vignette, unless otherwise indicated. The regression tables are presented in Supplementary Note 1, in the sequence that they appear in the text. First, we ran two empty regression models, one for risk estimate changes and the other for changes in referral inclination to measure the impact of the algorithm on these two behavioural measures. To measure whether changes in risk estimates were associated with changes in referral inclination, we regressed inclination changes on risk estimate changes. We repeated the analysis as a logistic regression, using the simpler, dichotomous variable for inclination changes (change vs. no change). We then regressed each referral inclination variable on the two predictors of interest (algorithm information and position of GPs’ initial estimates vis-à-vis QCancer). We also explored the contribution of other variables by subsequently adding them to these two regression models in a single step: GP demographics (gender, years in general practice); confidence when assessing patients with symptoms that might indicate cancer; and general attitude towards cancer risk calculators. Using logistic regression, we regressed decision appropriateness on time of decision. In one analysis, uncertain decisions (i.e., those at the midpoint of the decision scale) were classed as inappropriate. We then repeated the analysis excluding uncertain decisions from the calculations. Finally, we explored whether any learning had taken place as a result of the QCancer score repeatedly presented after each vignette, by measuring whether GPs’ initial risk estimates improved over time, i.e., moved closer to QCancer. We defined improvement as a reduction in the difference between GPs’ initial risk estimates and QCancer. We used the absolute values of this difference to avoid situations where overestimation and underestimation cancelled each other out. 
We regressed this absolute difference on study session (1st vs. 2nd); in a separate model, we regressed it on vignette order (1–20). Finally, we explored predictors of algorithm disposition. Using simple linear regression, we regressed participants’ score on the Algorithm Disposition Questionnaire (ADQ score) on GP demographics (gender, years in general practice), confidence when assessing patients with symptoms that might indicate cancer, general attitude towards cancer risk calculators, and algorithm information (present vs. absent). We calculated the explained variance for each regression model using the r.squaredGLMM function of the MuMIn R package[18], which is based on the work of Nakagawa and colleagues[19]. We report both the marginal and conditional R2 (Supplementary Note 1). The marginal R2 indicates the explained variance by the fixed factors, and the conditional R2 indicates the variance explained by the whole model including the random effects. All analyses were conducted using Stata 17.0 and were confirmed in R (version 4.0.3). The dataset can be found in Supplementary Data 1.
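For a Gaussian multilevel model, the marginal and conditional R² of Nakagawa and colleagues reduce to ratios of variance components. A minimal sketch (the variance values below are invented for illustration, not taken from the study):

```python
def nakagawa_r2(var_fixed, var_random, var_residual):
    """Nakagawa & Schielzeth R2 for a Gaussian mixed model.
    Marginal R2: variance explained by the fixed effects alone.
    Conditional R2: variance explained by fixed plus random effects.
    var_random may be a sum over random intercepts (here: GP + vignette)."""
    total = var_fixed + var_random + var_residual
    marginal = var_fixed / total
    conditional = (var_fixed + var_random) / total
    return marginal, conditional

# Invented components: fixed effects 0.4; GP intercepts 0.3 plus
# vignette intercepts 0.1; residual 1.2
m, c = nakagawa_r2(var_fixed=0.4, var_random=0.3 + 0.1, var_residual=1.2)
print(round(m, 3), round(c, 3))  # 0.2 0.4
```

The conditional R² is always at least as large as the marginal R², since the random intercepts can only add explained variance.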
Table 1

Access to and attitude towards cancer risk calculators.

Type of cancer risk calculator available at the practice | GPs with access (n, %) | Attitude towards risk calculators* (mean, SD)
QCancer | 26 (55%) | 5.08 (1.72)
C the Signs | 7 (15%) | 6.43 (1.27)
QCancer & C the Signs | 9 (19%) | 6.11 (2.09)
QCancer & RAT | 2 (4%) | 7.00 (2.83)
Other | 3 (6%) | 4.00 (2.65)
Total | 47 (100%) | 5.48 (1.88)

Numbers (%) of GPs who indicated that they had access to one or more cancer risk calculators at their practice and their attitude towards them, presented by type of cancer risk calculator available.

* “In general, how do you feel about having cancer risk calculators in clinical practice?” Response scale: “very negative” [1] to “very positive” [9].

Table 2

Frequency of use and attitudes towards cancer risk calculators.

Frequency of use of cancer risk calculators | GPs with access (n, %) | Attitude towards cancer risk calculators (mean, SD)
Always | 4 (9%) | 8.25 (0.96)
Sometimes | 24 (51%) | 5.75 (1.70)
Never | 19 (40%) | 4.58 (1.64)
Total | 47 (100%) | 5.48 (1.88)

Frequency of cancer risk calculator use, where they were known to be available, and GPs’ attitude towards them.

Table 3

Frequency of referral decisions pre- and post-algorithm.

Referral decisions | Pre-algorithm | Post-algorithm
Unlikely (1) or highly unlikely (2) | 545 (17.36%) | 637 (20.29%)
Uncertain (3) | 418 (13.31%) | 381 (12.13%)
Likely (4) or highly likely (5) | 2177 (69.33%) | 2122 (67.58%)
Total | 3140 (100%) | 3140 (100%)

Decisions were measured on a 5-point scale ranging from 1 (highly unlikely) to 5 (highly likely), with a midpoint of 3 (uncertain).

Table 4

Changes in inclination to refer, risk estimates and QCancer.

Changes in inclination to refer | n (%) | Risk estimate pre-algorithm | QCancer risk score | Risk estimate post-algorithm
Towards referral | 327 (40.5%) | 12.4% (13.0) | 31.2% (22.6) | 27.5% (20.1)
Away from referral | 481 (59.5%) | 22.6% (20.0) | 3.5% (5.4) | 7.6% (9.5)
Total | 808 (100%) | | |

Changes in the inclination to refer post-algorithm either towards or away from referral and associated means (SD) of GPs’ pre- and post-algorithm risk estimates and means (SD) of the QCancer risk score.

Table 5

Decision appropriateness.

Decision | ‘Uncertain’ responses excluded from count: Pre-algorithm | Post-algorithm | ‘Uncertain’ responses classed as inappropriate: Pre-algorithm | Post-algorithm
Appropriate | 1925 (75.2%) | 1984 (77.5%) | 1982 (63.1%) | 2147 (68.4%)
Inappropriate | 635 (24.8%) | 576 (22.5%) | 1158 (36.9%) | 993 (31.6%)
Total | 2560 (100%) | 2560 (100%) | 3140 (100%) | 3140 (100%)

Frequency of appropriate and inappropriate referral decisions before and after seeing the algorithm, with ‘uncertain’ responses first excluded and then included in the count as ‘inappropriate’.

References (25 in total; the first 10 are shown)

1. Green T, Martins T, Hamilton W, Rubin G, Elliott K, Macleod U. Exploring GPs' experiences of using diagnostic tools for cancer: a qualitative study in primary care. Fam Pract. 2014.
2. Burton C, O'Neill L, Oliver P, Murchie P. Contribution of primary care organisation and specialist care provider to variation in GP referrals for suspected cancer: ecological analysis of national data. BMJ Qual Saf. 2019.
3. Redelmeier DA, Koehler DJ, Liberman V, Tversky A. Probability judgement in medicine: discounting unspecified possibilities. Med Decis Making. 1995.
4. Poses RM, Cebul RD, Wigton RS. You can lead a horse to water: improving physicians' knowledge of probabilities may not affect their decisions. Med Decis Making. 1995.
5. Hippisley-Cox J, Coupland C. Development and validation of risk prediction algorithms to estimate future risk of common cancers in men and women: prospective cohort study. BMJ Open. 2015.
6. Kostopoulou O, Nurek M, Delaney BC. Disentangling the Relationship between Physician and Organizational Performance: A Signal Detection Approach. Med Decis Making. 2020.
7. Price S, Spencer A, Medina-Lara A, Hamilton W. Availability and use of cancer decision-support tools: a cross-sectional survey of UK primary care. Br J Gen Pract. 2019.
8. Kostopoulou O, Nurek M, Cantarella S, Okoli G, Fiorentino F, Delaney BC. Referral Decision Making of General Practitioners: A Signal Detection Study. Med Decis Making. 2019.
9. Round T, Gildea C, Ashworth M, Møller H. Association between use of urgent suspected cancer referral and mortality and stage at diagnosis: a 5-year national cohort study. Br J Gen Pract. 2020.
10. Nakagawa S, Johnson PCD, Schielzeth H. The coefficient of determination R2 and intra-class correlation coefficient from generalized linear mixed-effects models revisited and expanded. J R Soc Interface. 2017.
