Development and Validation of a Clinical Prediction Tool for Seasonal Influenza Vaccination in England.

Matthew M Loiacono^1,2, Nicholas Mitsakakis³, Jeffrey C Kwong^3,4,5,6,7, Gabriela B Gomez^8,9, Ayman Chit^1,2, Paul Grootendorst².

Abstract

Importance: Timely identification of patients likely to miss seasonal influenza vaccination (SIV) could help health care practitioners tailor services and gain efficiency. Objective: To develop and validate a predictive model of SIV uptake among at-risk adults. Design, Setting, and Participants: This prognostic study constructed a prediction model for vaccine uptake by adults at increased risk of influenza-associated complications. Drawing from the Clinical Practice Research Datalink database's records of primary care data of 324 284 adults routinely collected at general practices across England from January 2011 to December 2016, logistic regression models were trained on data from patients registered from January 2012 to December 2013 and validated with out-of-sample data from patients registered from January 2015 to December 2016. Data were extracted from the database in December 2018 and analyzed between September 2019 and December 2019. Exposures: Covariates included sex, age, race/ethnicity, smoking status, socioeconomic status, previous pneumococcal vaccination, prior season SIV uptake, and clinical risk conditions. Main Outcomes and Measures: The main outcome was patient-level SIV uptake. Model performance was measured via misclassification rate, Brier score, sensitivity, specificity, and area under the curve.
Results: The training data sets consisted of 324 284 (aged 18 to 64 years) and 186 426 (aged 65 years or older) patients. The mean (SD) age in the training data among patients aged 18 to 64 years was 45 (13) years; 161 487 (49.8%) were women, and 102 133 (31.5%) were categorized as white. Among patients aged 65 years or older, the mean (SD) age was 77 (8) years; 96 169 (51.6%) were women, and 64 996 (34.9%) were categorized as white. The validation data sets consisted of 35 210 patients aged 18 to 64 years and 25 497 aged 65 years or older. The mean (SD) age in the validation data set among patients aged 18 to 64 years was 42 (14) years; 17 296 (49.1%) were women, and 13 346 (37.9%) were categorized as white. Among patients aged 65 years or older, the mean (SD) age was 73 (8) years; 13 135 (51.5%) were women, and 9641 (37.8%) were categorized as white. Among patients aged 18 to 64 years, SIV uptake was 35.9% (95% CI, 35.7%-36.0%) and 32.6% (95% CI, 32.1%-33.1%) for the training and validation data sets, respectively. Among patients aged 65 years or older, SIV uptake was 83.1% (95% CI, 82.9%-83.2%) and 76.1% (95% CI, 75.5%-76.6%) for the training and validation data sets, respectively. Prior season SIV uptake and pneumococcal vaccination status were the best predictors of SIV uptake. Predicted SIV uptake probabilities for patients aged 18 to 64 years were reliable, but biased toward underpredicting, whereas, among patients aged 65 years or older, they were variable and biased toward overpredicting. Briefly, in out-of-sample validation among patients aged 18 to 64 years, misclassification rates were 0.163 to 0.164, Brier scores were 0.124 to 0.125, area under the receiver operating characteristic curve values ranged from 0.876 to 0.877, sensitivity ranged from 0.705 to 0.720, and specificity ranged from 0.896 to 0.902.
In patients aged 65 years or older, misclassification rates were 0.120 to 0.125, Brier scores were 0.0953 to 0.0959, area under the receiver operating characteristic curve was 0.877, sensitivity ranged from 0.919 to 0.936, and specificity ranged from 0.680 to 0.753. Conclusions and Relevance: This study suggests that data obtained from primary care records could accurately predict SIV uptake among at-risk adults. Further research is needed to assess the feasibility and efficacy of implementing this model in clinical settings.

Substances: Influenza Vaccines

Year: 2020 PMID: 32597991 PMCID: PMC7324952 DOI: 10.1001/jamanetworkopen.2020.7743

Source DB: PubMed Journal: JAMA Netw Open ISSN: 2574-3805

Introduction

Each year, millions of individuals develop severe influenza disease globally, and as many as 500 000 individuals die from influenza-associated complications.[1] While most are susceptible to influenza, the risk of influenza-related morbidity and mortality is greatest among specific subgroups, including young children, adults older than 65 years, pregnant women, those with certain chronic health conditions, and those who are immunosuppressed.[2,3] The seasonal influenza vaccine (SIV) remains the most effective means of reducing influenza-associated morbidity and mortality, especially among clinical risk groups.[1] Despite the known efficacy and safety of SIVs, as well as the provision of free or highly subsidized vaccines to eligible patients in many jurisdictions, SIV uptake among at-risk adults is suboptimal and has, at best, stagnated for more than a decade across many regions, including North America and Europe.[4,5,6] Countless efforts have been made to improve SIV uptake through patient- and health care professional (HCP)–level interventions, yet the gap between realized and optimal coverage remains.[7,8] Notwithstanding, numerous studies have highlighted the pivotal role of HCPs in influencing and advising a patient’s health-related behaviors, such as smoking cessation and cancer screening.[9,10] A similar role is taken with vaccinations, where HCP communications and recommendations have been shown to increase uptake of various adolescent and adult vaccines.[11,12,13] Leveraging this unique role, HCP-level interventions represent a promising avenue through which the SIV coverage gap may be addressed. 
Two commonly used HCP-level interventions include patient recall-reminder systems and software-based HCP-directed prompts, which have exhibited varying degrees of effectiveness.[14,15,16,17,18,19] Among recall-reminder systems, automated communication systems are most easily implemented, whereas personalized phone calls or even home visits may be more effective, but are substantially more resource intensive to implement.[17,20] Software-based prompts via electronic health record (EHR) systems that remind HCPs to vaccinate a patient at the time of consultation are similarly easy to implement and have been shown to be effective across various health systems.[21,22] However, these prompts may be rendered ineffective if used too frequently (known as prompt fatigue) and, further, do not provide any unique insights into the patient’s likelihood of being vaccinated.[23,24] As for characterizing a patient’s likelihood of being vaccinated, prior SIV uptake determinant studies have shown that patient characteristics, including sociodemographic characteristics, health-related behaviors, and comorbidities, are associated with uptake.[25,26,27,28] Therefore, it stands that a patient’s SIV uptake could reasonably be predicted based on such characteristics, allowing HCPs to effectively identify patients who are less likely to be vaccinated during an upcoming season.
Given stringent time constraints faced by HCPs, this insight may help them to optimally allocate their time and resources, through selective use of resource-intensive interventions as well as custom tailoring of software prompts to reflect the patient’s likelihood of SIV uptake.[29] Rapid improvements in the quality of primary care data can presumably be leveraged to generate real-time insights into a patient’s likelihood of receiving vaccination.[30] However, research in this area is limited; to our knowledge, only 1 study has attempted to construct a predictive model of SIV uptake using routinely collected primary care data.[31] In this study, we investigated the feasibility of developing and validating an SIV uptake prediction model based only on patient characteristics attainable from primary care data to estimate the probability of patient-level SIV uptake among a population of at-risk adults in England. Using data from the UK’s Clinical Practice Research Datalink (CPRD) database for model training and validation, we assessed the predictive performance of 3 forms of logistic regression models.

Methods

Data Source

Data used for model training and validation were derived from the CPRD database,[32] as described in Loiacono et al.[33] Briefly, a reference cohort (3 391 975 participants) was originally constructed to identify adults aged 18 years or older in the CPRD database who were registered to English practices for a minimum of 365 consecutive days between January 2011 and December 2016. The inclusion and exclusion criteria for this reference cohort were specified to identify patients with minimal gaps in registration and records of high enough quality for research purposes, as determined by the CPRD’s acceptability metric.[32] A detailed diagram of the reference cohort construction is available in eFigure 1 in the Supplement. From this reference cohort, we identified nonoverlapping cohorts of at-risk patients and constructed model training and out-of-sample validation data sets (Figure 1). Specific clinical risk conditions for inclusion, as defined by the National Health Service, included pregnancy, chronic renal disease, chronic heart disease, chronic respiratory disease, chronic liver disease, diabetes, immunosuppression, chronic neurological disease, and morbid obesity.[34] Training data consisted of patients registered to their practice from January 2012 to December 2013. Out-of-sample validation data consisted of patients registered to their practice from January 2015 to December 2016 and who were not present in the training data set. Two years of enrollment were required to assess the patient’s SIV uptake during the prior season. Data were extracted from the CPRD database in December 2018 and analyzed from September 2019 to December 2019.

Figure 1.

Construction of Training and Validation Data Sets

^a As constructed in Loiacono et al.[33]

^b Patients aged 65 years or more without indices of multiple deprivation data were excluded (47 170 patients [20.2%] for training data; 9547 patients [27.2%] for validation data).

Among the training and validation data sets, patients without at least 1 clinical risk condition during both the observed and prior influenza season were excluded. While the National Health Service also identifies patient age as a risk condition (≥65 years), in this study only older adults with at least 1 additional clinical risk condition were included to focus model prediction on older patients at greatest risk of influenza-associated morbidity and mortality. Training and validation data sets were stratified by patient age (18-64 years and ≥65 years). This study received approval by the Independent Scientific Advisory Committee of CPRD. Informed consent from study participants was not required, given that individual-level consent was provided prior to data collection, and all data were deidentified prior to CPRD’s collection. The development and validation of the prediction model in this study was performed in accordance with the Transparent Reporting of a Multivariable Prediction Model for Individual Prognosis or Diagnosis (TRIPOD) reporting guideline.[35]

Variables

The dependent variable measured was patient-level SIV uptake. Uptake was assessed annually between September 1 and December 31, a timeframe encompassing most SIVs administered.[36] Patients vaccinated outside of this window were excluded as outliers (10 203 patients [0.5%] excluded from training data; 1204 patients [0.1%] excluded from validation data). Covariates included sex, age, race/ethnicity, smoking status, socioeconomic status, history of pneumococcal vaccination, prior season SIV uptake, and clinical risk conditions.[34] Age was treated as a continuous variable. Ethnicity was defined as specified by the Office for National Statistics.[37] Patient socioeconomic status was approximated using the Indices of Multiple Deprivation (IMD), a socioeconomic measure of the patient’s area of residence.[38] Identification of variables for inclusion in models was based on prior SIV uptake determinants research.[33] Pneumococcal vaccination history was determined by whether a patient had at least 1 record of a pneumococcal vaccination prior to December 31 of the given year. Clinical risk conditions were identified using Read codes as arranged by Primary Care Information Services, including: pregnancy, chronic renal disease, chronic heart disease, chronic respiratory disease, chronic liver disease, diabetes, immunosuppression, chronic neurological disease, and morbid obesity.[39] Time-varying patient characteristics were assessed prior to September 1 of the given year, based on the most recent record. Missing values for ethnicity, smoking status, and morbid obesity were coded as unknown. All codes used for data extraction from the CPRD database are described in detail in Loiacono et al.[33]
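Coding missing values as an explicit "unknown" level, as described above, can be sketched as follows. This is a minimal illustration, not the study's extraction or modeling code: the function name, the level labels, and the indicator column names are assumptions, and "never" is treated as the reference level only because Table 2 lists it that way.

```python
from typing import Dict, Optional

# Illustrative level names (assumed); "never" is the reference level and gets no indicator.
SMOKING_LEVELS = ["never", "current", "former", "unknown"]


def encode_smoking(status: Optional[str]) -> Dict[str, int]:
    """One-hot encode smoking status, keeping missing values as their own level.

    A missing (None) or unrecognized value is mapped to "unknown" rather than
    being imputed or dropped, so the patient still receives a prediction.
    """
    level = status if status in SMOKING_LEVELS else "unknown"
    return {
        f"smoking_{name}": int(level == name)
        for name in SMOKING_LEVELS
        if name != "never"
    }


print(encode_smoking("former"))
print(encode_smoking(None))
```

The same pattern would apply to ethnicity and morbid obesity, the other two variables with an "unknown" level.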

Model Training and Validation

We evaluated 3 types of logistic regression models: stepwise, least absolute shrinkage and selection operator (LASSO), and ridge. We opted to use logistic regression models due to their inherently transparent development process and ease of implementation, all while maintaining strong predictive abilities.[40] All variables were considered for modeling across both age strata except for pregnancy and patient IMD. Pregnancy was not used in models for patients aged 65 years or more. Patient IMD was only used in models for patients aged 65 years or more, given prior evidence of the association with SIV uptake specifically in this age stratum.[33] Stepwise models were trained using a backward stepwise algorithm that systematically reduced the model via minimization of the Akaike information criterion.[41] For the LASSO and ridge models, an optimal value of λ, or the penalty coefficient for the loss functions, was determined via 10-fold cross-validations using the 2013 training data, in which the optimal λ minimized model deviance (eFigure 2, eFigure 3, eFigure 4, and eFigure 5 in the Supplement).[42] Both the backward stepwise algorithm and LASSO autonomously performed feature or variable selection, which resulted in a reduced model, whereas ridge maintained the full model. Models were trained on the 2013 training data set and validated on the 2016 out-of-sample validation data set. Given the limited prior research to guide cutoff selection, an uninformative cutoff of 0.5 was specified, a priori, to classify a patient’s SIV uptake status based on their predicted probability (ie, a predicted probability ≤0.5 did not receive SIV; a predicted probability >0.5 received SIV). For models trained on the 2013 training data set, estimated coefficients were reported.
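For readers less familiar with penalized regression, the LASSO and ridge objectives can be written in a common textbook form. This is a sketch under stated assumptions: the paper does not give its exact parameterization, the symbols $\beta_0$, $\beta$, and $x_i$ (intercept, coefficient vector, and covariate vector for patient $i$) are introduced here for illustration, and software packages may scale the penalty terms differently.

```latex
\begin{aligned}
p_i &= \frac{1}{1 + \exp\!\left(-\beta_0 - x_i^{\top}\beta\right)}, \\
\ell(\beta_0, \beta) &= -\frac{1}{n}\sum_{i=1}^{n}\left[\, y_i \log p_i + (1 - y_i)\log(1 - p_i) \,\right], \\
\text{LASSO:}\quad & \min_{\beta_0,\,\beta}\; \ell(\beta_0, \beta) + \lambda \sum_{j} \lvert \beta_j \rvert, \\
\text{ridge:}\quad & \min_{\beta_0,\,\beta}\; \ell(\beta_0, \beta) + \lambda \sum_{j} \beta_j^{2}.
\end{aligned}
```

The absolute-value penalty can shrink coefficients exactly to zero, which is why LASSO yields a reduced model, whereas the squared penalty shrinks coefficients toward zero without removing them, so ridge retains the full model.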

Statistical Analysis

To assess the out-of-sample predictive performance of the models, the following performance metrics were calculated: misclassification rate, Brier score, area under the receiver operating characteristic curve (AUC), sensitivity, and specificity. Misclassification rate (ie, proportion of patients with an incorrectly predicted SIV uptake status, in which 0 indicates no misclassification) was calculated as the sum of false positives and false negatives divided by the total number of predictions. Brier score (accuracy of a probabilistic prediction ranging from 0 to 1, in which 0 indicates perfect accuracy) was calculated as $\frac{1}{n}\sum_{i=1}^{n}(y_i - p_i)^2$, where $y_i$ and $p_i$ were the observed SIV uptake status and probabilistic prediction for patient $i$, respectively, among $n$ total patients.[43] The AUC is a measure of the model’s discrimination power, ranging from 0 to 1, in which 1 indicates perfect prediction. Sensitivity and specificity are true-positive and true-negative rates, ranging from 0 to 1, in which 1 indicates perfect true-positive or true-negative prediction. Uncertainty measures for all performance metrics (95% CIs) were calculated using normal approximation methods (eAppendix 1 in the Supplement). Calibration plots were constructed based on the methods previously described in Gerds et al.[44] Receiver operating characteristic curves and kernel density plots were constructed and are available (eFigures 6-12 in the Supplement). Additionally, results from a sensitivity analysis, in which models were trained and validated through a within-sample 10-fold cross-validation using the 2013 training data set, are available in eTable in the Supplement. All analyses were performed in R version 3.4.3 (R Project for Statistical Computing). Further details on specific R packages are available in eAppendix 2 in the Supplement.
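The five performance metrics defined above can be sketched in a few lines of plain Python. This is an illustration on toy data, not the study's R code: the function names are assumptions, and the AUC is computed by direct pairwise comparison, which is only practical for small samples.

```python
def confusion_counts(y, p, cutoff=0.5):
    """Return (tp, fp, tn, fn); a probability > cutoff is classified as vaccinated."""
    tp = fp = tn = fn = 0
    for yi, pi in zip(y, p):
        predicted = 1 if pi > cutoff else 0
        if predicted == 1 and yi == 1:
            tp += 1
        elif predicted == 1 and yi == 0:
            fp += 1
        elif predicted == 0 and yi == 0:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn


def misclassification_rate(y, p, cutoff=0.5):
    """(False positives + false negatives) / total predictions."""
    _, fp, _, fn = confusion_counts(y, p, cutoff)
    return (fp + fn) / len(y)


def brier_score(y, p):
    """Mean squared difference between observed status and predicted probability."""
    return sum((yi - pi) ** 2 for yi, pi in zip(y, p)) / len(y)


def sensitivity(y, p, cutoff=0.5):
    """True-positive rate."""
    tp, _, _, fn = confusion_counts(y, p, cutoff)
    return tp / (tp + fn)


def specificity(y, p, cutoff=0.5):
    """True-negative rate."""
    _, fp, tn, _ = confusion_counts(y, p, cutoff)
    return tn / (tn + fp)


def auc(y, p):
    """Share of (vaccinated, unvaccinated) pairs ranked correctly; ties count as 0.5."""
    pos = [pi for yi, pi in zip(y, p) if yi == 1]
    neg = [pi for yi, pi in zip(y, p) if yi == 0]
    wins = sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in pos for b in neg)
    return wins / (len(pos) * len(neg))


# Toy data: observed uptake (1 = vaccinated) and predicted probabilities.
y_obs = [1, 0, 1, 1, 0, 0]
p_hat = [0.9, 0.2, 0.4, 0.7, 0.6, 0.1]
print(misclassification_rate(y_obs, p_hat), brier_score(y_obs, p_hat), auc(y_obs, p_hat))
```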

Results

Summary of Patient Characteristics

Training data consisted of 324 284 patients aged 18 to 64 years (mean [SD] age, 45 [13] years; 161 487 women [49.8%]; 102 133 categorized as white [31.5%]) and 186 426 patients aged 65 years or older (mean [SD] age, 77 [8] years; 96 169 women [51.6%]; 64 996 categorized as white [34.9%]) (Table 1). Validation data consisted of 35 210 patients aged 18 to 64 years (mean [SD] age, 42 [14] years; 17 296 women [49.1%]; 13 346 categorized as white [37.9%]) and 25 497 patients aged 65 years or older (mean [SD] age, 73 [8] years; 13 135 women [51.5%]; 9641 categorized as white [37.8%]). Across both age strata, lower SIV uptake (prior and current season) and lower pneumococcal vaccine uptake were noted in the validation data relative to the training data. Among patients aged 18 to 64 years, SIV uptake was 35.9% (116 316 patients; 95% CI, 35.7%-36.0%) and 32.6% (11 493 patients; 95% CI, 32.1%-33.1%) for the training and validation data sets, respectively. Among patients aged 65 years or older, SIV uptake was 83.1% (154 872 patients; 95% CI, 82.9%-83.2%) and 76.1% (19 393 patients; 95% CI, 75.5%-76.6%) for the training and validation data sets, respectively. Pneumococcal vaccine uptake among patients aged 18 to 64 years was 23.4% (75 984 patients) and 16.7% (5882 patients) for training and validation sets, respectively; for patients aged 65 years or older, uptake was 85.1% (158 661 patients) for the training set and 74.9% (19 099 patients) for the validation set.
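As a worked check of one uptake figure, the sketch below computes a normal-approximation (Wald) 95% CI for SIV uptake among training patients aged 18 to 64 years. The text does not state how the uptake intervals were computed, so the choice of a Wald interval and the function name are assumptions; for this particular figure the result matches the reported interval after rounding.

```python
from math import sqrt


def wald_interval(successes, n, z=1.96):
    """Normal-approximation CI for a proportion: p +/- z * sqrt(p * (1 - p) / n)."""
    p = successes / n
    half_width = z * sqrt(p * (1 - p) / n)
    return p, p - half_width, p + half_width


# Training data, patients aged 18 to 64 years: 116 316 vaccinated of 324 284.
p, lower, upper = wald_interval(116_316, 324_284)
print(f"{100 * p:.1f}% (95% CI, {100 * lower:.1f}%-{100 * upper:.1f}%)")
```

With samples this large the interval is narrow, which is why the reported CIs span only a few tenths of a percentage point.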

Table 1.

Summary of Characteristics of Patients in the Training and Validation Data Sets, Stratified by Patient Age

Characteristic	Patients by age group, No. (%)
	Training data (season 2013)		Validation data (season 2016)
	18-64 y (n = 324 284)	≥65 y (n = 186 426)	18-64 y (n = 35 210)	≥65 y (n = 25 497)
Men	162 797 (50.2)	90 257 (48.4)	17 914 (50.9)	12 362 (48.5)
Women	161 487 (49.8)	96 169 (51.6)	17 296 (49.1)	13 135 (51.5)
Age, mean (SD), y	45 (13)	77 (8)	42 (14)	73 (8)
Race/ethnicity
Asian	10 330 (3.2)	3051 (1.6)	1932 (5.5)	660 (2.6)
Black	6570 (2.0)	1540 (0.8)	1306 (3.7)	276 (1.1)
Mixed	67 229 (20.7)	37 464 (20.1)	8312 (23.6)	5975 (23.4)
Other	2943 (0.9)	1056 (0.6)	608 (1.7)	257 (1.0)
Unknown	135 079 (41.7)	78 319 (42.0)	9706 (27.6)	8688 (34.1)
White	102 133 (31.5)	64 996 (34.9)	13 346 (37.9)	9641 (37.8)
Patient IMD
1, Least deprived	NA	44 729 (24.0)	NA	7329 (28.7)
2	NA	43 936 (23.6)	NA	5288 (20.7)
3	NA	40 268 (21.6)	NA	5196 (20.4)
4	NA	32 341 (17.3)	NA	4267 (16.7)
5, Most deprived	NA	25 152 (13.5)	NA	3417 (13.4)
Smoking status
Never	173 300 (53.4)	92 422 (49.6)	18 831 (53.5)	12 411 (48.7)
Current	74 142 (22.9)	17 897 (9.6)	8396 (23.8)	3723 (14.6)
Former	73 721 (22.7)	75 480 (40.5)	7491 (21.3)	9166 (35.9)
Unknown	3121 (1.0)	627 (0.3)	492 (1.4)	197 (0.8)
Pneumococcal vaccine uptake	75 984 (23.4)	158 661 (85.1)	5882 (16.7)	19 099 (74.9)
Prior season SIV uptake	112 148 (34.6)	153 159 (82.2)	10 970 (31.2)	19 278 (75.6)
Current season SIV uptake, No. (% [95% CI])	116 316 (35.9 [35.7-36.0])	154 872 (83.1 [82.9-83.2])	11 493 (32.6 [32.1-33.1])	19 393 (76.1 [75.5-76.6])
Clinical risk conditions^a
1	272 932 (84.2)	113 597 (60.9)	30 683 (87.1)	17 854 (70.0)
≥2	51 352 (15.8)	72 829 (39.1)	4527 (12.9)	7643 (30.0)

Abbreviations: IMD, Indices of Multiple Deprivation; NA, not applicable; SIV, seasonal influenza vaccine.

^a Among patients aged 65 or more, age was not counted as a clinical risk condition.

Model Training

Estimated coefficients for trained models are presented in Table 2. The most influential predictors across both age strata were prior season SIV uptake and pneumococcal vaccination status. Otherwise, among patients aged 18 to 64 years, pregnancy (estimated coefficient for stepwise model, 1.80; LASSO, 1.76; ridge, 1.24), diabetes (estimated coefficient for stepwise model, 0.88; LASSO, 0.85; ridge, 0.69), and chronic neurological disease (estimated coefficient for stepwise model, 0.68; LASSO, 0.64; ridge, 0.48) had the greatest contributions. Among patients aged 65 years or older, estimated coefficients were overall smaller and less variable. For stepwise and LASSO models, only a small number of predictors were autonomously excluded from the models among patients aged 18 to 64 years (eg, race [mixed, other, and white] and chronic liver disease) as well as among patients aged 65 years or older (eg, race [mixed, other, and white], chronic renal disease, chronic liver disease, and chronic neurological disease), indicating that most predictors contributed nontrivially to the model fit.

Table 2.

Estimated Coefficients for All Models Trained on the Entire 2013 Training Data Set, Stratified by Age

Variable	Estimated coefficients by age group
	18-64 y			≥65 y
	Stepwise	LASSO	Ridge	Stepwise	LASSO	Ridge
Intercept	–3.91	–3.91	–3.19	–0.60	–0.73	–0.81
Women	0.22	0.21	0.17	–0.09	–0.09	–0.08
Age	0.032	0.032	0.025	–0.015	–0.014	–0.008
Ethnicity
Asian	1 [Reference]	1 [Reference]	1 [Reference]	1 [Reference]	1 [Reference]	1 [Reference]
Black	–0.21	–0.10	–0.09	–0.22	–0.12	–0.12
Mixed	–0.076	NA^a	0.022	–0.073	NA^a	0.025
Other	–0.060	NA^a	0.024	–0.072	NA^a	–0.005
Unknown	–0.22	–0.14	–0.10	–0.18	–0.10	–0.08
White	–0.071	NA^a	0.034	–0.037	0.026	0.055
Patient IMD
1, Least deprived	NA^b	NA^b	NA^b	1 [Reference]	1 [Reference]	1 [Reference]
2	NA^b	NA^b	NA^b	–0.087	–0.047	–0.028
3	NA^b	NA^b	NA^b	–0.092	–0.053	–0.040
4	NA^b	NA^b	NA^b	–0.140	–0.095	–0.071
5, Most deprived	NA^b	NA^b	NA^b	–0.110	–0.067	–0.059
Smoking status
Never	1 [Reference]	1 [Reference]	1 [Reference]	1 [Reference]	1 [Reference]	1 [Reference]
Current	–0.12	–0.11	–0.10	–0.24	–0.23	–0.21
Former	0.061	0.054	0.086	0.073	0.068	0.077
Unknown	–0.74	–0.61	–0.50	–0.26	–0.22	–0.29
Pneumococcal vaccination	1.07	1.07	0.96	1.48	1.48	1.30
Prior season SIV uptake	3.06	3.05	2.43	3.60	3.60	2.99
Pregnant	1.80	1.76	1.24	NA^c	NA^c	NA^c
Chronic renal disease	0.32	0.26	0.25	NA^a	–0.01	–0.02
Chronic heart disease	0.42	0.39	0.30	0.11	0.09	0.08
Chronic respiratory disease	0.20	0.16	0.02	0.06	0.05	0.06
Chronic liver disease	NA^a	NA^a	0.008	NA^a	0.020	0.024
Diabetes	0.88	0.85	0.69	0.15	0.13	0.12
Immunosuppression	0.52	0.46	0.35	0.13	0.10	0.08
Chronic neurological disease	0.68	0.64	0.48	NA^a	NA^a	–0.01
Obesity
Morbid (BMI ≥40)	–0.27	–0.26	–0.27	–0.11	–0.10	–0.10
Unknown	–0.41	–0.40	–0.39	–0.14	–0.13	–0.20

Abbreviations: BMI, body mass index (calculated as weight in kilograms divided by height in meters squared); IMD, Indices of Multiple Deprivation; LASSO, least absolute shrinkage and selection operator; NA, not applicable; SIV, seasonal influenza vaccine.

^a Variable removed by autonomous feature selection (stepwise and LASSO only).

^b Patient IMD manually excluded from models among patients aged 18 to 64 years because of consistent insignificance.

^c Variable excluded from models for patients aged 65 years or more because of lack of pregnancy records.
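To show how the coefficients in Table 2 translate into a predicted probability, the sketch below applies a subset of the stepwise coefficients for patients aged 18 to 64 years to a hypothetical patient. Several assumptions are made: the coefficients are read as log-odds, age enters in years without centering or scaling, and every covariate not listed sits at its reference or absent level. Because the published coefficients are rounded, the output is illustrative only and is not the authors' scoring code.

```python
from math import exp

# Stepwise-model coefficients for patients aged 18-64 y, copied from Table 2 (subset).
STEPWISE_18_64 = {
    "intercept": -3.91,
    "women": 0.22,
    "age": 0.032,  # per year of age (assumed untransformed)
    "pneumococcal_vaccination": 1.07,
    "prior_season_siv": 3.06,
    "diabetes": 0.88,
}


def predicted_probability(coefs, features):
    """Inverse-logit of the linear predictor: intercept + sum(coefficient * value)."""
    logit = coefs["intercept"] + sum(coefs[name] * value for name, value in features.items())
    return 1.0 / (1.0 + exp(-logit))


# Hypothetical patient: 45-year-old woman with diabetes, vaccinated last season.
vaccinated_last_season = {"women": 1, "age": 45, "prior_season_siv": 1, "diabetes": 1}
# Same patient, but not vaccinated last season.
not_vaccinated_last_season = {"women": 1, "age": 45, "prior_season_siv": 0, "diabetes": 1}

for label, patient in [("prior SIV", vaccinated_last_season), ("no prior SIV", not_vaccinated_last_season)]:
    p = predicted_probability(STEPWISE_18_64, patient)
    status = "received SIV" if p > 0.5 else "did not receive SIV"
    print(f"{label}: p = {p:.3f} -> classified as {status}")
```

Under these assumptions the predicted probability is roughly 0.84 with prior season uptake and roughly 0.20 without it, which illustrates how strongly prior season SIV uptake drives the classification at the 0.5 cutoff.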

Model Validation

Within each age stratum, the 3 model types performed similarly overall (Table 3). Misclassification rates were lowest for the stepwise and LASSO models among both patients aged 18 to 64 years (0.163, 95% CI, 0.159-0.166) and 65 years or older (0.120, 95% CI, 0.116-0.124). Brier scores were highest for the stepwise and LASSO models among patients aged 18 to 64 years (0.125, 95% CI, 0.122-0.127), whereas they were highest for the ridge model among those aged 65 years or older (0.0959, 95% CI, 0.0932-0.0985). All models across both age strata had comparable AUCs (0.877, 95% CI, 0.873-0.881). Sensitivity was highest for the LASSO model among those aged 18 to 64 years (0.721, 95% CI, 0.717-0.725) and the ridge model among those aged 65 years or older (0.936, 95% CI, 0.933-0.939). Specificity was highest for the ridge model among those aged 18 to 64 years (0.902, 95% CI, 0.899-0.905) and the LASSO model among those aged 65 years or older (0.755, 95% CI, 0.750-0.760).

Table 3.

Performance Measures for Out-of-Sample Validation of Stepwise, LASSO, and Ridge Models, Among Patients Aged 18 to 64 Years and 65 Years or Older

Model	Performance measure^a
	Misclassification rate	Brier score	AUC	Sensitivity	Specificity
Patients aged 18-64 y
Stepwise	0.163 (0.159-0.166)	0.125 (0.122-0.127)	0.877 (0.873-0.881)	0.720 (0.716-0.724)	0.897 (0.894-0.899)
LASSO	0.163 (0.159-0.166)	0.125 (0.122-0.127)	0.877 (0.873-0.881)	0.721 (0.717-0.725)	0.896 (0.893-0.899)
Ridge	0.164 (0.160-0.167)	0.124 (0.122-0.126)	0.876 (0.873-0.880)	0.705 (0.701-0.710)	0.902 (0.899-0.905)
Patients aged ≥65 y
Stepwise	0.120 (0.116-0.124)	0.0954 (0.093-0.098)	0.877 (0.873-0.881)	0.920 (0.916-0.923)	0.753 (0.747-0.758)
LASSO	0.120 (0.116-0.124)	0.0953 (0.093-0.098)	0.877 (0.873-0.881)	0.919 (0.916-0.922)	0.755 (0.750-0.760)
Ridge	0.125 (0.121-0.129)	0.0959 (0.093-0.099)	0.877 (0.873-0.881)	0.936 (0.933-0.939)	0.680 (0.675-0.686)

Abbreviations: AUC, area under the receiver operating characteristic curve; LASSO, least absolute shrinkage and selection operator; SIV, seasonal influenza vaccine.

^a Misclassification rate measures the proportion of patients with an incorrectly predicted SIV uptake status (based upon a cutoff of 0.5). Brier score measures the accuracy of a probabilistic prediction, ranging from 0 to 1, where 0 indicates perfect accuracy. AUC measures the model’s discrimination power, ranging from 0 to 1, where 0.5 indicates an inability to appropriately classify a patient’s SIV uptake and 1 indicates perfect prediction. Sensitivity measures the true positive rate; specificity measures the true negative rate.

Predicted probabilities for patients aged 18 to 64 years were overall reliable and comparable among the 3 model types (Figure 2). For the intermediate range of predicted probabilities (eg, 0.25-0.50), prediction was biased toward underpredicting. The ridge model exhibited the highest degree of underprediction bias, particularly for probabilities of 0.50 or more. For patients aged 65 years or older, predicted probabilities were substantially more variable and overall biased toward overpredicting (Figure 2). Prediction bias of the 3 model types was similar for probabilities greater than 0.75. However, for probabilities between approximately 0.20 and 0.60, the ridge model’s bias toward underpredicting was notably greater than those of the stepwise or LASSO models.
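The notion of under- and overprediction can be made concrete with a simple binned calibration table. This is a simplified stand-in, not the method of Gerds et al used for the study's calibration plots; the function name, the equal-width bins, and the toy data are all assumptions.

```python
def calibration_table(y, p, n_bins=4):
    """Compare mean predicted probability with observed uptake within equal-width bins.

    bias > 0 means the model overpredicts uptake in that bin; bias < 0 means it underpredicts.
    """
    bins = [[] for _ in range(n_bins)]
    for yi, pi in zip(y, p):
        index = min(int(pi * n_bins), n_bins - 1)  # p == 1.0 falls in the last bin
        bins[index].append((yi, pi))

    rows = []
    for index, members in enumerate(bins):
        if not members:
            continue  # skip empty bins
        mean_predicted = sum(pi for _, pi in members) / len(members)
        observed = sum(yi for yi, _ in members) / len(members)
        rows.append({
            "bin": (index / n_bins, (index + 1) / n_bins),
            "n": len(members),
            "mean_predicted": mean_predicted,
            "observed": observed,
            "bias": mean_predicted - observed,
        })
    return rows


# Toy data: observed uptake (1 = vaccinated) and predicted probabilities.
y_obs = [0, 0, 1, 1, 1, 0]
p_hat = [0.1, 0.2, 0.3, 0.8, 0.9, 0.85]
for row in calibration_table(y_obs, p_hat, n_bins=2):
    print(row)
```

In the toy data the lower bin is underpredicted (negative bias) and the upper bin is overpredicted (positive bias), mirroring the two directions of bias described for the two age strata.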

Figure 2.

Calibration Plots for Out-of-Sample Validation of Stepwise, Least Absolute Shrinkage and Selection Operator (LASSO), and Ridge Models Among Patients

SIV indicates seasonal influenza vaccine.

Discussion

This study found that a clinical prediction tool using patient-level characteristics identified via primary care records was able to estimate the probability of patient-level SIV uptake among at-risk adults with an overall high degree of accuracy. As for the specific methods tested, performance of the 3 types of models was comparable, exhibiting only minor discrepancies across the various performance metrics. Relative to the stepwise model, neither the LASSO nor ridge models performed substantially better. These findings suggest that the simplest approach, the stepwise regression, is well suited for this type of prediction model. Model performance did, however, differ notably between the 2 age strata. Among younger adults, estimated probabilities were closer to the observed SIV uptake probabilities, whereas estimated probabilities were substantially more variable among older adults. This may be explained in part by the imbalanced outcome variable (eg, approximately 70%-80% vaccine uptake among older adults) but may also be indicative of a lower overall degree of explanatory power for predictors in the models among patients aged 65 years or more. Considering this, either model may reliably be used to identify patients who are least likely to be vaccinated (eg, probability ≤25%), but the model among younger adults may be better suited for evaluating the patient’s specific probability of being vaccinated. Our work establishes a new possible foundation from which future clinical tools and interventions may be developed, in which the insights gained from these predictions may guide HCP resource allocation. By identifying patients with a low probability of being vaccinated, HCPs can deploy more resource-intensive outreach efforts, such as personalized phone calls.
Additionally, the geographical distribution of patients with low probabilities to be vaccinated could be further investigated for instances of clustering, which would highlight areas with potential limitations to health care access. As for HCP-level interventions, this model could be integrated into EHR systems, to deliver HCP-directed reminders via software prompts at the time of patient consultation. With the model performing real-time calculations behind the scenes, EHR-based prompts could be customized to not only remind the HCP to vaccinate the patient, but also serve the HCP with a unique insight into the patient’s inherent likelihood of being vaccinated and help them tailor the conversation accordingly. Given the time constraints placed upon HCPs during a consultation, this may help them more efficiently use their time.[45] For example, a patient can be characterized as unlikely, moderately likely, or highly likely to be vaccinated based on their predicted probability, and EHR prompts could be constructed accordingly. For patients flagged as highly likely, the prompt may advise the HCP to make a presumptive and time-saving recommendation (eg, “today we will be giving you your flu shot”).[46] For patients flagged as unlikely, the prompt may include a list of the patient’s clinical risk conditions, suggesting that the HCP initiate a personalized dialogue with regard to the patient’s specific condition(s) and the importance of vaccination, which would be followed by a presumptive recommendation.[47] As for those patients flagged as moderately likely, the prompt may advise the HCP to make a presumptive recommendation, but also to be prepared to discuss the patient’s specific risk conditions if they exhibit hesitancy. This scenario demonstrates 1 way these models potentially can be used to autonomously deliver time-optimizing, patient-tailored guidance to HCPs.
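The three-tier prompt scenario could be prototyped along the following lines. This is a hypothetical sketch: the study does not define tier boundaries (it mentions a probability of 25% or less only as an example of patients least likely to be vaccinated), so both thresholds, the function names, and the prompt wording are assumptions made for illustration.

```python
# Hypothetical tier boundaries; neither value is specified by the study.
UNLIKELY_MAX = 0.25
HIGHLY_LIKELY_MIN = 0.75


def uptake_tier(probability):
    """Map a predicted SIV uptake probability to one of three tiers."""
    if not 0.0 <= probability <= 1.0:
        raise ValueError("probability must be between 0 and 1")
    if probability <= UNLIKELY_MAX:
        return "unlikely"
    if probability >= HIGHLY_LIKELY_MIN:
        return "highly likely"
    return "moderately likely"


def prompt_for(probability, risk_conditions):
    """Build an illustrative HCP-directed prompt for the patient's tier."""
    tier = uptake_tier(probability)
    if tier == "highly likely":
        return "Make a presumptive recommendation."
    if tier == "unlikely":
        conditions = ", ".join(risk_conditions) or "none recorded"
        return (
            f"Discuss the patient's risk conditions ({conditions}) and the "
            "importance of vaccination, then make a presumptive recommendation."
        )
    return (
        "Make a presumptive recommendation; be prepared to discuss the "
        "patient's risk conditions if they exhibit hesitancy."
    )


print(prompt_for(0.10, ["diabetes", "chronic heart disease"]))
print(prompt_for(0.90, ["diabetes"]))
```

In practice the tier boundaries would need to be chosen and evaluated against local data before being used in any clinical system.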

Limitations

Despite the strong predictive performance of these models, this study has some limitations that must be acknowledged. First, we explicitly opted to treat missing values as unknown, rather than the more common approach of imputation or exclusion of observations with missing data.[48] However, in a clinical setting, it is likely that values for some predictors may be missing in the patient’s records, such as ethnicity, smoking status, or body mass index. By explicitly modeling these missing values, we preserve their explanatory power among patients who have a known value while retaining the model’s ability to predict among patients with missing values, thereby maximizing the model’s clinical utility. As for excluding patients aged 65 years or older without IMD data, this was decided to reflect real-world circumstances, in which a patient’s IMD measure can simply be obtained by cross-referencing their area of residence with an IMD lookup table. Second, while the use of CPRD’s database allowed us to capture a wide breadth of patients and perform true out-of-sample validation, a fundamental strength of this study, it also introduced its own disadvantages. Given that our model was trained on England-specific data, it may not be applicable to other regions. Nevertheless, the framework that we have implemented here can be replicated elsewhere using similar sources of data to train and validate country-specific prediction models. Doing so would ensure that the model’s predictive capabilities are best suited to the respective health system and its patients. Additionally, we noted a decrease in both SIV and pneumococcal vaccine uptake in the validation data sets, relative to the training data sets.
As explained in Loiacono et al,[33] this observed drop in vaccine coverage over time may be explained by the increasing number of CPRD-enrolled practices dropping out of data collection over time, or perhaps even the increase in pharmacy-administered vaccines and the subsequent lack of appropriate data transfers to general practitioners. Nevertheless, within the framework of predictive modeling, these differences are of less concern, as similarity of baseline characteristics between training and validation data is not a prerequisite. Third, although the model was capable of accurate prediction, it does not explicitly explain why the patient was likely or unlikely to be vaccinated. This is an inherent limitation of using large primary care databases, given that we must model a patient’s vaccine uptake as a function of only the characteristics that we can confidently measure. Thus, this study does not account for other known determinants of SIV uptake, such as personal beliefs, opinions, and other social factors that could not be accurately measured in CPRD’s database.[49] Similarly, it is possible that the patient-level attitudes and behavior underlying vaccine uptake may vary over time, making comparison between different years difficult. Nevertheless, the inclusion of a lagged independent variable (ie, prior season SIV uptake) in the model allowed for flexibility in this regard, given that it effectively encompassed and adjusted for changes in patient attitudes based on their historical actions.
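Two of the modeling choices discussed in this section, treating missing predictor values as an explicit "unknown" category and including prior season SIV uptake as a lagged predictor, can be illustrated with a minimal Python sketch. The field names, the category levels, and the helper functions below are hypothetical and are not taken from the study's code or from the CPRD data dictionary.

```python
from typing import Dict, List, Optional

# Hypothetical category levels; "unknown" is kept as its own level so
# that records with missing values can still be scored by the model.
SMOKING_LEVELS = ["never", "former", "current", "unknown"]


def encode_category(value: Optional[str], levels: List[str]) -> List[int]:
    """One-hot encode a value, mapping missing or unrecognized values to 'unknown'."""
    level = value if value in levels else "unknown"
    return [1 if level == name else 0 for name in levels]


def build_features(record: Dict[str, object]) -> List[int]:
    """Build a feature vector: smoking status indicators plus a lagged uptake flag."""
    smoking = encode_category(record.get("smoking_status"), SMOKING_LEVELS)
    # Lagged predictor: 1 if the patient was vaccinated in the prior season.
    prior = int(bool(record.get("vaccinated_prior_season", False)))
    return smoking + [prior]
```

For example, `build_features({"smoking_status": None, "vaccinated_prior_season": True})` yields a vector in which the "unknown" indicator and the prior season flag are both set, so the patient can be scored without imputation or exclusion.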

Conclusions

The results of this study suggest that primary care records can be leveraged to provide future insights into patient preventive health behaviors such as SIV uptake. Logistic regression models can predict SIV uptake with high accuracy, and the modeling approach implemented here can likely be adapted to other countries and databases. Future research is needed to assess the feasibility of implementing this model in a clinical setting as well as to evaluate its potential effectiveness with regard to improving SIV uptake. Similarly, future studies may wish to investigate the performance of additional methods, such as decision tree learning, to identify a smaller subset of critical predictors that may be used as a decision tool, thus simplifying the model’s implementation.

34 in total

1. Factors influencing influenza vaccination uptake in an elderly, community-based sample.

Authors: Victoria E Burns; Christopher Ring; Douglas Carroll
Journal: Vaccine Date: 2005-05-20 Impact factor: 3.641

2. To act or not to act: responses to electronic health record prompts by family medicine clinicians.

Authors: Philip Zazove; Michael McKee; Lauren Schleicher; Lee Green; Paul Kileny; Mary Rapai; Elie Mulhem
Journal: J Am Med Inform Assoc Date: 2017-03-01 Impact factor: 4.497

Review 3. Vaccine hesitancy and healthcare providers.

Authors: Pauline Paterson; François Meurice; Lawrence R Stanberry; Steffen Glismann; Susan L Rosenthal; Heidi J Larson
Journal: Vaccine Date: 2016-10-31 Impact factor: 3.641

Review 4. Developing prediction models for clinical use using logistic regression: an overview.

Authors: Maren E Shipe; Stephen A Deppen; Farhood Farjah; Eric L Grogan
Journal: J Thorac Dis Date: 2019-03 Impact factor: 2.895

5. Primary Care Provider-Delivered Smoking Cessation Interventions and Smoking Cessation Among Participants in the National Lung Screening Trial.

Authors: Elyse R Park; Ilana F Gareen; Sandra Japuntich; Inga Lennes; Kelly Hyland; Sarah DeMello; JoRean D Sicks; Nancy A Rigotti
Journal: JAMA Intern Med Date: 2015-09 Impact factor: 21.873

Review 6. Provider-parent Communication When Discussing Vaccines: A Systematic Review.

Authors: John T Connors; Kate L Slotwinski; Eric A Hodges
Journal: J Pediatr Nurs Date: 2016-11-15 Impact factor: 2.145

7. Transparent Reporting of a multivariable prediction model for Individual Prognosis or Diagnosis (TRIPOD): the TRIPOD statement.

Authors: Gary S Collins; Johannes B Reitsma; Douglas G Altman; Karel G M Moons
Journal: Ann Intern Med Date: 2015-01-06 Impact factor: 25.391

Review 8. Patient reminder and patient recall systems to improve immunization rates.

Authors: Julie C Jacobson Vann; Peter Szilagyi
Journal: Cochrane Database Syst Rev Date: 2005-07-20

9. Cross-sectional survey of older peoples' views related to influenza vaccine uptake.

Authors: Punam Mangtani; Elizabeth Breeze; Sue Stirling; Smita Hanciles; Sari Kovats; Astrid Fletcher
Journal: BMC Public Health Date: 2006-10-11 Impact factor: 3.295

10. Identifying social factors amongst older individuals in linked electronic health records: An assessment in a population based study.

Authors: Anu Jain; Albert J van Hoek; Jemma L Walker; Rohini Mathur; Liam Smeeth; Sara L Thomas
Journal: PLoS One Date: 2017-11-30 Impact factor: 3.240