Donna M Wolk, Alon Lanyado, Ann Marie Tice, Maheen Shermohammed, Yaron Kinar, Amir Goren, Christopher F Chabris, Michelle N Meyer, Avi Shoshan, Vida Abedi.
Abstract
Influenza vaccinations are recommended for high-risk individuals, but few population-based strategies exist to identify individual risks. Patient-level data from unvaccinated individuals, stratified into retrospective cases (n = 111,022) and controls (n = 2,207,714), informed a machine learning model designed to create an influenza risk score; the model was called the Geisinger Flu-Complications Flag (GFlu-CxFlag). The flag was created and validated on a cohort of 604,389 unique individuals. Risk scores were generated for influenza cases; the complication rate for individuals without influenza was estimated to adjust for unrelated complications. Shapley values were used to examine the model's correctness and demonstrate its dependence on different features. Bias was assessed for race and sex. Inverse propensity weighting was used in the derivation stage to correct for biases. The GFlu-CxFlag model was compared to the pre-existing Medial EarlySign Flu Algomarker and existing risk guidelines that describe high-risk patients who would benefit from influenza vaccination. The GFlu-CxFlag outperformed other traditional risk-based models; the area under the curve (AUC) was 0.786 [0.783-0.789], compared with 0.694 [0.690-0.698] for the WHO-inspired age, sex, and comorbidities model (p-value < 0.00001). The presence of acute and chronic respiratory diseases, age, and previous emergency department visits contributed most to the GFlu-CxFlag model's prediction. When higher numerical scores were assigned to more severe complications, the GFlu-CxFlag AUC increased to 0.828 [0.823-0.833], with excellent discrimination in the final model used to perform the risk stratification of the population. The GFlu-CxFlag can better identify high-risk individuals than existing models based on vaccination guidelines, thus creating a population-based risk stratification for individual risk assessment and deployment in vaccine hesitancy reduction programs in our health system.
Keywords: Clinical Lab 2.0; EHR; RT-PCR; decision support; electronic medical records; influenza; machine learning; precision medicine; risk stratification; vaccine
Year: 2022 PMID: 35893436 PMCID: PMC9332321 DOI: 10.3390/jcm11154342
Source DB: PubMed Journal: J Clin Med ISSN: 2077-0383 Impact factor: 4.964
Figure 1. Inclusion and exclusion criteria with definitions of cases and controls used during data pre-processing: description of cases, controls, and exclusion criteria, i.e., the cohort definition of influenza-related complications for unvaccinated individuals within a given influenza season.
Figure 2. Tiers of confidence for influenza diagnosis and severity levels for influenza complications: An individual could be included in the cohort several times for different influenza seasons but was unique within an influenza season. The same individual could be categorized as a case in one season and as a control in another. (A) represents the exclusion criteria and cohorts. (B) represents the influenza registries.
Figure 3. Equations used. Equations (1)–(4) were used to define the probability of complications. Equation (5) describes the weighting process used in the model training stage, where X_i is the data vector for sample i and W_i is the training weight for sample i.
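As a rough illustration of the weighting idea, the following is a minimal sketch of inverse propensity weighting in its generic textbook form, which may differ from the study's Equation (5). Each sample i receives a weight W_i equal to the inverse of the estimated probability of its observed group given its data vector X_i. The function name `ipw_weights`, the `clip` bound, and the toy propensities are assumptions made for illustration only.

```python
def ipw_weights(propensities, groups, clip=0.01):
    """Return one training weight per sample (generic IPW sketch).

    propensities[i]: estimated P(group = 1 | X_i), assumed to come from a
                     separately fitted propensity model (not shown here).
    groups[i]:       observed group label (1 or 0) for sample i.
    clip:            hypothetical bound that keeps the weights finite.
    """
    weights = []
    for p, g in zip(propensities, groups):
        p = min(max(p, clip), 1.0 - clip)          # avoid dividing by ~0
        prob_observed = p if g == 1 else 1.0 - p   # P(observed group | X_i)
        weights.append(1.0 / prob_observed)
    return weights


# Toy data: samples that were unlikely to land in their observed group
# receive larger weights.
props = [0.8, 0.8, 0.2, 0.5]
groups = [1, 0, 1, 0]
w = ipw_weights(props, groups)
print(w)
```

Weights of this kind are typically passed to the learner as per-sample training weights, so that under-represented combinations of covariates and group count for more during fitting.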
Comparison of AUC by different analysis methods and outcome definitions for different models.
| Analysis method / outcome definition | Age, Sex Model | WHO-inspired Age, Sex, and Comorbidities Model | Full GFlu-CxFlag Model |
|---|---|---|---|
| No IPW, no overestimation analysis | 0.588 [0.583–0.593] | 0.694 [0.690–0.698] | 0.786 [0.783–0.789] |
| Only IPW | 0.597 [0.592–0.602] | 0.715 [0.711–0.720] | 0.789 [0.785–0.793] |
| IPW and overestimation analysis | 0.587 [0.580–0.593] | 0.693 [0.687–0.699] | 0.761 [0.757–0.768] |
| Flu diagnosis by RT-PCR & all complications * | 0.632 [0.615–0.647] | 0.704 [0.688–0.720] | 0.797 [0.785–0.809] |
| Severe complications (cohorts 1 and 2) * | 0.610 [0.602–0.616] | 0.709 [0.703–0.716] | 0.828 [0.823–0.833] |
* Full model performance is based on a model trained for influenza probability categories 1–3 and severity/complication tiers 1–3 (with no retraining); it is a single model. Simplest model = Age and sex only; CDC/WHO Model = Age, Sex, and some Comorbidities; GFlu-CxFlag Full model; IPW = Inverse probability weighting; RT-PCR = reverse transcriptase polymerase chain reaction.
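To make the AUC [interval] entries above concrete, here is a minimal sketch of a rank-based AUC with a percentile bootstrap interval. The bootstrap is an assumption: the text here does not state how the intervals were computed. The function names and the toy scores are hypothetical.

```python
import random


def auc(scores, labels):
    """Probability that a random case outscores a random control.

    Ties count as one half. Quadratic in sample size, so this is
    only suitable for small illustrative data sets.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def bootstrap_auc_ci(scores, labels, n_boot=200, alpha=0.05, seed=0):
    """Percentile bootstrap interval for the AUC (assumed method)."""
    rng = random.Random(seed)
    idx = range(len(scores))
    stats = []
    while len(stats) < n_boot:
        sample = [rng.choice(idx) for _ in idx]   # resample with replacement
        ys = [labels[i] for i in sample]
        if 0 < sum(ys) < len(ys):                 # need both classes present
            stats.append(auc([scores[i] for i in sample], ys))
    stats.sort()
    k = int(n_boot * alpha / 2)
    return stats[k], stats[n_boot - 1 - k]


# Toy data, not taken from the study.
scores = [0.9, 0.7, 0.6, 0.4, 0.35, 0.2]
labels = [1, 1, 0, 1, 0, 0]
point = auc(scores, labels)
low, high = bootstrap_auc_ci(scores, labels)
print(f"AUC {point:.3f} [{low:.3f}-{high:.3f}]")
```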
XGBoost model discrimination and performance comparison table.
| Metric | Age, Sex Model | WHO-inspired Age, Sex, and Comorbidities Model | Full GFlu-CxFlag Model |
|---|---|---|---|
| AUC | 0.59 [0.58–0.59] | 0.69 [0.69–0.70] | 0.79 [0.78–0.79] |
| N, Controls | 442,329 [441,033–443,687] | 442,329 [441,022–443,739] | 442,329 [440,907–443,499] |
| N, Cases | 22,116 [21,747–22,500] | 22,116 [21,775–22,428] | 22,116 [21,773–22,473] |
| PPV@SENS_10 | 9.12 [8.62–9.63] | 15.82 [15.27–16.42] | 43.06 [41.23–44.93] |
| PPV@SENS_20 | 7.52 [7.27–7.80] | 14.80 [14.21–15.40] | 33.53 [32.40–34.89] |
| PPV@SENS_30 | 6.93 [6.73–7.16] | 13.27 [12.90–13.70] | 26.67 [25.67–27.73] |
| PPV@SENS_40 | 6.46 [6.30–6.64] | 12.19 [11.78–12.54] | 21.55 [20.88–22.25] |
| PPV@SENS_50 | 6.10 [5.97–6.26] | 10.58 [10.26–10.89] | 17.60 [17.03–18.21] |
| PPV@SENS_60 | 5.91 [5.79–6.04] | 9.10 [8.86–9.37] | 14.31 [13.90–14.74] |
| PPV@SENS_70 | 5.69 [5.58–5.81] | 7.79 [7.61–8.01] | 11.47 [11.13–11.84] |
| SENS@FPR_01 | 3.24 [3.01–3.49] | 4.31 [4.04–4.61] | 13.12 [12.55–13.71] |
| SENS@FPR_05 | 10.02 [9.59–10.49] | 17.76 [17.14–18.38] | 33.10 [32.44–33.82] |
| SENS@FPR_10 | 16.83 [16.28–17.41] | 30.54 [29.82–31.36] | 46.69 [46.01–47.42] |
| SENS@FPR_15 | 23.46 [22.84–24.14] | 40.94 [40.21–41.67] | 55.66 [54.97–56.37] |
| SENS@FPR_20 | 29.83 [29.13–30.61] | 48.58 [47.85–49.34] | 62.73 [62.06–63.45] |
| SENS@FPR_30 | 41.13 [40.30–41.97] | 60.06 [59.39–60.78] | 72.62 [71.99–73.29] |
| SENS@FPR_40 | 51.74 [50.95–52.56] | 68.89 [68.22–69.57] | 79.79 [79.21–80.38] |
| SENS@FPR_50 | 62.33 [61.46–63.04] | 75.97 [75.33–76.62] | 85.43 [84.95–85.94] |
Simplest model = Age and sex only; CDC/WHO Model = Age, sex, and some comorbidities; Full Model = GFlu-CxFlag model; AUC = area under the curve; N = sample size; PPV = positive predictive value (%); SENS = sensitivity of the model (%); FPR = false positive rate (%). PPV@SENS_x = PPV at x% sensitivity; SENS@FPR_x = sensitivity at x% false positive rate.
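The threshold-based metrics in this table can be sketched as follows. This is a minimal illustration under assumptions: the operating point is taken as the first cutoff that reaches the target, scores are assumed distinct (ties are not handled), and all function names and toy data are hypothetical. The functions return fractions, whereas the table reports percentages.

```python
def operating_points(scores, labels):
    """List (sensitivity, fpr, ppv) after flagging the top-k scores, k = 1..n."""
    ranked = sorted(zip(scores, labels), key=lambda t: -t[0])
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    tp = fp = 0
    points = []
    for _, y in ranked:
        if y == 1:
            tp += 1
        else:
            fp += 1
        points.append((tp / n_pos, fp / n_neg, tp / (tp + fp)))
    return points


def ppv_at_sens(scores, labels, target_sens):
    """PPV at the first cutoff whose sensitivity reaches target_sens."""
    for sens, _, ppv in operating_points(scores, labels):
        if sens >= target_sens:
            return ppv
    return None


def sens_at_fpr(scores, labels, max_fpr):
    """Highest sensitivity among cutoffs whose FPR does not exceed max_fpr."""
    ok = [s for s, fpr, _ in operating_points(scores, labels) if fpr <= max_fpr]
    return max(ok) if ok else 0.0


# Toy data (3 cases, 7 controls), not taken from the study.
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]
labels = [1, 0, 1, 0, 0, 1, 0, 0, 0, 0]
print(ppv_at_sens(scores, labels, 0.5))
print(sens_at_fpr(scores, labels, 0.2))
```

Reading the table with these definitions, a higher PPV at a fixed sensitivity means fewer unnecessary flags for the same share of complications caught.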
Comparison of AUC for different subpopulations.
| Subpopulation | Age, Sex Model | WHO-inspired Age, Sex, and Comorbidities Model | Full GFlu-CxFlag Model |
|---|---|---|---|
| All | 0.588 [0.583–0.593] | 0.694 [0.690–0.698] | 0.786 [0.783–0.789] |
| Age 18–65 years with chronic illness | 0.553 [0.546–0.559] | 0.671 [0.664–0.676] | 0.775 [0.770–0.781] |
| Age 0–18 years | 0.617 [0.608–0.626] | 0.705 [0.698–0.712] | 0.792 [0.786–0.799] |
| Age 0–5 years | 0.594 [0.582–0.605] | 0.680 [0.670–0.690] | 0.777 [0.768–0.787] |
| Age > 18 years | 0.574 [0.568–0.579] | 0.690 [0.684–0.695] | 0.783 [0.779–0.787] |
| Age > 5 years | 0.573 [0.568–0.578] | 0.689 [0.684–0.693] | 0.783 [0.779–0.787] |
| Age 5–18 years | 0.563 [0.553–0.575] | 0.686 [0.676–0.695] | 0.782 [0.774–0.790] |
| Age > 65 years | 0.521 [0.510–0.532] | 0.670 [0.660–0.680] | 0.812 [0.805–0.819] |
| Age > 18 years with comorbidity or age > 65 years | 0.537 [0.531–0.542] | 0.669 [0.663–0.674] | 0.786 [0.781–0.791] |
Simplest model = Age and sex only; CDC/WHO Model = Age, Sex, and some Comorbidities; GFlu-CxFlag model (Full Model).
Features identified in the Geisinger and MES risk-stratification models compared with traditional WHO and CDC risk factors.
| Category | Feature | GFlu-Cx Flag | MES Flu | WHO | CDC |
|---|---|---|---|---|---|
| Laboratory Results | % Lymphocytes | | | | |
| | Absolute eosinophils | | | | |
| | Reverse-transcriptase PCR confirmation of influenza | | | | |
| Vital Signs | Respiratory rate | | | | |
| | SpO2 (peripheral capillary oxygen saturation) | | | | |
| | Temperature (Fahrenheit) | | | | |
| Medical History | Respiratory disease, not limited to lung (acute or chronic) | | | | |
| | Alimentary or metabolic diagnosis codes | | | | |
| | Influenza complications (pneumonia, complications, death) | | | | |
| | Incidence of influenza-like illness documented | | | | |
| | Antibiotic prescriptions/medication for sensory organs (ear or eye) | | | | |
| | Vaccinations | | | | |
| Demographics | Age | | | | |
| | Sex | | | | |
| | Socioeconomic status (Medicare as a surrogate) | | | | |
| | Weight and/or BMI | | | | |
| | Smoking history | | | | |
| | Membership in Geisinger cohort | | | | |
| Healthcare Interactions | Number of Emergency Dept. (ED) visits # | | | | |
| | Number of hospital admissions | | | | |
| Other unique features for CDC and WHO as listed & | | | | | |

Shading legend: light gray = unique or shared features from the WHO and/or CDC guidelines for populations at high risk of influenza; dark gray = the top 5 features of importance in the Geisinger Flu Complications Flag (GFlu-Cx Flag); gray = other unique features in the GFlu-Cx Flag; black = laboratory testing with RT-PCR, which was unique but not a model feature because it was a classifier for the influenza diagnosis.
(#) Longitudinal trends were used as a measure of the variable. RT-PCR is not a model feature because it is a classifier for the influenza diagnosis; its importance is underscored by the model's prediction when RT-PCR is used to define illness. (&) WHO unique features = lung, heart, kidney, neurologic, liver, and blood disease, plus immunocompromised status, stroke, pregnancy, and work in healthcare; CDC unique features = the same as the WHO features, plus aspirin therapy, long-term care, and race. CDC risks do not include healthcare workers.
Figure 4. Global feature importance of the Geisinger Flu-Cx Flag model for influenza complications. Refer to the supplementary information for a description of the codes used in Figure 4.
Figure 5. Feature contribution for various features in the full Geisinger Flu-Cx Flag model. The x-axis represents the feature value, and the yellow lines represent the mean outcome over the training set conditioned on the feature value. The blue line represents the feature’s mean Shapley value.
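For readers unfamiliar with Shapley values, the sketch below computes them exactly for a single sample by averaging each feature's marginal contribution over all feature orderings. This brute-force form is only practical for a handful of features and is not necessarily the method used in the study; tree-based models are usually explained with faster approximations. The risk function `toy_risk`, its coefficients, and the baseline values are hypothetical.

```python
from itertools import permutations


def shapley_values(predict, x, baseline):
    """Exact Shapley values for one sample.

    A feature that is "absent" takes its baseline value. Features are
    switched from baseline to their actual value in every possible order,
    and each feature's change in prediction is averaged over the orders.
    """
    n = len(x)
    phi = [0.0] * n
    orders = list(permutations(range(n)))
    for order in orders:
        current = list(baseline)
        prev = predict(current)
        for j in order:
            current[j] = x[j]
            new = predict(current)
            phi[j] += new - prev
            prev = new
    return [v / len(orders) for v in phi]


def toy_risk(features):
    """Hypothetical risk score with one interaction term."""
    age, ed_visits, resp_disease = features
    return (0.01 * age + 0.05 * ed_visits + 0.2 * resp_disease
            + 0.1 * ed_visits * resp_disease)


x = [70, 2, 1]          # age, ED visits, respiratory disease flag
baseline = [40, 0, 0]   # assumed reference individual
phi = shapley_values(toy_risk, x, baseline)
print(phi)
```

The contributions sum to the difference between the prediction for `x` and the prediction for `baseline`, and the interaction term is split equally between the two features involved.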
Post-hoc analysis for bias assessment.
| Group (% of Total Population) | Sensitivity [95% CI] | Effect of Matching | Aged 65+ “Model” Sensitivity [95% CI] |
|---|---|---|---|
| Race 1 | | | |
| White (92.6%) | 43.1 [42.4–43.9] | - | 16.68 [16.1–17.4] |
| Black (5.3%) | 38.8 [35.3–42.3] * | Mitigated after matching for age | 3.86 [2.6–5.2] * |
| Asian (1%) | 27.6 [16.7–39.8] * | Maintained after matching for age | 9.27 [3.0–16.7] |
| Ethnicity | | | |
| Hispanic / LA (5.4%) | 46.5 [43.3–50.0] | - | 3.22 [2.1–4.5] |
| Non-Hispanic / LA (94%) | 42.5 [41.7–43.3] * | Mitigated after matching for age | 16.59 [15.9–17.2] * |
| Insurer 1 | | | |
| Medicaid 2 (36.1%) | 49.5 [48.4–50.5] | - | 2.79 [2.4–3.2] |
| Medicare (14.5%) | 52.0 [50.4–53.7] * | Reversed after matching for age | 70.59 [68.8–72.4] * |
| Commercial (48.5%) | 28.1 [27.0–29.3] * | Maintained after matching for age | 4.32 [3.7–5.0] * |
| Sex | | | |
| Female (53.3%) | 45.0 [44.2–45.9] | - | 16.63 [15.8–17.5] |
| Male (46.7%) | 39.7 [38.5–41.0] * | Mitigated after matching for age and number of visits | 15.12 [14.2–16.1] |
* Significantly different at p < 0.05 compared with the reference category, which is always listed first; CI = confidence interval; LA = Latin American. 1 Other race and insurer categories exist, but each comprises less than 1% of the population. 2 Patients enrolled in Medicaid at any point in the last 11 years were placed in this category, even if they later shifted insurance (e.g., aged into Medicare).
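The per-group sensitivities in this table can be illustrated with a minimal sketch that applies one shared score threshold to all groups and reports the fraction of true cases flagged in each. The threshold, group labels, and data are hypothetical, and the age matching described in the table is not implemented here.

```python
def sensitivity_by_group(scores, labels, groups, threshold):
    """Sensitivity (true-positive rate) per group at a shared threshold."""
    flagged, cases = {}, {}
    for s, y, g in zip(scores, labels, groups):
        if y == 1:  # only true cases enter the sensitivity calculation
            cases[g] = cases.get(g, 0) + 1
            flagged[g] = flagged.get(g, 0) + (1 if s >= threshold else 0)
    return {g: flagged[g] / cases[g] for g in cases}


# Toy data, not taken from the study.
scores = [0.9, 0.4, 0.7, 0.2, 0.8, 0.3]
labels = [1, 1, 1, 1, 0, 1]
groups = ["A", "A", "B", "B", "A", "B"]
result = sensitivity_by_group(scores, labels, groups, threshold=0.5)
print(result)
```

A gap between groups at the same threshold is the kind of difference the table tests for; matching on age (or other covariates) before comparing would be a separate step.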