
Predicting Coronavirus Disease 2019 Infection Risk and Related Risk Drivers in Nursing Homes: A Machine Learning Approach.

Christopher L F Sun, Eugenio Zuccarelli, El Ghali A Zerhouni, Jason Lee, James Muller, Karen M Scott, Alida M Lujan, Retsef Levi.

Abstract

OBJECTIVE: Inform coronavirus disease 2019 (COVID-19) infection prevention measures by identifying and assessing risk and possible vectors of infection in nursing homes (NHs) using a machine-learning approach.
DESIGN: This retrospective cohort study used a gradient boosting algorithm to evaluate risk of COVID-19 infection (ie, presence of at least 1 confirmed COVID-19 resident) in NHs.
SETTING AND PARTICIPANTS: The model was trained on outcomes from 1146 NHs in Massachusetts, Georgia, and New Jersey, reporting COVID-19 case data on April 20, 2020. Risk indices generated from the model using data from May 4 were prospectively validated against outcomes reported on May 11 from 1021 NHs in California.
METHODS: Model features, pertaining to facility and community characteristics, were obtained from a self-constructed dataset based on multiple public and private sources. The model was assessed via out-of-sample area under the receiver operating characteristic curve (AUC), sensitivity, and specificity in the training (via 10-fold cross-validation) and validation datasets.
RESULTS: The mean AUC, sensitivity, and specificity of the model over 10-fold cross-validation were 0.729 [95% confidence interval (CI) 0.690‒0.767], 0.670 (95% CI 0.477‒0.862), and 0.611 (95% CI 0.412‒0.809), respectively. Prospective out-of-sample validation yielded similar performance measures (AUC 0.721; sensitivity 0.622; specificity 0.713). The strongest predictors of COVID-19 infection were identified as the NH's county's infection rate and the number of separate units in the NH; other predictors included the county's population density, historical Centers for Medicare and Medicaid Services cited health deficiencies, and the NH's resident density (in persons per 1000 square feet). In addition, the NH's historical percentage of non-Hispanic white residents was identified as a protective factor.
CONCLUSIONS AND IMPLICATIONS: A machine-learning model can help quantify and predict NH infection risk. The identified risk factors support the early identification and management of presymptomatic and asymptomatic individuals (eg, staff) entering the NH from the surrounding community and the development of financially sustainable staff testing initiatives in preventing COVID-19 infection.
Copyright © 2020 AMDA – The Society for Post-Acute and Long-Term Care Medicine. Published by Elsevier Inc. All rights reserved.

Keywords:  COVID-19; Nursing homes; health policy; infection prevention; long-term care facility; machine-learning; risk modeling

Year:  2020        PMID: 33032935      PMCID: PMC7451194          DOI: 10.1016/j.jamda.2020.08.030

Source DB:  PubMed          Journal:  J Am Med Dir Assoc        ISSN: 1525-8610            Impact factor:   4.669


Long-term care facilities (LTCFs) have emerged as critical epicenters of coronavirus disease 2019 (COVID-19) outbreaks and are associated with approximately 1 in 10 COVID-19 cases and 1 in 3 COVID-19 fatalities in the United States. Among LTCFs, nursing homes (NHs) have been shown to have high-risk populations that are particularly vulnerable to COVID-19 infection and poor subsequent outcomes. Rapid COVID-19 transmission within NHs stresses the need for proactive measures preventing infection and facility spread.4, 5, 6, 7 However, developing effective policies and interventions is challenging because of a lack of both accurate data sources and data-driven analyses regarding infection vectors. This study describes the development and implications of a machine-learning model, trained on NH COVID-19 outcome data from multiple US states, to assess risk of COVID-19 infection and identify associated risk factors and possible infection introduction mechanisms.

Methods

Study Setting and Population

This study included public NH COVID-19 facility-level case data reported by state and local departments of health across the United States, which were collected to create a binary outcome variable for whether there was at least 1 resident infection in the facility. The model was trained on COVID-19 outcomes reported on April 20, 2020 from 1146 NHs in Massachusetts, Georgia, and New Jersey, and prospectively validated out-of-sample against outcomes reported on May 11, 2020 from 1021 NHs in California. These states had relatively comprehensive reporting and testing capacity at the time of outcome collection (see Supplementary Material, Data Sources and Model Inputs section for details).

Data Sources

Predictive features were created from a self-constructed dataset, integrating public and private sources from organizations including the Centers for Medicare and Medicaid Services (CMS), Long-Term Care Focus, and National Investment Center for Seniors Housing and Care, covering 15,300 federally certified US NHs and their surrounding communities. The dataset includes information on each NH's physical infrastructure, number of units, and historical financial, managerial, resident, staffing, and quality-of-care characteristics. In addition, the dataset includes NH community information (eg, COVID-19 infection rates on the day of NH reporting, social distancing, and population characteristics and mobility measures). See Supplementary Table 1 for details on the dataset and model features.
Supplementary Table 1

Overview of the Data Inputs of the NH Risk Model

| Data Category | Variable Description | Data Source | Number of NHs in Training Set without Missing Feature Values (n = 1146), n (%) |
| Facility's community characteristics | Cumulative number of positive COVID-19 cases per capita in the NH's county on the day of NH COVID-19 case reporting | NYT COVID Tracker | 1133 (98.9) |
|  | Estimated poverty score of the NH's county (Based on: Household income) | CDC | 1133 (98.9) |
|  | Overall comorbidity score of the NH's county (Based on: Obesity, diabetes, hypertension, cardiovascular characteristics from 2018) | CDC | 1133 (98.9) |
|  | Overall percentile ranking of social vulnerability index of the NH's county (Based on: Socioeconomic, household composition, minority status/language, and housing type/transportation characteristics) | CDC | 1133 (98.9) |
|  | Percent of facility's county who are non-Hispanic White | US Census | 1133 (98.9) |
|  | Percentage of family households in the NH's county | Claritas | 1133 (98.9) |
|  | Population density of the NH's county (population per square mile) | Claritas | 1133 (98.9) |
| Community social distancing and population mobility characteristics | Proportion of Safe Graph tracked devices traveling less than 8000 meters per day out of all tracked devices in the facility's zip code of the week of NH COVID-19 case reporting | Safe Graph | 1096 (95.6) |
|  | Proportion of Safe Graph tracked devices traveling more than 50,000 meters per day out of all tracked devices in the facility's zip code of the week of NH COVID-19 case reporting | Safe Graph | 1096 (95.6) |
|  | Proportion of Safe Graph tracked devices exhibiting full-time employment behavior in the zip code of the week of NH COVID-19 case reporting | Safe Graph | 1101 (96.1) |
|  | Proportion of Safe Graph tracked devices traveling less than 8000 meters per day out of all tracked devices in the facility's county of the week of NH COVID-19 case reporting | Safe Graph | 1133 (98.9) |
|  | Proportion of Safe Graph tracked devices traveling more than 50,000 meters per day out of all tracked devices in the facility's county of the week of NH COVID-19 case reporting | Safe Graph | 1133 (98.9) |
|  | Proportion of Safe Graph tracked devices exhibiting full-time employment behavior in the county of the week of NH COVID-19 case reporting | Safe Graph | 1133 (98.9) |
|  | Inflow of Safe Graph tracked devices to the facility's county of the week of NH COVID-19 case reporting | Safe Graph | 1131 (98.7) |
|  | Outflow of Safe Graph tracked devices from the facility's county of the week of NH COVID-19 case reporting | Safe Graph | 1132 (98.8) |
|  | Percentage of county's population taking public transportation to work | Claritas | 1133 (98.9) |
| Facility characteristics | Age of the nursing home in years | NIC | 750 (65.4) |
|  | Standardized hourly cost per clinical worker excluding certified nursing assistants (per patient per day) | MCDA | 1090 (95.1) |
|  | Standardized hourly cost per clinical worker including certified nursing assistants (per patient per day) | MCDA | 1090 (95.1) |
|  | Walk Score measures walkability on a scale from 0‒100 based on walking routes to destinations such as grocery stores, schools, parks, restaurants, and retail | Walk Score | 760 (66.3) |
|  | Number of clinical workers per 1000 square feet in the facility prior to the COVID-19 outbreak | MCDA | 999 (87.2) |
|  | Number of health deficiencies at the facility as defined by the Centers for Medicare and Medicaid Services since 2014 | CMS | 977 (85.3) |
|  | Number of patients per 1000 square feet in the facility prior to the COVID-19 outbreak | MCDA | 999 (87.2) |
|  | Overall infection control process and performance index | MCDA | 974 (85.0) |
|  | Rate of influenza vaccination for long stay residents | MCDA | 1065 (92.9) |
|  | Rate of influenza vaccination for short stay residents | MCDA | 1078 (94.1) |
|  | Rate of pneumococcal vaccination for long stay residents | MCDA | 1065 (92.9) |
|  | Rate of pneumococcal vaccination for short stay residents | MCDA | 1080 (94.2) |
|  | Rate of rehospitalizations of residents due to infection | MCDA | 1091 (95.2) |
|  | Index based on CMS health inspection citations related to infection control measures (citations weighted according to scope, severity, and recency) | MCDA | 849 (74.1) |
|  | Index based on CMS health inspection citations related to laboratory processes (citations weighted according to scope, severity, and recency) | MCDA | 739 (64.5) |
|  | Index based on CMS health inspection citations related to managerial processes (citations weighted according to scope, severity, and recency) | MCDA | 966 (84.3) |
|  | Index based on CMS health inspection citations related to physical environment (citations weighted according to scope, severity, and recency) | MCDA | 739 (64.5) |
|  | Total number of beds at the facility | CMS, NIC | 1146 (100.0) |
|  | Total number of beds for Nursing Care | NIC | 768 (67.0) |
|  | Total number of units for assisted living | NIC | 768 (67.0) |
|  | Total number of units for independent living | NIC | 768 (67.0) |
|  | Total number of units for memory care | NIC | 768 (67.0) |
|  | Percent of facility residents who were non-Hispanic white prior to the COVID-19 outbreak | LTCFocus | 961 (83.9) |
|  | Percent of facility residents whose primary support was Medicare prior to the COVID-19 outbreak | LTCFocus | 961 (83.9) |
|  | Percent of facility residents whose primary support was Medicaid prior to the COVID-19 outbreak | LTCFocus | 961 (83.9) |
| Facility outcomes | Presence of at least one resident COVID-19 case | State departments of health | 1146 (100.0) |

CDC, Centers for Disease Control and Prevention; LTCFocus, Long-term Care Focus; MCDA, Muller Consulting and Data Analytics; NIC, National Investment Center for Seniors Housing and Care; NYT, New York Times.

Clinical worker defined as registered nurses, licensed practical nurses, certified nursing assistants, nursing aides, medical aides/technicians, nursing home administrators, medical directors, physicians, physician assistants, nurse practitioners, clinical nurse specialists, pharmacists, dieticians, feeding assistants, occupational therapists, occupational therapy assistants, occupational therapy aides, physical therapists, physical therapist assistants, physical therapist aides, respiratory therapists, respiratory therapy technicians, speech/language pathologists, therapeutic recreation specialists, qualified activities professionals, other activities staff, qualified social workers, other social workers, mental health service workers.

Standardized costs based on average hours worked multiplied by Bureau of Labor Statistics national wage rate estimates for respective occupations in skilled nursing facilities.

Model Development

The model used a tree-based gradient boosting algorithm, predicting a binary classification outcome by facility that signifies at least 1 resident COVID-19 infection. In other words, the model generates a risk index associated with the likelihood of COVID-19 infection at the NH. Prior to training, highly correlated predictors with a variance inflation factor greater than 10 were removed. Then, the remaining predictors were assessed for predictive stability by determining the number of times each predictor was selected by the model following L1 regularization during 10-fold cross-validation (CV). Features selected at least 7 times out of the 10 folds were considered stable predictors. The model inputs were restricted to the identified stable predictors, and hyperparameter tuning and out-of-sample performance assessments were conducted via 10-fold CV. The mean out-of-sample area under the receiver operating characteristic curve (AUC), sensitivity, and specificity, along with the associated 95% confidence intervals (CIs), were calculated over the 10 folds. The final tuned model was fit over the entire training dataset and was used to calculate the stable predictors' feature importance and predict updated risk indices for model validation. A logistic regression model and a 2-layer feedforward neural network were also developed using the identified stable predictors and assessed via the same methods, serving as benchmark predictive models for comparison. Although a neural network model is not easily interpretable, it was used as a benchmark because of its strong predictive capabilities. Model development and feature importance evaluation are further described in the Supplementary Material (Model Development and Feature Importance Evaluation section).
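The stability rule described above (keep a feature only if it is selected in at least 7 of the 10 folds) can be illustrated with a minimal Python sketch. The function name, the feature names, and the per-fold selections below are hypothetical and chosen only for illustration; the per-fold selection step itself (L1-regularized fitting) is not reproduced here.

```python
from collections import Counter


def stable_predictors(selected_per_fold, min_folds=7):
    """Return the features selected in at least `min_folds` of the CV folds."""
    counts = Counter()
    for fold_selection in selected_per_fold:
        # Count each feature at most once per fold.
        counts.update(set(fold_selection))
    return sorted(name for name, n in counts.items() if n >= min_folds)


# Hypothetical selections from 10 folds (illustrative names, not study data).
folds = (
    [["county_case_rate", "beds", "walk_score"]] * 6
    + [["county_case_rate", "beds"]] * 2
    + [["county_case_rate", "facility_age"]] * 2
)

# county_case_rate appears in 10 folds, beds in 8, walk_score in 6, facility_age in 2.
print(stable_predictors(folds))
```

With the default cutoff of 7, only the two features that clear the bar are retained; lowering `min_folds` to 6 would also admit the third.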

Model Validation

To assess the model's prognostic ability and generalizability, risk indices from May 4 were prospectively validated against NH outcomes reported on May 11 from California. The predicted risk indices were categorized as binary outcomes using an optimal threshold value [selected as the value that minimizes the difference between the model's sensitivity (true positive rate) and specificity (true negative rate) across the entire training dataset]. NHs with risk indices above the threshold value were predicted as infected. The differences in the predictive characteristics between the training and validation datasets were compared using the Mann–Whitney U test for continuous variables and the χ2 test for binary variables. In addition, reported outcomes from 7660 LTCFs on May 11 were used to calculate the Pearson correlation coefficient between each state's median NH risk index (ie, the median risk index, from May 4, across all NHs in the state that are in our dataset) and each state's LTCF-related COVID-19 infection and death rates. The benchmark logistic regression and neural network models were also validated in the same manner, and the performance of the 3 models was compared. Model validation is further detailed in the Supplementary Material (Prospective Out-of-Sample Model Validation section).
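The threshold rule described above can be sketched in a few lines of Python. This is an illustrative sketch under stated assumptions, not the study's implementation: the candidate thresholds are taken to be the observed scores, and the labels and scores are toy values invented for the example.

```python
def sensitivity_specificity(y_true, scores, threshold):
    """Classify scores above `threshold` as infected; return (sensitivity, specificity).

    Assumes y_true contains at least one positive (1) and one negative (0).
    """
    tp = sum(1 for y, s in zip(y_true, scores) if y == 1 and s > threshold)
    fn = sum(1 for y, s in zip(y_true, scores) if y == 1 and s <= threshold)
    tn = sum(1 for y, s in zip(y_true, scores) if y == 0 and s <= threshold)
    fp = sum(1 for y, s in zip(y_true, scores) if y == 0 and s > threshold)
    return tp / (tp + fn), tn / (tn + fp)


def balanced_threshold(y_true, scores):
    """Pick the observed score that minimizes |sensitivity - specificity|."""
    def gap(threshold):
        sens, spec = sensitivity_specificity(y_true, scores, threshold)
        return abs(sens - spec)

    return min(sorted(set(scores)), key=gap)


# Toy data (hypothetical): 1 = at least one resident case, 0 = none.
y_true = [0, 0, 0, 0, 1, 1, 1, 1]
scores = [0.10, 0.30, 0.45, 0.70, 0.40, 0.60, 0.80, 0.90]

best = balanced_threshold(y_true, scores)
print(best, sensitivity_specificity(y_true, scores, best))
```

On these toy values the chosen threshold is the one at which sensitivity and specificity coincide. Balancing the two rates avoids a cutoff that flags nearly every facility or nearly none.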

Results

Table 1 summarizes and compares the characteristics of the NHs used to train and validate the model. The training set included 1146 NHs that reported COVID-19 cases on April 20 (63.0% reported at least 1 resident COVID-19 case). The validation set included 1021 NHs (20.5% reported at least 1 resident COVID-19 case) reporting on May 11. The NH characteristics in the validation set were significantly different from those in the training set, indicating that the validation set is suitable to rigorously assess the model's out-of-sample predictive performance and generalizability to unseen data.
Table 1

The Summary and Comparison of the Predictive Characteristics of the NH in the Model's Training and Validation Sets

| Identified Predictive Features | Training Set (n = 1146) | Prospective Validation Set (n = 1021) | P Value |
| Cumulative number of positive COVID-19 cases per capita in the facility's county on the day of NH COVID-19 case reporting (confirmed cases per 100,000 people), median (IQR) | 478.1 (182.0‒730.7) | 112.5 (79.5‒244.7) | <.001 |
| Total number of beds at the facility, median (IQR) | 122 (94‒167) | 99 (74‒140) | <.001 |
| Population density of the facility's county (population per square mile), median (IQR) | 1027.0 (420.6‒2033.7) | 1613.3 (343.8‒2508.6) | <.05 |
| Number of health deficiencies at the facility as defined by the CMS, median (IQR) | 12 (7‒19) | 35 (23‒50) | <.001 |
| Percent of NH residents who were non-Hispanic white prior to the COVID-19 outbreak, median (IQR) | 83.6 (62.0‒94.2) | 59.6 (42.9‒78.5) | <.001 |
| Number of patients per 1000 square feet in the facility prior to the COVID-19 outbreak, median (IQR) | 2.95 (2.15‒3.77) | 4.83 (3.75‒5.68) | <.001 |
| Number of clinical workers per 1000 square feet in the facility prior to the COVID-19 outbreak, median (IQR) | 1.06 (0.78‒1.33) | 1.90 (1.52‒2.32) | <.001 |
| Positive COVID-19 resident case in NH, No. (%) | 722 (63.0) | 209 (20.5) | <.001 |

IQR, interquartile range.

Significant differences in the characteristics between the 2 sets were found. A strong predictive performance across a validation set population that is significantly different from its training set population suggests the model will be generalizable to different populations. P values from Mann–Whitney U and χ2 tests, as appropriate, comparing the differences in the characteristics are shown.

NHs from Massachusetts, Georgia, and New Jersey with outcomes reported on April 20, 2020.

NHs from California with outcomes reported on May 11, 2020.
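As a rough check on the binary-outcome row of Table 1, the following Python sketch recomputes the two prevalences from the reported counts (722 of 1146 and 209 of 1021) and a Pearson χ2 statistic for the resulting 2 × 2 table. Whether the original analysis applied a continuity correction is not stated, so the uncorrected statistic used here is an assumption; the function name is illustrative.

```python
def chi_square_2x2(a, b, c, d):
    """Pearson chi-square statistic (no continuity correction) for [[a, b], [c, d]]."""
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))


# Counts reported in Table 1: NHs with at least 1 resident case / total NHs.
train_pos, train_n = 722, 1146
valid_pos, valid_n = 209, 1021

train_pct = 100 * train_pos / train_n
valid_pct = 100 * valid_pos / valid_n
stat = chi_square_2x2(train_pos, train_n - train_pos,
                      valid_pos, valid_n - valid_pos)

# With 1 degree of freedom, a statistic above 10.83 corresponds to P < .001.
print(f"{train_pct:.1f}% vs {valid_pct:.1f}%; chi-square = {stat:.1f}")
```

The statistic lies far above the 1-degree-of-freedom critical value for P < .001, consistent with the P value reported in the table.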

Overall, 7 out of 41 inputted features were identified as predictors of infection (Figure 1). The NH's county's infection rate and number of units were the strongest predictors of risk and positively associated with increased infection risk. The other predictors of infection include the NH's county's population density, CMS cited health deficiencies, and resident and staff densities, which were positively associated with infection risk, as well as the percent of non-Hispanic White residents, which was negatively associated with infection risk (Supplementary Figure 1). The gradient boosting model's mean out-of-sample AUC, sensitivity, and specificity from 10-fold CV over the training set were 0.729 (95% CI 0.690‒0.767), 0.670 (95% CI 0.477‒0.862), and 0.611 (95% CI 0.412‒0.809), respectively.
Fig. 1

Feature importance and impact on risk of COVID-19 infection in NHs from the gradient boosting model. The NH's county's COVID-19 infection rate and size had the largest impact on infection risk (features are in descending order from highest to lowest importance). In the figure, each dot represents an NH that the model has been trained on. For each NH, a high feature value corresponds to the color red, and a low feature value corresponds to the color blue. The horizontal axis shows whether the effect of the feature value is associated with a higher or lower risk of NH infection.

Supplementary Fig. 1

Predictive features' impact, shown in subfigures (A‒G), on estimated NH risk of COVID-19 infection. The median (blue line), 25th and 75th percentiles (gray band), and 5th and 95th percentiles (orange band) of the infection risk levels generated by the trained model are shown across 15,300 NHs in the United States.

The model had an AUC of 0.721 (sensitivity 0.622; specificity 0.713) when prospectively compared against California NHs with reported outcomes from May 11. The optimal threshold value used to form binary outcome classifications from predicted risk indices was 0.618. Table 2 shows LTCF related case and death rates from May 11 with the model's median risk indices by state. The correlation was statistically significant for both case (R = 0.859; P < .001) and death (R = 0.856; P < .001) rates.
Table 2

Predicted NH Risk from the Gradient Boosting Model and LTCF Related COVID-19 Case and Death Rates Reported on May 11 by State

| State Ranking Based on Predicted NH Risk Index (as of May 4, 2020) | State | Predicted on May 4, 2020: Median Predicted NH Risk Indices (IQR) | Reported on May 11, 2020: Reported LTCF Related Cases per 1000 Beds (Relative Rank) | Reported on May 11, 2020: Reported LTCF Related Deaths per 1000 Beds (Relative Rank) |
| 1 | New Jersey | 78.7 (67.1‒82.7) | 500.9 (1) | 92.7 (1) |
| 2 | Massachusetts | 74.8 (65.1‒81.2) | 365.3 (2) | 66.9 (2) |
| 3 | Connecticut | 71.3 (50.6‒78.3) | 251.5 (3) | 63.3 (3) |
| 4 | New York | 66.1 (44.4‒83.5) | No reporting data | 47.8 (4) |
| 5 | Maryland | 65.7 (49.8‒72.2) | 226.3 (5) | 28.8 (7) |
| 6 | Rhode Island | 63.1 (54.6‒70.8) | 230.1 (4) | 37.9 (5) |
| 7 | Delaware | 62.6 (58.5‒72.2) | 91.9 (14) | 28.0 (8) |
| 8 | Louisiana | 58.0 (42.9‒74.5) | 112.3 (11) | 23.3 (10) |
| 9 | California | 54.0 (39.9‒63.7) | 82.7 (15) | 8.4 (21) |
| 10 | Pennsylvania | 51.0 (37.8‒68.1) | 152.5 (8) | 29.0 (6) |
| 11 | Florida | 51.0 (39.7‒59.6) | 66.9 (17) | 8.6 (19) |
| 12 | Virginia | 49.6 (37.7‒63.0) | 115.1 (10) | 15.2 (14) |
| 13 | Michigan | 45.8 (34.0‒70.1) | 99.7 (13) | 4.6 (27) |
| 14 | Illinois | 45.1 (33.9‒73.9) | 127.5 (9) | 17.4 (12) |
| 15 | Colorado | 44.8 (35.6‒59.9) | 184.7 (6) | 26.8 (9) |
| 16 | Washington | 44.8 (36.6‒51.7) | 51.0 (26) | 3.8 (32) |
| 17 | Georgia | 44.8 (36.7‒62.6) | 158.9 (7) | 17.5 (11) |
| 18 | Nevada | 44.8 (40.7‒48.6) | 102.3 (12) | 8.6 (20) |
| 19 | Utah | 43.7 (31.5‒53.8) | 10.3 (40) | 2.0 (37) |
| 20 | Mississippi | 43.6 (37.7‒52.6) | 66.4 (18) | 10.5 (17) |
| 21 | Indiana | 43.5 (36.2‒52.4) | 59.1 (20) | 11.4 (16) |
| 22 | Alabama | 41.5 (34.1‒47.3) | 62.2 (19) | 1.0 (40) |
| 23 | Ohio | 40.7 (31.2‒56.0) | 54.2 (24) | 3.2 (34) |
| 24 | Texas | 40.5 (31.5‒56.0) | 10.1 (41) | 3.6 (33) |
| 25 | South Carolina | 40.0 (33.3‒44.8) | 54.4 (23) | 5.4 (25) |
| 26 | Nebraska | 39.8 (31.5‒45.3) | 5.2 (45) | 0.1 (45) |
| 27 | Hawaii | 38.6 (31.5‒45.4) | 0.7 (49) | No reporting data |
| 28 | New Mexico | 37.7 (31.5‒47.3) | 5.7 (42) | 2.2 (36) |
| 29 | North Carolina | 37.7 (31.5‒46.2) | 56.4 (21) | 7.3 (22) |
| 30 | Arizona | 37.7 (31.9‒44.0) | 76.0 (16) | 12.7 (15) |
| 31 | Alaska | 36.6 (31.5‒42.0) | 3.9 (46) | No reporting data |
| 32 | New Hampshire | 33.9 (26.6‒39.8) | 19.3 (38) | 1.7 (38) |
| 33 | Vermont | 33.7 (31.5‒38.7) | 55.8 (22) | 9.6 (18) |
| 34 | Tennessee | 33.5 (28.4‒44.8) | 22.6 (36) | 2.4 (35) |
| 35 | Kentucky | 33.3 (28.0‒43.6) | 53.3 (25) | 6.7 (23) |
| 36 | Oklahoma | 33.2 (28.5‒44.8) | 35.8 (32) | 4.3 (29) |
| 37 | Arkansas | 33.1 (29.8‒40.5) | 17.8 (39) | 1.4 (39) |
| 38 | Iowa | 32.4 (28.0‒41.5) | 36.8 (31) | 0.5 (42) |
| 39 | Missouri | 32.1 (28.0‒44.8) | 2.4 (48) | 0.2 (43) |
| 40 | Idaho | 31.6 (29.6‒40.5) | 29.1 (34) | 4.7 (26) |
| 41 | Minnesota | 31.5 (28.0‒46.4) | 48.7 (27) | 16.8 (13) |
| 42 | Kansas | 31.5 (28.0‒40.7) | 26.2 (35) | 4.2 (30) |
| 43 | Wyoming | 31.5 (28.0‒37.7) | 5.4 (43) | No reporting data |
| 44 | West Virginia | 31.5 (29.7‒37.7) | 30.7 (33) | 4.0 (31) |
| 45 | Oregon | 31.5 (31.3‒50.8) | 39.9 (29) | 6.3 (24) |
| 46 | Montana | 29.8 (28.0‒34.4) | 5.4 (44) | 0.9 (41) |
| 47 | North Dakota | 29.1 (27.6‒39.5) | 44.0 (28) | No reporting data |
| 48 | Maine | 28.9 (25.5‒35.1) | 37.7 (30) | 4.4 (28) |
| 49 | South Dakota | 28.5 (25.3‒36.6) | 2.8 (47) | No reporting data |
| 50 | Wisconsin | 28.0 (25.3‒37.9) | 22.1 (37) | 0.2 (44) |

IQR, interquartile range.

States were ranked in descending order based on the state's median, 75th percentile and 25th percentile risk index as of May 4, 2020.
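The state-level comparison in Table 2 rests on a Pearson correlation between median risk index and reported rates. The Python sketch below shows that calculation on a hand-picked subset of 7 states copied from the table (median risk index and LTCF related cases per 1000 beds). Because it uses only a subset, it is not expected to reproduce the coefficients reported for all states, and the function name is illustrative.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Subset of Table 2 rows: (state, median risk index, cases per 1000 beds).
rows = [
    ("New Jersey", 78.7, 500.9),
    ("Massachusetts", 74.8, 365.3),
    ("Connecticut", 71.3, 251.5),
    ("California", 54.0, 82.7),
    ("Georgia", 44.8, 158.9),
    ("Texas", 40.5, 10.1),
    ("Wisconsin", 28.0, 22.1),
]

risk = [r[1] for r in rows]
cases = [r[2] for r in rows]
r_subset = pearson_r(risk, cases)
print(f"Pearson r on the 7-state subset: {r_subset:.2f}")
```

States with missing outcome data (for example, New York for cases) would need to be dropped before computing the coefficient on the full table.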

Compared with the benchmark logistic regression and neural network models (Table 3), the gradient boosting model demonstrated stronger prognostic ability and higher correlation to LTCF case and death rates by state. The gradient boosting model had higher mean out-of-sample AUC, sensitivity, and specificity compared with the logistic regression and neural network models from 10-fold CV over the training set (Table 3). In the validation set, the logistic regression model had a lower AUC (0.689) compared with the gradient boosting model and a large difference between sensitivity (0.914) and specificity (0.233), indicating poor predictive power (overestimating the number of infected NHs) and poor generalizability. Similarly, the neural network had a lower AUC (0.707) compared with the gradient boosting model and a large discrepancy between sensitivity (0.904) and specificity (0.308) across the validation set, also indicating overestimation of infected NHs and poor model generalizability. The optimal threshold values used to form binary outcome classifications from the logistic regression and neural network model predictions were 0.609 and 0.640, respectively. The correlation between the gradient boosting model's median risk index and LTCF outcome rates by state was stronger than that of the logistic regression and neural network models for both state case rates, R = 0.384 (P < .05) and R = 0.731 (P < .001), respectively, and state death rates, R = 0.335 (P < .05) and R = 0.705 (P < .001), respectively. The benchmark logistic regression model is further detailed in Supplementary Table 2.
Table 3

The Gradient Boosting Model's Performance and Correlation to LTCF Related COVID-19 Case and Death Rates by State Compared with the Performance of the Benchmark Logistic Regression and Neural Network Models

| Dataset | Metric of Interest | Gradient Boosting Model | Benchmark Logistic Regression Model | Benchmark Neural Network Model |
| Training set (via 10-fold cross validation) | AUC, mean (95% CI) | 0.729 (0.690‒0.767) | 0.653 (0.599‒0.706) | 0.696 (0.657‒0.734) |
|  | Sensitivity, mean (95% CI) | 0.670 (0.477‒0.862) | 0.610 (0.483‒0.738) | 0.664 (0.484‒0.843) |
|  | Specificity, mean (95% CI) | 0.611 (0.412‒0.809) | 0.592 (0.450‒0.733) | 0.585 (0.410‒0.760) |
| Prospective validation set | AUC | 0.721 | 0.689 | 0.707 |
|  | Sensitivity | 0.622 | 0.914 | 0.904 |
|  | Specificity | 0.713 | 0.233 | 0.308 |
| State LTCF outcome rates | Correlation between median risk index and state LTCF case rates by state, Pearson correlation coefficient | 0.859 | 0.384 | 0.731 |
|  | Correlation between median risk index and LTCF death rates by state, Pearson correlation coefficient | 0.856 | 0.335 | 0.705 |

NHs from Massachusetts, Georgia, and New Jersey with outcomes reported on April 20, 2020.

NHs from California with outcomes reported on May 11, 2020.

LTCF-related COVID-19 case and death rates reported on May 11th by states across the United States.
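One way to see why a large gap between sensitivity and specificity signals weak discrimination is to summarize each model's validation-set pair from Table 3 with the sensitivity-specificity gap and with balanced accuracy (the mean of the two rates). Balanced accuracy is not a metric reported in the study; it is introduced here only as an illustrative summary, and the dictionary keys are hypothetical labels.

```python
# Validation-set (sensitivity, specificity) pairs as reported in Table 3.
validation = {
    "gradient_boosting": (0.622, 0.713),
    "logistic_regression": (0.914, 0.233),
    "neural_network": (0.904, 0.308),
}

summary = {}
for model, (sens, spec) in validation.items():
    summary[model] = {
        # Mean of the two rates; 0.5 corresponds to chance-level classification.
        "balanced_accuracy": (sens + spec) / 2,
        # Absolute gap; large values indicate a lopsided classifier.
        "gap": abs(sens - spec),
    }

for model, stats in summary.items():
    print(f"{model}: balanced accuracy {stats['balanced_accuracy']:.3f}, "
          f"gap {stats['gap']:.3f}")
```

The two benchmark models achieve high sensitivity mainly by labeling most facilities as infected, which shows up as a large gap and a balanced accuracy closer to 0.5.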

Supplementary Table 2

Odds Ratios for Benchmark Multivariate Logistic Regression Model Based on the Training Set Data (n = 1146)

| Variables | Odds Ratio (95% CI) | P Values |
| Cumulative number of positive COVID-19 cases per capita in the facility's county on the day of NH COVID-19 case reporting (confirmed cases per 100,000 people) | 75.7 × 10^90 (57.8 × 10^67‒99.0 × 10^113) | <.001 |
| Total number of beds at the facility | 1.003 (1.001‒1.005) | <.001 |
| Population density of the facility's county (population per square mile) | 1.0000 (0.9999‒1.0001) | .782 |
| Number of health deficiencies at the facility as defined by the Centers for Medicare and Medicaid Services | 1.029 (1.015‒1.044) | <.001 |
| Percent of nursing home residents who were non-Hispanic white prior to the COVID-19 outbreak | 0.997 (0.991‒1.004) | .456 |
| Number of patients per 1000 square feet in the facility prior to the COVID-19 outbreak | 0.844 (0.653‒1.091) | .195 |
| Number of clinical workers per 1000 square feet in the facility prior to the COVID-19 outbreak | 2.528 (1.185‒5.390) | <.05 |
| Intercept | 0.204 (0.096‒0.435) | <.001 |

P < .05.
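Odds ratios in Supplementary Table 2 are expressed per 1-unit change in each variable, which makes small per-unit values such as 1.003 per bed hard to interpret. Because a logistic model is linear on the log-odds scale, the odds ratio for a k-unit change is the per-unit odds ratio raised to the power k. The sketch below applies this to two rows of the table; the increments of 50 beds and 10 deficiencies are arbitrary illustrative choices, and the function name is hypothetical.

```python
def rescale_odds_ratio(or_per_unit, k):
    """Odds ratio for a k-unit increase, given the odds ratio per 1-unit increase."""
    return or_per_unit ** k


# Point estimates and 95% CI bounds from Supplementary Table 2.
beds = (1.003, 1.001, 1.005)          # per additional bed
deficiencies = (1.029, 1.015, 1.044)  # per additional cited health deficiency

# Illustrative increments (assumptions, not values used in the study).
beds_50 = tuple(rescale_odds_ratio(v, 50) for v in beds)
deficiencies_10 = tuple(rescale_odds_ratio(v, 10) for v in deficiencies)

print("Per 50 beds: OR %.3f (%.3f to %.3f)" % beds_50)
print("Per 10 deficiencies: OR %.3f (%.3f to %.3f)" % deficiencies_10)
```

The same exponentiation applies to the confidence bounds because they are also computed on the log-odds scale.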


Discussion

Predicting COVID-19 outbreaks in senior care facilities has been a challenge for policymakers and nursing home operators who prioritize the allocation of various critical resources (eg, personal protection equipment (PPE), training, audits, testing) to prevent and mitigate outbreak and its consequences.10, 11, 12 For example, previous studies have mixed results on the relationship between standard LTCF ratings, such as the CMS 5-star overall and health inspection ratings, and increased risk of infection.11, 12, 13, 14, 15, 16 This study describes the development and application of a machine-learning gradient boosting model to quantify complex predictive relationships between NH COVID-19 infection risk and granular NH characteristics that were decomposed from traditional aggregated NH measures, highlighting factors contributing to NH infection during the initial COVID-19 outbreak phase (March/April 2020). The model demonstrated moderate predictive power and strong association with NH and LTCF outcomes across the United States, suggesting the value of the identified risk factors in predicting which NHs are most susceptible to infection introduction. The gradient boosting approach outperformed logistic regression and neural network benchmark models, further demonstrating its ability in providing insights to inform healthcare policies to prevent COVID-19 infection. The identified risk factors provide data-driven support of hypotheses regarding 2 primary infection mechanisms: (1) introduction via presymptomatic and asymptomatic individuals from the surrounding community, and (2) intra-facility transmission following initial exposure. , , , Opportunities for infection introduction increase with the number and frequency of individuals entering the NH from the surrounding community. Intra-facility transmission following exposure from the outside community appears to increase with staff and resident density, suggestive of greater interaction within the NH. 
In addition, historical CMS-cited health deficiencies could indicate poor safety culture, inappropriate infection control practices, and lack of financial resources to implement appropriate safety measures, , all of which may impact both infection introduction and spread. , , Lastly, a higher percent of non-Hispanic white residents was associated with lower risk of infection, consistent with the racial disparities of COVID-19 infection risk, as well as social and structural determinants of health, affecting both the general public21, 22, 23, 24, 25 and the geriatric and NH community.10, 11, 12 , As lower long-term , and post-acute care quality, as well as more limited financial resources have been found in NHs with a higher percentage of minority residents, these results further suggest poor infection control practice and limited access to infection control resources may impact COVID-19 introduction. These factors help inform policy priorities that have emerged in NH COVID-19 management: staff and resident testing; positive workforce practices; PPE availability and proper use; financial relief for NHs; and development of high-quality facility level COVID-19 databases. The importance of community transmission supports evidence that early identification and management of presymptomatic and asymptomatic individuals, particularly staff who frequently enter and exit the facility, can be effective in infection introduction. , , 30, 31, 32 The role that presymptomatic and asymptomatic individuals play in transmission underscores the importance of frequent surveillance testing of staff as a preferable policy to symptom screening in effectively preventing infection introduction. Staff have not been prioritized in many NH testing strategies, which have been extremely inconsistent across states. 
Fewer than one-half of the states were reporting COVID-19 cases in staff during the initial COVID-19 outbreak in April, and some did not even perform staff testing following a NH outbreak. The development of a state- or federally supported surveillance testing approach for staff during the COVID-19 pandemic is essential to sustain effective infection prevention practices in NHs; most important is securing funding sources and providing operational capacity. Such questions have been raised in many states, most notably in New York, which has recently mandated regular testing of workers. In addition to testing, state-supported workforce policies could help address staffing shortages, facilitate effective organizational communication, and provide paid sick leave as COVID-positive workers are identified.
Once a facility has at least 1 COVID-19 case, the relevant mechanism for infection to consider is intrafacility transmission among residents and staff. The positive association of risk with resident and staff density supports interventions that minimize staff transitions across parts of the facility and that limit unnecessary in-person interactions with residents. In the short term, intrafacility infection spread may be lower in facilities with reduced occupancy rates as a result of the first wave of the COVID-19 outbreak. At the same time, reduced occupancy has a significant financial impact on facilities, particularly from decreased Medicare revenue associated with low post-acute care referrals and increased patient management costs. At the federal level, short-term policies that bring Medicaid payments in line with Medicare payments per head and eliminate low occupancy penalization should be considered to provide financial stability to facilities while at the same time reducing the risk of intra-facility spread.
Finally, immediate actions to increase the PPE available to NH staff, which has been in short supply, are also essential, as unprotected and asymptomatic staff are likely primary vectors accelerating infection spread.
The challenges of improving COVID-19 outcomes in NHs and compliance with infection policies emphasize the continuing role of data analytics and advanced modeling techniques in informing NH response. Risk indices, such as the one generated by our model, can be used by policymakers to prioritize certain facilities for enhanced support, as well as to reveal critical support needs. Moreover, predictive risk models can be instrumental in informing the relaxation and tightening of NH visitor policies. Diligence around identifying risk factors and drivers of infection will remain critical through future COVID-19 recovery phases. Maintaining quality, up-to-date facility-level data will help inform data-driven analysis in the dynamically changing NH and COVID-19 landscape. Moving forward, health organizations including CMS and the Centers for Disease Control and Prevention would benefit from developing high-quality national datasets to inform infection and control policies. Along with this, frequent assessment of NH characteristics that are relevant to informing decision-makers should be conducted to support analysis of suspected infection mechanisms. For example, the inflow and outflow of residents and workforce, staffing levels, and workforce status within NHs have been points of interest that have not been reported in public datasets. Applying these modeling techniques to inform targeted interventions may also improve COVID-19 outcomes in other institutional settings, such as homeless shelters and correctional facilities, that have experienced rapid intra-facility transmission.
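As a purely illustrative sketch of how a risk index could be used to prioritize facilities for enhanced support, the following ranks facilities by predicted risk and selects as many as a support budget allows. The facility names, risk values, the `capacity` parameter, and the `prioritize` helper are all hypothetical and are not taken from the study's data or code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Facility:
    name: str          # hypothetical facility identifier
    risk_index: float  # assumed model output: predicted risk of >= 1 confirmed case


def prioritize(facilities, capacity):
    """Return the `capacity` facilities with the highest risk index, highest first."""
    ranked = sorted(facilities, key=lambda f: f.risk_index, reverse=True)
    return ranked[:capacity]


# Made-up example values, for illustration only.
facilities = [
    Facility("NH-A", 0.82),
    Facility("NH-B", 0.35),
    Facility("NH-C", 0.67),
    Facility("NH-D", 0.12),
]

top = prioritize(facilities, capacity=2)
print([f.name for f in top])  # prints ['NH-A', 'NH-C']
```

In practice, a ranking of this kind would be only one input among others (eg, current PPE stock or staffing levels) when deciding where to direct support.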
The strong correlation between state median risk indices and LTCF case and death rates can be explained by the fact that most infections and deaths occurred in NHs, but it could also indicate the relevance of the risk factors to such settings.
This study has several limitations. First, NH COVID-19 outcomes were inconsistently reported across states and could underestimate actual infection and fatality rates. Partial testing of NHs could also result in underestimation of outcomes. To mitigate this risk, training data were collected from states with relatively higher testing levels and better data quality. Second, although the model performed strongly when validated on a state with significantly different characteristics from the training states, model performance could still be inconsistent across different geographic areas. Lastly, model predictors describing NHs were developed from historical reports, such as those from the 2017 Long-Term Care Focus database, and may not reflect real-time NH characteristics.
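The state-level comparison mentioned above (median risk index per state against LTCF case rates) can be sketched as follows. This is a minimal illustration under stated assumptions: the state labels, risk values, and case rates are made up, and the Pearson coefficient is used only as an example, since this section does not specify which correlation measure was applied.

```python
from collections import defaultdict
from math import sqrt
from statistics import median


def state_medians(records):
    """Map each state to the median risk index of its facilities.

    `records` is an iterable of (state, risk_index) pairs.
    """
    by_state = defaultdict(list)
    for state, risk in records:
        by_state[state].append(risk)
    return {state: median(values) for state, values in by_state.items()}


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (sd_x * sd_y)


# Hypothetical facility-level risk indices, labeled by (made-up) state.
records = [
    ("S1", 0.2), ("S1", 0.4), ("S1", 0.3),
    ("S2", 0.6), ("S2", 0.8),
    ("S3", 0.5), ("S3", 0.5), ("S3", 0.9),
]
# Hypothetical LTCF case rates per state (arbitrary units).
case_rate = {"S1": 10.0, "S2": 30.0, "S3": 25.0}

medians = state_medians(records)
states = sorted(medians)
r = pearson([medians[s] for s in states], [case_rate[s] for s in states])
print(medians, round(r, 3))
```

A rank-based measure such as Spearman's coefficient would be a reasonable alternative when the relationship is monotonic but not linear.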

Conclusions and Implications

A machine-learning gradient boosting model can describe and predict the risk of COVID-19 outbreak in NHs, providing data-driven support for NH infection control policies, strategies for the prioritization of resources to high-risk NHs, and the relaxation and restriction of NH visitor policies. The prevalence of COVID-19 infections in a NH's surrounding community and a NH's size were identified as the primary risk factors associated with NH infection, suggesting the introduction of infection from the outside community as a likely infection mechanism. Developing financially sustainable testing and screening approaches to identify presymptomatic and asymptomatic individuals entering a NH is critical to preventing and controlling COVID-19 outbreaks in these settings.
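For readers less familiar with the measures used to assess the model's predictive ability (AUC, sensitivity, and specificity, as reported in the abstract), the sketch below shows how they can be computed from predicted risk scores and observed outcomes. The labels, scores, and the 0.5 threshold are made-up illustrative values, and the function names are hypothetical; this is not the study's evaluation code.

```python
def auc(y_true, scores):
    """Area under the ROC curve via pairwise comparison.

    Equals the probability that a randomly chosen positive case receives a
    higher score than a randomly chosen negative case (ties count one-half).
    """
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


def sensitivity_specificity(y_true, scores, threshold):
    """Sensitivity and specificity when scores >= threshold are called positive."""
    tp = fn = tn = fp = 0
    for y, s in zip(y_true, scores):
        predicted_positive = s >= threshold
        if y == 1 and predicted_positive:
            tp += 1
        elif y == 1:
            fn += 1
        elif predicted_positive:
            fp += 1
        else:
            tn += 1
    return tp / (tp + fn), tn / (tn + fp)


# Made-up outcomes (1 = at least 1 confirmed case) and risk scores.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
scores = [0.9, 0.7, 0.4, 0.6, 0.3, 0.2, 0.8, 0.5]

print(auc(y_true, scores))                           # prints 0.875
print(sensitivity_specificity(y_true, scores, 0.5))  # prints (0.75, 0.5)
```

The threshold trades sensitivity against specificity, whereas the AUC summarizes discrimination across all thresholds.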
  21 in total

1.  Driven to tiers: socioeconomic and racial disparities in the quality of nursing home care.

Authors:  Vincent Mor; Jacqueline Zinn; Joseph Angelelli; Joan M Teno; Susan C Miller
Journal:  Milbank Q       Date:  2004       Impact factor: 4.911

2.  Nursing Homes in States with Infection Control Training or Infection Reporting Have Reduced Infection Control Deficiency Citations.

Authors:  Catherine C Cohen; John Engberg; Carolyn T A Herzig; Andrew W Dick; Patricia W Stone
Journal:  Infect Control Hosp Epidemiol       Date:  2015-09-09       Impact factor: 3.254

3.  The Importance of Long-term Care Populations in Models of COVID-19.

Authors:  Karl Pillemer; Lakshminarayanan Subramanian; Nathaniel Hupert
Journal:  JAMA       Date:  2020-07-07       Impact factor: 56.272

4.  Perceived Patient Safety Culture in Nursing Homes Associated With "Nursing Home Compare" Performance Indicators.

Authors:  Yue Li; Xi Cen; Xueya Cai; Helena Temkin-Greener
Journal:  Med Care       Date:  2019-08       Impact factor: 2.983

5.  COVID-19 Preparedness in Nursing Homes in the Midst of the Pandemic.

Authors:  Denise D Quigley; Andrew Dick; Mansi Agarwal; Karen M Jones; Lona Mody; Patricia W Stone
Journal:  J Am Geriatr Soc       Date:  2020-05-12       Impact factor: 5.562

6.  Assessing racial and ethnic disparities using a COVID-19 outcomes continuum for New York State.

Authors:  David R Holtgrave; Meredith A Barranco; James M Tesoriero; Debra S Blog; Eli S Rosenberg
Journal:  Ann Epidemiol       Date:  2020-06-29       Impact factor: 3.797

7.  Hospitalization and Mortality among Black Patients and White Patients with Covid-19.

Authors:  Eboni G Price-Haywood; Jeffrey Burton; Daniel Fort; Leonardo Seoane
Journal:  N Engl J Med       Date:  2020-05-27       Impact factor: 91.245

8.  Epidemiology of Covid-19 in a Long-Term Care Facility in King County, Washington.

Authors:  Temet M McMichael; Dustin W Currie; Shauna Clark; Sargis Pogosjans; Meagan Kay; Noah G Schwartz; James Lewis; Atar Baer; Vance Kawakami; Margaret D Lukoff; Jessica Ferro; Claire Brostrom-Smith; Thomas D Rea; Michael R Sayre; Francis X Riedo; Denny Russell; Brian Hiatt; Patricia Montgomery; Agam K Rao; Eric J Chow; Farrell Tobolowsky; Michael J Hughes; Ana C Bardossy; Lisa P Oakley; Jesica R Jacobs; Nimalie D Stone; Sujan C Reddy; John A Jernigan; Margaret A Honein; Thomas A Clark; Jeffrey S Duchin
Journal:  N Engl J Med       Date:  2020-03-27       Impact factor: 91.245

9.  Is There a Link between Nursing Home Reported Quality and COVID-19 Cases? Evidence from California Skilled Nursing Facilities.

Authors:  Mengying He; Yumeng Li; Fang Fang
Journal:  J Am Med Dir Assoc       Date:  2020-06-15       Impact factor: 4.669

10.  Characteristics of U.S. Nursing Homes with COVID-19 Cases.

Authors:  Hannah R Abrams; Lacey Loomer; Ashvin Gandhi; David C Grabowski
Journal:  J Am Geriatr Soc       Date:  2020-07-07       Impact factor: 7.538

  9 in total

Review 1.  Data Science Trends Relevant to Nursing Practice: A Rapid Review of the 2020 Literature.

Authors:  Brian J Douthit; Rachel L Walden; Kenrick Cato; Cynthia P Coviak; Christopher Cruz; Fabio D'Agostino; Thompson Forbes; Grace Gao; Theresa A Kapetanovic; Mikyoung A Lee; Lisiane Pruinelli; Mary A Schultz; Ann Wieben; Alvin D Jeffery
Journal:  Appl Clin Inform       Date:  2022-02-09       Impact factor: 2.342

2.  Intelligent system for COVID-19 prognosis: a state-of-the-art survey.

Authors:  Janmenjoy Nayak; Bighnaraj Naik; Paidi Dinesh; Kanithi Vakula; B Kameswara Rao; Weiping Ding; Danilo Pelusi
Journal:  Appl Intell (Dordr)       Date:  2021-01-06       Impact factor: 5.086

3.  Individual Factors Associated With COVID-19 Infection: A Machine Learning Study.

Authors:  Tania Ramírez-Del Real; Mireya Martínez-García; Manlio F Márquez; Laura López-Trejo; Guadalupe Gutiérrez-Esparza; Enrique Hernández-Lemus
Journal:  Front Public Health       Date:  2022-06-30

4.  Front-line Nursing Home Staff Experiences During the COVID-19 Pandemic.

Authors:  Elizabeth M White; Terrie Fox Wetle; Ann Reddy; Rosa R Baier
Journal:  J Am Med Dir Assoc       Date:  2020-11-24       Impact factor: 4.669

5.  COVID-19's Influence on Information and Communication Technologies in Long-Term Care: Results From a Web-Based Survey With Long-Term Care Administrators.

Authors:  Amy M Schuster; Shelia R Cotten
Journal:  JMIR Aging       Date:  2022-01-12

6.  COVID-19 outbreaks in nursing homes: A strong link with the coronavirus spread in the surrounding population, France, March to July 2020.

Authors:  Muriel Rabilloud; Benjamin Riche; Jean François Etard; Mad-Hélénie Elsensohn; Nicolas Voirin; Thomas Bénet; Jean Iwaz; René Ecochard; Philippe Vanhems
Journal:  PLoS One       Date:  2022-01-07       Impact factor: 3.240

7.  Factors associated with SARS-CoV-2 test positivity in long-term care homes: A population-based cohort analysis using machine learning.

Authors:  Douglas S Lee; Chloe X Wang; Finlay A McAlister; Shihao Ma; Anna Chu; Paula A Rochon; Padma Kaul; Peter C Austin; Xuesong Wang; Sunil V Kalmady; Jacob A Udell; Michael J Schull; Barry B Rubin; Bo Wang
Journal:  Lancet Reg Health Am       Date:  2022-01-17

8.  Southeastern United States Predictors of COVID-19 in Nursing Homes.

Authors:  Sandi J Lane; Maggie Sugg; Trent J Spaulding; Adam Hege; Lakshmi Iyer
Journal:  J Appl Gerontol       Date:  2022-04-12

9.  Short-Stay Admissions Associated With Large COVID-19 Outbreaks in Maryland Nursing Homes.

Authors:  T Joseph Mattingly; Alison Trinkoff; Alison D Lydecker; Justin J Kim; Jung Min Yoon; Mary-Claire Roghmann
Journal:  Gerontol Geriatr Med       Date:  2021-12-09