Christopher L F Sun, Eugenio Zuccarelli, El Ghali A Zerhouni, Jason Lee, James Muller, Karen M Scott, Alida M Lujan, Retsef Levi.
Abstract
OBJECTIVE: Inform coronavirus disease 2019 (COVID-19) infection prevention measures by identifying and assessing risk and possible vectors of infection in nursing homes (NHs) using a machine-learning approach.
Keywords: COVID-19; Nursing homes; health policy; infection prevention; long-term care facility; machine-learning; risk modeling
Year: 2020 PMID: 33032935 PMCID: PMC7451194 DOI: 10.1016/j.jamda.2020.08.030
Source DB: PubMed Journal: J Am Med Dir Assoc ISSN: 1525-8610 Impact factor: 4.669
Overview of the Data Inputs of the NH Risk Model
| Data Category | Variable Description | Data Source | Number of NH in Training Set without Missing Feature Values (n = 1146), n (%) |
|---|---|---|---|
| Facility's community characteristics | Cumulative number of positive COVID-19 cases per capita in the NH's county on the day of NH COVID-19 case reporting | NYT COVID Tracker | 1133 (98.9) |
|  | Estimated poverty score of the NH's county (Based on: Household income) | CDC | 1133 (98.9) |
|  | Overall comorbidity score of the NH's county (Based on: Obesity, diabetes, hypertension, cardiovascular characteristics from 2018) | CDC | 1133 (98.9) |
|  | Overall percentile ranking of social vulnerability index of the NH's county (Based on: Socioeconomic, household composition, minority status/language, and housing type/transportation characteristics) | CDC | 1133 (98.9) |
|  | Percent of facility's county who are non-Hispanic White | US Census | 1133 (98.9) |
|  | Percentage of family households in the NH's county | Claritas | 1133 (98.9) |
|  | Population density of the NH's county (population per square mile) | Claritas | 1133 (98.9) |
| Community social distancing and population mobility characteristics | Proportion of Safe Graph tracked devices traveling less than 8000 meters per day out of all tracked devices in the facility's zip code of the week of NH COVID-19 case reporting | Safe Graph | 1096 (95.6) |
|  | Proportion of Safe Graph tracked devices traveling more than 50,000 meters per day out of all tracked devices in the facility's zip code of the week of NH COVID-19 case reporting | Safe Graph | 1096 (95.6) |
|  | Proportion of Safe Graph tracked devices exhibiting full-time employment behavior in the zip code of the week of NH COVID-19 case reporting | Safe Graph | 1101 (96.1) |
|  | Proportion of Safe Graph tracked devices traveling less than 8000 meters per day out of all tracked devices in the facility's county of the week of NH COVID-19 case reporting | Safe Graph | 1133 (98.9) |
|  | Proportion of Safe Graph tracked devices traveling more than 50,000 meters per day out of all tracked devices in the facility's county of the week of NH COVID-19 case reporting | Safe Graph | 1133 (98.9) |
|  | Proportion of Safe Graph tracked devices exhibiting full-time employment behavior in the county of the week of NH COVID-19 case reporting | Safe Graph | 1133 (98.9) |
|  | Inflow of Safe Graph tracked devices to the facility's county of the week of NH COVID-19 case reporting | Safe Graph | 1131 (98.7) |
|  | Outflow of Safe Graph tracked devices from the facility's county of the week of NH COVID-19 case reporting | Safe Graph | 1132 (98.8) |
|  | Percentage of county's population taking public transportation to work | Claritas | 1133 (98.9) |
| Facility characteristics | Age of the nursing home in years | NIC | 750 (65.4) |
|  | Standardized hourly cost per clinical worker excluding certified nursing assistants (per patient per day) | MCDA | 1090 (95.1) |
|  | Standardized hourly cost per clinical worker including certified nursing assistants (per patient per day) | MCDA | 1090 (95.1) |
|  | Walk Score measures walkability on a scale from 0‒100 based on walking routes to destinations such as grocery stores, schools, parks, restaurants, and retail | Walk Score | 760 (66.3) |
|  | Number of clinical workers per 1000 square feet in the facility prior to the COVID-19 outbreak | MCDA | 999 (87.2) |
|  | Number of health deficiencies at the facility as defined by the Centers for Medicare and Medicaid Services since 2014 | CMS | 977 (85.3) |
|  | Number of patients per 1000 square feet in the facility prior to the COVID-19 outbreak | MCDA | 999 (87.2) |
|  | Overall infection control process and performance index | MCDA | 974 (85.0) |
|  | Rate of influenza vaccination for long stay residents | MCDA | 1065 (92.9) |
|  | Rate of influenza vaccination for short stay residents | MCDA | 1078 (94.1) |
|  | Rate of pneumococcal vaccination for long stay residents | MCDA | 1065 (92.9) |
|  | Rate of pneumococcal vaccination for short stay residents | MCDA | 1080 (94.2) |
|  | Rate of rehospitalizations of residents due to infection | MCDA | 1091 (95.2) |
|  | Index based on CMS health inspection citations related to infection control measures (citations weighted according to scope, severity, and recency) | MCDA | 849 (74.1) |
|  | Index based on CMS health inspection citations related to laboratory processes (citations weighted according to scope, severity, and recency) | MCDA | 739 (64.5) |
|  | Index based on CMS health inspection citations related to managerial processes (citations weighted according to scope, severity, and recency) | MCDA | 966 (84.3) |
|  | Index based on CMS health inspection citations related to physical environment (citations weighted according to scope, severity, and recency) | MCDA | 739 (64.5) |
|  | Total number of beds at the facility | CMS, NIC | 1146 (100.0) |
|  | Total number of beds for Nursing Care | NIC | 768 (67.0) |
|  | Total number of units for assisted living | NIC | 768 (67.0) |
|  | Total number of units for independent living | NIC | 768 (67.0) |
|  | Total number of units for memory care | NIC | 768 (67.0) |
|  | Percent of facility residents who were non-Hispanic white prior to the COVID-19 outbreak | LTCFocus | 961 (83.9) |
|  | Percent of facility residents whose primary support was Medicare prior to the COVID-19 outbreak | LTCFocus | 961 (83.9) |
|  | Percent of facility residents whose primary support was Medicaid prior to the COVID-19 outbreak | LTCFocus | 961 (83.9) |
| Facility outcomes | Presence of at least one resident COVID-19 case | State departments of health | 1146 (100.0) |
CDC, Centers for Disease Control and Prevention; LTCFocus, Long-term Care Focus; MCDA, Muller Consulting and Data Analytics; NIC, National Investment Center for Seniors Housing and Care; NYT, New York Times.
Clinical worker defined as registered nurses, licensed practical nurses, certified nursing assistants, nursing aides, medical aides/technicians, nursing home administrators, medical directors, physicians, physician assistants, nurse practitioners, clinical nurse specialists, pharmacists, dieticians, feeding assistants, occupational therapists, occupational therapy assistants, occupational therapy aides, physical therapists, physical therapist assistants, physical therapist aides, respiratory therapists, respiratory therapy technicians, speech/language pathologists, therapeutic recreation specialists, qualified activities professionals, other activities staff, qualified social workers, other social workers, mental health service workers.
Standardized costs based on average hours worked multiplied by Bureau of Labor Statistics national wage rate estimates for respective occupations in skilled nursing facilities.
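The standardized-cost footnote describes a product of hours worked and wage rates. A minimal sketch of that arithmetic follows, assuming the products are summed across occupations. The occupation keys, hours, and wages are hypothetical placeholders, not values from the study or from the Bureau of Labor Statistics, and the per-patient-per-day normalization mentioned in the table is not modeled.

```python
# Hypothetical inputs: average hours worked per occupation and national
# hourly wage estimates. None of these numbers come from the study or BLS.
avg_hours = {"registered_nurse": 40.0, "licensed_practical_nurse": 64.0}
national_wage = {"registered_nurse": 30.0, "licensed_practical_nurse": 22.0}


def standardized_cost(hours, wages):
    """Sum of (average hours worked x national wage rate) over occupations.

    Summing across occupations is an assumption made for this sketch.
    """
    return sum(hours[job] * wages[job] for job in hours)


total_cost = standardized_cost(avg_hours, national_wage)
print(f"standardized cost: {total_cost:.2f}")
```

Using national rather than local wage rates keeps the measure comparable across facilities in different labor markets, which appears to be the point of the standardization.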
The Summary and Comparison of the Predictive Characteristics of the NH in the Model's Training and Validation Sets
| Identified Predictive Features | Training Set | Prospective Validation Set | P Value |
|---|---|---|---|
| Cumulative number of positive COVID-19 cases per capita in the facility's county on the day of NH COVID-19 case reporting (confirmed cases per 100,000 people), median (IQR) | 478.1 (182.0‒730.7) | 112.5 (79.5‒244.7) | <.001 |
| Total number of beds at the facility, median (IQR) | 122 (94‒167) | 99 (74‒140) | <.001 |
| Population density of the facility's county (population per square mile), median (IQR) | 1027.0 (420.6‒2033.7) | 1613.3 (343.8‒2508.6) | <.05 |
| Number of health deficiencies at the facility as defined by the CMS, median (IQR) | 12 (7‒19) | 35 (23‒50) | <.001 |
| Percent of NH residents who were non-Hispanic white prior to the COVID-19 outbreak, median (IQR) | 83.6 (62.0‒94.2) | 59.6 (42.9‒78.5) | <.001 |
| Number of patients per 1000 square feet in the facility prior to the COVID-19 outbreak, median (IQR) | 2.95 (2.15‒3.77) | 4.83 (3.75‒5.68) | <.001 |
| Number of clinical workers per 1000 square feet in the facility prior to the COVID-19 outbreak, median (IQR) | 1.06 (0.78‒1.33) | 1.90 (1.52‒2.32) | <.001 |
| Positive COVID-19 resident case in NH, No. (%) | 722 (63.0) | 209 (20.5) | <.001 |
IQR, interquartile range.
Significant differences in the characteristics between the 2 sets were found. A strong predictive performance across a validation set population that is significantly different from its training set population suggests the model will be generalizable to different populations. P values from Mann–Whitney U and χ2 tests, as appropriate, comparing the differences in the characteristics are shown.
Training set: NHs from Massachusetts, Georgia, and New Jersey with outcomes reported on April 20, 2020.
Prospective validation set: NHs from California with outcomes reported on May 11, 2020.
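The χ2 comparison mentioned in the footnotes can be illustrated on the one categorical row of the table, the count of NHs with a positive resident case. The sketch below is not the authors' code. It uses a plain Pearson χ2 statistic without continuity correction, and the validation-set size is an assumption back-calculated from the reported 209 (20.5%), since the table does not state it.

```python
import math


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for [[a, b], [c, d]]."""
    n = a + b + c + d
    stat = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # With 1 degree of freedom, the upper-tail probability of the chi-square
    # distribution equals erfc(sqrt(stat / 2)).
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value


train_pos, train_n = 722, 1146       # reported in the table
valid_pos = 209                      # reported in the table
valid_n = round(valid_pos / 0.205)   # ASSUMPTION: back-calculated from 20.5%

stat, p_value = chi_square_2x2(
    train_pos, train_n - train_pos, valid_pos, valid_n - valid_pos
)
print(f"chi-square = {stat:.1f}, p < .001: {p_value < 0.001}")
```

Under these assumptions the difference in positive-case rates is significant at the .001 level, consistent with the P value shown in the table. The Mann–Whitney U comparisons need facility-level data and cannot be reproduced from the summary medians alone.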
Fig. 1. Feature importance and impact on risk of COVID-19 infection in NHs from the gradient boosting model. The COVID-19 infection rate in the NH's county and the NH's size had the largest impact on infection risk (features are listed in descending order of importance). In the figure, each dot represents an NH on which the model was trained. For each NH, red indicates a high feature value and blue a low feature value. The horizontal axis shows whether the feature value is associated with a higher or lower risk of NH infection.
Supplementary Fig. 1. Impact of each predictive feature, shown in subfigures (A‒G), on the estimated NH risk of COVID-19 infection. The median (blue line), 25th and 75th percentiles (gray band), and 5th and 95th percentiles (orange band) of the infection risk levels generated by the trained model are shown across 15,300 NHs in the United States.
Predicted NH Risk from the Gradient Boosting Model and LTCF Related COVID-19 Case and Death Rates Reported on May 11 by State
| State Ranking Based on Predicted NH Risk Index (as of May 4, 2020) | State | Median Predicted NH Risk Index (IQR), Predicted on May 4, 2020 | Reported LTCF-Related Cases per 1000 Beds (Relative Rank), Reported on May 11, 2020 | Reported LTCF-Related Deaths per 1000 Beds (Relative Rank), Reported on May 11, 2020 |
|---|---|---|---|---|
| 1 | New Jersey | 78.7 (67.1‒82.7) | 500.9 (1) | 92.7 (1) |
| 2 | Massachusetts | 74.8 (65.1‒81.2) | 365.3 (2) | 66.9 (2) |
| 3 | Connecticut | 71.3 (50.6‒78.3) | 251.5 (3) | 63.3 (3) |
| 4 | New York | 66.1 (44.4‒83.5) | No reporting data | 47.8 (4) |
| 5 | Maryland | 65.7 (49.8‒72.2) | 226.3 (5) | 28.8 (7) |
| 6 | Rhode Island | 63.1 (54.6‒70.8) | 230.1 (4) | 37.9 (5) |
| 7 | Delaware | 62.6 (58.5‒72.2) | 91.9 (14) | 28.0 (8) |
| 8 | Louisiana | 58.0 (42.9‒74.5) | 112.3 (11) | 23.3 (10) |
| 9 | California | 54.0 (39.9‒63.7) | 82.7 (15) | 8.4 (21) |
| 10 | Pennsylvania | 51.0 (37.8‒68.1) | 152.5 (8) | 29.0 (6) |
| 11 | Florida | 51.0 (39.7‒59.6) | 66.9 (17) | 8.6 (19) |
| 12 | Virginia | 49.6 (37.7‒63.0) | 115.1 (10) | 15.2 (14) |
| 13 | Michigan | 45.8 (34.0‒70.1) | 99.7 (13) | 4.6 (27) |
| 14 | Illinois | 45.1 (33.9‒73.9) | 127.5 (9) | 17.4 (12) |
| 15 | Colorado | 44.8 (35.6‒59.9) | 184.7 (6) | 26.8 (9) |
| 16 | Washington | 44.8 (36.6‒51.7) | 51.0 (26) | 3.8 (32) |
| 17 | Georgia | 44.8 (36.7‒62.6) | 158.9 (7) | 17.5 (11) |
| 18 | Nevada | 44.8 (40.7‒48.6) | 102.3 (12) | 8.6 (20) |
| 19 | Utah | 43.7 (31.5‒53.8) | 10.3 (40) | 2.0 (37) |
| 20 | Mississippi | 43.6 (37.7‒52.6) | 66.4 (18) | 10.5 (17) |
| 21 | Indiana | 43.5 (36.2‒52.4) | 59.1 (20) | 11.4 (16) |
| 22 | Alabama | 41.5 (34.1‒47.3) | 62.2 (19) | 1.0 (40) |
| 23 | Ohio | 40.7 (31.2‒56.0) | 54.2 (24) | 3.2 (34) |
| 24 | Texas | 40.5 (31.5‒56.0) | 10.1 (41) | 3.6 (33) |
| 25 | South Carolina | 40.0 (33.3‒44.8) | 54.4 (23) | 5.4 (25) |
| 26 | Nebraska | 39.8 (31.5‒45.3) | 5.2 (45) | 0.1 (45) |
| 27 | Hawaii | 38.6 (31.5‒45.4) | 0.7 (49) | No reporting data |
| 28 | New Mexico | 37.7 (31.5‒47.3) | 5.7 (42) | 2.2 (36) |
| 29 | North Carolina | 37.7 (31.5‒46.2) | 56.4 (21) | 7.3 (22) |
| 30 | Arizona | 37.7 (31.9‒44.0) | 76.0 (16) | 12.7 (15) |
| 31 | Alaska | 36.6 (31.5‒42.0) | 3.9 (46) | No reporting data |
| 32 | New Hampshire | 33.9 (26.6‒39.8) | 19.3 (38) | 1.7 (38) |
| 33 | Vermont | 33.7 (31.5‒38.7) | 55.8 (22) | 9.6 (18) |
| 34 | Tennessee | 33.5 (28.4‒44.8) | 22.6 (36) | 2.4 (35) |
| 35 | Kentucky | 33.3 (28.0‒43.6) | 53.3 (25) | 6.7 (23) |
| 36 | Oklahoma | 33.2 (28.5‒44.8) | 35.8 (32) | 4.3 (29) |
| 37 | Arkansas | 33.1 (29.8‒40.5) | 17.8 (39) | 1.4 (39) |
| 38 | Iowa | 32.4 (28.0‒41.5) | 36.8 (31) | 0.5 (42) |
| 39 | Missouri | 32.1 (28.0‒44.8) | 2.4 (48) | 0.2 (43) |
| 40 | Idaho | 31.6 (29.6‒40.5) | 29.1 (34) | 4.7 (26) |
| 41 | Minnesota | 31.5 (28.0‒46.4) | 48.7 (27) | 16.8 (13) |
| 42 | Kansas | 31.5 (28.0‒40.7) | 26.2 (35) | 4.2 (30) |
| 43 | Wyoming | 31.5 (28.0‒37.7) | 5.4 (43) | No reporting data |
| 44 | West Virginia | 31.5 (29.7‒37.7) | 30.7 (33) | 4.0 (31) |
| 45 | Oregon | 31.5 (31.3‒50.8) | 39.9 (29) | 6.3 (24) |
| 46 | Montana | 29.8 (28.0‒34.4) | 5.4 (44) | 0.9 (41) |
| 47 | North Dakota | 29.1 (27.6‒39.5) | 44.0 (28) | No reporting data |
| 48 | Maine | 28.9 (25.5‒35.1) | 37.7 (30) | 4.4 (28) |
| 49 | South Dakota | 28.5 (25.3‒36.6) | 2.8 (47) | No reporting data |
| 50 | Wisconsin | 28.0 (25.3‒37.9) | 22.1 (37) | 0.2 (44) |
IQR, interquartile range.
States were ranked in descending order based on the state's median, 75th percentile and 25th percentile risk index as of May 4, 2020.
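The ranking rule in this footnote amounts to a multi-key descending sort. A small sketch using four rows copied from the table shows the tie-break at work for Pennsylvania and Florida, which share a median of 51.0. The tuple layout is a choice made for this example.

```python
# (state, median, 25th percentile, 75th percentile), copied from the table
rows = [
    ("Virginia", 49.6, 37.7, 63.0),
    ("Florida", 51.0, 39.7, 59.6),
    ("California", 54.0, 39.9, 63.7),
    ("Pennsylvania", 51.0, 37.8, 68.1),
]

# Sort descending by median, then 75th percentile, then 25th percentile.
ranked = sorted(rows, key=lambda r: (r[1], r[3], r[2]), reverse=True)
ranked_states = [r[0] for r in ranked]
print(ranked_states)
```

This reproduces the published order for these four states (ranks 9 to 12). Rows whose printed medians tie do not always reproduce the published order under this rule, possibly because the printed values are rounded.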
The Gradient Boosting Model's Performance and Correlation to LTCF Related COVID-19 Case and Death Rates by State Compared with the Performance of the Benchmark Logistic Regression and Neural Network Models
| Dataset | Metric of Interest | Gradient Boosting Model | Benchmark Logistic Regression Model | Benchmark Neural Network Model |
|---|---|---|---|---|
| Training set | AUC, mean (95% CI) | 0.729 (0.690‒0.767) | 0.653 (0.599‒0.706) | 0.696 (0.657‒0.734) |
|  | Sensitivity, mean (95% CI) | 0.670 (0.477‒0.862) | 0.610 (0.483‒0.738) | 0.664 (0.484‒0.843) |
|  | Specificity, mean (95% CI) | 0.611 (0.412‒0.809) | 0.592 (0.450‒0.733) | 0.585 (0.410‒0.760) |
| Prospective validation set | AUC | 0.721 | 0.689 | 0.707 |
|  | Sensitivity | 0.622 | 0.914 | 0.904 |
|  | Specificity | 0.713 | 0.233 | 0.308 |
| State LTCF outcome rates | Correlation between median risk index and state LTCF case rates by state, Pearson correlation coefficient | 0.859 | 0.384 | 0.731 |
|  | Correlation between median risk index and state LTCF death rates by state, Pearson correlation coefficient | 0.856 | 0.335 | 0.705 |
Training set: NHs from Massachusetts, Georgia, and New Jersey with outcomes reported on April 20, 2020.
Prospective validation set: NHs from California with outcomes reported on May 11, 2020.
State LTCF outcome rates: LTCF-related COVID-19 case and death rates reported on May 11, 2020, by states across the United States.
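The state-level correlations in the table are Pearson coefficients between each state's median risk index and its reported LTCF rate. A minimal sketch of that calculation follows, using only the seven highest-ranked states with reported case rates from the preceding state table. Because it uses a subset, the result is illustrative and is not expected to match the 0.859 reported for all states.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Median predicted risk index and reported LTCF cases per 1000 beds for
# NJ, MA, CT, MD, RI, DE, LA (New York omitted: no case data reported).
risk_index = [78.7, 74.8, 71.3, 65.7, 63.1, 62.6, 58.0]
case_rate = [500.9, 365.3, 251.5, 226.3, 230.1, 91.9, 112.3]

r = pearson_r(risk_index, case_rate)
print(f"Pearson r on this subset: {r:.3f}")
```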
Odds Ratios for Benchmark Multivariate Logistic Regression Model Based on the Training Set Data (n = 1146)
| Variables | Odds Ratio (95% CI) | P Value |
|---|---|---|
| Cumulative number of positive COVID-19 cases per capita in the facility's county on the day of NH COVID-19 case reporting (confirmed cases per 100,000 people) | 75.7 × 10^90 (57.8 × 10^67‒99.0 × 10^113) | <.001 |
| Total number of beds at the facility | 1.003 (1.001‒1.005) | <.001 |
| Population density of the facility's county (population per square mile) | 1.0000 (0.9999‒1.0001) | .782 |
| Number of health deficiencies at the facility as defined by the Centers for Medicare and Medicaid Services | 1.029 (1.015‒1.044) | <.001 |
| Percent of nursing home residents who were non-Hispanic white prior to the COVID-19 outbreak | 0.997 (0.991‒1.004) | .456 |
| Number of patients per 1000 square feet in the facility prior to the COVID-19 outbreak | 0.844 (0.653‒1.091) | .195 |
| Number of clinical workers per 1000 square feet in the facility prior to the COVID-19 outbreak | 2.528 (1.185‒5.390) | <.05 |
| Intercept | 0.204 (0.096‒0.435) | <.001 |
P < .05.
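Each odds ratio in the table applies to a one-unit increase in its predictor, so it can be rescaled to a more interpretable increment by exponentiating the scaled log odds ratio. The sketch below makes two assumptions that the table does not confirm: that the county case-rate odds ratio reads as 75.7 × 10^90 (exponent restored from flattened text), and that this predictor entered the model as cases per person, so that 100 cases per 100,000 people is a change of 0.001 units.

```python
import math


def rescale_odds_ratio(odds_ratio, delta):
    """Odds ratio for a `delta`-unit change, given the odds ratio per 1 unit."""
    return math.exp(math.log(odds_ratio) * delta)


# Total beds: odds ratio 1.003 per bed, rescaled to 50 additional beds.
beds_or_50 = rescale_odds_ratio(1.003, 50)

# County case rate. ASSUMPTION: the predictor is in cases per person, so
# +100 cases per 100,000 people corresponds to delta = 0.001.
cases_or_100 = rescale_odds_ratio(75.7e90, 100 / 100_000)

print(f"OR per 50 beds: {beds_or_50:.2f}")
print(f"OR per 100 cases per 100,000: {cases_or_100:.2f}")
```

Under these assumptions, the very large per-unit odds ratio corresponds to a modest increase in odds for a realistic change in county case rate, which would explain its magnitude as a matter of scale rather than effect size.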