
Using a Cloud-Based Machine Learning Classification Tree Analysis to Understand the Demographic Characteristics Associated With COVID-19 Booster Vaccination Among Adults in the United States.

Lu Meng1,2, Hannah E Fast1,3, Ryan Saelee1,3, Elizabeth Zell1,4, Bhavini Patel Murthy1,3, Neil Chandra Murthy1,3, Peng-Jun Lu1,3, Lauren Shaw1,3, LaTreace Harris1,3, Lynn Gibbs-Scharf1,3, Terence Chorba1,5.   

Abstract

A tree model identified adults age ≤34 years, Johnson & Johnson primary series recipients, people from racial/ethnic minority groups, residents of nonlarge metro areas, and those living in socially vulnerable communities in the South as less likely to be boosted. These findings can guide clinical/public health outreach toward specific subpopulations. Published by Oxford University Press on behalf of Infectious Diseases Society of America 2022.


Keywords:  COVID-19; COVID-19 vaccination; booster dose; coronavirus

Year:  2022        PMID: 36131845      PMCID: PMC9452182          DOI: 10.1093/ofid/ofac446

Source DB:  PubMed          Journal:  Open Forum Infect Dis        ISSN: 2328-8957            Impact factor:   4.423


Coronavirus disease 2019 (COVID-19) booster vaccination increases protection against severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) infection, including the recently predominant Omicron variant (B.1.1.529), and reduces COVID-19-associated hospitalization and death [1]. During August–November 2021, a series of Emergency Use Authorizations and recommendations, including those for an additional primary dose for immunocompromised persons and a booster dose for persons age ≥18 years, were approved by the Food and Drug Administration [2]. In the United States, as of April 2022, all adults (age ≥18 years) were eligible to receive a booster dose ≥2 months after vaccination with the 1-dose Johnson & Johnson/Janssen (J&J) primary series or ≥5 months after the second dose of the Pfizer-BioNTech or Moderna 2-dose mRNA primary series [2]. Certain populations may have also chosen to receive a second booster dose using an mRNA COVID-19 vaccine ≥4 months after the first booster dose [2]. As of March 2022, ∼47% of persons age ≥18 years who were eligible to receive a booster dose after completing a primary series of COVID-19 vaccine had not yet received a booster [3]. Disparities in COVID-19 vaccine booster uptake have been related to socioeconomic status, insurance status, disability, and social demographic factors, including age, education level, race/ethnicity, and residency in rural or urban areas [4-7]. In the present study, we applied machine learning methods in the form of a classification tree algorithm to identify and describe relationships and interactions of demographic factors associated with the receipt or nonreceipt of a COVID-19 booster vaccine among eligible persons age ≥18 years in the United States.

METHODS

Over 152 million COVID-19 primary vaccine completion records (administered from 12/14/2020 through 09/15/2021) and 81 million first booster dose records (administered through 03/15/2022) reported to the Centers for Disease Control and Prevention (CDC) from 49 states and the District of Columbia (DC) were analyzed using the cloud-based data platform Microsoft Azure Databricks. Texas had data-sharing restrictions on information reported to the CDC; its data were not available for inclusion. Vaccine records from US territories were not included in the present study. Recipients’ primary series and booster dose records were matched. A classification tree model was built to examine factors contributing to receiving a booster dose, with Gini impurity as the classification tree splitting metric [8].

Input variables included primary series vaccine product (Moderna, Pfizer-BioNTech, J&J), age group (18–24, 25–34, 35–44, 45–54, 55–64, ≥65 years), sex (male, female), race/ethnicity (Hispanic/Latino, non-Hispanic Black [Black], non-Hispanic American Indian/Alaska Native [AI/AN], non-Hispanic Asian/other Pacific Islander [Asian/OPI], non-Hispanic White [White], other/multiracial/unknown [other/unknown]), region (South, Midwest, Mountain, Pacific, Northeast), urbanicity (large central metro, large fringe metro, medium metro, small metro, micropolitan, noncore [9]), and CDC/ATSDR Social Vulnerability Index (SVI) of zip code of residence (low, medium, and high). The South Region includes AZ, NM, OK, AR, LA, MS, AL, TN, KY, GA, SC, NC, WV, MD, VA, FL, DE, and DC; the Midwest Region includes ND, SD, NE, KS, MN, IA, MO, IL, WI, IN, MI, and OH; the Mountain Region includes NV, UT, CO, WY, MT, and ID; the Pacific Region includes WA, HI, AK, OR, and CA; and the Northeast Region includes PA, NY, VT, NH, ME, MA, RI, CT, and NJ. Large fringe metro counties are counties in Metropolitan Statistical Areas of ≥1 million population that do not qualify as large central; for more information regarding urbanicity classification, see Gaffney et al. [7]. Factors affecting SVI scores include socioeconomic status, household composition, disability, minority status, housing type, and transportation. A lower SVI score means the zip code of residence is less socially vulnerable [10-11]. All the input variables were derived from vaccine records. Gender identity was not available.

Descriptive analyses were performed for input variables, and feature importance of input variables and prediction rate of each end node were reported. This study was reviewed by the CDC and conducted in accordance with applicable federal law and CDC policy.
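
To make the splitting metric concrete, the following is a minimal sketch, not the authors' Databricks implementation, of how Gini impurity scores a candidate split. It uses the age-group counts from Table 1 and the age ≤54 versus ≥55 partition reported in the Results; the names `gini`, `pooled`, and `weighted_gini` are illustrative assumptions.

```python
# Illustrative sketch only; not the authors' code.
# Counts are (not boosted, boosted) per age group, copied from Table 1.
AGE_COUNTS = {
    "18-24": (8_953_407, 4_975_323),
    "25-34": (13_177_374, 8_628_063),
    "35-44": (12_453_119, 10_685_458),
    "45-54": (11_763_997, 12_438_296),
    "55-64": (11_470_661, 16_717_198),
    "65+": (13_237_513, 27_615_831),
}


def gini(counts):
    """Gini impurity, 1 - sum(p_k^2), for a node with the given class counts."""
    total = sum(counts)
    return 1.0 - sum((c / total) ** 2 for c in counts)


def pooled(groups):
    """Add up the class counts over several age groups."""
    return tuple(sum(AGE_COUNTS[g][k] for g in groups) for k in range(2))


def weighted_gini(children):
    """Size-weighted mean impurity of the child nodes produced by a split."""
    n = sum(sum(c) for c in children)
    return sum(sum(c) / n * gini(c) for c in children)


root = pooled(AGE_COUNTS)                              # all records, no split
younger = pooled(["18-24", "25-34", "35-44", "45-54"])  # age <=54
older = pooled(["55-64", "65+"])                        # age >=55

print("root impurity:      ", round(gini(root), 4))
print("impurity after split:", round(weighted_gini([younger, older]), 4))
```

A tree-growing algorithm of this kind compares candidate splits and prefers the one that lowers the weighted impurity the most; the split on age lowers it relative to the unsplit root.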

RESULTS

As shown in Figure 1, the classification tree model had a depth of 5 branches, with 17 end nodes and 32 nodes in total. Detailed information (eg, sample sizes, prediction rates, etc.) about the 17 end nodes is presented in Supplementary Table 1. The model generated a feature importance score for each input variable; a higher score meant that the specific feature had a larger effect on the model that was being used to predict the outcome variable [12]. In sum, age group had the highest feature importance score (0.739), followed by region (0.168), primary series vaccine product (0.071), race/ethnicity (0.010), SVI ranking (0.009), urbanicity (0.004), and sex (0.000). Overall, the model correctly predicted the booster status of 61.5% of individuals. In general, adults aged ≤34 years, J&J primary series recipients, persons belonging to racial/ethnic minority groups, residents of nonlarge metro areas, and those living in socially vulnerable areas were less likely to be boosted.
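
The 61.5% figure can be read as the share of records whose observed booster status matches the majority class of the end node they fall into. The sketch below illustrates that calculation under a simplifying assumption: a hypothetical tree with a single split at age ≤54 versus ≥55, using counts summed from the Table 1 age-group rows. It is not the authors' 17-end-node model, and all names are illustrative.

```python
# Illustrative sketch only; not the authors' model.
# (not boosted, boosted) counts, summed from the Table 1 age-group rows.
LEAVES = {
    "age <=54": (46_347_897, 36_727_140),
    "age >=55": (24_708_174, 44_333_029),
}


def prediction_rate(leaves):
    """Share of records classified correctly when each leaf predicts its majority class."""
    correct = sum(max(counts) for counts in leaves.values())
    total = sum(sum(counts) for counts in leaves.values())
    return correct / total


# Baseline: a tree with no splits, i.e. a single leaf holding every record.
all_records = tuple(sum(c[k] for c in LEAVES.values()) for k in range(2))
baseline = prediction_rate({"all": all_records})
one_split = prediction_rate(LEAVES)

print(f"no split: {baseline:.3f}, one split on age: {one_split:.3f}")
```

Under these assumptions a single age split already improves on always predicting "boosted", which is in line with age group carrying most of the reported feature importance.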
Figure 1.

Classification tree diagram depicting demographic characteristics associated with COVID-19 booster vaccination among adults completing the primary series before September 15, 2021, by social demographic factors, March 15, 2022, United States. Detailed information (eg, sample sizes, prediction rates, etc.) about the 17 end nodes is listed in Supplementary Table 1. Abbreviations: AI/AN, American Indian/Alaska Native; COVID-19, coronavirus disease 2019; J&J, Johnson & Johnson/Janssen; OPI, other Pacific Islander; SVI, Social Vulnerability Index.

The first partition or split in the classification tree was between adults age ≤54 and ≥55 years. Then, the model split South apart from all other regions (Midwest, Mountain, Northeast, and Pacific), and different branches were developed for residents of the South and non-South regions. Among persons aged 35–54 years in non-South regions, those who received a Pfizer or Moderna primary vaccine series and were non-Hispanic White were more likely to be boosted. Among persons aged ≥55 years in non-South regions, those who received a primary series of Moderna or Pfizer vaccines were more likely to be boosted. Among Southerners age 35–54 years, those who resided in low-SVI areas (ie, less socially vulnerable) and large fringe metro areas were more likely to be boosted. Among Southerners age ≥55 years, those who received a Moderna or Pfizer primary vaccine series were more likely to be boosted. Among Southerners age ≥65 years who received a J&J primary vaccine, those who resided in large fringe metro areas were more likely to be boosted.

Table 1 presents results from descriptive analyses of COVID-19 vaccine booster dose status by social demographic factors. Lower booster coverage was observed among J&J primary series recipients, younger age groups (eg, 18–34 years), residents of areas that are more socially vulnerable, people from racial and ethnic minority groups, and residents of the South.
Table 1.

COVID-19 Vaccine Booster Dose Status for Adults Completing the Primary Series Before September 15, 2021, by Social Demographic Factors, March 15, 2022, United States

| Variable | Not Boosted, No. | Not Boosted, % | Boosted, No. | Boosted, % | Total, No. |
| --- | --- | --- | --- | --- | --- |
| Total | 71 056 071 | 46.71 | 81 060 169 | 53.29 | 152 116 240 |
| Primary series completion dose vaccine product | | | | | |
| Pfizer-BioNTech | 37 312 813 | 46.88 | 42 272 769 | 53.12 | 79 585 582 |
| Moderna | 26 062 915 | 43.48 | 33 886 214 | 56.52 | 59 949 129 |
| Johnson & Johnson | 7 680 343 | 61.04 | 4 901 186 | 38.96 | 12 581 529 |
| Age group | | | | | |
| 18–24 y | 8 953 407 | 64.28 | 4 975 323 | 35.72 | 13 928 730 |
| 25–34 y | 13 177 374 | 60.43 | 8 628 063 | 39.57 | 21 805 437 |
| 35–44 y | 12 453 119 | 53.82 | 10 685 458 | 46.18 | 23 138 577 |
| 45–54 y | 11 763 997 | 48.61 | 12 438 296 | 51.39 | 24 202 293 |
| 55–64 y | 11 470 661 | 40.69 | 16 717 198 | 59.31 | 28 187 859 |
| ≥65 y | 13 237 513 | 32.40 | 27 615 831 | 67.60 | 40 853 344 |
| Sex | | | | | |
| Male | 34 863 037 | 48.91 | 36 418 962 | 51.09 | 71 281 999 |
| Female | 36 193 034 | 44.77 | 44 641 207 | 55.23 | 80 834 241 |
| Urbanicity | | | | | |
| Large fringe metro | 22 259 023 | 46.06 | 26 065 686 | 53.94 | 48 324 709 |
| Large central metro | 18 549 122 | 45.38 | 22 324 584 | 54.62 | 40 873 706 |
| Medium metro | 15 313 549 | 47.85 | 16 688 091 | 52.15 | 32 001 640 |
| Small metro | 6 127 185 | 47.80 | 6 691 771 | 52.20 | 12 818 956 |
| Micropolitan | 5 413 237 | 49.11 | 5 608 530 | 50.89 | 11 021 767 |
| Noncore | 3 393 955 | 47.97 | 3 681 507 | 52.03 | 7 075 462 |
| Social Vulnerability Index | | | | | |
| High | 22 625 719 | 50.05 | 22 583 831 | 49.95 | 45 209 550 |
| Medium | 28 684 159 | 46.96 | 32 396 900 | 53.04 | 61 081 059 |
| Low | 19 746 193 | 43.09 | 26 079 438 | 56.91 | 45 825 631 |
| Race/ethnicity | | | | | |
| Hispanic | 10 212 284 | 58.70 | 7 186 269 | 41.30 | 17 398 553 |
| Non-Hispanic Black | 5 880 660 | 52.48 | 5 325 513 | 47.52 | 11 206 173 |
| Non-Hispanic American Indian/Alaska Native | 571 057 | 57.20 | 427 329 | 42.80 | 998 386 |
| Non-Hispanic Asian/OPI | 3 214 046 | 38.12 | 5 218 318 | 61.88 | 8 432 364 |
| Non-Hispanic White | 30 949 706 | 42.83 | 41 305 029 | 57.17 | 72 254 735 |
| Other/Unknown | 20 228 318 | 48.36 | 21 597 711 | 51.64 | 41 826 029 |
| Region | | | | | |
| South | 26 809 572 | 53.52 | 23 284 388 | 46.48 | 50 093 960 |
| Midwest | 13 871 561 | 41.70 | 19 396 829 | 58.30 | 33 268 390 |
| Mountain | 3 481 795 | 45.87 | 4 108 301 | 54.13 | 7 590 096 |
| Pacific | 12 059 872 | 41.07 | 17 307 608 | 58.93 | 29 367 480 |
| Northeast | 14 833 271 | 46.65 | 16 963 043 | 53.35 | 31 796 314 |

Abbreviations: COVID-19, coronavirus disease 2019; OPI, other Pacific Islander.
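
The percentage columns in Table 1 follow directly from the counts. As a small worked example, and only as a sketch with illustrative names, the snippet below recomputes booster coverage by region from the Table 1 counts and picks out the region with the lowest coverage.

```python
# Illustrative sketch only.
# (not boosted, boosted) counts by region, copied from Table 1.
REGION_COUNTS = {
    "South": (26_809_572, 23_284_388),
    "Midwest": (13_871_561, 19_396_829),
    "Mountain": (3_481_795, 4_108_301),
    "Pacific": (12_059_872, 17_307_608),
    "Northeast": (14_833_271, 16_963_043),
}


def coverage_pct(not_boosted, boosted):
    """Boosted share of all primary series completers, in percent."""
    return 100.0 * boosted / (not_boosted + boosted)


coverage = {
    region: round(coverage_pct(*counts), 2)
    for region, counts in REGION_COUNTS.items()
}
lowest = min(coverage, key=coverage.get)

for region, pct in coverage.items():
    print(f"{region:<10} {pct:.2f}%")
print("lowest coverage:", lowest)
```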


DISCUSSION

This study used 233 million COVID-19 vaccination records to construct a classification tree model that assessed demographic characteristics associated with receipt or nonreceipt of COVID-19 booster vaccination among US adult populations. The classification tree model provides a framework to consider the impact of each input variable on vaccination outcomes within specific subpopulations; it would be prohibitively time-consuming to investigate outcomes at this granularity using other analytical approaches. Age group was the most important characteristic, with a feature importance score of 0.739, and persons age 18–34 years in all regions were less likely to have received a booster vaccination. Previous studies have identified attitudes and beliefs corresponding to low intent to receive primary series vaccination and low primary series coverage among young adults age 18–39 years [13].

The South had lower booster coverage than the other 4 regions and was split by the model from all other regions to form its own branches. SVI and urbanicity were important predictors of booster status in the South. Southerners residing in less socially vulnerable areas or large fringe metro areas were more likely to have received a booster dose. Residents within these areas report higher household income, which has been linked with higher COVID-19 vaccine uptake [9, 14]. In addition, marginalized populations within rural or socially vulnerable areas may have limited transportation options, less paid time off, and reduced ability to access vaccination providers [15-16]. Our finding that SVI is an important predictor of booster dose status among Southerners age 35–54 years is consistent with the observation of greater income-associated health disparities in the South than in other regions [17].

Among non-Southerners, age, primary vaccine type, race/ethnicity, and urbanicity determined the outcome. For persons age 35–54 years who received a primary series of Moderna or Pfizer, the tree model identified non-Hispanic White persons as more likely to be boosted; however, this pattern of race and ethnicity was not found among persons in other age groups or in residents of the South.

Regardless of age or region, recipients of a J&J primary series were less likely to have received a booster dose. Given lower vaccine effectiveness of a J&J primary series compared with an mRNA vaccine primary series, this population would particularly benefit from the increased effectiveness conferred by a booster dose [18]. More information is needed to understand factors contributing to low booster uptake among J&J recipients. Some J&J recipients may have chosen the 1-dose primary series because they were less likely to complete a 2-dose mRNA vaccination series, whether due to vaccination-related anxiety (eg, needle aversion), to concerns about mRNA vaccines due to health conditions or personal beliefs, or to barriers to accessing health care or vaccine providers (eg, transportation, limited time off, reduced availability of specific vaccines in certain geographic areas) [19-21].

These findings are subject to at least 3 limitations. First, Texas data were not included in this analysis, and given Texas’ large population size, lack of data from Texas could have impacted these findings. Second, the booster status of a small portion of individuals may have been misclassified if the booster dose record was not able to be linked to the primary series completion record, such as if vaccinations were received in different jurisdictions. Third, the current tree model yields a 61.5% prediction rate, which may limit the application of these findings. A single classification tree model is often reported to have relatively low prediction accuracy; we found during the process of model selection that replacing a single tree with a random forest of trees or growing the tree model to a depth of >5 branches could improve prediction rates but would dramatically reduce interpretability [12].

The classification tree diagram is a novel approach to analyzing public health vaccination data. One advantage of the classification tree approach is its use of a splitting metric to identify partitions in input variable responses, which describes variability across a population in a way that is easy to understand. By structuring certain demographic characteristics into paths, the classification tree was able to describe the relationships (or lack thereof) between the many input variables used in the model. The paths described possible intersections between demographic characteristics that may have contributed to low access and acceptance of vaccinations and identified specific subpopulations that would be likely to have a higher burden of health disparities. Despite the challenge of seeking to increase the prediction rate, the paths in the tree diagram can inform clinical and public health interventions and outreach toward specific subpopulations. The use of the classification tree model to identify subpopulations that would be less likely to receive a booster vaccine can inform public health efforts and other strategies on a broader scale, such as efforts that involve other vaccinations. The model presented here indicates that low booster vaccination coverage was seen among young adults, J&J primary series recipients, people from racial and ethnic minority groups, residents of nonlarge metro areas, and those living in socially vulnerable communities in the South.

1.  Recommended solutions to the barriers to immunization in children and adults.

Authors:  Edwin L Anderson
Journal:  Mo Med       Date:  2014 Jul-Aug

2.  2013 NCHS Urban-Rural Classification Scheme for Counties.

Authors:  Deborah D Ingram; Sheila J Franco
Journal:  Vital Health Stat 2       Date:  2014-04

3.  COVID-19 booster vaccine attitudes and behaviors among university students and staff in the United States: The USC Trojan pandemic research Initiative.

Authors:  Ryan C Lee; Howard Hu; Eric S Kawaguchi; Andre E Kim; Daniel W Soto; Kush Shanker; Jeffrey D Klausner; Sarah Van Orman; Jennifer B Unger
Journal:  Prev Med Rep       Date:  2022-06-27

4.  Factors Associated with COVID-19 Vaccine Booster Hesitancy: A Retrospective Cohort Study, Fukushima Vaccination Community Survey.

Authors:  Makoto Yoshida; Yurie Kobashi; Takeshi Kawamura; Yuzo Shimazu; Yoshitaka Nishikawa; Fumiya Omata; Tianchen Zhao; Chika Yamamoto; Yudai Kaneko; Aya Nakayama; Morihito Takita; Naomi Ito; Moe Kawashima; Sota Sugiura; Kenji Shibuya; Shingo Iwami; Kwangsu Kim; Shoya Iwanami; Tatsuhiko Kodama; Masaharu Tsubokura
Journal:  Vaccines (Basel)       Date:  2022-03-26

5.  Association Between 3 Doses of mRNA COVID-19 Vaccine and Symptomatic Infection Caused by the SARS-CoV-2 Omicron and Delta Variants.

Authors:  Emma K Accorsi; Amadea Britton; Katherine E Fleming-Dutra; Zachary R Smith; Nong Shang; Gordana Derado; Joseph Miller; Stephanie J Schrag; Jennifer R Verani
Journal:  JAMA       Date:  2022-02-15       Impact factor: 157.335

6.  Effectiveness of Homologous and Heterologous COVID-19 Booster Doses Following 1 Ad.26.COV2.S (Janssen [Johnson & Johnson]) Vaccine Dose Against COVID-19-Associated Emergency Department and Urgent Care Encounters and Hospitalizations Among Adults - VISION Network, 10 States, December 2021-March 2022.

Authors:  Karthik Natarajan; Namrata Prasad; Kristin Dascomb; Stephanie A Irving; Duck-Hye Yang; Manjusha Gaglani; Nicola P Klein; Malini B DeSilva; Toan C Ong; Shaun J Grannis; Edward Stenehjem; Ruth Link-Gelles; Elizabeth A Rowley; Allison L Naleway; Jungmi Han; Chandni Raiyani; Gabriela Vazquez Benitez; Suchitra Rao; Ned Lewis; William F Fadel; Nancy Grisel; Eric P Griggs; Margaret M Dunne; Melissa S Stockwell; Mufaddal Mamawala; Charlene McEvoy; Michelle A Barron; Kristin Goddard; Nimish R Valvi; Julie Arndorfer; Palak Patel; Patrick K Mitchell; Michael Smith; Anupam B Kharbanda; Bruce Fireman; Peter J Embi; Monica Dickerson; Jonathan M Davis; Ousseny Zerbo; Alexandra F Dalton; Mehiret H Wondimu; Eduardo Azziz-Baumgartner; Catherine H Bozio; Sue Reynolds; Jill Ferdinands; Jeremiah Williams; Stephanie J Schrag; Jennifer R Verani; Sarah Ball; Mark G Thompson; Brian E Dixon
Journal:  MMWR Morb Mortal Wkly Rep       Date:  2022-04-01       Impact factor: 17.586

7.  Factors associated with latino sexual minority men's likelihood and motivation for obtaining a COVID-19 vaccine: a mixed-methods study.

Authors:  Elliott R Weinstein; Raymond Balise; Nicholas Metheny; Maria Jose Baeza Robba; Daniel Mayo; Cassandra Michel; Bill Chan; Steven A Safren; Audrey Harkness
Journal:  J Behav Med       Date:  2022-04-27

8.  COVID-19 Vaccination Coverage and Intent Among Adults Aged 18-39 Years - United States, March-May 2021.

Authors:  Brittney N Baack; Neetu Abad; David Yankey; Katherine E Kahn; Hilda Razzaghi; Kathryn Brookmeyer; Jessica Kolis; Elisabeth Wilhelm; Kimberly H Nguyen; James A Singleton
Journal:  MMWR Morb Mortal Wkly Rep       Date:  2021-06-25       Impact factor: 17.586

9.  Emerging Socioeconomic Disparities in COVID-19 Vaccine Second-Dose Completion Rates in the United States.

Authors:  Autumn Gertz; Benjamin Rader; Kara Sewalk; John S Brownstein
Journal:  Vaccines (Basel)       Date:  2022-01-14

10.  Anxiety-Related Adverse Event Clusters After Janssen COVID-19 Vaccination - Five U.S. Mass Vaccination Sites, April 2021.

Authors:  Anne M Hause; Julianne Gee; Tara Johnson; Amelia Jazwa; Paige Marquez; Elaine Miller; John Su; Tom T Shimabukuro; David K Shay
Journal:  MMWR Morb Mortal Wkly Rep       Date:  2021-05-07       Impact factor: 35.301
