Yi Guo1,2, Yahan Zhang3, Tianchen Lyu1,2, Mattia Prosperi4, Fei Wang5, Hua Xu6, Jiang Bian1,2.
Abstract
OBJECTIVE: To summarize how artificial intelligence (AI) is being applied in COVID-19 research and determine whether these AI applications integrated heterogeneous data from different sources for modeling.
Keywords: coronavirus; deep learning; machine learning; natural language processing; neural networks
Year: 2021 PMID: 34151987 PMCID: PMC8344463 DOI: 10.1093/jamia/ocab098
Source DB: PubMed Journal: J Am Med Inform Assoc ISSN: 1067-5027 Impact factor: 7.942
Figure 1. Search and review procedure.
Figure 2. Number and percentage of studies with data integration in each research area.
Studies on COVID-19 forecasting that integrated heterogeneous data
| Study | Region | Outcome | Data source | Model | Heterogeneous data | Missing data imputation |
|---|---|---|---|---|---|---|
| Brooks et al | Worldwide | COVID-19 mortality rate | World Bank, Worldometer, Index Mundi, Wikipedia, Our World in Data, JHU, BCG Atlas, WHO, Oxford, GHS Index | k-means, linear regression | Socioeconomic, health system readiness, environmental, existing disease burden, demographics, vaccination programs, and response to the pandemic | Imputed with mean values |
| Cao et al | China | COVID-19 incidence and growth rate | Chinese NHC, Baidu Qianxi, China Health & Family Planning Statistical Yearbook, China City Statistical Yearbook, CMA, CNIC | XGBoost | Travel-related, medical, socioeconomic, environmental, and influenza-like illness factors | No |
| Cazzolla-Gatti et al | Italy | SARS-CoV-2 mortality and infectivity | Italian Civil Protection, ARPA, I.Stat, EpiCentro, Italian MoH, ENAC, ACI.it | RF | Environmental, health, socioeconomic factors | No |
| Chakraborti et al | Worldwide | COVID-19 incidence and deaths | ECDC, World Bank, Google | RF, GB | Natural (climatic, environmental) and human (socioeconomic, demographic) factors | No |
| Gujral et al | USA | COVID-19 incidence | JHU, US EPA, | EDEM | Air pollution, meteorological data, county-level demographics | No |
| Haghshenas et al | Italy | COVID-19 incidence | Unspecified | ANN (PSO, DE) | Historical data, climate and urban factors | No |
| Kasilingam et al | Worldwide | COVID-19 incidence | WHO, World Bank, Weather Underground | LR, DT, RF, SVM | Infrastructure, environment, policies, and infection-related factors | No |
| Khan et al | China | COVID-19 incidence | Chinese NHC, IDIS, NBS, NCEP/NCAR | K-means, SIR | Temperature, population density, and demographic information | No |
| Kuo et al | USA | COVID-19 incidence | NYT, USDA ERS, gridMET, Google, Federal Reserve Bank of Dallas | EN, PCR, PLSR, k-NN, RT, RF, GB, 2-layer ANN | County-level demographic, environmental, and mobility data | Imputed with median values |
| Li et al | Worldwide | COVID-19 incidence and deaths | JHU, NOAA, KG system, CIA, Wikipedia, ESPN, CIES, Hupu, BBC, UN, WEO, World Bank, WHO, Knoema, FAO, OICA, | LASSO | Factors on politics, economy, culture, demographics, geography, education, medical resources, scientific development, environment, diseases, diet, and nutrition | No |
| Mollalo et al | USA | COVID-19 incidence | USAFacts (CDC, JHU CSSE), US Census, GHDx | ANN (MLP) | Historical data, sociodemographic and environmental factors, disease mortality | No |
| Nikolopoulos et al | USA, India, UK, Germany, Singapore | COVID-19 incidence and growth rate | WHO, JHU, Beihan University, Mayer Brown, WPR, WHR, World Bank, OECD, Google | 52 statistical, epidemiological, machine- and deep-learning models | Climate, travel restrictions and curfews, population density, disease rates (lung, heart, diabetes), GDP spent on healthcare, air pollution, import data, Google trends | No |
| Pourghasemi et al | Iran | COVID-19 incidence and deaths | Iranian MOHME, Open Street Map, WorldClim | RF | Historical data, anthropogenic and climatic factors | No |
| Torrats-Espinosa | USA | COVID-19 incidence and death rate | Unspecified | Double-Lasso Regression | County-level demographics, density and potential for public interaction, social capital, health risk factors, capacity of the healthcare system, air pollution, employment in essential businesses, and political views | No |
| Zawbaa et al | Italy, USA, China, Japan, Iran, Egypt, Algeria, Kenya, Cote d’Ivoire | COVID-19 incidence and death rate | JHU, ECDC | ANN (MLP) | Average age, average weather temperature, BCG vaccination, malaria treatment | No |
| Cobb et al | USA | COVID-19 incidence | US local health departments, US Census | RF | SIP orders, county metrics | No |
| Galvan et al | Brazil | COVID-19 incidence and deaths | Brazil MoH, IBGE, SUS, BCB, ADHB | ANN (SOM) | Socioeconomic, health, and safety data | No |
| Hasan et al | Bangladesh | COVID-19 incidence | WHO, IEDCR, survey | LSTM, ANFIS, ANN (MLP) | Governing authorities, compliance, probability of infection and test positivity | No |
| Liu et al | China | COVID-19 incidence | China CDC, Baidu Search data, Media Cloud, GLEAM | Complete linkage hierarchical clustering, LASSO | Official health reports, COVID-19-related internet search activity, news media activity, daily forecasts of COVID-19 activity | No |
| Mehta et al | USA | COVID-19 incidence | NYT, CDC, GHDx | XGBoost | County-level population statistics, county-level disease rate and mortality | No |
| Pandit et al | Worldwide | COVID-19 mortality rate | WHO, GISAID | LogitBoost, AdaboostM1 | Age, SARS-CoV-2 clade information | No |
| Roy et al | USA | COVID-19 incidence and deaths | WPR, Wikipedia, KFF, AHRQ, Hud Exchange, Kaggle, Worldometer, Census Bureau, CDC, NYCOpenData | SVM, SGD, NC, DTs, Gaussian NB | Social, economic, environmental, demographic, ethnic, cultural and health factors | No |
| Sun et al | USA | COVID-19 incidence | Local DOH, CMS, LTCF, NICSHC | GB | Nursing home facility and community characteristics | Imputed using k-NN |
| Ye et al | USA | COVID-19 risk indices | WHO, CDC, Local DOH, Census Bureau, Google Maps, Reddit | cGAN, LSTM | Disease related data, demographic, mobility and social media data | No |
| Aydin et al | Worldwide | Performances against COVID-19 | Self-curated, Kaggle | k-means, hierarchical clustering | GDP, poverty index, population, stringency index, smoking rate, CVD death rate, diabetes prevalence | Imputed with mean values |
| Bird et al | Worldwide | COVID-19 risk | Worldometers, CIA, WHO | K% binning discretization, SVM, DT, GB, NB, LDA, QDA | Population, medical doctor density, tobacco use, obesity rate, GDP, land, migration, infant mortality, birth rate, death rate | No |
| Carrillo-Larco et al | Worldwide | COVID-19 incidence | JHU, GBD, UW, GHO, WHO | k-means | Historical data, diseases, environmental factors, sociodemographics, health system factors | No |
| Lai et al | USA | COVID-19 incidence | NYT, CDC, Census Bureau, USALEEP, | k-means | population census data, GIS data, business pattern censuses, and other sources | No |
Data that are heterogeneous in syntax, schema, and semantics.
Available at https://doi.org/10.7910/DVN/JHFOSE.
ADHB: Human Development Atlas of Brazil; AHRQ: Agency for Healthcare Research and Quality; ANFIS: adaptive neuro fuzzy inference system; ANN: artificial neural network; ARIMA: autoregressive integrated moving average; ARPA: Regional Environmental Protection Agency; BBC: British Broadcasting Corporation; BCB: Central Bank of Brazil; BCG: Bacillus Calmette–Guérin; BGFS-PNN: Broyden-Fletcher-Goldfarb-Shanno Optimized Polynomial Neural Network; CDC: Centers for Disease Control and Prevention; cGAN: conditional generative adversarial net; CIA: Central Intelligence Agency; CIES: Centre International d'Etude du Sport (International Centre for Sports Studies); CMA: China Meteorological Administration; CMS: Centers for Medicare and Medicaid Services; CNIC: Chinese National Influenza Center; CPC-NN: Multivariate clustering based partial curve nearest neighbor; CRC: Coronavirus Resource Center; CSSE: Center for Systems Science and Engineering; CVD: cardiovascular disease; DCP: Department of Civil Protection; DE: differential evolution algorithm; DNN: deep neural network; DOH: Departments of Health; QDA: quadratic discriminant analysis; DT: decision tree; ECDC: European Centre for Disease Prevention and Control; EDEM: Ensemble-based Dynamic Emission Model; EN: Elastic net; ENAC: Ente Nazionale per l'Aviazione Civile (Italian Civil Aviation Authority); EPA: Environmental Protection Agency; ESPN: Entertainment and Sports Programming Network; FAO: Food and Agriculture Organisation of the United Nations; GB: gradient boosting; GBD: global burden of disease; GDP: gross domestic product; GHDx: Global Health Data Exchange; GHO: Global Health Observatory; GHS: Global Health Security; GIS: geographical information systems; GLEAM: global epidemic and mobility model; GISAID: global initiative on sharing all influenza data; IBGE: Brazilian Institute of Geography and Statistics; IDIS: Infectious Disease Information System of China; IEDCR: Institute of Epidemiology, Disease Control and Research; JHU: Johns Hopkins University; KFF: Kaiser Family Foundation; KG: Köppen–Geiger climate classification; k-NN: k-nearest neighbors; LDA: linear discriminant analysis; LR: logistic regression; LSTM: long short-term memory; LTCF: long-term care focus; MLP: multilayer perceptron; MoH: Ministry of Health; MOHME: Ministry of Health and Medical Education; NB: Naïve Bayes; NBS: National Bureau of Statistics of China; NC: nearest centroid; NCAR: National Center for Atmospheric Research; NCEP: National Centers for Environmental Prediction; NHC: National Health Commission; NICSHC: National Investment Center for Seniors Housing and Care; NLP: natural language processing; NOAA: National Oceanic and Atmospheric Administration; NYT: New York Times; OECD: Organisation for Economic Co-operation and Development; OICA: Organisation Internationale des Constructeurs d'Automobiles (International Organization of Motor Vehicle Manufacturers); PC-NN: partial curve nearest neighbor; PCR: principal components regression; PLSR: partial least squares regression; PSO: particle swarm optimization algorithm; RF: random forest; RT: regression tree; SEIR: susceptible-exposed-infected-recovered model; SGD: stochastic gradient descent; SIP: shelter-in-place; SIR: susceptible-infected-recovered model; SOM: self-organizing maps; SUS: Sistema Único de Saúde (Brazil's publicly funded healthcare system); SVM: support vector machine; UN: United Nations; USALEEP: Small-Area Life Expectancy Estimates Project; USDA ERS: United States Department of Agriculture, Economic Research Service; UW: University of Washington; WEO: World Economic Outlook database; WHO: World Health Organization; WHR: World Health Rankings; WPR: world population review.
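The "Missing data imputation" column in the table above records simple strategies such as mean and median imputation. As a minimal sketch of what those two strategies do (not code from any of the listed studies; the function name `impute_column` and the sample values are hypothetical), missing entries in a numeric column are replaced by a summary statistic of the observed entries:

```python
from statistics import mean, median


def impute_column(values, strategy="mean"):
    """Replace None entries with the mean or median of the observed entries."""
    observed = [v for v in values if v is not None]
    if not observed:
        raise ValueError("cannot impute a column with no observed values")
    if strategy == "mean":
        fill = mean(observed)
    elif strategy == "median":
        fill = median(observed)
    else:
        raise ValueError(f"unknown strategy: {strategy!r}")
    return [fill if v is None else v for v in values]


# Hypothetical county-level feature with two missing entries (illustrative only).
feature = [8.0, None, 12.0, 10.0, None, 30.0]
mean_filled = impute_column(feature, "mean")
median_filled = impute_column(feature, "median")
```

With a skewed column like this one, the two strategies give different fill values, which is one reason the choice of imputation method is reported per study.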
Studies on medical imaging-based COVID-19 detection or prognosis using heterogeneous data
| Study | Region | Outcome | Data source | Model | Heterogeneous data | Missing data imputation |
|---|---|---|---|---|---|---|
| Cai et al | China | RT-PCR negativity | Single hospital | Unspecified DL, LR | CT image data, clinical data | Replaced by median |
| Cai et al | China | Need and duration of ICU, duration of oxygen inhalation, duration of hospitalization, duration of sputum NAT-positive, clinical prognosis | Single hospital | 3DQI platform, U-Net, RF | CT image data, clinical data | No |
| Chao et al | USA, Iran, Italy | ICU admission | 3 hospitals | DNN, RF | CT image data, demographics, vitals, lab data | Imputed by mean values |
| Chassagnon et al | France | COVID-19 staging and prognosis (mechanical ventilation) | 8 hospitals | CNN, DT, Linear SVM, XGBoosting, AdaBoost, Lasso | CT image data, clinical and biological markers | No |
| Cheng et al | China | Severe vs. nonsevere COVID-19 | Single hospital | CNN (uAI Discover-2019nCoV) | CT image data, clinical data | No |
| D'Ambrosia et al | USA | RT-PCR confirmed SARS-CoV-2 infection | Single hospital | BN, SC, DML, LR | Symptoms, local SARS-CoV-2 prevalence, CXR imaging, molecular diagnostic performance | No |
| Ebrahimian et al | USA, South Korea | Death vs. recovery, need for mechanical ventilation | Tertiary care hospitals | CNN (U-Net), LR | CXR image data, Demographics, Lab data | No |
| Fu et al | China | Stable vs progressive COVID-19 | Unspecified hospitals | SVM | CT image data, clinical and lab data | No |
| Grodecki et al | USA, Italy | Clinical deterioration vs death | 3 hospitals | CNN (U-Net), LR | CT image data, clinical data | No |
| Guo et al | China | COVID-19 vs seasonal flu | 2 hospitals | RF | CT image data, symptoms, blood tests, RT-PCR results | No |
| Hahm et al | South Korea | Worsening oxygenation event | Single hospital | DL software (MEDIP) | CT severity score, Demographics, Comorbidity, Lab data | No |
| Hermans et al | The Netherlands | COVID-19 positivity by RT-PCR | 2 hospitals | LR | CT image data, demographics, symptoms, vitals, lab | No |
| Ho et al | South Korea | Severe vs nonsevere COVID-19 | 5 hospitals | ANN, CNN, ACNN | CT image data, demographic, clinical, and lab data | No |
| Jeong et al | South Korea | Severe vs nonsevere COVID-19 | Single hospital | AI software (syngo.via Frontier) | CT severity score, demographics, symptoms, comorbidity, lab | No |
| Kimura-Sandoval et al | Mexico | Need mechanical ventilation, death | Single hospital | AI software (Siemens healthcare) | CT variables, demographics, clinical, lab | No |
| Lang et al | USA | Acute neuroimaging findings | Single hospital | Unspecified ML, LR | CT severity score, demographics, clinical data | No |
| Lassau et al | France | Severe vs nonsevere COVID-19 | 2 hospitals | CNN (EfficientNet-B0, ResNet50, U-Net), LR | CT variables, AI-severity score (5 clinical, biological variables) | Imputed with the average |
| Li et al | China | Severe vs nonsevere COVID-19 | Single hospital | CNN (U-net), RF, GB, XGBoost, LR, SVM | CT outcomes, clinical biochemical indexes | Imputed with mean values |
| Liu et al | China | COVID-19 vs. non-COVID-19 pneumonia | Single hospital | CT image software (pyradiomics), LR, LASSO | CT outcomes, clinical data | No |
| Mei et al | USA | COVID-19 positivity by RT-PCR | 18 hospitals | CNN, SVM, RF, MLP | CT findings, clinical symptoms, exposure history, Lab | No |
| Meng et al | China | Death within 14 days | 4 hospitals | CNN, LR | CT image features, clinical information | No |
| Mushtaq et al | Italy | Death, ICU admission | Single hospital | CNN (AI system qXR), Cox PH | CXR severity, demographics, clinical data | No |
| Ning et al | China | Morbidity, mortality | 2 hospitals | CNN, DNN, Ridge LR | CT features, 130 types of clinical features | No |
| Quiroz et al | China | Severe vs nonsevere COVID-19 | 2 hospitals | CNN (U-Net), LR, XGBoost | CT features, demographics, clinical data | Imputed with mean values |
| Salvatore et al | Italy | COVID-19 severity (discharge, hospitalization, ICU, or death) | Single hospital | AI tool (Thoracic VCAR), LR | CT parameters, clinical and lab data | No |
| Varble et al | China, Japan | Asymptomatic vs pre-symptomatic patients with SARS-CoV-2 | 2 hospitals | CNN (AH-Net), LASSO LR | CT characteristics, clinical and lab data | No |
| Xia et al | China | COVID-19 vs. influenza A/B | 2 hospitals | DNN | CXR and CT features, 56 clinical features | No |
| Xu et al | China | Healthy or COVID-19 pneumonia or non-COVID pneumonia | Single hospital | CNN, SVM, KNN, RF | CT features, 23 clinical features, 10 lab testing features | No |
| Xue et al | China | 4-level COVID-19 severity | Multiple hospitals | DSA-MIL, MA-CLR | LUC features, age, medical history, symptoms | No |
Data that are heterogeneous in syntax, schema, and semantics.
3DQI: 3D quantitative imaging; ACNN: artificial convolutional neural network; AI: artificial intelligence; ANN: artificial neural networks; BN: Bayesian inference network; CNN: convolutional neural network; DL: deep learning; DML: distance metric-learning; DNN: deep neural network; DSA-MIL: dual-level supervised attention-based multiple instance learning; DT: decision tree; GB: gradient boosting; ICU: intensive care unit; LR: logistic regression; LUC: lung ultrasound; MA-CLR: modality alignment contrastive learning of representation; ML: machine learning; MLP: multilayer perceptron; NAT: nucleic acid testing; RF: random forest; SC: Information-theoretic Set Cover; SVM: support vector machine.
Studies on COVID-19 detection or prognosis using heterogeneous data
| Study | Region | Outcome | Data source | Model | Heterogeneous data | Missing data imputation |
|---|---|---|---|---|---|---|
| Ahamad et al | China | Confirmed vs. suspected COVID-19 cases | Multiple hospitals | DT, RF, XGBoost, GB, SVM | Structured EHR data (Demographics, symptoms), Structured EHR data (Isolation treatment status, Travel history) | Imputed gender with random values based on male/female ratio; imputed age with random values within IQR |
| Langer et al | Italy | COVID-19 positivity by RT-PCR | Single hospital | ANN | Demographics, Comorbidity, Medications, Signs and Symptoms, Lab, Vitals, CXR | No |
| Martin et al | Worldwide | COVID-19 positivity | Literature (British Medical Journal) | AI system (Symptoma) | Keywords and symptoms, Age and sex, Symptom occurrence frequency rates, Country-specific disease incidences | No |
| Obinata et al | Japan | COVID-19 positivity by RT-PCR | 2 hospitals | RF | Demographics, Vitals, Lab, Symptoms, Contact history | No |
| Otoom et al | Worldwide | COVID-19 positivity | CORD-19 repository | SVM, ANN, NB, k-NN, decision table, decision stump, OneR, ZeroR | Symptoms, travel history to suspicious areas, contact history | No |
| Shimon et al | Israel | COVID-19 positivity | Multiple hospitals | CNN, SVM, RF | Voice samples (acoustic features), self-reported symptoms | No |
| Wintjens et al | The Netherlands | COVID-19 positivity by RT-PCR | Single hospital | ANN, RF, LR | Breath features (CO, NO2, VOC), clinical and demographic variables | No |
| Zoabi et al | Israel | COVID-19 positivity by RT-PCR | The Israeli Ministry of Health | GB | Demographics, clinical symptoms, known contact with an infected individual | No |
| Al-Najjar et al | South Korea | Mortality | KCDC | ANN | Demographics, infection reason and date | No |
| An et al | South Korea | Mortality | KNHIS | LASSO, SVM, RF, k-NN | Sociodemographic and medical information | No |
| Burian et al | Germany | ICU admission | 1 hospital | RF | Demographic, clinical, lab, and imaging data | Imputed with mean or mode |
| Cheng et al | USA | ICU transfer in 24 hours | 1 hospital | RF | Demographics, time-series of the admission–discharge–transfer events, clinical assessments, vital signs, lab and ECG results | Imputed with median value |
| Das et al | South Korea | Mortality | KCDC | LR, SVM, k-NN, RF, GB | Demographic and exposure features | No |
| Ge et al | China | Ventilator parameters | 1 hospital | Unspecified | Demographics, clinical data, Ventilator parameters | No |
| Haimovich et al | USA | Early respiratory decompensation | 8 EDs | RF, LASSO, GB, XGBoost | Demographics, medical histories, vitals, outpatient medications, chest radiograph reports, Lab | No |
| Hu et al | China | Mortality | 1 hospital | LR, PLS regression, EN, RF, bagged FDA | Demographics, CT features, lab | Imputed using bagging trees |
| Iwendi et al | Worldwide | Severity, recovery, death | Kaggle (WHO, JHU) | RF | Demographics, symptoms, travel data | No |
| Josephus et al | Worldwide | Mortality | Kaggle (WHO, JHU) | LR | Demographics, symptoms, travel data | Imputed (unspecified) |
| Li et al | Worldwide | Mortality | GitHub and Wolfram dataset | LR, RF, SVM | Demographics, location, symptoms, travel history, market exposure, chronic disease | No |
| Liang et al | China | ICU admission, requiring mechanical ventilation, death, etc | Chinese NHC | CPH, ANN | Demographic, clinical, lab, and imaging data | Imputed with multivariate imputation by chained equations |
| Ma et al | China | Mortality | 1 hospital | RF, XGBoost | Symptoms, comorbidity, demographic, vitals, CT scans results, lab | No |
| Metsker et al | Russia | Mortality | Russian government, Single hospital | ANN | Demographics, comorbidity, lab, treatment, travel history | No |
| Mountantonakis et al | USA | AF and mortality | 13 hospitals | NLP | Demographics, medical history, lab, NLP extracted atrial fibrillation | No |
| Nakamichi et al | USA | Hospitalization and mortality | Multiple hospitals | AdaBoost, ET, GB, RF | Demographics, comorbidity, SARS-CoV-2 sequence clades | Multiple imputation by chained equations |
| Neuraz et al | France | In-hospital mortality | 39 hospitals | NLP, Cox | Demographics, comorbidity, NLP extracted use of calcium channel blockers | No |
| Patel et al | USA | Severity | 3 hospitals | RF, ANN (MLP), SVM, GB, ET classifier, AdaBoost | Demographics, international travel, contact history, comorbidity, symptoms, blood panel profile | No |
| Planchuelo-Gómez et al | Spain | Headache | 1 hospital | GLM, PCA | Intensity and self-reported disability caused by headache, quality and topography of headache, migraine features, COVID-19 symptoms, lab | No |
| Schwartz et al | Canada | Mortality | iPHIS, CORES, The COD, CCMtool, CCM | NLP, LR | Demographics, comorbidities, symptoms, NLP extracted long-term care home exposure | Imputed by weekly median value |
| Wu et al | China, Italy, Belgium | ICU admission, death, etc | Multiple hospitals | RF, LR | Demographic, clinical, lab, and imaging data | No |
Data that are heterogeneous in syntax, schema, and semantics.
AF: atrial fibrillation; ANN: artificial neural networks; CCM: Public Health Case and Contact Management Solution; CCMtool: Middlesex-London COVID-19 Case and Contact Management tool; CO: carbon monoxide; COD: the Ottawa Public Health COVID-19 Ottawa Database; CORD-19: COVID-19 Open Research Dataset; CORES: Toronto Public Health Coronavirus Rapid Entry System; CPH: Cox proportional hazard; CT: computed tomography; CXR: chest x-ray; DT: decision tree; ECG: electrocardiogram; ED: emergency department; EHR: electronic health record; EN: elastic net; ET: extra trees; FDA: flexible discriminant analysis; GB: gradient boosting; GLM: generalized linear model; ICU: intensive care unit; iPHIS: integrated Public Health Information System; IQR: interquartile range; JHU: Johns Hopkins University; KCDC: Korea Centers for Disease Control and Prevention; KNHIS: Korean National Health Insurance Service; k-NN: k-nearest neighbors; LR: linear regression; MLP: multilayer perceptron; NB: Naïve Bayes; NHC: National Health Commission; NLP: natural language processing; NO2: nitrogen dioxide; PCA: principal component analysis; PLS: partial least squares; RBF: radial basis function; RF: random forest; SHAP: Shapley additive explanation; SVM: support vector machine; VOC: volatile organic compound; WHO: World Health Organization.
Other COVID-19 studies using heterogeneous data
| Study | Region | Outcome | Data source | Model | Heterogeneous data | Missing data imputation |
|---|---|---|---|---|---|---|
| Reese et al | N/A | Knowledge Graphs for COVID-19 Response | 13 knowledge sources | Traditional or graph-based ML | Scientific literature, COVID-19 cases and mortality, Drug, Genome sequence, Diseases, Chemicals | N/A |
| Franchini et al | Italy | Individualized COVID-19 risk | Survey, medical records | RF, SVM, GBM | Demographic, Health status, Other health and social information | No |
| Abdalla et al | USA | Social distancing | NYT, Census Bureau, USDA ERS, CDC, Google Community Mobility Reports | Elastic net | 43 socio-demographic variables | No |
Data that are heterogeneous in syntax, schema, and semantics.
CDC: Centers for Disease Control and Prevention; GBM: gradient boosting machine; ML: machine learning; NYT: New York Times; RF: random forest; SVM: support vector machine; USDA ERS: US Department of Agriculture Economic Research Service.