Trends and Prediction in Daily New Cases and Deaths of COVID-19 in the United States: An Internet Search-Interest Based Model.

Xiaoling Yuan, Jie Xu, Sabiha Hussain, He Wang, Nan Gao, Lanjing Zhang.

Abstract

BACKGROUND AND OBJECTIVES: The daily incidence and deaths of coronavirus disease 2019 (COVID-19) in the USA are poorly understood. Internet search interest was found to be correlated with COVID-19 daily incidence in China, but has not yet been applied to the USA. Therefore, we examined the association of internet search-interest with COVID-19 daily incidence and deaths in the USA.
METHODS: We extracted COVID-19 daily new cases and deaths in the USA from two population-based datasets, namely 1-point-3-acres.com and the Johns Hopkins COVID-19 data repository. The internet search-interest of COVID-19-related terms was obtained using Google Trends. The Pearson correlation test and general linear model were used to examine correlations and predict trends, respectively.
RESULTS: There were 636,282 new cases and 28,325 deaths of COVID-19 in the USA from March 1 to April 15, 2020, with a crude mortality of 4.45%. Daily new cases peaked at 35,098 on April 10, 2020, and daily deaths peaked at 2,494 on April 15, 2020. Search interest in "COVID," "COVID pneumonia," and "COVID heart" was correlated with COVID-19 daily incidence with a delay of 12 or 14 days (Pearson's r = 0.978, 0.978 and 0.979, respectively) and with daily deaths with a delay of 19 days (Pearson's r = 0.963, 0.958 and 0.970, respectively). In a 7-day follow-up with prospectively collected data, the observed daily new cases and daily deaths were not significantly correlated with the values predicted from search interest in "COVID," "COVID heart," or "COVID pneumonia."
CONCLUSIONS: Search terms related to COVID-19 are highly correlated with the COVID-19 daily new cases and deaths in the USA.

Keywords:  COVID-19; Incidence; Model; Pandemic; Search interest; Trend; USA

Year:  2020        PMID: 32348380      PMCID: PMC7176069          DOI: 10.14218/ERHM.2020.00023

Source DB:  PubMed          Journal:  Explor Res Hypothesis Med        ISSN: 2472-0712


Introduction

Coronavirus disease 2019 (COVID-19) has become a global pandemic [1–4] and has now affected more than 560,000 Americans [3,5]. Several attempts to model COVID-19 daily incidence in China have been successful [1,6]. However, the trends of daily incidence and deaths of COVID-19 in the USA are still poorly understood. Recently, internet search-interest was found to be correlated with the daily incidence of COVID-19 in China, with a lag time of 8 to 10 days [7]. Google search-interest has also been used to track or model COVID-19 trends in Europe, Iran, and Taiwan [8–10], and internet search-interest has been used to model and detect influenza epidemics in the USA and Australia [11,12]. We therefore aimed to examine the association of search-interest with daily incidence (new cases) and deaths of COVID-19 in the USA, using population-based data and a semiparametric model.

Methods

Daily new cases and new deaths of COVID-19 in the USA were extracted for modelling on April 9, 2020 from 1-point-3-acres.com [5] and the Johns Hopkins COVID-19 data repository [3]. We later obtained additional data from these sites to evaluate our models’ accuracies using Pearson’s correlation coefficients. We used a semiparametric model with two components. The parametric component predicts the daily new-case or new-death value from a given Google Trends search-interest value, based on their Pearson correlation. The second component assigns that predicted value to the date corresponding to the given search-interest value; because Google Trends search-interest has no finite-dimensional form as a function of time, this component is non-parametric. Data from the World Health Organization (WHO) Situation Reports appeared significantly inconsistent and thus were not used [13]. According to the 1-point-3-acres.com website, its data were extracted from various media and government websites, have been manually verified [5], and have been used by various parties, including the Johns Hopkins COVID-19 data repository, WHO, and many others. Because the study used publicly available, de-identified data without protected health information, it was exempt from Institutional Review Board approval (Category 4). We used Google Trends to extract search-interest data for COVID-19-related search terms over the search period of March 1 to April 7, 2020.
Based on COVID-19 symptoms, common terms for COVID-19, and common diseases in the USA, we chose the search terms “COVID-19,” “COVID,” “coronavirus,” “SARS-CoV2,” “pneumonia,” “high temperature,” “cough,” “COVID heart,” “COVID pneumonia,” and “COVID diabetes.” Google Trends search-interest represents search interest relative to the highest search-interest for a given time and region [7,12]. A value of 100 is the peak popularity for the term, while a score of 0 means there were not enough data for the term. We then examined the lag correlations of the terms’ search interests with COVID-19 daily new cases and deaths as described before [7], where the lag time was defined as the difference between a data point’s original corresponding time and the shifted one in the lag correlation study. The lag times of interest were up to 20 days for daily new cases and up to 23 days for daily deaths. The terms with the top-3 correlation coefficients were used to build respective generalized linear models. Based on these models, we used the existing search interests to predict future COVID-19 daily new cases and new deaths in the USA, which were then compared with the prospectively collected data to assess prediction accuracy. All statistical analyses were carried out using Stata (version 15). The models’ accuracies were assessed using Pearson’s r. All p values were two-sided, and p < 0.05 was considered statistically significant.
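The lag-correlation step can be sketched as follows. This is a minimal illustration in plain Python, not the authors' Stata code: the function names and the toy series are hypothetical, and it assumes both series are daily values starting on the same date, so that search interest on day t is paired with the outcome on day t + lag.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def lag_correlations(interest, outcome, max_lag):
    """Return {lag: r}, pairing interest[t] with outcome[t + lag]."""
    results = {}
    for lag in range(max_lag + 1):
        shifted = outcome[lag:]              # outcome observed `lag` days later
        n = min(len(interest), len(shifted))
        if n >= 3:                           # too few pairs are not informative
            results[lag] = pearson_r(interest[:n], shifted[:n])
    return results


def best_lag(interest, outcome, max_lag):
    """Return (lag, r) for the lag with the highest Pearson r."""
    rs = lag_correlations(interest, outcome, max_lag)
    return max(rs.items(), key=lambda item: item[1])


# Toy data (not from the study): the outcome trails interest by 3 days.
toy_interest = [1, 3, 7, 12, 20, 15, 9, 4, 2, 1, 1, 1]
toy_outcome = [0, 0, 0, 10, 30, 70, 120, 200, 150, 90, 40, 20]
print(best_lag(toy_interest, toy_outcome, max_lag=5))
```

Scanning lags up to 20 days (cases) or 23 days (deaths) and keeping the lag with the largest r mirrors how the reported "days earlier" values are described, under the pairing assumption stated above.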

Results

The Johns Hopkins data repository and 1-point-3-acres.com provided slightly different estimates of COVID-19 daily new cases and deaths in the USA, although they claimed to share data. The data for a given date in the 1-point-3-acres.com dataset varied by release date. Considering this inconsistency, we chose the Johns Hopkins data for modelling and the 1-point-3-acres.com data for a sensitivity study. There were 636,282 new cases and 28,325 deaths of COVID-19 reported in the USA from March 1 to April 15, 2020, with a crude mortality of 4.45%. Daily new cases peaked at 35,098 on April 10, 2020, and daily deaths peaked at 2,494 on April 15, 2020. Google Trends search-interest had a 2-day delay in reporting (i.e. a search on April 9 yielded data up to April 7). “COVID-19” had a much lower search-interest score than “COVID” (Fig. 1) and, given its close relationship with “COVID,” was excluded from further analysis. As reported before, the correlation coefficients of the search terms changed with lag time (Fig. 2). Among the nine terms analyzed, “COVID,” “COVID pneumonia,” and “COVID heart” had the top-3 correlation coefficients for daily incidence and new deaths (Table 1). Our predicted COVID-19 daily new cases and new deaths plateaued for about 12 days (Fig. 3), suggesting a possible 12-day plateau of these epidemiologic parameters in the future.
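Because search interest runs ahead of the outcome, the forecast horizon is bounded by the lag: search interest that is already observed can be mapped to outcome dates that have not yet occurred. The sketch below illustrates this under stated assumptions: a single-predictor least-squares line (an identity-link, Gaussian special case of a generalized linear model, which the text does not specify), hypothetical function names, and toy numbers rather than study data.

```python
def fit_line(x, y):
    """Ordinary least-squares intercept and slope for y = a + b * x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((v - mean_x) ** 2 for v in x)
    sxy = sum((v - mean_x) * (w - mean_y) for v, w in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def forecast_from_interest(interest, outcome, lag):
    """Fit outcome[t + lag] ~ interest[t], then predict the days after the
    last observed outcome from search interest that is already known.

    Assumes both series start on the same date and that `interest` covers
    at least len(outcome) - lag days.
    """
    n_pairs = len(outcome) - lag           # days with both values observed
    intercept, slope = fit_line(interest[:n_pairs], outcome[lag:])
    future_interest = interest[n_pairs:]   # no matching outcome observed yet
    return [intercept + slope * v for v in future_interest]


# Toy data (not from the study): outcome[t + 3] = 5 + 2 * interest[t].
toy_interest = [2, 4, 6, 8, 10, 12, 14, 16]
toy_outcome = [1, 1, 1, 9, 13, 17, 21, 25]
print(forecast_from_interest(toy_interest, toy_outcome, lag=3))
```

The number of predicted days equals the number of search-interest values that have no matching outcome yet, so it can never exceed the lag.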
Fig. 1

Trends in search-interest of COVID-19-related terms.

The numbers represented the search-interest relative to the term of the highest search-interest in the USA from March 1 to April 7, 2020.

Fig. 2

Lag correlations between Google Trends search-interest of the terms “COVID,” “COVID heart,” “COVID pneumonia,” and others, and the daily new cases and deaths of COVID-19 in the USA, March 1 to April 8, 2020.

(a, c) The search terms with the highest Pearson’s correlation coefficients for daily new cases and new deaths, respectively; (b, d) The rest of the search terms.

Table 1

The search term of the top-3 correlation coefficients for correlations with COVID-19 daily incidence and deaths, March 1 to April 8, 2020

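Johns Hopkins Data Repository

| Search term | Daily new cases: days earlier | r (a) | p | Daily new deaths: days earlier | r (a) | p |
|---|---|---|---|---|---|---|
| COVID heart | 12 | 0.979 | <0.001 | 19 | 0.970 | <0.001 |
| COVID pneumonia | 14 | 0.978 | <0.001 | 19 | 0.958 | <0.001 |
| COVID | 12 | 0.978 | <0.001 | 19 | 0.963 | <0.001 |
| Cough | 19 | 0.932 | <0.001 | 20 | 0.923 | <0.001 |
| Coronavirus | 19 | 0.914 | <0.001 | 23 | 0.905 | <0.001 |
| Pneumonia | 19 | 0.848 | <0.001 | 22 | 0.854 | <0.001 |
| COVID diabetes | 18 | 0.821 | <0.001 | 19 | 0.816 | <0.001 |
| SARS-CoV2 | 18 | 0.814 | <0.001 | 22 | 0.877 | <0.001 |
| High temperature | 17 | 0.681 | <0.001 | 22 | 0.641 | 0.006 |

1-point-3-acres.com

| Search term | Daily new cases: days earlier | r (a) | p | Daily new deaths: days earlier | r (a) | p |
|---|---|---|---|---|---|---|
| COVID heart | 12 | 0.982 | <0.001 | 19 | 0.977 | <0.001 |
| COVID pneumonia | 12 | 0.977 | <0.001 | 19 | 0.967 | <0.001 |
| COVID | 13 | 0.973 | <0.001 | 20 | 0.972 | <0.001 |
| Cough | 19 | 0.935 | <0.001 | 20 | 0.945 | <0.001 |
| Coronavirus | 19 | 0.909 | <0.001 | 22 | 0.925 | <0.001 |
| Pneumonia | 19 | 0.832 | <0.001 | 22 | 0.897 | <0.001 |
| COVID diabetes | 18 | 0.812 | <0.001 | 19 | 0.801 | <0.001 |
| SARS-CoV2 | 18 | 0.805 | <0.001 | 22 | 0.856 | <0.001 |
| High temperature | 16 | 0.667 | <0.001 | 22 | 0.650 | 0.005 |

(a) The highest correlation coefficient for a given search term across the lag times examined.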

Fig. 3

Google Trends search-interest and the trends in COVID-19 daily new cases and new deaths in the USA, March 1 to April 15, 2020.

(a–c) The search-interests of “COVID,” “COVID heart,” and “COVID pneumonia” in Google Trends were 12 to 13 days lagged from COVID-19 daily new cases/incidence (Pearson’s r = 0.977, 0.982 and 0.973, respectively, p < 0.001 for all). (d–f) The search interests of “COVID,” “COVID heart,” and “COVID pneumonia” in Google Trends were 19 to 20 days lagged from COVID-19 daily new deaths (Pearson’s r = 0.967, 0.977 and 0.972, respectively, p < 0.001 for all). Note that d12, d14 and d19 indicate that the trend curves were shifted by 12, 14 and 19 days, respectively, to compensate for lag time. The 7-day follow-up with prospectively collected data showed no significant correlations of the observed data with the daily new cases predicted from search interest of “COVID,” “COVID heart,” and “COVID pneumonia” (p = 0.178, 0.480 and 0.094, respectively), or with the predicted daily new deaths (p = 0.267, 0.222 and 0.841, respectively).

The sensitivity study using the 1-point-3-acres data yielded correlation coefficients similar to those produced using the Johns Hopkins data (Table 1). The 7-day follow-up with prospectively collected data showed no significant correlations of the observed data with the predicted daily new cases using search-interest of “COVID,” “COVID heart,” and “COVID pneumonia” (p = 0.178, 0.480 and 0.094, respectively), nor with the predicted daily new deaths (p = 0.267, 0.222 and 0.841, respectively).
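To put the non-significant follow-up p values in context, it helps to see how large a correlation must be to reach significance with so few points. The worked relation below assumes the 7-day follow-up compares seven observed and predicted pairs (n = 7, which the text implies but does not state) and the standard two-sided t test for Pearson's r at the 0.05 level, for which the critical t value with 5 degrees of freedom is 2.571.

```latex
t = \frac{r\sqrt{n-2}}{\sqrt{1-r^{2}}}, \qquad \mathrm{df} = n - 2
\quad\Longrightarrow\quad
|r|_{\mathrm{crit}} = \frac{t_{\mathrm{crit}}}{\sqrt{t_{\mathrm{crit}}^{2} + n - 2}}
= \frac{2.571}{\sqrt{2.571^{2} + 5}} \approx 0.75
```

Under these assumptions, an observed-versus-predicted correlation below about 0.75 in absolute value would not be significant over a 7-day window, so the follow-up has limited power to confirm even a moderately good prediction.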

Discussion

This population-based study shows that there were 636,282 new cases and 28,325 deaths of COVID-19 reported in the USA from March 1 to April 15, 2020. It also shows that search-interest in “COVID,” “COVID pneumonia,” and “COVID heart” was highly correlated with COVID-19 daily new cases and new deaths, with delays of 12 days and 19 days, respectively. However, the prediction accuracy of these models appeared low during a 7-day follow-up. To our knowledge, this study provides, for the first time, evidence that search-interest pertinent to COVID-19 is highly correlated with the trends in COVID-19 daily new cases and new deaths in the USA. The approximately 7-day difference in lag time between daily new cases and deaths suggests the possibility of a 7-day interval between COVID-19 diagnosis and death in some patients. Additional studies are warranted to investigate this hypothesis. Our findings enable us to model daily new cases and deaths in the USA during the early phase (March 1 to April 8) of the COVID-19 outbreak and may help prevent and prepare for any upcoming pandemic and future burdens of COVID-19.
The 12-day lag time in the USA shown here was longer than the previously reported 9 days in China [7]. Several factors may contribute to this difference, and each warrants additional study. First, there was a significant delay in testing for COVID-19 in the USA [14], which might lead to a longer lag time between the trends of search-interest and daily incidence. Second, the U.S. Centers for Disease Control and Prevention (CDC) recommended a priority-based testing strategy and allowed for not testing some subjects considered low-priority when COVID-19 tests were in short supply [15]. The criteria for COVID-19 testing in the USA therefore differed from those in China and Europe, where the WHO criteria were adopted [16–18]. Thus, patients who met the WHO criteria may not have been tested and subsequently not included in the daily incidence in the USA, which could lead to underreporting of daily incidence. Third, biological and socioeconomic differences between patients in the USA and China may also contribute to the difference. Finally, the prevalent COVID-19 subtypes in the USA may differ from those in China and result in different lag times [19].
This study provides several lines of valuable evidence. First, COVID-19 daily new deaths in the USA are poorly understood, and are here described and studied using a semiparametric model. Second, we examined nine COVID-19-related search terms, more than the two used in a previous study [7]. Our data also suggest that pneumonia and heart problems were highly relevant to the daily new cases and deaths in the USA. This finding may be explained by the frequent pneumonia and cardiac injuries seen in COVID-19 patients [20,21]. Third, the lag time in our study was longer than that previously reported in China (12 days vs. 9 days). However, the 12- and 19-day lag times also afforded us the opportunity to assess a model’s prediction accuracy over a longer period of future trends. Fourth, comparing predicted values with prospectively collected data reduces recall and selection biases. We will continue updating the models’ accuracies as more data become available (see https://github.com/thezhanglab/COVID-US-google). Indeed, we found very high correlation in retrospective modelling but low accuracy in prediction, suggesting that the search-interest based model may be more helpful in predicting the daily-incidence peak or early outbreak than post-peak or post-intervention trends. The unexpectedly low accuracy of model prediction was due to significant attenuation of the trend plateau. It may be linked to the April 3 recommendation of wearing masks by the U.S. CDC [22], which was 5 days before our model’s peak time and matched COVID-19’s median incubation time of 5 days [20]. Finally, to our knowledge, we are the first to examine the correlations of search interest with COVID-19 daily new cases and deaths in the USA, and we show greater correlations (Pearson’s r > 0.97) than those reported for the Chinese data [7].
This study is limited by the retrospective nature of the modeling part and may have related biases. Moreover, due to the different testing strategies and criteria used in the USA and other countries [15–18], comparisons of our findings with those of other countries should be interpreted with caution. Finally, the data from the Johns Hopkins data repository were not independently validated or authenticated. However, our sensitivity study using the 1-point-3-acres data confirmed a similar correlation of search-interest with COVID-19 daily new cases and deaths in the USA.

Future directions

Despite the high correlation coefficients in the retrospective modeling, the prediction models based on the search-interest trend had low accuracy during a 7-day follow-up. Additional studies are warranted to understand and improve these models, including why the prediction model failed. The April 3 CDC recommendation expanding the indications for mask use might be one of the reasons. Finally, the factors linked to the lag time revealed by this study, and its epidemiological significance, should also be further explored.

Conclusions

This population-based observational study shows that search terms related to COVID-19 are highly correlated with the trends in daily new cases and new deaths of COVID-19 in the USA. Therefore, an internet search-interest based model may be used to predict the development and peak time of a COVID-19 outbreak.
  10 in total

1.  Interpreting Google flu trends data for pandemic H1N1 influenza: the New Zealand experience.

Authors:  N Wilson; K Mason; M Tobias; M Peacey; Q S Huang; M Baker
Journal:  Euro Surveill       Date:  2009-11-05

2.  Association of Cardiac Injury With Mortality in Hospitalized Patients With COVID-19 in Wuhan, China.

Authors:  Shaobo Shi; Mu Qin; Bo Shen; Yuli Cai; Tao Liu; Fan Yang; Wei Gong; Xu Liu; Jinjun Liang; Qinyan Zhao; He Huang; Bo Yang; Congxin Huang
Journal:  JAMA Cardiol       Date:  2020-07-01       Impact factor: 14.676

3.  Detecting influenza epidemics using search engine query data.

Authors:  Jeremy Ginsberg; Matthew H Mohebbi; Rajan S Patel; Lynnette Brammer; Mark S Smolinski; Larry Brilliant
Journal:  Nature       Date:  2009-02-19       Impact factor: 49.962

4.  Tracking COVID-19 in Europe: Infodemiology Approach.

Authors:  Amaryllis Mavragani
Journal:  JMIR Public Health Surveill       Date:  2020-04-20

5.  Early Transmission Dynamics in Wuhan, China, of Novel Coronavirus-Infected Pneumonia.

Authors:  Qun Li; Xuhua Guan; Peng Wu; Xiaoye Wang; Lei Zhou; Yeqing Tong; Ruiqi Ren; Kathy S M Leung; Eric H Y Lau; Jessica Y Wong; Xuesen Xing; Nijuan Xiang; Yang Wu; Chao Li; Qi Chen; Dan Li; Tian Liu; Jing Zhao; Man Liu; Wenxiao Tu; Chuding Chen; Lianmei Jin; Rui Yang; Qi Wang; Suhua Zhou; Rui Wang; Hui Liu; Yinbo Luo; Yuan Liu; Ge Shao; Huan Li; Zhongfa Tao; Yang Yang; Zhiqiang Deng; Boxi Liu; Zhitao Ma; Yanping Zhang; Guoqing Shi; Tommy T Y Lam; Joseph T Wu; George F Gao; Benjamin J Cowling; Bo Yang; Gabriel M Leung; Zijian Feng
Journal:  N Engl J Med       Date:  2020-01-29       Impact factor: 176.079

6.  Nowcasting and forecasting the potential domestic and international spread of the 2019-nCoV outbreak originating in Wuhan, China: a modelling study.

Authors:  Joseph T Wu; Kathy Leung; Gabriel M Leung
Journal:  Lancet       Date:  2020-01-31       Impact factor: 79.321

7.  Predicting COVID-19 Incidence Through Analysis of Google Trends Data in Iran: Data Mining and Deep Learning Pilot Study.

Authors:  Seyed Mohammad Ayyoubzadeh; Seyed Mehdi Ayyoubzadeh; Hoda Zahedi; Mahnaz Ahmadi; Sharareh R Niakan Kalhori
Journal:  JMIR Public Health Surveill       Date:  2020-04-14

8.  An interactive web-based dashboard to track COVID-19 in real time.

Authors:  Ensheng Dong; Hongru Du; Lauren Gardner
Journal:  Lancet Infect Dis       Date:  2020-02-19       Impact factor: 25.071

9.  Applications of Google Search Trends for risk communication in infectious disease management: A case study of the COVID-19 outbreak in Taiwan.

Authors:  Atina Husnayain; Anis Fuad; Emily Chia-Yu Su
Journal:  Int J Infect Dis       Date:  2020-03-12       Impact factor: 3.623

10.  Retrospective analysis of the possibility of predicting the COVID-19 outbreak from Internet searches and social media data, China, 2020.

Authors:  Cuilian Li; Li Jia Chen; Xueyu Chen; Mingzhi Zhang; Chi Pui Pang; Haoyu Chen
Journal:  Euro Surveill       Date:  2020-03