Zhimin Liu, Zuodong Jiang, Geoffrey Kip, Kirti Snigdha, Jennings Xu, Xiaoying Wu, Najat Khan, Timothy Schultz.
Abstract
The outbreak of the novel coronavirus SARS-CoV-2 has caused a health crisis of immeasurable magnitude. Signals from heterogeneous public data sources could serve as early predictors of infection waves, particularly in the early phases of the pandemic, when infection data were scarce. In this article, we characterize temporal pandemic indicators by leveraging an integrated set of public data and apply them to a Prophet model to predict COVID-19 trends. An effective natural language processing pipeline was first built to extract time-series signals of specific articles from a news corpus. Bursts in these temporal signals were then identified with Kleinberg's burst detection algorithm. Across different US states, Google Trends for COVID-19-related terms, COVID-19 news volume, and publicly available wastewater SARS-CoV-2 measurements were generally highly correlated with weekly COVID-19 case numbers, with lags ranging from 0 to 3 weeks, indicating that they are strong predictors of viral spread. Incorporating the time-series signals of these predictors significantly improved the performance of the Prophet model, which predicted COVID-19 case numbers one and two weeks ahead with average mean absolute error rates of 0.38 and 0.46, respectively, across different states.
Keywords: Google trends; Infodemiology; News mining; Prophet model; Signal burst model; Word2Vec
Year: 2022 PMID: 35496673 PMCID: PMC9040481 DOI: 10.1016/j.patrec.2022.04.030
Source DB: PubMed Journal: Pattern Recognit Lett ISSN: 0167-8655 Impact factor: 4.757
Notations used in this article.
| Parameters | Notation Used |
|---|---|
| Cosine similarity score | |
| Vector representing each news | |
| Vector representing the target topic | |
| Probability of baseline state | |
| Probability of bursty state | |
| Sum of daily target news numbers | |
| Sum of all daily news numbers | |
| Constant to calculate the bursty state probability | |
| Difference between observation and expectation | |
| State | |
| Week | |
| Number of target news of the week | |
| Number of all news of the week | |
| Transition cost | |
| Difficulty of transitioning to higher states | |
| Number of weeks | |
| Optimal state sequence | |
| Burst weight | |
| Mean absolute error | |
| Mean absolute percentage error | |
| Total number of weeks in the test data | |
| Week in the test data | |
| Actual value of the week | |
| Predicted value of the week | |
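The burst-related entries in the table (baseline and bursty state probabilities, transition cost, optimal state sequence) match the quantities of a two-state Kleinberg burst model. The following is a minimal sketch of that model, not the authors' implementation. The symbols are missing from the table above, so every name here (`r`, `d`, `s`, `gamma`) and both default values are assumptions chosen for illustration: `r` holds the weekly target news counts, `d` the weekly total news counts, `s` the constant that scales the baseline probability into the bursty one, and `gamma` the difficulty of moving to the higher state.

```python
import math


def kleinberg_two_state(r, d, s=2.0, gamma=1.0):
    """Return the minimum-cost state sequence (0 = baseline, 1 = bursty).

    r[t]: number of target news items in week t (assumed name)
    d[t]: number of all news items in week t (assumed name)
    s, gamma: illustrative defaults, not values taken from the article
    """
    n = len(r)
    p0 = sum(r) / sum(d)            # baseline probability of a target item
    if not 0.0 < p0 < 1.0:
        raise ValueError("baseline probability must lie strictly in (0, 1)")
    p1 = min(s * p0, 0.9999)        # bursty probability, capped below 1
    p = (p0, p1)

    def fit_cost(state, r_t, d_t):
        # Negative log-likelihood of r_t target items out of d_t (binomial).
        log_binom = (math.lgamma(d_t + 1) - math.lgamma(r_t + 1)
                     - math.lgamma(d_t - r_t + 1))
        return -(log_binom + r_t * math.log(p[state])
                 + (d_t - r_t) * math.log(1.0 - p[state]))

    def transition_cost(i, j):
        # Moving to a higher state costs gamma * ln(n); moving down is free.
        return gamma * math.log(n) * (j - i) if j > i else 0.0

    # Viterbi-style dynamic programme; the sequence starts in the baseline.
    cost = [0.0, math.inf]
    back = []
    for t in range(n):
        new_cost, pointers = [], []
        for j in (0, 1):
            candidates = [cost[i] + transition_cost(i, j) for i in (0, 1)]
            best = 0 if candidates[0] <= candidates[1] else 1
            new_cost.append(candidates[best] + fit_cost(j, r[t], d[t]))
            pointers.append(best)
        cost = new_cost
        back.append(pointers)

    # Trace the cheapest path backwards from the final week.
    state = 0 if cost[0] <= cost[1] else 1
    sequence = [state]
    for t in range(n - 1, 0, -1):
        state = back[t][state]
        sequence.append(state)
    sequence.reverse()
    return sequence


weekly_target = [1, 1, 1, 9, 10, 9, 1, 1]   # made-up counts
weekly_total = [20] * 8                      # made-up counts
print(kleinberg_two_state(weekly_target, weekly_total))
```

Weeks labelled 1 form a burst. Raising `gamma` makes the model more reluctant to enter the bursty state, so short spikes are more likely to be ignored.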
Fig. 1. Search for “school reopen” related news with an NLP pipeline using Word2Vec embeddings. (A) Detailed procedure for identifying “school reopen” related COVID-19 news. (B) The times of occurrence of 13 clusters of news, each of which is represented by its two most frequent words. (C) The distribution of cosine similarity scores with “school reopen” in each cluster; red asterisks represent the mean. (D) Examples of “school reopen” related COVID-19 news that pass the threshold (0.5104). Blue and red mark the news items with the largest and smallest cosine similarity scores, respectively.
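The thresholding step in panel (D) can be illustrated with a small sketch. It assumes that each news item and the target topic are represented by the mean of their word vectors, and that an item passes when its score is at least the threshold; neither detail is stated in this record. The three-dimensional vectors stand in for real Word2Vec embeddings, and the helper names are hypothetical.

```python
import math

THRESHOLD = 0.5104  # threshold quoted in the Fig. 1 caption


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def mean_vector(word_vectors):
    """Average word vectors into one vector (an assumed pooling choice)."""
    n = len(word_vectors)
    return [sum(column) / n for column in zip(*word_vectors)]


def filter_news(news_vectors, topic_vector, threshold=THRESHOLD):
    """Keep the titles whose similarity to the topic reaches the threshold."""
    return [title for title, vector in news_vectors.items()
            if cosine_similarity(vector, topic_vector) >= threshold]


# Toy embeddings for the two words of the target topic "school reopen".
topic = mean_vector([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])

# Toy news vectors keyed by made-up titles.
news = {
    "districts plan to reopen schools": [1.0, 1.0, 0.0],
    "stock markets rally": [0.0, 0.0, 1.0],
    "schools weigh remote learning": [1.0, 0.0, 0.0],
}
print(filter_news(news, topic))
```

With real embeddings the vectors would come from a trained Word2Vec model rather than being written by hand.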
Fig. 2. Align various signals with COVID-19 case numbers. (A) Signals extracted from various data sources to track COVID-19 cases. (B) Several signals correlate well with COVID-19 case numbers in Massachusetts. r represents the Spearman's rank correlation coefficient.
Fig. 3. Lead-lag correlation of weekly COVID-19 cases with some representative signals across different states. (A-F) Correlation coefficients with different offsets between two time-series measurements. Each line represents a state. (G) Boxplot of offsets that generate the maximum correlation coefficient across 19 states. Similar signals are labeled with the same color. The purple dashed line represents an offset of 0. The red dashed line represents the median offset across different states.
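A lead-lag analysis of this kind can be sketched as follows: shift one weekly series against the other, compute Spearman's rank correlation at each offset, and take the offset with the largest coefficient. This is an illustrative sketch only. The sign convention (a positive offset means the signal leads the case counts), the offset range, and the function names are assumptions, and the data are made up.

```python
import math


def ranks(values):
    """1-based ranks; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average
        i = j + 1
    return result


def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx and sy else 0.0


def spearman(x, y):
    """Spearman's rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def lead_lag(signal, cases, max_offset=3):
    """Correlate signal[t] with cases[t + k] for k in -max_offset..max_offset.

    Both series are assumed to have the same length and weekly spacing.
    """
    result = {}
    for k in range(-max_offset, max_offset + 1):
        if k >= 0:
            s, c = signal[:len(signal) - k], cases[k:]
        else:
            s, c = signal[-k:], cases[:len(cases) + k]
        result[k] = spearman(s, c)
    return result


# Made-up weekly signal; the case series repeats it two weeks later.
signal = [3, 1, 4, 15, 9, 2, 6, 5, 35, 8, 97, 93]
cases = [0, 0] + signal[:-2]
correlations = lead_lag(signal, cases)
best_offset = max(correlations, key=correlations.get)
print(best_offset)
```

Where SciPy is available, `scipy.stats.spearmanr` can replace the hand-written `spearman` helper.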
Fig. 4. Predicting future COVID-19 cases with selected metrics using the Prophet model. (A) Mean absolute percentage errors (MAPEs) of predicting COVID-19 cases one week ahead with different Prophet models in each state. (B) Boxplot of the MAPEs in (A) grouped by Prophet model. (C) Barplot of MAPEs in predicting COVID-19 cases one week ahead using the Prophet model that incorporates different metrics, including wastewater COVID-19 measurements. (D) Boxplot of MAPEs of predicting COVID-19 cases 1, 2, 3, and 4 weeks ahead across different states using the Prophet model with all of the above metrics. Green triangles and numbers in (B) and (D) represent the mean MAPE within each distribution.
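The two error metrics listed in the notation table can be computed as in the sketch below, using their standard definitions over the weeks of the test data. Returning MAPE as a fraction rather than a percentage is an assumption, as are the function names and the example numbers.

```python
def mae(actual, predicted):
    """Mean absolute error over the weeks in the test data."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("series must be non-empty and of equal length")
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def mape(actual, predicted):
    """Mean absolute percentage error, returned as a fraction (0.1 = 10%)."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("series must be non-empty and of equal length")
    if any(a == 0 for a in actual):
        raise ValueError("MAPE is undefined when an actual value is zero")
    return sum(abs(a - p) / abs(a)
               for a, p in zip(actual, predicted)) / len(actual)


# Made-up weekly case counts and forecasts for three test weeks.
actual_cases = [100, 200, 400]
forecast = [110, 180, 400]
print(mae(actual_cases, forecast), mape(actual_cases, forecast))
```

MAPE is scale-free, which makes it convenient for comparing states with very different case counts, but it is undefined for weeks with zero actual cases.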