Nicolò Gozzi, Daniela Perrotta, Daniela Paolotti, Nicola Perra.
Abstract
In this work, we aim to determine the main factors driving self-initiated behavioral changes during the seasonal flu. To this end, we designed and deployed a questionnaire via Influweb, a Web platform for participatory surveillance in Italy, during the 2017-18 and 2018-19 seasons. We collected 599 surveys completed by 434 users. The data provide socio-demographic information, level of concern about the flu, past experience with illnesses, and the type of behavioral changes voluntarily implemented by each participant. We describe each response with a set of features and divide the responses into three target categories. These describe participants who report (i) no (26%), (ii) only moderate (36%), or (iii) significant (38%) changes in behavior. In this setting, we adopt machine learning algorithms to investigate the extent to which the target variables can be predicted by looking only at the set of features. Notably, 66% of the samples in the category describing more significant changes in behavior are correctly classified by Gradient Boosted Trees. Furthermore, we investigate the importance of each feature in the classification task and uncover complex relationships between individuals' characteristics and their attitude towards behavioral change. We find that the intensity and recency of past illnesses, the perceived susceptibility to an infection, and the perceived severity of an infection are the most significant features in the classification task and are associated with significant changes in behavior. Overall, the research contributes to the small set of empirical studies devoted to the data-driven characterization of behavioral changes induced by infectious diseases.
Year: 2020 PMID: 32401809 PMCID: PMC7250468 DOI: 10.1371/journal.pcbi.1007879
Source DB: PubMed Journal: PLoS Comput Biol ISSN: 1553-734X Impact factor: 4.475
Fig 1. Histograms of the number of behaviors changed for the class of social distancing and for the class of moderate change.
The two histograms look significantly different, with that of social distancing much more skewed towards higher values.
List of features.
| Type | Feature | Meaning |
|---|---|---|
| | | gender |
| | | age class (<15, 15-30, 30-50, 50-65, 65+ years) |
| | | true if the individual has daily contacts with large groups, patients, or children |
| | | true if the individual smokes regularly |
| | | true if the individual follows a special diet |
| | | true if the individual has school-age children |
| | | true if the individual regularly takes public transportation |
| | | true if the individual has elderly people (65+) in their household |
| | | frequency of flu-like illness |
| | | true if the individual had flu in the current season |
| | | measure of severity of diseases experienced in the past |
| | | true if the individual has received a vaccine in the current season |
| | | true if the individual has received a vaccine in the previous season |
| | | true if the individual has allergies that can cause respiratory problems |
| | | true if the individual regularly receives medication for chronic diseases |
| | | true if the individual regularly seeks information regarding the flu |
| | | self-evaluation of the level of information regarding the flu |
| | | true if the individual thinks that proactive measures can prevent the contagion |
| | | measure of anxiety deriving from a possible contagion |
| | | measure of awareness of the efficacy of behavioral measures; it takes integer values between 0 and +8, and the higher the value, the more the individual believes that behavioral change can lower the risk of an infection |
| | | measure of concern related to the possibility of contagion |
| | | days between the ILI peak and the date of completion of the behavioral survey |
| | | flu prevalence in the Italian region where the participant resides |
Classification performance.
| model | precision | bal. accuracy | recall | f1 score |
|---|---|---|---|---|
| RND | 0.343 | 0.335 | 0.334 | 0.335 |
| SVM | 0.519 | 0.503 | 0.500 | 0.504 |
| LG | 0.479 | 0.492 | 0.478 | 0.472 |
| RF | 0.506 | 0.498 | 0.506 | 0.505 |
| GBT | | | | |
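The four scores in the table can be derived from the true and predicted labels of a three-class problem. The following is a minimal sketch, assuming macro averaging (an unweighted mean over classes) and assuming that balanced accuracy is the mean of the per-class recalls; the averaging scheme behind the table is not stated in this record, and the labels below are invented for illustration.

```python
def macro_metrics(y_true, y_pred, labels):
    """Macro-averaged precision, recall, F1 and balanced accuracy.

    Assumption: every score is an unweighted mean over classes, so
    balanced accuracy coincides with macro recall in this sketch.
    """
    precisions, recalls, f1s = [], [], []
    pairs = list(zip(y_true, y_pred))
    for c in labels:
        tp = sum(1 for t, p in pairs if t == c and p == c)  # true positives
        fp = sum(1 for t, p in pairs if t != c and p == c)  # false positives
        fn = sum(1 for t, p in pairs if t == c and p != c)  # false negatives
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        precisions.append(prec)
        recalls.append(rec)
        f1s.append(f1)
    n = len(labels)
    return {
        "precision": sum(precisions) / n,
        "recall": sum(recalls) / n,
        "f1": sum(f1s) / n,
        "balanced_accuracy": sum(recalls) / n,
    }


# Toy labels (hypothetical, not the study's data):
# 0 = no change, 1 = moderate change, 2 = significant change.
y_true = [0, 0, 1, 1, 1, 2, 2, 2, 2]
y_pred = [0, 1, 1, 1, 2, 2, 2, 2, 0]

scores = macro_metrics(y_true, y_pred, labels=[0, 1, 2])
for name, value in scores.items():
    print(f"{name}: {value:.3f}")
```

With three classes of similar size, a classifier that guesses at random is expected to score close to 1/3 on each of these measures, which is consistent with the RND row.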
Fig 2. Confusion matrix of GBT.
Each row and column represents a particular class: the vertical axis shows the true labels, while the horizontal axis shows the predicted labels. Hence, the boxes on the main diagonal give the percentage of samples correctly labeled for each class, and the off-diagonal boxes give the percentage of misclassifications for each possible pair of classes.
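A confusion matrix of this kind can be sketched in a few lines. The example below assumes that percentages are normalized per true class (each row sums to 100), which is one reading of the caption; the function name, class names, and labels are hypothetical and are not taken from the study.

```python
from collections import Counter


def row_normalized_confusion(y_true, y_pred, labels):
    """Return a matrix whose entry [i][j] is the percentage of samples
    with true label labels[i] that were predicted as labels[j]."""
    counts = Counter(zip(y_true, y_pred))
    matrix = []
    for t in labels:
        row_total = sum(counts[(t, p)] for p in labels)
        row = [
            100.0 * counts[(t, p)] / row_total if row_total else 0.0
            for p in labels
        ]
        matrix.append(row)
    return matrix


# Toy labels, invented for illustration only.
labels = ["none", "moderate", "significant"]
y_true = ["none", "none", "moderate", "moderate", "moderate",
          "significant", "significant", "significant", "significant"]
y_pred = ["none", "moderate", "moderate", "moderate", "significant",
          "significant", "significant", "significant", "none"]

cm = row_normalized_confusion(y_true, y_pred, labels)
for label, row in zip(labels, cm):
    # Rows are true labels, columns are predicted labels.
    print(label, [round(v, 1) for v in row])
```

Under this normalization, each diagonal entry is the recall of the corresponding class.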
Fig 3. Summary plot for SHAP analysis.
It shows the mean absolute SHAP value of the ten most important features for the three classes.
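The ranking behind such a summary plot can be illustrated without any SHAP library: for one class, take the per-sample attribution of each feature, average the absolute values, and keep the largest. This is a minimal sketch in which the attribution values are hypothetical and `mean_abs_importance` is an illustrative helper, not part of any package; in practice the attributions would come from a SHAP explainer applied to the trained model.

```python
def mean_abs_importance(shap_values, feature_names, top_k=10):
    """Rank features by mean absolute attribution.

    shap_values: one row per sample, one attribution per feature,
    all for a single target class.
    """
    n_samples = len(shap_values)
    means = [
        sum(abs(row[j]) for row in shap_values) / n_samples
        for j in range(len(feature_names))
    ]
    ranked = sorted(zip(feature_names, means), key=lambda kv: kv[1], reverse=True)
    return ranked[:top_k]


# Hypothetical attributions for three samples and three features.
feature_names = ["disease score", "perceived susceptibility", "perceived severity"]
shap_values = [
    [0.30, -0.10, 0.05],
    [-0.20, 0.20, -0.05],
    [0.40, -0.30, 0.02],
]

ranking = mean_abs_importance(shap_values, feature_names, top_k=2)
for name, value in ranking:
    print(f"{name}: {value:.2f}")
```

Taking the absolute value before averaging matters: positive and negative attributions would otherwise cancel, and a feature that pushes predictions strongly in both directions would look unimportant.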
Fig 4. SHAP value plot for the three most important features: A) disease score, B) perceived susceptibility, C) perceived severity.