Yuanyuan Peng1, Cuilian Li2, Yibiao Rong3, Xinjian Chen1, Haoyu Chen2.
Abstract
BACKGROUND: Internet search engine data, such as Google Trends, have been shown to correlate with the incidence of COVID-19, but only in several countries. We aim to develop a model from a small number of countries to predict the epidemic alert level in all countries worldwide.
Year: 2020 PMID: 33110594 PMCID: PMC7567446 DOI: 10.7189/jogh.10.020511
Source DB: PubMed Journal: J Glob Health ISSN: 2047-2978 Impact factor: 4.413
Figure 1. The predicted alert level (red), normalized Google Trends search volume of the topic “Coronavirus” (green), and normalized daily new confirmed cases. Panel A. Italy. Panel B. United States Virgin Islands.
Spearman correlation coefficients of 16 features with the incidence of COVID-19 at one-week lag in 202 countries*
| Features | Daily_AVG | Daily_MAX | Label_AVG | Label_MAX |
|---|---|---|---|---|
| Coronavirus | 0.59 | 0.88 | 0.78 | 0.93 |
| Pneumonia | 0.15 | 0.80 | 0.31 | 0.91 |
| Cough | 0.09 | 0.72 | 0.23 | 0.91 |
| Fever | 0.16 | 0.80 | 0.31 | 0.92 |
| Nasal congestion | 0.09 | 0.69 | 0.22 | 0.88 |
| Rhinorrhea | 0.16 | 0.76 | 0.35 | 0.92 |
| Diarrhea | 0.01 | 0.70 | 0.02 | 0.82 |
| Fatigue | -0.03 | 0.52 | -0.01 | 0.89 |
| Coronavirus_RE | 0.38 | 0.75 | 0.60 | 0.90 |
| Pneumonia_RE | 0.13 | 0.75 | -0.16 | 0.77 |
| Cough_RE | 0.09 | 0.68 | 0.20 | 0.87 |
| Fever_RE | 0.04 | 0.47 | 0.27 | 0.59 |
| Nasal congestion_RE | -0.06 | 0.43 | -0.07 | 0.85 |
| Rhinorrhea_RE | 0.01 | 0.45 | 0.01 | 0.78 |
| Diarrhea_RE | -0.04 | 0.29 | 0.23 | 0.88 |
| Fatigue_RE | -0.12 | 0.27 | -0.03 | 0.75 |
*Daily_AVG and Daily_MAX: the average and maximum, across 202 countries, of the Spearman correlation coefficient between the daily Google search volume of each feature and the daily new confirmed cases lagged by one week; Label_AVG and Label_MAX: the average and maximum, across 202 countries, of the Spearman correlation coefficient between the average weekly Google search volume of each feature and the weekly epidemic alert level lagged by one week.
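The lagged correlation described in the footnote above can be sketched in plain Python. This is a minimal illustration, not the authors' code: the function names, the toy series, and the assumption that search volume at day t is paired with case counts at day t + 7 are all hypothetical choices made for the example.

```python
from statistics import mean


def rank(values):
    """Return 1-based ranks; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; returns NaN if either series is constant."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    if sx == 0 or sy == 0:
        return float("nan")
    return cov / (sx * sy)


def lagged_spearman(search, cases, lag=7):
    """Spearman correlation of search[t] with cases[t + lag].

    Assumption: both series are daily, equally long, and searches lead cases.
    """
    x = search[: len(search) - lag]
    y = cases[lag:]
    return pearson(rank(x), rank(y))


def summarise(per_country):
    """Average and maximum coefficient across countries (AVG / MAX columns)."""
    return mean(per_country), max(per_country)


# Toy data (hypothetical): cases echo the search curve one week later.
search = [5, 9, 14, 20, 18, 12, 7, 4, 3, 2, 2, 1, 1, 1]
cases = [0] * 7 + [50, 90, 140, 200, 180, 120, 70]
rho = lagged_spearman(search, cases, lag=7)
```

Computing `lagged_spearman` once per country and passing the resulting list to `summarise` would give one AVG/MAX pair per feature, which is the shape of the table above.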
The list of countries/regions with different results
| Category | No. | List of countries/regions |
|---|---|---|
| Training | 20 | Argentina, Australia, Austria, Belgium, Brazil, Finland, France, Germany, India, Indonesia, Iran (Islamic Republic of), Ireland, Italy, Peru, Poland, Puerto Rico, South Africa, Spain, Switzerland, United States of America |
| 0 error | 5 | Aruba, Central African Republic, French Polynesia, Ghana, Venezuela (Bolivarian Republic of) |
| 1 error | 18 | Canada, Chad, Colombia, Costa Rica, Côte d'Ivoire, Greece, Guadeloupe, Iceland, Kuwait, Morocco, Netherlands, Panama, Republic of Moldova, Rwanda, The United Kingdom, United States Virgin Islands, Uruguay, Uzbekistan |
| 2 errors | 43 | Albania, Anguilla, Antigua and Barbuda, Bahrain, Bolivia (Plurinational State of), Botswana, Bulgaria, Burundi, Cameroon, Chile, Croatia, Cuba, Cyprus, Dominican Republic, Eritrea, Eswatini, Falkland Islands (Malvinas), French Guiana, Gambia, Grenada, Honduras, Kazakhstan, Kenya, Kyrgyzstan, Lebanon, Luxembourg, Malta, Martinique, Mauritania, Montenegro, Nigeria, Norway, Oman, Portugal, Qatar, Réunion, Saint Kitts and Nevis, Saint Martin, Serbia, Sierra Leone, Slovakia, Slovenia, Ukraine |
| 3 errors | 39 | Algeria, Armenia, Azerbaijan, Barbados, Belarus, Burkina Faso, Cayman Islands, Curacao, Czechia, Denmark, Ecuador, Egypt, El Salvador, Equatorial Guinea, Estonia, Guatemala, Guinea, Guinea-Bissau, Hungary, Jordan, Kosovo, Latvia, Lithuania, Malawi, Mali, Mauritius, Mexico, New Caledonia, Niger, Paraguay, Saint Barthelemy, Saint Vincent and the Grenadines, Senegal, Sint Maarten, Sweden, Togo, Tunisia, Turkey, Saudi Arabia |
| 4 errors | 29 | Bahamas, Bangladesh, Benin, Bermuda, Bhutan, Djibouti, Gibraltar, Guernsey, Guyana, Haiti, Jersey, Liberia, Libya, Madagascar, Montserrat, Mozambique, North Macedonia, Philippines, Romania, Russian Federation, Seychelles, Somalia, South Sudan, Turks and Caicos Islands, Uganda, United Republic of Tanzania, Afghanistan, Pakistan, Sudan |
| 5 errors | 19 | Bosnia and Herzegovina, Fiji, Georgia, Greenland, Guam, Isle of Man, Jamaica, Japan, Liechtenstein, Myanmar, Nepal, Papua New Guinea, San Marino, Singapore, Sri Lanka, Suriname, United Arab Emirates, Zambia, Zimbabwe |
| 6 errors | 12 | Andorra, Belize, Ethiopia, Gabon, Malaysia, Mayotte, Mongolia, Nicaragua, Saint Lucia, Sao Tome and Principe, Timor-Leste, Yemen |
| 7 errors | 6 | Angola, British Virgin Islands, Dominica, New Zealand, Thailand, Trinidad and Tobago |
| 8 errors | 8 | Brunei Darussalam, Cambodia, Faroe Islands, Iraq, Israel, Maldives, Northern Mariana Islands (Commonwealth of the), Syrian Arab Republic |
| 10 errors | 1 | China |
| 11 errors | 1 | Viet Nam |
| 12 errors | 1 | Laos |
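As a quick consistency check, the group sizes in the table above can be accumulated to recover the sizes of the nested test sets used to evaluate the final model. The sketch below only re-adds the "No." column; the variable names are illustrative.

```python
from itertools import accumulate

# Group sizes copied from the "No." column above, in table order.
group_sizes = {
    "Training": 20,
    "0 error": 5,
    "1 error": 18,
    "2 errors": 43,
    "3 errors": 39,
    "4 errors": 29,
    "5 errors": 19,
    "6 errors": 12,
    "7 errors": 6,
    "8 errors": 8,
    "10 errors": 1,
    "11 errors": 1,
    "12 errors": 1,
}

# Running total: how many countries fall at or below each error threshold,
# counting the training countries in every set.
cumulative = dict(zip(group_sizes, accumulate(group_sizes.values())))

for category, total in cumulative.items():
    print(f"{category:>10}: {total}")
```

The running totals of 25, 43, 86, 125 and 154 match the test-set sizes reported for the final model, and the last total is 202, the full set of countries.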
Performance of the final model in different test data sets
| Test data | ACC | M_P | M_R | M_F1 | K_Score |
|---|---|---|---|---|---|
| 20 training + 5 zero-error countries | 0.9886 | 0.9781 | 0.9912 | 0.9844 | 0.9803 |
| 43 countries with ≤1 error | 0.9528 | 0.9195 | 0.9306 | 0.9248 | 0.9209 |
| 86 countries with ≤2 errors | 0.9009 | 0.8633 | 0.8458 | 0.8534 | 0.8324 |
| 125 countries with ≤3 errors | 0.8663 | 0.8170 | 0.7995 | 0.8058 | 0.7729 |
| 154 countries with ≤4 errors | 0.8133 | 0.7489 | 0.7377 | 0.7401 | 0.6828 |
| All 202 countries | 0.7527 | 0.6724 | 0.6754 | 0.6698 | 0.5841 |
ACC – accuracy, M_P – macro precision, M_R – macro recall, M_F1 – macro F1-score, K_Score – kappa-coefficient
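The five metrics named in the footnote can be computed from a list of true and predicted alert levels as sketched below. This is an illustration under stated assumptions rather than the authors' evaluation code: it takes "kappa-coefficient" to mean unweighted Cohen's kappa, takes macro F1 to be the unweighted mean of per-class F1 scores, and uses a hypothetical function name `evaluate` with toy labels.

```python
from collections import Counter


def evaluate(y_true, y_pred):
    """Return ACC, macro precision, macro recall, macro F1 and Cohen's kappa."""
    labels = sorted(set(y_true) | set(y_pred))
    n = len(y_true)
    pairs = Counter(zip(y_true, y_pred))  # (true, predicted) -> count
    true_count = Counter(y_true)
    pred_count = Counter(y_pred)

    precisions, recalls, f1s = [], [], []
    for c in labels:
        tp = pairs[(c, c)]
        p = tp / pred_count[c] if pred_count[c] else 0.0
        r = tp / true_count[c] if true_count[c] else 0.0
        f = 2 * p * r / (p + r) if p + r else 0.0
        precisions.append(p)
        recalls.append(r)
        f1s.append(f)

    acc = sum(pairs[(c, c)] for c in labels) / n
    # Agreement expected by chance, from the marginal label frequencies.
    expected = sum(true_count[c] * pred_count[c] for c in labels) / n ** 2
    kappa = (acc - expected) / (1 - expected) if expected < 1 else float("nan")

    k = len(labels)
    return {
        "ACC": acc,
        "M_P": sum(precisions) / k,
        "M_R": sum(recalls) / k,
        "M_F1": sum(f1s) / k,
        "K_Score": kappa,
    }


# Toy example (hypothetical labels): three alert levels, one misclassification.
scores = evaluate([0, 0, 1, 1, 2, 2], [0, 0, 1, 2, 2, 2])
```

Macro averaging weights every alert level equally regardless of how often it occurs, which is why macro scores can sit well below accuracy when rare levels are predicted poorly.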
Figure 2. The importance of included features.
Performance of different machine learning methods in 202 countries
| Methods | ACC | M_P | M_R | M_F1 | K_Score |
|---|---|---|---|---|---|
| Linear Regression Classification | 0.6127 | 0.5138 | 0.5472 | 0.5048 | 0.3786 |
| Support Vector Machine | 0.5943 | 0.4963 | 0.5080 | 0.4998 | 0.3210 |
| k-Nearest Neighbor | 0.6799 | 0.5962 | 0.5516 | 0.5592 | 0.4156 |
| Decision Tree Classification | 0.6681 | 0.6027 | 0.6152 | 0.6064 | 0.4605 |
| Random Forest Classification | | | | | |
ACC – accuracy, M_P – macro precision, M_R – macro recall, M_F1 – macro F1-score, K_Score – kappa-coefficient
The result of ablation experiments in 202 countries
| Model | ACC | M_P | M_R | M_F1 | K_Score |
|---|---|---|---|---|---|
| 7 features | 0.7501 | 0.6705 | 0.6732 | 0.6691 | 0.5821 |
| 9 features | | | | | |
| 16 features | 0.7448 | 0.6570 | 0.6592 | 0.6511 | 0.5686 |
ACC – accuracy, M_P – macro precision, M_R – macro recall, M_F1 – macro F1-score, K_Score – kappa-coefficient
Figure 3. Classification confusion matrix.