Emmanuelle Sylvestre, Clarisse Joachim, Elsa Cécilia-Joseph, Guillaume Bouzillé, Boris Campillo-Gimenez, Marc Cuggia, André Cabié.
Abstract
BACKGROUND: Traditionally, dengue surveillance is based on case reporting to a central health agency. However, the delay between a case and its notification can limit the system's responsiveness. Machine learning methods have been developed to reduce reporting delays and to predict outbreaks, based on non-traditional and non-clinical data sources. The aim of this systematic review was to identify studies that used real-world data, Big Data and/or machine learning methods to monitor and predict dengue-related outcomes.
Year: 2022 PMID: 34995281 PMCID: PMC8740963 DOI: 10.1371/journal.pntd.0010056
Source DB: PubMed Journal: PLoS Negl Trop Dis ISSN: 1935-2727
Fig 1. PRISMA flow diagram describing the screening process for the systematic review.
Fig 2. Number of publications on dengue prediction and/or surveillance published between January 1, 2000 and August 31, 2020.
Type, study population and themes of the selected studies.
| | n | % |
|---|---|---|
| Type | | |
| Article | 77 | 65 |
| Conference paper | 42 | 35 |
| Study population* | | |
| Americas | | |
| Caribbean | 3 | 2 |
| North America | 3 | 2 |
| South America | 28 | 22 |
| Asia | | |
| East Asia | 16 | 13 |
| South-East Asia | 47 | 37 |
| South Asia | 27 | 21 |
| Australia | 1 | 1 |
| Worldwide | 2 | 2 |
| Themes | | |
| Information Technology & Science | | |
| Computer Science | 42 | 35 |
| Engineering | 10 | 8 |
| Science & Technology—Other Topics | 10 | 8 |
| Medicine | | |
| Infectious Diseases & Tropical Medicine | 20 | 17 |
| Medicine—Other Topics | 8 | 7 |
| Health Informatics, Public Health & Biology | | |
| Biology | 7 | 6 |
| Medical Informatics | 16 | 13 |
| Public Health | 6 | 5 |
*Some studies were carried out in more than one geographic region.
Data sources for dengue monitoring and prediction depending on the main theme.
| | Number of studies n (%) | Main theme: IT n (%) | Main theme: Med n (%) | Main theme: PH n (%) |
|---|---|---|---|---|
| Epidemiological and demographic data | 86 (72) | 42 (68) | 24 (86) | 20 (69) |
| Clinical and biological data | 33 (27) | 20 (32) | 3 (11) | 10 (34) |
| Genomic sequence data | 2 (1) | 1 (2) | 0 (0) | 1 (3) |
| Climate, environmental and geographic data | 45 (37) | 26 (42) | 12 (43) | 7 (24) |
| Vector data | 4 (3) | 1 (2) | 3 (11) | 0 (0) |
| | 25 (21) | 8 (13) | 11 (39) | 6 (21) |
| Baidu | 6 (5) | 2 (3) | 4 (14) | 0 (0) |
| | 19 (15) | 6 (9) | 7 (25) | 6 (20) |
| | 21 (17) | 14 (22) | 4 (14) | 3 (10) |
| | 18 (14) | 12 (19) | 4 (14) | 2 (6) |
| Other | 3 (2) | 2 (3) | 0 (0) | 1 (3) |
| | 10 | 2 (3) | 3 (11) | 5 (17) |
| Cellphone | 2 | 2 (3) | 0 (0) | 0 (0) |
| HealthMap | 2 | 0 (0) | 1 (3) | 1 (3) |
| LexisNexis | 2 | 0 (0) | 1 (3) | 1 (3) |
| Political stability | 1 | 0 (0) | 0 (0) | 1 (3) |
| Wikipedia | 1 | 0 (0) | 1 (3) | 0 (0) |
a As most studies used several data sources, some articles are present several times.
IT: Information Technology & Science; Med: Medicine; PH: Health Informatics, Public Health & Biology
Statistical methods and models used in the selected studies depending on the study aim*.
| Statistical methods | Prediction n (%) | Surveillance n (%) | Prediction and surveillance n (%) | Total n (%) |
|---|---|---|---|---|
| Machine learning methods | 126 (82) | 27 (46) | 51 (75) | |
| Supervised learning | 121 (79) | 21 (36) | 50 (74) | |
| Unsupervised learning | 5 (3) | 6 (10) | 1 (1) | |
| Other model types (including time series models) | 25 (16) | 9 (15) | 4 (6) | |
| Correlation | 2 (1) | 23 (39) | 13 (19) | |
| | | | | |
| Artificial neural networks | 31 (21) | 2 (6) | 3 (5) | |
| Association rules | 3 (2) | 1 (3) | 0 (0) | |
| Bayesian models | 12 (8) | 5 (14) | 3 (5) | |
| Clustering | 5 (3) | 5 (14) | 1 (2) | |
| Decision tree | 35 (23) | 2 (6) | 6 (11) | |
| Regression model | 20 (13) | 9 (25) | 31 (56) | |
| Support-vector machine | 17 (11) | 3 (8) | 7 (13) | |
| Time series | 12 (8) | 1 (3) | 3 (5) | |
| Other | 16 (11) | 8 (22) | 1 (2) | |
*As most studies used several models and/or statistical methods, some are listed several times.
a Studies evaluating a data source (traditional or novel data streams) for dengue monitoring
b Some models classified as “Other” are also included in the “Supervised learning” category
Evaluation metrics used in the selected articles depending on their aim(s)*.
| Evaluation metrics | Prediction n (%) | Surveillance n (%) | Prediction and surveillance n (%) | Total n (%) |
|---|---|---|---|---|
| Correlation coefficient | 3 (38) | 16 (73) | 9 (50) | |
| R-squared | 4 (50) | 5 (23) | 9 (50) | |
| Other correlation metric | 1 (12) | 1 (5) | 0 (0) | |
| | | | | |
| Root mean square error | 14 (41) | 0 (0) | 9 (43) | |
| Mean absolute error | 7 (21) | 0 (0) | 4 (19) | |
| Mean absolute percentage error | 4 (12) | 0 (0) | 3 (14) | |
| Mean squared error | 3 (9) | 0 (0) | 3 (14) | |
| Other | 6 (18) | 2 (100) | 2 (10) | |
| | | | | |
| Accuracy | 38 (26) | 6 (46) | 7 (41) | |
| Recall/Sensitivity | 32 (22) | 2 (15) | 3 (18) | |
| Specificity | 20 (14) | 0 (0) | 3 (18) | |
| Precision/Positive predictive value | 17 (12) | 1 (8) | 1 (6) | |
| F-score | 12 (8) | 2 (15) | 0 (0) | |
| AUC and/or ROC curve | 16 (11) | 1 (8) | 3 (18) | |
| Kappa statistic | 5 (3) | 0 (0) | 0 (0) | |
| Other | 7 (5) | 1 (8) | 0 (0) | |
| | | | | |
| Correlation metrics | 7 (11) | 19 (66) | 18 (72) | |
| Error-based metrics | 19 (29) | 1 (3) | 15 (60) | |
| Confusion matrix-based metrics | 47 (72) | 9 (31) | 8 (32) | |
| Other evaluation metrics | 8 (12) | 6 (21) | 8 (32) | |
*As studies used several metrics, some articles are listed more than once.
a Studies evaluating a data source (traditional data or novel data streams) for dengue monitoring
b AUC: Area Under the ROC Curve. ROC: Receiver Operating Characteristic
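
The error-based and confusion matrix-based metrics listed in this table have standard textbook definitions. The following is a minimal Python sketch of those definitions, not code from any of the reviewed studies: the function names `error_metrics` and `confusion_metrics`, the helper `_ratio`, and the example numbers are all illustrative assumptions.

```python
import math


def _ratio(numerator, denominator):
    """Return numerator / denominator, or NaN when the denominator is zero."""
    return numerator / denominator if denominator else float("nan")


def error_metrics(observed, predicted):
    """Error-based metrics for paired observed and predicted values."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    errors = [p - o for o, p in zip(observed, predicted)]
    n = len(errors)
    mse = sum(e * e for e in errors) / n
    mae = sum(abs(e) for e in errors) / n
    # MAPE is undefined where the observed value is zero, so skip those points.
    pct = [abs(e / o) for o, e in zip(observed, errors) if o != 0]
    mape = 100 * _ratio(sum(pct), len(pct))
    return {"mse": mse, "rmse": math.sqrt(mse), "mae": mae, "mape": mape}


def confusion_metrics(tp, fp, tn, fn):
    """Metrics derived from the four cells of a binary confusion matrix."""
    return {
        "accuracy": _ratio(tp + tn, tp + fp + tn + fn),
        "recall": _ratio(tp, tp + fn),          # sensitivity
        "specificity": _ratio(tn, tn + fp),
        "precision": _ratio(tp, tp + fp),       # positive predictive value
        "f1": _ratio(2 * tp, 2 * tp + fp + fn),
    }


if __name__ == "__main__":
    # Hypothetical weekly case counts and model forecasts (made-up numbers).
    print(error_metrics([10, 20, 40, 30], [12, 18, 44, 27]))
    # Hypothetical outbreak classifier results (made-up numbers).
    print(confusion_metrics(tp=40, fp=10, tn=45, fn=5))
```

Error-based metrics apply to continuous outcomes such as incidence rates, whereas confusion matrix-based metrics require a binary outcome such as outbreak versus no outbreak, which may explain why the two families are reported for different study aims.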
Most significant predictors for the three most frequently studied outcomes.
| Number of studies n (%) | Dengue incidence rates n = 27 | Dengue outbreaks n = 9 | Dengue diagnosis n = 6 |
|---|---|---|---|
| Rainfall | 14 (52) | 7 (78) | 0 (0) |
| Temperature | 14 (52) | 6 (67) | 0 (0) |
| Humidity | 9 (33) | 1 (11) | 0 (0) |
| Mosquito-related predictor | 0 (0) | 2 (22) | 0 (0) |
| Google search index | 4 (15) | 0 (0) | 0 (0) |
| Baidu search index | 3 (11) | 0 (0) | 0 (0) |
| Tweets | 3 (11) | 0 (0) | 0 (0) |
| Fever | 0 (0) | 0 (0) | 4 (66) |
| Arthralgia/myalgia | 0 (0) | 0 (0) | 3 (50) |
| Platelet count | 0 (0) | 0 (0) | 2 (33) |
| White blood cell count | 0 (0) | 0 (0) | 2 (33) |
| Other | 13 (48) | 6 (67) | 5 (83) |
*Most studies found several significant predictors.
Model with the best performance for the three most frequently studied outcomes.
| Number of studies n (%) | Dengue incidence rates n = 24 | Dengue outbreaks n = 9 | Dengue diagnosis n = 14 |
|---|---|---|---|
| Artificial neural network | 4 (17) | 1 (11) | 4 (29) |
| Decision tree | 4 (17) | 2 (22) | 4 (29) |
| Support vector machine | 4 (17) | 1 (11) | 4 (29) |
| Regression model | 5 (21) | 1 (11) | 0 (0) |
| Time series | 3 (12) | 2 (22) | 0 (0) |
| Bayesian models | 2 (8) | 0 (0) | 0 (0) |
| Association rules | 1 (4) | 1 (11) | 0 (0) |
| Clustering | 0 (0) | 0 (0) | 1 (7) |
| Other | 1 (4) | 1 (11) | 1 (7) |