Michael Rapp, Moritz Kulessa, Eneldo Loza Mencía, Johannes Fürnkranz.
Abstract
Early outbreak detection is a key aspect in the containment of infectious diseases, as it enables the identification and isolation of infected individuals before the disease can spread to a larger population. Instead of detecting unexpected increases of infections by monitoring confirmed cases, syndromic surveillance aims at the detection of cases with early symptoms, which allows a more timely disclosure of outbreaks. However, the definition of these disease patterns is often challenging, as early symptoms are usually shared among many diseases and a particular disease can have several clinical pictures in the early phase of an infection. As a first step toward supporting epidemiologists in the process of defining reliable disease patterns, we present a novel, data-driven approach to discover such patterns in historical data. The key idea is to take into account the correlation between indicators in a health-related data source and the reported number of infections in the respective geographic region. In a preliminary experimental study, we use data from several emergency departments to discover disease patterns for three infectious diseases. Our results show the potential of the proposed approach to find patterns that correlate with the reported infections and to identify indicators that are related to the respective diseases. They also motivate the need for additional measures to overcome practical limitations, such as the requirement to deal with noisy and unbalanced data, and demonstrate the importance of incorporating feedback of domain experts into the learning procedure.
Keywords: knowledge discovery; outbreak detection; rule learning; syndromic surveillance; time series analysis
Year: 2022 PMID: 35098113 PMCID: PMC8793623 DOI: 10.3389/fdata.2021.784159
Source DB: PubMed Journal: Front Big Data ISSN: 2624-909X
Figure 1. Exemplary comparison of two syndrome definitions (blue lines) with reported cases (orange line). The Pearson correlation is 0.98 for “fever AND cough” and 0.88 for “cough OR runny nose OR sore throat”.
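The comparison in Figure 1 can be illustrated with a minimal sketch: count, per time step, the visits that satisfy a syndrome definition and compute the Pearson correlation of these counts with the reported cases. The toy visits, the weekly granularity, the reported numbers, and all function names below are assumptions made for illustration only; they are not the data or the implementation used in the study.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / sqrt(var_x * var_y)


def weekly_counts(visits, syndrome, n_weeks):
    """Count, per week, the visits whose symptom set satisfies `syndrome`."""
    counts = [0] * n_weeks
    for week, symptoms in visits:
        if syndrome(symptoms):
            counts[week] += 1
    return counts


def fever_and_cough(symptoms):
    """Syndrome "fever AND cough"."""
    return {"fever", "cough"} <= symptoms


def cough_or_runny_nose_or_sore_throat(symptoms):
    """Syndrome "cough OR runny nose OR sore throat"."""
    return bool(symptoms & {"cough", "runny nose", "sore throat"})


# Hypothetical toy data: (week index, set of observed symptoms).
visits = [
    (0, {"cough"}),
    (0, {"runny nose"}),
    (1, {"fever", "cough"}),
    (1, {"runny nose"}),
    (2, {"fever", "cough"}),
    (2, {"fever", "cough", "sore throat"}),
    (2, {"sore throat"}),
    (3, {"fever", "cough"}),
    (3, {"cough"}),
]
reported = [0, 1, 2, 1]  # made-up number of reported infections per week

r_and = pearson(weekly_counts(visits, fever_and_cough, 4), reported)
r_or = pearson(
    weekly_counts(visits, cough_or_runny_nose_or_sore_throat, 4), reported
)
print(f"fever AND cough: {r_and:.2f}")
print(f"cough OR runny nose OR sore throat: {r_or:.2f}")
```

In this toy setting the narrower definition tracks the reported cases more closely than the broad one, which mirrors the kind of ranking shown in the figure; the concrete values carry no meaning beyond the example.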
Attributes included in the emergency department data.
| Attribute | Type | Distinct values | Missing (%) |
|---|---|---|---|
| MTS presentation | Discrete | 57 | 0.01 |
| MTS indicator | Discrete | 179 | 5.10 |
| ICD code | Discrete | 5901 | 65.45 |
| ICD code (short) | Discrete | 1509 | 65.45 |
| | | | |
| Gender | Discrete | 3 | 0.00 |
| Age | Discrete | 21 | 0.00 |
| | | | |
| Blood pressure systolic | Numeric | − | 57.19 |
| Blood pressure diastolic | Numeric | − | 57.22 |
| Temperature | Numeric | − | 59.31 |
| Respiration rate | Numeric | − | 59.55 |
| Pulse frequency | Numeric | − | 91.91 |
| Oxygen saturation | Numeric | − | 57.18 |
| | | | |
| No isolation | Discrete | 11 | 1.81 |
| Transport | Discrete | 6 | 59.74 |
| Disposition | Discrete | 13 | 90.56 |
Pearson correlation between cases identified by automatically learned syndromes on different feature categories and actually reported cases, as well as cases that match the handcrafted syndrome definitions.
| Feature categories used (one ✓ per category) | Correlation with reported cases | Correlation with handcrafted syndrome |
|---|---|---|
| ✓ | 0.9354 | 0.9917 |
| ✓ ✓ | 0.9357 | 0.9796 |
| ✓ ✓ | 0.9480 | 0.9768 |
| ✓ ✓ | 0.9366 | 0.9948 |
| ✓ ✓ ✓ ✓ | 0.9493 | 0.9800 |
| | | |
| ✓ | 0.9399 | 0.9473 |
| ✓ ✓ | 0.9454 | 0.9219 |
| ✓ ✓ | 0.9528 | 0.8689 |
| ✓ ✓ | 0.9464 | 0.9506 |
| ✓ ✓ ✓ ✓ | 0.9528 | 0.8689 |
| | | |
| ✓ | 0.7669 | 0.2761 |
| ✓ ✓ | 0.7669 | 0.2761 |
| ✓ ✓ | 0.7303 | 0.1470 |
| ✓ ✓ | 0.7167 | 0.1608 |
| ✓ ✓ ✓ ✓ | 0.7242 | 0.1672 |
Figure 2. Percentage of successfully reconstructed syndrome definitions of different types for varying complexities of the predefined syndromes.
Figure 3. Number of cases that satisfy the automatically discovered syndrome definitions (blue area) compared to the actual cases (left, orange line) and handcrafted syndromes (right, black line) for three diseases: Influenza (A), SARS-CoV-2 (B), and Norovirus (C).
Exemplary automatically induced syndrome definitions.
| Feature categories and disease | Syndrome definition |
|---|---|
| ① Influenza | J10 ∨ J11 ∨ “new confusion condition” ∨ Z96.0 ∨ … |
| ① SARS-CoV-2 | (J12 ∧ “breathing problems”) ∨ U07.1 ∨ “pain in lower abdomen” ∨ … |
| ① Norovirus | J21.0 ∨ D40 ∨ (J34 ∧ “recent problem”) |
| ① ③ Influenza | J10 ∨ (J11 ∧ diastolic ≤ 92.5 ∧ systolic ≤ 156.5 ∧ temperature > 38.5) ∨ (temperature ≤ 40.5 ∧ diastolic ≤ 108.5 ∧ systolic ≤ 162 ∧ 187.5 ≤ heart rate ≤ 207.5) ∨ … |
| ① ② ③ ④ Influenza | J10 ∨ (J11 ∧ diastolic ≤ 92.5 ∧ systolic ≤ 156.5 ∧ temperature > 38.5) ∨ (temperature ≤ 40.5 ∧ diastolic ≤ 110 ∧ systolic ≤ 162 ∧ 187.5 ≤ heart rate ≤ 212.5 ∧ no isolation ∧ patient sent home) ∨ … |
D40, Neoplasm of uncertain/unknown behaviour of male genital organs; J21.0, Acute bronchiolitis due to respiratory syncytial virus; J10, Influenza due to identified seasonal influenza virus; J34, Other disorders of nose and nasal sinuses; J11, Influenza, virus not identified; U07.1, COVID-19, virus identified; J12, Viral pneumonia, not elsewhere classified; Z96.0, Presence of urogenital implants.
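The induced syndrome definitions listed above are disjunctions of conjunctions over ICD codes, triage indicators, and thresholds on vital parameters. The following sketch shows one way such a definition could be evaluated against a single emergency department visit. It encodes only the first two conjunctions of the "① ③ Influenza" definition; the elided terms are not reproduced. The dictionary layout of a visit, the helper names, and the choice to treat a missing measurement as not satisfying a condition are assumptions made for this illustration, not details taken from the study.

```python
def has_icd(code):
    """Condition: the visit carries the given ICD code."""
    return lambda visit: code in visit.get("icd", ())


def at_most(attribute, threshold):
    """Condition: the attribute is recorded and <= threshold.

    Assumption: a missing measurement never satisfies the condition.
    """
    return lambda visit: (
        visit.get(attribute) is not None and visit[attribute] <= threshold
    )


def greater_than(attribute, threshold):
    """Condition: the attribute is recorded and > threshold."""
    return lambda visit: (
        visit.get(attribute) is not None and visit[attribute] > threshold
    )


def matches(visit, definition):
    """True if any conjunction has all of its conditions satisfied."""
    return any(
        all(condition(visit) for condition in conjunction)
        for conjunction in definition
    )


# J10 OR (J11 AND diastolic <= 92.5 AND systolic <= 156.5 AND temperature > 38.5)
influenza = [
    [has_icd("J10")],
    [
        has_icd("J11"),
        at_most("diastolic", 92.5),
        at_most("systolic", 156.5),
        greater_than("temperature", 38.5),
    ],
]

# Hypothetical visits; attribute names are illustrative.
visits = [
    {"icd": {"J10"}},
    {"icd": {"J11"}, "diastolic": 80, "systolic": 120, "temperature": 39.2},
    {"icd": {"J11"}, "diastolic": 80, "systolic": 120, "temperature": 37.0},
    {"icd": {"J11"}, "diastolic": 80, "systolic": 120},  # temperature missing
    {"icd": {"J34"}, "temperature": 39.5},
]
flags = [matches(visit, influenza) for visit in visits]
print(flags)
```

Keeping each conjunction as a plain list of conditions makes it straightforward to count the matching visits per time step and compare the resulting series with the reported cases.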