Jean-Baptiste Perrin, Benoît Durand, Emilie Gay, Christian Ducrot, Pascal Hendrikx, Didier Calavas, Viviane Hénaux.
Abstract
We performed a simulation study to evaluate the performance of an anomaly detection algorithm considered for an automated surveillance system of cattle mortality. The method combined temporal regression and spatial cluster detection to identify, for a given week, clusters of spatial units showing an excess of deaths compared with their own historical fluctuations. First, we simulated 1,000 outbreaks of a disease causing extra deaths in the French cattle population (about 200,000 herds and 20 million cattle), using a model that mimics the spreading patterns of an infectious disease, and injected these disease-related extra deaths into an authentic mortality dataset spanning January 2005 to January 2010. Second, we applied our algorithm to each of the 1,000 semi-synthetic datasets to identify clusters of spatial units showing an excess of deaths relative to their own historical fluctuations. Third, we verified whether the clusters identified by the algorithm contained simulated extra deaths, in order to evaluate the algorithm's ability to identify unusual mortality clusters caused by an outbreak. Among the 1,000 simulations, the median duration of simulated outbreaks was 8 weeks, with a median of 5,627 simulated deaths and 441 infected herds. Within the 12-week trial period, 73% of the simulated outbreaks were detected, with a median timeliness of 1 week and a mean of 1.4 weeks. The proportion of outbreak weeks flagged by an alarm (i.e. sensitivity) was 61%, whereas one in three alarms was a true alarm (i.e. positive predictive value). The performance of the detection algorithm was also evaluated for alternative combinations of epidemiologic parameters. The results of our study confirmed that, under certain conditions, automated algorithms could help identify abnormal increases in cattle mortality possibly related to unidentified health events.
Year: 2015; PMID: 26536596; PMCID: PMC4633029; DOI: 10.1371/journal.pone.0141273
Source DB: PubMed; Journal: PLoS One; ISSN: 1932-6203; Impact factor: 3.240
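The three-step design described in the abstract (inject simulated deaths into real counts, run the detector, check whether flagged clusters contain simulated deaths) can be illustrated with a deliberately simplified sketch. It is not the authors' algorithm: a per-unit z-score against the unit's own weekly history stands in for the temporal regression, and connected components of adjacent flagged units stand in for the spatial cluster detection. The function names `flag_excess` and `spatial_clusters`, the threshold `z = 3.0`, and all counts are hypothetical.

```python
from statistics import mean, pstdev
from typing import Dict, List, Set


def flag_excess(history: Dict[str, List[int]],
                observed: Dict[str, int],
                z: float = 3.0) -> Set[str]:
    """Flag units whose weekly count is high relative to their own history.

    Hypothetical stand-in for the temporal regression:
    a unit is flagged when (observed - mean) / sd > z.
    """
    flagged = set()
    for unit, past in history.items():
        baseline = mean(past)
        sd = pstdev(past) or 1.0  # guard against a perfectly flat history
        if (observed.get(unit, 0) - baseline) / sd > z:
            flagged.add(unit)
    return flagged


def spatial_clusters(flagged: Set[str],
                     neighbours: Dict[str, Set[str]]) -> List[Set[str]]:
    """Group flagged units into clusters of adjacent flagged units.

    Hypothetical stand-in for spatial cluster detection: connected
    components of the adjacency graph restricted to flagged units.
    """
    clusters, seen = [], set()
    for start in sorted(flagged):
        if start in seen:
            continue
        component, stack = set(), [start]
        while stack:
            unit = stack.pop()
            if unit in seen:
                continue
            seen.add(unit)
            component.add(unit)
            stack.extend(n for n in neighbours.get(unit, set())
                         if n in flagged and n not in seen)
        clusters.append(component)
    return clusters


# Step 1: inject simulated extra deaths into real weekly counts (toy numbers).
history = {"A": [10, 12, 11, 9, 13], "B": [20, 22, 19, 21, 18], "C": [5, 6, 5, 4, 5]}
real = {"A": 12, "B": 19, "C": 6}
simulated = {"A": 8, "B": 6}
observed = {u: real[u] + simulated.get(u, 0) for u in real}

# Step 2: apply the toy detector to the semi-synthetic counts.
neighbours = {"A": {"B"}, "B": {"A", "C"}, "C": {"B"}}
clusters = spatial_clusters(flag_excess(history, observed), neighbours)

# Step 3: a cluster is a true alarm if it contains simulated deaths.
true_alarms = [c for c in clusters if any(simulated.get(u, 0) > 0 for u in c)]
print(clusters, len(true_alarms))
```

With the real counts alone nothing is flagged; after injection, units A and B form one cluster that contains simulated deaths, i.e. one true alarm.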
Performance indicators recorded for each trial.
| Indicator | Abbreviation | Definition |
|---|---|---|
| Success | - | Success equals 1 if there is at least one true alarm during the trial, 0 otherwise |
| Timeliness | - | Number of weeks elapsed between the first simulated death and the first true alarm |
| Sensitivity (a)* | Se_a | Proportion of weeks with an alarm among the outbreak weeks |
| Positive predictive value (a)* | PPV_a | Proportion of true alarms among the alarms |
| Negative predictive value (a)* | NPV_a | Proportion of weeks with no simulated deaths among the weeks without alarm |
| Sensitivity (b)* | Se_b | Proportion of hexagon-weeks included in a cluster among infected hexagon-weeks |
| Positive predictive value (b)* | PPV_b | Proportion of infected hexagon-weeks among all the hexagon-weeks included in a cluster |
| Negative predictive value (b)* | NPV_b | Proportion of non-infected hexagon-weeks among the hexagon-weeks not included in a cluster |
* For sensitivity and positive and negative predictive values, two types of indicators were computed using two different units: the week and the hexagon-week.
Type (a) was based on the number of true/false alarms produced over the test period. Type (b) was based on the number of hexagon-weeks included/not included in clusters during the test period.
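A minimal sketch of how the indicators defined above could be computed for a single trial. It assumes each trial is stored as two hypothetical boolean matrices indexed `[week][hexagon]`: `infected` (the hexagon-week contains simulated deaths) and `in_cluster` (the hexagon-week belongs to a flagged cluster). It further assumes at most one alarm per week: a week is an alarm week if any hexagon is in a cluster, and a true alarm week if a flagged hexagon is infected. Timeliness is taken as the difference in week indices, since the exact counting convention is not stated here. Ratios with an empty denominator return `None`.

```python
from typing import Dict, Optional, Sequence

Grid = Sequence[Sequence[bool]]  # hypothetical layout: [week][hexagon]


def _ratio(num: int, den: int) -> Optional[float]:
    """Proportion num/den, or None when the denominator is empty."""
    return num / den if den else None


def trial_indicators(infected: Grid, in_cluster: Grid) -> Dict[str, Optional[float]]:
    """Per-trial performance indicators (sketch under the stated assumptions)."""
    # Week-level summaries (type a).
    outbreak_week = [any(row) for row in infected]
    alarm_week = [any(row) for row in in_cluster]
    true_alarm_week = [any(i and c for i, c in zip(inf, clu))
                       for inf, clu in zip(infected, in_cluster)]

    n_outbreak = sum(outbreak_week)
    n_alarm = sum(alarm_week)
    n_true = sum(true_alarm_week)
    alarm_and_outbreak = sum(a and o for a, o in zip(alarm_week, outbreak_week))
    n_no_alarm = len(alarm_week) - n_alarm
    no_alarm_no_death = sum(not a and not o for a, o in zip(alarm_week, outbreak_week))

    # Hexagon-week confusion counts (type b).
    cells = [(i, c) for inf, clu in zip(infected, in_cluster)
             for i, c in zip(inf, clu)]
    tp = sum(i and c for i, c in cells)
    fp = sum(c and not i for i, c in cells)
    fn = sum(i and not c for i, c in cells)
    tn = sum(not i and not c for i, c in cells)

    # Timeliness: weeks between the first simulated death and the first true alarm.
    first_death = next((w for w, o in enumerate(outbreak_week) if o), None)
    first_true = next((w for w, t in enumerate(true_alarm_week) if t), None)
    timeliness = (first_true - first_death
                  if first_death is not None and first_true is not None else None)

    return {
        "success": int(n_true > 0),
        "timeliness": timeliness,
        "Se_a": _ratio(alarm_and_outbreak, n_outbreak),
        "PPV_a": _ratio(n_true, n_alarm),
        "NPV_a": _ratio(no_alarm_no_death, n_no_alarm),
        "Se_b": _ratio(tp, tp + fn),
        "PPV_b": _ratio(tp, tp + fp),
        "NPV_b": _ratio(tn, tn + fn),
    }


# Toy trial: 4 weeks x 3 hexagons (T = True, F = False).
T, F = True, False
infected = [[F, F, F], [T, F, F], [T, T, F], [F, T, F]]
in_cluster = [[F, F, T], [F, F, F], [T, F, F], [F, F, F]]
print(trial_indicators(infected, in_cluster))
```

In the toy trial, week 0 is a false alarm and week 2 a true alarm, so PPV_a is 0.5 and success is 1, while the hexagon-week indicators (type b) are computed on the 12 individual cells.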
Characteristics (median and interquartile range) of 1,000 simulated outbreaks (based on a SIR model with R0 of 5 and a daily mortality rate of 0.03).
| Descriptors | Q25 | median | Q75 |
|---|---|---|---|
| Duration (weeks)† | 6 | 8 | 10 |
| Number of simulated deaths† | 101 | 5,627 | 15,629 |
| Proportion of simulated deaths* | 0.1 | 1.9 | 7.3 |
| Number of infected herds | 5 | 441 | 1,415 |
| Proportion of infected herds* | <0.001 | 0.21 | 0.66 |
| Number of infected hexagons | 1 | 26 | 57 |
| Proportion of infected hexagons* | 0.1 | 2.3 | 5.1 |
| Weekly average no. of simulated deaths per infected herd | 1.2 | 1.6 | 2.8 |
† The spread of the outbreak was simulated for a maximum of 12 weeks or until it reached ≥15,000 simulated deaths.
* The proportions of simulated deaths, infected herds, and infected hexagons were calculated using, respectively, the total number of deaths (real + simulated) that occurred during the course of the outbreak, and the total numbers of herds and hexagons under surveillance over this period.
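The table is based on an SIR model with R0 = 5 and a daily mortality rate of 0.03, stopped after 12 weeks or at ≥15,000 simulated deaths. The sketch below only illustrates how those two parameters and that stopping rule interact in a generic, homogeneous-mixing, deterministic SIR model; the study's model spreads between herds in space, which this sketch does not attempt. The recovery rate `gamma = 0.1`, the population size, the relation `r0 = beta / (gamma + mu)`, the application of the mortality rate to infected animals, and the function name are all assumptions for illustration.

```python
from typing import List


def simulate_weekly_deaths(r0: float = 5.0,
                           mu: float = 0.03,
                           gamma: float = 0.1,
                           population: float = 1_000_000,
                           initial_infected: float = 10,
                           max_weeks: int = 12,
                           max_deaths: float = 15_000) -> List[float]:
    """Deterministic daily-step SIR with disease-induced deaths (hypothetical).

    beta is chosen so that r0 = beta / (gamma + mu). The run stops after
    max_weeks or once cumulative deaths reach max_deaths.
    """
    beta = r0 * (gamma + mu)
    s, i, r = population - initial_infected, initial_infected, 0.0
    cumulative, this_week = 0.0, 0.0
    weekly: List[float] = []
    for day in range(7 * max_weeks):
        alive = s + i + r
        new_infections = beta * s * i / alive
        new_recoveries = gamma * i
        new_deaths = mu * i  # daily mortality rate applied to infected animals
        s -= new_infections
        i += new_infections - new_recoveries - new_deaths
        r += new_recoveries
        cumulative += new_deaths
        this_week += new_deaths
        if (day + 1) % 7 == 0:  # close a full week
            weekly.append(this_week)
            this_week = 0.0
        if cumulative >= max_deaths:
            break
    if this_week > 0:  # keep the partial week if the death cap stopped the run
        weekly.append(this_week)
    return weekly


print([round(d) for d in simulate_weekly_deaths()])
print(len(simulate_weekly_deaths(r0=0.5)))
```

With a high R0 the death cap ends the run after a few weeks, whereas with R0 below 1 the outbreak fades and the run lasts the full 12 weeks; this mirrors why the simulated durations in the table are bounded by both rules.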
Fig 1. Characteristics of the 1,000 outbreaks simulated with a reproduction ratio R0 of 5 and a daily mortality rate of 0.03.
Distribution of the total number of simulated deaths (a), the number of infected herds (b) and infected hexagons (c), and the duration of the outbreak (d) among the 1,000 simulations. The spread of the outbreak was simulated for a maximum of 12 weeks or until it reached ≥15,000 simulated deaths.
Fig 2. Characteristics over time of the 1,000 outbreaks simulated with a reproduction ratio R0 of 5 and a daily mortality rate of 0.03.
Median (solid line) and interquartile range (dashed lines) of the number of simulated deaths, infected herds, and infected hexagons by week over the 12 weeks following the start of the simulation, among the 1,000 simulations.
Fig 3. Cumulative proportion of successful detections over the 12 weeks of the outbreak (n = 1,000 simulations).
Distribution of the performance indicators among the 1,000 simulations.
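A cumulative success curve like the one in Fig 3 can be derived from the week of the first true alarm in each trial. This is a minimal sketch; the 1-based outbreak-week numbering and the function name are assumptions. The last value of the curve equals the overall success rate over the trial period.

```python
from typing import List, Optional


def cumulative_success(first_true_alarm_week: List[Optional[int]],
                       horizon: int = 12) -> List[float]:
    """Proportion of trials with a first true alarm by week k, for k = 1..horizon.

    Each entry is the outbreak week (assumed 1-based) of the first true
    alarm in one trial, or None if that trial never produced a true alarm.
    """
    n = len(first_true_alarm_week)
    return [
        sum(1 for w in first_true_alarm_week if w is not None and w <= k) / n
        for k in range(1, horizon + 1)
    ]


print(cumulative_success([1, 2, None, 1], horizon=3))  # [0.5, 0.75, 0.75]
```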
| Indicator | Q10 | Median | Q90 | Mean |
|---|---|---|---|---|
| Timeliness (weeks) | 1 | 1 | 3 | 1.4 |
| Se_a (%) | 0.0 | 61.5 | 90.0 | 49.6 |
| PPV_a (%) | 0.0 | 35.7 | 57.1 | 30.4 |
| NPV_a (%) | 89.3 | 94.6 | 100.0 | 94.6 |
| Se_b (%) | 0.0 | 12.5 | 26.0 | 13.0 |
| PPV_b (%) | 0.0 | 12.0 | 28.8 | 12.7 |
| NPV_b (%) | 99.8 | 99.9 | 100.0 | 99.9 |
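The table above summarises each indicator across simulations by its 10th percentile, median, 90th percentile, and mean. A minimal sketch of that summary using Python's standard `statistics` module follows. Two choices are assumptions, not taken from the study: trials where an indicator is undefined (`None`, e.g. a PPV with no alarm) are dropped, and percentiles use the `"inclusive"` interpolation method. The function name is hypothetical.

```python
from statistics import mean, median, quantiles
from typing import Dict, List, Optional


def summarise(values: List[Optional[float]]) -> Dict[str, float]:
    """Q10, median, Q90 and mean of one indicator across simulations.

    Assumption: undefined values (None) are dropped before summarising.
    """
    data = [v for v in values if v is not None]
    deciles = quantiles(data, n=10, method="inclusive")  # 9 cut points
    return {"Q10": deciles[0], "median": median(data),
            "Q90": deciles[-1], "mean": mean(data)}


print(summarise(list(range(1, 12)) + [None]))
```

How undefined values are treated can shift low percentiles noticeably, so the choice would need to match the study's convention before comparing numbers.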
Fig 4. Distribution of the performance indicators among the 1,000 simulations.
Characteristics of the outbreak at first alarm (n = 1000 simulations based on a SIR model with R0 of 5 and daily mortality rate of 0.03).
| Descriptors | Q10 | Median | Q90 | Mean |
|---|---|---|---|---|
| Number of simulated deaths | 50 | 132 | 417 | 196 |
| Proportion of simulated deaths | 0.1 | 0.2 | 0.6 | 0.3 |
| Number of infected herds | 5 | 16 | 54 | 24 |
| Proportion of infected herds | 0.001 | 0.003 | 0.010 | 0.005 |
| Number of infected hexagons | 1 | 2 | 4 | 2.3 |
| Proportion of infected hexagons | 0.1 | 0.2 | 0.4 | 0.2 |
* The proportion of simulated deaths was calculated using as denominator the total number of deaths that occurred during the course of the outbreak. The proportions of infected herds and hexagons were based on the total numbers of herds and hexagons under surveillance during the course of the outbreak.
Number of simulations, success rate, and median of the performance indicators according to the size of the simulated outbreaks.
| Indicators | Small | Medium | Large |
|---|---|---|---|
| Number of simulations | 247 | 272 | 473 |
| Success (%) | 14.6 | 82.0 | 100 |
| Timeliness (weeks) | 1.0 | 1.0 | 1.0 |
| Se_a (%) | 0.0 | 30.4 | 85.7 |
| PPV_a (%) | 0.0 | 22.2 | 47.1 |
| NPV_a (%) | 92.9 | 91.1 | 98.0 |
| Se_b (%) | 0.0 | 16.7 | 14.5 |
| PPV_b (%) | 0.0 | 16.7 | 14.7 |
| NPV_b (%) | 100 | 100 | 99.8 |
| False alarm rate | 0.15 | 0.16 | 0.15 |
Small: ≤100 simulated deaths; Medium: 101–10,000 simulated deaths; Large: >10,000 simulated deaths.
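The size classes above amount to a simple binning rule on the total number of simulated deaths in each trial. The following minimal sketch shows that binning together with the per-class number of simulations and success rate; the function names and the `(deaths, success)` input format are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def size_class(simulated_deaths: int) -> str:
    """Outbreak size class using the thresholds in the table footnote."""
    if simulated_deaths <= 100:
        return "Small"
    if simulated_deaths <= 10_000:
        return "Medium"
    return "Large"


def success_by_class(trials: List[Tuple[int, int]]) -> Dict[str, Tuple[int, float]]:
    """Number of simulations and success rate (%) per size class.

    Each trial is a (total simulated deaths, success 0/1) pair.
    """
    counts: Dict[str, int] = defaultdict(int)
    successes: Dict[str, int] = defaultdict(int)
    for deaths, success in trials:
        cls = size_class(deaths)
        counts[cls] += 1
        successes[cls] += success
    return {cls: (counts[cls], 100 * successes[cls] / counts[cls]) for cls in counts}


print(success_by_class([(50, 0), (80, 1), (500, 1), (20_000, 1)]))
```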
Sensitivity of the performance indicators of the cluster detection method to the model parameters (R0 of 1, 3, 5, 7, and 9; daily mortality rate from 1% to 5%).
| Indicator | R0: LCC* | R0: p-value† | Mortality: LCC* | Mortality: p-value† |
|---|---|---|---|---|
| Success | 0.72 | <0.001 | 0.29 | 0.160 |
| Timeliness | -0.73 | <0.001 | -0.26 | 0.214 |
| Se_a | 0.81 | <0.001 | 0.28 | 0.170 |
| PPV_a | 0.67 | <0.001 | 0.06 | 0.759 |
| NPV_a | 0.84 | <0.001 | 0.21 | 0.321 |
| Number of simulated deaths at first alarm | 0.36 | 0.073 | -0.02 | 0.938 |
| Number of infected herds at first alarm | 0.01 | 0.947 | -0.78 | <0.001 |
| Number of infected hexagons at first alarm | 0.33 | 0.105 | -0.61 | <0.001 |
| Number of simulated deaths per herd at first alarm | 0.57 | 0.003 | 0.72 | <0.001 |
* Linear correlation coefficient (LCC) between the input parameter values and the median performance indicator.
† p-value of the t-statistic (N − 2 degrees of freedom) testing the null hypothesis LCC = 0.
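The footnotes describe a standard t-test of a linear correlation coefficient: with N paired values, t = LCC·√((N − 2)/(1 − LCC²)) follows a Student t distribution with N − 2 degrees of freedom when LCC = 0. The following self-contained sketch uses only the standard library and assumes a two-sided test. The function names are hypothetical, N is not given in this excerpt, and the t tail probability uses the classical closed-form series for integer degrees of freedom rather than any particular statistics package.

```python
import math
from typing import Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Linear (Pearson) correlation coefficient of two equal-length samples."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def t_two_sided_p(t: float, df: int) -> float:
    """Two-sided p-value of Student's t with integer df.

    Computes A = P(|T| < |t|) with the closed-form series in
    theta = atan(|t| / sqrt(df)) (odd and even df cases), then p = 1 - A.
    """
    theta = math.atan(abs(t) / math.sqrt(df))
    s, c = math.sin(theta), math.cos(theta)
    if df % 2 == 1:
        series, term = 0.0, c
        if df > 1:
            series = term
            for k in range(1, (df - 3) // 2 + 1):
                term *= (2 * k) / (2 * k + 1) * c * c
                series += term
        inside = 2 / math.pi * (theta + s * series)
    else:
        series, term = 1.0, 1.0
        for k in range(1, (df - 2) // 2 + 1):
            term *= (2 * k - 1) / (2 * k) * c * c
            series += term
        inside = s * series
    return max(0.0, 1.0 - inside)


def lcc_p_value(r: float, n: int) -> float:
    """p-value of the t-test of LCC = 0 with n - 2 degrees of freedom."""
    if r * r >= 1.0:
        return 0.0  # perfect correlation: t is infinite
    t = r * math.sqrt((n - 2) / (1 - r * r))
    return t_two_sided_p(t, n - 2)


print(lcc_p_value(0.29, 25))
```

The same LCC gives very different p-values depending on N, so reproducing the table's p-values requires knowing how many parameter combinations entered each correlation.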