| Literature DB >> 31088446 |
I R Lake¹,², F J Colón-González³,⁴, G C Barker⁴, R A Morbey⁴,⁵, G E Smith⁴,⁵, A J Elliot⁴,⁵.
Abstract
BACKGROUND: Worldwide, syndromic surveillance is increasingly used for improved and timely situational awareness and for the early identification of public health threats. Syndromic data streams are fed into detection algorithms, which produce statistical alarms highlighting potential activity of public health importance. All alarms must be assessed to confirm whether they are of public health importance. In England, approximately 100 alarms are generated daily and, although their analysis is formalised through a risk assessment process, the process requires substantial time, training, and the maintenance of an expertise base to determine which alarms are of public health importance. The process is further complicated by the observation that only 0.1% of statistical alarms are deemed to be of public health importance. The aim of this study was therefore to evaluate machine learning as a tool for computer-assisted human decision-making when assessing statistical alarms.
Keywords: Artificial intelligence; Bayes’ theorem; Decision making; Machine learning; Public health; Syndromic surveillance
Year: 2019 PMID: 31088446 PMCID: PMC6515660 DOI: 10.1186/s12889-019-6916-9
Source DB: PubMed Journal: BMC Public Health ISSN: 1471-2458 Impact factor: 3.295
Attributes included in the development of a classifier for statistical alarms recorded by a PHE multi-system syndromic surveillance service
| Field name | Description | Entries | Missing | Unique | Ip | Values | P-value |
|---|---|---|---|---|---|---|---|
| | Decision taken by the syndromic surveillance analyst | 592 | 66,913 | 3 | 1.0000 | Alert, Monitor, No-action | – |
| | Year of the alarm | 67,505 | 0 | 3 | 0.0001 | 2013, 2014, 2015 | 9.7 × 10⁻² |
| | Quarter | 67,505 | 0 | 4 | 0.0002 | Jan-Mar, Apr-Jun, Jul-Sep, Oct-Dec | 3.3 × 10⁻² |
| | Day of the week | 67,505 | 0 | 7 | 0.0006 | Sun, Mon, Tue, Wed, Thu, Fri, Sat | 6.9 × 10⁻⁸ |
| | Was the event a statistical alarm? | 67,505 | 0 | 3 | 0.0014 | Yes, No, Unknown | < 10⁻¹⁰ |
| | The system that alarmed | 67,505 | 0 | 5 | 0.0006 | NHS111, NHS24, EDSSS, GPOOHSS, GPIHSS | 4.4 × 10⁻⁹ |
| | Indicator that alarmed | 67,505 | 0 | 53 | 0.0041 | 1 of 53 different syndromes | < 10⁻¹⁰ |
| | Coarse-grained version of | 67,505 | 0 | 8 | 0.0013 | Cardiac, Impact of Cold, Gastrointestinal, Impact of Heat, Influenza-like illness, Respiratory, Other & Unspecified | < 10⁻¹⁰ |
| | Specific/general indicator | 67,505 | 0 | 2 | 0.0001 | Specific, General | 2.0 × 10⁻³ |
| | Indicator severity | 67,505 | 0 | 5 | 0.0002 | Consultation, Admitted, Severe, High Dependency Unit/Intensive Care Unit, Mortality | 1.0 × 10⁻¹⁰ |
| | PHE region | 67,505 | 0 | 13 | 0.0037 | 1 of 13 PHE regions | < 10⁻¹⁰ |
| | Geography of alarm | 67,505 | 0 | 3 | 0.0037 | Local, Regional, National | < 10⁻¹⁰ |
| | Is the syndromic surveillance analyst experienced? | 67,505 | 0 | 2 | 0.0001 | Yes, No | 2.1 × 10⁻² |
| | Size of the alarm | 66,406 | 1,099 | 4 | 0.0115 | 0, 1, 2, 3 | < 10⁻¹⁰ |
| | Is the alarm a repeat? | 65,766 | 1,739 | 4 | 0.0026 | 0, 1, 2, 3 | < 10⁻¹⁰ |
| | Is the alarm in multiple systems simultaneously? | 65,742 | 1,763 | 4 | 0.0094 | 0, 1, 2, 3 | < 10⁻¹⁰ |
| | Is the alarm counter to the national trend? | 65,771 | 1,734 | 4 | 0.0003 | 0, 1, 2, 3 | 2.3 × 10⁻⁵ |
| | Sum of scores from the first-stage risk assessment | 65,795 | 1,710 | 13 | 0.0277 | 0–12 | < 10⁻¹⁰ |
| | Does the first-stage analyst engage a consultant epidemiologist to perform the second stage? | 67,505 | 0 | 2 | 0.0357 | Yes, No | < 10⁻¹⁰ |
| | Is the alarm counter to the seasonal trend? | 573 | 66,932 | 3 | 0.0258 | Yes, No, Missing | < 10⁻¹⁰ |
| | Does the alarm show atypical geographical clustering? | 572 | 66,933 | 3 | 0.0259 | Yes, No, Missing | < 10⁻¹⁰ |
| | Is the alarm centred on a particular age group? | 572 | 66,933 | 3 | 0.0264 | Yes, No, Missing | < 10⁻¹⁰ |
| | Is there an unusual increase in illness severity associated with the alarm? | 571 | 66,934 | 3 | 0.0259 | Yes, No, Missing | < 10⁻¹⁰ |
| | Are the second-stage scores subsequently completed? | 67,505 | 0 | 2 | 0.0130 | Yes, No | < 10⁻¹⁰ |
| | Sum of scores from the second-stage risk assessment | 67,505 | 0 | 15 | 0.0325 | 1–15 | < 10⁻¹⁰ |
| | Presence of text in summary field | 67,505 | 0 | 2 | 0.0041 | Yes, No | < 10⁻¹⁰ |
Notes: Ip is the amount of information obtained about the Decision by observing the attribute (the mutual information between the attribute and the Decision).
The P-value is the significance of a Pearson χ² test of association between the variable and the Decision.
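Both statistics in these notes can be sketched in plain Python. This is a minimal illustration, not the authors' code: it assumes Ip is the mutual information normalised by the entropy of the Decision, which is consistent with the Decision scoring 1.0000 against itself but is not stated in the source, and the records below are hypothetical.

```python
import math
from collections import Counter


def entropy(values):
    """Shannon entropy, in bits, of a sequence of discrete values."""
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())


def mutual_information(xs, ys):
    """I(X;Y) in bits, estimated from paired observations."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    px, py = Counter(xs), Counter(ys)
    # p(x,y) / (p(x) p(y)) simplifies to c * n / (n_x * n_y)
    return sum((c / n) * math.log2(c * n / (px[x] * py[y]))
               for (x, y), c in joint.items())


def normalised_mi(attribute, decision):
    """Assumed form of Ip: I(A;D) / H(D), which is 1.0 when A determines D."""
    return mutual_information(attribute, decision) / entropy(decision)


def pearson_chi2(xs, ys):
    """Pearson chi-square statistic (no continuity correction) and its degrees of freedom."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    px, py = Counter(xs), Counter(ys)
    stat = 0.0
    for x in px:
        for y in py:
            expected = px[x] * py[y] / n
            stat += (joint[(x, y)] - expected) ** 2 / expected
    return stat, (len(px) - 1) * (len(py) - 1)


# Hypothetical records: an analyst decision and a yes/no attribute per alarm.
decision = ["Alert", "Monitor", "No-action", "No-action", "Monitor", "No-action"]
attribute = ["Yes", "Yes", "No", "No", "Yes", "No"]
print(normalised_mi(attribute, decision), pearson_chi2(attribute, decision))
```

With SciPy installed, `scipy.stats.chi2_contingency` returns the statistic, p-value, degrees of freedom and expected frequencies from a contingency table in one call; it applies Yates' continuity correction by default when there is one degree of freedom, so for 2×2 tables its statistic can differ from this uncorrected version.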
Fig. 1 A simple four-node network representing the conditional independence of observed symptoms A, B and C given that a patient has disease D; this structure is a representation of naïve Bayes decision making
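The factorisation in Fig. 1 can be sketched as follows: if A, B and C are conditionally independent given D, then P(D | A, B, C) is proportional to P(D) P(A | D) P(B | D) P(C | D), and normalising over the values of D gives the posterior. The disease labels and every probability below are hypothetical, chosen only to show the arithmetic.

```python
def naive_bayes_posterior(prior, likelihoods, observed):
    """
    prior:       {d: P(D = d)}
    likelihoods: {symptom: {d: P(symptom present | D = d)}}
    observed:    {symptom: True if present, False if absent}
    Returns {d: P(D = d | observed)}, assuming symptoms are independent given D.
    """
    scores = {}
    for d, p_d in prior.items():
        score = p_d
        for symptom, present in observed.items():
            p = likelihoods[symptom][d]
            score *= p if present else 1.0 - p
        scores[d] = score
    total = sum(scores.values())  # normalising constant P(observed)
    return {d: s / total for d, s in scores.items()}


# Hypothetical numbers for the four-node network of Fig. 1.
prior = {"disease": 0.01, "no disease": 0.99}
likelihoods = {
    "A": {"disease": 0.9, "no disease": 0.1},
    "B": {"disease": 0.8, "no disease": 0.2},
    "C": {"disease": 0.7, "no disease": 0.3},
}
posterior = naive_bayes_posterior(prior, likelihoods, {"A": True, "B": True, "C": False})
print(round(posterior["disease"], 3))  # prints 0.135
```

Even with two of three symptoms present, the low prior keeps the posterior modest, the same base-rate effect that makes rare alarms of public health importance hard to pick out.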
Classification performance measures
| Measure | Description |
|---|---|
| Accuracy | Proportion of all predictions made by the classifier that are correct. |
| Matthews correlation coefficient (MCC) | Calculated for each outcome separately. Ranges from −1 to 1 and is interpreted like a Pearson correlation coefficient. It is evaluated from all elements of the confusion matrix, so it gives a more balanced quantification of performance than accuracy: it considers how closely the predicted results follow the decisions in the test data. Other correlation measures exist, but the MCC is suited to asymmetric classes and multi-state systems [ |
| Precision (positive predictive value) | Calculated for each outcome separately. Expresses the fraction of classifications of an outcome that match the true outcome: true positives/(true positives + false positives). For example, the proportion of ‘Alerts’ produced by the classifier that were ‘Alerts’ in the risk assessment database. |
| Recall (sensitivity) | Calculated for each outcome separately. Expresses the proportion of each true outcome that the classifier returns correctly: true positives/(true positives + false negatives). For example, the proportion of ‘Alerts’ in the risk assessment database that were identified by the classifier. |
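The measures above can be computed directly from a 3 × 3 confusion matrix. The sketch below is plain Python using the NB counts reported in this section (rows are the analyst's Decision, columns the classifier's output); reading "calculated for each outcome separately" as a one-vs-rest calculation is an assumption, although it reproduces the NB values in the performance table.

```python
import math

LABELS = ["Alert", "Monitor", "No-action"]
# NB confusion matrix from this section: rows are the analyst's Decision,
# columns are the classifier's output, both in LABELS order.
NB = [
    [32, 22, 8],
    [42, 269, 69],
    [30, 469, 66564],
]


def accuracy(cm):
    """Proportion of all alarms on the diagonal (correctly classified)."""
    return sum(cm[i][i] for i in range(len(cm))) / sum(map(sum, cm))


def one_vs_rest(cm, k):
    """(TP, FP, FN, TN) for outcome k treated as positive and all others as negative."""
    total = sum(map(sum, cm))
    tp = cm[k][k]
    fp = sum(row[k] for row in cm) - tp  # predicted k, decided otherwise
    fn = sum(cm[k]) - tp                 # decided k, predicted otherwise
    return tp, fp, fn, total - tp - fp - fn


def precision(cm, k):
    tp, fp, _, _ = one_vs_rest(cm, k)
    return tp / (tp + fp)


def recall(cm, k):
    tp, _, fn, _ = one_vs_rest(cm, k)
    return tp / (tp + fn)


def mcc(cm, k):
    tp, fp, fn, tn = one_vs_rest(cm, k)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


print(f"accuracy {accuracy(NB):.4f}")
for k in (0, 1):
    print(LABELS[k], f"MCC {mcc(NB, k):.3f}", f"precision {precision(NB, k):.3f}",
          f"recall {recall(NB, k):.3f}")
```

The printed Alert (0.398, 0.308, 0.516) and Monitor (0.497, 0.354, 0.708) values match the NB column of the performance table. Accuracy comes out at about 0.99 simply because 66,564 No-action alarms are classified correctly, which is why accuracy alone says little here.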
Confusion matrix for classification of statistical alarms recorded by a multi-system syndromic surveillance service in England
| Classifier | Decision | Classified as Alert | Classified as Monitor | Classified as No-action |
|---|---|---|---|---|
| NB | Alert | 32 | 22 | 8 |
| NB | Monitor | 42 | 269 | 69 |
| NB | No-action | 30 | 469 | 66,564 |
| TAN | Alert | 21 | 27 | 14 |
| TAN | Monitor | 12 | 227 | 141 |
| TAN | No-action | 17 | 144 | 66,902 |
| TAN* | Alert | 27 | 24 | 11 |
| TAN* | Monitor | 22 | 260 | 98 |
| TAN* | No-action | 27 | 293 | 66,743 |
| Multinet | Alert | 11 | 19 | 32 |
| Multinet | Monitor | 1 | 173 | 206 |
| Multinet | No-action | 1 | 138 | 66,924 |
| Multinet* | Alert | 24 | 36 | 2 |
| Multinet* | Monitor | 8 | 315 | 57 |
| Multinet* | No-action | 17 | 866 | 66,180 |
*Modified approach to account for data asymmetry, i.e. the predominance of ‘No-action’ outcomes
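A quick consistency check makes the layout of these matrices explicit. The plain-Python sketch below uses the counts copied from the table and totals each row and column: rows count the analyst's decisions and columns count each classifier's predictions.

```python
# Confusion matrices from the table above: rows are the analyst's Decision and
# columns the classifier's output, both ordered Alert, Monitor, No-action.
MATRICES = {
    "NB": [[32, 22, 8], [42, 269, 69], [30, 469, 66564]],
    "TAN": [[21, 27, 14], [12, 227, 141], [17, 144, 66902]],
    "TAN*": [[27, 24, 11], [22, 260, 98], [27, 293, 66743]],
    "Multinet": [[11, 19, 32], [1, 173, 206], [1, 138, 66924]],
    "Multinet*": [[24, 36, 2], [8, 315, 57], [17, 866, 66180]],
}


def row_totals(cm):
    """Number of alarms given each Decision by the analyst."""
    return [sum(row) for row in cm]


def column_totals(cm):
    """Number of alarms the classifier assigned to each outcome."""
    return [sum(col) for col in zip(*cm)]


for name, cm in MATRICES.items():
    print(f"{name:10} decisions {row_totals(cm)}  predictions {column_totals(cm)}")
```

Every classifier has the same row totals, 62 Alert, 380 Monitor and 67,063 No-action decisions (67,505 alarms), so all five are scored against the same decisions. Only the column totals differ: TAN* predicts Alert 76 times and Monitor 577 times against 50 and 398 for TAN, and Multinet* predicts them 49 and 1,217 times against 13 and 330 for Multinet, consistent with the starred variants shifting predictions away from the dominant No-action outcome.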
Performance measures (tenfold cross-validation) for multi-state classification of statistical alarms in the PHE multi-system syndromic surveillance service
| Decision | Measure | NB | TAN | TAN* | Multinet | Multinet* |
|---|---|---|---|---|---|---|
| Alert | MCC | 0.398 | 0.377 | 0.393 | 0.387 | 0.435 |
| Alert | Precision | 0.308 | 0.420 | 0.355 | 0.846 | 0.490 |
| Alert | Recall | 0.516 | 0.339 | 0.435 | 0.177 | 0.387 |
| Monitor | MCC | 0.497 | 0.581 | 0.552 | 0.486 | 0.459 |
| Monitor | Precision | 0.354 | 0.570 | 0.450 | 0.524 | 0.259 |
| Monitor | Recall | 0.708 | 0.597 | 0.684 | 0.455 | 0.829 |
| Alert + Monitor (Alert and Monitor decisions/classifications merged in the confusion matrix) | MCC | 0.587 | 0.643 | 0.617 | 0.521 | 0.507 |
| Alert + Monitor | Precision | 0.422 | 0.641 | 0.510 | 0.595 | 0.303 |
| Alert + Monitor | Recall | 0.826 | 0.649 | 0.754 | 0.462 | 0.867 |
*Modified approach to account for data asymmetry, i.e. the predominance of ‘No-action’ outcomes
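The Alert + Monitor rows pool the two actionable outcomes into a single positive class and score the result as a binary problem. Assuming the merge simply sums the Alert and Monitor rows and columns of a confusion matrix (an assumption, though it reproduces the NB values 0.587, 0.422 and 0.826 above), the calculation can be sketched as:

```python
import math

# NB confusion matrix: rows are Decision, columns classification (Alert, Monitor, No-action).
NB = [[32, 22, 8], [42, 269, 69], [30, 469, 66564]]


def merge_alert_monitor(cm):
    """Collapse 3x3 to 2x2: index 0 = Alert or Monitor, index 1 = No-action."""
    groups = [[0, 1], [2]]
    return [[sum(cm[i][j] for i in rows for j in cols) for cols in groups]
            for rows in groups]


def binary_scores(cm2):
    """MCC, precision and recall for the positive class (index 0) of a 2x2 matrix."""
    (tp, fn), (fp, tn) = cm2
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return mcc, tp / (tp + fp), tp / (tp + fn)


merged = merge_alert_monitor(NB)
print(merged)                                        # prints [[365, 77], [499, 66564]]
print([round(x, 3) for x in binary_scores(merged)])  # prints [0.587, 0.422, 0.826]
```

Merging removes the penalty for confusing Alert with Monitor, which is why every classifier's merged recall is at least as high as its Monitor recall.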
Fig. 2 A tree-augmented naïve Bayes (TAN) network structure induced from details of recorded statistical alarms within the PHE multi-system syndromic surveillance system
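In the standard TAN construction, each pair of attributes is weighted by its mutual information conditional on the class, a maximum-weight spanning tree is built over the attributes, and the class is then added as a parent of every attribute. The sketch below shows only the tree-learning step, on hypothetical records whose last field plays the role of the Decision; it illustrates the general technique and is not the procedure or software used to produce Fig. 2.

```python
import math
from collections import Counter
from itertools import combinations


def conditional_mi(data, i, j, c):
    """I(X_i; X_j | C) in bits, estimated from records given as tuples."""
    n = len(data)
    n_ijc = Counter((r[i], r[j], r[c]) for r in data)
    n_ic = Counter((r[i], r[c]) for r in data)
    n_jc = Counter((r[j], r[c]) for r in data)
    n_c = Counter(r[c] for r in data)
    # p(x,y|z) / (p(x|z) p(y|z)) simplifies to n_xyz * n_z / (n_xz * n_yz)
    return sum((k / n) * math.log2(k * n_c[z] / (n_ic[(x, z)] * n_jc[(y, z)]))
               for (x, y, z), k in n_ijc.items())


def max_spanning_tree(nodes, weight):
    """Prim's algorithm from nodes[0]; returns (parent, child) edges directed away from the root."""
    in_tree = {nodes[0]}
    edges = []
    while len(in_tree) < len(nodes):
        parent, child = max(((a, b) for a in in_tree for b in nodes if b not in in_tree),
                            key=lambda e: weight[frozenset(e)])
        edges.append((parent, child))
        in_tree.add(child)
    return edges


def tan_attribute_tree(data, attributes, class_index):
    """Attribute-to-attribute edges of a TAN; the class is also a parent of every attribute."""
    weight = {frozenset(pair): conditional_mi(data, *pair, class_index)
              for pair in combinations(attributes, 2)}
    return max_spanning_tree(attributes, weight)


# Hypothetical records: (attribute 0, attribute 1, attribute 2, decision).
# Attribute 1 copies attribute 0, while attribute 2 is unrelated to both.
records = [
    (0, 0, 0, "Alert"), (0, 0, 1, "Alert"), (1, 1, 0, "Alert"), (1, 1, 1, "Alert"),
    (0, 0, 0, "No-action"), (1, 1, 0, "No-action"), (0, 0, 1, "No-action"), (1, 1, 1, "No-action"),
]
edges = tan_attribute_tree(records, [0, 1, 2], class_index=3)
print(edges)
```

Choosing `attributes[0]` as the root is arbitrary here; any root gives the same undirected tree, and the conditional probability tables would then be estimated from counts with both the tree parent and the class as parents.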
Fig. 3 Three components of a multinet classifier, structured as Chow-Liu trees rooted on the “BInitial” variable. The trees corresponding to the ‘Alert’, ‘Monitor’ and ‘No-action’ outcomes of the “Decision” variable are at the top left, top right and bottom of the figure, respectively
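Unlike TAN, where a single tree is shared across outcomes, a multinet learns one Chow-Liu tree per Decision outcome. A Chow-Liu tree is the maximum-weight spanning tree over pairwise mutual information, here estimated only from the records with that outcome and then oriented away from a chosen root. The sketch assumes records are dictionaries; apart from “BInitial”, borrowed from the caption as the root, the variable names (“Size”, “Repeat”) and values are hypothetical, and the code illustrates the technique rather than the authors' implementation.

```python
import math
from collections import Counter, defaultdict, deque
from itertools import combinations


def mutual_information(rows, a, b):
    """I(A;B) in bits between two fields of a list of dict records."""
    n = len(rows)
    joint = Counter((r[a], r[b]) for r in rows)
    na = Counter(r[a] for r in rows)
    nb = Counter(r[b] for r in rows)
    return sum((k / n) * math.log2(k * n / (na[x] * nb[y]))
               for (x, y), k in joint.items())


def chow_liu_tree(rows, variables, root):
    """Maximum-MI spanning tree (Kruskal), oriented away from root; returns {child: parent}."""
    pairs = sorted(combinations(variables, 2),
                   key=lambda p: mutual_information(rows, *p), reverse=True)
    component = {v: v for v in variables}  # union-find forest

    def find(v):
        while component[v] != v:
            component[v] = component[component[v]]  # path halving
            v = component[v]
        return v

    neighbours = defaultdict(list)
    for a, b in pairs:
        ra, rb = find(a), find(b)
        if ra != rb:  # keep the edge only if it joins two separate components
            component[ra] = rb
            neighbours[a].append(b)
            neighbours[b].append(a)
    parents, queue = {root: None}, deque([root])
    while queue:  # breadth-first search orients every edge away from the root
        u = queue.popleft()
        for v in neighbours[u]:
            if v not in parents:
                parents[v] = u
                queue.append(v)
    return parents


def multinet(records, decision_key, variables, root):
    """One Chow-Liu tree per observed value of the decision field."""
    by_outcome = defaultdict(list)
    for r in records:
        by_outcome[r[decision_key]].append(r)
    return {d: chow_liu_tree(rows, variables, root) for d, rows in by_outcome.items()}


# Hypothetical records. Within 'Alert', Size tracks BInitial; within 'No-action', Repeat tracks Size.
records = [dict(zip(("BInitial", "Size", "Repeat", "Decision"), r)) for r in [
    (0, 0, 0, "Alert"), (0, 0, 1, "Alert"), (1, 1, 0, "Alert"), (1, 1, 1, "Alert"),
    (0, 0, 0, "No-action"), (0, 1, 1, "No-action"), (1, 0, 0, "No-action"), (1, 1, 1, "No-action"),
]]
trees = multinet(records, "Decision", ["BInitial", "Size", "Repeat"], root="BInitial")
print(trees)
```

The two toy outcomes end up with different structures (Repeat hangs off BInitial for ‘Alert’ but off Size for ‘No-action’), which is the flexibility a multinet gains over a single shared tree.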