Bernard Hernandez, Pau Herrero, Timothy Miles Rawson, Luke S P Moore, Benjamin Evans, Christofer Toumazou, Alison H Holmes, Pantelis Georgiou.
Abstract
BACKGROUND: Antimicrobial resistance is threatening our ability to treat common infectious diseases, and overuse of antimicrobials to treat human infections in hospitals is accelerating this process. Clinical Decision Support Systems (CDSSs) have been proven to enhance quality of care by promoting change in prescription practices through antimicrobial selection advice. However, bypassing an initial assessment to determine the existence of an underlying disease that justifies the need for antimicrobial therapy might lead to indiscriminate and often unnecessary prescriptions.
Keywords: Antimicrobial resistance; Behaviour change; Biochemical markers; Decision support; Infection; Machine learning; Predictive modelling; Supervised learning
Year: 2017 PMID: 29216923 PMCID: PMC5721579 DOI: 10.1186/s12911-017-0550-1
Source DB: PubMed Journal: BMC Med Inform Decis Mak ISSN: 1472-6947 Impact factor: 2.796
Selected laboratory biochemical markers
| Abbreviation | Marker | Unit |
|---|---|---|
| ALT | Alanine aminotransferase | IU/L |
| ALP | Alkaline phosphatase | IU/L |
| BIL | Bilirubin | µmol/L |
| CRE | Creatinine | µmol/L |
| CRP | C-reactive protein | mg/L |
| WBC | White blood cell count | 10^9/L |
Fig. 1 Profile with metadata, feature vector and label
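
A profile as described in the caption of Fig. 1 could be represented roughly as follows. This is only a sketch: the class name `Profile`, the `patient_id` metadata key and the 0/1 label encoding are assumptions made for illustration, and only the six marker abbreviations come from the table above.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

# Marker abbreviations taken from the biochemical markers table.
MARKERS = ("ALP", "ALT", "BIL", "CRE", "CRP", "WBC")


@dataclass
class Profile:
    """One observation: metadata, biomarker values and a culture label."""
    metadata: Dict[str, str]               # hypothetical keys, e.g. patient_id
    features: Dict[str, Optional[float]]   # marker abbreviation -> value
    label: int                             # assumed: 1 = culture-positive, 0 = culture-negative

    def vector(self) -> List[Optional[float]]:
        """Feature vector in a fixed marker order; None marks a missing test."""
        return [self.features.get(marker) for marker in MARKERS]

    def available(self) -> int:
        """Number of biomarkers actually measured in this profile."""
        return sum(value is not None for value in self.vector())


# Hypothetical example values, not taken from the study data.
example = Profile(
    metadata={"patient_id": "hypothetical-001"},
    features={"CRP": 120.0, "WBC": 14.2},
    label=1,
)
```

Here `example.available()` counts two measured markers, so the profile would be incomplete in the sense used by the work-flow.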
Fig. 2 High-level diagram of the workflow followed to build the models and obtain the results presented in this paper. First, data cleaning and outlier removal are performed. The remaining observations are grouped as complete or incomplete profiles. The complete profiles are further split into a Cross-Validation Set (CVS) and a Hold-out Set (HOS). Ten-fold stratified cross-validation is performed on the CVS, and this step yields two outputs that are used later: a preprocessing equation to transform new observations (T) and a calibrated model (M). Note that sampling and preprocessing are performed using the training set, while calibration is achieved from completely unseen observations. The performance of calibrated models is evaluated in HOS and
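
The key point of the work-flow, that preprocessing is fitted on the training folds only and then applied to unseen observations, can be sketched in plain Python. The helper names `stratified_folds` and `fit_scaler` are hypothetical, the standardisation used as the preprocessing step is an assumption (the caption does not say which transformation T is), and the toy values are invented. The CVS/HOS split, sampling and calibration are left out for brevity.

```python
import random
from collections import defaultdict


def stratified_folds(labels, n_folds=10, seed=0):
    """Split indices into n_folds folds with near-equal class proportions."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)
    folds = [[] for _ in range(n_folds)]
    for indices in by_class.values():
        rng.shuffle(indices)
        # Deal each class round-robin so every fold gets a similar share.
        for position, index in enumerate(indices):
            folds[position % n_folds].append(index)
    return folds


def fit_scaler(train_values):
    """Learn mean and standard deviation from training values only."""
    mean = sum(train_values) / len(train_values)
    variance = sum((v - mean) ** 2 for v in train_values) / len(train_values)
    std = variance ** 0.5 or 1.0  # avoid division by zero for constant input
    return lambda value: (value - mean) / std


# Invented toy data: 20 positive and 80 negative observations of one marker.
labels = [1] * 20 + [0] * 80
crp = [float(50 + i) for i in range(20)] + [float(5 + i % 10) for i in range(80)]

folds = stratified_folds(labels)
for k, validation in enumerate(folds):
    training = [i for j, fold in enumerate(folds) if j != k for i in fold]
    transform = fit_scaler([crp[i] for i in training])  # fitted on training folds only
    scaled_validation = [transform(crp[i]) for i in validation]
```

Fitting the transformation inside the loop, rather than once on all data, keeps information from the validation fold out of the preprocessing step.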
Evaluation metrics: descriptions and equations
| Metric | Description | Equation |
|---|---|---|
| Sensitivity | Proportion of observed positives that are correctly identified as such (i.e. percentage of culture-positive profiles correctly identified as positive). Also called recall (REC) or true positive rate (TPR). | $\mathrm{SENS} = \frac{TP}{TP + FN}$ |
| Specificity | Proportion of observed negatives that are correctly identified as such (i.e. percentage of culture-negative profiles correctly identified as negative). Also called true negative rate (TNR). | $\mathrm{SPEC} = \frac{TN}{TN + FP}$ |
| ROC | This curve illustrates the performance of a binary classifier as its discrimination threshold is varied, by plotting true positive rate (TPR) against false positive rate (FPR). It is related to cost/benefit analysis of diagnostic decision making. | |
| PR | This curve plots precision against recall. High scores for both show that the classifier is returning accurate results (high precision) as well as a majority of all positive results (high recall). | |
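
The metric descriptions above translate directly into arithmetic on confusion-matrix counts. The following is a minimal sketch. The function names and the example counts are invented for illustration, and `precision` and `false_positive_rate` use the standard definitions rather than anything stated in the table.

```python
def sensitivity(tp, fn):
    """True positive rate: correctly identified positives over all observed positives."""
    return tp / (tp + fn)


def specificity(tn, fp):
    """True negative rate: correctly identified negatives over all observed negatives."""
    return tn / (tn + fp)


def precision(tp, fp):
    """Fraction of predicted positives that are truly positive (y-axis of the PR curve)."""
    return tp / (tp + fp)


def false_positive_rate(tn, fp):
    """x-axis of the ROC curve; equals 1 - specificity."""
    return fp / (fp + tn)


# Invented confusion-matrix counts for a single decision threshold.
tp, fn, tn, fp = 75, 25, 90, 10
sens = sensitivity(tp, fn)
spec = specificity(tn, fp)
```

Each point on a ROC or PR curve corresponds to one such set of counts obtained at a particular threshold.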
Pathology biomarkers and profiles overview
| Category | Inputs | ALP | ALT | BIL | CRE | CRP | WBC | All Tests | Profiles |
|---|---|---|---|---|---|---|---|---|---|
| C- | F1 | 10858 | 236 | 327 | 53443 | 10477 | 191213 | 266554 | 266554 |
| C- | F2 | 11654 | 492 | 889 | 81337 | 25959 | 94605 | 214936 | 107468 |
| C- | F3 | 51047 | 27921 | 28506 | 131058 | 113049 | 130870 | 482451 | 160817 |
| C- | F4 | 135450 | 97665 | 101738 | 112962 | 36446 | 59607 | 543868 | 135967 |
| C- | F5 | 412266 | 386171 | 409873 | 404555 | 58530 | 391120 | 2062515 | 412503 |
| C- | F6 | 517397 | 517397 | 517397 | 517397 | 517397 | 517397 | 3104382 | 517397 |
| C- | Total | 1138672 | 1029882 | 1058730 | 1300752 | 761858 | 1384812 | **6674706** | **1600706** |
| C+ | F1 | 40 | 5 | 7 | 412 | 267 | 1445 | 2176 | 2176 |
| C+ | F2 | 103 | 12 | 20 | 1458 | 1140 | 1983 | 4716 | 2358 |
| C+ | F3 | 484 | 85 | 121 | 7671 | 7367 | 7621 | 23349 | 7783 |
| C+ | F4 | 2395 | 373 | 578 | 2308 | 1946 | 2096 | 9696 | 2424 |
| C+ | F5 | 5277 | 3043 | 5145 | 5165 | 3106 | 4674 | 26410 | 5282 |
| C+ | F6 | 23474 | 23474 | 23474 | 23474 | 23474 | 23474 | 140844 | 23474 |
| C+ | Total | 31773 | 26992 | 29345 | 40488 | 37300 | 41293 | **207191** | **43497** |
Bold numbers indicate total numbers of tests and profiles
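
The table can be checked for internal consistency. Reading Fk as "profiles with k of the six biomarkers available" (an interpretation consistent with the caption of Fig. 6), each row should satisfy All Tests = k × Profiles. The sketch below copies the All Tests and Profiles columns from the table; the names `TABLE`, `rows_consistent`, `totals` and `complete_fraction` are hypothetical.

```python
# (all_tests, profiles) for each Fk row, copied from the table.
TABLE = {
    "C-": {1: (266554, 266554), 2: (214936, 107468), 3: (482451, 160817),
           4: (543868, 135967), 5: (2062515, 412503), 6: (3104382, 517397)},
    "C+": {1: (2176, 2176), 2: (4716, 2358), 3: (23349, 7783),
           4: (9696, 2424), 5: (26410, 5282), 6: (140844, 23474)},
}


def rows_consistent(rows):
    """True if every Fk row satisfies all_tests == k * profiles."""
    return all(tests == k * profiles for k, (tests, profiles) in rows.items())


def totals(rows):
    """Total number of tests and of profiles across F1..F6."""
    return (sum(tests for tests, _ in rows.values()),
            sum(profiles for _, profiles in rows.values()))


def complete_fraction(rows):
    """Share of profiles in which all six biomarkers are available (F6)."""
    return rows[6][1] / totals(rows)[1]


for category, rows in TABLE.items():
    print(category, rows_consistent(rows), totals(rows),
          round(complete_fraction(rows), 3))
```

The same totals can also be obtained by summing the per-biomarker totals in each Total row, which gives a second cross-check on the All Tests figure.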
Fig. 3 Percentages representing the frequency of each biomarker (a and b) and the completeness of profiles (c and d) for both culture-negative (C-) and culture-positive (C+) categories respectively
Fig. 4 Distribution of measurements for each single biomarker grouped in two categories: culture-negative (C-) and culture-positive (C+). The inter-quartile range rule with a threshold of 1.5 (IQR×1.5) has been applied to each category independently to discard outliers
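
The inter-quartile range rule mentioned in the caption of Fig. 4 keeps values within [Q1 − 1.5·IQR, Q3 + 1.5·IQR]. A minimal sketch using only the standard library follows. The function name `iqr_filter`, the quartile method (Python's default "exclusive" method) and the sample values are assumptions, since the caption does not specify how quartiles were computed.

```python
import statistics


def iqr_filter(values, k=1.5):
    """Keep values within [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if low <= v <= high]


# Invented measurements of one biomarker, grouped by culture category.
measurements = {
    "C-": [4, 5, 5, 6, 6, 7, 7, 8, 250],
    "C+": [40, 55, 60, 62, 70, 75, 80, 90, 900],
}

# The rule is applied to each category independently, as the caption states.
cleaned = {category: iqr_filter(values) for category, values in measurements.items()}
```

Filtering per category matters here: a value that is extreme among culture-negative measurements may be typical among culture-positive ones.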
Sampling method: performance comparison
| Sampling | Model | AUCROC | AUCPRB | SENS | SPEC |
|---|---|---|---|---|---|
| RANDU | GNB | 0.763 | 0.871 | 0.533 | 0.992 |
| RANDU | DTC | 0.798 | 0.891 | 0.601 | 0.993 |
| RANDU | RFC | 0.791 | 0.892 | 0.583 | 0.993 |
| RANDU | SVM | 0.792 | 0.894 | 0.593 | 0.991 |
| RANDO | GNB | 0.742 | 0.860 | 0.482 | 0.991 |
| RANDO | DTC | 0.810 | 0.876 | 0.688 | 0.932 |
| RANDO | RFC | 0.801 | 0.901 | 0.617 | 0.990 |
| RANDO | SVM | 0.753 | 0.872 | 0.523 | 0.991 |
| SMOTE | GNB | 0.814 | 0.872 | 0.725 | 0.903 |
| SMOTE | DTC | 0.779 | 0.881 | 0.636 | 0.963 |
| SMOTE | RFC | 0.818 | 0.876 | 0.725 | 0.909 |
| SMOTE | SVM | 0.830 | 0.884 | 0.747 | 0.912 |
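
The three sampling strategies compared above can be illustrated on toy data. This sketch assumes that RANDU and RANDO denote random under-sampling and random over-sampling, which the table itself does not state. The SMOTE-style function is a simplified illustration of the interpolation idea and not the implementation used to produce the results. All function names and data points are hypothetical.

```python
import random


def random_undersample(majority, minority, rng):
    """Drop majority samples at random until both classes have equal size."""
    return rng.sample(majority, len(minority)), list(minority)


def random_oversample(majority, minority, rng):
    """Duplicate minority samples at random until both classes have equal size."""
    extra = [rng.choice(minority) for _ in range(len(majority) - len(minority))]
    return list(majority), list(minority) + extra


def smote_like(minority, n_new, k, rng):
    """Create synthetic minority points by interpolating towards a near neighbour."""
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # The k nearest other minority points by squared Euclidean distance.
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)),
        )[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # position along the segment from base to other
        synthetic.append(tuple(a + gap * (b - a) for a, b in zip(base, other)))
    return synthetic


rng = random.Random(0)
# Invented two-feature points: 8 majority and 3 minority observations.
majority = [(float(i), float(i % 3)) for i in range(8)]
minority = [(1.0, 2.0), (2.0, 3.0), (3.0, 1.0)]

under_majority, under_minority = random_undersample(majority, minority, rng)
over_majority, over_minority = random_oversample(majority, minority, rng)
new_points = smote_like(minority, n_new=5, k=2, rng=rng)
```

Under-sampling discards information from the larger class, over-sampling repeats existing minority observations, and the SMOTE-style approach adds new points that lie between existing minority observations.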
Missing data: performance comparison
| Model | Inputs | AUCROC | AUCPRB | SENS | SPEC |
|---|---|---|---|---|---|
| GNB | F6 | 0.814 | 0.872 | 0.725 | 0.903 |
| GNB | F5 | 0.802 | 0.874 | 0.664 | 0.939 |
| GNB | F4 | 0.803 | 0.874 | 0.669 | 0.938 |
| GNB | F3 | 0.750 | 0.832 | 0.589 | 0.912 |
| GNB | F2 | 0.686 | 0.816 | 0.400 | 0.971 |
| GNB | F1 | 0.569 | 0.767 | 0.145 | 0.994 |
| DTC | F6 | 0.799 | 0.881 | 0.636 | 0.963 |
| DTC | F5 | 0.777 | 0.859 | 0.614 | 0.940 |
| DTC | F4 | 0.769 | 0.839 | 0.652 | 0.886 |
| DTC | F3 | 0.702 | 0.777 | 0.672 | 0.732 |
| DTC | F2 | 0.617 | 0.722 | 0.583 | 0.652 |
| DTC | F1 | 0.535 | 0.684 | 0.480 | 0.590 |
| RFC | F6 | 0.818 | 0.876 | 0.725 | 0.909 |
| RFC | F5 | 0.806 | 0.874 | 0.682 | 0.930 |
| RFC | F4 | 0.805 | 0.867 | 0.707 | 0.903 |
| RFC | F3 | 0.764 | 0.826 | 0.707 | 0.822 |
| RFC | F2 | 0.704 | 0.796 | 0.504 | 0.904 |
| RFC | F1 | 0.599 | 0.775 | 0.212 | 0.987 |
| SVM | F6 | 0.830 | 0.884 | 0.747 | 0.912 |
| SVM | F5 | 0.816 | 0.885 | 0.687 | 0.944 |
| SVM | F4 | 0.809 | 0.874 | 0.694 | 0.924 |
| SVM | F3 | 0.768 | 0.837 | 0.654 | 0.881 |
| SVM | F2 | 0.699 | 0.809 | 0.453 | 0.949 |
| SVM | F1 | 0.591 | 0.785 | 0.186 | 0.996 |
Fig. 5 Sensitivity and specificity variation for different degrees of missing inputs
Fig. 6 ROC curves for the selected SVM classifier on different degrees of missing inputs. F4 indicates that four inputs are available
Fig. 7 PR curves for the selected SVM classifier on different degrees of imbalanced categories. R60 indicates that 60% of the observations are culture-negative
Fig. 8 Probability density distributions for each type of prediction in the confusion matrix: TP, TN, FP and FN
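
A ROC curve such as those in Fig. 6 is obtained by sweeping the decision threshold over the classifier scores and recording (FPR, TPR) at each step. The area under it can then be approximated with the trapezoidal rule. The sketch below uses invented scores and labels, and the function names `roc_points` and `auc` are hypothetical. It is not the evaluation code behind the reported AUCROC values.

```python
def roc_points(scores, labels):
    """Return (FPR, TPR) points obtained by lowering the threshold step by step."""
    positives = sum(labels)
    negatives = len(labels) - positives
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(ranked):
        threshold = ranked[i][0]
        # Consume all observations tied at this score before emitting a point.
        while i < len(ranked) and ranked[i][0] == threshold:
            if ranked[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / negatives, tp / positives))
    return points


def auc(points):
    """Area under a piecewise-linear curve using the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


# Invented classifier scores with labels (1 = culture-positive).
scores = [0.9, 0.8, 0.7, 0.6, 0.55, 0.4]
labels = [1, 1, 0, 1, 0, 0]
curve = roc_points(scores, labels)
area = auc(curve)
```

In this toy example eight of the nine positive/negative pairs are ranked correctly, so the area is 8/9.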