Hossein Estiri, Jeffrey G Klann, Shawn N Murphy.
Abstract
BACKGROUND: Identifying implausible clinical observations (e.g., laboratory test and vital sign values) in Electronic Health Record (EHR) data using rule-based procedures is challenging. Anomaly/outlier detection methods can be applied as an alternative algorithmic approach to flagging such implausible values in EHRs.
Keywords: Anomaly detection; Data quality; Electronic health records; Implausible observations; Informatics applications; Unsupervised clustering
Year: 2019 PMID: 31337390 PMCID: PMC6652024 DOI: 10.1186/s12911-019-0852-6
Source DB: PubMed Journal: BMC Med Inform Decis Mak ISSN: 1472-6947 Impact factor: 2.796
Fig. 1. Detecting implausible observations through unsupervised clustering. * Density values are mirrored around 0 for visualization purposes
Fig. 2. Parallel implementation of the clustering solution for identifying implausible EHR observations
Specificity from the clustering approach for identifying implausible lab observations in EHRs
* Columns represent different thresholds, a, for flagging a cluster as implausible
** Best performances are highlighted
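The footnote above does not say which quantity the threshold is compared against. As a minimal sketch, assuming the threshold is a cutoff on each cluster's share of all observations (an assumption, not stated here), small clusters could be flagged as follows. The function name `flag_small_clusters` and the example cluster sizes are hypothetical, and the cluster labels are taken as given rather than computed.

```python
from collections import Counter


def flag_small_clusters(cluster_ids, threshold):
    """Flag every observation that belongs to a sparsely populated cluster.

    ASSUMPTION: a cluster is treated as implausible when its share of all
    observations is below `threshold`; the source does not define this.
    """
    counts = Counter(cluster_ids)
    total = len(cluster_ids)
    small = {c for c, n in counts.items() if n / total < threshold}
    return [c in small for c in cluster_ids]


# Hypothetical labels: two large clusters and one tiny cluster of 5 values.
cluster_ids = [0] * 600 + [1] * 395 + [2] * 5
flags = flag_small_clusters(cluster_ids, threshold=0.01)
```

Under this assumption, a lower threshold flags fewer clusters, which is one way the threshold columns could trade sensitivity against specificity.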
Sensitivity from the clustering approach for identifying implausible lab observations in EHRs
* Columns represent different thresholds, a, for flagging a cluster as implausible
** Best performances are highlighted
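For reference, sensitivity and specificity can be computed from per-observation flags and reference labels using the standard definitions. The sketch below is illustrative only: the function name and the five example observations are hypothetical, and the reference labels stand in for whatever standard the flags are judged against.

```python
def sensitivity_specificity(flagged, truly_implausible):
    """Return (sensitivity, specificity) for paired boolean sequences.

    sensitivity = TP / (TP + FN), specificity = TN / (TN + FP).
    A metric is None when its denominator is zero.
    """
    tp = fp = tn = fn = 0
    for is_flagged, is_implausible in zip(flagged, truly_implausible):
        if is_flagged and is_implausible:
            tp += 1
        elif is_flagged and not is_implausible:
            fp += 1
        elif not is_flagged and is_implausible:
            fn += 1
        else:
            tn += 1
    sensitivity = tp / (tp + fn) if (tp + fn) else None
    specificity = tn / (tn + fp) if (tn + fp) else None
    return sensitivity, specificity


# Hypothetical flags and reference labels for five observations.
flagged = [True, False, False, True, False]
truth = [True, True, False, False, False]
sens, spec = sensitivity_specificity(flagged, truth)
```

Because implausible values are rare in practice, specificity is dominated by the large number of plausible observations, so small differences in it can correspond to many false positives.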
Comparing performance between conventional anomaly detection (CAD) and the proposed clustering approach
* Best performances are highlighted
** Ties are in bold
*** Best sensitivity among CAD methods was obtained by applying Mahalanobis distances with 3.717526 (the square root of 13.82) as the critical value
**** Best specificity among CAD methods was obtained by using 6 standard deviations as the threshold for identifying outliers
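The two thresholds named in these footnotes can be illustrated with a minimal univariate sketch. In one dimension the Mahalanobis distance reduces to the absolute z-score, so a single function covers both rules here. The footnotes do not say which variables entered the Mahalanobis distance, so the univariate form, the function name `flag_outliers`, and the example values are all assumptions made for illustration.

```python
from statistics import mean, pstdev


def flag_outliers(values, critical):
    """Return True for each value whose absolute z-score exceeds `critical`."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        # All values identical: nothing can be an outlier.
        return [False] * len(values)
    return [abs(v - mu) / sigma > critical for v in values]


# Hypothetical values: 200 ordinary readings plus one data-entry error.
values = [4.0 + 0.1 * (i % 5) for i in range(200)] + [400.0]

# Rule from the footnote: 6 standard deviations as the outlier threshold.
flags_6sd = flag_outliers(values, critical=6.0)

# Univariate analogue of the Mahalanobis rule (critical value 3.717526).
flags_md = flag_outliers(values, critical=3.717526)
```

Note that the mean and standard deviation are themselves inflated by extreme values, which is one reason distance-from-the-mean rules can miss implausible observations.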
Fig. 3. Changes in specificity of the clustering approach by
Fig. 4. Data distribution and implausible value detection for Troponin I.cardiac (LOINC: 10839-9) and Cholesterol in LDL (LOINC: 13457-7). * X-axes are square-root transformed for visualization purposes
Fig. 5. Comparing specificity (plotted as 1-specificity) between conventional anomaly detection and the clustering approach. * Y-axis is square-root transformed to highlight differences
Fig. 6. Comparing sensitivity (plotted as 1-sensitivity) between conventional anomaly detection and the clustering approach. * Y-axis is square-root transformed to highlight differences. 1-sensitivity is used for visualization purposes
Fig. 7. Pairwise comparison of false positive cases between conventional anomaly detection (CAD) and the clustering approaches
Silver-standard low and high ranges for implausible observation values
| LOINC | Low implausible | High implausible | Long common name |
|---|---|---|---|
| 1742-6 | 0 | 2500 | Alanine aminotransferase [Enzymatic activity/volume] in Serum or Plasma |
| 1751-7 | 0 | 20 | Albumin [Mass/volume] in Serum or Plasma |
| 2862-1 | 0 | 20 | Albumin [Mass/volume] in Serum or Plasma by Electrophoresis |
| 6768-6 | 0 | 5000 | Alkaline phosphatase [Enzymatic activity/volume] in Serum or Plasma |
| 1920-8 | 0 | 12500 | Aspartate aminotransferase [Enzymatic activity/volume] in Serum or Plasma |
| 26444-0 | 0 | 100 | Basophils [#/volume] in Blood |
| 704-7 | 0 | 5 | Basophils [#/volume] in Blood by Automated count |
| 706-2 | 0 | 50 | Basophils/100 leukocytes in Blood by Automated count |
| 707-0 | 0 | 50 | Basophils/100 leukocytes in Blood by Manual count |
| 1959-6 | 0 | 100 | Bicarbonate [Moles/volume] in Blood |
| 1971-1 | 0 | 50 | Bilirubin.indirect [Mass/volume] in Serum or Plasma |
| 1975-2 | 0 | 50 | Bilirubin.total [Mass/volume] in Serum or Plasma |
| 29463-7 | 0 | 1400 | Body weight |
| 2093-3 | 0 | 1500 | Cholesterol [Mass/volume] in Serum or Plasma |
| 2085-9 | 0 | 450 | Cholesterol in HDL [Mass/volume] in Serum or Plasma |
| 2089-1 | 0 | 780 | Cholesterol in LDL [Mass/volume] in Serum or Plasma |
| 13457-7 | 0 | 1000 | Cholesterol in LDL [Mass/volume] in Serum or Plasma by calculation |
| 9830-1 | 0 | 100 | Cholesterol.total/Cholesterol in HDL [Mass Ratio] in Serum or Plasma |
| 8462-4 | 0 | 360 | Diastolic blood pressure |
| 26449-9 | 0 | 100 | Eosinophils [#/volume] in Blood |
| 26450-7 | 0 | 100 | Eosinophils/100 leukocytes in Blood |
| 713-8 | 0 | 100 | Eosinophils/100 leukocytes in Blood by Automated count |
| 714-6 | 0 | 100 | Eosinophils/100 leukocytes in Blood by Manual count |
| 789-8 | 0 | 25 | Erythrocytes [#/volume] in Blood by Automated count |
| 2324-2 | 0 | 2500 | Gamma glutamyl transferase [Enzymatic activity/volume] in Serum or Plasma |
| 2339-0 | 0 | 1000 | Glucose [Mass/volume] in Blood |
| 30313-1 | 0.5 | 75 | Hemoglobin [Mass/volume] in Arterial blood |
| 718-7 | 0.5 | 75 | Hemoglobin [Mass/volume] in Blood |
| 4548-4 | 0 | 30 | Hemoglobin A1c/Hemoglobin.total in Blood |
| 6690-2 | 0 | 500 | Leukocytes [#/volume] in Blood by Automated count |
| 26478-8 | 0 | 100 | Lymphocytes/100 leukocytes in Blood |
| 736-9 | 0 | 100 | Lymphocytes/100 leukocytes in Blood by Automated count |
| 737-7 | 0 | 100 | Lymphocytes/100 leukocytes in Blood by Manual count |
| 785-6 | 0 | 100 | MCH [Entitic mass] by Automated count |
| 786-4 | 0 | 100 | MCHC [Mass/volume] by Automated count |
| 787-2 | 0 | 400 | MCV [Entitic volume] by Automated count |
| 26484-6 | 0 | 100 | Monocytes [#/volume] in Blood |
| 742-7 | 0 | 100 | Monocytes [#/volume] in Blood by Automated count |
| 26485-3 | 0 | 100 | Monocytes/100 leukocytes in Blood |
| 5905-5 | 0 | 100 | Monocytes/100 leukocytes in Blood by Automated count |
| 744-3 | 0 | 100 | Monocytes/100 leukocytes in Blood by Manual count |
| 26499-4 | 0 | 400 | Neutrophils [#/volume] in Blood |
| 26511-6 | 0 | 100 | Neutrophils/100 leukocytes in Blood |
| 6298-4 | 0 | 30 | Potassium [Moles/volume] in Blood |
| 2885-2 | 0 | 30 | Protein [Mass/volume] in Serum or Plasma |
| 2947-0 | 0 | 580 | Sodium [Moles/volume] in Blood |
| 8480-6 | 0 | 560 | Systolic blood pressure |
| 2571-8 | 0 | 2500 | Triglyceride [Mass/volume] in Serum or Plasma |
| 10839-9 | 0 | 20 | Troponin I.cardiac [Mass/volume] in Serum or Plasma |
| 6598-7 | 0 | 20 | Troponin T.cardiac [Mass/volume] in Serum or Plasma |
a The silver-standard low and high ranges for implausible observation values are defined by the authors based on literature search and expert judgement, and validated using data distributions
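As a rough sketch of how such ranges could be applied, the snippet below labels an observation as implausible when its value falls outside the low and high limits for its LOINC code. Three rows are copied from the table above. The boundary handling (values exactly equal to a limit count as plausible), the function name `is_implausible`, and the example observations are assumptions, since the table does not specify them.

```python
# A few rows from the silver-standard table: LOINC -> (low, high).
SILVER_STANDARD = {
    "2339-0": (0.0, 1000.0),  # Glucose [Mass/volume] in Blood
    "718-7": (0.5, 75.0),     # Hemoglobin [Mass/volume] in Blood
    "6298-4": (0.0, 30.0),    # Potassium [Moles/volume] in Blood
}


def is_implausible(loinc, value):
    """Return True if `value` lies strictly outside the silver-standard range.

    ASSUMPTION: values equal to a limit are treated as plausible.
    """
    low, high = SILVER_STANDARD[loinc]
    return value < low or value > high


# Hypothetical observations as (LOINC code, value) pairs.
observations = [
    ("2339-0", 95.0),
    ("2339-0", 9500.0),
    ("718-7", 0.1),
    ("6298-4", 4.2),
]
labels = [is_implausible(code, value) for code, value in observations]
```

Labels produced this way can serve as the reference against which flags from any detection method are scored.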