Abstract
Secondary analysis of electronic health records for clinical research faces significant challenges due to known data quality issues in health data collected observationally during clinical care and to the data biases caused by standard healthcare processes. In this manuscript, we contribute a methodology for data quality assessment that plots domain-level aggregate statistics (for conditions (diagnoses), drugs, and procedures) and concept-level temporal frequencies (i.e., annual prevalence rates of clinical concepts). We detect common temporal patterns in concept frequencies by normalizing annual concept frequencies and clustering them with K-means. We apply these methods to the Columbia University Irving Medical Center Observational Medical Outcomes Partnership database. The resulting domain-aggregate and cluster plots show a variety of patterns. We review the patterns found in the condition domain and investigate the processes that shape them. We find that these patterns suggest data quality issues influenced by system-wide factors that affect individual concept frequencies.
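The normalize-then-cluster step described in the abstract can be illustrated with a minimal sketch. The abstract does not say how frequencies were normalized or which K-means implementation was used, so everything below is an assumption made for illustration: each concept's annual series is scaled by its own maximum, the clustering is a plain Lloyd's-algorithm K-means written with the standard library only, centroids are seeded with a simple deterministic farthest-first rule, and the concept names and counts are invented toy values rather than data from the study.

```python
def normalize(series):
    """Scale one concept's annual frequencies by its maximum (assumed scheme)."""
    peak = max(series)
    return [v / peak if peak > 0 else 0.0 for v in series]


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length series."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean_vector(rows):
    """Year-by-year mean of a non-empty list of series."""
    return [sum(col) / len(rows) for col in zip(*rows)]


def farthest_first(rows, k):
    """Deterministic seeding: start at rows[0], then repeatedly add the row
    farthest from all centroids chosen so far (a simplification)."""
    centroids = [list(rows[0])]
    while len(centroids) < k:
        nxt = max(rows, key=lambda r: min(sq_dist(r, c) for c in centroids))
        centroids.append(list(nxt))
    return centroids


def kmeans(rows, k, iters=100):
    """Lloyd's algorithm: alternate assignment and centroid update until
    the labels stop changing or the iteration cap is reached."""
    centroids = farthest_first(rows, k)
    labels = None
    for _ in range(iters):
        new_labels = [
            min(range(k), key=lambda c: sq_dist(r, centroids[c])) for r in rows
        ]
        if new_labels == labels:
            break
        labels = new_labels
        for c in range(k):
            members = [r for r, lab in zip(rows, labels) if lab == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = mean_vector(members)
    return labels, centroids


# Hypothetical annual counts for four concepts over five years.
toy_counts = {
    "rising_a": [1, 2, 3, 4, 5],
    "rising_b": [10, 22, 30, 41, 50],
    "falling_a": [5, 4, 3, 2, 1],
    "falling_b": [50, 39, 30, 21, 10],
}
rows = [normalize(series) for series in toy_counts.values()]
labels, centroids = kmeans(rows, k=2)
for name, label in zip(toy_counts, labels):
    print(name, "-> cluster", label)
```

Because each series is scaled before clustering, concepts with very different absolute counts but the same temporal shape fall into the same cluster, which is what lets the cluster centroids be read as shared trends over time.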
Keywords: Cluster Analysis; Data Accuracy; Electronic Health Records
Year: 2019 PMID: 31437950 PMCID: PMC6857180 DOI: 10.3233/SHTI190248
Source DB: PubMed Journal: Stud Health Technol Inform ISSN: 0926-9630
Figure 1–Conceptual framework for analysis.
Figure 2–Total count (blue) and count-per-capita (orange) of a) conditions, b) drugs, c) procedures and d) people per year.
Figure 3–Count of unique concepts (orange) and the mean frequency of concepts (blue) per year for a) conditions, b) drugs, and c) procedures.
Figure 4–Example annual concept trends for the ten most prevalent conditions.
Figure 5–K-means clusters for conditions. Plots show cluster centroids (the cluster trend over time) with standard deviation across concepts as error bars (intracluster variance). Subplot titles show cluster labels and the percent of concepts belonging to each cluster.
Source vocabulary composition of condition clusters.
| Vocabulary ID | Cluster 0 | Cluster 1 | Remaining clusters |
|---|---|---|---|
| SNOMED | 0.2% | 1.6% | 0.0% |
| ICD10CM | 93.9% | 92.0% | 16.8% |
| ICD9CM | 6.0% | 6.4% | 83.2% |
Figure 6–Annual number of visits for the 10 most common visit concepts.