Wonsuk Oh, Michael S Steinbach, M Regina Castro, Kevin A Peterson, Vipin Kumar, Pedro J Caraballo, Gyorgy J Simon.
Abstract
Different analytic techniques operate optimally with different types of data. As the use of EHR-based analytics expands to newer tasks, data will have to be transformed into different representations so that these tasks can be solved optimally. We classified representations into broad categories based on their characteristics and proposed a new knowledge-driven representation for clinical data mining and trajectory mining, called Severity Encoding Variables (SEVs). Additionally, we studied which characteristics make representations most suitable for particular clinical analytics tasks, including trajectory mining. Our evaluation shows that, for regression, most data representations performed similarly, with SEV achieving a slight (albeit statistically significant) advantage. For patients at high risk of diabetes, SEV outperformed the competing representation by a relative 20%. For association mining, SEV achieved the highest performance; its ability to constrain the search space of patterns through clinical knowledge was key to its success.
Keywords: Data Mining; Data Science; Electronic Health Records
Year: 2019 PMID: 31437931 PMCID: PMC7666864 DOI: 10.3233/SHTI190229
Source DB: PubMed Journal: Stud Health Technol Inform ISSN: 0926-9630
Figure 1: Sample Severity Encoding Variable Hierarchy for Hyperlipidemia. Abbreviations used: Treatment (Tx), Diagnosis (Dx), High-density lipoprotein (HDL), Low-density lipoprotein (LDL), Triglycerides (TG).
Figure 2: (a) Performance comparison of data representations for the regression task. (b) Comparison of concordance on subpopulation with Framingham score ≥ 20.
Figure 3: (a) Comparison of the predictive performance of the association patterns discovered using the various data representations as a function of the minimum support in cases (minsupC). (b) The number of association patterns discovered using the various data representations (minsupC = 5).
Categorization of the Data Representations.
| | Outcome-Specific | Outcome-Independent, Data-Driven | Outcome-Independent, Knowledge-Driven |
|---|---|---|---|
| Dimensionality-Reducing | SS | PCA | SEV |
| Dimensionality-Expanding | | DAE-34 | SEV |