Xia Jing, Matthew Emerson, David Masters, Matthew Brooks, Jacob Buskirk, Nasseef Abukamail, Chang Liu, James J Cimino, Jay Shubrook, Sonsoles De Lacalle, Yuchun Zhou, Vimla L Patel.
Abstract
BACKGROUND: Vast volumes of data coded with hierarchical terminologies (e.g., International Classification of Diseases, Tenth Revision-Clinical Modification [ICD10-CM], Medical Subject Headings [MeSH]) are generated routinely in electronic health record systems and medical literature databases. Although graphic representations can help augment human understanding of such data sets, a graph with hundreds or thousands of nodes challenges human comprehension. To improve comprehension, new tools are needed to extract overviews of such data sets. We aim to develop a visual interactive analytic tool for filtering and summarizing large health data sets coded with hierarchical terminologies (VIADS) as an online, publicly accessible tool. The ultimate goals are to filter and summarize health data sets, extract insights, and compare and highlight the differences between health data sets. The results generated by VIADS can serve as data-driven evidence to help clinicians, clinical researchers, and health care administrators make more informed clinical, research, and administrative decisions.

We used the following tools and development environments to develop VIADS: Django, Python, JavaScript, Vis.js, Graph.js, JQuery, Plotly, Chart.js, Unittest, R, and MySQL.
Keywords: Data analytic tool; Data set filtering; Hierarchical terminology; Human comprehension; Visualization
Year: 2019 PMID: 30764811 PMCID: PMC6376747 DOI: 10.1186/s12911-019-0750-y
Source DB: PubMed Journal: BMC Med Inform Decis Mak ISSN: 1472-6947 Impact factor: 2.796
Fig. 1 VIADS architecture design and relationships among different modules (V refers to the validation and preparation module; a single arrow indicates that a user can move in one direction; a double arrow indicates that a user can move in both directions)
Usage comparison between guest users and registered users in VIADS
| | Guest Users | Registered Users |
|---|---|---|
| Register | N/A | √ |
| Log in | N/A | √ |
| Upload test data sets | √ | √ |
| Select algorithm | √ | √ |
| Tune thresholds to filter data sets | √ | √ |
| Generate summary views >= thresholds | √ | √ |
| Generate comparison views >= thresholds | √ | √ |
| Export analytic results | √ | √ |
| Test data sets and analytic results | Deleted at the end of the visit or after >= 24 h | Saved on the server side for at least 5 years (free); can be deleted manually by users |
| Access to the saved results | N/A | √ + VIADS maintenance personnel |
| Log out | N/A | √ |
| Web site visit log records | Kept | Kept |
Acceptable data set formats and sizes in VIADS
| Data set | Node ID (code) | Usage frequency |
|---|---|---|
| ICD9-CM | 250.00 | 1223 |
| | 401.9 | 25,567 |
| | … … | … |
| ICD10-CM | E11.9 | 4559 |
| | I10 | 3000 |
| | … … | … |
| MeSH | A0087342 | 26,460 |
| | A0021563 | 24,459 |
| | … … | … |
| Acceptable data set size for VIADS | Patient counts | >= 100 |
| | Event counts | >= 1000 |
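The format and size requirements in the table above can be illustrated with a minimal Python sketch. This is not the VIADS validation module. The function names, the rejection of duplicate node IDs, the reading of "event counts" as the sum of usage frequencies, and the choice to require both thresholds at once are all assumptions made for illustration.

```python
from typing import Dict, Iterable, Tuple

MIN_PATIENTS = 100   # from the table: patient counts >= 100
MIN_EVENTS = 1000    # from the table: event counts >= 1000


def parse_frequency(text: str) -> int:
    """Parse a usage frequency such as '1223' or '25,567' into an int."""
    value = int(text.replace(",", "").strip())
    if value < 0:
        raise ValueError(f"negative usage frequency: {text!r}")
    return value


def load_rows(rows: Iterable[Tuple[str, str]]) -> Dict[str, int]:
    """Build a node ID -> usage frequency mapping from (code, frequency) rows."""
    counts: Dict[str, int] = {}
    for node_id, frequency in rows:
        node_id = node_id.strip()
        if not node_id:
            raise ValueError("empty node ID")
        if node_id in counts:
            # Assumption: each code appears once per data set.
            raise ValueError(f"duplicate node ID: {node_id}")
        counts[node_id] = parse_frequency(frequency)
    return counts


def meets_size_requirements(counts: Dict[str, int], patient_count: int) -> bool:
    """Check both thresholds; event count is assumed to be the frequency total."""
    event_count = sum(counts.values())
    return patient_count >= MIN_PATIENTS and event_count >= MIN_EVENTS


# Example rows taken from the ICD9-CM part of the table.
EXAMPLE_COUNTS = load_rows([("250.00", "1223"), ("401.9", "25,567")])
```

The patient count is passed in separately because the tabular format itself carries only node IDs and usage frequencies.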
Algorithms implemented in VIADS with examples of their usage
| Filter | Definition | Usage example |
|---|---|---|
| NC (node counts) | NC = usage frequency of a node (ICD code or MeSH term) | Displaying a summary view: the most frequently used MeSH terms and their ancestors in 2011 |
| CC (class counts) | $\mathrm{CC}_{\text{node}} = \mathrm{NC}_{\text{descendant 1}} + \mathrm{NC}_{\text{descendant 2}} + \mathrm{NC}_{\text{descendant 3}} + \dots$ | Displaying a summary view: the most frequently used ICD9-CM codes in 2011 in a selected institution |
| Ratio | $\text{Ratio} = \mathrm{CC}_{\text{child}} / \mathrm{CC}_{\text{parent}}$ | Identifying the largest MeSH contributors to upper-level MeSH terms and their ancestors in 2011 |
| Top nodes | Top NC nodes (numbers) | Displaying the top 50 (or top 5%) of ICD9-CM codes with the highest NC (or CC) in 2018 in a selected institution. This algorithm can show the most important nodes in a data set |
| Systematic comparison (data set 1 vs data set 2) | $\mathrm{CC}_{\text{node 1}}$ (data set 1) vs $\mathrm{CC}_{\text{node 1}}$ (data set 2) | Displaying a comparison view: the most significantly different ICD9-CM codes between the pioglitazone (data set 1) and rosiglitazone (data set 2) groups after systematic comparison |
| Combination | NC + Ratio | Displaying a summary view: the most frequently used MeSH terms and the largest MeSH contributors and their ancestors in 2011 |
| | CC + Ratio | Displaying a summary view: the most frequently used ICD9-CM codes and the largest ICD9-CM contributors and their ancestors in 2011 |
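The NC, CC, and Ratio definitions in the table above can be made concrete with a small Python sketch. It is an illustration under stated assumptions, not the VIADS implementation. The toy hierarchy and its node IDs are hypothetical, and the `include_self` flag is an added assumption: the table defines CC as the sum of the descendants' NC values, so that is the default, while `include_self=True` also adds the node's own NC.

```python
from typing import Dict, List, Optional, Set

Hierarchy = Dict[str, List[str]]  # parent node ID -> list of child node IDs


def descendants(children: Hierarchy, node: str) -> Set[str]:
    """Return all descendants of `node`, each counted once (safe for polyhierarchies)."""
    seen: Set[str] = set()
    stack = list(children.get(node, []))
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(children.get(current, []))
    return seen


def class_count(children: Hierarchy, nc: Dict[str, int], node: str,
                include_self: bool = False) -> int:
    """CC of `node`: sum of NC over its descendants (optionally plus its own NC)."""
    total = sum(nc.get(d, 0) for d in descendants(children, node))
    if include_self:
        total += nc.get(node, 0)
    return total


def ratio(children: Hierarchy, nc: Dict[str, int], child: str, parent: str,
          include_self: bool = False) -> Optional[float]:
    """Ratio = CC_child / CC_parent; None when the parent's CC is zero."""
    parent_cc = class_count(children, nc, parent, include_self)
    if parent_cc == 0:
        return None
    return class_count(children, nc, child, include_self) / parent_cc


# Hypothetical toy hierarchy and node counts (not real ICD or MeSH codes).
TOY_CHILDREN: Hierarchy = {"root": ["A", "B"], "A": ["A1", "A2"], "B": ["B1"]}
TOY_NC = {"root": 0, "A": 5, "B": 2, "A1": 10, "A2": 30, "B1": 8}
```

With the toy data, `class_count(TOY_CHILDREN, TOY_NC, "A")` sums the NC values of `A1` and `A2`, and `ratio(TOY_CHILDREN, TOY_NC, "A", "root")` expresses how much of the root's CC is contributed by the `A` branch.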
Fig. 2 VIADS analytic engine workflow
Fig. 3 Graphs before (upper, an original graph with 1066 nodes) and after (lower, a filtered graph with 56 nodes, top 5% CC) filtering within VIADS using the top CC% algorithm (colors indicate the values of CC; red >> green)
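A top-percentage filter of the kind shown in Fig. 3 could be sketched as follows. This is a hedged illustration, not the VIADS analytic engine. The function name, the rounding rule (ceiling), the tie-breaking by node ID, and the step that adds ancestors so the filtered graph stays connected are all assumptions, so the sketch is not expected to reproduce the exact node counts reported in the figure.

```python
import math
from typing import Dict, List, Set


def top_percent_with_ancestors(scores: Dict[str, int],
                               parents: Dict[str, List[str]],
                               percent: float) -> Set[str]:
    """Keep the top `percent` of nodes by score (NC or CC) plus all their ancestors."""
    if not 0 < percent <= 100:
        raise ValueError("percent must be in (0, 100]")
    # Assumption: round the cut-off up and keep at least one node.
    k = max(1, math.ceil(len(scores) * percent / 100))
    # Sort by descending score; break ties by node ID for determinism.
    ranked = sorted(scores, key=lambda node: (-scores[node], node))
    keep: Set[str] = set(ranked[:k])
    # Walk upward so every kept node remains attached to the root.
    stack = list(keep)
    while stack:
        node = stack.pop()
        for parent in parents.get(node, []):
            if parent not in keep:
                keep.add(parent)
                stack.append(parent)
    return keep


# Hypothetical toy data: child node ID -> list of parent node IDs, and NC values.
TOY_PARENTS = {"A": ["root"], "B": ["root"], "A1": ["A"], "A2": ["A"], "B1": ["B"]}
TOY_SCORES = {"root": 0, "A": 5, "B": 2, "A1": 10, "A2": 30, "B1": 8}
```

On the toy data, a 20% cut-off selects the two highest-scoring nodes (`A2` and `A1`) and then pulls in `A` and `root` as their ancestors.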
Comparison of AmiGO 2, GoMiner, and VIADS
| | AmiGO 2 | GoMiner | VIADS |
|---|---|---|---|
| Analyze | | | |
| Provide biological interpretation | – | √ | – |
| Upload data sets | – | √ | √ |
| Select algorithm/versions | – | √ | √ |
| Tune thresholds to filter | – | – | √ |
| Generate summary views | – | √ | √ |
| Generate comparison views | – | – | √ |
| Acceptable data set | – | Genomics data, Microarray data, proteomics data | Data sets coded with hierarchical terminologies + usage frequencies |
| Applicable terminologies | GO | GO | ICD9-CM, ICD10-CM, MeSH |
| Browse | | | |
| GO | √ | √ | – |
| Search | | | |
| External links | √ | √ | – |
| GO | √ | √ | – |
| Visualize | | | |
| Tree-like structure | √ (GO) | √ (GO) | √ (Summary views) |
| Statistical analysis results | – | √ | √ |