A complementary graphical method for reducing and analyzing large data sets. Case studies demonstrating thresholds setting and selection.

X Jing, J J Cimino.

Abstract

OBJECTIVES: Graphical displays can make data more understandable; however, large graphs can challenge human comprehension. We have previously described a filtering method to provide high-level summary views of large data sets. In this paper we demonstrate our method for setting and selecting thresholds to limit graph size while retaining important information by applying it to large single and paired data sets, taken from patient and bibliographic databases.
METHODS: Four case studies are used to illustrate our method. The data are either patient discharge diagnoses (coded using the International Classification of Diseases, Ninth Revision, Clinical Modification [ICD9-CM]) or Medline citations (coded using the Medical Subject Headings [MeSH]). We use combinations of different thresholds to obtain filtered graphs for detailed analysis. Threshold setting and selection, covering thresholds for node counts, class counts, ratio values, p values (for diff data sets), and percentiles of selected class count thresholds, are demonstrated in detail in the case studies. The main steps are data preparation, data manipulation, computation, and threshold selection and visualization. We also describe the data models for the different types of thresholds and the considerations for threshold selection.
RESULTS: The filtered graphs are 1%-3% of the size of the original graphs. For our case studies, the graphs provide 1) the most heavily used ICD9-CM codes, 2) the codes with the most patients in a research hospital in 2011, 3) a profile of publications on "heavily represented topics" in Medline in 2011, and 4) validated knowledge about adverse effects of the medication rosiglitazone and new, interesting areas in the ICD9-CM hierarchy associated with patients taking the medication pioglitazone.
CONCLUSIONS: Our filtering method reduces large graphs to a manageable size by removing relatively unimportant nodes. The graphical method provides summary views based on computation of usage frequency and semantic context of hierarchical terminology. The method is applicable to large data sets (such as a hundred thousand records or more) and can be used to generate new hypotheses from data sets coded with hierarchical terminologies.
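
The threshold steps named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation, and it rests on several assumptions that the abstract does not confirm: a node count is taken to be how often a code appears in the data, a class count is taken to be a node's own count plus the counts of all its descendants, the hierarchy is simplified to a strict tree (one parent per code), the percentile is computed with the nearest-rank rule, and a node is kept only when it meets both thresholds. The function names and the toy codes are hypothetical.

```python
import math
from collections import defaultdict


def class_counts(parents, node_counts):
    """Return {code: own count + counts of all descendants}.

    parents maps each code to its parent code (None for the root).
    Assumes a strict tree, which is an illustrative simplification.
    """
    children = defaultdict(list)
    for code, parent in parents.items():
        if parent is not None:
            children[parent].append(code)

    totals = {}

    def visit(code):
        # Post-order traversal: sum the subtree below each code.
        total = node_counts.get(code, 0)
        for child in children[code]:
            total += visit(child)
        totals[code] = total
        return total

    for code, parent in parents.items():
        if parent is None:
            visit(code)
    return totals


def percentile_threshold(values, percentile):
    """Nearest-rank percentile of the given values (assumed rule)."""
    ordered = sorted(values)
    rank = max(1, math.ceil(percentile * len(ordered) / 100))
    return ordered[rank - 1]


def filter_nodes(parents, node_counts, min_class_count, min_node_count=0):
    """Keep codes that meet both thresholds (assumed combination rule)."""
    totals = class_counts(parents, node_counts)
    return {
        code
        for code in parents
        if totals[code] >= min_class_count
        and node_counts.get(code, 0) >= min_node_count
    }


# Toy hierarchy with made-up codes and counts, for illustration only.
parents = {"root": None, "A": "root", "B": "root", "A1": "A", "A2": "A"}
node_counts = {"A": 2, "A1": 5, "A2": 1, "B": 1}

totals = class_counts(parents, node_counts)
threshold = percentile_threshold(totals.values(), 60)
kept = filter_nodes(parents, node_counts, min_class_count=threshold)
print(sorted(kept))
```

In this toy case the class count threshold is read off the distribution of class counts instead of being fixed by hand, so raising the percentile shrinks the filtered graph. A polyhierarchical terminology such as MeSH would need a graph traversal that avoids counting shared descendants twice.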

Keywords:  Data mining method; clinical data repository; data analysis; data filtering method; data visualization; hierarchical terminology; threshold selection; threshold setting

Year:  2014        PMID: 24727931      PMCID: PMC4209908          DOI: 10.3414/ME13-01-0075

Source DB:  PubMed          Journal:  Methods Inf Med        ISSN: 0026-1270            Impact factor:   2.176


  3 in total

1.  Feasibility of Population Health Analytics and Data Visualization for Decision Support in the Infectious Diseases Domain: A pilot study.

Authors:  Don Roosan; Guilherme Del Fiol; Jorie Butler; Yarden Livnat; Jeanmarie Mayer; Matthew Samore; Makoto Jones; Charlene Weir
Journal:  Appl Clin Inform       Date:  2016-06-29       Impact factor: 2.342

2.  A visual interactive analytic tool for filtering and summarizing large health data sets coded with hierarchical terminologies (VIADS).

Authors:  Xia Jing; Matthew Emerson; David Masters; Matthew Brooks; Jacob Buskirk; Nasseef Abukamail; Chang Liu; James J Cimino; Jay Shubrook; Sonsoles De Lacalle; Yuchun Zhou; Vimla L Patel
Journal:  BMC Med Inform Decis Mak       Date:  2019-02-14       Impact factor: 2.796

3.  The Roles of a Secondary Data Analytics Tool and Experience in Scientific Hypothesis Generation in Clinical Research: Protocol for a Mixed Methods Study.

Authors:  Xia Jing; Vimla L Patel; James J Cimino; Jay H Shubrook; Yuchun Zhou; Chang Liu; Sonsoles De Lacalle
Journal:  JMIR Res Protoc       Date:  2022-07-18
