Corey W Arnold (1), Andrea Oh (2), Shawn Chen (2), William Speier (2)
(1) Medical Imaging and Informatics Group, Department of Radiological Sciences, University of California, Los Angeles, United States. Electronic address: cwarnold@ucla.edu.
(2) Medical Imaging and Informatics Group, Department of Radiological Sciences, University of California, Los Angeles, United States.
Abstract
BACKGROUND AND OBJECTIVE: Probabilistic topic models provide an unsupervised method for analyzing unstructured text. These models discover semantically coherent combinations of words (topics) that could be integrated into a clinical automatic summarization system for primary care physicians performing chart review. However, the human interpretability of topics discovered from clinical reports is unknown. Our objective is to assess the coherence of topics and their ability to represent the contents of clinical reports from a primary care physician's point of view.
METHODS: Three latent Dirichlet allocation models (50, 100, and 150 topics) were fit to a large collection of clinical reports. Topics were manually evaluated by primary care physicians (PCPs) and graduate students. Wilcoxon signed-rank tests for paired samples were used to evaluate differences between the topic models, while differences in performance between students and PCPs were tested using Mann-Whitney U tests for each of the tasks.
RESULTS: While the 150-topic model produced the best log likelihood, participants were most accurate at identifying words that did not belong in topics learned by the 100-topic model, suggesting that 100 topics provides a better granularity of discovered semantic themes for the data set used in this study. Models were comparable in their ability to represent the contents of documents. PCPs significantly outperformed students in both tasks.
CONCLUSION: This work establishes a baseline of interpretability for topic models trained with clinical reports, and provides insights on the appropriateness of using topic models for informatics applications. Our results indicate that PCPs find discovered topics more coherent and more representative of clinical reports than students do, warranting further research into their use for automatic summarization.
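The evaluation described in the abstract can be illustrated with a minimal sketch. It assumes that the task of identifying words that do not belong in a topic is scored as a per-participant accuracy (a design often called a word intrusion task), and that the two participant groups are then compared with a Mann-Whitney U statistic. The function names, example words, and all scores below are hypothetical and are not taken from the study; the sketch computes only the U statistic, not a p-value.

```python
from typing import Sequence


def intrusion_accuracy(answers: Sequence[str], intruders: Sequence[str]) -> float:
    """Fraction of questions where the participant picked the true intruder word."""
    if len(answers) != len(intruders):
        raise ValueError("answers and intruders must have the same length")
    if not intruders:
        raise ValueError("need at least one question")
    hits = sum(a == b for a, b in zip(answers, intruders))
    return hits / len(intruders)


def mann_whitney_u(x: Sequence[float], y: Sequence[float]) -> float:
    """U statistic for sample x versus sample y; each tie counts as 0.5."""
    u = 0.0
    for a in x:
        for b in y:
            if a > b:
                u += 1.0
            elif a == b:
                u += 0.5
    return u


# Hypothetical intruder words for four topics, and one participant's answers.
true_intruders = ["umbrella", "guitar", "passport", "window"]
participant_answers = ["umbrella", "guitar", "insulin", "window"]
print(intrusion_accuracy(participant_answers, true_intruders))  # 0.75

# Hypothetical per-participant accuracies for the two groups.
pcp_scores = [1.0, 0.75, 1.0]
student_scores = [0.5, 0.75, 0.25]

u_pcp = mann_whitney_u(pcp_scores, student_scores)
u_student = mann_whitney_u(student_scores, pcp_scores)
print(u_pcp, u_student)  # 8.5 0.5
```

The two U values always sum to the product of the group sizes (here 3 × 3 = 9), so a value near that product for one group indicates that its scores tend to exceed the other group's. A statistics library would normally be used to turn U into a p-value.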