T Hao, C Weng (1)
(1) Chunhua Weng, Ph.D., Associate Professor, Department of Biomedical Informatics, Columbia University, 622 W 168 Street, PH-20, New York, NY, 10032, USA, E-mail: cw2384@columbia.edu.
Abstract
OBJECTIVES: To develop an adaptive approach to mine frequent semantic tags (FSTs) from heterogeneous clinical research texts. METHODS: We developed a "plug-n-play" framework that integrates replaceable unsupervised kernel algorithms with formatting, functional, and utility wrappers for FST mining. Temporal information identification and semantic equivalence detection were two example functional wrappers. We first compared this approach's recall and efficiency for mining FSTs from ClinicalTrials.gov with those of a recently published tag-mining algorithm. We then assessed the approach's adaptability to two other types of clinical research texts, clinical data requests and clinical trial protocols, by comparing the prevalence trends of FSTs across the three text types. RESULTS: Our approach increased average recall and speed by 12.8% and 47.02%, respectively, over the baseline when mining FSTs from ClinicalTrials.gov, and maintained an overlap in relevant FSTs with the baseline ranging between 76.9% and 100% for varying FST frequency thresholds. The FSTs saturated when the data size reached 200 documents. Consistent trends in the prevalence of FSTs were observed across the three text types as the data size or frequency threshold changed. CONCLUSIONS: This paper contributes an adaptive tag-mining framework that is scalable and adaptable without sacrificing recall. This component-based architectural design is potentially generalizable and may improve the adaptability of other clinical text mining methods.
Keywords:
Medical informatics; clinical trials; component-based architecture; semantic tags; text mining
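The component-based design summarized in the abstract (a replaceable kernel surrounded by formatting and functional wrappers, followed by a frequency threshold) can be illustrated with a minimal Python sketch. Every name here (`format_text`, `bigram_kernel`, `make_equivalence_wrapper`, `mine_frequent_tags`) is hypothetical, the n-gram kernel and the synonym table stand in for the paper's unsupervised kernel algorithms and semantic equivalence detection, and counting by document frequency is an assumption; none of this is the authors' implementation.

```python
from collections import Counter
from typing import Callable, Dict, Iterable, List

# Hypothetical type aliases for the pluggable parts of the sketch.
Kernel = Callable[[List[str]], List[str]]    # tokens -> candidate tags
Wrapper = Callable[[List[str]], List[str]]   # tags -> transformed tags

PUNCT = ".,;:()"


def format_text(text: str) -> List[str]:
    """Formatting wrapper (assumed): lowercase, split, strip punctuation."""
    tokens = (tok.strip(PUNCT) for tok in text.lower().split())
    return [tok for tok in tokens if tok]


def bigram_kernel(tokens: List[str]) -> List[str]:
    """Stand-in kernel: emit unigrams and bigrams as candidate tags."""
    bigrams = [" ".join(pair) for pair in zip(tokens, tokens[1:])]
    return tokens + bigrams


def make_equivalence_wrapper(synonyms: Dict[str, str]) -> Wrapper:
    """Functional wrapper (assumed): map equivalent tags to one canonical tag."""
    def wrap(tags: List[str]) -> List[str]:
        return [synonyms.get(tag, tag) for tag in tags]
    return wrap


def mine_frequent_tags(
    docs: Iterable[str],
    kernel: Kernel,
    wrappers: List[Wrapper],
    min_doc_freq: int,
) -> Dict[str, int]:
    """Count each tag once per document and keep tags meeting the threshold."""
    doc_freq: Counter = Counter()
    for doc in docs:
        tags = kernel(format_text(doc))
        for wrapper in wrappers:
            tags = wrapper(tags)
        doc_freq.update(set(tags))  # document frequency, not raw occurrences
    return {tag: n for tag, n in doc_freq.items() if n >= min_doc_freq}
```

Because the kernel and the wrappers are passed in as plain callables, either can be swapped without touching `mine_frequent_tags`, which is the kind of adaptability the abstract attributes to the component-based architecture.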