OBJECTIVE: To semi-automatically induce semantic categories of eligibility criteria from text and to automatically classify eligibility criteria based on their semantic similarity. DESIGN: The UMLS semantic types and a set of previously developed semantic preference rules were utilized to create an unambiguous semantic feature representation to induce eligibility criteria categories through hierarchical clustering and to train supervised classifiers. MEASUREMENTS: We induced 27 categories and measured the prevalence of the categories in 27,278 eligibility criteria from 1,578 clinical trials. We then compared the classification performance (i.e., precision, recall, and F1-score) of the UMLS-based feature representation and the "bag of words" feature representation across five common classifiers in Weka: J48, Bayesian Network, Naïve Bayesian, Nearest Neighbor, and an instance-based learning classifier. RESULTS: The UMLS semantic feature representation outperformed the "bag of words" feature representation in 89% of the criteria categories. Using the semantically induced categories, machine-learning classifiers required only 2,000 instances to stabilize classification performance. The J48 classifier yielded the best F1-score and the Bayesian Network classifier achieved the best learning efficiency. CONCLUSION: The UMLS is an effective knowledge source and can enable an efficient feature representation for semi-automated semantic category induction and automatic categorization for clinical research eligibility criteria and possibly other clinical text.
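The per-category metrics named in MEASUREMENTS (precision, recall, and F1-score) can be illustrated with a minimal Python sketch. This is not the authors' evaluation code, which relied on Weka; the function name `per_category_scores` and the category labels below are hypothetical and chosen only for illustration.

```python
from collections import Counter


def per_category_scores(gold, predicted):
    """Return {category: (precision, recall, f1)} for two parallel label lists."""
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")

    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, predicted):
        if g == p:
            tp[g] += 1          # correct assignment to category g
        else:
            fp[p] += 1          # p was predicted but is wrong
            fn[g] += 1          # g was the true category and was missed

    scores = {}
    for cat in set(gold) | set(predicted):
        p_den = tp[cat] + fp[cat]
        r_den = tp[cat] + fn[cat]
        precision = tp[cat] / p_den if p_den else 0.0
        recall = tp[cat] / r_den if r_den else 0.0
        pr_sum = precision + recall
        f1 = 2 * precision * recall / pr_sum if pr_sum else 0.0
        scores[cat] = (precision, recall, f1)
    return scores


# Hypothetical category labels, not the 27 categories induced in the study.
gold = ["disease", "disease", "age", "medication"]
predicted = ["disease", "age", "age", "medication"]
scores = per_category_scores(gold, predicted)
```

In this toy example one "disease" criterion is misclassified as "age", so "disease" keeps perfect precision but loses recall, while "age" keeps perfect recall but loses precision. Comparing two feature representations category by category, as the abstract describes, amounts to computing such scores once per representation and comparing the F1 values.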