Serguei V Pakhomov, James D Buntrock, Christopher G Chute.
Abstract
Classification of diagnoses (a.k.a. coding) is the central part of current concept-based medical IR systems. Some classification systems contain over 30,000 distinct codes, which makes classifying clinical documents a time-consuming, labor-intensive, and error-prone process. This paper presents a simple methodology for cleaning up and reusing existing manually coded diagnostic statements, mainly extracted from clinical notes, to build predictive models using a sparse-feature implementation of a Naïve Bayes classifier. One of the problems addressed is that diagnostic statements often contain several diagnoses and are assigned several codes, resulting in a multi-class classification problem. We investigate one possible way of addressing this problem by introducing compound (multiple-code) categories. We present experimental results of classifying >16,000 randomly selected diagnostic strings into 19 top-level categories. A small improvement (3%) from using compound categories over simple categories indicates that using multiple-code categories is a promising solution, although clearly in need of further research and refinement.
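The idea of compound categories can be illustrated with a minimal sketch. This is not the authors' implementation: the class `SparseNaiveBayes`, the helper `compound_label`, the whitespace tokenizer, the Laplace smoothing constant, the category names, and the four training strings are all hypothetical, chosen only to show how a statement carrying several codes might be mapped to a single joined category before training a Naïve Bayes model that stores only observed counts.

```python
import math
from collections import Counter, defaultdict


def compound_label(codes):
    """Join a set of codes into one order-independent compound category."""
    return "+".join(sorted(set(codes)))


class SparseNaiveBayes:
    """Multinomial Naive Bayes that stores only observed (label, token) counts."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha  # Laplace smoothing constant (assumed value)
        self.label_counts = Counter()             # label -> number of statements
        self.token_counts = defaultdict(Counter)  # label -> token -> count
        self.vocab = set()

    def fit(self, texts, labels):
        for text, label in zip(texts, labels):
            tokens = text.lower().split()  # naive whitespace tokenizer (assumption)
            self.label_counts[label] += 1
            self.token_counts[label].update(tokens)
            self.vocab.update(tokens)
        return self

    def predict(self, text):
        tokens = text.lower().split()
        n_docs = sum(self.label_counts.values())
        vocab_size = len(self.vocab)
        best_label, best_score = None, -math.inf
        for label, n in self.label_counts.items():
            counts = self.token_counts[label]
            total = sum(counts.values())
            # Log prior plus smoothed log likelihood of each token.
            score = math.log(n / n_docs)
            for tok in tokens:
                score += math.log(
                    (counts[tok] + self.alpha) / (total + self.alpha * vocab_size)
                )
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Toy, invented training data: each statement carries one or more category codes.
train = [
    ("diabetes mellitus type 2", ["endocrine"]),
    ("essential hypertension", ["circulatory"]),
    ("hypertension and diabetes mellitus", ["circulatory", "endocrine"]),
    ("acute bronchitis", ["respiratory"]),
]
model = SparseNaiveBayes().fit(
    [text for text, _ in train],
    [compound_label(codes) for _, codes in train],
)
```

Calling `model.predict("hypertension and diabetes")` on this toy data selects the compound category rather than either simple one, which is the behavior compound categories are meant to enable. The trade-off is that every distinct code combination becomes its own class, so rare combinations have very little training data.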
Year: 2004 PMID: 15360845
Source DB: PubMed Journal: Stud Health Technol Inform ISSN: 0926-9630