Paul S Cho, Ricky K Taira, Hooshang Kangarloo.
Abstract
Automated segmentation of medical reports can significantly enhance the productivity of healthcare departments. While many algorithms have been developed for document summarization, passage retrieval, and story segmentation of news feeds, much less effort has been devoted to the parsing of medical documents. We present an algorithm developed specifically for medical applications. The algorithm consists of two components. First, a rule-based algorithm detects the sections that contain labels. It uses a knowledge base of commonly employed heading labels and linguistic cues observed in training examples. The second component handles the detection of unlabeled sections. It uses a combination of lexical pattern recognition and a classifier based on an expectation model for a particular class of medical reports. The proposed method was evaluated on three test corpora containing a total of 129,303 report sections. Across the individual corpora, detection rates ranged from 97.4% to 99.4% for labeled sections and from 96.5% to 99.0% for unlabeled sections. The rule-based approach is particularly effective for medical reports because of the inherently structured nature of these documents.
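To make the two-stage design concrete, here is a minimal Python sketch under stated assumptions. It is not the authors' implementation. The heading table `HEADING_KB`, the section order `EXPECTED_ORDER`, the regular-expression cues in `LEXICAL_CUES`, and the "next expected section" fallback are all hypothetical. In particular, the abstract's expectation-model classifier is replaced here by a fixed, hand-written section order for one imagined class of radiology reports. Stage one matches a line-initial label followed by a colon against the knowledge base. Stage two assigns unlabeled paragraphs using lexical cues first, then falls back to the expected position.

```python
import re
from typing import List, Optional, Tuple

# Hypothetical knowledge base: heading label (lowercase) -> canonical section class.
HEADING_KB = {
    "history": "HISTORY",
    "clinical history": "HISTORY",
    "indication": "HISTORY",
    "technique": "TECHNIQUE",
    "comparison": "COMPARISON",
    "findings": "FINDINGS",
    "impression": "IMPRESSION",
    "conclusion": "IMPRESSION",
}

# Hypothetical stand-in for the expectation model: usual section order for one report class.
EXPECTED_ORDER = ["HISTORY", "TECHNIQUE", "COMPARISON", "FINDINGS", "IMPRESSION"]

# Hypothetical lexical cues for sections that appear without a label.
LEXICAL_CUES = {
    "COMPARISON": re.compile(r"\b(compared (with|to)|prior (study|exam))\b", re.I),
    "TECHNIQUE": re.compile(r"\b(contrast|axial|sagittal|images? (were )?obtained)\b", re.I),
    "IMPRESSION": re.compile(r"\b(no acute|consistent with|likely represents)\b", re.I),
}

# Linguistic cue: a short alphabetic label at the start of a line, followed by a colon.
LABEL_RE = re.compile(r"^\s*([A-Za-z][A-Za-z ]{0,30}?)\s*:\s*(.*)$")


def detect_label(line: str) -> Optional[Tuple[str, str]]:
    """Stage 1 (rule-based): return (section class, rest of line) for a known label."""
    m = LABEL_RE.match(line)
    if m and m.group(1).lower() in HEADING_KB:
        return HEADING_KB[m.group(1).lower()], m.group(2)
    return None


def classify_unlabeled(text: str, prev_class: Optional[str]) -> str:
    """Stage 2: try lexical cues, then fall back to the section expected next."""
    for cls, pattern in LEXICAL_CUES.items():
        if pattern.search(text):
            return cls
    if prev_class is None:
        return EXPECTED_ORDER[0]
    idx = EXPECTED_ORDER.index(prev_class)
    return EXPECTED_ORDER[min(idx + 1, len(EXPECTED_ORDER) - 1)]


def segment(report: str) -> List[Tuple[str, str]]:
    """Split a report into blank-line-separated paragraphs and label each one."""
    sections: List[Tuple[str, str]] = []
    prev: Optional[str] = None
    for block in re.split(r"\n\s*\n", report.strip()):
        if not block.strip():
            continue  # skip empty input or stray whitespace-only blocks
        lines = block.strip().splitlines()
        hit = detect_label(lines[0])
        if hit:
            cls, first = hit
            body = " ".join([first] + [ln.strip() for ln in lines[1:]]).strip()
        else:
            body = " ".join(ln.strip() for ln in lines)
            cls = classify_unlabeled(body, prev)
        sections.append((cls, body))
        prev = cls
    return sections


report = """CLINICAL HISTORY: Cough and fever.

Compared with prior study from 2001, there is no change.

FINDINGS:
The lungs are clear.

No acute cardiopulmonary process."""

for cls, text in segment(report):
    print(cls, "|", text)
```

The sketch resolves explicit labels first and only then reasons about unlabeled text. This ordering loosely mirrors the split described in the abstract: labels are cheap and reliable when present, so position and vocabulary are needed only for the remainder.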
Year: 2003 PMID: 14728153 PMCID: PMC1479978
Source DB: PubMed Journal: AMIA Annu Symp Proc ISSN: 1559-4076