U Hahn, M Honeck, M Piotrowski, S Schulz.
Abstract
Many lexical items from medical sublanguages exhibit a complex morphological structure that is hard to account for by simple string matching (e.g., truncation). While inflection is usually easy to deal with, productive morphological processes such as derivation and (single-word) composition constitute a real challenge. We propose an approach in which morphologically complex word forms are segmented into medically significant subwords. After segmentation, both query terms and document terms are submitted to the matching procedure. In this way, problems arising from morphologically motivated word form alterations can be eliminated from the retrieval procedure. We provide empirical data showing that subword-based indexing and retrieval perform significantly better than conventional string matching approaches.
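The idea of segmenting both query and document terms before matching can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration, not the authors' implementation: the toy subword lexicon `LEXICON`, the longest-prefix-first segmentation with backtracking in `segment`, and the "all query subwords must occur in the document" rule in `matches` are hypothetical choices. The abstract does not specify the lexicon, the segmentation algorithm, or the matching criterion.

```python
from typing import List, Optional, Set

# Hypothetical toy subword lexicon (an assumption; not taken from the paper).
LEXICON: Set[str] = {"gastr", "enter", "hepat", "nephr", "itis", "o"}


def segment(word: str, lexicon: Set[str] = LEXICON) -> Optional[List[str]]:
    """Split a word into lexicon subwords, trying the longest prefix first.

    Backtracks if a prefix leaves an unsegmentable remainder.
    Returns None when no complete segmentation exists.
    """
    word = word.lower()
    if not word:
        return []
    for end in range(len(word), 0, -1):
        prefix = word[:end]
        if prefix in lexicon:
            rest = segment(word[end:], lexicon)
            if rest is not None:
                return [prefix] + rest
    return None


def index_terms(text: str, lexicon: Set[str] = LEXICON) -> Set[str]:
    """Map a text to its set of subwords; unsegmentable tokens are kept whole."""
    terms: Set[str] = set()
    for token in text.split():
        parts = segment(token, lexicon)
        terms.update(parts if parts is not None else [token.lower()])
    return terms


def matches(query: str, document: str) -> bool:
    """Assumed matching rule: every query subword must occur in the document."""
    return index_terms(query) <= index_terms(document)


if __name__ == "__main__":
    print(segment("gastroenteritis"))
    print(matches("enteritis", "acute gastroenteritis"))
    print(matches("hepatitis", "acute gastroenteritis"))
```

In this sketch the query "enteritis" matches a document containing "gastroenteritis" because both share the subwords "enter" and "itis", a case that right-truncation of the query string alone would not retrieve.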
Year: 2001 PMID: 11825186 PMCID: PMC2243631
Source DB: PubMed Journal: Proc AMIA Symp ISSN: 1531-605X