
Automatic learning of the morphology of medical language using information compression.

Shamim Ara Mollah, Stephen B Johnson.

Abstract

Converting free-text strings in a natural language to a standard representation (codes) is an important recurring problem in biomedical informatics. Determining the content of a string involves identifying its meaningful constituents (morphemes). One current method of identifying these constituents is to look them up in a preexisting table (lexicon). However, manual construction of lexicons and grammars in complex domains such as biomedicine is extremely laborious. As an alternative to the lexico-grammatical approach, we introduce a segmentation algorithm that automatically learns lexical and structural preferences from corpora via information compression. The method is based on the Minimum Description Length (MDL) principle from classical information theory.
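
The abstract does not specify the coding scheme, so the following is only a minimal sketch of the general two-part MDL idea, not the authors' algorithm. It assumes a total cost equal to the bits needed to spell out the lexicon plus the bits needed to encode the corpus as a sequence of lexicon entries. The function name `description_length`, the fixed `bits_per_char` value, and the toy word list are all hypothetical choices made for illustration.

```python
import math
from collections import Counter

def description_length(segmented_words, bits_per_char=5.0):
    """Two-part MDL cost in bits: lexicon cost plus corpus cost given the lexicon.

    segmented_words: list of words, each given as a list of morph strings.
    bits_per_char: assumed flat cost of one character (illustrative value).
    """
    counts = Counter(m for word in segmented_words for m in word)
    total = sum(counts.values())
    # Lexicon cost: spell each distinct morph once, plus one end marker per morph.
    lexicon_bits = sum((len(m) + 1) * bits_per_char for m in counts)
    # Corpus cost: each morph token costs -log2 of its relative frequency.
    corpus_bits = -sum(c * math.log2(c / total) for c in counts.values())
    return lexicon_bits + corpus_bits

# Toy corpus (hypothetical): the same six terms, unsegmented and segmented.
words = ["gastritis", "gastrectomy", "hepatitis",
         "hepatectomy", "nephritis", "nephrectomy"]
unsegmented = [[w] for w in words]
segmented = [["gastr", "itis"], ["gastr", "ectomy"],
             ["hepat", "itis"], ["hepat", "ectomy"],
             ["nephr", "itis"], ["nephr", "ectomy"]]

print(description_length(unsegmented))
print(description_length(segmented))
```

Under these assumptions the segmented analysis has the shorter description, because shared stems and suffixes are stored once in the lexicon and reused. An MDL-based learner would search over candidate segmentations for one that minimizes such a cost; the search procedure itself is not shown here.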

Year:  2003        PMID: 14728443      PMCID: PMC1480252     

Source DB:  PubMed          Journal:  AMIA Annu Symp Proc        ISSN: 1559-4076



1.  Word segmentation processing: a way to exponentially extend medical dictionaries.

Authors:  C Lovis; P A Michel; R Baud; J R Scherrer
Journal:  Medinfo       Date:  1995

1.  An unsupervised machine learning approach to segmentation of clinician-entered free text.

Authors:  Jesse O Wrenn; Peter D Stetson; Stephen B Johnson
Journal:  AMIA Annu Symp Proc       Date:  2007-10-11
