Evaluating and reducing the effect of data corruption when applying bag of words approaches to medical records.

P Ruch, R Baud, A Geissbühler.

Abstract

Unlike journal corpora, which are supposed to be carefully reviewed before being published, documents in a patient record are often corrupted by misspelled words and conventional graphies or abbreviations. After a survey of the domain, the paper focuses on evaluating the effect of such corruption on an information retrieval (IR) engine. The IR system uses a classical bag of words approach, with stems as representation items and term frequency-inverse document frequency (tf-idf) as weighting schema; we pay special attention to the normalization factor. First results show that even low corruption levels (3%) do affect retrieval effectiveness (4-7%), whereas higher corruption levels can affect retrieval effectiveness by 25%. Then, we show that the use of an improved automatic spelling correction system, applied to the corrupted collection, can almost restore the retrieval effectiveness of the engine.
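
The effect described in the abstract can be illustrated with a minimal sketch of bag-of-words retrieval using tf-idf weights and cosine length normalization. This is not the authors' engine: the weighting variant (raw term frequency times log(N/df)), the toy documents, the misspelling "fevr", and the names `tfidf_vectors` and `score` are all assumptions made for illustration, and stemming is omitted.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return one length-normalized tf-idf vector (a dict) per tokenized document.

    Assumed weighting: tf * log(N / df), followed by cosine normalization.
    """
    n = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        weights = {t: tf[t] * math.log(n / df[t]) for t in tf}
        # Normalization factor: Euclidean length of the weight vector.
        norm = math.sqrt(sum(w * w for w in weights.values()))
        if norm > 0:
            weights = {t: w / norm for t, w in weights.items()}
        vectors.append(weights)
    return vectors


def score(query, vector):
    """Sum the document weights of the query terms (unweighted query)."""
    return sum(vector.get(t, 0.0) for t in query)


# Hypothetical toy collection; the second copy has one misspelled token.
clean = [
    ["patient", "fever", "cough"],
    ["patient", "fracture", "arm"],
    ["patient", "rash", "fever"],
]
corrupted = [
    ["patient", "fevr", "cough"],  # "fever" misspelled
    ["patient", "fracture", "arm"],
    ["patient", "rash", "fever"],
]

query = ["fever"]
clean_scores = [score(query, v) for v in tfidf_vectors(clean)]
corrupted_scores = [score(query, v) for v in tfidf_vectors(corrupted)]
print(clean_scores)
print(corrupted_scores)
```

In this toy setting the misspelled document no longer matches the query at all, and the weights of the other documents shift as well, because the misspelling changes both the document frequencies and the normalization factor.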

Year:  2002        PMID: 12460633     DOI: 10.1016/s1386-5056(02)00057-6

Source DB:  PubMed          Journal:  Int J Med Inform        ISSN: 1386-5056            Impact factor:   4.046


  6 in total

1.  Initializing and Growing a Database of Health Information Technology (HIT) Events by Using TF-IDF and Biterm Topic Modeling.

Authors:  Hong Kang; Zhiguo Yu; Yang Gong
Journal:  AMIA Annu Symp Proc       Date:  2018-04-16

2.  Japanese EMRs and IT in Medicine: Expansion, Integration, and Reuse of Data.

Authors:  Katsuhiko Takabayashi; Shunsuke Doi; Takahiro Suzuki
Journal:  Healthc Inform Res       Date:  2011-09-30

3.  Integrative disease classification based on cross-platform microarray data.

Authors:  Chun-Chi Liu; Jianjun Hu; Mrinal Kalakrishnan; Haiyan Huang; Xianghong Jasmine Zhou
Journal:  BMC Bioinformatics       Date:  2009-01-30       Impact factor: 3.169

4.  Word2Vec inversion and traditional text classifiers for phenotyping lupus.

Authors:  Clayton A Turner; Alexander D Jacobs; Cassios K Marques; James C Oates; Diane L Kamen; Paul E Anderson; Jihad S Obeid
Journal:  BMC Med Inform Decis Mak       Date:  2017-08-22       Impact factor: 2.796

5.  Identifying influenza-like illness presentation from unstructured general practice clinical narrative using a text classifier rule-based expert system versus a clinical expert.

Authors:  Jayden MacRae; Tom Love; Michael G Baker; Anthony Dowell; Matthew Carnachan; Maria Stubbe; Lynn McBain
Journal:  BMC Med Inform Decis Mak       Date:  2015-10-06       Impact factor: 2.796

6.  Validation of a Natural Language Processing Algorithm for Detecting Infectious Disease Symptoms in Primary Care Electronic Medical Records in Singapore.

Authors:  Antony Hardjojo; Arunan Gunachandran; Long Pang; Mohammed Ridzwan Bin Abdullah; Win Wah; Joash Wen Chen Chong; Ee Hui Goh; Sok Huang Teo; Gilbert Lim; Mong Li Lee; Wynne Hsu; Vernon Lee; Mark I-Cheng Chen; Franco Wong; Jonathan Siung King Phang
Journal:  JMIR Med Inform       Date:  2018-06-11
