Term domain distribution analysis: a data mining tool for text databases.

J A Goldman, W W Chu, D S Parker, R M Goldman.

Abstract

In this paper, we present a case history illustrating the real-world application of a useful technique for data mining of text databases. The technique, which we call Term Domain Distribution Analysis (TDDA), consists of tracking term frequencies over specific finite domains and reporting significant differences from standard frequency distributions over these domains as hypotheses. TDDA is part of a larger framework, the Digital Filter Model, for data mining of text documents. In the case study presented, the domain of terms was the pair {right, left}, over which we expected a uniform distribution. In analyzing term frequencies in a thoracic lung cancer database, TDDA led to the surprising discovery that primary thoracic lung cancer tumors appear in the right lung more often than in the left lung, in a ratio of 3:2. Treating this text-derived discovery as a hypothesis, we verified it against medical literature reporting primary lung tumor sites, using a standard chi-square (χ²) statistic. We subsequently developed a working theoretical model of lung cancer that may explain the discovery. This discovery and our model may change how oncologists view the mechanisms that determine primary lung tumor location.
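The TDDA idea summarized above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names (`count_domain_terms`, `chi_square_statistic`, `p_value_df1`, `tdda`), the lowercase word tokenizer, and the 0.05 significance threshold are all hypothetical choices. The sketch counts each term of a two-term domain such as {right, left}, compares the counts against a uniform expected distribution with a Pearson chi-square goodness-of-fit statistic, and flags a significant deviation as a hypothesis to be checked. The p-value uses the closed form for one degree of freedom, P(X ≥ x) = erfc(√(x/2)), so the sketch is restricted to two-term domains.

```python
import math
import re


def count_domain_terms(documents, domain):
    """Count occurrences of each term of a finite domain across all documents.

    Assumption (hypothetical): a simple lowercase word tokenizer, no context filtering.
    """
    counts = {term.lower(): 0 for term in domain}
    for doc in documents:
        for token in re.findall(r"[a-z]+", doc.lower()):
            if token in counts:
                counts[token] += 1
    return counts


def chi_square_statistic(observed, expected_props):
    """Pearson chi-square goodness-of-fit statistic: sum of (O - E)^2 / E."""
    total = sum(observed.values())
    if total == 0:
        raise ValueError("no domain terms were observed")
    stat = 0.0
    for term, prop in expected_props.items():
        expected = total * prop
        stat += (observed[term] - expected) ** 2 / expected
    return stat


def p_value_df1(stat):
    """Upper-tail p-value of a chi-square variable with 1 degree of freedom.

    A chi-square(1) variable is Z^2 for standard normal Z, so
    P(X >= x) = P(|Z| >= sqrt(x)) = erfc(sqrt(x / 2)).
    """
    return math.erfc(math.sqrt(stat / 2.0))


def tdda(documents, domain, alpha=0.05):
    """Hypothetical TDDA sketch for a two-term domain with a uniform expectation."""
    if len(domain) != 2:
        raise ValueError("this sketch handles only two-term domains (1 degree of freedom)")
    observed = count_domain_terms(documents, domain)
    expected_props = {term: 0.5 for term in observed}  # uniform expectation
    stat = chi_square_statistic(observed, expected_props)
    p = p_value_df1(stat)
    # A significant deviation is reported as a hypothesis, not as a conclusion.
    return observed, stat, p, p < alpha


if __name__ == "__main__":
    docs = [
        "Mass in the right upper lobe.",
        "Left lower lobe nodule.",
        "Right hilar mass.",
    ]
    observed, stat, p, flagged = tdda(docs, ["right", "left"])
    print(observed, round(stat, 3), round(p, 3), flagged)
```

With the three toy sentences above, `observed` is `{'right': 2, 'left': 1}`, and counts that small are nowhere near significant. Plain word counting also picks up unrelated senses (for example "all right"), so a real text-mining pipeline would need context filtering that this sketch does not attempt.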

Year:  1999        PMID: 10431513

Source DB:  PubMed          Journal:  Methods Inf Med        ISSN: 0026-1270            Impact factor:   2.176


  4 in total

1.  Detecting adverse events using information technology. (Review)

Authors:  David W Bates; R Scott Evans; Harvey Murff; Peter D Stetson; Lisa Pizziferri; George Hripcsak
Journal:  J Am Med Inform Assoc       Date:  2003 Mar-Apr       Impact factor: 4.497

2.  Macromolecule mass spectrometry: citation mining of user documents.

Authors:  Ronald N Kostoff; Clifford D Bedford; J Antonio del Río; Héctor D Cortes; George Karypis
Journal:  J Am Soc Mass Spectrom       Date:  2004-03       Impact factor: 3.109

3.  eQuality for all: Extending automated quality measurement of free text clinical narratives.

Authors:  Steven H Brown; Peter L Elkin; S Trent Rosenbloom; Elliot Fielstein; Ted Speroff
Journal:  AMIA Annu Symp Proc       Date:  2008-11-06

4.  Identifying potential adverse effects using the web: a new approach to medical hypothesis generation.

Authors:  Adrian Benton; Lyle Ungar; Shawndra Hill; Sean Hennessy; Jun Mao; Annie Chung; Charles E Leonard; John H Holmes
Journal:  J Biomed Inform       Date:  2011-07-26       Impact factor: 6.317
