A system for de-identifying medical message board text.

Adrian Benton, Shawndra Hill, Lyle Ungar, Annie Chung, Charles Leonard, Cristin Freeman, John H Holmes.

Abstract

There are millions of public posts to medical message boards by users seeking support and information on a wide range of medical conditions. It has been shown that these posts can be used to gain a greater understanding of patients' experiences and concerns. As investigators continue to explore large corpora of medical discussion board data for research purposes, protecting the privacy of the members of these online communities becomes an important challenge that needs to be met. Extant entity recognition methods used for more structured text are not sufficient because message posts present additional challenges: the posts contain many typographical errors, a larger variety of possible names, terms and abbreviations specific to Internet posts or to a particular message board, and mentions of the authors' personal lives. The main contribution of this paper is a system to de-identify the authors of message board posts automatically, taking into account the aforementioned challenges. We demonstrate our system on two different message board corpora, one on breast cancer and another on arthritis. We show that our approach significantly outperforms other publicly available named entity recognition and de-identification systems, which have been tuned for more structured text such as operative reports, pathology reports, discharge summaries, or newswire.

Year:  2011        PMID: 21658289      PMCID: PMC3111588          DOI: 10.1186/1471-2105-12-S3-S2

Source DB:  PubMed          Journal:  BMC Bioinformatics        ISSN: 1471-2105            Impact factor:   3.169


  13 in total

1.  medpie: an information extraction package for medical message board posts.

Authors:  A Benton; J H Holmes; S Hill; A Chung; L Ungar
Journal:  Bioinformatics       Date:  2012-01-19       Impact factor: 6.937

2.  Leveraging existing corpora for de-identification of psychiatric notes using domain adaptation.

Authors:  Hee-Jin Lee; Yaoyun Zhang; Kirk Roberts; Hua Xu
Journal:  AMIA Annu Symp Proc       Date:  2018-04-16

3.  BoB, a best-of-breed automated text de-identification system for VHA clinical documents.

Authors:  Oscar Ferrández; Brett R South; Shuying Shen; F Jeffrey Friedlin; Matthew H Samore; Stéphane M Meystre
Journal:  J Am Med Inform Assoc       Date:  2012-09-04       Impact factor: 4.497

4.  Building gold standard corpora for medical natural language processing tasks.

Authors:  Louise Deleger; Qi Li; Todd Lingren; Megan Kaiser; Katalin Molnar; Laura Stoutenborough; Michal Kouril; Keith Marsolo; Imre Solti
Journal:  AMIA Annu Symp Proc       Date:  2012-11-03

5.  Scalable Iterative Classification for Sanitizing Large-Scale Datasets.

Authors:  Bo Li; Yevgeniy Vorobeychik; Muqun Li; Bradley Malin
Journal:  IEEE Trans Knowl Data Eng       Date:  2016-11-11       Impact factor: 6.977

6.  LESSONS LEARNED ABOUT PUBLIC HEALTH FROM ONLINE CROWD SURVEILLANCE.

Authors:  Shawndra Hill; Raina Merchant; Lyle Ungar
Journal:  Big Data       Date:  2013-09-10       Impact factor: 2.128

7.  A hybrid approach to automatic de-identification of psychiatric notes.

Authors:  Hee-Jin Lee; Yonghui Wu; Yaoyun Zhang; Jun Xu; Hua Xu; Kirk Roberts
Journal:  J Biomed Inform       Date:  2017-06-07       Impact factor: 6.317

8.  Preparing an annotated gold standard corpus to share with extramural investigators for de-identification research.

Authors:  Todd Lingren; Yizhao Ni; Louise Deleger; Megan Kaiser; Laura Stoutenborough; Keith Marsolo; Michal Kouril; Katalin Molnar; Imre Solti
Journal:  J Biomed Inform       Date:  2014-02-17       Impact factor: 6.317

9.  De-identification of Address, Date, and Alphanumeric Identifiers in Narrative Clinical Reports.

Authors:  Mehmet Kayaalp; Allen C Browne; Zeyno A Dodd; Pamela Sagan; Clement J McDonald
Journal:  AMIA Annu Symp Proc       Date:  2014-11-14

10.  Automatic detection of protected health information from clinic narratives.

Authors:  Hui Yang; Jonathan M Garibaldi
Journal:  J Biomed Inform       Date:  2015-07-29       Impact factor: 6.317
