A successful technique for removing names in pathology reports using an augmented search and replace method.

Sean M Thomas, Burke Mamlin, Gunther Schadow, Clement McDonald.

Abstract

The ability to access large amounts of de-identified clinical data would facilitate epidemiologic and retrospective research. Previously described de-identification methods require knowledge of natural language processing or have not been made available to the public. We take advantage of the fact that the vast majority of proper names in pathology reports occur in pairs. In rare cases where one proper name is by itself, it is preceded or followed by an affix that identifies it as a proper name (Mrs., Dr., PhD). We created a tool based on this observation using substitution methods that was easy to implement and was largely based on publicly available data sources. We compiled a Clinical and Common Usage Word (CCUW) list as well as a fairly comprehensive proper name list. Despite the large overlap between these two lists, we were able to refine our methods to achieve accuracy similar to previous attempts at de-identification. Our method found 98.7% of 231 proper names in the narrative sections of pathology reports. Three single proper names were missed out of 1001 pathology reports (0.3%, no first name/last name pairs). It is unlikely that identification could be implied from this information. We will continue to refine our methods, specifically working to improve the quality of our CCUW and proper name lists to obtain higher levels of accuracy.
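The substitution idea in the abstract can be illustrated with a minimal sketch. This is not the authors' tool: the tiny word lists, the `scrub_names` function, the `**NAME**` placeholder, and the exact decision rule are all assumptions made for illustration. The sketch replaces a word found in the proper name list when it is absent from the CCUW list, or when it sits next to an affix or next to an unambiguous proper name.

```python
import re

# Hypothetical stand-ins for the two lists named in the abstract.
PROPER_NAMES = {"mary", "john", "smith", "brown", "white"}
CCUW = {"brown", "white", "the", "tissue", "was", "sent", "by", "reviewed"}
AFFIXES = {"mr", "mrs", "ms", "dr", "md", "phd"}


def scrub_names(text, placeholder="**NAME**"):
    """Replace likely proper names with a placeholder (illustrative rule only)."""
    # Splitting on a capturing group keeps separators at even indices
    # and alphabetic words at odd indices.
    tokens = re.split(r"([A-Za-z]+)", text)
    lowered = [tokens[i].lower() for i in range(1, len(tokens), 2)]

    for pos, word in enumerate(lowered):
        if word not in PROPER_NAMES:
            continue
        # Words immediately before and after, ignoring punctuation and spaces.
        neighbours = lowered[max(pos - 1, 0):pos] + lowered[pos + 1:pos + 2]
        unambiguous = word not in CCUW
        has_context = any(
            n in AFFIXES or (n in PROPER_NAMES and n not in CCUW)
            for n in neighbours
        )
        if unambiguous or has_context:
            tokens[2 * pos + 1] = placeholder

    return "".join(tokens)


report = "Reviewed by Dr. Brown. The brown tissue was sent by Mary White."
print(scrub_names(report))
```

In this sketch, "Brown" after "Dr." and "White" after "Mary" are replaced, while "brown" in "brown tissue" is kept because it appears in both lists and has no affix or name beside it. Neighbours are matched across punctuation, so the rule can over-redact at sentence boundaries.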

Year: 2002   PMID: 12463930   PMCID: PMC2244188

Source DB: PubMed   Journal: Proc AMIA Symp   ISSN: 1531-605X


References: 3 in total

1.  Medical document anonymization with a semantic lexicon.

Authors:  P Ruch; R H Baud; A M Rassinoux; P Bouillon; G Robert
Journal:  Proc AMIA Symp       Date:  2000

2.  Replacing personally-identifying information in medical records, the Scrub system.

Authors:  L Sweeney
Journal:  Proc AMIA Annu Fall Symp       Date:  1996

3.  Guaranteeing anonymity when sharing medical data, the Datafly System.

Authors:  L Sweeney
Journal:  Proc AMIA Annu Fall Symp       Date:  1997
Cited by: 28 in total

1.  Extracting structured information from free text pathology reports.

Authors:  Gunther Schadow; Clement J McDonald
Journal:  AMIA Annu Symp Proc       Date:  2003

2.  Automated extraction and normalization of findings from cancer-related free-text radiology reports.

Authors:  Burke W Mamlin; Daniel T Heinze; Clement J McDonald
Journal:  AMIA Annu Symp Proc       Date:  2003

3.  Strategies for de-identification and anonymization of electronic health record data for use in multicenter research studies. (Review)

Authors:  Clete A Kushida; Deborah A Nichols; Rik Jadrnicek; Ric Miller; James K Walsh; Kara Griffin
Journal:  Med Care       Date:  2012-07       Impact factor: 2.983

4.  State-of-the-art anonymization of medical records using an iterative machine learning framework.

Authors:  György Szarvas; Richárd Farkas; Róbert Busa-Fekete
Journal:  J Am Med Inform Assoc       Date:  2007 Sep-Oct       Impact factor: 4.497

5.  A self-scaling, distributed information architecture for public health, research, and clinical care.

Authors:  Andrew J McMurry; Clint A Gilbert; Ben Y Reis; Henry C Chueh; Isaac S Kohane; Kenneth D Mandl
Journal:  J Am Med Inform Assoc       Date:  2007-04-25       Impact factor: 4.497

6.  Leveraging existing corpora for de-identification of psychiatric notes using domain adaptation.

Authors:  Hee-Jin Lee; Yaoyun Zhang; Kirk Roberts; Hua Xu
Journal:  AMIA Annu Symp Proc       Date:  2018-04-16

7.  A system for de-identifying medical message board text.

Authors:  Adrian Benton; Shawndra Hill; Lyle Ungar; Annie Chung; Charles Leonard; Cristin Freeman; John H Holmes
Journal:  BMC Bioinformatics       Date:  2011-06-09       Impact factor: 3.169

8.  Building gold standard corpora for medical natural language processing tasks.

Authors:  Louise Deleger; Qi Li; Todd Lingren; Megan Kaiser; Katalin Molnar; Laura Stoutenborough; Michal Kouril; Keith Marsolo; Imre Solti
Journal:  AMIA Annu Symp Proc       Date:  2012-11-03

9.  Preparing an annotated gold standard corpus to share with extramural investigators for de-identification research.

Authors:  Todd Lingren; Yizhao Ni; Louise Deleger; Megan Kaiser; Laura Stoutenborough; Keith Marsolo; Michal Kouril; Katalin Molnar; Imre Solti
Journal:  J Biomed Inform       Date:  2014-02-17       Impact factor: 6.317

10.  De-identification of primary care electronic medical records free-text data in Ontario, Canada.

Authors:  Karen Tu; Julie Klein-Geltink; Tezeta F Mitiku; Chiriac Mihai; Joel Martin
Journal:  BMC Med Inform Decis Mak       Date:  2010-06-18       Impact factor: 2.796
