Concept-match medical data scrubbing. How pathology text can be used in research.

Jules J Berman

Abstract

CONTEXT: In the normal course of activity, pathologists create and archive immense data sets of scientifically valuable information. Researchers need pathology-based data sets, annotated with clinical information and linked to archived tissues, to discover and validate new diagnostic tests and therapies. Pathology records can be used for research purposes (without obtaining informed patient consent for each use of each record), provided the data are rendered harmless. Large data sets can be made harmless through 3 computational steps: (1) deidentification, the removal or modification of data fields that can be used to identify a patient (name, social security number, etc); (2) rendering the data ambiguous, ensuring that every data record in a public data set has a nonunique set of characterizing data; and (3) data scrubbing, the removal or transformation of words in free text that can be used to identify persons or that contain information that is incriminating or otherwise private. This article addresses the problem of data scrubbing.
OBJECTIVE: To design and implement a general algorithm that scrubs pathology free text, removing all identifying or private information.
METHODS: The Concept-Match algorithm steps through confidential text. When a medical term matching a standard nomenclature term is encountered, the term is replaced by a nomenclature code and a synonym for the original term. When a high-frequency "stop" word, such as a, an, the, or for, is encountered, it is left in place. When any other word is encountered, it is blocked and replaced by asterisks. This produces a scrubbed text. An open-source implementation of the algorithm is freely available.
RESULTS: The Concept-Match scrub method transformed pathology free text into scrubbed output that preserved the sense of the original sentences, while it blocked terms that did not match terms found in the Unified Medical Language System (UMLS). The scrubbed product is safe, in the restricted sense that the output retains only standard medical terms. The software implementation scrubbed more than half a million surgical pathology report phrases in less than an hour.
CONCLUSIONS: Computerized scrubbing can render the textual portion of a pathology report harmless for research purposes. Scrubbing and deidentification methods allow pathologists to create and use large pathology databases to conduct medical research.
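
The three rules described under METHODS can be illustrated with a minimal sketch. This is not the author's open-source implementation: the tiny nomenclature table, its codes (X0001 and so on), the stop-word list, the longest-phrase-first matching, and the output layout are all assumptions made for illustration. A real run would load terms from a standard nomenclature such as UMLS.

```python
import re

# Hypothetical toy nomenclature: phrase -> (made-up code, preferred synonym).
# These codes are placeholders, not real UMLS identifiers.
NOMENCLATURE = {
    "adenocarcinoma": ("X0001", "adenocarcinoma"),
    "large bowel": ("X0002", "colon"),
    "lymph node": ("X0003", "lymph node"),
}

# Assumed list of high-frequency stop words that are left in place.
STOP_WORDS = {"a", "an", "the", "for", "of", "in", "and", "with", "was", "is"}

# Longest phrase (in words) that can possibly match the nomenclature.
MAX_PHRASE_WORDS = max(len(phrase.split()) for phrase in NOMENCLATURE)


def scrub(text):
    """Return a scrubbed copy of text using the three Concept-Match rules."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    out = []
    i = 0
    while i < len(words):
        # Rule 1: try the longest candidate phrase first, then shorter ones.
        for n in range(min(MAX_PHRASE_WORDS, len(words) - i), 0, -1):
            phrase = " ".join(words[i:i + n])
            if phrase in NOMENCLATURE:
                code, synonym = NOMENCLATURE[phrase]
                out.append(f"{code} {synonym}")
                i += n
                break
        else:
            # Rule 2: stop words pass through unchanged.
            # Rule 3: every other word is blocked with an asterisk.
            out.append(words[i] if words[i] in STOP_WORDS else "*")
            i += 1
    return " ".join(out)


print(scrub("Mr. Smith was seen for adenocarcinoma of the large bowel"))
# prints: * * was * for X0001 adenocarcinoma of the X0002 colon
```

Because only nomenclature terms and stop words survive, a name such as "Smith" is blocked without any list of names being consulted, which is the restricted sense of "safe" used in RESULTS.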

Year:  2003        PMID: 12741890     DOI: 10.5858/2003-127-680-CMDS

Source DB:  PubMed          Journal:  Arch Pathol Lab Med        ISSN: 0003-9985            Impact factor:   5.534


