
De-identifying Spanish medical texts - named entity recognition applied to radiology reports.

Irene Pérez-Díez, Raúl Pérez-Moraga, Adolfo López-Cerdán, Jose-Maria Salinas-Serrano, María de la Iglesia-Vayá.

Abstract

BACKGROUND: Medical texts such as radiology reports and electronic health records are a powerful source of data for researchers. Anonymization methods must be developed to de-identify documents containing personal information about both patients and medical staff. Several anonymization strategies exist for English, but they are language-dependent. Here, we introduce a named entity recognition strategy for Spanish medical texts that can be transferred to other languages.
RESULTS: We tested four neural networks on our radiology report dataset, achieving a recall of 97.18% on the identifying entities. In addition, we developed a randomization algorithm that substitutes each detected entity with a new one from the same category, making it virtually impossible to distinguish real data from synthetic data. The three best architectures were then evaluated on the MEDDOCAN challenge dataset of electronic health records as an external test, achieving a recall of 69.18%.
CONCLUSIONS: The proposed strategy, which combines named entity recognition with randomization of entities, is suitable for Spanish radiology reports. It does not require a large training corpus, so it could be easily extended to other languages and to other medical texts, such as electronic health records.
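
The two ideas summarized in the abstract, replacing each detected entity with a random one from the same category and scoring detection by recall, can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not describe their algorithm in detail, so the surrogate pools, the category labels (`NAME`, `DATE`), and the function names `randomize_entities` and `recall` are all assumptions made for illustration. Entities are assumed to arrive as non-overlapping `(start, end, label)` character spans from some upstream recognizer.

```python
import random

# Hypothetical surrogate pools, one per entity category (not taken from the paper).
SURROGATES = {
    "NAME": ["Lucía Gómez", "Carlos Ruiz", "Marta Vidal"],
    "DATE": ["03/05/2016", "21/11/2018", "14/02/2019"],
}


def randomize_entities(text, entities, rng):
    """Replace each (start, end, label) span with a random surrogate of the same label.

    Spans are assumed to be non-overlapping character offsets into `text`.
    """
    pieces = []
    cursor = 0
    for start, end, label in sorted(entities):
        pieces.append(text[cursor:start])  # untouched text before the entity
        original = text[start:end]
        # Avoid re-inserting the original value if it happens to be in the pool.
        candidates = [s for s in SURROGATES[label] if s != original]
        pieces.append(rng.choice(candidates))
        cursor = end
    pieces.append(text[cursor:])  # untouched text after the last entity
    return "".join(pieces)


def recall(gold, predicted):
    """Fraction of gold entity spans that were also predicted (exact match)."""
    gold_set = set(gold)
    if not gold_set:
        return 1.0
    return len(gold_set & set(predicted)) / len(gold_set)


# Usage with a made-up sentence; the seed only makes the run repeatable.
report = "Paciente Ana Pérez, estudio del 12/01/2017."
name_start = report.index("Ana Pérez")
date_start = report.index("12/01/2017")
spans = [
    (name_start, name_start + len("Ana Pérez"), "NAME"),
    (date_start, date_start + len("12/01/2017"), "DATE"),
]
print(randomize_entities(report, spans, random.Random(0)))
```

Recall is the natural headline metric for de-identification because a missed entity leaks personal data, whereas a false positive only removes some non-sensitive text.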

Keywords:  Medical texts; Named entity recognition; Natural language processing; Radiology reports; Spanish

Year:  2021        PMID: 33781334      PMCID: PMC8006627          DOI: 10.1186/s13326-021-00236-2

Source DB:  PubMed          Journal:  J Biomed Semantics


