Automatic de-identification of French clinical records: comparison of rule-based and machine-learning approaches.

Cyril Grouin, Pierre Zweigenbaum.

Abstract

In this paper, we present a comparison of two approaches to automatically de-identify medical records written in French: a rule-based system and a machine-learning system based on conditional random fields (CRF). Both systems were designed to process nine identifiers in a corpus of cardiology medical records. We performed two evaluations: the first on 62 cardiology documents, and the second on 10 foetopathology documents produced by optical character recognition (OCR), to evaluate the robustness of our systems. In cardiology, we achieved exact-match overall F-measures of 0.843 (rule-based) and 0.883 (machine-learning). While the rule-based system achieved good results on nominative data (first and last names) and numerical data (dates, phone numbers, and zip codes), the machine-learning approach performed best on more complex categories (postal addresses, hospital names, medical devices, and towns). On the foetopathology corpus, although our systems had not been designed for this corpus and despite OCR errors, we obtained promising results: exact-match overall F-measures of 0.681 (rule-based) and 0.638 (machine-learning). This demonstrates that existing tools can be applied to process new documents of lower quality.
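
As an illustration of the exact-match F-measure reported above, the following is a minimal sketch rather than the authors' evaluation script. It assumes annotations are represented as (start, end, category) spans and that a predicted span counts as correct only when both its boundaries and its category match a gold span. The function name, the span representation, and the category labels are hypothetical, and the abstract does not state whether the overall score is micro- or macro-averaged; this sketch pools all spans together.

```python
from typing import Set, Tuple

# Hypothetical span representation: (start offset, end offset, category).
Span = Tuple[int, int, str]


def exact_match_f_measure(gold: Set[Span], predicted: Set[Span]) -> float:
    """Return F1 where a prediction is correct only if it equals a gold span."""
    true_positives = len(gold & predicted)
    if true_positives == 0:
        # Also covers empty gold or predicted sets, avoiding division by zero.
        return 0.0
    precision = true_positives / len(predicted)
    recall = true_positives / len(gold)
    return 2 * precision * recall / (precision + recall)


# Toy example (invented data): the DATE prediction is off by one character,
# so under exact matching it counts as both a false positive and a miss.
gold = {(0, 6, "LAST_NAME"), (10, 20, "DATE"), (25, 30, "ZIP")}
predicted = {(0, 6, "LAST_NAME"), (10, 19, "DATE"), (25, 30, "ZIP")}
score = exact_match_f_measure(gold, predicted)
```

Under this criterion, a boundary error of a single character (as OCR noise might cause) removes the span from the true positives entirely, which is one reason exact-match scores tend to drop on lower-quality documents.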

Year:  2013        PMID: 23920600

Source DB:  PubMed          Journal:  Stud Health Technol Inform        ISSN: 0926-9630


  6 in total

1.  CAS: corpus of clinical cases in French.

Authors:  Natalia Grabar; Clément Dalloux; Vincent Claveau
Journal:  J Biomed Semantics       Date:  2020-08-06

2.  Improving domain adaptation in de-identification of electronic health records through self-training.

Authors:  Shun Liao; Jamie Kiros; Jiyang Chen; Zhaolei Zhang; Ting Chen
Journal:  J Am Med Inform Assoc       Date:  2021-09-18       Impact factor: 7.942

3.  Investigation of the Utility of Features in a Clinical De-identification Model: A Demonstration Using EHR Pathology Reports for Advanced NSCLC Patients.

Authors:  Tanmoy Paul; Md Kamruz Zaman Rana; Preethi Aishwarya Tautam; Teja Venkat Pavan Kotapati; Yaswitha Jampani; Nitesh Singh; Humayera Islam; Vasanthi Mandhadi; Vishakha Sharma; Michael Barnes; Richard D Hammer; Abu Saleh Mohammad Mosa
Journal:  Front Digit Health       Date:  2022-02-16

4.  Salience of Medical Concepts of Inside Clinical Texts and Outside Medical Records for Referred Cardiovascular Patients.

Authors:  Sungrim Moon; Sijia Liu; David Chen; Yanshan Wang; Douglas L Wood; Rajeev Chaudhry; Hongfang Liu; Paul Kingsbury
Journal:  J Healthc Inform Res       Date:  2019-01-28

5.  Implementing a Cloud Based Method for Protected Clinical Trial Data Sharing.

Authors:  Gaurav Luthria; Qingbo Wang
Journal:  Pac Symp Biocomput       Date:  2020

6.  De-identifying free text of Japanese electronic health records.

Authors:  Kohei Kajiyama; Hiromasa Horiguchi; Takashi Okumura; Mizuki Morita; Yoshinobu Kano
Journal:  J Biomed Semantics       Date:  2020-09-21
