Automating classification of free-text electronic health records for epidemiological studies.

Martijn J Schuemie, Emine Sen, Geert W 't Jong, Eva M van Soest, Miriam C Sturkenboom, Jan A Kors.

Abstract

PURPOSE: Increasingly, patient information is stored in electronic medical records, which could be reused for research. Often these records comprise unstructured narrative data, which are cumbersome to analyze. The authors investigated whether text mining can make these data suitable for epidemiological studies and compared a concept recognition approach and a range of machine learning techniques that require a manually annotated training set. The authors show how this training set can be created with minimal effort by using a broad database query.
METHODS: The approaches were tested on two data sets: a publicly available set of English radiology reports to which International Classification of Diseases, Ninth Revision, Clinical Modification codes needed to be assigned, and a set of Dutch GP records that needed to be classified as either liver disorder cases or noncases. Performance was tested against a manually created gold standard.
RESULTS: The best overall performance was achieved by a combination of a manually created filter for removing negations and speculations and rule learning algorithms such as RIPPER, with high scores on both the radiology reports (positive predictive value = 0.88, sensitivity = 0.85, specificity = 1.00) and the GP records (positive predictive value = 0.89, sensitivity = 0.91, specificity = 0.76).
CONCLUSIONS: Although a training set still needs to be created manually, text mining can help reduce the amount of manual work needed to incorporate narrative data in an epidemiological study and will make the data extraction more reproducible. An advantage of machine learning is that it is able to pick up specific language use, such as abbreviations and synonyms used by physicians.
Copyright © 2012 John Wiley & Sons, Ltd.
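
The three measures reported in the RESULTS can be computed by comparing a classifier's output with the gold standard. The following is a minimal sketch, not the authors' evaluation code: the function name `classification_metrics` and the toy labels are hypothetical and do not come from the study's data.

```python
from typing import Dict, Sequence


def classification_metrics(gold: Sequence[bool], predicted: Sequence[bool]) -> Dict[str, float]:
    """Compute PPV, sensitivity and specificity for binary case/noncase labels."""
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")

    # Count the four cells of the confusion matrix.
    tp = sum(1 for g, p in zip(gold, predicted) if g and p)
    fp = sum(1 for g, p in zip(gold, predicted) if not g and p)
    fn = sum(1 for g, p in zip(gold, predicted) if g and not p)
    tn = sum(1 for g, p in zip(gold, predicted) if not g and not p)

    def ratio(numerator: int, denominator: int) -> float:
        # Return NaN when a measure is undefined (empty denominator).
        return numerator / denominator if denominator else float("nan")

    return {
        "ppv": ratio(tp, tp + fp),          # share of predicted cases that are true cases
        "sensitivity": ratio(tp, tp + fn),  # share of true cases that were found
        "specificity": ratio(tn, tn + fp),  # share of noncases correctly rejected
    }


# Hypothetical toy labels (True = case, False = noncase), for illustration only.
gold = [True, True, True, False, False, False, False, False]
predicted = [True, True, False, True, False, False, False, False]
metrics = classification_metrics(gold, predicted)
print(metrics)
```

With these toy labels there are 2 true positives, 1 false positive, 1 false negative and 4 true negatives, so the positive predictive value and sensitivity are both 2/3 and the specificity is 4/5.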

Year:  2012        PMID: 22271492     DOI: 10.1002/pds.3205

Source DB:  PubMed          Journal:  Pharmacoepidemiol Drug Saf        ISSN: 1053-8569            Impact factor:   2.890


  14 in total

Review 1.  Clinical Data Reuse or Secondary Use: Current Status and Potential Future Progress.

Authors:  S M Meystre; C Lovis; T Bürkle; G Tognola; A Budrionis; C U Lehmann
Journal:  Yearb Med Inform       Date:  2017-09-11

Review 2.  "Big data" and the electronic health record.

Authors:  M K Ross; W Wei; L Ohno-Machado
Journal:  Yearb Med Inform       Date:  2014-08-15

3.  Pharmacoepidemiology in the era of real-world evidence.

Authors:  Sengwee Toh
Journal:  Curr Epidemiol Rep       Date:  2017-10-19

4.  Automated information extraction from free-text EEG reports.

Authors:  Siddharth Biswal; Zarina Nip; Valdery Moura Junior; Matt T Bianchi; Eric S Rosenthal; M Brandon Westover
Journal:  Conf Proc IEEE Eng Med Biol Soc       Date:  2015

5.  Natural Language Processing to identify pneumonia from radiology reports.

Authors:  Sascha Dublin; Eric Baldwin; Rod L Walker; Lee M Christensen; Peter J Haug; Michael L Jackson; Jennifer C Nelson; Jeffrey Ferraro; David Carrell; Wendy W Chapman
Journal:  Pharmacoepidemiol Drug Saf       Date:  2013-04-01       Impact factor: 2.890

6.  Improving sensitivity of machine learning methods for automated case identification from free-text electronic medical records.

Authors:  Zubair Afzal; Martijn J Schuemie; Jan C van Blijderveen; Elif F Sen; Miriam C J M Sturkenboom; Jan A Kors
Journal:  BMC Med Inform Decis Mak       Date:  2013-03-02       Impact factor: 2.796

7.  Identification of acute myocardial infarction from electronic healthcare records using different disease coding systems: a validation study in three European countries.

Authors:  Preciosa M Coloma; Vera E Valkhoff; Giampiero Mazzaglia; Malene Schou Nielsson; Lars Pedersen; Mariam Molokhia; Mees Mosseveld; Paolo Morabito; Martijn J Schuemie; Johan van der Lei; Miriam Sturkenboom; Gianluca Trifirò
Journal:  BMJ Open       Date:  2013-06-20       Impact factor: 2.692

8.  A rule-based electronic phenotyping algorithm for detecting clinically relevant cardiovascular disease cases.

Authors:  Santiago Esteban; Manuel Rodríguez Tablado; Ricardo Ignacio Ricci; Sergio Terrasa; Karin Kopitowski
Journal:  BMC Res Notes       Date:  2017-07-14

9.  Automatic ICD-10 multi-class classification of cause of death from plaintext autopsy reports through expert-driven feature selection.

Authors:  Ghulam Mujtaba; Liyana Shuib; Ram Gopal Raj; Retnagowri Rajandram; Khairunisa Shaikh; Mohammed Ali Al-Garadi
Journal:  PLoS One       Date:  2017-02-06       Impact factor: 3.240

10.  Identification of major cardiovascular events in patients with diabetes using primary care data.

Authors:  Koen Bernardus Pouwels; Jaco Voorham; Eelko Hak; Petra Denig
Journal:  BMC Health Serv Res       Date:  2016-04-02       Impact factor: 2.655
