Automating classification of free-text electronic health records for epidemiological studies.

Martijn J Schuemie, Emine Sen, Geert W 't Jong, Eva M van Soest, Miriam C Sturkenboom, Jan A Kors.

Abstract

PURPOSE: Increasingly, patient information is stored in electronic medical records, which could be reused for research. Often these records comprise unstructured narrative data, which are cumbersome to analyze. The authors investigated whether text mining can make these data suitable for epidemiological studies and compared a concept recognition approach and a range of machine learning techniques that require a manually annotated training set. The authors show how this training set can be created with minimal effort by using a broad database query.
METHODS: The approaches were tested on two data sets: a publicly available set of English radiology reports to which International Classification of Diseases, Ninth Revision, Clinical Modification (ICD-9-CM) codes needed to be assigned, and a set of Dutch GP records that needed to be classified as either liver disorder cases or noncases. Performance was tested against a manually created gold standard.
RESULTS: The best overall performance was achieved by a combination of a manually created filter for removing negations and speculations and rule learning algorithms such as RIPPER, with high scores on both the radiology reports (positive predictive value = 0.88, sensitivity = 0.85, specificity = 1.00) and the GP records (positive predictive value = 0.89, sensitivity = 0.91, specificity = 0.76).
CONCLUSIONS: Although a training set still needs to be created manually, text mining can help reduce the amount of manual work needed to incorporate narrative data in an epidemiological study and will make the data extraction more reproducible. An advantage of machine learning is that it is able to pick up specific language use, such as abbreviations and synonyms used by physicians.
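The three measures reported under RESULTS (positive predictive value, sensitivity, specificity) are all derived from a comparison of classifier output with the gold standard. The sketch below shows the standard definitions in Python. It is not the authors' evaluation code: the function names, the `Counts` type, and the ten toy labels are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class Counts:
    tp: int  # gold case, predicted case
    fp: int  # gold noncase, predicted case
    tn: int  # gold noncase, predicted noncase
    fn: int  # gold case, predicted noncase


def confusion_counts(gold: Sequence[bool], predicted: Sequence[bool]) -> Counts:
    """Tally outcomes for binary labels (True = case, False = noncase)."""
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")
    pairs = list(zip(gold, predicted))
    return Counts(
        tp=sum(1 for g, p in pairs if g and p),
        fp=sum(1 for g, p in pairs if not g and p),
        tn=sum(1 for g, p in pairs if not g and not p),
        fn=sum(1 for g, p in pairs if g and not p),
    )


def _ratio(numerator: int, denominator: int) -> float:
    """Return numerator / denominator, or NaN when the denominator is zero."""
    return numerator / denominator if denominator else float("nan")


def positive_predictive_value(c: Counts) -> float:
    # Fraction of predicted cases that are true cases.
    return _ratio(c.tp, c.tp + c.fp)


def sensitivity(c: Counts) -> float:
    # Fraction of true cases that the classifier found.
    return _ratio(c.tp, c.tp + c.fn)


def specificity(c: Counts) -> float:
    # Fraction of true noncases that the classifier rejected.
    return _ratio(c.tn, c.tn + c.fp)


# Toy labels, invented for illustration only (not data from the study).
gold = [True, True, True, True, False, False, False, False, False, False]
predicted = [True, True, True, False, True, False, False, False, False, False]

counts = confusion_counts(gold, predicted)
print(positive_predictive_value(counts), sensitivity(counts), specificity(counts))
```

Reporting all three together matters because they can diverge: a classifier can have a high positive predictive value and sensitivity while its specificity is much lower, as in the GP record results above.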

Year: 2012 PMID: 22271492 DOI: 10.1002/pds.3205

Source DB: PubMed Journal: Pharmacoepidemiol Drug Saf ISSN: 1053-8569 Impact factor: 2.890

Cited

14 in total

Review 1. Clinical Data Reuse or Secondary Use: Current Status and Potential Future Progress.

Authors: S M Meystre; C Lovis; T Bürkle; G Tognola; A Budrionis; C U Lehmann
Journal: Yearb Med Inform Date: 2017-09-11

Review 2. "Big data" and the electronic health record.

Authors: M K Ross; W Wei; L Ohno-Machado
Journal: Yearb Med Inform Date: 2014-08-15

3. Pharmacoepidemiology in the era of real-world evidence.

Authors: Sengwee Toh
Journal: Curr Epidemiol Rep Date: 2017-10-19

4. Automated information extraction from free-text EEG reports.

Authors: Siddharth Biswal; Zarina Nip; Valdery Moura Junior; Matt T Bianchi; Eric S Rosenthal; M Brandon Westover
Journal: Conf Proc IEEE Eng Med Biol Soc Date: 2015

5. Natural Language Processing to identify pneumonia from radiology reports.

Authors: Sascha Dublin; Eric Baldwin; Rod L Walker; Lee M Christensen; Peter J Haug; Michael L Jackson; Jennifer C Nelson; Jeffrey Ferraro; David Carrell; Wendy W Chapman
Journal: Pharmacoepidemiol Drug Saf Date: 2013-04-01 Impact factor: 2.890

6. Improving sensitivity of machine learning methods for automated case identification from free-text electronic medical records.

Authors: Zubair Afzal; Martijn J Schuemie; Jan C van Blijderveen; Elif F Sen; Miriam C J M Sturkenboom; Jan A Kors
Journal: BMC Med Inform Decis Mak Date: 2013-03-02 Impact factor: 2.796

7. Identification of acute myocardial infarction from electronic healthcare records using different disease coding systems: a validation study in three European countries.

Authors: Preciosa M Coloma; Vera E Valkhoff; Giampiero Mazzaglia; Malene Schou Nielsson; Lars Pedersen; Mariam Molokhia; Mees Mosseveld; Paolo Morabito; Martijn J Schuemie; Johan van der Lei; Miriam Sturkenboom; Gianluca Trifirò
Journal: BMJ Open Date: 2013-06-20 Impact factor: 2.692

8. A rule-based electronic phenotyping algorithm for detecting clinically relevant cardiovascular disease cases.

Authors: Santiago Esteban; Manuel Rodríguez Tablado; Ricardo Ignacio Ricci; Sergio Terrasa; Karin Kopitowski
Journal: BMC Res Notes Date: 2017-07-14

9. Automatic ICD-10 multi-class classification of cause of death from plaintext autopsy reports through expert-driven feature selection.

Authors: Ghulam Mujtaba; Liyana Shuib; Ram Gopal Raj; Retnagowri Rajandram; Khairunisa Shaikh; Mohammed Ali Al-Garadi
Journal: PLoS One Date: 2017-02-06 Impact factor: 3.240

10. Identification of major cardiovascular events in patients with diabetes using primary care data.

Authors: Koen Bernardus Pouwels; Jaco Voorham; Eelko Hak; Petra Denig
Journal: BMC Health Serv Res Date: 2016-04-02 Impact factor: 2.655