
Feature engineering combined with machine learning and rule-based methods for structured information extraction from narrative clinical discharge summaries.

Yan Xu, Kai Hong, Junichi Tsujii, Eric I-Chao Chang.

Abstract

OBJECTIVE: A system that translates narrative text in the medical domain into a structured representation is in great demand. The system performs three sub-tasks: concept extraction, assertion classification, and relation identification.
DESIGN: The overall system consists of five steps: (1) pre-processing sentences, (2) marking noun phrases (NPs) and adjective phrases (APs), (3) extracting concepts, using a dosage-unit dictionary to dynamically switch between two models based on Conditional Random Fields (CRF), (4) classifying assertions based on voting of five classifiers, and (5) identifying relations using normalized sentences with a set of effective discriminating features.
MEASUREMENTS: Macro-averaged and micro-averaged precision, recall and F-measure were used to evaluate results.
RESULTS: The performance is competitive with state-of-the-art systems, with micro-averaged F-measures of 0.8489 for concept extraction, 0.9392 for assertion classification and 0.7326 for relation identification.
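The difference between the macro-averaged and micro-averaged measures named above can be illustrated with a small sketch. The class names and counts below are hypothetical and are not taken from the paper; the macro-averaged F-measure is assumed here to be the mean of the per-class F-measures, which is one common convention and may differ from the evaluation script the authors used.

```python
# Minimal sketch of micro- vs macro-averaged precision, recall and F-measure.
# All counts are made-up illustration values, not results from the paper.

def prf(tp, fp, fn):
    """Return (precision, recall, F-measure) from raw counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f_measure = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f_measure

def micro_average(counts):
    """Pool the counts over all classes, then compute the scores once."""
    tp = sum(c["tp"] for c in counts.values())
    fp = sum(c["fp"] for c in counts.values())
    fn = sum(c["fn"] for c in counts.values())
    return prf(tp, fp, fn)

def macro_average(counts):
    """Compute the scores per class, then take the unweighted mean
    (assumed convention: mean of per-class F-measures)."""
    per_class = [prf(c["tp"], c["fp"], c["fn"]) for c in counts.values()]
    n = len(per_class)
    return tuple(sum(scores[i] for scores in per_class) / n for i in range(3))

# Hypothetical counts: one frequent, easy class and one rare, hard class.
example_counts = {
    "frequent_class": {"tp": 90, "fp": 10, "fn": 10},
    "rare_class": {"tp": 10, "fp": 10, "fn": 30},
}

micro_p, micro_r, micro_f = micro_average(example_counts)
macro_p, macro_r, macro_f = macro_average(example_counts)
print(f"micro F = {micro_f:.4f}, macro F = {macro_f:.4f}")
```

With these counts the micro-averaged F-measure is dominated by the frequent class, while the macro-averaged value weights the rare class equally, so it comes out lower.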
CONCLUSIONS: The system exploits an array of common features and achieves state-of-the-art performance. Prudent feature engineering sets the foundation of our systems. In concept extraction, we demonstrated that switching between models, one of which is specially designed for telegraphic sentences, significantly improved extraction of the treatment concept. In assertion classification, a set of features derived from a rule-based classifier was shown to be effective for classes such as conditional and possible, which would otherwise suffer from data scarcity in conventional machine-learning methods. In relation identification, we use a two-stage architecture, the second stage of which applies pairwise classifiers to the possible candidate classes. This architecture significantly improves performance.
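Step (4) of the design combines five classifiers by voting. The abstract does not say how votes are counted or how ties are resolved, so the sketch below is only an assumption: it uses simple plurality voting and breaks ties in favour of the label voted for earliest in the list. The function name `plurality_vote` and the label "present" are hypothetical; only "conditional" and "possible" are named in the abstract.

```python
# Minimal sketch of combining five classifier outputs by plurality vote.
# The voting scheme and the tie-break rule are assumptions for illustration.
from collections import Counter

def plurality_vote(votes):
    """Return the label with the most votes.

    Counter keeps labels in first-seen order and max() returns the first
    maximal item, so a tie goes to the label that was voted for earliest.
    """
    if not votes:
        raise ValueError("at least one vote is required")
    counts = Counter(votes)
    return max(counts, key=counts.get)

# Hypothetical outputs of five classifiers for a single concept mention.
clear_case = ["possible", "present", "possible", "conditional", "possible"]
tied_case = ["present", "possible", "possible", "present", "conditional"]

print(plurality_vote(clear_case))
print(plurality_vote(tied_case))
```

A weighted vote, or a rule that defers to one designated classifier on ties, would fit the same interface; which variant the authors used is not stated in the abstract.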


Year:  2012        PMID: 22586067      PMCID: PMC3422834          DOI: 10.1136/amiajnl-2011-000776

Source DB:  PubMed          Journal:  J Am Med Inform Assoc        ISSN: 1067-5027            Impact factor:   4.497


