
Using machine learning for concept extraction on clinical documents from multiple data sources.

Manabu Torii, Kavishwar Wagholikar, Hongfang Liu.

Abstract

OBJECTIVE: Concept extraction is the process of identifying phrases in unstructured text that refer to concepts of interest. It is a critical component of automated text processing. We investigate the performance of machine learning taggers for clinical concept extraction, particularly their portability across documents from multiple data sources.
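To make "identifying phrases" concrete: a sequence tagger typically labels each token, and concept mentions are then read off as token spans. The abstract does not say how BioTagger-GM encodes mentions, so the BIO scheme below (B = begin, I = inside, O = outside) is an assumption, as are the function name `bio_to_spans` and the concept-type labels, which are illustrative only. This is a minimal sketch of the span-decoding step, not the authors' code.

```python
from typing import List, Optional, Tuple

# (start token index, end token index exclusive, concept type)
Span = Tuple[int, int, str]


def bio_to_spans(tags: List[str]) -> List[Span]:
    """Convert a BIO tag sequence into (start, end, type) concept spans.

    'B-X' starts a mention of type X, 'I-X' continues it, 'O' is outside.
    An 'I-X' that does not continue a mention of type X is leniently
    treated as the start of a new mention.
    """
    spans: List[Span] = []
    start: Optional[int] = None
    label: Optional[str] = None
    for i, tag in enumerate(tags):
        if tag.startswith("B-") or (tag.startswith("I-") and tag[2:] != label):
            if start is not None:  # close the mention in progress
                spans.append((start, i, label))
            start, label = i, tag[2:]
        elif tag == "O":
            if start is not None:
                spans.append((start, i, label))
            start, label = None, None
        # otherwise: 'I-X' continuing the current mention of type X
    if start is not None:  # mention running to the end of the sentence
        spans.append((start, len(tags), label))
    return spans


tokens = ["Patient", "denies", "chest", "pain", "after", "CT", "scan", "."]
tags = ["O", "O", "B-problem", "I-problem", "O", "B-test", "I-test", "O"]
for s, e, t in bio_to_spans(tags):
    print(t, " ".join(tokens[s:e]))
```

Decoding to spans, rather than scoring individual token labels, matches how concept-level evaluation counts a mention as found only when its whole phrase is recovered.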
METHODS: We used BioTagger-GM, a system we originally developed for detecting gene/protein names in the biology domain, to train machine learning taggers. The trained taggers were evaluated on the annotated clinical documents made available for the 2010 i2b2/VA Challenge workshop, which comprise documents from four data sources.
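The cross-source evaluation design can be sketched as a grid: train one tagger per source, plus one on all sources pooled, and score each on every source's test documents. This is an illustrative sketch under stated assumptions: the phrase-memorizing `train_dictionary_tagger` is a hypothetical stand-in for a machine learning tagger (it is not BioTagger-GM), the per-source train/test split is assumed, and the two sources "A" and "B" with their documents are invented toy data.

```python
from typing import Dict, List, Set, Tuple

# A document is (lower-cased text, set of gold concept phrases).
Doc = Tuple[str, Set[str]]


def train_dictionary_tagger(docs: List[Doc]) -> Set[str]:
    """Hypothetical stand-in for a trained tagger: memorize every annotated phrase."""
    lexicon: Set[str] = set()
    for _, gold in docs:
        lexicon |= gold
    return lexicon


def tag(lexicon: Set[str], text: str) -> Set[str]:
    """Predict every memorized phrase that occurs verbatim in the text."""
    return {phrase for phrase in lexicon if phrase in text}


def f_score(lexicon: Set[str], docs: List[Doc]) -> float:
    """Micro-averaged F score over all documents of one test source.

    Simplification: compares sets of phrases per document, not mention offsets.
    """
    tp = fp = fn = 0
    for text, gold in docs:
        pred = tag(lexicon, text)
        tp += len(pred & gold)
        fp += len(pred - gold)
        fn += len(gold - pred)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def cross_source_grid(
    train: Dict[str, List[Doc]], test: Dict[str, List[Doc]]
) -> Dict[Tuple[str, str], float]:
    """Score each single-source tagger, and one trained on all sources, on every test source."""
    configs = dict(train)
    configs["all"] = [doc for docs in train.values() for doc in docs]
    grid: Dict[Tuple[str, str], float] = {}
    for train_name, docs in configs.items():
        model = train_dictionary_tagger(docs)
        for test_name, test_docs in test.items():
            grid[(train_name, test_name)] = f_score(model, test_docs)
    return grid


# Toy data for two hypothetical sources "A" and "B" (invented, not i2b2 data).
train_sets = {
    "A": [("chest pain noted on admission", {"chest pain"})],
    "B": [("ct scan was negative", {"ct scan"})],
}
test_sets = {
    "A": [("denies chest pain today", {"chest pain"})],
    "B": [("repeat ct scan ordered for chest pain", {"ct scan", "chest pain"})],
}

for (tr, te), f in sorted(cross_source_grid(train_sets, test_sets).items()):
    print(f"train={tr:<3} test={te}: F={f:.3f}")
```

In this toy run the tagger trained only on B scores 0.000 on A, while the pooled tagger scores 1.000 on both sources. That mirrors, in miniature, the qualitative pattern described in the results; the numbers say nothing about the real study.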
RESULTS: As expected, the performance of a tagger trained on one data source degraded when it was evaluated on another source, but the extent of the degradation varied with the data sources involved. A tagger trained on multiple data sources was robust, achieving an F score as high as 0.890 on one data source. The results also suggest that the performance of machine learning taggers is likely to improve if more annotated documents become available for training.
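For reference, an F score is conventionally the harmonic mean of precision P and recall R, computed from true positives (TP), false positives (FP) and false negatives (FN) over concept mentions. The abstract does not state whether a predicted mention had to match gold boundaries exactly or merely overlap them, so the matching criterion is left open here. The counts in the second line are hypothetical, chosen only to illustrate the arithmetic.

```latex
P = \frac{TP}{TP + FP}, \qquad R = \frac{TP}{TP + FN}, \qquad
F = \frac{2PR}{P + R} = \frac{2\,TP}{2\,TP + FP + FN}

% Hypothetical counts: TP = 80, FP = 10, FN = 20
P = \frac{80}{90} \approx 0.889, \qquad R = \frac{80}{100} = 0.800, \qquad
F = \frac{160}{190} \approx 0.842
```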
CONCLUSION: Our study shows how the performance of machine learning taggers degrades when they are ported across clinical documents from different sources. The portability of taggers can be enhanced by training on datasets from multiple sources. The study also shows that BioTagger-GM can easily be extended to detect clinical concept mentions with good performance.


Year:  2011        PMID: 21709161      PMCID: PMC3168314          DOI: 10.1136/amiajnl-2011-000155

Source DB:  PubMed          Journal:  J Am Med Inform Assoc        ISSN: 1067-5027            Impact factor:   4.497


