Semi-supervised clinical text classification with Laplacian SVMs: an application to cancer case management.

Vijay Garla, Caroline Taylor, Cynthia Brandt.

Abstract

OBJECTIVE: To compare linear and Laplacian SVMs on a clinical text classification task; to evaluate the effect of unlabeled training data on Laplacian SVM performance.
BACKGROUND: The development of machine-learning-based clinical text classifiers requires the creation of labeled training data, obtained via manual review by clinicians. Due to the effort and expense involved in labeling data, training data sets in the clinical domain are of limited size. In contrast, electronic medical record (EMR) systems contain hundreds of thousands of unlabeled notes that are not used by supervised machine-learning approaches. Semi-supervised learning algorithms use both labeled and unlabeled data to train classifiers and can outperform their supervised counterparts.
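For reference, the Laplacian SVM is usually written in the manifold-regularization form of Belkin, Niyogi and Sindhwani (2006), shown below with labels y_i in {-1, +1}, l labeled and u unlabeled examples (here l = 820 labeled reports and up to u = 19,845 unlabeled notes). The abstract does not state the kernel, similarity graph W, or regularization weights the authors used, so those are left generic. The first term is the ordinary hinge loss over labeled reports; the last term, computed over all l + u reports, penalizes classifiers whose outputs differ between reports that the graph connects, which is how unlabeled notes shape the decision boundary. Setting gamma_I = 0 recovers a standard supervised SVM.

```latex
f^{*} = \operatorname*{arg\,min}_{f \in \mathcal{H}_K}
  \frac{1}{l} \sum_{i=1}^{l} \max\bigl(0,\; 1 - y_i f(x_i)\bigr)
  + \gamma_A \lVert f \rVert_K^{2}
  + \frac{\gamma_I}{(l+u)^{2}} \, \mathbf{f}^{\top} L \, \mathbf{f}

L = D - W, \qquad D_{ii} = \sum_{j=1}^{l+u} W_{ij}, \qquad
\mathbf{f} = \bigl(f(x_1), \dots, f(x_{l+u})\bigr)^{\top}

\mathbf{f}^{\top} L \, \mathbf{f}
  = \frac{1}{2} \sum_{i,j=1}^{l+u} W_{ij} \bigl(f(x_i) - f(x_j)\bigr)^{2}
```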
METHODS: We trained support vector machines (SVMs) and Laplacian SVMs on a training reference standard of 820 abdominal CT, MRI, and ultrasound reports labeled for the presence of potentially malignant liver lesions that require follow-up (positive class prevalence 77%). In addition to the training reference standard, the Laplacian SVM used 19,845 randomly sampled unlabeled notes. We evaluated SVMs and Laplacian SVMs on a test set of 520 labeled reports.
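The training setup above (a small labeled set plus a much larger unlabeled pool) can be illustrated with a minimal sketch, assuming a linear kernel, a symmetrized k-nearest-neighbour graph with heat-kernel weights, and plain subgradient descent on the primal objective. The function names (`knn_graph_laplacian`, `train_linear_lapsvm`), hyperparameter values, and two-cluster toy data are hypothetical; this is not the authors' implementation, which classified text features of radiology reports.

```python
import numpy as np


def knn_graph_laplacian(X, k=5, sigma=1.0):
    """Unnormalized Laplacian L = D - W of a symmetrized k-NN graph with heat-kernel weights."""
    sq = np.sum(X ** 2, axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0)  # squared distances
    n = X.shape[0]
    W = np.zeros((n, n))
    for i in range(n):
        nbrs = np.argsort(d2[i])[1 : k + 1]  # position 0 is the point itself
        W[i, nbrs] = np.exp(-d2[i, nbrs] / (2.0 * sigma ** 2))
    W = np.maximum(W, W.T)  # undirected: keep an edge if either endpoint selected it
    return np.diag(W.sum(axis=1)) - W


def train_linear_lapsvm(X_lab, y_lab, X_unlab, gamma_a=1e-2, gamma_i=1e-1,
                        k=5, lr=0.1, n_iter=500):
    """Hinge loss + gamma_a*||w||^2 + gamma_i/(l+u)^2 * f^T L f, by subgradient descent."""
    X_all = np.vstack([X_lab, X_unlab])
    l, n = X_lab.shape[0], X_all.shape[0]
    L = knn_graph_laplacian(X_all, k=k)
    # With f = X_all @ w + b, f^T L f = w^T M w because L @ ones = 0, so b drops out.
    M = X_all.T @ L @ X_all
    w, b = np.zeros(X_all.shape[1]), 0.0
    for _ in range(n_iter):
        active = y_lab * (X_lab @ w + b) < 1.0  # labeled points with nonzero hinge loss
        grad_w = -(y_lab[active][:, None] * X_lab[active]).sum(axis=0) / l
        grad_b = -y_lab[active].sum() / l
        grad_w += 2.0 * gamma_a * w + 2.0 * gamma_i / n ** 2 * (M @ w)
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


# Hypothetical toy data: two 2-D clusters with only 2 labeled points per class.
rng = np.random.default_rng(0)
X_pos = rng.normal(loc=2.0, scale=0.5, size=(50, 2))
X_neg = rng.normal(loc=-2.0, scale=0.5, size=(50, 2))
X_lab = np.vstack([X_pos[:2], X_neg[:2]])
y_lab = np.array([1.0, 1.0, -1.0, -1.0])
X_unlab = np.vstack([X_pos[2:], X_neg[2:]])

w, b = train_linear_lapsvm(X_lab, y_lab, X_unlab)
X_eval = np.vstack([X_pos, X_neg])
y_eval = np.concatenate([np.ones(50), -np.ones(50)])
accuracy = np.mean(np.sign(X_eval @ w + b) == y_eval)
print(f"accuracy on all 100 points: {accuracy:.2f}")
```

Only the last regularizer touches the unlabeled rows, so setting `gamma_i=0` turns the same loop into an ordinary linear SVM, which is the supervised baseline the study compares against.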
RESULTS: The Laplacian SVM trained on labeled and unlabeled radiology reports significantly outperformed supervised SVMs (macro-F1 0.773 vs. 0.741, sensitivity 0.943 vs. 0.911, positive predictive value 0.877 vs. 0.883). Performance improved with the number of labeled and unlabeled notes used to train the Laplacian SVM (Pearson's ρ = 0.529 for the correlation between the number of unlabeled notes and macro-F1 score). These results suggest that practical semi-supervised methods such as the Laplacian SVM can leverage the large, unlabeled corpora that reside within EMRs to improve clinical text classification.
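The reported metrics can be related to a confusion matrix with the small sketch below. It assumes, since the abstract does not define it, that macro-F1 is the unweighted mean of the positive-class and negative-class F1 scores; the function names and toy labels are hypothetical. Under that assumption, the reported sensitivity 0.943 and PPV 0.877 give a positive-class F1 of about 0.909, so a macro-F1 of 0.773 implies a negative-class F1 of only about 0.64 on the minority (23%) negative class.

```python
import math


def _ratio(num, den):
    # Treat 0/0 as 0.0 so a class with no predictions does not raise
    return num / den if den else 0.0


def _f1(precision, recall):
    # Harmonic mean of precision and recall (0.0 when both are zero)
    total = precision + recall
    return 2 * precision * recall / total if total else 0.0


def binary_report(y_true, y_pred):
    """Confusion-matrix metrics for labels in {0, 1}, where 1 is the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == 1 and p == 1 for t, p in pairs)
    fp = sum(t == 0 and p == 1 for t, p in pairs)
    fn = sum(t == 1 and p == 0 for t, p in pairs)
    tn = sum(t == 0 and p == 0 for t, p in pairs)
    sensitivity = _ratio(tp, tp + fn)  # recall of the positive class
    ppv = _ratio(tp, tp + fp)          # precision of the positive class
    specificity = _ratio(tn, tn + fp)  # recall of the negative class
    npv = _ratio(tn, tn + fn)          # precision of the negative class
    f1_pos, f1_neg = _f1(ppv, sensitivity), _f1(npv, specificity)
    return {
        "sensitivity": sensitivity,
        "ppv": ppv,
        "f1_pos": f1_pos,
        "f1_neg": f1_neg,
        "macro_f1": (f1_pos + f1_neg) / 2,  # assumed unweighted mean over the two classes
    }


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


# Hypothetical toy labels, not data from the study
report = binary_report([1, 1, 1, 1, 0, 0], [1, 1, 1, 0, 0, 1])
print(report)  # values: sensitivity 0.75, ppv 0.75, f1_pos 0.75, f1_neg 0.5, macro_f1 0.625
rho = pearson([1, 2, 3], [2, 4, 6])  # perfectly linear toy series, so rho is 1 up to rounding
```

Because macro-averaging weights both classes equally, it exposes weak performance on the minority class that sensitivity and PPV, both computed on the positive class, do not show.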
Copyright © 2013 Elsevier Inc. All rights reserved.

Keywords:  Graph Laplacian; Natural language processing; Semi-supervised learning; Support vector machine

Year:  2013        PMID: 23845911      PMCID: PMC3806632          DOI: 10.1016/j.jbi.2013.06.014

Source DB:  PubMed          Journal:  J Biomed Inform        ISSN: 1532-0464            Impact factor:   6.317


Cited by: 16 in total (10 shown)

1.  Semi-supervised Learning for Phenotyping Tasks.

Authors:  Dmitriy Dligach; Timothy Miller; Guergana K Savova
Journal:  AMIA Annu Symp Proc       Date:  2015-11-05

2.  Automated Detection of Radiology Reports that Require Follow-up Imaging Using Natural Language Processing Feature Engineering and Machine Learning Classification.

Authors:  Robert Lou; Darco Lalevic; Charles Chambers; Hanna M Zafar; Tessa S Cook
Journal:  J Digit Imaging       Date:  2020-02       Impact factor: 4.056

3.  Clinical Document Classification Using Labeled and Unlabeled Data Across Hospitals.

Authors:  Hamed Hassanzadeh; Mahnoosh Kholghi; Anthony Nguyen; Kevin Chu
Journal:  AMIA Annu Symp Proc       Date:  2018-12-05

4.  N-gram support vector machines for scalable procedure and diagnosis classification, with applications to clinical free text data from the intensive care unit.

Authors:  Ben J Marafino; Jason M Davies; Naomi S Bardach; Mitzi L Dean; R Adams Dudley
Journal:  J Am Med Inform Assoc       Date:  2014-04-30       Impact factor: 4.497

5.  Natural Language Processing for Automated Quantification of Brain Metastases Reported in Free-Text Radiology Reports.

Authors:  Joeky T Senders; Aditya V Karhade; David J Cote; Alireza Mehrtash; Nayan Lamba; Aislyn DiRisio; Ivo S Muskens; William B Gormley; Timothy R Smith; Marike L D Broekman; Omar Arnaout
Journal:  JCO Clin Cancer Inform       Date:  2019-04

6.  A Web Application for Adrenal Incidentaloma Identification, Tracking, and Management Using Machine Learning.

Authors:  Wasif Bala; Jackson Steinkamp; Timothy Feeney; Avneesh Gupta; Abhinav Sharma; Jake Kantrowitz; Nicholas Cordella; James Moses; Frederick Thurston Drake
Journal:  Appl Clin Inform       Date:  2020-09-16       Impact factor: 2.342

7.  Natural Language Processing for EHR-Based Computational Phenotyping. (Review)

Authors:  Zexian Zeng; Yu Deng; Xiaoyu Li; Tristan Naumann; Yuan Luo
Journal:  IEEE/ACM Trans Comput Biol Bioinform       Date:  2018-06-25       Impact factor: 3.710

8.  Raman Microspectroscopic Investigation and Classification of Breast Cancer Pathological Characteristics.

Authors:  Heping Li; Tian Ning; Fan Yu; Yishen Chen; Baoping Zhang; Shuang Wang
Journal:  Molecules       Date:  2021-02-09       Impact factor: 4.411

9.  A novel method of adverse event detection can accurately identify venous thromboembolisms (VTEs) from narrative electronic health record data.

Authors:  Christian M Rochefort; Aman D Verma; Tewodros Eguale; Todd C Lee; David L Buckeridge
Journal:  J Am Med Inform Assoc       Date:  2014-10-20       Impact factor: 4.497

10.  A machine learning approach to identify clinical trials involving nanodrugs and nanodevices from ClinicalTrials.gov.

Authors:  Diana de la Iglesia; Miguel García-Remesal; Alberto Anguita; Miguel Muñoz-Mármol; Casimir Kulikowski; Víctor Maojo
Journal:  PLoS One       Date:  2014-10-27       Impact factor: 3.240
