Do Neural Information Extraction Algorithms Generalize Across Institutions?

Enrico Santus¹, Clara Li¹, Adam Yala¹, Donald Peck²,³, Rufina Soomro⁴, Naveen Faridi⁴, Isra Mamshad⁴, Rong Tang⁵, Conor R Lanahan⁶, Regina Barzilay¹, Kevin Hughes⁶.

Abstract

PURPOSE: Natural language processing (NLP) techniques have been adopted to reduce the curation costs of electronic health records. However, studies have questioned whether such techniques can be applied to data from previously unseen institutions. We investigated the performance of a common neural NLP algorithm on data from both known and heldout hospitals (ie, institutions whose data were withheld from the training set and used only for testing). We also explored how diversity in the training data affects the system's generalization ability.
METHODS: We collected 24,881 breast pathology reports from seven hospitals and manually annotated them with nine key attributes that describe types of atypia and cancer. We trained a convolutional neural network (CNN) on annotations from either only one (CNN1), only two (CNN2), or only four (CNN4) hospitals. The trained systems were tested on data from five organizations, including both known and heldout ones. For every setting, we provide the accuracy scores as well as the learning curves that show how much data are necessary to achieve good performance and generalizability.
RESULTS: The system achieved a cross-institutional accuracy of 93.87% when trained on reports from only one hospital (CNN1). Performance improved to 95.7% and 96%, respectively, when the system was trained on reports from two (CNN2) and four (CNN4) hospitals. The introduction of diversity during training did not lead to improvements on the known institutions, but it boosted performance on the heldout institutions. When tested on reports from heldout hospitals, CNN4 outperformed CNN1 and CNN2 by 2.13% and 0.3%, respectively.
CONCLUSION: Real-world scenarios require that neural NLP approaches scale to data from previously unseen institutions. We show that a common neural NLP algorithm for information extraction can achieve this goal, especially when diverse data are used during training.
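
The known versus heldout evaluation described in METHODS can be sketched in a few lines. This is a minimal illustration, not the authors' code: the record fields (`hospital`, `text`, `label`), the `every_nth` split rule, and the stand-in `predict` function are all assumptions made for the example, and the CNN itself is not reproduced.

```python
from collections import defaultdict


def split_by_institution(reports, train_hospitals, every_nth=5):
    """Split reports into train, known-test, and heldout-test sets.

    `reports` is a list of dicts with hypothetical keys "hospital",
    "text", and "label". Reports from hospitals outside
    `train_hospitals` are never seen in training (heldout).
    """
    train, known_test, heldout_test = [], [], []
    seen = defaultdict(int)  # per-hospital counter for a deterministic split
    for report in reports:
        hospital = report["hospital"]
        if hospital not in train_hospitals:
            heldout_test.append(report)
            continue
        seen[hospital] += 1
        # Every `every_nth` report from a known hospital is reserved for testing.
        if seen[hospital] % every_nth == 0:
            known_test.append(report)
        else:
            train.append(report)
    return train, known_test, heldout_test


def accuracy(predict, reports):
    """Fraction of reports whose predicted label matches the annotation."""
    if not reports:
        return float("nan")
    correct = sum(predict(r["text"]) == r["label"] for r in reports)
    return correct / len(reports)


# Toy data: three hypothetical hospitals with ten reports each.
reports = [
    {"hospital": h, "text": f"report {i}", "label": i % 2}
    for h in ("A", "B", "C")
    for i in range(10)
]

# Analogue of the single-hospital setting: train on "A" only.
train, known_test, heldout_test = split_by_institution(reports, {"A"})


def predict(text):
    """Stand-in for a trained model; reads the label from the toy text."""
    return int(text.split()[1]) % 2


known_acc = accuracy(predict, known_test)
heldout_acc = accuracy(predict, heldout_test)
```

Adding more hospitals to `train_hospitals` gives the analogue of the two- and four-hospital settings, and comparing `known_acc` with `heldout_acc` across those settings mirrors the comparison reported in RESULTS.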

Year: 2019 PMID: 31310566 PMCID: PMC6874001 DOI: 10.1200/CCI.18.00160

Source DB: PubMed Journal: JCO Clin Cancer Inform ISSN: 2473-4276

5 in total

1. Information extraction from multi-institutional radiology reports.

Authors: Saeed Hassanpour; Curtis P Langlotz
Journal: Artif Intell Med Date: 2015-10-03 Impact factor: 5.326

2. Using machine learning to parse breast pathology reports.

Authors: Adam Yala; Regina Barzilay; Laura Salama; Molly Griffin; Grace Sollender; Aditya Bardia; Constance Lehman; Julliette M Buckley; Suzanne B Coopey; Fernanda Polubriaginof; Judy E Garber; Barbara L Smith; Michele A Gadd; Michelle C Specht; Thomas M Gudewicz; Anthony J Guidi; Alphonse Taghian; Kevin S Hughes
Journal: Breast Cancer Res Treat Date: 2016-11-08 Impact factor: 4.872

3. The feasibility of using natural language processing to extract clinical information from breast pathology reports.

Authors: Julliette M Buckley; Suzanne B Coopey; John Sharko; Fernanda Polubriaginof; Brian Drohan; Ahmet K Belli; Elizabeth M H Kim; Judy E Garber; Barbara L Smith; Michele A Gadd; Michelle C Specht; Constance A Roche; Thomas M Gudewicz; Kevin S Hughes
Journal: J Pathol Inform Date: 2012-06-30

4. Variable generalization performance of a deep learning model to detect pneumonia in chest radiographs: A cross-sectional study.

Authors: John R Zech; Marcus A Badgeley; Manway Liu; Anthony B Costa; Joseph J Titano; Eric Karl Oermann
Journal: PLoS Med Date: 2018-11-06 Impact factor: 11.069

5. Validation of natural language processing to extract breast cancer pathology procedures and results.

Authors: Arika E Wieneke; Erin J A Bowles; David Cronkite; Karen J Wernli; Hongyuan Gao; David Carrell; Diana S M Buist
Journal: J Pathol Inform Date: 2015-06-23

3 in total

1. Automated NLP Extraction of Clinical Rationale for Treatment Discontinuation in Breast Cancer.

Authors: Matthew S Alkaitis; Monica N Agrawal; Gregory J Riely; Pedram Razavi; David Sontag
Journal: JCO Clin Cancer Inform Date: 2021-05

2. Artificial Intelligence-Aided Precision Medicine for COVID-19: Strategic Areas of Research and Development.

Authors: Enrico Santus; Nicola Marino; Davide Cirillo; Emmanuele Chersoni; Arnau Montagud; Antonella Santuccione Chadha; Alfonso Valencia; Kevin Hughes; Charlotta Lindvall
Journal: J Med Internet Res Date: 2021-03-12 Impact factor: 5.428

3. Assessment of Electronic Health Record for Cancer Research and Patient Care Through a Scoping Review of Cancer Natural Language Processing. (Review)

Authors: Liwei Wang; Sunyang Fu; Andrew Wen; Xiaoyang Ruan; Huan He; Sijia Liu; Sungrim Moon; Michelle Mai; Irbaz B Riaz; Nan Wang; Ping Yang; Hua Xu; Jeremy L Warner; Hongfang Liu
Journal: JCO Clin Cancer Inform Date: 2022-07
