
Performance of Multiple Pretrained BERT Models to Automate and Accelerate Data Annotation for Large Datasets.

Ali S Tejani, Yee S Ng, Yin Xi, Julia R Fielding, Travis G Browning, Jesse C Rayan.

Abstract

Purpose: To develop and evaluate domain-specific and pretrained bidirectional encoder representations from transformers (BERT) models in a transfer learning task on varying training dataset sizes to annotate a larger overall dataset.

Materials and Methods: The authors retrospectively reviewed 69 095 anonymized adult chest radiograph reports (reports dated April 2020-March 2021). From the overall cohort, 1004 reports were randomly selected and labeled for the presence or absence of each of the following devices: endotracheal tube (ETT), enterogastric tube (NGT, or Dobhoff tube), central venous catheter (CVC), and Swan-Ganz catheter (SGC). Pretrained transformer models (BERT, PubMedBERT, DistilBERT, RoBERTa, and DeBERTa) were trained, validated, and tested on 60%, 20%, and 20%, respectively, of these reports through fivefold cross-validation. Additional training involved varying dataset sizes with 5%, 10%, 15%, 20%, and 40% of the 1004 reports. The best-performing epochs were used to assess area under the receiver operating characteristic curve (AUC) and determine run time on the overall dataset.

Results: The highest average AUCs from fivefold cross-validation were 0.996 for ETT (RoBERTa), 0.994 for NGT (RoBERTa), 0.991 for CVC (PubMedBERT), and 0.98 for SGC (PubMedBERT). DeBERTa demonstrated the highest AUC for each support device trained on 5% of the training set. PubMedBERT showed a higher AUC with a decreasing training set size compared with BERT. Training and validation time was shortest for DistilBERT at 3 minutes 39 seconds on the annotated cohort.

Conclusion: Pretrained and domain-specific transformer models required small training datasets and short training times to create a highly accurate final model that expedites autonomous annotation of large datasets.

Supplemental material is available for this article. ©RSNA, 2022. See also the commentary by Zech in this issue.
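The partitioning scheme described in Materials and Methods — a 60%/20%/20% train/validation/test split of the 1004 labeled reports, plus reduced training subsets of 5%-40% — can be sketched as below. This is a minimal illustration of the split arithmetic only; the function names and random seed are hypothetical and not from the study:

```python
import random

def split_reports(n_reports=1004, train_frac=0.60, val_frac=0.20, seed=0):
    """Partition report indices into train/validation/test sets
    (60%/20%/20%, as in the study). Illustrative helper, not the
    authors' code."""
    idx = list(range(n_reports))
    random.Random(seed).shuffle(idx)
    n_train = int(n_reports * train_frac)
    n_val = int(n_reports * val_frac)
    train = idx[:n_train]
    val = idx[n_train:n_train + n_val]
    test = idx[n_train + n_val:]
    return train, val, test

def reduced_training_sets(train, fractions=(0.05, 0.10, 0.15, 0.20, 0.40),
                          total=1004):
    """Subsample the training partition to 5%-40% of the 1004 labeled
    reports, mirroring the reduced-size training experiments."""
    return {f: train[:int(total * f)] for f in fractions}

train, val, test = split_reports()
subsets = reduced_training_sets(train)
print(len(train), len(val), len(test))  # prints: 602 200 202
print({f: len(s) for f, s in subsets.items()})
```

Each reduced subset is drawn from the training partition only, so the validation and test reports never leak into training regardless of the fraction used.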
© 2022 by the Radiological Society of North America, Inc.

Keywords:  Informatics; Named Entity Recognition; Transfer Learning

Year:  2022        PMID: 35923377      PMCID: PMC9344209          DOI: 10.1148/ryai.220007

Source DB:  PubMed          Journal:  Radiol Artif Intell        ISSN: 2638-6100


Related articles: 19 in total

1.  Automated classification of radiology reports to facilitate retrospective study in radiology.

Authors:  Yihua Zhou; Per K Amundson; Fang Yu; Marcus M Kessler; Tammie L S Benzinger; Franz J Wippold
Journal:  J Digit Imaging       Date:  2014-12       Impact factor: 4.056

2.  Natural Language Processing of Radiology Text Reports: Interactive Text Classification.

Authors:  Walter F Wiggins; Felipe Kitamura; Igor Santos; Luciano M Prevedello
Journal:  Radiol Artif Intell       Date:  2021-05-12

3.  Machine learning based natural language processing of radiology reports in orthopaedic trauma.

Authors:  A W Olthof; P Shouche; E M Fennema; F F A IJpma; R H C Koolstra; V M A Stirler; P M A van Ooijen; L J Cornelissen
Journal:  Comput Methods Programs Biomed       Date:  2021-07-23       Impact factor: 5.428

4.  Highly accurate classification of chest radiographic reports using a deep learning natural language model pre-trained on 3.8 million text reports.

Authors:  Keno K Bressem; Lisa C Adams; Robert A Gaudin; Daniel Tröltzsch; Bernd Hamm; Marcus R Makowski; Chan-Yong Schüle; Janis L Vahldiek; Stefan M Niehues
Journal:  Bioinformatics       Date:  2021-01-29       Impact factor: 6.937

5.  Classification of CT pulmonary angiography reports by presence, chronicity, and location of pulmonary embolism with natural language processing.

Authors:  Sheng Yu; Kanako K Kumamaru; Elizabeth George; Ruth M Dunne; Arash Bedayat; Matey Neykov; Andetta R Hunsaker; Karin E Dill; Tianxi Cai; Frank J Rybicki
Journal:  J Biomed Inform       Date:  2014-08-10       Impact factor: 6.317

6.  Automatic Diagnosis Labeling of Cardiovascular MRI by Using Semisupervised Natural Language Processing of Text Reports.

Authors:  Sameer Zaman; Camille Petri; Kavitha Vimalesvaran; James Howard; Anil Bharath; Darrel Francis; Nicholas Peters; Graham D Cole; Nick Linton
Journal:  Radiol Artif Intell       Date:  2021-11-24

7.  RadBERT-CL: Factually-Aware Contrastive Learning For Radiology Report Classification.

Authors:  Ajay Jaiswal; Liyan Tang; Meheli Ghosh; Justin F Rousseau; Yifan Peng; Ying Ding
Journal:  Proc Mach Learn Res       Date:  2021-12

8.  Racial and Sex Disparities in Catheter Use and Dialysis Access in the United States Medicare Population.

Authors:  Shipra Arya; Taylor A Melanson; Elizabeth L George; Kara A Rothenberg; Manjula Kurella Tamura; Rachel E Patzer; Jason M Hockenberry
Journal:  J Am Soc Nephrol       Date:  2020-01-15       Impact factor: 10.121

9.  Preparing Medical Imaging Data for Machine Learning.

Authors:  Martin J Willemink; Wojciech A Koszek; Cailin Hardell; Jie Wu; Dominik Fleischmann; Hugh Harvey; Les R Folio; Ronald M Summers; Daniel L Rubin; Matthew P Lungren
Journal:  Radiology       Date:  2020-02-18       Impact factor: 11.105

10.  BioBERT: a pre-trained biomedical language representation model for biomedical text mining.

Authors:  Jinhyuk Lee; Wonjin Yoon; Sungdong Kim; Donghyeon Kim; Sunkyu Kim; Chan Ho So; Jaewoo Kang
Journal:  Bioinformatics       Date:  2020-02-15       Impact factor: 6.937


Beijing Coyote Bioscience Co., Ltd. © 2022-2023.