
Performance of Multiple Pretrained BERT Models to Automate and Accelerate Data Annotation for Large Datasets.

Ali S Tejani, Yee S Ng, Yin Xi, Julia R Fielding, Travis G Browning, Jesse C Rayan.

Abstract

Purpose: To develop and evaluate domain-specific and pretrained bidirectional encoder representations from transformers (BERT) models in a transfer learning task on varying training dataset sizes to annotate a larger overall dataset.

Materials and Methods: The authors retrospectively reviewed 69 095 anonymized adult chest radiograph reports (reports dated April 2020-March 2021). From the overall cohort, 1004 reports were randomly selected and labeled for the presence or absence of each of the following devices: endotracheal tube (ETT), enterogastric tube (NGT, or Dobhoff tube), central venous catheter (CVC), and Swan-Ganz catheter (SGC). Pretrained transformer models (BERT, PubMedBERT, DistilBERT, RoBERTa, and DeBERTa) were trained, validated, and tested on 60%, 20%, and 20%, respectively, of these reports through fivefold cross-validation. Additional training involved varying dataset sizes with 5%, 10%, 15%, 20%, and 40% of the 1004 reports. The best-performing epochs were used to assess area under the receiver operating characteristic curve (AUC) and determine run time on the overall dataset.

Results: The highest average AUCs from fivefold cross-validation were 0.996 for ETT (RoBERTa), 0.994 for NGT (RoBERTa), 0.991 for CVC (PubMedBERT), and 0.98 for SGC (PubMedBERT). DeBERTa demonstrated the highest AUC for each support device trained on 5% of the training set. PubMedBERT showed a higher AUC with a decreasing training set size compared with BERT. Training and validation time was shortest for DistilBERT at 3 minutes 39 seconds on the annotated cohort.

Conclusion: Pretrained and domain-specific transformer models required small training datasets and short training times to create a highly accurate final model that expedites autonomous annotation of large datasets.

Supplemental material is available for this article. ©RSNA, 2022. See also the commentary by Zech in this issue.
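The partitioning scheme described in Materials and Methods — a 60%/20%/20% train/validation/test split of the 1004 labeled reports, plus reduced training subsets of 5%-40% — can be sketched as below. This is a minimal illustration of the split arithmetic only; the function names and random seed are hypothetical and not from the study:

```python
import random

def split_reports(n_reports=1004, train_frac=0.60, val_frac=0.20, seed=0):
    """Partition report indices into train/validation/test sets
    (60%/20%/20%, as in the study). Illustrative helper, not the
    authors' code."""
    idx = list(range(n_reports))
    random.Random(seed).shuffle(idx)
    n_train = int(n_reports * train_frac)
    n_val = int(n_reports * val_frac)
    train = idx[:n_train]
    val = idx[n_train:n_train + n_val]
    test = idx[n_train + n_val:]
    return train, val, test

def reduced_training_sets(train, fractions=(0.05, 0.10, 0.15, 0.20, 0.40),
                          total=1004):
    """Subsample the training partition to 5%-40% of the 1004 labeled
    reports, mirroring the reduced-size training experiments."""
    return {f: train[:int(total * f)] for f in fractions}

train, val, test = split_reports()
subsets = reduced_training_sets(train)
print(len(train), len(val), len(test))  # prints: 602 200 202
print({f: len(s) for f, s in subsets.items()})
```

Each reduced subset is drawn from the training partition only, so the validation and test reports never leak into training regardless of the fraction used.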
© 2022 by the Radiological Society of North America, Inc.

Keywords:  Informatics; Named Entity Recognition; Transfer Learning

Year:  2022        PMID: 35923377      PMCID: PMC9344209          DOI: 10.1148/ryai.220007

Source DB:  PubMed          Journal:  Radiol Artif Intell        ISSN: 2638-6100


Related articles: 19 in total

1.  Automated classification of radiology reports to facilitate retrospective study in radiology.

Authors:  Yihua Zhou; Per K Amundson; Fang Yu; Marcus M Kessler; Tammie L S Benzinger; Franz J Wippold
Journal:  J Digit Imaging       Date:  2014-12       Impact factor: 4.056

2.  Natural Language Processing of Radiology Text Reports: Interactive Text Classification.

Authors:  Walter F Wiggins; Felipe Kitamura; Igor Santos; Luciano M Prevedello
Journal:  Radiol Artif Intell       Date:  2021-05-12

3.  Machine learning based natural language processing of radiology reports in orthopaedic trauma.

Authors:  A W Olthof; P Shouche; E M Fennema; F F A IJpma; R H C Koolstra; V M A Stirler; P M A van Ooijen; L J Cornelissen
Journal:  Comput Methods Programs Biomed       Date:  2021-07-23       Impact factor: 5.428

4.  Highly accurate classification of chest radiographic reports using a deep learning natural language model pre-trained on 3.8 million text reports.

Authors:  Keno K Bressem; Lisa C Adams; Robert A Gaudin; Daniel Tröltzsch; Bernd Hamm; Marcus R Makowski; Chan-Yong Schüle; Janis L Vahldiek; Stefan M Niehues
Journal:  Bioinformatics       Date:  2021-01-29       Impact factor: 6.937

5.  Classification of CT pulmonary angiography reports by presence, chronicity, and location of pulmonary embolism with natural language processing.

Authors:  Sheng Yu; Kanako K Kumamaru; Elizabeth George; Ruth M Dunne; Arash Bedayat; Matey Neykov; Andetta R Hunsaker; Karin E Dill; Tianxi Cai; Frank J Rybicki
Journal:  J Biomed Inform       Date:  2014-08-10       Impact factor: 6.317

6.  Automatic Diagnosis Labeling of Cardiovascular MRI by Using Semisupervised Natural Language Processing of Text Reports.

Authors:  Sameer Zaman; Camille Petri; Kavitha Vimalesvaran; James Howard; Anil Bharath; Darrel Francis; Nicholas Peters; Graham D Cole; Nick Linton
Journal:  Radiol Artif Intell       Date:  2021-11-24

7.  RadBERT-CL: Factually-Aware Contrastive Learning For Radiology Report Classification.

Authors:  Ajay Jaiswal; Liyan Tang; Meheli Ghosh; Justin F Rousseau; Yifan Peng; Ying Ding
Journal:  Proc Mach Learn Res       Date:  2021-12

8.  Racial and Sex Disparities in Catheter Use and Dialysis Access in the United States Medicare Population.

Authors:  Shipra Arya; Taylor A Melanson; Elizabeth L George; Kara A Rothenberg; Manjula Kurella Tamura; Rachel E Patzer; Jason M Hockenberry
Journal:  J Am Soc Nephrol       Date:  2020-01-15       Impact factor: 10.121

9.  Preparing Medical Imaging Data for Machine Learning.

Authors:  Martin J Willemink; Wojciech A Koszek; Cailin Hardell; Jie Wu; Dominik Fleischmann; Hugh Harvey; Les R Folio; Ronald M Summers; Daniel L Rubin; Matthew P Lungren
Journal:  Radiology       Date:  2020-02-18       Impact factor: 11.105

10.  BioBERT: a pre-trained biomedical language representation model for biomedical text mining.

Authors:  Jinhyuk Lee; Wonjin Yoon; Sungdong Kim; Donghyeon Kim; Sunkyu Kim; Chan Ho So; Jaewoo Kang
Journal:  Bioinformatics       Date:  2020-02-15       Impact factor: 6.937


Beijing Coyote Bioscience Co., Ltd. © 2022-2023.