MetBERT: a generalizable and pre-trained deep learning model for the prediction of metastatic cancer from clinical notes.

Ke Liu, Omkar Kulkarni, Martin Witteveen-Lane, Bin Chen, Dave Chesla.

Abstract

Distant metastasis is the major cause of cancer-related deaths; however, early diagnosis of cancer metastasis remains a significant challenge. Recent advances in pre-trained natural language processing models, coupled with the accumulation of publicly available electronic health record (EHR) data, provide an unprecedented opportunity to tackle this challenge computationally. Here, we fine-tuned multiple state-of-the-art BERT-based models using discharge summaries from the open MIMIC-III dataset and derived MetBERT, a novel model tailored to predict cancer metastasis from clinical notes. MetBERT achieved high performance (AUC=0.94) on our in-house validation dataset, suggesting high generalizability. In addition, MetBERT made it possible to determine the date of cancer metastasis from the rich information in clinical notes and could therefore potentially be deployed as a tool for early diagnosis. Finally, we interpreted MetBERT at different scales and revealed a possible association between radiation therapy and metastasis risk in multiple cancer types. ©2022 AMIA - All rights reserved.

Year:  2022        PMID: 35854741      PMCID: PMC9285138     

Source DB:  PubMed          Journal:  AMIA Annu Symp Proc        ISSN: 1559-4076


  6 in total

1.  Semi-supervised learning of the electronic health record for phenotype stratification.

Authors:  Brett K Beaulieu-Jones; Casey S Greene
Journal:  J Biomed Inform       Date:  2016-10-12       Impact factor: 6.317

2.  Emerging Biological Principles of Metastasis. (Review)

Authors:  Arthur W Lambert; Diwakar R Pattabiraman; Robert A Weinberg
Journal:  Cell       Date:  2017-02-09       Impact factor: 41.582

3.  Comparing deep learning and concept extraction based methods for patient phenotyping from clinical narratives.

Authors:  Sebastian Gehrmann; Franck Dernoncourt; Yeran Li; Eric T Carlson; Joy T Wu; Jonathan Welt; John Foote; Edward T Moseley; David W Grant; Patrick D Tyler; Leo A Celi
Journal:  PLoS One       Date:  2018-02-15       Impact factor: 3.240

4.  Combining deep learning with token selection for patient phenotyping from electronic health records.

Authors:  Zhen Yang; Matthias Dehmer; Olli Yli-Harja; Frank Emmert-Streib
Journal:  Sci Rep       Date:  2020-01-29       Impact factor: 4.379

5.  MIMIC-III, a freely accessible critical care database.

Authors:  Alistair E W Johnson; Tom J Pollard; Lu Shen; Li-Wei H Lehman; Mengling Feng; Mohammad Ghassemi; Benjamin Moody; Peter Szolovits; Leo Anthony Celi; Roger G Mark
Journal:  Sci Data       Date:  2016-05-24       Impact factor: 6.444

6.  BioBERT: a pre-trained biomedical language representation model for biomedical text mining.

Authors:  Jinhyuk Lee; Wonjin Yoon; Sungdong Kim; Donghyeon Kim; Sunkyu Kim; Chan Ho So; Jaewoo Kang
Journal:  Bioinformatics       Date:  2020-02-15       Impact factor: 6.937
