ALBERT-Based Self-Ensemble Model With Semisupervised Learning and Data Augmentation for Clinical Semantic Textual Similarity Calculation: Algorithm Validation Study.

Junyi Li, Xuejie Zhang, Xiaobing Zhou.

Abstract

BACKGROUND: In recent years, with increases in the amount of information available and the importance of information screening, increased attention has been paid to the calculation of textual semantic similarity. In the field of medicine, electronic medical records and medical research documents have become important data resources for clinical research. Medical textual semantic similarity calculation has become an urgent problem to be solved.
OBJECTIVE: This research aims to solve 2 problems: (1) medical data sets are small, so models learn and understand the data insufficiently, and (2) information is lost during long-distance propagation, so models fail to grasp key information.
METHODS: This paper combines a text data augmentation method and a self-ensemble ALBERT model under semisupervised learning to perform clinical textual semantic similarity calculations.
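
The abstract does not say how the self-ensemble is formed. As a minimal sketch, assuming the ensemble averages the similarity scores predicted by several checkpoints of one model (the function name `self_ensemble` and the toy scores are hypothetical, not taken from the paper):

```python
from statistics import fmean
from typing import List, Sequence


def self_ensemble(checkpoint_scores: Sequence[Sequence[float]]) -> List[float]:
    """Average per-pair similarity scores across checkpoints of one model.

    checkpoint_scores[i][j] is the score that checkpoint i assigns to
    sentence pair j. Averaging predictions is an assumption made for this
    illustration; the abstract does not describe the ensembling rule.
    """
    if not checkpoint_scores:
        raise ValueError("need at least one checkpoint")
    n_pairs = len(checkpoint_scores[0])
    if any(len(scores) != n_pairs for scores in checkpoint_scores):
        raise ValueError("every checkpoint must score the same pairs")
    # zip(*...) regroups the scores so each tuple holds one pair's scores.
    return [fmean(pair_scores) for pair_scores in zip(*checkpoint_scores)]


# Toy scores from three hypothetical checkpoints for three sentence pairs.
scores = [
    [4.0, 1.0, 2.5],
    [5.0, 0.0, 3.5],
    [4.5, 0.5, 3.0],
]
ensembled = self_ensemble(scores)
```

Averaging predictions is only one way to build a self-ensemble; averaging model weights across checkpoints is another common choice.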
RESULTS: Compared with the methods submitted to the 2019 National Natural Language Processing Clinical Challenges/Open Health Natural Language Processing (n2c2/OHNLP) shared task track on Clinical Semantic Textual Similarity, our method surpasses the best result by 2 percentage points and achieves a Pearson correlation coefficient of 0.92.
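
The reported metric is the Pearson correlation coefficient between predicted and gold similarity scores. A minimal sketch of how that metric can be computed is shown here; the function name `pearson_r` and the toy scores are illustrative assumptions, not data from the study:

```python
import math
from typing import Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Sum of products of deviations (the covariance numerator).
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Toy example: hypothetical gold and predicted similarity scores.
gold = [0.0, 1.5, 3.0, 4.0, 5.0]
predicted = [0.4, 1.2, 2.7, 4.3, 4.6]
r = pearson_r(gold, predicted)
```

A value near 1 means the predicted scores rise and fall with the gold scores almost linearly, which is what a coefficient of 0.92 indicates.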
CONCLUSIONS: When a medical data set is small, data augmentation can increase its size, and improved semisupervised learning can boost the learning efficiency of the model. Additionally, self-ensemble methods improve model performance. Our method achieved excellent performance and has great potential to help with related medical problems.

©Junyi Li, Xuejie Zhang, Xiaobing Zhou. Originally published in JMIR Medical Informatics (http://medinform.jmir.org), 22.01.2021.

Keywords:  ALBERT; algorithm; clinical semantic textual similarity; data augmentation; data sets; model; self-ensemble; semantic; semisupervised

Year:  2021        PMID: 33480858      PMCID: PMC7864778          DOI: 10.2196/23086

Source DB:  PubMed          Journal:  JMIR Med Inform


