GT-Finder: Classify the family of glucose transporters with pre-trained BERT language models.

Syed Muazzam Ali Shah, Semmy Wellem Taju, Quang-Thai Ho, Trinh-Trung-Duong Nguyen, Yu-Yen Ou

Abstract

Recently, language representation models have drawn a lot of attention in the field of natural language processing (NLP) due to their remarkable results. Among them, BERT (Bidirectional Encoder Representations from Transformers) has proven to be a simple, yet powerful language model that has achieved novel state-of-the-art performance. BERT adopted the concept of contextualized word embeddings to capture the semantics and context in which words appear. We utilized pre-trained BERT models to extract features from protein sequences for discriminating three families of glucose transporters: the major facilitator superfamily of glucose transporters (GLUTs), the sodium-glucose linked transporters (SGLTs), and the sugars will eventually be exported transporters (SWEETs). We treated protein sequences as sentences and transformed them into fixed-length meaningful vectors where a 768- or 1024-dimensional vector represents each amino acid. We observed that BERT-Base and BERT-Large models improved the performance by more than 4% in terms of average sensitivity and Matthews correlation coefficient (MCC), indicating the efficiency of this approach. We also developed a bidirectional transformer-based protein model (TransportersBERT) for comparison with existing pre-trained BERT models.
Copyright © 2021. Published by Elsevier Ltd.
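
The feature-extraction step the abstract describes (each protein sequence treated as a sentence, each amino acid mapped to a 768- or 1024-dimensional BERT embedding, then reduced to a fixed-length vector) can be sketched as below. This is a minimal illustration, not the authors' code: the mean-pooling step and the `embed_residues` helper are assumptions, and the per-residue embeddings are simulated with random values in place of a real pre-trained BERT model.

```python
import numpy as np

def embed_residues(sequence: str, dim: int = 1024) -> np.ndarray:
    """Placeholder for a pre-trained BERT encoder: returns one
    `dim`-dimensional vector per amino acid. A real pipeline would run
    the sequence through BERT-Large (hidden size 1024) or BERT-Base
    (hidden size 768) and take the per-token hidden states."""
    rng = np.random.default_rng(len(sequence))  # deterministic stand-in
    return rng.standard_normal((len(sequence), dim))

def sequence_vector(sequence: str, dim: int = 1024) -> np.ndarray:
    """Collapse variable-length per-residue embeddings into a single
    fixed-length vector by mean pooling over residues."""
    per_residue = embed_residues(sequence, dim)  # shape (L, dim)
    return per_residue.mean(axis=0)              # shape (dim,)

# Sequences of different lengths map to vectors of identical size,
# suitable as input to a downstream transporter-family classifier
# (GLUT vs. SGLT vs. SWEET).
v1 = sequence_vector("MKTAYIAKQR")  # 10 residues
v2 = sequence_vector("MSLWQ")       # 5 residues
print(v1.shape, v2.shape)           # (1024,) (1024,)
```

Mean pooling is one common way to get a fixed-length representation; the key point from the abstract is only that variable-length sequences become fixed-length feature vectors whose width matches the BERT hidden size (768 for Base, 1024 for Large).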

Keywords:  BERT; Bidirectional encoder representations from transformers; Contextualized word embedding; Feature importance; Glucose transporter

Year:  2021        PMID: 33581474     DOI: 10.1016/j.compbiomed.2021.104259

Source DB:  PubMed          Journal:  Comput Biol Med        ISSN: 0010-4825            Impact factor:   4.589


  4 in total

1.  ProtTrans-Glutar: Incorporating Features From Pre-trained Transformer-Based Models for Predicting Glutarylation Sites.

Authors:  Fatma Indriani; Kunti Robiatul Mahmudah; Bedy Purnama; Kenji Satou
Journal:  Front Genet       Date:  2022-05-31       Impact factor: 4.772

2.  Development and multicenter validation of chest X-ray radiography interpretations based on natural language processing.

Authors:  Yaping Zhang; Mingqian Liu; Shundong Hu; Yao Shen; Jun Lan; Beibei Jiang; Geertruida H de Bock; Rozemarijn Vliegenthart; Xu Chen; Xueqian Xie
Journal:  Commun Med (Lond)       Date:  2021-10-28

3.  BERT-PPII: The Polyproline Type II Helix Structure Prediction Model Based on BERT and Multichannel CNN.

Authors:  Chuang Feng; Zhen Wang; Guokun Li; Xiaohan Yang; Nannan Wu; Lei Wang
Journal:  Biomed Res Int       Date:  2022-08-24       Impact factor: 3.246

4.  ISTRF: Identification of sucrose transporter using random forest.

Authors:  Dong Chen; Sai Li; Yu Chen
Journal:  Front Genet       Date:  2022-09-12       Impact factor: 4.772
