
BioSeq-BLM: a platform for analyzing DNA, RNA and protein sequences based on biological language models.

Hong-Liang Li, Yi-He Pang, Bin Liu.

Abstract

To uncover the meaning of the 'book of life', this study discusses 155 different biological language models (BLMs) for DNA, RNA and protein sequence analysis, which can extract the linguistic properties of the 'book of life'. We also extend the BLMs into a system called BioSeq-BLM for automatically representing and analyzing sequence data. Experimental results show that the predictors generated by BioSeq-BLM achieve comparable or even clearly better performance than the existing state-of-the-art predictors published in the literature, indicating that BioSeq-BLM will provide new approaches for biological sequence analysis based on natural language processing technologies and contribute to the development of this very important field. To help readers use BioSeq-BLM in their own experiments, the corresponding web server and stand-alone package have been established and released, and can be freely accessed at http://bliulab.net/BioSeq-BLM/.
© The Author(s) 2021. Published by Oxford University Press on behalf of Nucleic Acids Research.

Year:  2021        PMID: 34581805      PMCID: PMC8682797          DOI: 10.1093/nar/gkab829

Source DB:  PubMed          Journal:  Nucleic Acids Res        ISSN: 0305-1048            Impact factor:   16.971


  20 in total

1.  BioAutoML: automated feature engineering and metalearning to predict noncoding RNAs in bacteria.

Authors:  Robson P Bonidia; Anderson P Avila Santos; Breno L S de Almeida; Peter F Stadler; Ulisses N da Rocha; Danilo S Sanches; André C P L F de Carvalho
Journal:  Brief Bioinform       Date:  2022-07-18       Impact factor: 13.994

2.  TACOS: a novel approach for accurate prediction of cell-specific long noncoding RNAs subcellular localization.

Authors:  Young-Jun Jeon; Md Mehedi Hasan; Hyun Woo Park; Ki Wook Lee; Balachandran Manavalan
Journal:  Brief Bioinform       Date:  2022-07-18       Impact factor: 13.994

3.  iPro-WAEL: a comprehensive and robust framework for identifying promoters in multiple species.

Authors:  Pengyu Zhang; Hongming Zhang; Hao Wu
Journal:  Nucleic Acids Res       Date:  2022-10-14       Impact factor: 19.160

4.  Deepm5C: A deep-learning-based hybrid framework for identifying human RNA N5-methylcytosine sites using a stacking strategy.

Authors:  Md Mehedi Hasan; Sho Tsukiyama; Jae Youl Cho; Hiroyuki Kurata; Md Ashad Alam; Xiaowen Liu; Balachandran Manavalan; Hong-Wen Deng
Journal:  Mol Ther       Date:  2022-05-06       Impact factor: 12.910

5.  Gene-Based Testing of Interactions Using XGBoost in Genome-Wide Association Studies.

Authors:  Yingjie Guo; Chenxi Wu; Zhian Yuan; Yansu Wang; Zhen Liang; Yang Wang; Yi Zhang; Lei Xu
Journal:  Front Cell Dev Biol       Date:  2021-12-16

6.  Immunoglobulin Classification Based on FC* and GC* Features.

Authors:  Hao Wan; Jina Zhang; Yijie Ding; Hetian Wang; Geng Tian
Journal:  Front Genet       Date:  2022-01-24       Impact factor: 4.599

7.  A SNARE Protein Identification Method Based on iLearnPlus to Efficiently Solve the Data Imbalance Problem.

Authors:  Dong Ma; Zhihua Chen; Zhanpeng He; Xueqin Huang
Journal:  Front Genet       Date:  2022-01-28       Impact factor: 4.599

8.  An Effective Hypoxia-Related Long Non-Coding RNA Assessment Model for Prognosis of Lung Adenocarcinoma.

Authors:  Yuanshuai Li; Xiaofang Sun
Journal:  Front Genet       Date:  2022-03-16       Impact factor: 4.599

9.  VTP-Identifier: Vesicular Transport Proteins Identification Based on PSSM Profiles and XGBoost.

Authors:  Yue Gong; Benzhi Dong; Zixiao Zhang; Yixiao Zhai; Bo Gao; Tianjiao Zhang; Jingyu Zhang
Journal:  Front Genet       Date:  2022-01-03       Impact factor: 4.599

10.  BigFiRSt: A Software Program Using Big Data Technique for Mining Simple Sequence Repeats From Large-Scale Sequencing Data.

Authors:  Jinxiang Chen; Fuyi Li; Miao Wang; Junlong Li; Tatiana T Marquez-Lago; André Leier; Jerico Revote; Shuqin Li; Quanzhong Liu; Jiangning Song
Journal:  Front Big Data       Date:  2022-01-18