The stacking strategy-based hybrid framework for identifying non-coding RNAs.

Xin Wang, Yang Yang, Jian Liu, Guohua Wang.

Abstract

With the development of next-generation sequencing technology, a large number of transcripts need to be analyzed, and distinguishing non-coding ribonucleic acids (ncRNAs) from coding RNAs remains a challenge. For non-model organisms, many existing methods cannot identify ncRNAs because transcriptional data are lacking. We therefore propose a hybrid framework based on the stacking strategy to identify ncRNAs. In addition to deoxyribonucleic acid-based and RNA-based features, we add eight novel features based on predicted peptides. The framework is a two-layer stacking classifier that combines random forest (RF), LightGBM, XGBoost and logistic regression (LR) models. We used this framework to build two types of models. We tested the cross-species ncRNA identification model on six species: human, mouse, zebrafish, fruit fly, worm and Arabidopsis. Compared with other tools, our model performed best on the Arabidopsis, worm and zebrafish datasets, with accuracies of 98.36%, 99.65% and 94.12%, respectively. For the performance metrics analysis, the datasets of the six species were pooled into a single set, on which our model achieved the best sensitivity, accuracy, precision and F1 values. For the plant-specific ncRNA identification model, the average values of the six metrics across the two experiments were all greater than 95%, which demonstrates that the model can be used to identify ncRNAs in plants. These results indicate that the hybrid framework applies to both animals and plants and has significant advantages in cross-species ncRNA identification.
© The Author(s) 2021. Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oup.com.

Keywords:  machine learning; ncRNAs identification; predicted peptide; stacking strategy

Year:  2021        PMID: 33693454     DOI: 10.1093/bib/bbab023

Source DB:  PubMed          Journal:  Brief Bioinform        ISSN: 1467-5463            Impact factor:   11.622


  10 in total

1.  Prediction of Plant Resistance Proteins Based on Pairwise Energy Content and Stacking Framework.

Authors:  Yifan Chen; Zejun Li; Zhiyong Li
Journal:  Front Plant Sci       Date:  2022-05-31       Impact factor: 6.627

2.  SNAREs-SAP: SNARE Proteins Identification With PSSM Profiles.

Authors:  Zixiao Zhang; Yue Gong; Bo Gao; Hongfei Li; Wentao Gao; Yuming Zhao; Benzhi Dong
Journal:  Front Genet       Date:  2021-12-20       Impact factor: 4.599

3.  Pseudo-188D: Phage Protein Prediction Based on a Model of Pseudo-188D.

Authors:  Xiaomei Gu; Lina Guo; Bo Liao; Qinghua Jiang
Journal:  Front Genet       Date:  2021-12-01       Impact factor: 4.599

4.  iAIPs: Identifying Anti-Inflammatory Peptides Using Random Forest.

Authors:  Dongxu Zhao; Zhixia Teng; Yanjuan Li; Dong Chen
Journal:  Front Genet       Date:  2021-11-30       Impact factor: 4.599

5.  KK-DBP: A Multi-Feature Fusion Method for DNA-Binding Protein Identification Based on Random Forest.

Authors:  Yuran Jia; Shan Huang; Tianjiao Zhang
Journal:  Front Genet       Date:  2021-11-29       Impact factor: 4.599

6.  Immunoglobulin Classification Based on FC* and GC* Features.

Authors:  Hao Wan; Jina Zhang; Yijie Ding; Hetian Wang; Geng Tian
Journal:  Front Genet       Date:  2022-01-24       Impact factor: 4.599

7.  A SNARE Protein Identification Method Based on iLearnPlus to Efficiently Solve the Data Imbalance Problem.

Authors:  Dong Ma; Zhihua Chen; Zhanpeng He; Xueqin Huang
Journal:  Front Genet       Date:  2022-01-28       Impact factor: 4.599

8.  MoRF-FUNCpred: Molecular Recognition Feature Function Prediction Based on Multi-Label Learning and Ensemble Learning. (Review)

Authors:  Haozheng Li; Yihe Pang; Bin Liu; Liang Yu
Journal:  Front Pharmacol       Date:  2022-03-08       Impact factor: 5.810

9.  Identifying and Classifying Enhancers by Dinucleotide-Based Auto-Cross Covariance and Attention-Based Bi-LSTM.

Authors:  Shulin Zhao; Qingfeng Pan; Quan Zou; Ying Ju; Lei Shi; Xi Su
Journal:  Comput Math Methods Med       Date:  2022-04-05       Impact factor: 2.238

10.  VTP-Identifier: Vesicular Transport Proteins Identification Based on PSSM Profiles and XGBoost.

Authors:  Yue Gong; Benzhi Dong; Zixiao Zhang; Yixiao Zhai; Bo Gao; Tianjiao Zhang; Jingyu Zhang
Journal:  Front Genet       Date:  2022-01-03       Impact factor: 4.599
