SubFeat: Feature subspacing ensemble classifier for function prediction of DNA, RNA and protein sequences.

H M Fazlul Haque¹, Muhammod Rafsanjani¹, Fariha Arifin¹, Sheikh Adilina¹, Swakkhar Shatabda².

Abstract

A cell's information is primarily stored in deoxyribonucleic acid (DNA). Genetic information flows from DNA to protein sequences via ribonucleic acid (RNA) through transcription and translation. These entities are vital to genetic processes. Recent developments in epigenetics also highlight the importance of genetic material and of knowing its attributes and functions. However, the number of entities with known features or functions is still growing slowly because in vitro experimental methods are time-consuming and expensive. In this paper, we propose an ensemble classification algorithm, SubFeat, to predict the functions of biological entities from different types of datasets. Our model uses a novel ensemble method based on feature subspaces: it divides the feature space into subspaces, each of which is used to train an individual base classifier, and the ensemble combines these base classifiers through a weighted majority voting mechanism. SubFeat was tested on four datasets, comprising two DNA, one RNA and one protein dataset, and it outperformed all the existing single classifiers and ensemble classifiers. SubFeat is available as a Python-based tool, together with a user manual, and is freely accessible at https://github.com/fazlulhaquejony/SubFeat.
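The feature-subspacing idea described above can be illustrated with a minimal sketch. This is not SubFeat's actual implementation. It assumes contiguous, near-equal feature partitions, a toy nearest-centroid base classifier, and per-member vote weights equal to accuracy on a held-out validation set. The names `split_feature_space`, `NearestCentroid` and `SubspaceEnsemble` are hypothetical. The SubFeat package may partition features, choose base learners and compute weights differently.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

Row = Sequence[float]


def split_feature_space(n_features: int, n_subspaces: int) -> List[List[int]]:
    """Partition feature indices 0..n_features-1 into contiguous, near-equal subspaces.

    Assumption for illustration only: SubFeat may split features differently.
    """
    size, extra = divmod(n_features, n_subspaces)
    subspaces, start = [], 0
    for k in range(n_subspaces):
        end = start + size + (1 if k < extra else 0)
        subspaces.append(list(range(start, end)))
        start = end
    return subspaces


def project(row: Row, idx: List[int]) -> List[float]:
    """Keep only the features belonging to one subspace."""
    return [row[i] for i in idx]


class NearestCentroid:
    """Toy base classifier (hypothetical): predicts the class whose mean vector is closest."""

    def fit(self, X: List[Row], y: List[int]) -> "NearestCentroid":
        sums: Dict[int, List[float]] = {}
        counts: Dict[int, int] = defaultdict(int)
        for row, label in zip(X, y):
            acc = sums.setdefault(label, [0.0] * len(row))
            for j, v in enumerate(row):
                acc[j] += v
            counts[label] += 1
        self.centroids = {c: [s / counts[c] for s in sums[c]] for c in sums}
        return self

    def predict_one(self, row: Row) -> int:
        # Squared Euclidean distance; ties go to the class seen first in training.
        return min(self.centroids,
                   key=lambda c: sum((a - b) ** 2 for a, b in zip(row, self.centroids[c])))


class SubspaceEnsemble:
    """Hypothetical feature-subspace ensemble combined by weighted majority voting."""

    def __init__(self, n_subspaces: int) -> None:
        self.n_subspaces = n_subspaces
        self.members: List[Tuple[List[int], NearestCentroid, float]] = []

    def fit(self, X: List[Row], y: List[int],
            X_val: List[Row], y_val: List[int]) -> "SubspaceEnsemble":
        self.members = []
        for idx in split_feature_space(len(X[0]), self.n_subspaces):
            # One base classifier per subspace, trained only on that subspace's features.
            clf = NearestCentroid().fit([project(r, idx) for r in X], y)
            # Assumed weighting: this member's accuracy on the held-out validation set.
            correct = sum(clf.predict_one(project(r, idx)) == t for r, t in zip(X_val, y_val))
            self.members.append((idx, clf, correct / len(y_val)))
        return self

    def predict_one(self, row: Row) -> int:
        votes: Dict[int, float] = defaultdict(float)
        for idx, clf, weight in self.members:
            votes[clf.predict_one(project(row, idx))] += weight
        # Weighted majority vote: the class with the largest summed weight wins.
        return max(votes, key=votes.get)


if __name__ == "__main__":
    # Toy data: features 0-1 separate the classes, features 2-3 carry no class signal.
    X = [[0, 0, 5, 1], [1, 0, 1, 5], [0, 1, 3, 3],
         [5, 5, 1, 5], [4, 5, 5, 1], [5, 4, 3, 3]]
    y = [0, 0, 0, 1, 1, 1]
    X_val, y_val = [[0, 1, 2, 4], [5, 5, 4, 2]], [0, 1]
    model = SubspaceEnsemble(n_subspaces=2).fit(X, y, X_val, y_val)
    print([w for _, _, w in model.members])  # [1.0, 0.5]
    print(model.predict_one([5, 4, 0, 0]))   # 1
```

In the toy demo, the member trained on the uninformative features 2 and 3 earns a lower weight (0.5 versus 1.0), so it is outvoted whenever the two members disagree. This is the effect weighted voting is meant to have.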

Keywords: Biological entities; Classification; Ensemble classifier; Feature subspacing; Machine learning

Year: 2021 PMID: 33932779 DOI: 10.1016/j.compbiolchem.2021.107489

Source DB: PubMed Journal: Comput Biol Chem ISSN: 1476-9271 Impact factor: 2.877
