Rafsanjani Muhammod1, Sajid Ahmed1, Dewan Md Farid1, Swakkhar Shatabda1, Alok Sharma2,3,4, Abdollah Dehzangi5. 1. Department of Computer Science and Engineering, United International University, Dhaka, Bangladesh. 2. School of Engineering and Physics, University of the South Pacific, Private Mail Bag, Laucala Campus, Suva, Fiji. 3. RIKEN Center for Integrative Medical Sciences, Yokohama, Kanagawa, Japan. 4. Institute for Integrated and Intelligent Systems, Griffith University, Brisbane, Queensland, Australia. 5. Department of Computer Science, Morgan State University, Baltimore, MD, USA.
Abstract
MOTIVATION: Extracting a useful feature set that contains significant discriminatory information is a critical step in effectively representing sequence data to predict the structural, functional, interaction and expression properties of proteins, DNAs and RNAs. In addition, filtering for features with significant information and avoiding sparsity in the extracted features requires efficient feature selection techniques. Here we present PyFeat, a practical and easy-to-use toolkit implemented in Python for extracting various features from proteins, DNAs and RNAs. In building PyFeat we mainly focused on extracting features that capture the interaction of neighboring residues in order to provide more local information. We then employ the AdaBoost technique to select the features with maximum discriminatory information. In this way, we significantly reduce the number of extracted features and enable PyFeat to represent the combination of effective features from a large neighborhood of residues. As a result, PyFeat can extract features using 13 different techniques and represent context-free combinations of effective features. The source code for the standalone PyFeat toolkit and the benchmarks employed, together with a comprehensive user manual explaining its system and workflow step by step, are publicly available. RESULTS: https://github.com/mrzResearchArena/PyFeat/blob/master/RESULTS.md. AVAILABILITY AND IMPLEMENTATION: Toolkit, source code and manual to use PyFeat: https://github.com/mrzResearchArena/PyFeat/. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
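The abstract describes two steps: encoding each sequence with features that capture interactions between neighboring residues, then using AdaBoost to keep only the most discriminative features. The following is a minimal, standard-library-only sketch of that idea under explicit assumptions: (i) the neighborhood features are gapped residue-pair frequencies (gap 0 gives ordinary dinucleotide counts), and (ii) feature selection is a from-scratch AdaBoost with one-level decision stumps, where a feature's importance is the summed weight (alpha) of the stumps that split on it. The names `gapped_pair_features`, `adaboost_feature_ranking` and `select_top_features` are hypothetical and are not PyFeat's API. PyFeat's actual 13 feature techniques and its AdaBoost configuration are documented in the PyFeat repository.

```python
import itertools
import math

DNA = "ACGT"


def gapped_pair_features(seq, gap, alphabet=DNA):
    """Hypothetical encoding: normalized counts of residue pairs (a, b) where b
    sits gap + 1 positions after a. gap=0 gives adjacent 2-mer frequencies."""
    pairs = ["".join(p) for p in itertools.product(alphabet, repeat=2)]
    counts = dict.fromkeys(pairs, 0)
    total = 0
    for i in range(len(seq) - gap - 1):
        pair = seq[i] + seq[i + gap + 1]
        if pair in counts:  # skip pairs containing non-alphabet symbols
            counts[pair] += 1
            total += 1
    return [counts[p] / total if total else 0.0 for p in pairs]


def best_stump(X, y, w):
    """Return (error, feature, threshold, polarity) of the decision stump with
    the lowest weighted error. The stump predicts `polarity` if x >= threshold."""
    best = (float("inf"), 0, 0.0, 1)
    for j in range(len(X[0])):
        for t in sorted({row[j] for row in X}):
            for polarity in (1, -1):
                err = sum(wi for row, yi, wi in zip(X, y, w)
                          if (polarity if row[j] >= t else -polarity) != yi)
                if err < best[0]:
                    best = (err, j, t, polarity)
    return best


def adaboost_feature_ranking(X, y, rounds=10):
    """Discrete AdaBoost with stumps; labels y must be -1 or +1.
    Returns one importance score per feature: the sum of its stumps' alphas."""
    n = len(X)
    w = [1.0 / n] * n
    importance = [0.0] * len(X[0])
    for _ in range(rounds):
        err, j, t, pol = best_stump(X, y, w)
        err = min(max(err, 1e-10), 1 - 1e-10)  # avoid log(0) and division by zero
        alpha = 0.5 * math.log((1 - err) / err)
        importance[j] += alpha
        # Increase the weight of misclassified samples, decrease the rest
        w = [wi * math.exp(-alpha * yi * (pol if row[j] >= t else -pol))
             for row, yi, wi in zip(X, y, w)]
        s = sum(w)
        w = [wi / s for wi in w]
    return importance


def select_top_features(importance, k):
    """Indices of the k features with the highest importance."""
    return sorted(range(len(importance)), key=lambda j: importance[j], reverse=True)[:k]


if __name__ == "__main__":
    # Toy, made-up sequences: "positives" are GC-rich, "negatives" AT-rich
    pos = ["GCGCGCAT", "ATGCGCGC", "GCGCATGC"]
    neg = ["ATATATGC", "TATAGCAT", "AATTAATT"]
    seqs, y = pos + neg, [1] * len(pos) + [-1] * len(neg)
    # Concatenate gapped-pair features for gaps 0, 1 and 2 (48 features in total)
    X = [sum((gapped_pair_features(s, g) for g in range(3)), []) for s in seqs]
    imp = adaboost_feature_ranking(X, y, rounds=5)
    print(select_top_features(imp, 5))
```

Growing the gap range widens the neighborhood each feature can see, but the feature count grows with it; ranking by boosted stump weight is one simple way to keep only the informative columns before training a final classifier.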