
PyFeat: a Python-based effective feature generation tool for DNA, RNA and protein sequences.

Rafsanjani Muhammod, Sajid Ahmed, Dewan Md Farid, Swakkhar Shatabda, Alok Sharma, Abdollah Dehzangi.

Abstract

MOTIVATION: Extracting a useful feature set that contains significant discriminatory information is a critical step in effectively presenting sequence data to predict the structure, function, interaction and expression of proteins, DNAs and RNAs. In addition, filtering for features that carry significant information while avoiding sparsity in the extracted features requires efficient feature selection techniques. Here we present PyFeat, a practical and easy-to-use toolkit implemented in Python for extracting various features from proteins, DNAs and RNAs. In building PyFeat we focused mainly on features that capture information about the interaction of neighboring residues, so as to provide more local information. We then employ the AdaBoost technique to select the features with maximum discriminatory information. In this way, we significantly reduce the number of extracted features and enable PyFeat to represent a combination of effective features from large neighborhoods of residues. As a result, PyFeat is able to extract features using 13 different techniques and to represent a context-free combination of effective features. The source code for the PyFeat standalone toolkit and the benchmarks employed, together with a comprehensive user manual explaining its system and workflow step by step, are publicly available.
RESULTS: https://github.com/mrzResearchArena/PyFeat/blob/master/RESULTS.md.
AVAILABILITY AND IMPLEMENTATION: Toolkit, source code and manual to use PyFeat: https://github.com/mrzResearchArena/PyFeat/.
SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
© The Author(s) 2019. Published by Oxford University Press. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com.
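
As an illustration of the kind of neighboring-residue features the abstract describes, the following is a minimal sketch in Python. It is not PyFeat's actual code or API: the function names (`kmer_features`, `gapped_pair_features`, `feature_vector`) and the `gap` and `max_gap` parameters are hypothetical, and the two descriptors shown (contiguous k-mer frequencies and gapped residue-pair frequencies) are assumed examples rather than a list of the 13 techniques in the toolkit.

```python
from itertools import product

DNA = "ACGT"  # assumed alphabet; pass another string for RNA or protein


def kmer_features(seq, k=2, alphabet=DNA):
    """Normalized frequency of every contiguous k-mer over `alphabet`."""
    seq = seq.upper()
    keys = ["".join(p) for p in product(alphabet, repeat=k)]
    counts = dict.fromkeys(keys, 0)
    total = 0
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if kmer in counts:  # skip windows with symbols outside the alphabet
            counts[kmer] += 1
            total += 1
    return {key: (counts[key] / total if total else 0.0) for key in keys}


def gapped_pair_features(seq, gap=1, alphabet=DNA):
    """Normalized frequency of residue pairs separated by `gap` positions.

    Keys look like 'A_G' for gap=1 and 'A__G' for gap=2.
    """
    seq = seq.upper()
    filler = "_" * gap
    keys = [a + filler + b for a, b in product(alphabet, repeat=2)]
    counts = dict.fromkeys(keys, 0)
    total = 0
    for i in range(len(seq) - gap - 1):
        key = seq[i] + filler + seq[i + gap + 1]
        if key in counts:
            counts[key] += 1
            total += 1
    return {key: (counts[key] / total if total else 0.0) for key in keys}


def feature_vector(seq, k=2, max_gap=2, alphabet=DNA):
    """Concatenate k-mer and gapped-pair features into (names, values)."""
    names, values = [], []
    blocks = [kmer_features(seq, k, alphabet)]
    blocks += [gapped_pair_features(seq, g, alphabet) for g in range(1, max_gap + 1)]
    for block in blocks:
        for name, value in block.items():
            names.append(name)
            values.append(value)
    return names, values


if __name__ == "__main__":
    names, values = feature_vector("ACGTAC", k=2, max_gap=2)
    print(len(names), "features; AC =", values[names.index("AC")])
```

Applied to a set of labeled sequences, vectors like these would form the rows of a feature matrix. A selection step such as the AdaBoost ranking mentioned in the abstract could then keep only the highest-scoring columns, for instance via the `feature_importances_` attribute of scikit-learn's `AdaBoostClassifier`; that choice of tooling is an assumption, not something the abstract states.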

Year:  2019        PMID: 30850831      PMCID: PMC6761934          DOI: 10.1093/bioinformatics/btz165

Source DB:  PubMed          Journal:  Bioinformatics        ISSN: 1367-4803            Impact factor:   6.937

