FEPS: A Tool for Feature Extraction from Protein Sequence.

Hamid Ismail, Clarence White, Hussam Al-Barakati, Robert H Newman, Dukka B Kc.

Abstract

Machine learning has become one of the most popular choices for developing computational approaches in protein structural bioinformatics. Extracting features from protein sequence or structure is often a crucial step in developing machine learning-based approaches. Over the years, various sequence, structural, and physicochemical descriptors have been developed for proteins, and these descriptors have been used to address a range of bioinformatics problems. Accordingly, several feature extraction tools have been developed to help researchers generate numeric features from protein sequences. Most of these tools are limited in the number of sequences they can handle, and the features they generate require further preprocessing before they can be fed to machine learning methods. Here, we present Feature Extraction from Protein Sequences (FEPS), a versatile software package for generating various descriptors from protein sequences. The number of sequences FEPS can handle is limited only by the available computational resources. In addition, the features extracted by FEPS require no subsequent processing and are ready to be fed to machine learning techniques, because FEPS provides various output formats and can concatenate the generated features. FEPS is freely available both as an online web server and as a stand-alone toolkit. As a comprehensive toolkit for feature extraction, FEPS will help spur the development of machine learning-based models for various bioinformatics problems.
© 2022. The Author(s), under exclusive license to Springer Science+Business Media, LLC, part of Springer Nature.
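
To make the idea of turning sequences into concatenated numeric features concrete, here is a minimal Python sketch. It is not FEPS code and does not reflect the FEPS interface: the function names are hypothetical, and the two descriptors shown (amino acid composition and dipeptide composition) are simply two widely used sequence descriptors chosen for illustration. Residues outside the 20 standard amino acids are assumed to be ignored when counting.

```python
from itertools import product

# The 20 standard amino acids, in a fixed order so feature columns are stable.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
# All 400 ordered residue pairs, in the same fixed order.
DIPEPTIDES = ["".join(pair) for pair in product(AMINO_ACIDS, repeat=2)]


def amino_acid_composition(seq):
    """Fraction of each standard amino acid in the sequence (20 values)."""
    seq = seq.upper()
    if not seq:
        return [0.0] * len(AMINO_ACIDS)
    return [seq.count(aa) / len(seq) for aa in AMINO_ACIDS]


def dipeptide_composition(seq):
    """Fraction of each adjacent residue pair in the sequence (400 values)."""
    seq = seq.upper()
    total = len(seq) - 1  # number of adjacent pairs
    if total < 1:
        return [0.0] * len(DIPEPTIDES)
    counts = dict.fromkeys(DIPEPTIDES, 0)
    for i in range(total):
        pair = seq[i:i + 2]
        if pair in counts:  # assumption: skip pairs with non-standard residues
            counts[pair] += 1
    return [counts[dp] / total for dp in DIPEPTIDES]


def extract_features(sequences):
    """Concatenate both descriptors into one 420-value row per sequence."""
    return [
        amino_acid_composition(s) + dipeptide_composition(s)
        for s in sequences
    ]


if __name__ == "__main__":
    rows = extract_features(["ACAC", "MKVLA"])
    print(len(rows), len(rows[0]))
```

Each row has the same length and column order regardless of the input sequence, which is what allows such a matrix to be passed to a machine learning method without further preprocessing.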

Keywords:  Feature extraction; Machine learning; Posttranslational modifications; Protein descriptors; Sequence-based features

Year: 2022    PMID: 35696075    DOI: 10.1007/978-1-0716-2317-6_3

Source DB: PubMed    Journal: Methods Mol Biol    ISSN: 1064-3745

