ProFET: Feature engineering captures high-level protein functions.

Dan Ofer, Michal Linial.

Abstract

MOTIVATION: The number of sequenced genomes and proteins is growing at an unprecedented pace. Unfortunately, manual curation and functional knowledge lag behind. Homology-based inference often fails at labeling proteins with diverse functions and broad classes. Thus, identifying high-level protein functionality remains challenging. We hypothesize that a universal feature engineering approach, combined with machine learning, can classify high-level functions and unified properties without requiring external databases or alignment.
RESULTS: In this study, we present a novel bioinformatics toolkit called ProFET (Protein Feature Engineering Toolkit). ProFET extracts hundreds of features covering elementary biophysical and sequence-derived attributes. Most features capture statistically informative patterns. In addition, different representations of sequences and of the amino acid alphabet provide a compact, compressed set of features. The results from ProFET were incorporated in data analysis pipelines, implemented in Python and adapted for multi-genome scale analysis. ProFET was applied to 17 established and novel protein benchmark datasets involving a variety of binary and multi-class classification tasks. The results show state-of-the-art performance. The extracted features show excellent biological interpretability. The success of ProFET applies to a wide range of high-level functions such as subcellular localization, structural classes and proteins with unique functional properties (e.g. neuropeptide precursors, thermophilic proteins and nucleic acid binding proteins). ProFET allows easy, universal discovery of new target proteins, as well as understanding of the features underlying different high-level protein functions.
AVAILABILITY AND IMPLEMENTATION: ProFET source code and the datasets used are freely available at https://github.com/ddofer/ProFET.
CONTACT: michall@cc.huji.ac.il
SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
© The Author 2015. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.
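
The abstract describes alignment-free features derived from the sequence alone, including representations over a reduced amino acid alphabet. The following is a minimal illustrative sketch of that general idea, not ProFET's actual code or API: the function names, the four-group reduced alphabet and the feature naming scheme are all assumptions made for this example.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Hypothetical 4-letter reduced alphabet (illustrative grouping only,
# not the alphabets used by ProFET).
REDUCED_GROUPS = {
    "h": "AVLIMFWC",  # hydrophobic
    "p": "STNQGPYH",  # polar / small
    "+": "KR",        # positively charged
    "-": "DE",        # negatively charged
}
REDUCED_MAP = {aa: g for g, aas in REDUCED_GROUPS.items() for aa in aas}


def reduce_sequence(seq):
    """Translate a protein sequence into the reduced alphabet ('x' = unknown)."""
    return "".join(REDUCED_MAP.get(aa, "x") for aa in seq.upper())


def composition(seq, alphabet):
    """Fraction of each alphabet letter in the sequence."""
    counts = Counter(seq)
    total = max(len(seq), 1)  # avoid division by zero on empty input
    return {letter: counts.get(letter, 0) / total for letter in alphabet}


def kmer_frequencies(seq, k):
    """Relative frequency of each overlapping k-mer that occurs in the sequence."""
    kmers = [seq[i:i + k] for i in range(len(seq) - k + 1)]
    total = max(len(kmers), 1)
    return {kmer: n / total for kmer, n in Counter(kmers).items()}


def extract_features(seq):
    """Build a flat feature dict from sequence-only, alignment-free descriptors."""
    seq = seq.upper()
    reduced = reduce_sequence(seq)
    features = {"length": len(seq)}
    # Full 20-letter amino acid composition.
    for aa, frac in composition(seq, AMINO_ACIDS).items():
        features[f"aa_{aa}"] = frac
    # Composition over the compact reduced alphabet.
    for g, frac in composition(reduced, REDUCED_GROUPS).items():
        features[f"red_{g}"] = frac
    # Dipeptide frequencies over the reduced alphabet (at most 16 regular pairs).
    for kmer, frac in kmer_frequencies(reduced, 2).items():
        features[f"red2_{kmer}"] = frac
    return features


if __name__ == "__main__":
    feats = extract_features("MKKDE")
    print(reduce_sequence("MKKDE"), feats["aa_K"], feats["red_+"])
```

A feature dictionary of this kind can be turned into a fixed-length vector per protein and passed to any standard classifier. The reduced alphabet keeps the k-mer feature space small, which is one way to obtain the compact representation the abstract mentions.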

Year:  2015        PMID: 26130574     DOI: 10.1093/bioinformatics/btv345

Source DB:  PubMed          Journal:  Bioinformatics        ISSN: 1367-4803            Impact factor:   6.937


  18 in total

Review 1.  Predictive analytics in mental health: applications, guidelines, challenges and perspectives.

Authors:  T Hahn; A A Nierenberg; S Whitfield-Gabrieli
Journal:  Mol Psychiatry       Date:  2016-11-15       Impact factor: 15.992

2.  NetGO: improving large-scale protein function prediction with massive network information.

Authors:  Ronghui You; Shuwei Yao; Yi Xiong; Xiaodi Huang; Fengzhu Sun; Hiroshi Mamitsuka; Shanfeng Zhu
Journal:  Nucleic Acids Res       Date:  2019-07-02       Impact factor: 16.971

3.  Learned protein embeddings for machine learning.

Authors:  Kevin K Yang; Zachary Wu; Claire N Bedbrook; Frances H Arnold
Journal:  Bioinformatics       Date:  2018-08-01       Impact factor: 6.937

4.  A Multi-Label Supervised Topic Model Conditioned on Arbitrary Features for Gene Function Prediction.

Authors:  Lin Liu; Lin Tang; Xin Jin; Wei Zhou
Journal:  Genes (Basel)       Date:  2019-01-17       Impact factor: 4.096

5.  Quantifying gene selection in cancer through protein functional alteration bias.

Authors:  Nadav Brandes; Nathan Linial; Michal Linial
Journal:  Nucleic Acids Res       Date:  2019-07-26       Impact factor: 16.971

6.  MathFeature: feature extraction package for DNA, RNA and protein sequences based on mathematical descriptors.

Authors:  Robson P Bonidia; Douglas S Domingues; Danilo S Sanches; André C P L F de Carvalho
Journal:  Brief Bioinform       Date:  2022-01-17       Impact factor: 11.622

7.  Cluster learning-assisted directed evolution.

Authors:  Yuchi Qiu; Jian Hu; Guo-Wei Wei
Journal:  Nat Comput Sci       Date:  2021-12-09

8.  FFPred 3: feature-based function prediction for all Gene Ontology domains.

Authors:  Domenico Cozzetto; Federico Minneci; Hannah Currant; David T Jones
Journal:  Sci Rep       Date:  2016-08-26       Impact factor: 4.379

9.  iFeatureOmega: an integrative platform for engineering, visualization and analysis of features from molecular sequences, structural and ligand data sets.

Authors:  Zhen Chen; Xuhan Liu; Pei Zhao; Chen Li; Yanan Wang; Fuyi Li; Tatsuya Akutsu; Chris Bain; Robin B Gasser; Junzhou Li; Zuoren Yang; Xin Gao; Lukasz Kurgan; Jiangning Song
Journal:  Nucleic Acids Res       Date:  2022-05-07       Impact factor: 19.160

Review 10.  Recent applications of deep learning and machine intelligence on in silico drug discovery: methods, tools and databases.

Authors:  Ahmet Sureyya Rifaioglu; Heval Atas; Maria Jesus Martin; Rengul Cetin-Atalay; Volkan Atalay; Tunca Doğan
Journal:  Brief Bioinform       Date:  2019-09-27       Impact factor: 11.622
