
Hierarchical learning architecture with automatic feature selection for multiclass protein fold classification.

Chuen-Der Huang, Chin-Teng Lin, Nikhil Ranjan Pal.

Abstract

The structure classification of proteins plays a very important role in bioinformatics, since the relationships and characteristics among known proteins can be exploited to predict the structure of new proteins. The success of a classification system depends heavily on two things: the tools being used and the features considered. In bioinformatics applications, the role of appropriate features has not received adequate attention. In this investigation we use three novel ideas for multiclass protein fold classification. First, we use the gating neural network, where each input node is associated with a gate. This network can select important features in an online manner as learning proceeds. At the beginning of training, all gates are almost closed, i.e., no feature is allowed to enter the network. During training, gates corresponding to good features are completely opened while gates corresponding to bad features are closed more tightly, and some gates may remain partially open. The second novel idea is to use a hierarchical learning architecture (HLA). The classifier in the first level of HLA classifies the protein features into four major classes: all alpha, all beta, alpha + beta, and alpha/beta. In the next level we have another set of classifiers, which further classify the protein features into 27 folds. The third novel idea is to induce the indirect coding features from the amino-acid composition sequence of proteins based on the N-gram concept. This provides us with more representative and discriminative new local features of protein sequences for multiclass protein fold classification. The proposed HLA with new indirect coding features increases the protein fold classification accuracy by about 12%. Moreover, the gating neural network is found to reduce the number of features drastically.
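The N-gram idea mentioned above can be illustrated with a minimal sketch. It assumes the simplest reading of the concept: normalized counts of overlapping residue n-grams over the 20 standard amino acids. The function name `ngram_features`, the alphabet ordering, and the example sequence are illustrative assumptions; the abstract does not specify how the indirect coding features are actually constructed.

```python
from collections import Counter
from itertools import product

# The 20 standard amino acids in one-letter code (ordering is an assumption).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def ngram_features(sequence, n=2):
    """Return normalized n-gram frequencies as a fixed-length feature vector.

    Hypothetical helper: the vector has 20**n entries, one per possible
    n-gram, ordered as produced by itertools.product over AMINO_ACIDS.
    """
    sequence = sequence.upper()
    windows = [sequence[i:i + n] for i in range(len(sequence) - n + 1)]
    # Skip windows that contain non-standard residue symbols.
    valid = [w for w in windows if all(c in AMINO_ACIDS for c in w)]
    counts = Counter(valid)
    total = len(valid)
    keys = ["".join(p) for p in product(AMINO_ACIDS, repeat=n)]
    return [counts[k] / total if total else 0.0 for k in keys]


# Made-up toy sequence, not taken from any dataset.
features = ngram_features("MKVLAAGLK", n=2)
print(len(features))  # 400 possible bigrams
```

With n = 2 the vector has 400 entries regardless of sequence length, which is what makes such features usable as fixed-size classifier inputs.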
Using only half of the original features, as selected by the gating neural network, achieves test accuracy comparable to that obtained with all the original features. The gating mechanism also helps us gain better insight into the folding process of proteins. For example, by tracking the evolution of different gates we can find which characteristics (features) of the data are more important for the folding process. And, of course, it also reduces the computation time.
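The gating behaviour described in the abstract (gates start almost closed, open for useful features, and close further for useless ones) can be sketched on a toy problem. This is a minimal illustration under stated assumptions, not the authors' network: it uses a single logistic output unit, a sigmoid gate per input, full-batch gradient descent on cross-entropy, and a made-up four-sample dataset in which feature 0 determines the label and feature 1 carries no information. The name `train_gated_classifier` and all hyperparameters are hypothetical.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_gated_classifier(samples, labels, epochs=2000, lr=0.5):
    """Train a one-unit classifier whose inputs pass through learnable gates.

    Each input x_i is multiplied by sigmoid(g_i) before the weighted sum.
    Returns the final gate openings, one value in (0, 1) per feature.
    """
    n_features = len(samples[0])
    gate_params = [-3.0] * n_features  # sigmoid(-3) ~ 0.05: almost closed
    weights = [0.1] * n_features       # small nonzero start (assumption)
    bias = 0.0
    for _ in range(epochs):
        grad_g = [0.0] * n_features
        grad_w = [0.0] * n_features
        grad_b = 0.0
        gates = [sigmoid(g) for g in gate_params]
        for x, y in zip(samples, labels):
            z = bias + sum(w * s * xi for w, s, xi in zip(weights, gates, x))
            err = sigmoid(z) - y  # derivative of cross-entropy w.r.t. z
            for i in range(n_features):
                grad_w[i] += err * gates[i] * x[i]
                grad_g[i] += err * weights[i] * x[i] * gates[i] * (1.0 - gates[i])
            grad_b += err
        for i in range(n_features):
            weights[i] -= lr * grad_w[i]
            gate_params[i] -= lr * grad_g[i]
        bias -= lr * grad_b
    return [sigmoid(g) for g in gate_params]


# Toy data: the label depends only on the sign of feature 0.
samples = [(1.0, 1.0), (1.0, -1.0), (-1.0, 1.0), (-1.0, -1.0)]
labels = [1, 1, 0, 0]
gates = train_gated_classifier(samples, labels)
print(gates)
```

After training, the gate for the informative feature ends up well above its starting value while the gate for the uninformative feature drifts below it. Keeping only the features whose gates opened is one simple way to read such a model as a feature selector.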


Year:  2003        PMID: 15376912     DOI: 10.1109/tnb.2003.820284

Source DB:  PubMed          Journal:  IEEE Trans Nanobioscience        ISSN: 1536-1241            Impact factor:   2.935


