
Accurate classification of protein structural families using coherent subgraph analysis.

J Huan, W Wang, A Washington, J Prins, R Shah, A Tropsha.

Abstract

Protein structural annotation and classification is an important problem in bioinformatics. We report on the development of an efficient subgraph mining technique and its application to finding characteristic substructural patterns within protein structural families. In our method, protein structures are represented by graphs in which the nodes are residues and the edges connect residues found within a certain distance of each other. Applying subgraph mining to proteins is challenging for a number of reasons: (1) protein graphs are large and complex, (2) current protein databases are large and continue to grow rapidly, and (3) only a small fraction of the frequent subgraphs among the huge pool of all possible subgraphs could be significant in the context of protein classification. To address these challenges, we have developed an information-theoretic model called coherent subgraph mining. In information theory, the entropy of a random variable X measures the information content carried by X, and the mutual information (MI) between two random variables X and Y measures the correlation between X and Y. We define a subgraph X as coherent if it is strongly correlated with every sufficiently large sub-subgraph Y embedded in it. Based on the MI metric, we have designed a search scheme that reports only coherent subgraphs. To determine the significance of coherent protein subgraphs, we conducted an experimental study in which all coherent subgraphs were identified in several protein structural families annotated in the SCOP database (Murzin et al., 1995). The support vector machine (SVM) algorithm was used to classify proteins from different families under a binary classification scheme. We find that this approach identifies spatial motifs unique to individual SCOP families and affords excellent discrimination between families.
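The two computational ideas in the abstract, a residue contact graph and an MI-based coherence test, can be illustrated with a small sketch. It is not the authors' implementation and rests on several assumptions: each residue is given as one 3D coordinate (for example a Cα atom), the 8.5 Å cutoff is a hypothetical placeholder, X and Y are treated as binary indicators of whether a pattern occurs in a randomly chosen protein graph, and the occurrence sets are assumed to come from a separate subgraph-matching step that is not shown. The function names and the coherence threshold are hypothetical.

```python
import math
from itertools import combinations


def contact_graph(coords, cutoff=8.5):
    """Residue contact graph: node i is residue i, and edge (i, j) exists when
    the Euclidean distance between the two residues is <= cutoff.
    The 8.5 Angstrom default is a hypothetical choice, not the paper's value."""
    edges = set()
    for i, j in combinations(range(len(coords)), 2):
        if math.dist(coords[i], coords[j]) <= cutoff:
            edges.add((i, j))
    return edges


def entropy(probs):
    """Shannon entropy in bits of a discrete distribution (zero terms skipped)."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


def mutual_information(occ_x, occ_y, n_graphs):
    """MI between binary indicators X ('pattern x occurs in a graph') and
    Y ('pattern y occurs'), estimated from occurrence sets of graph ids
    drawn from range(n_graphs)."""
    # Counts for the four joint outcomes of (X, Y).
    both = len(occ_x & occ_y)
    x_only = len(occ_x - occ_y)
    y_only = len(occ_y - occ_x)
    neither = n_graphs - both - x_only - y_only
    joint = [c / n_graphs for c in (both, x_only, y_only, neither)]
    px = len(occ_x) / n_graphs
    py = len(occ_y) / n_graphs
    # I(X;Y) = H(X) + H(Y) - H(X,Y)
    return entropy([px, 1 - px]) + entropy([py, 1 - py]) - entropy(joint)


def is_coherent(occ_x, sub_occurrences, n_graphs, threshold):
    """Hypothetical coherence test: pattern X is coherent if its MI with every
    sufficiently large sub-pattern Y (given as occurrence sets) is at least
    `threshold`."""
    return all(
        mutual_information(occ_x, occ_y, n_graphs) >= threshold
        for occ_y in sub_occurrences
    )


if __name__ == "__main__":
    coords = [(0, 0, 0), (3.8, 0, 0), (7.6, 0, 0), (20, 0, 0)]
    print(sorted(contact_graph(coords)))
    # A sub-pattern that co-occurs exactly with X carries 1 bit about it;
    # one that occurs in every graph carries none.
    print(mutual_information({0, 1}, {0, 1}, 4))
    print(is_coherent({0, 1}, [{0, 1}, {0, 1, 2, 3}], 4, threshold=0.5))
```

Because every graph containing X also contains each of its sub-patterns Y, a Y that occurs in almost every graph carries little information about X. Under this reading, an MI threshold filters out patterns that are merely frequent combinations of ubiquitous fragments, which matches the abstract's point that only a small fraction of frequent subgraphs could be significant.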


Year:  2004        PMID: 14992521     DOI: 10.1142/9789812704856_0039

Source DB:  PubMed          Journal:  Pac Symp Biocomput        ISSN: 2335-6928


  6 in total

1.  Characterizing the regularity of tetrahedral packing motifs in protein tertiary structure.

Authors:  Ryan Day; Kristin P Lennox; David B Dahl; Marina Vannucci; Jerry W Tsai
Journal:  Bioinformatics       Date:  2010-11-02       Impact factor: 6.937

2.  GPD: a graph pattern diffusion kernel for accurate graph classification with applications in cheminformatics.

Authors:  Aaron Smalter; Jun Luke Huan; Yi Jia; Gerald Lushington
Journal:  IEEE/ACM Trans Comput Biol Bioinform       Date:  2010 Apr-Jun       Impact factor: 3.710

3.  GPM: A Graph Pattern Matching Kernel with Diffusion for Chemical Compound Classification.

Authors:  Aaron Smalter; Jun Huan; Gerald Lushington
Journal:  Proc IEEE Int Symp Bioinformatics Bioeng       Date:  2008-12-08

4.  Discrimination of thermophilic and mesophilic proteins.

Authors:  Todd J Taylor; Iosif I Vaisman
Journal:  BMC Struct Biol       Date:  2010-05-17

5.  STRALCP--structure alignment-based clustering of proteins.

Authors:  Adam Zemla; Brian Geisbrecht; Jason Smith; Marisa Lam; Bonnie Kirkpatrick; Mark Wagner; Tom Slezak; Carol Ecale Zhou
Journal:  Nucleic Acids Res       Date:  2007-11-26       Impact factor: 16.971

6.  Neuronal Graphs: A Graph Theory Primer for Microscopic, Functional Networks of Neurons Recorded by Calcium Imaging. (Review)

Authors:  Carl J Nelson; Stephen Bonner
Journal:  Front Neural Circuits       Date:  2021-06-10       Impact factor: 3.492

