Smoothing 3D protein structure motifs through graph mining and amino acid similarities.

Wajdi Dhifli, Rabie Saidi, Engelbert Mephu Nguifo.

Abstract

One of the most powerful techniques to study proteins is to look for recurrent fragments (also called substructures), then use them as patterns to characterize the proteins under study. Although protein sequences have been extensively studied in the literature, studying protein three-dimensional (3D) structures can reveal relevant structural and functional information that may not be derived from protein sequences alone. An emergent trend consists of parsing protein 3D structures into graphs of amino acids. Hence, the search for recurrent substructures is formulated as a process of frequent subgraph discovery where each subgraph represents a 3D motif. In this scope, several efficient approaches for frequent 3D motif discovery have been proposed in the literature. However, the set of discovered 3D motifs is too large to be efficiently analyzed and explored in any further process. In this article, we propose a novel pattern selection approach that shrinks the large number of frequent 3D motifs by selecting a subset of representative ones. Existing pattern selection approaches do not exploit domain knowledge. In contrast, our approach incorporates the evolutionary information of amino acids encoded in substitution matrices in order to select the representative 3D motifs. We show the effectiveness of our approach on a number of real datasets. Our experimental results show that considering the substitution between amino acids allows our approach to detect many similarities between patterns that are ignored by current subgraph selection approaches, and that it is able to considerably decrease the number of 3D motifs while enhancing their interestingness.
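
The selection idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' algorithm: it assumes that motifs are already aligned node by node (so no subgraph isomorphism test is needed), that two motifs are "similar" when they share the same edge set and every aligned pair of amino acids scores at or above a threshold, and that a greedy, frequency-ordered pass picks the representatives. The names `TOY_SCORES`, `score`, `similar`, and `select_representatives` are hypothetical, and the scores are toy values in the style of a substitution matrix rather than a published one.

```python
# Toy substitution scores (hypothetical values, treated as symmetric).
# A real analysis would load a published matrix such as BLOSUM62 instead.
TOY_SCORES = {
    ("L", "I"): 2, ("L", "V"): 1, ("I", "V"): 3,
    ("D", "E"): 2, ("K", "R"): 2, ("L", "D"): -4,
}


def score(a, b):
    """Substitution score between two amino acid labels."""
    if a == b:
        return 4  # assumed flat self-score, for illustration only
    # Look up both orderings; unknown pairs get an assumed penalty of -4.
    return TOY_SCORES.get((a, b), TOY_SCORES.get((b, a), -4))


def similar(m1, m2, threshold=1):
    """True if both motifs have the same edges and substitutable labels."""
    labels1, edges1 = m1
    labels2, edges2 = m2
    if edges1 != edges2 or len(labels1) != len(labels2):
        return False
    return all(score(a, b) >= threshold for a, b in zip(labels1, labels2))


def select_representatives(motifs, threshold=1):
    """Greedy pass, most frequent first; skip motifs similar to a kept one."""
    kept = []
    for motif, freq in sorted(motifs, key=lambda mf: -mf[1]):
        if not any(similar(motif, rep, threshold) for rep, _ in kept):
            kept.append((motif, freq))
    return kept


# Each motif: ((node labels in aligned order, edge set), frequency).
triangle = frozenset({(0, 1), (1, 2), (0, 2)})
path = frozenset({(0, 1), (1, 2)})
motifs = [
    ((("L", "D", "K"), triangle), 40),
    ((("I", "E", "R"), triangle), 25),  # L~I, D~E, K~R: absorbed by the first
    ((("L", "D", "K"), path), 30),      # different topology: kept
    ((("L", "L", "K"), triangle), 10),  # D vs L scores below threshold: kept
]
representatives = select_representatives(motifs)
```

In this toy run, the second motif is dropped because each of its residues is substitutable for the aligned residue of the most frequent motif, which is the kind of similarity that an exact-label comparison would miss. Raising `threshold` makes the selection stricter and keeps more motifs.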

Year:  2013        PMID: 24117330     DOI: 10.1089/cmb.2013.0092

Source DB:  PubMed          Journal:  J Comput Biol        ISSN: 1066-5277            Impact factor:   1.479


  3 in total

1.  Mining the entire Protein DataBank for frequent spatially cohesive amino acid patterns.

Authors:  Pieter Meysman; Cheng Zhou; Boris Cule; Bart Goethals; Kris Laukens
Journal:  BioData Min       Date:  2015-01-31       Impact factor: 2.522

2.  ProtNN: fast and accurate protein 3D-structure classification in structural and topological space.

Authors:  Wajdi Dhifli; Abdoulaye Baniré Diallo
Journal:  BioData Min       Date:  2016-09-23       Impact factor: 2.522

3.  Grasping frequent subgraph mining for bioinformatics applications. (Review)

Authors:  Aida Mrzic; Pieter Meysman; Wout Bittremieux; Pieter Moris; Boris Cule; Bart Goethals; Kris Laukens
Journal:  BioData Min       Date:  2018-09-03       Impact factor: 2.522
