
SpecP: A tool for spectral partitioning of protein contact graph.

Saritha Namboodiri, Kripadas K.

Abstract

SpecP is an open-source Python module that performs spectral partitioning on Protein Contact Graphs. Protein Contact Graphs are graph-theory-based representations of protein structure, in which each amino acid forms a 'vertex' and a spatial contact between any two amino acids is an 'edge' between them. SpecP carries out spectral partitioning based on the second-smallest spectral value (eigenvalue) of the Protein Contact Graph. The vertices are divided into two clusters according to the sign of their entries in the eigenvector corresponding to this eigenvalue. The spectral partitioning algorithm is applied repeatedly until the desired number of partitions is obtained. SpecP visualizes the spectrally partitioned clusters of the protein structure along with the Protein Contact Map and Protein Contact Graph, and these can be saved for later use. It also has an interactive mode in which the user can zoom, pan, resize and manually save these images in various formats (.eps, .jpg, .png). SpecP is a stand-alone, extensible tool useful for the structural analysis of proteins.


Year:  2013        PMID: 23861573      PMCID: PMC3705632          DOI: 10.6026/97320630009545

Source DB:  PubMed          Journal:  Bioinformation        ISSN: 0973-2063


Background

Spectral partitioning is a graph partitioning algorithm that divides data represented as a graph G = (V, E), with vertex set V and edge set E, into smaller components with specific properties [1]. Spectral partitioning has gained momentum in recent times owing to its simplicity and good performance, and it has been successfully applied in protein science [2]. Proteins are linear, ordered chains of amino acids that fold by virtue of chemical forces into a 3D structure. Coarse-grained, graph-theoretic models of proteins are used to gain insight into protein structure. In these models the protein is depicted as a graph in which the amino acids are nodes. The positions of the Cα atoms, which form the backbone of the protein structure, determine the edges, or ‘contacts’. This graph-theory-based network is the Protein Contact Graph [3]. The Protein Contact Graph of the first 10 nodes of the protein with PDB id 4q21 is shown in Figure 1.

Protein Contact Maps are a reduced representation of the Protein Contact Graph [4], providing a quick way of visually inspecting structural features. A contact map, or adjacency matrix, is a square matrix M in which $M_{ij} = 1$ if the distance between the Cα atoms of residues i and j is below a cut-off threshold distance, and $M_{ij} = 0$ otherwise. The cut-off threshold distance is > 8 Å for long-range contact networks, between 4 Å and 8 Å (inclusive) for medium-range contact networks, and < 4 Å for short-range contact networks.
Figure 1

Protein Contact Network (first 10 nodes) of the protein (PDB id 4q21).
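The contact map defined above can be sketched in a few lines of NumPy. The function name `contact_map` is an assumption made for illustration. So are the 8 Å default cut-off and the choice to zero the diagonal, so that a residue is not counted as its own contact. None of these details is taken from SpecP.

```python
import numpy as np

def contact_map(ca_coords, cutoff=8.0):
    """Binary contact map from an (n, 3) array of C-alpha coordinates.

    M[i, j] = 1 when the C-alpha atoms of residues i and j are closer than
    `cutoff` angstroms, otherwise 0. The diagonal is set to 0 (assumed
    convention: a residue is not counted as its own contact).
    """
    coords = np.asarray(ca_coords, dtype=float)
    # Pairwise differences by broadcasting: (n, 1, 3) - (1, n, 3) -> (n, n, 3).
    diff = coords[:, None, :] - coords[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))   # (n, n) Euclidean distances
    M = (dist < cutoff).astype(int)            # strict "below the cut-off"
    np.fill_diagonal(M, 0)
    return M

# Four hypothetical residues on a line; the last one is far from the others.
print(contact_map([[0, 0, 0], [3.8, 0, 0], [7.6, 0, 0], [20, 0, 0]]))
```

Because distance is symmetric, the resulting matrix is symmetric and can serve directly as the adjacency matrix A used in the Methodology.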

The spectral partitioning algorithm is applied to the Protein Contact Map obtained from the Protein Contact Graph. The algorithm uses the eigenvector of the second-smallest eigenvalue of the graph Laplacian, known as the Fiedler vector. This eigenvalue yields a lower bound on the optimal cost of a ratio-cut partition. The graph is bisected into two disjoint sets according to the sign of each entry of the Fiedler vector [5]. The algorithm is repeated until the desired number of partitions is obtained. SpecP can generate spectral partitions, Protein Contact Graphs and Protein Contact Maps, which can be visualized and then saved for later use.
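The lower-bound claim can be made concrete with the standard ratio-cut argument, added here for illustration and not taken from the paper. Let the n vertices be split into sets U and W, and let e(U, W) be the number of edges between them. Take the ratio-cut cost to be e(U, W) / (|U| |W|), and let L = D - A be the Laplacian described in the Methodology. The derivation uses a two-valued test vector x, the fact that the all-ones vector is always an eigenvector of L with eigenvalue 0, and the Rayleigh-quotient characterization of the second-smallest eigenvalue:

```latex
\begin{align*}
x_i &= \begin{cases} |W|, & i \in U \\ -|U|, & i \in W \end{cases}
  \qquad\text{so}\qquad \textstyle\sum_i x_i = |U|\,|W| - |W|\,|U| = 0, \\
x^{\top} L x &= \sum_{(i,j) \in E} (x_i - x_j)^2 = e(U,W)\,(|U| + |W|)^2 = e(U,W)\,n^2, \\
x^{\top} x &= |U|\,|W|^2 + |W|\,|U|^2 = |U|\,|W|\,n, \\
\lambda_2 &= \min_{y \perp \mathbf{1},\ y \neq 0} \frac{y^{\top} L y}{y^{\top} y}
  \le \frac{x^{\top} L x}{x^{\top} x} = \frac{e(U,W)\,n}{|U|\,|W|}
  \;\Longrightarrow\; \frac{e(U,W)}{|U|\,|W|} \ge \frac{\lambda_2}{n}.
\end{align*}
```

Relaxing x to any real vector orthogonal to the all-ones vector gives the Fiedler vector as the minimizer. Rounding it back to two sets by sign is a heuristic, so the resulting bisection is not guaranteed to achieve the optimal ratio cut.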

Methodology

Computing the adjacency matrix is the first step in spectral partitioning. The adjacency matrix for the first 10 amino acids of the protein with PDB id 4q21 is shown in Figure 2.
Figure 2

Representation of the adjacency matrix (A), diagonal degree matrix (D) and Laplacian matrix (L).

To compute the Laplacian matrix from the adjacency matrix, the degree matrix must first be obtained. The degree matrix is a diagonal matrix that holds the degree of each vertex of the graph, so all of its elements are 0 except the diagonal elements. The degree of each vertex is computed by summing the corresponding row of the adjacency matrix and placing the sum in the diagonal element of that row. The degree matrix of the adjacency matrix above is shown in Figure 2. Spectral partitioning operates on the Laplacian matrix, computed as $L = D - A$, where D is the diagonal degree matrix and A is the adjacency matrix. The eigenvalues and eigenvectors of the Laplacian matrix are then computed. The spectrum (the set of eigenvalues) and the corresponding eigenvectors encode important topological information about the graph. The eigenvalues and corresponding eigenvectors of the Laplacian matrix are shown in Table 1 (see supplementary material). The Fiedler vector bisects the graph into two partitions based on the sign of each vector entry. By repeatedly applying the spectral partitioning algorithm, the desired number of partitions can be obtained.
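The steps described above can be sketched with NumPy: build the degree matrix, form the Laplacian $L = D - A$, compute its eigendecomposition and split the vertices by the sign of the Fiedler vector. The function name `fiedler_bisect` is an assumption, as is the convention of placing zero entries on the non-negative side. The sketch also assumes an undirected, connected graph. For a disconnected graph the second-smallest eigenvalue is 0 and its eigenvector need not be unique.

```python
import numpy as np

def fiedler_bisect(A):
    """Bisect an undirected, connected graph by the sign of its Fiedler vector.

    Returns (lambda_2, positive_side, negative_side); the sides are arrays
    of vertex indices.
    """
    A = np.asarray(A, dtype=float)
    D = np.diag(A.sum(axis=1))   # degree matrix: row sums placed on the diagonal
    L = D - A                    # Laplacian matrix L = D - A
    # eigh is for symmetric matrices; it returns eigenvalues in ascending
    # order, with the matching eigenvectors as the columns of `vecs`.
    vals, vecs = np.linalg.eigh(L)
    fiedler = vecs[:, 1]         # eigenvector of the second-smallest eigenvalue
    positive = np.flatnonzero(fiedler >= 0)  # zero entries go here (assumed convention)
    negative = np.flatnonzero(fiedler < 0)
    return vals[1], positive, negative

# Toy example: two triangles {0, 1, 2} and {3, 4, 5} joined by the edge 2-3.
A = np.zeros((6, 6))
for i, j in [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]:
    A[i, j] = A[j, i] = 1
lam2, side_a, side_b = fiedler_bisect(A)
print(round(lam2, 4), side_a, side_b)
```

For this toy graph the sign split recovers the two triangles. Which triangle lands on the positive side can vary, because the overall sign of an eigenvector is arbitrary.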

Software Input

The user provides a PDB (Protein Data Bank) file, either retrieved from the PDB site (http://www.rcsb.org) or loaded from the local disk. Clustering is performed when the ‘Spectral Partitioning’ button is activated. Before partitioning, the user is prompted to select the threshold atomic distance. The ‘Cluster Again’ button partitions the selected partition further, and this can be repeated until the desired number of partitions is obtained. The graphical user interface of SpecP is shown in Figure 3.
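Because the input is a PDB file, the Cα coordinates needed for a contact map can be read from its fixed-column ATOM records. In these records, x, y and z occupy columns 31-38, 39-46 and 47-54. Several details of this sketch are assumptions: the function name `read_ca_coords`, the optional chain filter, and the decisions to keep only the first alternate location and the first model. The paper does not describe SpecP's own parser.

```python
def read_ca_coords(pdb_text, chain=None):
    """Collect C-alpha (x, y, z) coordinates from PDB-format text.

    Uses the fixed-column ATOM record layout: atom name in columns 13-16,
    alternate location in column 17, chain ID in column 22 and x, y, z in
    columns 31-38, 39-46 and 47-54 (1-based, inclusive).
    """
    coords = []
    for line in pdb_text.splitlines():
        if line.startswith("ENDMDL"):
            break                          # keep only the first model (assumption)
        if not line.startswith("ATOM  "):
            continue                       # skips HETATM (e.g. calcium ions), headers, etc.
        if line[12:16].strip() != "CA":
            continue
        if line[16] not in (" ", "A"):
            continue                       # keep only the first alternate location
        if chain is not None and line[21] != chain:
            continue
        coords.append((float(line[30:38]), float(line[38:46]), float(line[46:54])))
    return coords

# Typical use (the file name is hypothetical):
# with open("4q21.pdb") as fh:
#     coords = read_ca_coords(fh.read(), chain="A")
```

The returned coordinate list can then be turned into a contact map using the threshold distance the user selects.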
Figure 3

Graphical user interface of SpecP

Software Output

Figures 4(A), 4(B) and 4(C) show screenshots of the first two spectral partitions of the protein (PDB id 4q21) generated by SpecP, together with its Protein Contact Network and Protein Contact Map.
Figure 4

A) Two clusters obtained by spectral partitioning; B) The Protein Contact Map; C) Protein Contact Network of protein (PDB id 4q21).

The ‘Cluster Again’ option spectrally partitions the selected partition. Figure 5 shows SpecP producing 2, 3, 4 and 5 partitions of the protein (PDB id 4q21).
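The ‘Cluster Again’ workflow amounts to repeated spectral bisection of a chosen partition. The sketch below is a hypothetical reconstruction, and several of its details are assumptions rather than documented SpecP behavior:

- the names `spectral_bisect` and `spectral_partitions`;
- the `select` callback, which stands in for the user's choice;
- the default rule of splitting the largest partition;
- the use of the Laplacian of the selected partition's induced subgraph.

```python
import numpy as np

def spectral_bisect(A, nodes):
    """Split `nodes` in two by the sign of the Fiedler vector of the
    Laplacian of the subgraph they induce in adjacency matrix A."""
    sub = A[np.ix_(nodes, nodes)]              # induced subgraph
    L = np.diag(sub.sum(axis=1)) - sub         # L = D - A on the subgraph
    _, vecs = np.linalg.eigh(L)                # eigenvalues in ascending order
    fiedler = vecs[:, 1]
    pos = [n for n, v in zip(nodes, fiedler) if v >= 0]
    neg = [n for n, v in zip(nodes, fiedler) if v < 0]
    if not pos or not neg:
        raise ValueError("subgraph could not be bisected (is it connected?)")
    return pos, neg

def spectral_partitions(A, k, select=None):
    """Repeat spectral bisection until `k` partitions exist.

    `select(parts)` returns the index of the partition to split next,
    standing in for the user's choice before 'Cluster Again'. By default
    (a hypothetical rule) the largest partition is split.
    """
    A = np.asarray(A, dtype=float)
    parts = [list(range(A.shape[0]))]
    while len(parts) < k:
        if select is not None:
            idx = select(parts)
        else:
            idx = max(range(len(parts)), key=lambda i: len(parts[i]))
        target = parts.pop(idx)
        if len(target) < 2:
            raise ValueError("cannot split a partition with fewer than 2 vertices")
        parts.extend(spectral_bisect(A, target))
    return parts
```

Splitting the largest partition by default keeps the loop deterministic without user input. Passing a `select` function reproduces an interactive choice of which partition to cluster again.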
Figure 5

Spectral partitioning of the protein (PDB id 4q21) into 2, 3, 4 and 5 partitions.

Caveat & Future development

SpecP is a stand-alone package developed in Python 2.6 on the Windows platform. The GUI was designed using Tkinter. The NumPy and SciPy modules were used for scientific computing. The igraph module was used to create and manipulate graphs, and the NetworkX module to represent and manipulate complex network structures. Matplotlib and Pylab were used to plot data and generate output in different formats. A future version will be a web-based application capable of performing spectral partitioning on the basis of surrounding hydrophobicity. It will also compute the clustering coefficient, cyclic coefficient, characteristic path length, assortativity coefficient and triangle density of the generated Protein Contact Network.
References

1. M E J Newman. Finding community structure in networks using the eigenvectors of matrices. Phys Rev E Stat Nonlin Soft Matter Phys. 2006-09-11.

2. Hui Kian Ho, Michael J Kuiper, Ramamohanarao Kotagiri. PConPy - a Python module for generating 2D protein maps. Bioinformatics. 2008-10-31.

3. L Di Paola, M De Ruvo, P Paci, D Santoni, A Giuliani. Protein contact networks: an emerging paradigm in chemistry. Chem Rev. 2012-11-27.

