Saritha Namboodiri1, Kripadas K. 1. Department of Computer Science, V.T.Bhattathiripad College, University of Calicut, Thenhipalam, Kerala 673635 India.
Abstract
SpecP is an open-source Python module that performs Spectral Partitioning on Protein Contact Graphs. Protein Contact Graphs are graph theory based representation of the protein structure, where each amino acid forms a 'vertex' and spatial contact of any two amino acids is an 'edge' between them. Spectral partitioning is carried out in SpecP based on the second smallest spectral value (eigen value) of the Protein Contact Graph. The eigen vector corresponding to the second smallest spectral value are partitioned into two clusters based on the sign of the corresponding vector entry. Spectral Partitioning algorithm is repeatedly carried out until the desired numbers of partitions are obtained. SpecP visualizes the spectrally partitioned clusters of protein structure along with the Protein Contact Map and Protein Contact Graph which can be saved for later use. It also possesses an interactive mode whereby the user has the ability to zoom, pan, resize and save these raster images in various image formats (.eps, .jpg, .png) manually. SpecP is a stand-alone extensible tool useful for structural analysis of proteins.
SpecP is an open-source Python module that performs Spectral Partitioning on Protein Contact Graphs. Protein Contact Graphs are graph theory based representation of the protein structure, where each amino acid forms a 'vertex' and spatial contact of any two amino acids is an 'edge' between them. Spectral partitioning is carried out in SpecP based on the second smallest spectral value (eigen value) of the Protein Contact Graph. The eigen vector corresponding to the second smallest spectral value are partitioned into two clusters based on the sign of the corresponding vector entry. Spectral Partitioning algorithm is repeatedly carried out until the desired numbers of partitions are obtained. SpecP visualizes the spectrally partitioned clusters of protein structure along with the Protein Contact Map and Protein Contact Graph which can be saved for later use. It also possesses an interactive mode whereby the user has the ability to zoom, pan, resize and save these raster images in various image formats (.eps, .jpg, .png) manually. SpecP is a stand-alone extensible tool useful for structural analysis of proteins.
Spectral partitioning is a graph partition algorithm which
partitions data represented in the form of a graph G = (V,E),
with V vertices and E edges, into smaller components with
specific properties [1]. Spectral partitioning has gained
momentum in recent times due to its simplicity and better
performance. They have been successfully applied in protein
science [2].Proteins are linear, ordered chain of amino acids that fold by
virtue of chemical forces to form a 3D structure. Coarse grain
models of proteins using graph theory are spawned to gain
insight into the structures of proteins. Proteins are depicted as
graph with the amino acids as nodes and the positional
information of the Csub>αsub> atoms that form the backbone of protein
structure, as edge connectivity or ‘contact’. This graph-theory
based network forms the Protein Contact Graph [3]. The Protein
Contact Graph of the first 10 nodes for the protein (PDB id
4q21) is as shown in (Figure 1). Protein Contact Maps are a
reduced representation of Protein Contact Graph [4], providing
a quick way of visually inspecting structural features. A contact
map or adjacency matrix is a square matrix M where M=1 if the
distance between Cα atoms of residues i and j is below cut-off
threshold atomic distance, or M=0 otherwise. The cut-off
atomic threshold distance is > 8 Ǻ for long-range contact
network, between 4Ǻ and 8Ǻ (inclusive) for medium-range
contact network and < 4 Ǻ for short-range contact networks.
Figure 1
Protein Contact Network (first 10 nodes) of the protein
(PDB id 4q21).
The Spectral Partitioning algorithm is applied on the Protein
Contact Map obtained from Protein Contact Graph. The
algorithm considers the eigen vectors (Fiedler vector) of the
second smallest eigen value that yields a lower bound on the
optimal cost of ratio-cut partition and bisects the graph into two
disjoint sets based on the sign of the corresponding vector entry
[5]. The algorithm is repeated until desired numbers of
partitions are obtained. SpecP can generate Spectral Partitions,
Protein Contact Graph and Protein Contact Map that can be
visualized and the saved for later use.
Methodology
Computing the adjacency matrix is the first step in Spectral
partitioning. The adjacency matrix for the first 10 amino acids of
the protein with PDB id 4q21 is given in (Figure 2).
Figure 2
Representation of Adjacency matrix (A), Diagnol
matrix (D), and Laplacian matrix (L).
To compute the Laplacian matrix from the adjacency matrix, the
degree matrix must be obtained. Degree matrix is a diagonal
matrix that holds the degree of each vertex of a graph. All the
elements of a degree matrix are 0 except for the diagonal
elements. The degree of each vertex is computed by summing
up each row (that correspond to a vertex) of the adjacency
matrix and placing it in the diagonal element of that row. The
degree matrix of the above adjacency matrix is given in
(Figure 2).Spectral partitioning which takes the Laplacian matrix, is worked
out as L=D-A, where D is diagonal degree matrix and A is the
adjacency matrix.Eigen values and eigenvectors of the Laplacian matrix are
computed. Spectral partitioning makes use of the spectral value,
its corresponding eigen vectors contain all significant
topological information about the graph. The eigen values and
corresponding eigen vector of the Laplacian matrix is as shown
in Table 1 (see supplementary material).The Fiedler vector bisects the graph into two partitions based
on the sign of the corresponding vector entry. By repeatedly
applying Spectral partitioning algorithm, the desired numbers of
partitions can be obtained,
Software Input
The user provides the PDB (Protein Data Bank) file that can
either be uploaded from the http://www.rcsb.org (PDB) site or
from the local disk. The clustering is performed on activating
the ‘Spectral Partitioning’ button. The user is prompted to select
the threshold atomic distance before partitioning. The ‘Cluster
Again’button will result in partitioning the selected partition.
This can be continued until the desired number of partition is
obtained. The Graphical user interface of SpecP tool is specified
in (Figure 3).
Figure 3
Graphical user interface of SpecP
Software Output
Figure 4 (A), 4(B), & 4(C), depicts the screenshot of the first two
Spectral Partitions of the protein (PDB id 4q21) generated from
SpecP tool along with its Protein Contact Network and Protein
Contact Map.
Figure 4
A) Two clusters obtained by spectral partitioning; B) The Protein Contact Map; C) Protein Contact Network of protein
(PDB id 4q21).
The ‘Cluster Again’ option spectrally partitions the selected
partition. (Figure 5), displays the SpecP tool producing 2, 3, 4, 5
partitions of the protein (PDB id4Q21).
Figure 5
Spectral Partitioning of the protein (PDB id4Q21) into 2, 3, 4, 5 partitions
Caveat & Future development
SpecP is a stand-alone package developed in Python 2.6 on
Windows platform. GUI was designed using Tkinter. Numpy
and Scipy modules were used for scientific computing, Igraph,
Networkx modules were added to create and manipulate
graphs, and to represent and manipulate complex network
structures respectively. MatplotLib and Pylab were imported to
plot data and generate output in different formats.Future version will be a web-based application capable of
performing spectral partitioning on the basis of surrounding
hydrophobicity and build to compute clustering coefficient,
cyclic coefficient, characteristic path length, associative
coefficient and triangle density from the of the Protein Contact
Network generated.