
PDBparam: Online Resource for Computing Structural Parameters of Proteins.

R Nagarajan¹, A Archana¹, A Mary Thangakani², S Jemimah¹, D Velmurugan², M Michael Gromiha¹.

Abstract

Understanding the structure-function relationship in proteins is a longstanding goal in molecular and computational biology. The development of structure-based parameters has helped to relate the structure with the function of a protein. Although several structural features have been reported in the literature, no single server can calculate a wide-ranging set of structure-based features from protein three-dimensional structures. In this work, we have developed a web-based tool, PDBparam, for computing more than 50 structure-based features for any given protein structure. These features are classified into four major categories: (i) interresidue interactions, which include short-, medium-, and long-range interactions, contact order, long-range order, total contact distance, contact number, and multiple contact index, (ii) secondary structure propensities such as α-helical propensity, β-sheet propensity, and propensity of amino acids to exist at various positions of α-helix and amino acid compositions in high B-value regions, (iii) physicochemical properties containing ionic interactions, hydrogen bond interactions, hydrophobic interactions, disulfide interactions, aromatic interactions, surrounding hydrophobicity, and buriedness, and (iv) identification of binding site residues in protein-protein, protein-nucleic acid, and protein-ligand complexes. The server can be freely accessed at http://www.iitm.ac.in/bioinfo/pdbparam/. We suggest the use of PDBparam as an effective tool for analyzing protein structures.


Keywords: binding sites; physicochemical properties; protein three-dimensional structure; secondary structure propensity

Year: 2016 PMID: 27330281 PMCID: PMC4909059 DOI: 10.4137/BBI.S38423

Source DB: PubMed Journal: Bioinform Biol Insights ISSN: 1177-9322

Introduction

It is widely accepted that the structure of a protein dictates its function.1 Most studies of protein structure and function rely on the analysis of the crystal structure of proteins. This is done by calculating various structure-based parameters, which have been developed to describe the folding, stability, and functions of proteins and their complexes, such as the nature of interactions among the amino acid residues and the surrounding solvent molecules, the preferred amino acid residues in the protein environment, the location of residues in the interior/surface of the protein, and the amino acid clusters.2 These parameters focus on specific aspects of the protein structure and are described in the literature. For instance, Lee and Richards3 developed the concept of solvent accessibility of amino acid residues. Chou and Fasman4 studied the secondary structures of proteins and deduced the propensity of amino acid residues present in α-helices, β-strands, and turns. Thornton’s group developed several algorithms for identifying ion pairs, hydrogen bonds, and catalytic sites in proteins.5–7 Manavalan and Ponnuswamy8 proposed the concept of surrounding hydrophobicity to characterize the hydrophobic behavior of amino acid residues in the protein environment. Plaxco et al.9 analyzed the contacts between amino acid residues and developed the concept of contact order (CO) to relate the folding rates of two-state proteins. Gromiha and Selvaraj10 considered contacts that are close in space but far away in the sequence and proposed long-range order (LRO) as a parameter for understanding protein-folding rates. 
This concept was refined by developing multiple contact index, ie, residues having multiple contacts in two- and three-state proteins.11 Methods are also available to identify binding site residues in protein complexes based on distances between atoms, energetic contributions, and changes in accessible surface area upon binding.12–14 Many standalone programs and online servers (such as DSSP,15 NACCESS,16 HYDROPRO,17 HYDRONMR,18 GETAREA,19 SCide,20 ContPro,21 CAPTURE,22 HBPLUS,23 CALCOM,24 PSAP,25 and SBPS26) are available to calculate various structural parameters. For instance, DSSP15 provides information on the secondary structure and accessible surface area of each amino acid residue in a protein. CALCOM is used to locate residues in the interior and surface based on the distance between the residues and the calculated center of mass of the given protein or peptide chain.24 Tina et al.27 developed a server, protein interactions calculator, to calculate the center of mass, hydrogen bond interactions, hydrophobic interactions, aromatic–aromatic interactions, aromatic–sulfur interactions, and cation–π interactions. Kozma et al.28 developed a server to obtain the contact map for any given protein. Magyar et al.29 utilized the concept of surrounding hydrophobicity, LRO, stabilization center, and conservation scores to identify the stabilizing residues in protein structures. ExPASy30 is a collection of tools on various bioinformatic aspects including proteomics, genomics, structural bioinformatics, and systems biology. PDBsum31 provides pictorial analyses of several structural features of proteins, DNA, and ligands, as well as the interactions between them. Although a number of structural parameters have been described in the literature and can be calculated using various servers and standalone programs, no single server exists to calculate a diverse set of parameters and provide the output in a standard format. 
Hence, we have developed a web server, PDBparam (http://www.iitm.ac.in/bioinfo/pdbparam/), to calculate the following four distinct groups of properties: (i) physicochemical properties, (ii) secondary structure propensities, (iii) interresidue interactions, and (iv) identification of binding site residues in protein–DNA/RNA, protein–ligand, and protein–protein complexes. The server and the properties it calculates are described in the following sections.

Materials and Methods

A brief description of the properties under the four categories (physicochemical properties, secondary structure propensities, interresidue interactions, and binding site residues in protein complexes) is provided in this section.

Interresidue interactions

For the past three decades, studies on the mechanism of protein folding and stability have focused on interresidue interactions.32 Interactions between amino acid residues of the protein and with the surrounding solvent molecules play an important role in the formation of stable secondary structures and a unique tertiary structure for the protein. These interactions are usually noncovalent and include hydrogen bonds, ion pairs, van der Waals interactions, and hydrophobic interactions. In fact, parameters such as CO and LRO show a very strong correlation with the folding rate of small proteins.9,10

Short-, medium-, and long-range interactions

For a given residue, the surrounding residues within a sphere of 8 Å radius are analyzed in terms of their sequence position. Residues within a distance of two residues from the central residue are considered to contribute to short-range interactions, those within a window between three and four residues to medium-range interactions and those more than four residues apart to long-range interactions.
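
The classification above can be sketched in a few lines of Python. This is only an illustration, not the server's Perl code: it assumes one representative Cα coordinate per residue, residues indexed consecutively along the chain, and a hypothetical function name `classify_neighbours`.

```python
import math

def classify_neighbours(ca_coords, center, radius=8.0):
    """Count short-, medium-, and long-range neighbours of one residue.

    ca_coords: list of (x, y, z) tuples, one per residue (assumed C-alpha atoms).
    center:    index of the central residue.
    radius:    sphere radius in angstroms (8 A in the text).
    """
    counts = {"short": 0, "medium": 0, "long": 0}
    for j, xyz in enumerate(ca_coords):
        if j == center:
            continue  # the central residue is not its own neighbour
        if math.dist(ca_coords[center], xyz) > radius:
            continue  # outside the 8 A sphere
        separation = abs(j - center)
        if separation <= 2:
            counts["short"] += 1    # within two residues in sequence
        elif separation <= 4:
            counts["medium"] += 1   # three or four residues apart
        else:
            counts["long"] += 1     # more than four residues apart
    return counts
```

Calling `classify_neighbours(coords, i)` for every index `i` gives the per-residue counts for a whole chain.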

Number of contacts (8/14 Å, Cα/Cβ atoms)

The contacts between amino acid residues in the crystal structure are computed with cutoffs of 8 and 14 Å using Cα or Cβ atoms, as reported widely in literature.32

Contact order

This parameter reflects the relative importance of local and nonlocal contacts to the native structure of a protein.9 It is defined as

$$\mathrm{CO} = \frac{1}{L \cdot N} \sum^{N} \Delta S_{ij}$$

where N is the total number of contacts, ΔS is the sequence separation between two contacting residues i and j, and L is the total number of residues in the protein.
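
As a rough illustration, the relative contact order can be computed as the sum of sequence separations over all contacts, divided by L × N. The sketch below assumes that a contact is any residue pair whose Cα atoms lie within 8 Å and that each pair is counted once. The function name `contact_order` and the `min_sep` parameter are hypothetical, and the server may define contacts differently.

```python
import math

def contact_order(ca_coords, cutoff=8.0, min_sep=1):
    """Relative contact order: sum of sequence separations / (L * N).

    ca_coords: list of (x, y, z) tuples, one per residue (assumed C-alpha atoms).
    cutoff:    contact distance in angstroms (assumption: 8 A).
    min_sep:   smallest sequence separation counted as a contact (assumption: 1).
    """
    length = len(ca_coords)
    n_contacts = 0
    total_separation = 0
    for i in range(length):
        for j in range(i + min_sep, length):
            if math.dist(ca_coords[i], ca_coords[j]) <= cutoff:
                n_contacts += 1
                total_separation += j - i
    if n_contacts == 0:
        return 0.0  # no contacts: report zero rather than divide by zero
    return total_separation / (length * n_contacts)
```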

Long-range order

LRO is derived from long-range contacts (contacts between two residues that are close in space and far in the sequence) in the protein structure.10 It is defined as

$$\mathrm{LRO} = \frac{1}{N} \sum n_{ij}, \qquad n_{ij} = \begin{cases} 1 & \text{if } |i - j| > 12 \\ 0 & \text{otherwise} \end{cases}$$

where i and j are the two contacting residues within a distance of 8 Å, and N represents the total number of residues in the protein.
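
A minimal sketch of the LRO calculation follows. It assumes Cα–Cα distances, a sequence-separation threshold of more than 12 residues (the value used in the original LRO definition), and that each residue pair is counted once. The function name `long_range_order` is hypothetical.

```python
import math

def long_range_order(ca_coords, cutoff=8.0, min_sep=12):
    """LRO: number of long-range contacts divided by the number of residues.

    A pair (i, j) counts when |i - j| > min_sep and the two residues are
    within `cutoff` angstroms of each other (assumed C-alpha atoms).
    """
    n_res = len(ca_coords)
    if n_res == 0:
        return 0.0
    n_long = 0
    for i in range(n_res):
        # j starts at i + min_sep + 1 so that |i - j| is strictly greater than min_sep
        for j in range(i + min_sep + 1, n_res):
            if math.dist(ca_coords[i], ca_coords[j]) <= cutoff:
                n_long += 1
    return n_long / n_res
```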

Total contact distance

A new parameter, total contact distance, was developed by taking the product of CO and LRO. This parameter shows good correlation with the folding rates of proteins.33

Multiple contact index

It considers the distance between amino acid residues in protein structure, residue separation at the sequence level, and the number of residues that have multiple contacts.11 Multiple contact index has been derived separately for two- and three-state proteins. Two-state proteins: Three-state proteins: where nc is the number of contacts for each residue, and r is the distance between the residues i and j.

Propensities

Propensities indicate the preference of amino acid residues for different secondary structures. The propensities listed in PDBparam are given below.

α-Helical, β-strand, and coil tendencies

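The α-helical propensities can be computed by taking into account the frequency of amino acids in these regions:

$$P_{\alpha}(i) = \frac{f_{\alpha}(i)}{f_{t}(i)}$$

where f_α(i) and f_t(i) are the frequencies of occurrence of the ith amino acid residue in α-helical regions and in the protein as a whole, respectively, and i varies from 1 to 20 (the number of amino acid residue types). Similar equations have been used to compute strand and coil propensities.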
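
One common way to turn these frequencies into a propensity is the Chou–Fasman-style ratio of the residue's frequency in helices to its frequency in the whole protein. The sketch below assumes that form, a per-residue secondary structure string in which `H` marks α-helix (as in DSSP output), and a hypothetical function name `helix_propensity`. Strand or coil propensities would follow by changing the state code.

```python
from collections import Counter

def helix_propensity(sequence, sec_struct, helix_code="H"):
    """Propensity of each residue type for a secondary structure state.

    sequence:   one-letter amino acid string.
    sec_struct: per-residue secondary structure string of the same length.
    Returns {residue: (fraction among helical residues) / (fraction overall)}.
    """
    if len(sequence) != len(sec_struct):
        raise ValueError("sequence and sec_struct must have the same length")
    total = Counter(sequence)
    in_helix = Counter(aa for aa, ss in zip(sequence, sec_struct) if ss == helix_code)
    n_total = len(sequence)
    n_helix = sum(in_helix.values())
    if n_helix == 0:
        return {aa: 0.0 for aa in total}  # no helical residues at all
    return {
        aa: (in_helix[aa] / n_helix) / (total[aa] / n_total)
        for aa in total
    }
```

Values above 1 indicate residues that are enriched in the chosen state relative to their overall abundance.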

Frequency of occurrence in β-bends

Certain segments in the polypeptide chain help in bringing distant residues into close proximity during the folding process. For example, β-bends34 allow hydrogen bonds to form between the C=O group of residue i and the NH group of residue (i + 3). The criteria for a β-bend are: (i) the distance between the Cα(i) and Cα(i + 3) atoms should be less than 7 Å, and (ii) the (i + 1)th or (i + 2)th residue is not in an α-helix.

Amino acid compositions in turns

An open turn exists in a protein if the distance between the Cα(i) and Cα(i + 3) carbon atoms is <5.7 Å.35 Turns are usually present where a strand of a β-sheet reverses itself to form the next antiparallel strand, and they keep the helices, β-sheets, and random coils in a compact globular form; they are thus used to predict protein structure.

Normalized frequency of helix

Helical regions are divided into three zones35: the first three residues represent the N-helix, the last three represent the C-helix, and the residues in the middle represent the M-helix. The amino acid frequency in each helical zone divided by the total frequency (in the entire protein) constitutes the normalized frequency.

Propensity to form multiple contact index

The frequency of occurrence of amino acid residues that form multiple contacts (fmc) and in the protein as a whole (ft) is computed.11 The propensity, Pmc, can be calculated as follows:

$$P_{mc}(i) = \frac{f_{mc}(i)}{f_{t}(i)}$$

where i represents each of the 20 amino acid residues.

Amino acid composition in high B-value regions

Temperature factors (ie, B-values) provide a measure of the degree of uncertainty in the position of an atom due to thermal motion and/or positional disorder. Analyzing B-values provides insights into protein flexibility and protein dynamics. The B-values at Cα atoms are normalized and residues with B-values greater than Bmean + 0.5 × Bσ are labeled as high B-value residues.36
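
The thresholding rule can be illustrated as follows. The sketch assumes one B-value per residue (taken at the Cα atom) and uses the population standard deviation. The function name `high_b_residues` is hypothetical, and the server's normalization step is not reproduced here.

```python
from statistics import mean, pstdev

def high_b_residues(b_values, factor=0.5):
    """Return indices of residues whose B-value exceeds mean + factor * sigma.

    b_values: list of C-alpha B-values, one per residue.
    factor:   multiplier of the standard deviation (0.5 in the text).
    """
    b_mean = mean(b_values)
    b_sigma = pstdev(b_values)  # assumption: population standard deviation
    threshold = b_mean + factor * b_sigma
    return [i for i, b in enumerate(b_values) if b > threshold]
```

The amino acid composition of the high B-value region then follows by counting the residue types at the returned indices.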

Physicochemical properties of proteins

Center of mass

The center of mass can be used to define constraints in predicting protein tertiary structures, to assess the global shape of the protein partners in protein–protein complexes, and to measure their distance.24 It is given by

$$X_{COM} = \frac{\sum_{i} m_{i} x_{i}}{\sum_{i} m_{i}}$$

where x is the X coordinate of the atom i and m is the atomic mass. The Y and Z coordinates of the center of mass can be calculated using a corresponding formula.

Radius of gyration

The radius of gyration describes the compactness of the protein. It is calculated as follows:

$$R_{g} = \sqrt{\frac{\sum_{i} m_{i} (x_{i} - COM)^{2}}{\sum_{i} m_{i}}}$$

where m denotes the mass of each atom, COM denotes the center of mass of the protein, and x represents the atomic coordinate.
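
Both quantities can be sketched together, since the radius of gyration is measured about the center of mass. The code assumes the standard mass-weighted definitions: the center of mass is the mass-weighted mean position, and the radius of gyration is the square root of the mass-weighted mean squared distance from it. The function names and the `(mass, (x, y, z))` input layout are hypothetical.

```python
import math

def center_of_mass(atoms):
    """Mass-weighted mean position.

    atoms: list of (mass, (x, y, z)) tuples.
    """
    total_mass = sum(mass for mass, _ in atoms)
    return tuple(
        sum(mass * xyz[k] for mass, xyz in atoms) / total_mass
        for k in range(3)
    )

def radius_of_gyration(atoms):
    """Square root of the mass-weighted mean squared distance from the COM."""
    com = center_of_mass(atoms)
    total_mass = sum(mass for mass, _ in atoms)
    weighted_sq = sum(
        mass * sum((xyz[k] - com[k]) ** 2 for k in range(3))
        for mass, xyz in atoms
    )
    return math.sqrt(weighted_sq / total_mass)
```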

Surrounding hydrophobicity

The sum of hydrophobic indices assigned to the residues that appear within a distance of 8 Å from the central residue8 can be used to characterize the hydrophobic behavior of each amino acid residue in the protein environment. It is defined as

$$H_{p}(i) = \sum_{j=1}^{20} n_{ij} h_{j}$$

where n is the total number of surrounding residues of type j around the ith residue of the protein, and h is the hydrophobicity index (kcal/mol) obtained from thermodynamic transfer experiments.37,38
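
The sum described above can be illustrated with a short sketch. It assumes one representative coordinate per residue, excludes the central residue from its own surroundings, and takes the hydrophobicity scale as a caller-supplied dictionary. The function name `surrounding_hydrophobicity` is hypothetical, and no real index values are embedded.

```python
import math

def surrounding_hydrophobicity(residues, h_index, radius=8.0):
    """Surrounding hydrophobicity of every residue.

    residues: list of (one_letter_code, (x, y, z)) tuples.
    h_index:  dict mapping one-letter code to a hydrophobicity index (kcal/mol),
              supplied by the caller.
    radius:   neighbourhood radius in angstroms (8 A in the text).
    """
    result = []
    for i, (_, xyz_i) in enumerate(residues):
        hp = 0.0
        for j, (aa_j, xyz_j) in enumerate(residues):
            # add the index of each neighbour inside the sphere, skipping self
            if i != j and math.dist(xyz_i, xyz_j) <= radius:
                hp += h_index[aa_j]
        result.append(hp)
    return result
```

Summing the index once per neighbouring residue is equivalent to multiplying each residue type's index by its neighbour count and summing over the 20 types.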

Gain in surrounding hydrophobicity of a residue

For a given amino acid, the increase in surrounding hydrophobicity as the protein transitions from its unfolded state to its native (ie, folded) state represents the enrichment in the hydrophobic property of that residue. To compute the gain in surrounding hydrophobicity39 for each residue in the protein molecule, it is assumed that the fully extended chain conformation is the unfolded reference state. Surrounding hydrophobicity in the unfolded The average gain ratio in surrounding hydrophobicity is given by where H f and H u denote the hydrophobic index of the jth residue in the folded state and unfolded state of the protein, respectively.

Surface hydrophobicity

This is computed from the protein crystal structure by considering the hydrophobic contribution of exposed amino acid residues. Surface hydrophobicity38 is given by

$$H_{s} = \frac{\sum_{i} s_{i} \psi_{i}}{s_{p}}$$

where s is the solvent accessible surface area occupied by the ith residue, ψ is the hydrophobicity value assigned to the residue, and sp is the solvent accessible surface area of the protein.

Hydrophobic accessible area

It is calculated as the solvent accessible surface area of the hydrophobic residues on the protein surface.40 We considered Ala, Val, Leu, Ile, Met, Phe, and Pro as the hydrophobic residues to calculate the hydrophobic accessible area.

Accessible surface area for the native protein

The accessible surface area (ASA) for the native protein is calculated as the sum of the accessible surface area of each residue present in the protein, which is obtained from DSSP.15

Buriedness

The buriedness2 of each residue is calculated as the ratio of number of residues in the interior of the protein and the total number of residues in the protein.

Mean area buried on transfer

The mean area buried on transfer41 is given by the difference in the accessible area in the unfolded and folded states of the protein:

$$\Delta A = A^{0} - \langle A \rangle$$

where A0 and ⟨A⟩ represent the accessible areas in the unfolded and folded states of the protein, respectively.

Distance criteria for noncovalent interactions and disulfide bonds.

NAME OF THE INTERACTION	INTERACTING RESIDUES	DISTANCE CRITERIA
Disulfide	Pair of cysteines	2.2 Å
Ionic interactions	(R, K) with (D, E, H)	6.0 Å
Hydrophobic interactions	A, V, L, I, M, F, W, P, Y	5.0 Å
Hydrogen bond interactions	Donor-acceptor distance cutoff (O and N)	3.5 Å
Hydrogen bond interactions	Donor-acceptor distance cutoff (sulfur)	4.0 Å
Aromatic-aromatic interactions	Pairs of phenyl rings	4.5 to 7.0 Å
Aromatic-sulfur interactions	Sulfur atoms of C, M and the aromatic rings of F, Y, W	5.3 Å
Cation-π interactions	Cationic side chain (Lys or Arg) near an aromatic side chain (Phe, Tyr, or Trp)	6.0 Å

Hydrophobic free energy

The hydrophobic free energy43 is expressed as

$$\Delta G_{hy} = \sum_{i} \sigma_{i} \left[ A_{i}(\mathrm{folded}) - A_{i}(\mathrm{unfolded}) \right]$$

where A(folded) and A(unfolded) represent the accessible surface areas of each atom in the folded and unfolded (extended) states of the protein, respectively. The solvent accessible surface areas of all the atoms in the folded state were computed using the program NACCESS.16 The extended-state ASA values of the atoms were obtained from the literature; they correspond to a Gly–X–Gly sequence (where X is the amino acid) in a typical extended conformation. The atomic solvation parameters σ for the five classes of atoms (namely, carbon, neutral nitrogen and oxygen, charged nitrogen, charged oxygen, and sulfur) are determined by a least-squares fit of the above equation. The σ values are C: 12.02, N/O: −5.86, N+: −19.46, O−: −34.98, and S: 35.51 (in units of cal/(mol·Å²)).43
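
A worked sketch of this calculation is given below. It assumes the usual atomic-solvation form, in which each atom contributes σ × [A(folded) − A(unfolded)], and uses the five σ values quoted in the text. The function name, the atom-class labels, and the input layout are hypothetical, and the ASA values must come from an external program such as NACCESS.

```python
# Atomic solvation parameters quoted in the text, in cal/(mol * A^2).
SIGMA = {
    "C": 12.02,     # carbon
    "N/O": -5.86,   # neutral nitrogen and oxygen
    "N+": -19.46,   # charged nitrogen
    "O-": -34.98,   # charged oxygen
    "S": 35.51,     # sulfur
}

def hydrophobic_free_energy(atoms):
    """Sum sigma * (ASA_folded - ASA_unfolded) over all atoms, in cal/mol.

    atoms: iterable of (atom_class, asa_folded, asa_unfolded) tuples, where
           atom_class is a key of SIGMA and the areas are in square angstroms.
    """
    return sum(
        SIGMA[atom_class] * (asa_folded - asa_unfolded)
        for atom_class, asa_folded, asa_unfolded in atoms
    )
```

For example, a carbon atom that buries 20 Å² on folding contributes 12.02 × (−20) = −240.4 cal/mol under this form.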

Free energy due to disulfide interactions

The free energy due to disulfide interactions is calculated using the formula: where Nss is the number of disulfide bonds in the protein.

Hydrogen bond interactions

Hydrogen bond interactions are classified into the following three main categories: main chain–main chain, main chain–side chain, and side chain–side chain interactions. These interactions are calculated using HBPLUS,23 a hydrogen bond calculation program.

Identification of binding sites in protein–DNA/RNA and protein–protein complexes

Protein–DNA interactions play a key role in many vital processes, including regulation of gene expression, DNA replication and repair, and packaging. The binding sites for a protein–DNA/RNA complex can be identified using the following distance criteria12: an amino acid residue within a protein is designated as a binding site residue if its side chain or backbone atoms are within a cutoff distance (eg, 3.5 Å) from any atom in DNA/RNA.44–46 The binding sites for protein–protein complexes were also computed using the distance criteria between different chains present in the protein.
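
The distance criterion can be sketched as follows. This is an illustration only: it assumes the atoms have already been parsed from the PDB file into simple tuples, treats "within the cutoff" as less than or equal to it, and uses a hypothetical function name `binding_site_residues`. The same routine applies to protein–protein complexes if the second atom list is taken from another chain.

```python
import math

def binding_site_residues(protein_atoms, partner_atoms, cutoff=3.5):
    """Residues with at least one atom within `cutoff` of any partner atom.

    protein_atoms: list of (chain, res_name, res_num, atom_name, (x, y, z)).
    partner_atoms: list of (x, y, z) coordinates of DNA/RNA (or other chain) atoms.
    cutoff:        distance cutoff in angstroms (3.5 A default, as in the text).
    Returns a sorted list of (chain, res_name, res_num) tuples.
    """
    sites = set()
    for chain, res_name, res_num, _atom_name, xyz in protein_atoms:
        key = (chain, res_name, res_num)
        if key in sites:
            continue  # residue already flagged by another of its atoms
        if any(math.dist(xyz, other) <= cutoff for other in partner_atoms):
            sites.add(key)
    return sorted(sites)
```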

Server Description and Implementation

The PDBparam server can calculate more than 50 parameters from the three-dimensional structure of a protein. Each parameter is treated as a separate module, and the scripts are written in Perl. Perl-CGI scripts are used to render the HTML web pages. The PDBparam server works with the PDB file as input and provides the computed results in a single output page. The output can be downloaded as a PDF file. The results for all the parameters were cross-checked manually with several structures of proteins and their complexes. Furthermore, documentation is provided on the website for all the parameters listed in PDBparam. It is linked with other online tools available in the literature. The utility of the server is described with a few examples.

Example 1: Identify the binding site residues in a protein–DNA complex (PDB code: 6CRO) using the distance cutoff of 3.5 Å.

Steps:
1. Enter the PDB code and chain (optional; case sensitive); eg, PDB code: 6CRO.
2. Check “identification of binding site” and submit.
3. In the new page, check protein–DNA/RNA.
4. Give the distance (default cutoff is 3.5 Å).
5. Click on submit.

Figure 1 shows the relevant items to be checked, the required information, and the output. The output contains the residue name, residue number, atom name, and chain name of both protein and DNA, and the distance between the atoms. These residues are identified as binding sites. We have also provided options to display the structure of the complex, highlighting the binding site residues.

Figure 1

Steps to identify the binding sites in a protein–DNA complex.

Example 2: Calculate the CO of the protein, 6CRO (A chain), and the number of contacts for all the residues using Cα atoms within the limit of 8 Å.

Steps:
1. Enter the PDB code and chain (optional; case sensitive).
2. Check “interresidue interactions” and submit.
3. In the new page, check “contact order and number of contacts (8 Å, CA atoms)”.
4. Click on submit.

Figure 2 shows the relevant items for computing the CO and number of contacts and the output. The output displays the CO for the protein and the number of contacts for all the residues with residue name and number. The contacting residues are also shown in the output.

Figure 2

Example to compute the contact order of a protein and the number of contacts for all the amino acid residues in a protein.

Availability of PDBparam

PDBparam is freely available at http://www.iitm.ac.in/bioinfo/pdbparam.

Applications

PDBparam computes various structure-based parameters on interresidue interactions, amino acid propensities, physicochemical properties, and binding sites. This information can be used to understand the structure and functions of proteins and their complexes. The contacts between amino acid residues in protein structures provide data on the location of amino acid residues and preferred contacts in the protein environment, which can be used to comprehend protein folding and predict protein structures.32 The topological parameters, such as CO, LRO, total contact distance, and multiple contact index, are helpful in understanding protein-folding rates and folding kinetics.9–11 Specific physicochemical interactions between amino acid residues in protein structures, such as cation–π interactions, aromatic clusters, and hydrogen bonds, reveal the importance of these interactions in protein stability.27 The combination of secondary structure and solvent accessibility is useful in identifying functionally important residues in proteins.15,16 Furthermore, the identification of binding sites in protein–protein, protein–nucleic acid, and protein–ligand complexes can be effectively used to compute the binding propensity and affinity and to understand the recognition mechanism of protein complexes.46–51 PDBparam can be used to compute important parameters for any specific protein, providing insights into its structure–function relationship. It can also be used for large-scale analysis of different types of proteins to explore potential interactions and contacts, which will provide insights into the similarities and differences crucial to understanding function.

Conclusion

The PDBparam server can calculate more than 50 parameters from the three-dimensional structure of a protein, classified into the following four categories: physicochemical properties, interresidue interactions, secondary structure propensities, and identification of binding sites in protein–DNA/RNA and protein–protein complexes. All the parameters have been coded in Perl, and Perl-CGI scripts are used to render the HTML web pages. Detailed documentation for the protein properties and links to other available web servers related to such properties are provided to make the server easier to use.

42 in total

1. SCide: identification of stabilization centers in proteins.

Authors: Zsuzsanna Dosztányi; Csaba Magyar; Gábor Tusnády; István Simon
Journal: Bioinformatics Date: 2003-05-01 Impact factor: 6.937

Review 2. Inter-residue interactions in protein folding and stability.

Authors: M Michael Gromiha; S Selvaraj
Journal: Prog Biophys Mol Biol Date: 2004-10 Impact factor: 3.667

3. Effect of surface hydrophobicity distribution on retention of ribonucleases in hydrophobic interaction chromatography.

Authors: A Mahn; M E Lienqueo; J A Asenjo
Journal: J Chromatogr A Date: 2004-07-16 Impact factor: 4.759

4. Analysis and prediction of DNA-binding proteins and their binding residues based on composition, sequence and structural information.

Authors: Shandar Ahmad; M Michael Gromiha; Akinori Sarai
Journal: Bioinformatics Date: 2004-01-22 Impact factor: 6.937

Review 5. Computational approaches for predicting the binding sites and understanding the recognition mechanism of protein-DNA complexes.

Authors: M Michael Gromiha; R Nagarajan
Journal: Adv Protein Chem Struct Biol Date: 2013 Impact factor: 3.507

6. Scoring function based approach for locating binding sites and understanding recognition mechanism of protein-DNA complexes.

Authors: M Michael Gromiha; Kazuhiko Fukui
Journal: J Chem Inf Model Date: 2011-02-28 Impact factor: 4.956

7. The reverse turn as a polypeptide conformation in globular proteins.

Authors: J L Crawford; W N Lipscomb; C G Schellman
Journal: Proc Natl Acad Sci U S A Date: 1973-02 Impact factor: 11.205

8. Hydrophobicity of amino acid residues in globular proteins.

Authors: G D Rose; A R Geselowitz; G J Lesser; R H Lee; M H Zehfus
Journal: Science Date: 1985-08-30 Impact factor: 47.728

9. Ion-pairs in proteins.

Authors: D J Barlow; J M Thornton
Journal: J Mol Biol Date: 1983-08-25 Impact factor: 5.469

10. Hydrophobic character of amino acid residues in globular proteins.

Authors: P Manavalan; P K Ponnuswamy
Journal: Nature Date: 1978-10-19 Impact factor: 49.962
