Sandeep Chakraborty, Basuthkar J Rao, Nathan Baker, Bjarni Asgeirsson.
Abstract
Phylogenetic analysis of proteins using multiple sequence alignment (MSA) assumes an underlying evolutionary relationship among these proteins, which occasionally remains undetected due to considerable sequence divergence. Structural alignment programs have been developed to unravel such fuzzy relationships. However, none of these structure-based methods have used electrostatic properties to discriminate between spatially equivalent residues. We present a methodology for MSA of a set of related proteins with known structures using electrostatic properties as an additional discriminator (STEEP). STEEP first extracts a profile, then generates a multiple structural superimposition providing a consolidated spatial framework for comparing residues, and finally emits the MSA. Residues that are aligned differently by including or excluding electrostatic properties can be targeted by directed evolution experiments to transform the enzymatic properties of one protein into another. We have compared STEEP results to those obtained from an MSA program (ClustalW) and a structural alignment method (MUSTANG) for chymotrypsin serine proteases. Subsequently, we used PhyML to generate phylogenetic trees for the serine and metallo-β-lactamase superfamilies from the STEEP-generated MSA, and corroborated the accepted relationships in these superfamilies. We have observed that STEEP acts as a functional classifier when electrostatic congruence is used as a discriminator, and thus identifies potential targets for directed evolution experiments. In summary, STEEP is unique among phylogenetic methods for its ability to use electrostatic congruence to specify mutations that might be the source of the functional divergence in a protein family. Based on our results, we also hypothesize that the active site and its close vicinity contain enough information to infer the correct phylogeny for related proteins.
Year: 2013 PMID: 25364645 PMCID: PMC4212511 DOI: 10.4161/idp.25463
Source DB: PubMed Journal: Intrinsically Disord Proteins ISSN: 2169-0707
Table 1. Potential and spatial congruence of the active site residues in proteins from the chymotrypsin superfamily
| PDB | Quantity | ab | ac | bc |
|---|---|---|---|---|
| 2ALP | D | 4.7 | 3.1 | 6.2 |
| | PD | -13.6 | -86.5 | -72.9 |
| 1SGT | D | 5.5 | 3 | 8 |
| | PD | 4.2 | -120.4 | -124.5 |
| 1TGS | D | 5.2 | 2.6 | 7.3 |
| | PD | 31.6 | -85.8 | -117.4 |
| 2SGA | D | 4.6 | 3 | 6.2 |
| | PD | 59.1 | -123.6 | -182.6 |
| 1PPF | D | 5.4 | 2.5 | 7.3 |
| | PD | -29 | -103.7 | -74.6 |
| 3EST | D | 4.6 | 3.2 | 6.4 |
| | PD | -3.7 | -124 | -120.3 |
| 3RP2 | D | 5 | 3.1 | 6.5 |
| | PD | -51.9 | -136.4 | -84.5 |
| 1TPP | D | 5.5 | 2.7 | 7.6 |
| | PD | -83.9 | -162.5 | -78.6 |
The active site atoms are HIS57NE2 (a), ASP102OD1 (b) and SER195OG (c). D = Pairwise distance in Å. PD = Pairwise potential difference. The electrostatic potential is in dimensionless units of kT/e where k is Boltzmann's constant, T is the temperature in K and e is the charge of an electron.
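The pairwise potential differences in Table 1 can be sanity-checked against each other. The sketch below assumes (this sign convention is not stated in the caption) that PD(xy) is the potential at atom x minus the potential at atom y, in which case PD(bc) should equal PD(ac) − PD(ab) up to rounding. The function names and the 0.15 kT/e tolerance are illustrative choices, not part of STEEP.

```python
# PD values from Table 1 as (PD_ab, PD_ac, PD_bc), in units of kT/e.
TABLE1_PD = {
    "2ALP": (-13.6, -86.5, -72.9),
    "1SGT": (4.2, -120.4, -124.5),
    "1TGS": (31.6, -85.8, -117.4),
    "2SGA": (59.1, -123.6, -182.6),
    "1PPF": (-29.0, -103.7, -74.6),
    "3EST": (-3.7, -124.0, -120.3),
    "3RP2": (-51.9, -136.4, -84.5),
    "1TPP": (-83.9, -162.5, -78.6),
}


def closure_error(pd_ab, pd_ac, pd_bc):
    """Assuming PD(xy) = phi(x) - phi(y), return |(PD_ac - PD_ab) - PD_bc|."""
    return abs((pd_ac - pd_ab) - pd_bc)


def check_table(table, tol=0.15):
    """Return {pdb: error} for entries whose closure error exceeds tol.

    tol allows for the one-decimal rounding of the tabulated values.
    """
    failures = {}
    for pdb, pds in table.items():
        err = closure_error(*pds)
        if err > tol:
            failures[pdb] = err
    return failures


print(check_table(TABLE1_PD))
```

An empty result means every row closes to within the tolerance, which is the case for the eight entries above; the same check extends to the six pairs of a four-atom motif.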

Figure 1. Superimposing multiple proteins based on the homologous active site scaffolds for trypsin serine proteases. (A) STEEP-generated superimposition, where each amino acid is represented by a user-defined reactive atom. (B) MUSTANG-generated superimposition. MUSTANG generates a better overall superimposition, but the active site residues are less dispersed after the superimposition by STEEP. (C) STEEP-generated superimposition, where each amino acid is represented by the Cα atom.

Figure 2. Multiple sequence alignments using STEEP, ClustalW or MUSTANG, and phylogenetic trees generated using PhyML for the chymotrypsin superfamily. The active site motif is marked as '*'. The residues used to initiate STEEP were within a radius of 9 Å from the specified active site residues. (A) Alignment using spatial proximity using STEEP. (B) Alignment using spatial proximity and electrostatic congruence using STEEP. (C) Cladogram generated from (A). (D) Cladogram generated from (B). (E) Alignment using ClustalW. (F) Alignment using MUSTANG. (G) Cladogram generated from (E) (ClustalW). (H) Cladogram generated from (F) (MUSTANG).
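The 9 Å selection mentioned in the caption can be illustrated with a minimal sketch. It assumes each residue is represented by a single atom position and that a residue qualifies if it lies within the radius of any active-site atom; the function name `within_radius` and all coordinates below are hypothetical and are not taken from a PDB entry or from the STEEP code.

```python
import math


def within_radius(residues, active_site, radius=9.0):
    """Return labels of residues within `radius` Å of any active-site atom.

    residues, active_site: dicts mapping a residue label to an (x, y, z)
    tuple in Å. One representative atom per residue is assumed.
    """
    selected = []
    for label, xyz in residues.items():
        if any(math.dist(xyz, ref) <= radius for ref in active_site.values()):
            selected.append(label)
    return selected


# Hypothetical coordinates for illustration only.
active_site = {"HIS57": (0.0, 0.0, 0.0), "SER195": (3.0, 0.0, 0.0)}
residues = {
    "GLY193": (5.0, 2.0, 1.0),
    "ALA55": (0.0, 8.5, 0.0),
    "LEU10": (20.0, 0.0, 0.0),
}
print(within_radius(residues, active_site))  # ['GLY193', 'ALA55']
```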
Table 2. Comparing results obtained with STEEP or MUSTANG for serine proteases
| Method | PDB | RMSD (Å) | Residues matched (out of 222) |
|---|---|---|---|
| STEEP | 1PPF | 1 | 170 |
| | 2ALP | 1.1 | 97 |
| | 1SGT | 1.1 | 170 |
| | 2SGA | 1.2 | 97 |
| | 3EST | 0.9 | 187 |
| | 3RP2 | 1.1 | 179 |
| | 1TPP | 1.2 | 160 |
| STEEP | 1PPF | 1.4 | 89 |
| | 2ALP | 1.5 | 39 |
| | 1SGT | 1.4 | 62 |
| | 2SGA | 1.4 | 39 |
| | 3EST | 1.5 | 51 |
| | 3RP2 | 1.5 | 48 |
| | 1TPP | 0.5 | 206 |
| MUSTANG | 1PPF | 0.9 | 176 |
| | 2ALP | 1.3 | 96 |
| | 1SGT | 0.9 | 177 |
| | 2SGA | 1.3 | 100 |
| | 3EST | 0.9 | 187 |
| | 3RP2 | 0.9 | 182 |
The RMSDs obtained by superimposing one protein (PDBid:1TGS, 222 amino acids) with each of the other proteins are shown. Cα atoms that are within 2 Å of each other are considered to be equivalent. The number of residues matched is another important metric, since an inferior superimposition might have an equivalent RMSD but align fewer residues. When each amino acid is represented by the Cα atom rather than the reactive atom, STEEP yields a much smaller RMSD and more equivalent residues.
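The two metrics in Table 2 can be sketched as follows, assuming the structures are already superimposed and each residue is reduced to one atom position. Pairing each reference atom with its nearest neighbour is a simplification chosen for illustration (it does not enforce one-to-one matching) and is not necessarily how STEEP or MUSTANG pair residues; the function name and coordinates are hypothetical.

```python
import math


def match_and_rmsd(ref, other, cutoff=2.0):
    """Count equivalent atoms and compute the RMSD over them.

    ref, other: lists of (x, y, z) tuples in Å, already superimposed.
    A reference atom counts as matched if its nearest atom in `other`
    lies within `cutoff` Å. Returns (n_matched, rmsd), with rmsd None
    when nothing matches.
    """
    squared = []
    for p in ref:
        nearest = min(math.dist(p, q) for q in other)
        if nearest <= cutoff:
            squared.append(nearest * nearest)
    if not squared:
        return 0, None
    return len(squared), math.sqrt(sum(squared) / len(squared))


# Hypothetical coordinates for illustration only.
ref = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0)]
other = [(0.0, 1.0, 0.0), (3.8, 0.0, 1.0), (7.6, 5.0, 0.0)]
n_matched, rmsd = match_and_rmsd(ref, other)
print(n_matched, rmsd)
```

Here two of the three reference atoms fall within the cutoff. Reporting both numbers matters because the RMSD is computed only over matched atoms, so a poor superimposition can still show a low RMSD while matching few residues.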
Table 3. Potential and spatial congruence of the active site residues in proteins from the Serine β-lactamase superfamily
| PDB / class | Active site atoms (a,b,c,d) | Quantity | ab | ac | ad | bc | bd | cd |
|---|---|---|---|---|---|---|---|---|
| 1E25 | Ser70OG,Lys73NZ,Ser130OG,Lys234NZ | D | 2.8 | 3.2 | 4.7 | 3.6 | 5.6 | 2.9 |
| Class A Serine β-lactamase | | PD | -125.6 | 22.4 | -189.1 | 148.1 | -63.5 | -211.5 |
| 1I2S | Ser70OG,Lys73NZ,Ser130OG,Lys234NZ | D | 2.7 | 3.2 | 4.5 | 3.1 | 5 | 2.8 |
| Class A Serine β-lactamase | | PD | -166.4 | -35.5 | -219.5 | 130.9 | -53.1 | -184 |
| 1BSG | Ser70OG,Lys73NZ,Ser130OG,Lys234NZ | D | 2.8 | 3.4 | 4.7 | 3.3 | 5.3 | 2.9 |
| Class A Serine β-lactamase | | PD | -178.3 | -31.4 | -188.6 | 146.8 | -10.3 | -157.1 |
| 2WZX | Ser90OG,Lys93NZ,Tyr177OH,Lys342NZ | D | 3.5 | 3 | 4.5 | 2.6 | 5 | 2.8 |
| Class C Serine β-lactamase | | PD | -161.5 | -56.9 | -153.1 | 104.6 | 8.4 | -96.2 |
| 1KE4 | Ser64OG,Lys67NZ,Tyr150OH,Lys315NZ | D | 2.9 | 3 | 4.6 | 3.4 | 5.6 | 2.8 |
| Class C Serine β-lactamase | | PD | -228 | -10.4 | -187.1 | 217.6 | 40.9 | -176.7 |
| 1FR6 | Ser64OG,Lys67NZ,Tyr150OH,Lys315NZ | D | 2.9 | 3.3 | 4.6 | 2.4 | 5 | 3.1 |
| Class C Serine β-lactamase | | PD | -132.1 | -18.4 | -164.2 | 113.7 | -32.1 | -145.8 |
| 1K57 | Ser67OG,Lys70NZ,Ser115OG,Lys205NZ | D | 2.8 | 2.6 | 4.7 | 3.1 | 5.6 | 3.8 |
| Class D Serine β-lactamase | | PD | -162.9 | 51.7 | -184.7 | 214.6 | -21.8 | -236.4 |
| 3ISG | Ser67OG,Lys70NZ,Ser115OG,Lys212NZ | D | 3.3 | 3.9 | 4.3 | 4.8 | 5.3 | 2.2 |
| Class D Serine β-lactamase | | PD | -246.3 | -13.9 | -231.5 | 232.4 | 14.8 | -217.7 |
| 1K38 | Ser67OG,Lys70NZ,Ser115OG,Lys205NZ | D | 3.1 | 3.7 | 4.9 | 4.7 | 5.6 | 2.7 |
| Class D Serine β-lactamase | | PD | -292.5 | -50.7 | -309.8 | 241.8 | -17.3 | -259.1 |
| 1QME | Ser337OG,Lys340NZ,Ser395OG,Lys547NZ | D | 2.9 | 3.2 | 4.5 | 2.7 | 5 | 3 |
| Penicillin binding protein | | PD | -211.5 | -38.2 | -242 | 173.3 | -30.5 | -203.8 |
| 1NZO | Ser44OG,Lys47NZ,Ser110OG,Lys213NZ | D | 3.1 | 4.2 | 6.3 | 5.1 | 6.8 | 2.7 |
| Penicillin binding protein | | PD | -241.6 | -68.8 | -277.9 | 172.8 | -36.2 | -209.1 |
| 2EX2 | Ser62OG,Lys65NZ,Ser306OG,Lys417NZ | D | 2.9 | 3 | 4.3 | 3.3 | 5 | 2.9 |
| Penicillin binding protein | | PD | -213.6 | -84 | -264.8 | 129.6 | -51.2 | -180.8 |
| 1XA1 | Ser59OG,Lys62NZ,Ser107OG,Lys196NZ | D | 2.6 | 3.5 | 4.7 | 3.8 | 5.8 | 2.9 |
| Signal transducer BlaR1 | | PD | -126.2 | 73.7 | -175.8 | 199.9 | -49.6 | -249.5 |
| 1NRF | SER402OG,LYS405NZ,SER450OG,LYS539NZ | D | 2.7 | 3.6 | 4.7 | 4.9 | 6.1 | 2.8 |
| Signal transducer BlaR1 | | PD | -249.6 | 2.1 | -217.7 | 251.7 | 31.9 | -219.8 |
D = Pairwise distance in Å. PD = Pairwise potential difference. The electrostatic potential is in dimensionless units of kT/e where k is Boltzmann's constant, T is the temperature in K and e is the charge of an electron.
Table 4. Potential and spatial congruence of the active site residues in proteins from the metallo- β-lactamase superfamily
| PDB / class | Active site atoms (a,b,c,d) | Quantity | ab | ac | ad | bc | bd | cd |
|---|---|---|---|---|---|---|---|---|
| 1ZNB | HIS101NE2,ASP103OD1,HIS162NE2,HIS223NE2 | D | 6.3 | 4.9 | 9.1 | 5.8 | 4.7 | 6 |
| Class B1 | | PD | 124.5 | 152.2 | 168.3 | 27.7 | 43.8 | 16.1 |
| 1DD6 | HIS79NE2,ASP81OD1,HIS139NE2,HIS197NE2 | D | 6.8 | 5 | 9.4 | 6.1 | 5.2 | 6 |
| Class B1 | | PD | 97 | 98.6 | 47.3 | 1.6 | -49.7 | -51.3 |
| 1M2X | HIS118NE2,ASP120OD1,HIS196NE2,HIS263NE2 | D | 6.9 | 5 | 9.3 | 6.1 | 4.9 | 6.1 |
| Class B1 | | PD | 59.9 | 100.8 | 6.1 | 40.9 | -53.8 | -94.8 |
| 3F9O | HIS118NE2,ASP120OD1,HIS196NE2,HIS263NE2 | D | 6.8 | 4.8 | 10 | 5.6 | 5 | 6.2 |
| Class B2 | | PD | -109.3 | -180.8 | -74.4 | -71.6 | 34.8 | 106.4 |
| 1JT1 | HIS118NE2,ASP120OD1,HIS196NE2,HIS263NE2 | D | 8 | 4.5 | 9.4 | 6.6 | 3.1 | 6.5 |
| Class B3 | | PD | 245.3 | 65 | 152.3 | -180.2 | -93 | 87.3 |
| 1SML | HIS86NE2,ASP88OD1,HIS160NE2,HIS225NE2 | D | 6.3 | 4.7 | 9.3 | 6 | 4.9 | 6.1 |
| Class B3 | | PD | 140.3 | 93.7 | 147.2 | -46.6 | 6.9 | 53.5 |
| 3LVZ | HIS103NE2,ASP105OD1,HIS177NE2,HIS242NE2 | D | 6.5 | 4.4 | 9.4 | 6.1 | 4.9 | 6.3 |
| Class B3 | | PD | 131 | 110.6 | 104.5 | -20.4 | -26.5 | -6.1 |
| 1QH5 | HIS56NE2,ASP58OD1,HIS110NE2,HIS173NE2 | D | 6.5 | 4.5 | 9.4 | 6.6 | 5 | 6.6 |
| glyoxalase II | | PD | 246.3 | 249.1 | 254.4 | 2.9 | 8.1 | 5.2 |
| 1P9E | HIS149NE2,ASP151OD1,HIS234NE2,HIS302NE2 | D | 6.3 | 4.4 | 9.1 | 6.8 | 5.1 | 6.7 |
| methyl parathion hydrolase | | PD | 14.3 | -72.8 | 27.4 | -87 | 13.1 | 100.2 |
D = Pairwise distance in Å. PD = Pairwise potential difference. The electrostatic potential is in dimensionless units of kT/e where k is Boltzmann's constant, T is the temperature in K and e is the charge of an electron.

Figure 3. Superimposing multiple proteins based on the homologous active site scaffolds for serine and metallo-β-lactamases (SBL, MBL). SBL motif = (Ser70, Lys73, Ser130, Lys234), MBL motif = (His118, His196, Asp120 and His263). Ser70 and His118 are colored black and are at the center of the coordinate axes (X = 0, Y = 0, Z = 0) for SBLs and MBLs, respectively. The proteins are colored red, yellow and blue respectively in order of appearance. (A) Three class A SBLs - PDBids:1E25, 1I2S and 1BSG. (B) A class A (PDBid:1E25), a class C (PDBid:1KE4) and a class D (PDBid:3ISG) SBL. (C) A class A SBL (PDBid:1E25), a penicillin binding protein (PDBid:1NZO) and a signal transducer BlaR1 protein (PDBid:1XA1). (D) Three class B1 MBLs - PDBids:1ZNB, 1DD6 and 1M2X. (E) A class B1 (PDBid:1ZNB), a class B2 (PDBid:3F9O) and a class B3 (PDBid:1JT1) MBL. (F) A class B3 MBL (PDBid:3LVZ), a human glyoxalase II (PDBid:1QH5) and a methyl parathion hydrolase (PDBid:1P9E).

Figure 4. Multiple sequence alignments obtained using STEEP, and phylogenetic trees generated using PhyML for serine and metallo-β-lactamases (SBL, MBL). The active site motif is marked as '*'. The residues are within a radius of 9 Å from the specified active site residues. AS = alignment using spatial proximity. ASE = alignment using spatial proximity and electrostatic congruence. (A) AS for SBLs. (B) ASE for SBLs. (C) Cladogram generated from (A). (D) Cladogram generated from (B). (E) AS for MBLs. (F) ASE for MBLs. (G) Cladogram generated from (E). (H) Cladogram generated from (F).
Table 5. Extending the profile
| SBL Index | SBL Count | SBL Amino Acid Types | MBL Index | MBL Count | MBL Amino Acid Types |
|---|---|---|---|---|---|
| 1 | 10 | (F/M/P/I/L) | 2 | 6 | (A/Y/E/V) |
| 3 | 14 | (S) | 3 | 7 | (S/W/D/P/V) |
| 4 | 13 | (F/T/L/V) | 5 | 7 | (H/N/I/Y/L/V) |
| 6 | 14 | (K) | 8 | 7 | (S/A/T/D/G) |
| 7 | 14 | (A/M/T/I/L/V) | 12 | 6 | (S/T/N) |
| 8 | 13 | (A/F/S/T/N/P/Y/I/L) | 13 | 8 | (H) |
| 20 | 14 | (S/Y) | 14 | 8 | (H/S/F/A/M/W) |
| 22 | 10 | (N) | 15 | 9 | (H) |
| 23 | 13 | (S/W/T/P/Y/V/M/C) | 16 | 8 | (A/F/S/W/D/P/G/L) |
| 24 | 12 | (F/S/A/I/G/Y/V) | 17 | 9 | (D) |
| 30 | 11 | (S/Q/M/D/K/P/Y/E) | 19 | 6 | (T/I/G) |
| 37 | 14 | (K) | 20 | 9 | (A/R/P/G) |
| 38 | 13 | (S/T) | 22 | 8 | (W/I/L/V) |
| 39 | 14 | (G) | 29 | 6 | (F/M/Y/L) |
| 40 | 10 | (F/S/A/T/R) | 31 | 9 | (H) |
| 41 | 12 | (A/S/T/E/H/Q/R/I) | 32 | 8 | (S/T/D) |
| 42 | 12 | (A/S/W/N/G/L/Y) | 36 | 8 | (H/T/D/N/C) |
| | | | 37 | 8 | (S/D/C) |
| | | | 38 | 6 | (T/M/I/G/L) |
| | | | 39 | 6 | (S/T/K/G/L) |
| | | | 41 | 8 | (A/T/N/P/Y) |
| | | | 43 | 8 | (D/N/L/Y/E) |
| | | | 47 | 8 | (A/D/L/Y) |
| | | | 49 | 8 | (H) |
Consensus residues in the SBL and MBL superfamilies with respect to spatial location and electrostatic properties. Indexing is with reference to the sequence alignments shown in Figure 4A and Figure 4E for SBL and MBL, respectively. Count is the number of proteins that have a certain amino acid at that index in the alignment. The profile is extended if there are less than 75% gaps. For SBLs, the complete set has 14 proteins, so the required count is 10. For MBLs, the complete set has 9 proteins, so the required count is 6.
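The required counts quoted in the caption (10 of 14 and 6 of 9) are both reproduced by rounding 75% of the set size down to an integer. The sketch below uses that reading, which is an assumption inferred from the two numbers rather than a rule stated in the text; the function names are hypothetical, and column 99 in the example is invented to show a column that fails the filter.

```python
import math


def required_count(n_proteins, fraction=0.75):
    """Minimum residue count for a column, assumed to be floor(fraction * n)."""
    return math.floor(fraction * n_proteins)


def columns_to_extend(column_counts, n_proteins, fraction=0.75):
    """Return alignment indices whose residue count meets the required count.

    column_counts: dict mapping alignment index -> number of proteins
    with a residue (not a gap) at that index.
    """
    need = required_count(n_proteins, fraction)
    return [idx for idx, count in column_counts.items() if count >= need]


print(required_count(14))  # 10, the SBL threshold in the caption
print(required_count(9))   # 6, the MBL threshold in the caption

# Three MBL columns from Table 5 plus one hypothetical column (99).
mbl_counts = {13: 8, 15: 9, 19: 6, 99: 5}
print(columns_to_extend(mbl_counts, n_proteins=9))  # [13, 15, 19]
```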