Local comparison of protein structures highlights cases of convergent evolution in analogous functional sites.

Gabriele Ausiello, Daniele Peluso, Allegra Via, Manuela Helmer-Citterich.

Abstract

BACKGROUND: We performed an exhaustive search for local structural similarities in an ensemble of non-redundant protein functional sites. With the aim of finding new examples of convergent evolution, we selected only those matching sites composed of structural regions whose residue order is inverted in the respective protein sequences.
RESULTS: A novel case of local analogy was detected between members of the ABC transporter and HprK/P families in their ATP binding site. This case cannot be explained by circular permutation, since the residues of one of the region pairs are located in reverse order in the sequences of the two protein families. One of the analogous binding sites, the one identified in HprK/P, is also known to bind pyrophosphate, which is used as the preferred energy source in its kinase and phosphorylase activities.
CONCLUSION: The discovery of this striking molecular similarity, which is also associated with a functional similarity, may help suggest new experiments aimed at a deeper understanding of members of the ABC transporter family known to be involved in many serious human diseases.

Year: 2007    PMID: 17430569    PMCID: PMC1885854    DOI: 10.1186/1471-2105-8-S1-S24

Journal: BMC Bioinformatics    ISSN: 1471-2105


Background

The global comparison of protein sequences or structures is one of the most widely used computational tools for analyzing newly discovered proteins [1]. These methods can highlight evolutionary relationships. However, because the functional site(s) may have undergone divergent evolution, global similarity does not always support the "guilt by association" inference of protein function. The vast majority of known functions (enzymatic activities, binding sites, etc.) are encoded by a relatively small set of residues arranged in a conserved geometry, both in the protein sequence and in the protein structure. 3D motifs [2,3] can therefore be used, in different forms, to analyze and infer molecular function when global similarity is not conserved (for a review see [4]). Local similarities in the context of global dissimilarity arise from either divergent [5] or convergent evolution. Cases of convergent evolution are uncommon and few are described in the literature. Well-known examples are the Ser-His-Asp (SHD) catalytic triad in serine proteases [3,6] and the region surrounding the P-loop in many nucleotide-binding proteins [7,8]. Methods that identify local structural similarities are not sufficient on their own to spot cases of convergent evolution, because a local match may equally reflect divergence from a common ancestor. Here we applied a new approach. It consists of an exhaustive search for local structural similarities between known protein structures, followed by the selection of those similarities that are built from different regions located in a different order in the sequences of the protein families sharing the site. We analyzed the results of an all-versus-all local comparison of an ensemble of protein functional sites, searching for 3D matches characterized by sequence inversion events. Non-collinear matches that were also highly statistically significant were analyzed manually. A few cases of convergently evolved sites were identified and are discussed below.

Results

Structural comparison experiment

Starting from a non-redundant structural dataset of about 2000 protein chains, we identified 10175 surface cavities. About 2500 of these cavities were defined as functional because they contain a substantial fraction of residues associated with a PROSITE [9] pattern or a known ligand binding site (see Methods). We performed an all-versus-all structural comparison of the functional clefts against the whole dataset of about 10000 clefts, in search of significant structural similarities. For the cavity comparison we used the sequence-independent Query3D algorithm [10]. We set highly stringent parameters to detect only strong similarities, and we selected only 3D matches comprising at least 50% known functional residues (see Methods). We obtained a total of about 66000 structural matches among protein cavities, ranging from 2 to 10 residues in size. Table 1 reports the distribution of the number of matches by match length.
Table 1

Number of structural matches found between the functionally annotated cavities and the whole ensemble of surface cavities.

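Matches       | total | 2    | 3     | 4     | 5    | 6    | 7      | 8       | 9     | 10
collinear     | 33966 | 4893 | 6109  | 17905 | 2283 | 1524 | 376    | 486     | 105   | 285
non-collinear | 31944 | 3920 | 8230  | 18998 | 621  | 122  | 14 (7) | 23 (12) | 4 (4) | 12 (12)
total         | 65910 | 8813 | 14339 | 36903 | 2904 | 1646 | 390    | 509     | 109   | 297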

The first and second rows report the number of collinear and non-collinear matches, respectively, for each match length (columns 2 to 10). The third row gives the total number of matches. Numbers in parentheses indicate the non-collinear matches with a Z-score higher than 10.

Selection of non-collinear and significant matches

We selected only those matches whose matching residues are non-collinear in the corresponding protein sequences. To do so, we searched the list of matches for pairs of matched residues whose order differs between the two sequences (see Methods). Table 1 reports the number of collinear and non-collinear matches for each match length. We calculated a significance value for each of these matches in the form of a Z-score. Only matches longer than 7 residues and with a Z-score higher than 10 were analyzed. This threshold is highly stringent, as shown by tests performed in other large-scale structural comparison experiments [11]. This level of stringency strongly supports the hypothesis that only real structural similarities were considered.
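The selection criteria above amount to a simple filter over match records. The following is a minimal sketch, not the authors' code. The `Match` record, its field names and the example data are hypothetical. The Z-score and the non-collinearity flag are assumed to have been computed beforehand.

```python
from dataclasses import dataclass


@dataclass
class Match:
    """Hypothetical record for one cavity-vs-cavity structural match."""
    query: str            # identifier of the functional cleft used as query
    target: str           # identifier of the matched cleft
    length: int           # number of superposed residue pairs (2..10)
    z_score: float        # significance of the match (see Methods)
    non_collinear: bool   # True if at least two residue pairs are non-collinear


def select_for_inspection(matches, min_length=7, min_z=10.0):
    """Keep non-collinear matches longer than `min_length` with Z-score above `min_z`."""
    return [
        m for m in matches
        if m.non_collinear and m.length > min_length and m.z_score > min_z
    ]


# Hypothetical example data (not taken from the study).
candidates = [
    Match("q1", "t1", length=9, z_score=12.1, non_collinear=True),   # kept
    Match("q1", "t2", length=9, z_score=12.1, non_collinear=False),  # collinear
    Match("q2", "t3", length=7, z_score=15.0, non_collinear=True),   # too short
    Match("q3", "t4", length=10, z_score=8.5, non_collinear=True),   # not significant
]
print([m.target for m in select_for_inspection(candidates)])  # ['t1']
```

Both thresholds are strict inequalities, matching "longer than 7 residues" and "higher than 10" in the text.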

Analysis of non-collinear matches

A total of 32 non-collinear structural matches passed these stringent thresholds and were analyzed manually (see Table 1). Twenty-eight of them were identified as common cases of non-collinearity arising from circular permutation of protein sequences [12]. Of the remaining 4 cases, three are already known in the literature:
- the N-4 cytosine-specific methyltransferase PvuII and catechol O-methyltransferase [13] (Z-score 11.61);
- UDP-glucose dehydrogenase and L-alanine dehydrogenase [14] (Z-score 13.55);
- D-Ala ligase and human glutathione synthetase [15] (Z-score 14.28).

For details on these matches see Figure 1a–c.
Figure 1

Details of structural matches. PDB codes and chain identifiers are reported for each of the selected structural matches, together with the number and type of aligned amino acids. The arrows describe permutations and inversions in the protein sequences. The N- to C-terminal direction is colour-coded (blue for the N-terminus, red for the C-terminus). a) The residues involved in the match bind S-adenosyl-homocysteine (1boo) and S-adenosyl-methionine (1vidA), and share a high structural similarity. b) The matching residues bind ADP, Mg and PHY in the 1iow structure, and ADP, Mg and GSH in the 2hgs structure. c) The 1dljA residues involved in the match bind 1,4-dihydronicotinamide adenine dinucleotide. 1say is 92% identical to 1pjc, which binds nicotinamide adenine dinucleotide with the residues involved in the 3D match. d) In 1b0uA, the matching residues bind an ATP molecule. 1kklA is 100% identical to 1jb1A, which binds PO4 with the residues involved in the 3D match (see Results).

A new case of sequence inversion

The fourth case (Z-score 12.14; Figure 1d) involves a member of the ABC transporter family and a bacterial HPr kinase/phosphorylase (PDB codes 1b0u and 1kkl, respectively). This 3D similarity belongs to the category of inverted structural matches (see Methods) and cannot have arisen from simple sequence rearrangements. It therefore represents a true and almost unique case of convergent evolution.

The ATP-binding subunit of a bacterial histidine permease belongs to the ABC transporter family [16]. Members of this family are widespread across phyla, from bacteria to humans. Some ABC transporters are known to be involved in human disorders such as cystic fibrosis, muscular dystrophy, adrenoleukodystrophy, Stargardt disease and others.

HprK/P [17], on the other hand, is a bacterial sensor enzyme that plays a major role in regulating carbon metabolism and sugar transport, and it controls the expression of numerous catabolic genes. It catalyzes both the ATP- and the pyrophosphate-dependent phosphorylation of Ser-46 in HPr, a phosphocarrier protein of the phosphoenolpyruvate-dependent sugar phosphotransferase system (PTS).

Two sites are compared: the nucleotide binding site of the ATP-binding subunit of the histidine permease from Salmonella typhimurium, and the phosphate binding active site of HprK/P from Lactobacillus casei. Both are built by the juxtaposition of three distinct residue stretches (see also Table 2):
i) the P-loop;
ii) a pair of negatively charged residues (D178-E179 in 1b0u; D178-D179 in 1kkl);
iii) a catalytic histidine on a beta strand, followed (in HprK/P) or preceded (in the ABC transporter) by three hydrophobic residues (Figure 2).

The third stretch places the residue side chains in very similar 3D positions. However, these residues are encoded in opposite order in the two protein sequences: N to C terminus in one protein, and C to N terminus in the other.
After superposition of the conserved residues the two structures look quite different (Figure 3), but they share a very similar core composed of four beta strands and one helix (Figure 4a). One of the strands precedes the P-loop and the helix follows it. Two of the three remaining beta strands occupy equivalent positions but point in opposite directions in the two structures (Figure 4b). This finding is consistent with the hypothesis that the structural context is necessary for the stability and function of the described ATP/PP binding site, even though some of the participating secondary structure elements (SSEs) run in opposite directions.
Table 2

Local multiple alignment of the two families.

Type | PDB code | Swiss-Prot code | Residues in inverted region | P-loop residues | Negative charges | Ligand | Organism | Description
ABC type 1 | 1b0uA | Q5PN38_SALPA | H211 T210 V209 V208 V207 | 39–46 | D178 E179 | ATP | Salmonella typhimurium | ATP-Binding Subunit Of The Histidine Permease
ABC type 1 | 1l2tA | Y796_METJA | H204 T203 V202 V201 V200 | 38–45 | D170 E171Q | ATP | Methanococcus jannaschii | Hypothetical ABC Transporter ATP-Binding Protein Mj0796
ABC type 1 | 1q12A | MALK_ECO57 | H192 T191 V190 Y189 I188 | 36–43 | D158 E159 | ATP | Escherichia coli K12 | ATP-Bound E. Coli Malk, Maltose/Maltodextrin Transport
ABC type 1 | 1g291 | Q9HH32_PYRFU | H198 T197 V196 Y195 I194 | 36–43 | D164 E165 | POP | Thermococcus litoralis | Maltose Transport Protein Malk
ABC type 1 | 1e3mA | Q8VVV1_ECOLI | H728 T727 A726 F725 L724 | 614–621 | D693 E694 | ADP | Escherichia coli | DNA Mismatch Repair Protein Muts
ABC type 1 | 1ewqA | MUTS_THEAQ | H696 T695 A694 F693 L692 | 583–590 | D662 E663 | - | Thermus aquaticus | DNA Mismatch Repair Protein Muts
ABC type 1 | 1fnnA | Q8ZYK1_PYRAE | H166 G165 V164 I163 V162 | 50–57 | D131 D132 | ADP | Pyrobaculum aerophilum | Cdc6P, Cell Division Control Protein 6
ABC type 1 | - | ABCD1_HUMAN | H659 T658 I657 S656 L655 | 507–514 | D629 E630 | - | Homo sapiens | Adrenoleukodystrophy protein
ABC type 1 | - | CFTR_HUMAN | H1402 E1401 C1400 L1399 I1398 | 1244–1251 | D1370 E1371 | - | Homo sapiens | CFTR NBD2

ABC type 2 | 1w1wA | - | L1190 S1189 I1188 V1187 I1186 | 33–40 | D1157 E1158 | ATG | Saccharomyces cerevisiae | Structural Maintenance Of Chromosome 1, Head Domain Residues 1–214, 1024–1225
ABC type 2 | 1jj7A | Q96CP4_HUMAN | Q701 T700 I699 L698 L697 | 538–545 | D667 D668 | ADP | Homo sapiens | Peptide Transporter Tap1, C-Terminal ABC ATPase Domain
ABC type 2 | 1ji0A | Q9X0M3_THEMA | Q196 E195 V194 L193 L192 | 38–45 | D163 E164 | ATP | Thermotoga maritima | ABC Transporter
ABC type 2 | 1xmiA | Q99989_HUMAN | S605 T604 V603 L602 I601 | 458–465 | D572 S573 | ATP | Homo sapiens | Cystic Fibrosis Transmembrane Conductance Regulator, Nucleotide Binding Domain One
ABC type 2 | 1r0wA | CFTR_HUMAN | S605 T604 V603 L602 I601 | 458–465 | D572 S573 | - | Mus musculus | Cystic Fibrosis Transmembrane Conductance Regulator, NBD1 Domain (Residues 389–673)

HprK/P | 1jb1A | HPRK_LACCA | H140 G141 V142 L143 V144 | 155–162 | D178 D179 | PO4 | Lactobacillus casei | Hprk/P Bound To Phosphate, Hprk Protein
HprK/P | 1knxA | HPRK_MYCPN | H139 G140 V141 L142 L143 | 154–161 | D177 D178 | - | Mycoplasma pneumoniae | Hpr Kinase/Phosphatase
HprK/P | 1ko7A | HPRK_STAXY | H136 G137 V138 L139 V140 | 151–158 | D174 D175 | PO4 | Staphylococcus xylosus | Hpr Kinase/Phosphatase
HprK/P | 1kklA | HPRK_LACCA | H140 G141 V142 L143 V144 | 155–162 | D178 D179 | - | Lactobacillus casei | Hprk/P In Complex With B. Subtilis Hpr, Phosphocarrier Protein Hpr
HprK/P | 1kkm | HPRK_LACCA | H140 G141 V142 L143 V144 | 155–162 | D178 D179 | PO4 | Lactobacillus casei | Hprk/P In Complex With B. Subtilis P-Ser-Hpr

Sequence alignment of different members of the ABC transporter and HprK/P families. The first column identifies the protein family. ABC transporters are grouped into two types:
a) Type 1 members comprise the histidine permease of S. typhimurium and other nucleotide binding domains (NBDs) of the same family. They all display the same conserved residues, including the catalytic histidine.
b) Type 2 members have a residue other than histidine at the position aligned to the histidine of type 1.

In many cases the same protein sequence contains two different nucleotide binding domains, NBD1 and NBD2, one without and one with the histidine. These domains have different functions: NBD1 (without the catalytic H) usually has a regulatory function, while NBD2 (with the H) has a catalytic function. The remaining columns report the following for each protein: the PDB code, the Swiss-Prot code, the structurally aligned residues in the inverted region, and the sequence range of the P-loop. They also give the two structurally aligned negatively charged residues, the bound ligand, the organism and a short description.

Figure 2

Conserved residues in the identified 3D match. Graphical view of the conserved residues in the identified 3D match. The ABC transporter (PDB chain code: 1b0uA) is shown in blue and HprK/P (PDB chain code: 1kklA) in orange. The P-loops of the two proteins are represented as ribbons. So are the two beta strands in which the catalytic histidine and the hydrophobic residues are encoded in opposite order. The ATP molecule co-crystallized with the histidine permease subunit is colored by atom type. Some of the side chains involved in the functional match are also shown, labelled with their PDB residue numbers. The two arrows indicate the N to C orientations of the oppositely oriented beta strands.

Figure 3

Superimposition corresponding to the local 3D match. The structural superimposition corresponding to the local 3D match described in Figure 2. The ABC transporter and the HprK/P protein chains are shown as blue and red ribbons, respectively. The matched residues of 1b0uA and 1kklA are highlighted in cyan and yellow, respectively.

Figure 4

Superposed secondary structure elements.

a) Ribbon representation of the conserved SSEs surrounding the ATP/PP binding site: four beta strands and an alpha-helix. The 1b0uA chain is shown in blue, the 1kklA chain in orange. The four strands form a beta-sheet whose second strand corresponds, in both structures, to the sequence preceding the P-loop. The helix corresponds to the sequence immediately following the P-loop. In the superposition of the two beta-sheets, the two internal strands have the same orientation in space: in both structures they point up in the N to C direction. The two external strands, by contrast, point in opposite directions in the two structures. The first strand on the left of the picture corresponds to the sequence surrounding the catalytic histidine. The fourth strand, on the right, is not involved in any known functional region.

b) Topology of the superposition. A simplified view of the superposed structures is shown with the same colors used in Figure 4a. Beta strands are represented as arrows pointing from N to C, and helices as cylinders. The segments joining the secondary structure elements are drawn without regard to their real length.

In both proteins, the identified residues belong to a well-studied functional site, described in detail above. The co-crystallized ligands (an ATP and a pyrophosphate molecule) superpose neatly at their corresponding phosphate atoms, which supports a functional meaning for the 3D similarity. This rather unusual situation accounts for the fact that this striking similarity had not been highlighted before. Table 2 shows a multiple alignment of members of the two protein families, derived from the corresponding superposition of the functional sites.

Conclusion

An exhaustive search has been performed for significant 3D similarities between protein functional sites containing residues that are non-collinear in the respective protein sequences. From a non-redundant set of protein structures, an ensemble of about 10,000 cavities was defined. About one quarter of these could be associated with a known molecular function using PROSITE patterns and/or bound ligands. The local comparison produced more than 60,000 3D matches. Almost half of them involved non-collinear residues, but only a few of these were both long and highly significant. All matches were evaluated for their statistical significance using the Z-score, and cases with Z-score > 10 were carefully analyzed and inspected visually. Four interesting cases were identified. Three of them were already known in the literature as cases involving a permutation in one of the protein families included in the structural match.

The fourth case involves a member of the ABC transporter family and a bacterial HprK/P (Figures 2 and 3). Three regions of the two protein sequences are involved in the structural match, and their order in the respective sequences is 1-2-3 versus 3-1-2. This ordering alone would be compatible with a circular permutation in one of the protein families. However, in one of the regions (region 3 in the preceding sentence) the residues are located in inverse order in the two sequences, which suggests that convergent evolution has occurred. Interestingly, a structural core composed of four beta strands and one helix is conserved in the two structures (Figure 4), but two of the beta strands are oriented in opposite directions.

This finding may be relevant for a better understanding of structure-function relationships. It may also have medical significance, since members of the ABC transporter family are involved in several human diseases, such as cystic fibrosis, muscular dystrophy, adrenoleukodystrophy, Stargardt disease and others.
This analogy might help suggest experimental strategies for devising new classes of inhibitors, whether peptides or other compounds.
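The argument that a circular permutation cannot produce the inverted region can be made concrete. The sketch below restates the definitions given in Methods; it is not the authors' code.

1. Sort the matched residue pairs by their position in the first sequence.
2. Read off the corresponding positions in the second sequence.
3. Look for a strictly decreasing subsequence of length 3, which corresponds exactly to three pairwise non-collinear pairs.

A cyclic rotation of internally collinear segments (1-2-3 versus 3-1-2) yields two increasing runs, so its longest decreasing subsequence never exceeds 2. The segment positions in the rotation example are hypothetical. The inverted stretch uses the 1b0uA and 1kklA residues listed in Table 2.

```python
def longest_decreasing_subsequence(pairs):
    """Length of the longest strictly decreasing run of second-structure positions,
    after ordering residue pairs by their first-structure position."""
    seq = [pos2 for _, pos2 in sorted(pairs)]
    best = [1] * len(seq)  # best[i]: longest decreasing subsequence ending at index i
    for i in range(len(seq)):
        for j in range(i):
            if seq[j] > seq[i]:
                best[i] = max(best[i], best[j] + 1)
    return max(best, default=0)


def is_inverted(pairs):
    """Inverted match: at least three pairwise non-collinear residue pairs."""
    return longest_decreasing_subsequence(pairs) >= 3


# Hypothetical circular permutation: segments 1-2-3 of structure 1 appear as 3-1-2
# in structure 2, each segment keeping its internal N-to-C order.
rotation = [(10, 200), (11, 201), (12, 202),   # segment 1
            (50, 300), (51, 301),              # segment 2
            (90, 100), (91, 101)]              # segment 3 (first in structure 2)

# Inverted stretch from Table 2: 1b0uA H211 T210 V209 V208 V207 vs 1kklA H140..V144.
inverted = [(211, 140), (210, 141), (209, 142), (208, 143), (207, 144)]

print(longest_decreasing_subsequence(rotation), is_inverted(rotation))  # 2 False
print(longest_decreasing_subsequence(inverted), is_inverted(inverted))  # 5 True
```

A quadratic dynamic programme is adequate here because matches are capped at ten residue pairs.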

Methods

Cavities dataset

We used an NCBI non-redundant PDB set [18] composed of 1924 chains. It was obtained using only X-ray structures and a sequence-similarity cut-off corresponding to a BLAST p-value of 10^-7. Using the SURFNET algorithm [19] we identified a dataset of 10175 surface clefts on these chains with a cavity volume larger than 200 Å³. We defined each cleft as the set of residues, identified by the algorithm, that surround the cavity pocket [20].

Functional residues

Functionally important residues were identified in the set of defined cavities by searching for PROSITE [9] patterns and ligand binding sites. PROSITE motifs were searched in the sequences of our protein dataset with the ScanProsite algorithm [21], using all PROSITE regular expressions except those marked as "unspecific". Ligand binding residues were identified with a distance criterion: all residues within 3.5 Å of any heteroatom found in the selected coordinate set were assigned to this category. Clefts containing less than 75% of the residues involved in one of the defined functions were discarded, so that only almost complete functional sites were considered [20].
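The distance criterion and the 75% completeness filter can be sketched in plain Python. This is a minimal illustration under stated assumptions. Residues are given as hypothetical dictionaries mapping an identifier to atom coordinates. "Within 3.5 Å" is read as any residue atom lying at most 3.5 Å from any heteroatom. The 75% rule is read as the fraction of a functional site's residues that fall inside the cleft.

```python
import math

Coord = tuple[float, float, float]


def ligand_binding_residues(residue_atoms: dict[str, list[Coord]],
                            hetero_atoms: list[Coord],
                            cutoff: float = 3.5) -> set[str]:
    """Residues with at least one atom within `cutoff` (in Å) of any heteroatom."""
    return {
        res_id
        for res_id, atoms in residue_atoms.items()
        if any(math.dist(a, h) <= cutoff for a in atoms for h in hetero_atoms)
    }


def is_nearly_complete(cleft: set[str], site: set[str], min_fraction: float = 0.75) -> bool:
    """True if the cleft contains at least `min_fraction` of the functional-site residues."""
    if not site:
        return False
    return len(cleft & site) / len(site) >= min_fraction


# Toy coordinates (hypothetical): one residue near the ligand, one far away.
residues = {"D178": [(0.0, 0.0, 0.0)], "V209": [(10.0, 0.0, 0.0)]}
ligand = [(3.0, 0.0, 0.0)]
print(ligand_binding_residues(residues, ligand))                  # {'D178'}
print(is_nearly_complete({"a", "b", "c"}, {"a", "b", "c", "d"}))  # True (3/4 = 0.75)
```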

All vs. all structural comparison

The structural comparison was performed with the sequence-independent local comparison algorithm Query3D [10]. Query3D uses both geometrical matching criteria (r.m.s.d. between paired residues) and biochemical ones (scores from a substitution matrix). The algorithm represents each residue with two points: the C-alpha atom and a side-chain representative point. An exhaustive exploration guarantees finding the two largest sets of matching amino acids in a pair of protein structures. For this experiment, high-stringency parameters were set to obtain only highly similar matches: r.m.s.d. < 0.7 Å and residue similarity > 1.2 according to the Dayhoff substitution matrix. Whenever a match involves more than 10 amino acids, only the first ten are considered. In the all vs. all comparison, all surface clefts containing at least 75% of functionally important residues were compared with the whole set of clefts. Moreover, the algorithm was restricted to structural similarities in which at least 50% of the amino acids are annotated as functionally important [20]. This requirement selects only matches in protein regions whose function can easily be deduced.
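The geometric part of the acceptance test can be illustrated with a least-squares superposition. The sketch below is not Query3D's search procedure. It shows how an r.m.s.d. between two paired point sets can be computed after optimal rigid superposition (Kabsch algorithm) and then compared with the 0.7 Å threshold. The point sets could be, for example, the C-alpha and side-chain representative points of each matched residue, stacked as rows. The function names and the use of NumPy are assumptions, and the Dayhoff-based similarity is treated as a precomputed number.

```python
import numpy as np


def superposed_rmsd(P: np.ndarray, Q: np.ndarray) -> float:
    """r.m.s.d. between paired (N, 3) point sets after optimal rotation and translation (Kabsch)."""
    P = P - P.mean(axis=0)              # remove translation
    Q = Q - Q.mean(axis=0)
    H = P.T @ Q                         # 3x3 covariance matrix
    U, _, Vt = np.linalg.svd(H)
    # Flip the last axis if needed so that R is a proper rotation, not a reflection.
    d = 1.0 if np.linalg.det(Vt.T @ U.T) >= 0 else -1.0
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    diff = P @ R.T - Q
    return float(np.sqrt((diff ** 2).sum(axis=1).mean()))


def passes_thresholds(P, Q, similarity, max_rmsd=0.7, min_similarity=1.2):
    """Hypothetical acceptance test mirroring the stringent parameters in the text."""
    return superposed_rmsd(P, Q) < max_rmsd and similarity > min_similarity


# Toy check: a rotated and translated copy superposes exactly.
P = np.array([[0, 0, 0], [1, 0, 0], [0, 1, 0], [0, 0, 1]], dtype=float)
Rz = np.array([[0, -1, 0], [1, 0, 0], [0, 0, 1]], dtype=float)   # 90 degrees about z
Q = P @ Rz.T + np.array([5.0, -2.0, 3.0])
print(round(superposed_rmsd(P, Q), 6))  # 0.0
```

The reflection guard matters for this application: a mirror image of a site is not the same site, so only proper rotations are allowed.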

Matches significance

Each match is scored by its length, i.e. by the number of residues that can be superposed within the defined similarity thresholds. The significance of each match is evaluated with a Z-score, computed over the distribution of scores obtained when the query cleft is compared with the whole dataset. For each match, the Z-score is the difference between the score of the match and the average score of all matches for that query cleft, divided by the standard deviation of those scores.
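The Z-score computation just described fits in a few lines. This sketch assumes the population standard deviation, since the text does not say which estimator was used. It also uses made-up match lengths for a single query cleft.

```python
from statistics import mean, pstdev


def z_scores(scores):
    """Z-score of each match score against all matches obtained for the same query cleft."""
    mu = mean(scores)
    sigma = pstdev(scores)     # population standard deviation (assumption)
    if sigma == 0:
        return [0.0 for _ in scores]   # all matches equal: no match stands out
    return [(s - mu) / sigma for s in scores]


# Hypothetical match lengths for one query cleft compared with the dataset.
lengths = [2, 2, 2, 2, 10]
print([round(z, 2) for z in z_scores(lengths)])  # [-0.5, -0.5, -0.5, -0.5, 2.0]
```

With realistic score distributions dominated by short matches, a threshold of Z > 10 singles out only exceptionally long matches for a given query.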

Definition of collinear and inverted structural matches

A structural match can be described as a set of pairs of residues that can be superposed in 3D. Each residue pair is identified by an uppercase letter (e.g. A). Its two residues are identified by the same letter in lowercase, followed by one or two primes depending on whether the residue belongs to the first or the second structure (e.g. a', a"). Given two residues a' ∈ A and b' ∈ B, a' < b' if a' precedes b' in the primary sequence. Two pairs A and B are non-collinear if a' < b' while b" < a", or if b' < a' while a" < b". A structural match is non-collinear if it contains at least two pairs that are non-collinear with each other. It is inverted if it contains at least three pairs that are all pairwise non-collinear.
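These definitions translate directly into code. The following is a minimal sketch in which each residue pair is represented, as an assumption, by its two sequence positions (a', a"). The collinear example is hypothetical. The inverted example uses the aligned 1b0uA/1kklA residues from Table 2.

```python
from itertools import combinations

# A residue pair is (position in structure 1, position in structure 2),
# i.e. (a', a") in the notation above.
Pair = tuple[int, int]


def non_collinear(a: Pair, b: Pair) -> bool:
    """a' < b' while b" < a", or b' < a' while a" < b"."""
    return (a[0] < b[0] and b[1] < a[1]) or (b[0] < a[0] and a[1] < b[1])


def is_non_collinear_match(pairs: list[Pair]) -> bool:
    """At least two pairs that are non-collinear with each other."""
    return any(non_collinear(a, b) for a, b in combinations(pairs, 2))


def is_inverted_match(pairs: list[Pair]) -> bool:
    """At least three pairs, each non-collinear with the other two."""
    return any(
        non_collinear(a, b) and non_collinear(a, c) and non_collinear(b, c)
        for a, b, c in combinations(pairs, 3)
    )


# Hypothetical collinear match, and the inverted stretch of Table 2
# (1b0uA H211 T210 V209 V208 V207 aligned to 1kklA H140 G141 V142 L143 V144).
collinear = [(5, 105), (9, 109), (20, 130)]
inverted = [(211, 140), (210, 141), (209, 142), (208, 143), (207, 144)]
print(is_non_collinear_match(collinear), is_inverted_match(collinear))  # False False
print(is_non_collinear_match(inverted), is_inverted_match(inverted))    # True True
```

Because matches are limited to ten residue pairs, a brute-force check over all pairs and triples is cheap: at most 45 and 120 combinations, respectively.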

Authors' contributions

GA conceived the study, carried out the structural comparisons and drafted the manuscript. DP carried out the analysis of non-collinear matches. AV participated in the design of the work. MHC participated in the design of the work and in its coordination. All authors read and approved the final manuscript.
References

The Protein Data Bank. H M Berman, J Westbrook, Z Feng, G Gilliland, T N Bhat, H Weissig, I N Shindyalov, P E Bourne. Nucleic Acids Res, 2000-01-01.

Protein surface similarities: a survey of methods to describe and compare protein surfaces. A Via, F Ferrè, B Brannetti, M Helmer-Citterich. Cell Mol Life Sci, 2000-12.

Circularly permuted proteins in the protein structure database. J Jung, B Lee. Protein Sci, 2001-09.

The SAM domain of polyhomeotic forms a helical polymer. Chongwoo A Kim, Mari Gingery, Rosemarie M Pilpa, James U Bowie. Nat Struct Biol, 2002-06.

Evaluation of protein fold comparison servers. Marian Novotny, Dennis Madsen, Gerard J Kleywegt. Proteins, 2004-02-01.

ScanProsite: a reference implementation of a PROSITE scanning tool. Alexandre Gattiker, Elisabeth Gasteiger, Amos Bairoch. Appl Bioinformatics, 2002.

Searching for functional sites in protein structures. Susan Jones, Janet M Thornton. Curr Opin Chem Biol, 2004-02.

A serine protease triad forms the catalytic centre of a triacylglycerol lipase. L Brady, A M Brzozowski, Z S Derewenda, E Dodson, G Dodson, S Tolley, J P Turkenburg, L Christiansen, B Huge-Jensen, L Norskov. Nature, 1990-02-22.

Transport ATPases: structure, motors, mechanism and medicine: a brief overview. Peter L Pedersen. J Bioenerg Biomembr, 2005-12.

SURFACE: a database of protein surface regions for functional annotation. Fabrizio Ferrè, Gabriele Ausiello, Andreas Zanzoni, Manuela Helmer-Citterich. Nucleic Acids Res, 2004-01-01.