
Classifying glycerol dehydratase by its functional residues and purifying selection in its evolution.

Andres Julian Gutierrez Escobar¹, Dolly Montoya Castaño.

Abstract

Glycerol dehydratase (GD) catalyses the dehydration of glycerol to 3-hydroxypropionaldehyde (3-HPA), the first step required for the microbial conversion of glycerol to 1,3-propanediol. GD has been functionally characterised to date and two main groups have been determined, one of them being vitamin B(12)-dependent and the other B(12)-independent. Here, the evolutionary history of GD is described and an exhaustive analysis is made to detect the functional residues responsible for type I divergence. The topology of the GD phylogenetic tree was statistically robust, and the data indicated strong purifying selection operating on the GD proteins within it. Two clades were identified, one for the vitamin B(12)-dependent class and the other for the B(12)-independent class. The ancient hot-spot residues responsible for protein divergence in each clade were also identified. The basic evolutionary biology of GD proteins has thus been described, opening the way for rational mutagenesis studies.


Keywords: glycerol dehydratase; hot spots; molecular evolution; type I functional divergence

Year: 2010 PMID: 21364782 PMCID: PMC3041002 DOI: 10.6026/97320630005173

Source DB: PubMed Journal: Bioinformation ISSN: 0973-2063

Background

Interest in glycerol dehydratase (GD; EC 4.2.1.30) has grown beyond academic circles in the past few years because of its role in the fermentation pathway used to produce 1,3-propanediol (1,3-PD) industrially. Two kinds of GD have been characterised to date. The first catalyses glycerol conversion to 3-hydroxypropionaldehyde via a radical mechanism that depends on the extensively studied 5'-deoxyadenosylcobalamin (coenzyme B12) [1]; the other performs the same function but is B12-independent. Both enzymes belong to the new radical SAM superfamily of proteins, which has been identified in all kingdoms of life and has been shown to catalyse a diverse array of chemical reactions of significant medical and biotechnological importance. GDs belong specifically to the lyase family, which cleaves carbon-oxygen bonds [2]. The cofactors required for this common activation mechanism are a [4Fe-4S]+ cluster (three Fe2+ ions and one Fe3+ ion) and S-adenosylmethionine (SAM). Glycerol dehydratase is a key enzyme for the dihydroxyacetone (DHA) pathway [3]. According to Raynaud et al., the C. butyricum enzyme shows its highest identity (47%) with E. coli PFL (pyruvate formate lyase), specifically in the C-terminal domain (the radical loop). Its overall structure is a β/α barrel, which is responsible for its catalytic properties. The B12-independent enzyme is a monomer that assembles into a functional dimer [4], whereas the B12-dependent one exists as a dimer of αβγ heterotrimers. The α subunit corresponds to the β/α barrel [5]. Neither the basic evolutionary biology of this class of protein nor the residues that act as evolutionary hot spots have been determined to date. This study examined the molecular evolutionary history of GD to determine whether the evolutionary process has been responsible for its high degree of sequence conservation. Different methodological approaches were used for analysing synonymous (pS) and nonsynonymous (pN) changes in 31 GD sequences.
PRATT software was used to predict the GD motif signature, and the Evolutionary Trace server was used to determine evolutionary traces for the GD protein. The specific amino acids responsible for selective constraint, and for the phylogenetic divergence of this protein, were then identified. DIVERGE 1.0 software was used to evaluate all protein sequences.

Methodology

Sequences

An exhaustive search was made in the GenBank, EMBL and Swiss-Prot databases for GD nucleotide and protein sequences. The search was optimised by using BLAST, PSI-BLAST and WU-BLAST software [6], with the Clostridium butyricum protein sequence as the query (accession number ABX56860.2). A total of 103 hits were obtained and then filtered by removing partial and redundant sequences. Complete protein sequences were included by strain; the final working set consisted of 31 complete protein sequences. SMART software was used to scan all sequences for typical GD protein domains [7]. GD crystal structures were downloaded from the PDB database; the 1r9d structure [4] was used as the template for the analysis of functionally divergent residues.
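The filtering step described above can be sketched roughly as follows. This is only an illustration: the paper does not state how "partial" or "redundant" were defined, so the length threshold (`min_fraction`), the exact-duplicate rule, and the function name `filter_hits` are all assumptions, and the per-strain selection is not modelled.

```python
def filter_hits(records, reference_length, min_fraction=0.9):
    """Drop partial and redundant hits from a list of (name, sequence) pairs.

    Assumptions (not from the paper): a hit is "partial" if it is shorter than
    min_fraction of the reference length, and "redundant" if its residues are
    identical to a hit already kept.
    """
    seen = set()
    kept = []
    for name, seq in records:
        normalized = seq.replace("-", "").upper()
        if len(normalized) < min_fraction * reference_length:
            continue  # treated as a partial sequence
        if normalized in seen:
            continue  # treated as a redundant sequence
        seen.add(normalized)
        kept.append((name, normalized))
    return kept


# Toy records, invented for illustration only.
toy_hits = [
    ("hit1", "MKTALLVAGG"),
    ("hit2", "mktallvagg"),  # same residues as hit1
    ("hit3", "MKTAL"),       # too short
    ("hit4", "MKSALLVAGA"),
]
kept_hits = filter_hits(toy_hits, reference_length=10)
```

In practice the threshold would need tuning against the query length, and near-identical sequences from the same strain would need a similarity criterion rather than exact matching.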

Alignment and phylogenetic reconstruction

Muscle software [8] was used for gene and protein alignment of the 31 previously collected sequences, using default parameters. The percentages of dS and dN changes were computed using a modified version of the Nei-Gojobori test; the Tajima test was calculated using MEGA 4.0 software and the SNAP server [9]. A combined strategy was used for phylogenetic analysis. First, the NJ method was used for phylogenetic reconstruction, with p-distance as the distance model [10]; statistical robustness was assessed with 5,000 bootstrap replicates, and MEGA 4.0 software was used throughout [11]. Second, the alignment was analysed using ProtTest [12] to determine the protein evolution model that best fitted the GD sequence alignment; phylogenetic analysis then used Phyml 3.0.1 [13] with 1,000 bootstrap replicates. The phylogenetic tree was visualised using NJplot software [14]. The best tree topology is shown.
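For readers unfamiliar with the two ingredients of the first strategy, the following minimal sketch shows what a p-distance and a bootstrap pseudo-replicate are. It is not the MEGA implementation: the function names, the pairwise handling of gaps, and the toy alignment are assumptions made for illustration.

```python
import random


def p_distance(seq_a, seq_b, gap="-"):
    """Proportion of differing sites between two aligned sequences.

    Columns where either sequence has a gap are skipped (an assumed,
    pairwise-deletion style of gap handling).
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = differing = 0
    for a, b in zip(seq_a, seq_b):
        if a == gap or b == gap:
            continue
        compared += 1
        if a != b:
            differing += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return differing / compared


def bootstrap_alignment(alignment, rng):
    """Resample alignment columns with replacement (one pseudo-replicate)."""
    length = len(next(iter(alignment.values())))
    columns = [rng.randrange(length) for _ in range(length)]
    return {name: "".join(seq[i] for i in columns)
            for name, seq in alignment.items()}


# Toy alignment, invented for illustration only.
toy = {"seqA": "MKT-LLVA", "seqB": "MKSALLVA", "seqC": "MRSALIVA"}
d_ab = p_distance(toy["seqA"], toy["seqB"])
d_bc = p_distance(toy["seqB"], toy["seqC"])
replicate = bootstrap_alignment(toy, random.Random(0))
```

A bootstrap support value for a node is then the fraction of pseudo-replicate trees, each rebuilt from such a resampled alignment, in which that node reappears.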

Analysing type I functional residues

A conceptual statistical framework for modelling functional divergence was used to estimate the coefficient of functional divergence (θ) as an indicator of the level of type I functional divergence. GD protein alignments were used to determine divergence points (DIVERGE 1.0 software) [16].

Discussion

The conversion of glycerol to 1,3-PD involves a coenzyme B12-dependent glycerol dehydratase [5]. However, one report described a Clostridium butyricum VPI1718 glycerol dehydratase (extracted from 1,3-PD-producing cells) that was not stimulated by coenzyme B12 and was extremely oxygen sensitive, suggesting that it might be a B12-independent enzyme [4]. It seems that the B12-dependent and B12-independent enzymes are encoded by orthologous genes which have evolved along separate lines; however, the homology of the β/α barrel indicates an ancestral relationship. GD evolution is characterised by ancient gene duplications (supported by high basal bootstrap values) followed by bifurcation with long branches, indicating independent evolution of each clade. Although similar tree branching was observed with both strategies (see Methodology), the second seemed the more parsimonious because it required fewer steps to reproduce the topology with a good bootstrap value (Figure 1a, Figure 1b). Interestingly, the longer basal branches of the tree (1,756 for B12-independent and 2,175 for B12-dependent nodes) indicated a deep common ancestor, even though each current GD clade has its own evolutionary mode. This hypothesis is supported by structural analysis of both enzyme types, in which the B12-dependent type has additional chains (unlike the B12-independent type). JTT+γ was the evolutionary model which best fitted our protein sequences [17]; this model was not calculated by MEGA 4.0 but is the default in Phyml 1.0 software. This strategy proved effective in predicting the best model for GD evolution.

Figure 1

(A) Phylogenetic tree developed using MEGA 4.0 software. The 31 GD protein sequences were aligned in Muscle, and the alignment was used to construct a tree using the NJ method, p-distance and 5,000 bootstrap replicates for statistical robustness. (B) Phylogenetic tree developed using Phyml software. The 31 GD protein sequences were aligned in Muscle, and the alignment was used to construct a tree using the JTT+γ evolutionary model, chosen according to the ProtTest results; 5,000 bootstrap replicates were used for statistical robustness. Only nodes with support values higher than 50% are shown (refer to the supporting material for the phylip sequence format used). (C) Functionally important, phylogenetically divergent sites in the GD protein. All 39 sites (yellow) have a θ-value of 0.7, these being statistically significant values (falling within the 5% region, having a P-value ≫0.05). Green residues are those considered to form the enzymatic active site according to the literature. (D) PRATT trace residues for the B12-independent and B12-dependent types.

Several approaches were applied for testing natural selection. The results suggested that the dS level was higher than the dN level (Table 1; see supplementary material). A probability of 0.000 was obtained in the Z-test (dS–dN=3.538). The Tajima D value was 4.857740 and dS/dN was 1.3432 in the SNAP server. This suggested that birth-and-death evolution subject to strong purifying selection was the model best fitting GD protein evolution. Such a combination has thus selected the best polymorphism for each niche explored, according to species. This indicated that GD genes have been in the bacterial genome for a long time. It also suggested that GD was a target of natural selection and thereby contributed to the divergence of these kinds of bacterial species. It is possible that the GD protein belongs to the radical SAM superfamily, but the BLAST result suggested that it fits better with the RNR-PFL superfamily (data not shown). Such enzymes are strictly anaerobic (like GD), and it has been further suggested that the diversity of chemical reactions catalysed by this class of protein exceeds that of the reactions catalysed by B12 [18]. Glycerol is the primary substrate of GD, but the enzyme acts on a wide variety of substrates according to its evolutionary mode. GD displayed a broad substrate spectrum in this work. GD can catalyse 1,2-ethanediol → acetaldehyde + H2O, 1,2-propanediol → propionaldehyde + H2O [19] and ethylene glycol → acetaldehyde + H2O [20], and GD may have a plethora of substrates which have not yet been discovered. Several residues have been implicated in GD function. Gly763 lies within the Clostridia Gly-radical domain (and has been identified as the site of free radical formation), with Cys433 located close to it. Binding of glycerol and 1,2-propanediol in the active site is mediated by residues H281, H164, S282, D447, E435, Y640, C433 and Y339 [1–4]. R782 may be important for functional contact between GD and its reactivase protein [4].
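As a small worked illustration of the selection test reported in this section, the dS/dN ratio can be converted into the more commonly quoted ω = dN/dS. The sketch below is not the SNAP procedure; the function names and the `tolerance` band around 1 are assumptions, and a ratio on its own is only a crude indicator that the authors combined with a Z-test and the Tajima test.

```python
def omega_from_ds_dn(ds_over_dn):
    """Convert a dS/dN ratio into omega = dN/dS."""
    if ds_over_dn <= 0:
        raise ValueError("dS/dN must be positive")
    return 1.0 / ds_over_dn


def classify_selection(omega, tolerance=0.05):
    """Crude reading of omega; the tolerance band is a hypothetical choice."""
    if omega < 1.0 - tolerance:
        return "purifying"
    if omega > 1.0 + tolerance:
        return "positive"
    return "neutral"


omega = omega_from_ds_dn(1.3432)  # dS/dN value reported from the SNAP server
regime = classify_selection(omega)
```

With the reported ratio, ω is about 0.74, that is, below 1, which is the direction expected under purifying selection.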
The cut-off value for detecting type I divergent residues was 0.7. Thirty-nine residues were detected here (Q94, E203, Y124, Y137, G162, L172, K199, Y212, N100, K316, F332, G33, K350, L388, A392, S416, G443, G463, Q469, K481, F483, Y498, I510, F520, G539, G538, S575, K578, N582, P640, Y646, L650, A653, T654, G672, C673, K703, E751 and Y753), which can be considered to be hot spots for GD evolution. These sites may be the mutational points defining the B12-dependent and B12-independent GD lines (for more details please refer to the supporting material). PRATT was used to obtain the trace maps for each GD protein (Figure 1d). Some of these sites lie immediately adjacent to functionally proven residues of the GD active site (i.e. G162/H163 and Y639/P640). Such changes preserve protein function but generate protein distortions, opening up the sequence space for exploring new niches.
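The cut-off step can be pictured as a simple filter over site-specific scores. The sketch below assumes that each site carries a score between 0 and 1 and that sites strictly above the cut-off are retained; the function name, the strict inequality, and the toy labels and scores are all hypothetical and are not values from this study.

```python
def select_divergent_sites(scores, cutoff=0.7):
    """Return site labels scoring above the cut-off, ordered by position.

    Labels are assumed to be a one-letter residue code followed by the
    alignment position, e.g. "A10".
    """
    selected = [site for site, score in scores.items() if score > cutoff]
    return sorted(selected, key=lambda site: int(site[1:]))


# Invented scores for hypothetical sites, for illustration only.
toy_scores = {"G25": 0.91, "H40": 0.12, "A10": 0.75, "P300": 0.82, "C120": 0.30}
hot_spots = select_divergent_sites(toy_scores)
```

Sorting by position is only a presentational choice; it makes it easier to see which selected sites sit next to known active-site residues.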

Conclusion

On the one hand, GD protein evolution can be clearly explained by birth-and-death evolution under purifying selection, which opens the way for future mutagenesis studies aimed at improving enzymatic activity on the basis of the traces identified here. On the other hand, it is important to develop non-conventional data mining strategies for the optimal identification of RNR-PFL protein family members in the databases.

20 in total

1. Statistical methods for testing functional divergence after gene duplication.

Authors: X Gu
Journal: Mol Biol Evol Date: 1999-12 Impact factor: 16.240

2. MUSCLE: multiple sequence alignment with high accuracy and high throughput.

Authors: Robert C Edgar
Journal: Nucleic Acids Res Date: 2004-03-19 Impact factor: 16.971

3. Insight into the mechanism of the B12-independent glycerol dehydratase from Clostridium butyricum: preliminary biochemical and structural characterization.

Authors: Jessica Rae O'Brien; Celine Raynaud; Christian Croux; Laurence Girbal; Philippe Soucaille; William N Lanzilotta
Journal: Biochemistry Date: 2004-04-27 Impact factor: 3.162

4. The rapid generation of mutation data matrices from protein sequences.

Authors: D T Jones; W R Taylor; J M Thornton
Journal: Comput Appl Biosci Date: 1992-06

5. WWW-query: an on-line retrieval system for biological sequence banks.

Authors: G Perrière; M Gouy
Journal: Biochimie Date: 1996 Impact factor: 4.079

6. Allosteric interactions in glycerol dehydratase. Purification of enzyme and effects of positive and negative cooperativity for glycerol.

Authors: A Stroinski; J Pawelkiewicz; B C Johnson
Journal: Arch Biochem Biophys Date: 1974-06 Impact factor: 4.013

7. Finding flexible patterns in unaligned protein sequences.

Authors: I Jonassen; J F Collins; D G Higgins
Journal: Protein Sci Date: 1995-08 Impact factor: 6.725

8. Glycerol fermentation in Klebsiella pneumoniae: functions of the coenzyme B12-dependent glycerol and diol dehydratases.

Authors: R G Forage; M A Foster
Journal: J Bacteriol Date: 1982-02 Impact factor: 3.490

9. SMART 6: recent updates and new developments.

Authors: Ivica Letunic; Tobias Doerks; Peer Bork
Journal: Nucleic Acids Res Date: 2008-10-31 Impact factor: 16.971

10. MUSCLE: a multiple sequence alignment method with reduced time and space complexity.

Authors: Robert C Edgar
Journal: BMC Bioinformatics Date: 2004-08-19 Impact factor: 3.169