Silvana Poggi¹, Sathees B Chandra². ¹Research Resources Centre, University of Illinois at Chicago, Chicago, USA. ²College of Nursing and Health Sciences, Barry University, Miami, Florida, USA.
Abstract
Replicative helicase DnaB interacts with DnaA, DnaC, DnaG, and DNA polymerase III to initiate replication, increase the rate of replication fork movement, and assemble part of the primosome. Because formation of the replication fork is limited by the ability to load DnaB onto DNA, DnaB has been shown to be largely indispensable. In the absence of DnaB, the replication fork is not maintained, and an inactive fork degrades and collapses. To better understand the importance of this enzyme from an evolutionary perspective, we performed a genomic analysis of DnaB protein sequences chosen from the five Proteobacteria subclasses. Our analysis indicates that the DnaB replicative helicases of Alphaproteobacteria and Epsilonproteobacteria diverged at an earlier stage, both from Betaproteobacteria, Deltaproteobacteria, and Gammaproteobacteria and from one another. These results were further supported when we reconstructed the phylogenetic tree after including sequences from the Actinobacteria and Firmicutes phyla. In addition, Betaproteobacteria, Deltaproteobacteria, and Gammaproteobacteria appear to share a more recent common ancestor with one another than with the other two subclasses. Dot-plot analysis indicated that the region between amino acid residues 320 and 400 is strongly conserved across all five subclasses.
1. INTRODUCTION
Replicative helicase DnaB is a multifunctional hexameric enzyme involved in the initiation of DNA synthesis, and it is found in essentially all organisms of the domain Bacteria. In bacteria, replication starts at a specific nucleotide sequence known as oriC (1). Helicases identify this region, bind to the site, and begin unwinding the DNA double helix by breaking the hydrogen bonds that hold the strands together (2, 3). The pre-replication complex (pre-RC) recognizes the origin of replication; helicases then separate the two annealed strands using energy obtained from ATP hydrolysis; and finally primase (DnaG), once activated by the helicase, begins synthesizing short RNA oligonucleotides (2, 4). The replication process of Escherichia coli is the most studied in the domain Bacteria. Its DnaB, a member of the hexameric DNA helicase family, has a subunit mass of approximately 52 kDa (2) (Bird, 2000) and forms a stable 6:6 complex with the DnaC protein (5). More than ten different helicases with functions in DNA replication, DNA repair, and DNA recombination have been identified in E. coli (6). However, entry of the DnaB helicase at oriC has been shown to be largely vital to the initiation process (1), as well as necessary for loading the helicase both onto closed-circular single-stranded DNA and onto nascent replication bubbles formed by the DnaA initiator protein (7). The formation of the replication fork is limited by the ability to load DnaB onto DNA. For an organism to survive, it must be able to remove replication obstructions and resume replication. Even though alternative DnaB loading pathways exist, eliminating all of them is lethal to the organism (8, 9). Overcoming replication obstacles and resuming replication have therefore long been of particular interest to the scientific community. It has been suggested that the cancer-associated genes BLM and BRCA2 are implicated in replication fork restart.
A better understanding of replication restart as a housekeeping mechanism could illuminate the pathways in which the products of these cancer predisposition genes act (8). From a pharmaceutical standpoint, exploiting the structural and functional differences between prokaryotic DnaB and its eukaryotic counterparts to impede replication in prokaryotes could lead to their elimination (10). Moreover, bacterial species are highly relevant to the production of biological plastics, fuels, and crops, to waste removal, and to cancer research, and many are human pathogens. Proteobacteria comprise a large group of bacteria that encompasses a wide variety of pathogens as well as nitrogen-fixing bacteria (11). Proteobacteria consist of five subclasses, namely Alpha, Beta, Delta, Epsilon, and Gamma. Alphaproteobacteria are generally non-sulfur, aerobic, bacteriochlorophyll-containing bacteria (12). Betaproteobacteria are usually chemoheterotrophs or chemoautotrophs (13). Deltaproteobacteria are morphologically diverse, anaerobic, sulfur-reducing bacteria (14). Epsilonproteobacteria are commonly found in the digestive systems of humans and animals and are mostly chemoorganotrophs (15). Finally, Gammaproteobacteria are typically facultatively anaerobic, fermentative gram-negative bacteria (16). In the phylogenetic tree of Proteobacteria, the relationship among Alphaproteobacteria, Betaproteobacteria, and Gammaproteobacteria was highly supported by morphological analysis, while the Epsilonproteobacteria and Deltaproteobacteria subclasses were placed separately and considered to have diverged earlier than the rest (17). Although Proteobacteria are phylogenetically related, they display diverse physiology, morphology, and ecology (18).
Consequently, DNA replication varies somewhat across the phylum. In E. coli, a gammaproteobacterium, DnaB associates transiently with DnaG and, upon loading at the replication fork, activates DnaG's priming activity in a distributive manner (2). In contrast, DnaB in Helicobacter pylori, an epsilonproteobacterium, associates strongly with DnaG during replication (19). DnaB contains three distinct regions: the N-terminal domain, the C-terminal domain, and the linker region in between (20). Various studies debate whether one region is more indispensable than another (4, 19, 21). The purpose of this article is therefore to further analyze amino acid conservation and to determine a common evolutionary pattern for replicative helicase DnaB across the Proteobacteria subclasses. In this study, we use bioinformatic tools for the first time to analyze the evolution of DnaB across the five subclasses of Proteobacteria.
2. METHODS
The DnaB sequences for analysis were obtained from the NCBI website (http://www.ncbi.nlm.nih.gov/). From several hundred available sequences, six to ten sequences from each Proteobacteria subclass were chosen based on their size, composition, morphological classification, and percentage of conserved amino acid residues. Preliminary phylogenetic trees were constructed to assess their evolutionary relationships on the basis of bootstrap values (22), and the six to ten sequences from each subclass that yielded the highest bootstrap values were chosen to represent their respective groups. Multiple sequence alignment within and between the subclasses was performed using Clustal Omega (23). To compare the similarities and differences among the sequences of each Proteobacteria subclass, the dotmatcher program was used to construct dot plots. Similarity between two protein sequences can be assessed from a dot plot, in which one sequence is placed along the X axis and the other along the Y axis, by looking for diagonal segments; the plot is constructed using a data matrix, a distance matrix, and chi-squared analysis (24). Similar sequences therefore produce a continuous diagonal line, whereas in dissimilar sequences this line is absent or highly fragmented. We first constructed dot plots using DnaB sequences belonging to the same subclass, and then using sequences from different subclasses in various combinations. Program parameters were mostly left at their default values, with a window size of 10 and a threshold of 23 (25). For phylogenetic analysis, the 38 selected sequences from the five Proteobacteria subclasses were obtained in FASTA format and aligned using Clustal X (26). A phylogenetic tree was then constructed from the aligned sequences by the neighbour-joining method using PHYLIP (22), and bootstrap resampling was applied to assess the statistical support for each branch of the tree.
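To make the tree-building step concrete, the sketch below implements the textbook neighbour-joining algorithm (Saitou and Nei) on a precomputed distance matrix. It is not PHYLIP's code: the function names `neighbor_joining` and `matrix_to_dict`, the dictionary-based matrix format, and the five-taxon toy distances are hypothetical and purely illustrative (they are not DnaB data). Computing protein distances from the Clustal X alignment is assumed to have been done beforehand.

```python
from itertools import combinations


def neighbor_joining(names, dist):
    """Textbook neighbour-joining on a symmetric distance matrix.

    names: taxon labels; dist: {frozenset({x, y}): distance} for every pair.
    Returns (edges, joins): edges are (node, parent, branch_length) tuples of
    the unrooted tree; joins lists the pairs merged, in order.
    """
    d = dict(dist)  # working copy; new internal nodes are added to it
    nodes = list(names)
    edges, joins = [], []
    counter = 0
    while len(nodes) > 2:
        n = len(nodes)
        # r(i): total distance from i to every other active node
        r = {i: sum(d[frozenset((i, k))] for k in nodes if k != i) for i in nodes}
        # Join the pair minimising Q(i, j) = (n - 2) d(i, j) - r(i) - r(j)
        i, j = min(
            combinations(nodes, 2),
            key=lambda p: (n - 2) * d[frozenset(p)] - r[p[0]] - r[p[1]],
        )
        dij = d[frozenset((i, j))]
        # Branch lengths from i and j to their new parent node u
        li = dij / 2 + (r[i] - r[j]) / (2 * (n - 2))
        lj = dij - li
        counter += 1
        u = f"node{counter}"
        edges += [(i, u, li), (j, u, lj)]
        joins.append((i, j))
        # Distance from u to each remaining node k
        for k in nodes:
            if k not in (i, j):
                d[frozenset((u, k))] = (
                    d[frozenset((i, k))] + d[frozenset((j, k))] - dij
                ) / 2
        nodes = [k for k in nodes if k not in (i, j)] + [u]
    a, b = nodes
    edges.append((a, b, d[frozenset((a, b))]))
    return edges, joins


def matrix_to_dict(names, rows):
    """Convert a square list-of-lists distance matrix to the dict format above."""
    return {frozenset((names[x], names[y])): rows[x][y]
            for x in range(len(names)) for y in range(x + 1, len(names))}


# Illustrative five-taxon toy matrix (not DnaB data).
taxa = ["a", "b", "c", "d", "e"]
toy = [[0, 5, 9, 9, 8],
       [5, 0, 10, 10, 9],
       [9, 10, 0, 8, 7],
       [9, 10, 8, 0, 3],
       [8, 9, 7, 3, 0]]
edges, joins = neighbor_joining(taxa, matrix_to_dict(taxa, toy))
for child, parent, length in edges:
    print(f"{child} -- {parent}: {length:g}")
```

With this toy matrix the first join is `a` with `b` (branch lengths 2 and 3), and the resulting unrooted tree has 2n - 3 = 7 edges for five taxa.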
To further assess the possibility of common ancestry among proteobacterial DnaB sequences, a supplementary phylogenetic analysis including eight Firmicutes and seven Actinobacteria sequences was carried out (27), and bootstrap values were recalculated. Finally, TreeView was used to visualize the position of each sequence within the clades and to examine whether the sequences were evolutionarily related (28).
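Bootstrap values such as those recalculated here are obtained by repeating the whole tree-building procedure on pseudo-replicate alignments. A minimal sketch of the resampling step follows, assuming the alignment is held as a dictionary of equal-length aligned strings. The function name `bootstrap_alignments` and the toy alignment are hypothetical, and the number of replicates used in the study is not stated in the text. Each replicate would then go through distance calculation and neighbour joining in turn.

```python
import random


def bootstrap_alignments(alignment, n_replicates, seed=0):
    """Return pseudo-replicate alignments built by resampling columns.

    alignment: {sequence_name: aligned_sequence}, all of equal length.
    Each replicate keeps every sequence but draws as many columns as the
    alignment has, with replacement, so some columns repeat and others are
    left out. Whole columns are drawn, so residues stay aligned.
    """
    names = list(alignment)
    length = len(alignment[names[0]])
    if any(len(seq) != length for seq in alignment.values()):
        raise ValueError("aligned sequences must all have the same length")
    rng = random.Random(seed)  # seeded so that replicates are reproducible
    replicates = []
    for _ in range(n_replicates):
        columns = [rng.randrange(length) for _ in range(length)]
        replicates.append(
            {name: "".join(alignment[name][c] for c in columns) for name in names}
        )
    return replicates


# Hypothetical three-sequence toy alignment (not DnaB data).
toy_alignment = {"seq1": "MKLV-A", "seq2": "MKIV-A", "seq3": "MRLVSA"}
for rep in bootstrap_alignments(toy_alignment, n_replicates=3, seed=42):
    print(rep)
```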
3. RESULTS
Multiple sequence alignment within and between each of the five Proteobacteria subclasses demonstrated a high level of sequence conservation. The highest overall conservation was observed in Betaproteobacteria, with 372 highly conserved amino acid residues, followed by Alphaproteobacteria, with 256. Gammaproteobacteria and Deltaproteobacteria exhibited moderate conservation, with 180 and 164 highly conserved residues respectively, while Epsilonproteobacteria showed the least conservation of the five subclasses, with only 101 (Table 2).
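The counts above are numbers of highly conserved alignment positions, but the text does not spell out the exact criterion. A minimal sketch follows, assuming "highly conserved" means a column carrying the same residue in every sequence (the positions Clustal marks with '*'). The function name and toy fragments are hypothetical. Under this definition, adding a sequence to a subclass can only keep or lower its count, so the count also depends on how many sequences each subclass contributes.

```python
def count_conserved_columns(aligned_seqs):
    """Count alignment columns with an identical residue in every sequence.

    Columns containing a gap ('-') never count, and case is ignored. This
    mirrors Clustal's '*' (fully conserved) mark, which is an assumption
    about how the counts in Table 2 were obtained.
    """
    seqs = list(aligned_seqs)
    if not seqs:
        return 0
    if any(len(s) != len(seqs[0]) for s in seqs):
        raise ValueError("sequences must be aligned to the same length")
    conserved = 0
    for column in zip(*seqs):
        first = column[0].upper()
        if first != "-" and all(res.upper() == first for res in column):
            conserved += 1
    return conserved


# Hypothetical aligned fragments (not real DnaB sequences).
print(count_conserved_columns(["MKLV-A", "MKIV-A", "MRLV-A"]))  # 3
```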
Table 2
Total number of conserved amino acid residues in DnaB sequences of the five Proteobacteria subclasses.
Dot-plot analysis was carried out to compare the protein sequences within and between the five subclasses, using one organism per group. Not surprisingly, the dot plots for organisms within each subclass exhibited a high degree of co-linearity, particularly for Betaproteobacteria, Deltaproteobacteria, and Gammaproteobacteria (Figure 1). When sequences from different subclasses were compared, the dot-plot analysis revealed a short region of co-linearity shared by all subclasses, spanning approximately amino acid positions 320 to 400 (Figure 2).
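A windowed dot plot of the kind described in the Methods (window size 10, threshold 23) can be sketched as follows. For every pair of starting positions, residue-pair scores are summed along a diagonal window, and all cells of that window are marked when the sum reaches the threshold. This is an illustrative reimplementation, not the dotmatcher source. The function names are hypothetical, and the simple +5/-1 identity score is an assumed stand-in for the substitution matrix (such as BLOSUM62) that a real protein comparison would use.

```python
def identity_score(x, y):
    """Hypothetical stand-in for a substitution matrix: +5 match, -1 mismatch."""
    return 5 if x == y else -1


def dot_plot(seq_a, seq_b, window=10, threshold=23, score=identity_score):
    """Return the set of 0-based (i, j) cells lying in high-scoring windows.

    For each pair of start positions (i, j), the scores of the `window`
    residue pairs along the diagonal are summed; if the total reaches
    `threshold`, every cell of that diagonal window is marked.
    """
    hits = set()
    for i in range(len(seq_a) - window + 1):
        for j in range(len(seq_b) - window + 1):
            total = sum(score(seq_a[i + k], seq_b[j + k]) for k in range(window))
            if total >= threshold:
                hits.update((i + k, j + k) for k in range(window))
    return hits


def render(seq_a, seq_b, hits):
    """Draw the plot as text: rows follow seq_a, columns follow seq_b."""
    return "\n".join(
        "".join("*" if (i, j) in hits else "." for j in range(len(seq_b)))
        for i in range(len(seq_a))
    )


# Hypothetical fragment compared with itself: the main diagonal lights up.
fragment = "MKLVNAGRTEWQ"
print(render(fragment, fragment, dot_plot(fragment, fragment, window=4, threshold=10)))
```

Because all twelve residues of the toy fragment are distinct, only the main diagonal is marked; related but divergent sequences would instead give the broken, partly collinear diagonals described for the between-subclass comparisons.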
Figure 1
Dot-plot comparisons of DnaB sequences within the same Proteobacteria subclass produced nearly perfect collinear diagonals: (A) Alpha/Alpha, (B) Beta/Beta, (C) Delta/Delta, (D) Epsilon/Epsilon, and (E) Gamma/Gamma [not shown in the figure]; threshold = 23, window size = 10.
Figure 2
Dot-plot comparisons of DnaB sequences between Proteobacteria subclasses showed some co-linearity but produced numerous non-collinear fragments: (A) Beta/Alpha, (B) Delta/Gamma, (C) Epsilon/Delta, (D) Beta/Gamma, and (E) Alpha/Epsilon [not shown in the figure]; threshold = 23, window size = 10.
Phylogenetic analysis of the thirty-eight DnaB sequences yielded three distinct clades (Figure 3). The first clade consisted of Epsilonproteobacteria sequences, the second contained only Alphaproteobacteria sequences, and the third comprised Gammaproteobacteria, Deltaproteobacteria, and Betaproteobacteria sequences. Bootstrap support within each clade was high. The grouping of Deltaproteobacteria with Betaproteobacteria had a bootstrap value of 91, and the grouping of this Delta/Beta clade with Gammaproteobacteria had a bootstrap value of 94. A further phylogenetic analysis with additional sequences produced a rectangular cladogram comprising selected Proteobacteria, Actinobacteria, and Firmicutes sequences (Figure 4).
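A bootstrap value such as 91 or 94 is the percentage of bootstrap replicate trees in which a given grouping reappears; it measures how strongly the data support that grouping rather than when the split occurred. The sketch below computes this percentage, assuming each replicate tree has already been reduced to a set of clades (each clade a frozenset of taxon labels). The function name, the subclass-level labels, and the 100-replicate example are hypothetical and do not reproduce the study's data.

```python
def clade_support(replicate_clades, clade):
    """Percentage of bootstrap replicate trees that contain `clade`.

    replicate_clades: one set of clades per replicate tree, where each clade
    is a frozenset of taxon labels. The order of labels in `clade` is
    irrelevant because it is compared as a frozenset.
    """
    if not replicate_clades:
        raise ValueError("need at least one bootstrap replicate")
    target = frozenset(clade)
    hits = sum(1 for clades in replicate_clades if target in clades)
    return 100.0 * hits / len(replicate_clades)


# Hypothetical example: 100 replicates, 91 of which recover Beta + Delta.
beta_delta = frozenset({"Beta", "Delta"})
beta_gamma = frozenset({"Beta", "Gamma"})
replicates = [{beta_delta}] * 91 + [{beta_gamma}] * 9
print(clade_support(replicates, {"Delta", "Beta"}))  # 91.0
```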
Figure 3
Phylogenetic tree constructed from DnaB amino acid sequences of all five Proteobacteria subclasses. The capital letter preceding each bacterial name indicates its subclass (A: Alpha, B: Beta, D: Delta, G: Gamma, E: Epsilon). The values on each branch are bootstrap support values for the corresponding grouping of species/subclasses. See Table 1 for full names, abbreviations, and accession numbers of the bacterial species used in our analysis.
Figure 4
Phylogenetic tree constructed from DnaB amino acid sequences of all five Proteobacteria subclasses with the addition of Actinobacteria and Firmicutes. The capital letter(s) preceding each bacterial name indicate its subclass or phylum (A: Alpha, B: Beta, D: Delta, G: Gamma, E: Epsilon, Act: Actinobacteria, F: Firmicutes). The values on each branch are bootstrap support values for the corresponding grouping of species/subclasses. See Table 1 for full names, abbreviations, and accession numbers of the bacterial species used in our analysis.
Table 1
Accession numbers of the 53 sequences used in our analysis, representing the five Proteobacteria subclasses as well as Actinobacteria and Firmicutes.
4. DISCUSSION
DnaB consists of three distinct regions: the N-terminal domain, the C-terminal domain, and the linker region between these two domains (20). The N-terminal domain generally spans amino acid residues 1 to approximately 120, while the C-terminal domain lies approximately between residues 175 and 488; these positions may vary slightly with the size of the protein in individual organisms (20). Some studies have shown that the C-terminal domain of DnaB is crucial, while others have demonstrated the importance of the N-terminal domain in the DnaB family (4, 19). Recent structural studies have focused on how DnaB interacts with the DNA strand to maintain replication fork integrity. DnaB positions itself differently across bacteria: it cracks open, as observed in the gammaproteobacterium E. coli; forms around the DNA strand, as observed in the firmicute B. subtilis; or even forms a double hexamer, as observed in the epsilonproteobacterium H. pylori (7). Through multiple sequence alignment and dot-plot analysis of the thirty-eight proteobacterial sequences, our study demonstrated that the region between amino acid residues 320 and 400 was consistently conserved. The largest number of conserved amino acid residues was found in the Betaproteobacteria subclass, and the lowest in the Epsilonproteobacteria subclass. Dot-plot analysis further indicated strong conservation within each subclass, with a high level of co-linearity, especially for Betaproteobacteria, Deltaproteobacteria, and Gammaproteobacteria.
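The approximate domain boundaries quoted above can be used to check where the conserved 320-400 window falls. The sketch below hard-codes those approximate 1-based boundaries; treating residues 121-174 as the linker is an assumption inferred from the gap between the stated N- and C-terminal ranges, and real boundaries shift with protein length. The names `DNAB_DOMAINS`, `domains_overlapping`, and `residue_slice` are hypothetical.

```python
# Approximate 1-based, inclusive boundaries quoted in the text (20).
# The linker range is an assumption: the gap between the other two domains.
DNAB_DOMAINS = [
    ("N-terminal domain", 1, 120),
    ("linker region", 121, 174),
    ("C-terminal domain", 175, 488),
]


def domains_overlapping(start, end, domains=DNAB_DOMAINS):
    """Names of the domains that a 1-based inclusive residue range touches."""
    if start > end:
        raise ValueError("start must not exceed end")
    return [name for name, lo, hi in domains if start <= hi and end >= lo]


def residue_slice(sequence, start, end):
    """Extract 1-based inclusive residues start..end from a sequence string."""
    return sequence[start - 1:end]


print(domains_overlapping(320, 400))  # ['C-terminal domain']
```

The 1-based to 0-based conversion in `residue_slice` is the usual source of off-by-one errors: residues 320 to 400 inclusive are 81 positions, i.e. `sequence[319:400]`.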
Dot-plot analysis between the subclasses showed a predominantly conserved region toward the C-terminal end of the protein, within the C-terminal domain, supporting the work of Nitharwal (20). The DnaB sequence analysis across Proteobacteria reveals that the Deltaproteobacteria, Betaproteobacteria, and Gammaproteobacteria subclasses have evolved together, while the DnaB proteins of Alphaproteobacteria and Epsilonproteobacteria seem to have diverged at an earlier point in evolution (Figure 4). Our phylogenetic tree further supports the evolutionary distinction among the Proteobacteria subclasses. As expected, three distinct clades were observed. One clade comprises Deltaproteobacteria, Betaproteobacteria, and Gammaproteobacteria, while the other two clades appear clearly separated from each other, suggesting that DnaB evolved separately in these lineages. Our analysis indicates that Epsilonproteobacteria and Alphaproteobacteria evolved independently, diverging at an earlier stage from the rest. The grouping of Betaproteobacteria with Deltaproteobacteria (bootstrap value 91) suggests that these two diverged from each other most recently, and the grouping of this pair with Gammaproteobacteria (bootstrap value 94) suggests that the pair diverged from Gammaproteobacteria earlier. The high bootstrap support for these groupings indicates that these three subclasses share a more recent common ancestor with one another than with any other Proteobacteria subclass. When the phylogenetic tree was reconstructed with the addition of the Actinobacteria and Firmicutes phyla, our results suggested that Epsilonproteobacteria was perhaps the first subclass to diverge from the rest. In addition, Alphaproteobacteria DnaB appears more closely related to Actinobacteria and Firmicutes DnaB than do the other subclasses. This bioinformatic analysis suggests that the DnaB replicative helicase has evolved and diverged within the five subclasses of Proteobacteria, possibly as these lineages adapted to changing conditions.
The analysis of Proteobacteria subclasses together with Actinobacteria and Firmicutes further suggests that Epsilonproteobacteria diverged much earlier than the rest. The alignment and dot-plot analyses also support this observation, as the Epsilonproteobacteria sequences had the lowest number of conserved amino acid residues within and between groups. Moreover, conservation of the region between amino acid residues 320 and 400 suggests that this region may be crucial to the structure, function, or stability of the helicase. Because this conserved region lies within the C-terminal domain, it supports the notion that the C-terminal domain is central to DnaB.
5. FUTURE PERSPECTIVES
DnaB is essential to the bacterial replication process. Our results indicate that the replicative helicase has diverged among the Proteobacteria subclasses while nevertheless maintaining a conserved region in the C-terminal domain. Future work focused on this region could help clarify the complete role of DnaB in replication and fork maintenance and further define its interactions with other proteins involved in replication. Our study and various others indicate that Gammaproteobacteria, Betaproteobacteria, and Deltaproteobacteria share a recent common ancestor. In addition, although Epsilonproteobacteria and Alphaproteobacteria seem to have diverged at an earlier stage, it is currently unclear when all five subclasses last shared a common ancestor. Further research is needed to answer this evolutionary question and perhaps to explain how DnaB assembles on DNA, whether by opening to allow DNA entry, forming around the DNA strand, or, in some cases, forming a double hexamer. Future work may also seek a link between Actinobacteria, Firmicutes, and Proteobacteria, as we noticed some commonality in our analysis: Alphaproteobacteria might share a more recent common ancestor with these phyla.