Varinia López-Ramírez, Luis D Alcaraz, Gabriel Moreno-Hagelsieb, Gabriela Olmedo-Álvarez.
Abstract
DEAD-box proteins are found in all domains of life and participate in almost all cellular processes that involve RNA. The presence of the conserved DEAD and Helicase_C domains distinguishes these proteins. DEAD-box proteins exhibit RNA-dependent ATPase activity in vitro, and several also show RNA helicase activity. In this study, we analyzed the distribution and architecture of DEAD-box proteins among bacterial genomes to gain insight into the evolutionary pathways that have shaped their history. We identified 1,848 unique DEAD-box proteins from 563 bacterial genomes. Bacterial genomes can possess a single copy of a DEAD-box gene or up to 12 copies, as in Shewanella. The alignment of 1,208 sequences allowed us to perform a robust analysis of the hallmark motifs of DEAD-box proteins and to determine the residues that occur at high frequency, some of which were previously overlooked. Bacterial DEAD-box proteins do not generally contain a conserved C-terminal domain, with the exception of some members that possess a DbpA RNA-binding domain (RBD). Phylogenetic analysis showed a separation of DbpA-RBD-containing and DbpA-RBD-lacking sequences and revealed a group of DEAD-box protein genes that expanded mainly in the Proteobacteria. Analysis of DEAD-box proteins from Firmicutes and γ-Proteobacteria was used to deduce orthologous relationships of the well-studied DEAD-box proteins from Escherichia coli and Bacillus subtilis. These analyses suggest that DbpA-RBD is an ancestral domain that most likely emerged as a specialized domain of the RNA-dependent ATPases. Moreover, these data revealed numerous events of gene family expansion and reduction following speciation.
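The domain criterion and per-genome copy counts described in the abstract can be illustrated with a minimal sketch. It assumes domain annotations are already available as `(genome, protein_id, domains)` tuples; that input format, the function names, and the toy records are hypothetical and are not taken from the study's pipeline.

```python
from collections import Counter

# Domain names as given in the abstract; a protein must carry both.
REQUIRED_DOMAINS = {"DEAD", "Helicase_C"}


def is_dead_box(domains):
    """Return True when both hallmark domains are annotated on the protein."""
    return REQUIRED_DOMAINS <= set(domains)


def copies_per_genome(annotations):
    """Count DEAD-box proteins per genome from (genome, protein_id, domains) tuples."""
    counts = Counter()
    for genome, _protein_id, domains in annotations:
        if is_dead_box(domains):
            counts[genome] += 1
    return counts


# Hypothetical toy annotations, for illustration only.
annotations = [
    ("genome_A", "p1", ["DEAD", "Helicase_C"]),
    ("genome_A", "p2", ["DEAD", "Helicase_C", "DbpA"]),
    ("genome_A", "p3", ["Helicase_C"]),  # lacks the DEAD domain, so not counted
    ("genome_B", "p4", ["DEAD", "Helicase_C"]),
]
counts = copies_per_genome(annotations)
```

For scale, the reported totals (1,848 proteins in 563 genomes) correspond to roughly 3.3 DEAD-box proteins per genome on average.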
Year: 2011 PMID: 21437710 PMCID: PMC3093544 DOI: 10.1007/s00239-011-9441-8
Source DB: PubMed Journal: J Mol Evol ISSN: 0022-2844 Impact factor: 2.395
DEAD-box proteins in Escherichia coli and Bacillus subtilis
| Protein | Size* | Function | DbpA-RBD | Reference |
|---|---|---|---|---|
| Escherichia coli | | | | |
| DeaD (CsdA) | 629 | Ribosomal assembly/mRNA decay; cold sensitive (complemented by RhlE) | ✓ | Moll et al. ( |
| DbpA (RhlC) | 457 | Ribosomal assembly | ✓ | Fuller-Pace et al. ( |
| RhlB | 421 | mRNA decay | | Py et al. ( |
| SrmB (RhlA) | 444 | Ribosome assembly | | Charollais et al. ( |
| RhlE | 454 | Ribosome assembly | | Jain ( |
| Bacillus subtilis | | | | |
| CshA (YdbR) | 511 | Ribosome assembly | | Ando and Nakamura ( |
| DeaD (YxiN) | 479 | Ribosomal assembly | ✓ | Kossen and Uhlenbeck ( |
| YfmL | 376 | Unknown | | – |
| CshB (YqfR) | 438 | Ribosome assembly | | Hunger et al. ( |
*Size in amino acids
Fig. 4 Relationship between phylogeny and the presence of putative ortholog sets of DEAD-box proteins. The phylogenetic tree was based on the maximum likelihood method with the 16S rRNA gene (sequences from the RDP database) using the PhyML model (GTR + I) with 1,000 bootstrap replicates. The rows represent reference DEAD-box protein genes from E. coli (for γ-Proteobacteria) and from B. subtilis (for Firmicutes). The columns represent protein conservation using symbols to denote the presence of a putative ortholog. Red symbols are used for DEAD-box proteins containing the DbpA RNA-binding domain (RBD), while blue symbols are used for DEAD-box proteins lacking the DbpA domain. The conservation data were obtained from the phylogenetic reconstructions shown in Figs. 3 and 5, which are based only on the DEAD-Helicase_C N-terminal domain. D1 and D2 refer to duplications 1 and 2 observed for the DEAD-box proteins from the Firmicutes. For each putative ortholog set, the length difference is generally less than 20%, with a few exceptions (see Supplemental Tables S10 and S11 for the identification number of each protein as well as a comparison of the length of each sequence shown in this figure).
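The 20% length criterion mentioned in the caption can be sketched as follows. The caption does not say how the difference is normalized, so this sketch assumes it is taken relative to the reference protein length; the function names and the candidate lengths are hypothetical, while 457 is the E. coli DbpA length listed in the table above.

```python
def relative_length_difference(reference_length, candidate_length):
    """Absolute length difference as a fraction of the reference length (assumed definition)."""
    return abs(candidate_length - reference_length) / reference_length


def within_length_tolerance(reference_length, candidate_length, tolerance=0.20):
    """True when the candidate differs from the reference by less than `tolerance`."""
    return relative_length_difference(reference_length, candidate_length) < tolerance


# E. coli DbpA is 457 amino acids long; the candidate lengths are made up.
dbpa_length = 457
close_candidate = within_length_tolerance(dbpa_length, 470)  # 13/457, under 20%
far_candidate = within_length_tolerance(dbpa_length, 600)    # 143/457, over 20%
```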
Fig. 1 Overview of the methodology used in this work to identify and analyze bacterial DEAD-box protein genes.
Fig. 2 Phylogenetic tree of the DEAD-Helicase_C domains in 1,211 bacterial DEAD-box proteins. Sequences containing only DEAD and Helicase_C domains were first aligned and then manually curated. We used Gblocks to extract informative positions from the protein alignment, and a total of 323 non-ambiguous positions were used to conduct the phylogenetic analysis by neighbor joining (see “Materials and methods”). The grouping of classes was determined according to the clade layout. Three major groups are identified and colored yellow, blue, and green. Some Actinobacteria DEAD-box protein sequences branch in a separate divergent group (brown). The presence or absence of a DbpA RNA-binding domain (RBD) is the main feature distinguishing the yellow and blue DEAD-box protein sequences, respectively, while the green sequences form a separate branch that is dominated by proteobacterial members. The locations of known RNA helicases from E. coli and B. subtilis are indicated in the phylogeny. Information about the genomes and the sequence IDs used to generate this phylogeny can be found in Supplemental Tables S4 and S5.
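The idea of reducing an alignment to unambiguous positions before tree building can be illustrated with a deliberately simplified stand-in. This is not the Gblocks algorithm, which also applies conservation and block-length criteria; the sketch below only keeps columns that contain no gap characters, and the toy alignment is hypothetical.

```python
def gap_free_columns(alignment, gap_chars="-."):
    """Return indices of alignment columns in which no sequence has a gap."""
    if not alignment:
        return []
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")
    return [
        i for i in range(length)
        if all(seq[i] not in gap_chars for seq in alignment)
    ]


def filter_alignment(alignment, columns):
    """Keep only the selected columns in every aligned sequence."""
    return ["".join(seq[i] for i in columns) for seq in alignment]


# Hypothetical toy alignment; columns 2 and 3 each contain a gap.
toy_alignment = ["MK-LDEAD", "MKALDEAD", "MRA-DEAD"]
kept = gap_free_columns(toy_alignment)
filtered = filter_alignment(toy_alignment, kept)
```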
Fig. 3 Bayesian inference of DEAD-box proteins from Firmicutes. The complete alignment of 348 amino acid sequences from Firmicutes, corresponding to the DEAD and Helicase_C domains without Gblocks filtering, was used in the Bayesian inference (see Fig. 1 and “Materials and methods”). DEAD-box proteins from B. subtilis are marked as reference sequences across the phylogram. The presence of the DbpA domain is shown with a red box on the perimeter. Notably, it occurs in only a few genera, such as Bacillus, Listeria, and Clostridium. Proposed gene duplications are shown in the main clusters as duplication 1 and duplication 2. Interestingly, the Clostridium species appear to diverge earlier than other Firmicutes. The general clusters are shown in different colors, and only posterior probability values above 0.80 are shown. DbpA-RBD-lacking branches next to DbpA-RBD-containing branches are indicated.
Fig. 5 Bayesian inference of γ-Proteobacteria DEAD-box proteins. The complete alignment of 773 amino acid sequences from γ-Proteobacteria, corresponding to the DEAD and Helicase_C domains without Gblocks filtering, was used in the Bayesian inference (see Fig. 1 and “Materials and methods”). DEAD-box proteins from E. coli are marked as reference sequences across the phylogram. Note the defined group containing a DbpA-RBD, which is shown with a red box around the perimeter. Noticeable gene expansion occurred in the Shewanella and Vibrio genera. From the topology, we inferred that the γ-Proteobacteria ancestor possessed both DbpA-RBD-containing and DbpA-RBD-lacking DEAD-box proteins. Genera are depicted with different colors, and only posterior probability values of 0.60 and above are shown. Three DbpA-RBD-lacking branches and two DbpA-RBD-containing branches are indicated.
Distribution of the different classes of DEAD-box proteins in bacterial phyla
| Species | DEAD-box proteins | Protein architecture | | |
|---|---|---|---|---|
| | | DbpA-RBD containing | DbpA-RBD lacking | RhlE-like |
| Deinococcus-Thermus | ||||
| | 1 | 1 | ||
| Firmicutes | ||||
| | 4 | 2 | 2 | |
| | 3 | 2 | ||
| | 3 | 2 | 1 | 1 |
| | 2 | 1 | 1 | |
| | 3 | 2 | 1 | |
| | 5 | 2 | 2 | 1 |
| | 6 | 4 | 2 | |
| | 2 | 1 | 1 | |
| | 4 | 1 | 3 | |
| | 3 | 1 | 2 | |
| | 4 | 1 | 3 | |
| | 3 | 1 | 2 | |
| | 2 | 2 | ||
| | 1 | 1 | ||
| | 2 | 2 | ||
| | 3 | 1 | 2 | |
| | 3 | 3 | ||
| | 3 | 3 | ||
| | 1 | 1 | ||
| | 2 | 2 | ||
| Chloroflexi | ||||
| | 1 | 1 | ||
| | 1 | 1 | ||
| Cyanobacteria | ||||
| | 2 | 1 | 1 | |
| | 1 | 1 | ||
| | 2 | 1 | 1 | |
| | 1 | 1 | ||
| | 1 | 1 | ||
| Actinobacteria | ||||
| | 5 | 5 | ||
| | 4 | 4 | ||
| | 1 | 1 | ||
| | 3 | 3 | ||
| | 2 | 1 | 1 | |
| | 3 | 1 | 2 | |
| | 2 | 1 | 1 | |
| | 1 | 1 | ||
| | 2 | 1 | 1 | |
| | 2 | 2 | ||
| | 2 | 1 | 1 | |
| Chlamydia | ||||
| | 2 | 1 | 1 | |
| Bacteroidetes | ||||
| | 7 | 1 | 3 | 3 |
| | 4 | 1 | 1 | 2 |
| Epsilon-proteobacteria | ||||
| | 1 | 1 | ||
| | 1 | 1 | ||
| | 1 | 1 | ||
| Delta-proteobacteria | ||||
| | 6 | 3 | 3 | |
| | 3 | 1 | 1 | 1 |
| Alpha-proteobacteria | ||||
| | 1 | 1 | ||
| | 3 | 1 | 2 | |
| | 2 | 2 | ||
| | 3 | 1 | 2 | |
| | 2 | 2 | ||
| | 3 | 1 | 2 | |
| | 3 | 1 | 2 | |
| | 3 | 1 | 2 | |
| | 3 | 1 | 2 | |
| Beta-proteobacteria | ||||
| | 5 | 1 | 4 | |
| | 2 | 2 | ||
| | 2 | 2 | ||
| | 4 | 1 | 3 | |
| | 4 | 1 | 3 | |
| | 5 | 1 | 4 | |
| | 4 | 1 | 3 | |
| Gamma-proteobacteria | ||||
| | 3 | 1 | 1 | 1 |
| | 5 | 2 | 1 | 2 |
| | 6 | 2 | 2 | 2 |
| | 3 | 1 | 2 | |
| | 10 | 2 | 2 | 6 |
| | 8 | 2 | 3 | 3 |
| | 10 | 2 | 3 | 4 |
| | 7 | 2 | 3 | 2 |
| | 3 | 1 | 2 | |
| | 5 | 2 | 2 | 1 |
| | 3 | 2 | 1 | |
| | 4 | 1 | 2 | 1 |
| | 4 | 1 | 2 | 1 |
| | 3 | 2 | 1 | |
| | 3 | 1 | 1 | 1 |
| | 4 | 1 | 1 | 2 |
| | 4 | 2 | 2 | |
| | 6 | 2 | 2 | 2 |
| | 4 | 1 | 2 | 1 |
Fig. 6 Conservation of motifs across the DEAD and Helicase_C domains of 1,208 bacterial DEAD-box proteins. Revised motifs obtained from the alignment of 1,208 sequences are compared with the consensus motifs of DEAD-box proteins originally proposed by Rocak and Linder (2004) and Jankowsky and Putnam (2010). A traditional consensus residue format and an HMM-logo description are shown. The N- and C-terminal amino acid length distribution is also shown. Included for comparison are the following selected eukaryotic and archaeal DEAD-box protein sequences: Dm: Drosophila melanogaster (NP_723899.1); Mm: eIF4AII Mus musculus (NP_001116510.1); Sc: Ded1 Saccharomyces cerevisiae (NP_014847.1); and Mj: Methanocaldococcus jannaschii DSM 2661 (NP_247653.1).
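A traditional consensus of the kind mentioned in the caption is derived from per-column residue frequencies. The sketch below shows one simple way to compute it, assuming equal-length aligned sequences, gaps written as `-`, and a majority threshold of 0.5. The threshold, the use of `x` for positions without a dominant residue, and the toy motif alignment are assumptions for illustration, not the authors' procedure.

```python
from collections import Counter


def column_frequencies(alignment, column):
    """Residue frequencies in one alignment column, ignoring gap characters."""
    residues = [seq[column] for seq in alignment if seq[column] != "-"]
    total = len(residues)
    if total == 0:
        return {}
    return {residue: count / total for residue, count in Counter(residues).items()}


def consensus(alignment, threshold=0.5):
    """Most frequent residue per column, or 'x' when none reaches the threshold."""
    result = []
    for column in range(len(alignment[0])):
        freqs = column_frequencies(alignment, column)
        if not freqs:
            result.append("-")  # column made up entirely of gaps
            continue
        residue, freq = max(freqs.items(), key=lambda item: item[1])
        result.append(residue if freq >= threshold else "x")
    return "".join(result)


# Hypothetical toy alignment of a four-residue motif.
toy_motif = ["DEAD", "DEAD", "DEAH", "DECD"]
majority = consensus(toy_motif)               # every column has a residue at >= 0.5
strict = consensus(toy_motif, threshold=0.8)  # columns 3 and 4 fall below 0.8
```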