Jean F Challacombe, Stephanie A Eichorst, Loren Hauser, Miriam Land, Gary Xie, Cheryl R Kuske.
Abstract
Members of the bacterial phylum Acidobacteria are widespread in soils and sediments worldwide, and are abundant in many soils. Acidobacteria are challenging to culture in vitro, and many basic features of their biology and functional roles in the soil have not been determined. Candidatus Solibacter usitatus strain Ellin6076 has a 9.9 Mb genome that is approximately 2-5 times as large as the other sequenced Acidobacteria genomes. Bacterial genome sizes typically range from 0.5 to 10 Mb and are influenced by gene duplication, horizontal gene transfer, gene loss and other evolutionary processes. Our comparative genome analyses indicate that the large Ellin6076 genome has arisen by horizontal gene transfer via ancient bacteriophage and/or plasmid-mediated transduction, and by widespread small-scale gene duplications, resulting in an increased number of paralogs. Low amino acid sequence identities among functional group members, and lack of conserved gene order and orientation in regions containing similar groups of paralogs, suggest that most of the paralogs are not the result of recent duplication events. The genome sizes of additional cultured Acidobacteria strains were estimated using pulsed-field gel electrophoresis to determine the prevalence of the large genome trait within the phylum. Members of subdivision 3 had larger genomes than those of subdivision 1, but none were as large as the Ellin6076 genome. The large genome of Ellin6076 may not be typical of the phylum, and encodes traits that could provide a selective metabolic, defensive and regulatory advantage in the soil environment.
Year: 2011 PMID: 21949776 PMCID: PMC3174227 DOI: 10.1371/journal.pone.0024882
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Figure 1. Circular map of the Ellin6076 (Panel A) and Ellin345 (Panel B) genomes obtained from the IMG system (http://img.doe.gov).
From outside to the center: Circles 1 and 2: forward and reverse strand genes colored by COG categories; Circles 3 and 4: RNA genes (tRNAs green, sRNAs red, other RNAs black); Circles 5 and 6: mobile elements; Circle 7: GC content; Circle 8: GC skew. Colors representing the COG category codes and function definitions: cyan, [A] RNA processing and modification; light lime, [B] Chromatin structure and dynamics; light aqua, [C] Energy production and conversion; pale lavender, [D] Cell cycle control, cell division, chromosome partitioning; light crimson, [E] Amino acid transport and metabolism; light blue green, [F] Nucleotide transport and metabolism; dark pink, [G] Carbohydrate transport and metabolism; teal, [H] Coenzyme transport and metabolism; violet blue, [I] Lipid transport and metabolism; violet, [J] Translation, ribosomal structure and biogenesis; light olive, [K] Transcription; yellow, [L] Replication, recombination and repair; light brown, [M] Cell wall/membrane/envelope biogenesis; light pink, [N] Cell motility; light green, [O] Posttranslational modification, protein turnover, chaperones; orange, [P] Inorganic ion transport and metabolism; lime, [Q] Secondary metabolites biosynthesis, transport and catabolism; purple, [R] General function prediction only; aqua, [S] Function unknown; brown, [T] Signal transduction mechanisms; light blue, [U] Intracellular trafficking, secretion and vesicular transport; baby blue, [V] Defense mechanisms; lavender, [W] Extracellular structures; light red, [Y] Nuclear structure; lime green, [Z] Cytoskeleton.
Paralogs in larger-smaller genome pairs#.
| Classification | Large genome | Size (Mb) | # genes | # paralogs (% of genes) | Small genome | Size (Mb) | # genes | # paralogs (% of genes) |
| Acidobacteria | Ellin6076 | 9.97 | 8002 | 5426 (67.8%) | Ellin345 | 5.65 | 4834 | 2543 (52.6%) |
| Alpha proteobacteria | | 9.21 | 10334 | 6113 (59.2%) | | 4.97 | 4667 | 2428 (52.0%) |
| Alpha proteobacteria | | 7.6 | 7352 | 4396 (59.8%) | | 4.94 | 4686 | 2606 (55.6%) |
| Beta proteobacteria | | 7.42 | 6702 | 4395 (65.6%) | | 5.9 | 4418 | 2267 (51.3%) |
| Gamma proteobacteria | | 7.22 | 6862 | 3242 (47.2%) | | 3.87 | 3735 | 1721 (46.1%) |
| Actinobacteria | | 6.99 | 6925 | 4535 (65.5%) | | 3.27 | 2752 | 490 (17.8%) |
Data from the Integrated Microbial Genomes (IMG) System; http://img.jgi.doe.gov.
*Draft genome data.
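The percentage column in the table above can be re-derived from the gene and paralog counts. The following is a minimal Python sketch of that arithmetic, using the counts for the Acidobacteria pair; the function name `paralog_percent` and the variable names are illustrative and do not come from the paper.

```python
# Illustrative check: paralog percentage = paralogs / total genes * 100.
# Counts are copied from the table above; all names here are hypothetical.

def paralog_percent(n_paralogs: int, n_genes: int) -> float:
    """Return paralogs as a percentage of all genes, rounded to 1 decimal."""
    if n_genes <= 0:
        raise ValueError("n_genes must be positive")
    return round(100.0 * n_paralogs / n_genes, 1)

# (strain, total genes, paralogs) for the Acidobacteria pair
acidobacteria_pair = [
    ("Ellin6076", 8002, 5426),
    ("Ellin345", 4834, 2543),
]

for strain, genes, paralogs in acidobacteria_pair:
    print(f"{strain}: {paralog_percent(paralogs, genes)}% of genes are paralogs")
```

The same function applies to every other row, since each percentage is simply the paralog count divided by the gene count of that genome.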
Distribution of genes in COG categories for Acidobacteria strains Ellin6076 and Ellin345.
| COG CATEGORY | Strain Ellin 6076 | Strain Ellin 345 | Fold Increase+ |
| (L) Replication, recombination, repair | | | |
| Site-specific recombinase XerD | 32 | 6 | 5.3 |
| Transposase and inactivated derivatives | 64 | 10 | 6.4 |
| (M) Cell wall/membrane biogenesis | | | |
| 4-amino-4-deoxy-L-arabinose transferase and related | 26 | 7 | 3.7 |
| L-alanine-DL-glutamate epimerase and related | 25 | 4 | 6.3 |
| Periplasmic protease | 13 | 2 | 6.5 |
| Endopolygalacturonase | 11 | 0 | >11.0 |
| Dihydrodipicolinate synthase/N-acetylneuraminate lyase | 9 | 2 | 4.5 |
| ABC-type polysaccharide/polyol phosphate export system, permease | 6 | 0 | >6.0 |
| ABC-type polysaccharide/polyol phosphate export system, ATPase | 6 | 0 | >6.0 |
| Membrane proteins related to metalloendopeptidases | 5 | 1 | 5.0 |
| Sortase | 6 | 1 | 6.0 |
| (T) Signal transduction mechanisms | | | |
| Antirepressor regulating drug resistance, signal transduction comp. | 12 | 3 | 4.0 |
| Bacteriophytochrome | 9 | 2 | 4.5 |
| (U) Intracellular trafficking, secretion | | | |
| Flp pilus assembly protein, ATPase CpaE | 4 | 0 | >4.0 |
| Flp pilus assembly protein, ATPase CpaF | 4 | 0 | >4.0 |
| (C) Energy production, conversion | | | |
| FAD/FMN-containing dehydrogenases | 8 | 2 | 4.0 |
| Carbon dioxide conc. mechanism/carboxysome shell proteins | 9 | 0 | >9.0 |
| FOG:HEAT repeat | 6 | 0 | >6.0 |
| Rieske Fe-S protein | 5 | 0 | >5.0 |
| Predicted acetamidase/formamidase | 4 | 0 | >4.0 |
| Cytochrome b subunit | 4 | 0 | >4.0 |
| (E) Amino acid transport, metabolism | | | |
| Lysophospholipase L1 and related esterases | 16 | 1 | 16.0 |
| Dihydrodipicolinate synthase/N-acetylneuraminate lyase | 9 | 2 | 4.5 |
| Choline dehydrogenase and related | 8 | 2 | 4.0 |
| Spermidine synthase | 5 | 0 | >5.0 |
| Asparagine synthase (glutamate hydrolyzing) | 4 | 0 | >4.0 |
| (G) Carbohydrate transport, metabolism | | | |
| Sugar phosphate isomerases/epimerases | 41 | 6 | 6.8 |
| Glucose dehydrogenase | 26 | 0 | >26.0 |
| Gluconolactonase | 13 | 2 | 6.5 |
| Alpha-L-arabinofuranosidase | 8 | 2 | 4.0 |
| Alpha-L-fucosidase | 7 | 1 | 7.0 |
| Glucose/sorbosone dehydrogenases | 6 | 1 | 6.0 |
| ABC-type polysaccharide/polyol phosphate export system, permease | 6 | 0 | >6.0 |
| ABC-type polysaccharide/polyol phosphate export system, ATPase | 6 | 0 | >6.0 |
| 2,4-dihydroxyhept-2-ene-1,7-dioic acid aldolase | 5 | 0 | >5.0 |
| Beta-xylosidase | 4 | 0 | >4.0 |
| Beta-galactosidase | 4 | 1 | 4.0 |
| (H) Coenzyme transport, metabolism | | | |
| 2-polyprenyl-3-methyl-5-hydroxy-6-metoxy-1,4-benzoquinol methylase | 18 | 4 | 4.5 |
| Demethylmenaquinone methyltransferase | 7 | 0 | >7.0 |
| (I) Lipid transport, metabolism | | | |
| Carboxylesterase type B | 9 | 0 | >9.0 |
| (P) Inorganic ion transport, metabolism | | | |
| Arylsulfatase A and related enzymes | 18 | 0 | >18.0 |
| Enterochelin esterase and related enzymes | 16 | 2 | 8.0 |
| Cytochrome c peroxidase | 6 | 0 | >6.0 |
| (Q) Secondary metabolites | | | |
| Dienelactone hydrolase and related enzymes | 8 | 1 | 8.0 |
| Carbon dioxide conc. mechanism/carboxysome shell proteins | 9 | 0 | >9.0 |
| Protein involved in biosynthesis of mitomycin antibiotics/fumonisin | 4 | 0 | >4.0 |
| Predicted enzyme involved in methoxymalonyl-ACP biosynthesis | 4 | 1 | 4.0 |
+ Not normalized for genome size.
Only categories with four-fold or greater differences are shown.
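The Fold Increase column is the ratio of the Ellin6076 count to the Ellin345 count, with a lower bound (">N.0") reported where the Ellin345 count is zero. The sketch below reproduces that convention in Python. The function names are illustrative, the round-half-up rule is an assumption inferred from the table values (for example 25/4 shown as 6.3), and `size_normalized_fold` is a hypothetical normalization by genome size (9.97 and 5.65 Mb, from the paralog table) that the paper does not report.

```python
from decimal import Decimal, ROUND_HALF_UP

def fold_increase(count_6076: int, count_345: int) -> str:
    """Format the Ellin6076/Ellin345 count ratio the way the table does."""
    if count_345 == 0:
        # Ratio is undefined; report the Ellin6076 count as a lower bound.
        return f">{count_6076}.0"
    ratio = Decimal(count_6076) / Decimal(count_345)
    # Assumed rounding rule: half up to one decimal place (6.25 -> 6.3).
    return str(ratio.quantize(Decimal("0.1"), rounding=ROUND_HALF_UP))

def size_normalized_fold(count_6076: int, count_345: int,
                         mb_6076: float = 9.97, mb_345: float = 5.65) -> float:
    """Hypothetical normalization (not in the paper): ratio of genes per Mb."""
    if count_345 == 0:
        raise ValueError("undefined when the Ellin345 count is 0")
    return (count_6076 / mb_6076) / (count_345 / mb_345)

# A few rows from the table: (COG function, Ellin6076 count, Ellin345 count)
rows = [
    ("Site-specific recombinase XerD", 32, 6),
    ("L-alanine-DL-glutamate epimerase and related", 25, 4),
    ("Endopolygalacturonase", 11, 0),
]

for name, n_6076, n_345 in rows:
    print(f"{name}: fold increase {fold_increase(n_6076, n_345)}")
```

Because Ellin6076 has the larger genome, a size-normalized ratio would be smaller than the raw fold increase for every row, which is why the footnote about normalization matters when reading the table.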
Results of codon-based test of positive selection, averaging over all sequence pairs.
| Functional definition | Identifier of first sequence | dN-dS Stat from test of dN>dS (positive selection) | Probability |
| serine/threonine protein kinase | YP_821325 | −6.478 | 1.000 |
| two-component transcriptional regulator, winged helix family | YP_821372 | −6.116 | 1.000 |
| ABC transporter-related | YP_821380 | 0.218 | 0.414 |
| Carboxylesterase, type B | YP_821393 | 0.215 | 0.415 |
| Transcriptional repressor, CopY family | YP_821398 | 0.225 | 0.411 |
| Drug resistance transporter, EmrB/QacA subfamily | YP_821403 | 0.217 | 0.414 |
| TonB-dependent receptor, plug 1 | YP_821405 | −1.960 | 1.000 |
| TonB-dependent receptor, plug 2 | YP_821493 | −0.614 | 1.000 |
| anti-sigma factor antagonist | YP_821407 | −1.275 | 1.000 |
| RNA polymerase, sigma-24 subunit, ECF subfamily | YP_821437 | −0.670 | 1.000 |
| phage tail collar domain protein | YP_821449 | 3.703 | 0.000 |
| oxidoreductase domain protein | YP_821473 | −0.898 | 1.000 |
| von Willebrand factor, type A | YP_821474 | −0.583 | 1.000 |
| acetolactate synthase, large subunit, biosynthetic type | YP_821479 | −1.024 | 1.000 |
| CnaB-type protein | YP_821495 | 1.083 | 0.140 |
| ASPIC/UnbV domain protein | YP_821513 | −2.277 | 1.000 |
| glycosyl transferase, family 2 | YP_821582 | −1.069 | 1.000 |
| NAD-dependent epimerase/dehydratase | YP_821583 | −1.561 | 1.000 |
| aldo/keto reductase | YP_821684 | 0.914 | 0.181 |
| phage integrase family | YP_821644 | −1.395 | 1.000 |
| phage integrase family | YP_821919 | 1.873 | 0.032 |
| phage integrase family | YP_821920 | −0.293 | 1.000 |
| phage integrase family | YP_821921 | −2.001 | 1.000 |
| integrase, catalytic region | YP_821924 | −1.212 | 1.000 |
| integrase catalytic region | YP_821733 | −0.023 | 1.000 |
| transposase IS3/IS911 family | YP_821734 | −0.783 | 1.000 |
| transposase IS3/IS911 family | YP_821923 | −0.824 | 1.000 |
*Some of the pairwise comparisons (for example, see Tables S5, S6 and S7) showed significant values (probability less than 0.05, indicating positive selection). These significant values are reflected in higher overall average values of the Z statistic and lower probability values. Representative paralog groups were included in this analysis. The identifier of the first sequence is shown in the table; the remaining paralogs in each group were selected based on the criteria outlined in the Methods section. The Probability column shows the probability of rejecting the null hypothesis of strict neutrality (dN = dS) in favor of the alternative hypothesis (dN > dS). Probability values less than 0.05 are considered significant at the 5% level. The Z statistic (dN - dS) is shown in the Stat column. dS and dN are the numbers of synonymous and nonsynonymous substitutions per site, respectively.
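The relationship between the Stat and Probability columns can be illustrated with a one-sided Z-test. The sketch below assumes the usual form of the codon-based test statistic, Z = (dN - dS) / sqrt(Var(dN) + Var(dS)), and a standard normal null distribution. Both are assumptions made for illustration: the source does not state how the variances or probabilities were computed, so values obtained this way may differ from the table in the third decimal place. The function names are hypothetical.

```python
import math

def z_statistic(dn: float, ds: float, var_dn: float, var_ds: float) -> float:
    """Assumed form: Z = (dN - dS) / sqrt(Var(dN) + Var(dS))."""
    combined_var = var_dn + var_ds
    if combined_var <= 0.0:
        raise ValueError("combined variance must be positive")
    return (dn - ds) / math.sqrt(combined_var)

def p_positive_selection(z: float) -> float:
    """One-sided P(Z >= z) under a standard normal null (dN = dS).

    Small values support the alternative dN > dS; negative Z values
    (dN < dS) give probabilities close to 1.
    """
    return 0.5 * math.erfc(z / math.sqrt(2.0))

# Z values copied from the Stat column of the table above.
examples = [
    ("phage tail collar domain protein", 3.703),
    ("ABC transporter-related", 0.218),
    ("serine/threonine protein kinase", -6.478),
]

for label, z in examples:
    p = p_positive_selection(z)
    verdict = "significant" if p < 0.05 else "not significant"
    print(f"{label}: Z = {z:+.3f}, p ~ {p:.3f} ({verdict} at the 5% level)")
```

This makes the pattern in the table easier to read: strongly negative Z values correspond to probabilities of 1.000 (no evidence for dN > dS), while only large positive Z values fall below the 0.05 threshold.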
Results from substitution saturation analysis.
| Paralog group (accession of first sequence) | P(invariant) | Saturation test result | Comment |
| serine/threonine protein kinase (YP_821325) | 0.11516 | little | |
| carboxylesterase, type B (YP_821393) | 0.01800 | little | |
| acetolactate synthase, large subunit, biosynthetic type (YP_821479) | 0.03254 | little | |
| phage integrase family (YP_821919) | 0.16027 | little | |
| two component transcriptional regulator (YP_821972) | 0.00047 | substantial | |
| ABC transporter-related (YP_821380) | 0.00024 | substantial | |
| transcriptional repressor, CopY family (YP_821398) | 0.00059 | substantial | |
| anti-sigma factor antagonist (YP_821407) | 0.02566 | substantial | |
| RNA polymerase, sigma-24 subunit, ECF subfamily (YP_821437) | 0.01145 | substantial | |
| oxidoreductase domain protein (YP_821473) | 0.01711 | substantial | |
| TonB-dependent receptor, plug (YP_821493) | 0.00 | substantial | |
| CnaB-type protein (YP_821495) | 0.00 | substantial | |
| ASPIC/UnbV domain protein (YP_821513) | 0.00414 | substantial | |
| glycosyltransferase, family 2 (YP_821582) | 0.02855 | substantial | |
| NAD-dependent epimerase/dehydratase (YP_821583) | 0.00208 | substantial | |
| phage integrase family (YP_821644) | 0.00087 | substantial | |
| aldo/keto reductase (YP_821684) | 0.00 | substantial | |
| drug resistance transporter, EmrB/QacA family (YP_821403) | ND | ND | too few sequences |
| TonB-dependent receptor, plug (YP_821405) | ND | ND | too divergent |
| phage tail collar protein (YP_821449) | ND | ND | too few sequences |
| von Willebrand factor, type A (YP_821474) | ND | ND | too divergent |
| phage integrase family (YP_821920) | ND | ND | too few sequences |
| phage integrase family (YP_821921) | ND | ND | too few sequences |
| transposase, IS3/IS911 family (YP_821923) | ND | ND | too few sequences |
| integrase catalytic region (YP_821924) | ND | ND | too few sequences |
| integrase catalytic region (YP_823733) | ND | ND | too few sequences |
| transposase, IS3/IS911 family (YP_823734) | ND | ND | too few sequences |
“little” means that the test showed little substitution saturation in the group of sequences. “substantial” indicates that there was substantial substitution saturation. ND, not determined because there were either too few sequences to test, or the sequences were too divergent.
*Indicates sequences that were too divergent to be useful for phylogenetic analyses.
Figure 2. Maximum-likelihood tree of the Acidobacteria subdivisions 1 and 3 (indicated to the right of the group) based on the 16S rRNA gene using sequences obtained from cultivated representatives and environmental clones.
Geothrix fermentans, Holophaga foetida and Acanthopleuribacter pedis of subdivision 8 were used as an outgroup (not shown). Strains for which the genome size has been determined are highlighted in bold typeface. Internal nodes supported by a bootstrap value of >95% are indicated with a filled circle, and those supported by a value of >70% with an open circle. The scale bar indicates 0.10 changes per nucleotide.