Chris L Dupont, Douglas B Rusch, Shibu Yooseph, Mary-Jane Lombardo, R Alexander Richter, Ruben Valas, Mark Novotny, Joyclyn Yee-Greenbaum, Jeremy D Selengut, Dan H Haft, Aaron L Halpern, Roger S Lasken, Kenneth Nealson, Robert Friedman, J Craig Venter.
Abstract
Bacteria in the 16S rRNA clade SAR86 are among the most abundant uncultivated constituents of microbial assemblages in the surface ocean for which little genomic information is currently available. Bioinformatic techniques were used to assemble two nearly complete genomes from marine metagenomes and single-cell sequencing provided two more partial genomes. Recruitment of metagenomic data shows that these SAR86 genomes substantially increase our knowledge of non-photosynthetic bacteria in the surface ocean. Phylogenomic analyses establish SAR86 as a basal and divergent lineage of γ-proteobacteria, and the individual genomes display a temperature-dependent distribution. Modestly sized at 1.25-1.7 Mbp, the SAR86 genomes lack several pathways for amino-acid and vitamin synthesis as well as sulfate reduction, trends commonly observed in other abundant marine microbes. SAR86 appears to be an aerobic chemoheterotroph with the potential for proteorhodopsin-based ATP generation, though the apparent lack of a retinal biosynthesis pathway may require it to scavenge exogenously derived pigments to utilize proteorhodopsin. The genomes contain an expanded capacity for the degradation of lipids and carbohydrates acquired using a wealth of tonB-dependent outer membrane receptors. Like the abundant planktonic marine bacterial clade SAR11, SAR86 exhibits metabolic streamlining, but also a distinct carbon compound specialization, possibly avoiding competition.
Year: 2011 PMID: 22170421 PMCID: PMC3358033 DOI: 10.1038/ismej.2011.189
Source DB: PubMed Journal: ISME J ISSN: 1751-7362 Impact factor: 10.302
Genomic characteristics of SAR11 (Pelagibacteraceae) and SAR86
| Characteristic | SAR11 | SAR11 | SAR11 | SAR86 | SAR86 | SAR86 (partial) | SAR86 (partial) |
| Size (Mbp) | 1.309 | 1.457 | 1.328 | 1.25 | 1.7 | 0.75 | 0.925 |
| ORFs | 1389 | 1478 | 1423 | 1316 | 1712 | 859 | 1111 |
| %GC | 29.7 | 29 | 29 | 32.8 | 32.6 | 31.2 | 30.1 |
| %Complete (core gene count) | 97.2 (104) | 98.1 (105) | 97.2 (104) | 92.5 (99) | 93.4 (100) | 54.2 (58) | 48.6 (52) |
| B6 | No | No | No | No | No | ||
| B12 | No | No | No | No | Yes | ||
| Thiamine | No | No | No | No | No | ||
| Carotene/retinal/retinol | Yes | Yes | Yes | No | No | ||
| Folate | Yes | Yes | Yes | Yes | Yes | ||
| Biotin | No | No | No | No | No | ||
| Pantothenate | No | No | No | No | No | ||
| Glycolysis (EMP) | ED | No | ED | EMP | EMP | ||
| Pentose phosphate | No | No | No | Transitive | Full | ||
| Lipases | 0 | 0 | 0 | 9 | 7 | 1 | 1 |
| Acyl CoA synthases | 2 | 1 | 2 | 2 | 2 | 6 | 1 |
| Acyl CoA dehydrogenase | 1 | 1 | 1 | 12 | 13 | 8 | 8 |
| Enoyl CoA hydratase | 1 | 0 | 1 | 6 | 9 | 5 | 6 |
| Ehhadh | 1 | 2 | 1 | 1 | 1 | 0 | 1 |
| Ketoacyl-CoA thiolase | 0 | 1 | 0 | 2 | 2 | 1 | 4 |
| ABC transporters: import (total) | 19 (24) | 19 (24) | 19 (24) | 2 (9) | 2 (14) | 3 | 3 |
| TonB receptors | 0 | 0 | 0 | 19 | 33 | 19 | 13 |
| Beta lactamases | 2 | 3 | 2 | 5 | 8 | 2 | 6 |
| Nitropropane dioxygenases | 0 | 0 | 0 | 2 | 2 | 2 | 1 |
| Nitroreductases | 0 | 0 | 0 | 2 | 2 | 0 | 7 |
| Macrolide efflux | No | No | No | Yes | Yes | Yes | Yes |
Abbreviations: ED, Entner-Doudoroff; Ehhadh, enoyl CoA-hydratase/3-hydroxyacyl CoA dehydrogenase; EMP, Embden-Meyerhof-Parnas; ORF, open reading frame; %GC, percentage of the genome that is guanine-cytosine.
Incomplete genomes not used for pathway analysis.
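The "%Complete (core gene count)" row can be read as the fraction of a fixed core gene set detected in each genome. The sketch below assumes the denominator is the 107 core genes mentioned in the Figure 5 caption; the function name and constant are illustrative, not taken from the paper. Under that assumption the computed percentages agree with the tabulated values to within 0.1 percentage points.

```python
# Assumption: completeness = detected core genes / 107 core genes.
# The value 107 comes from the Figure 5 caption; the table itself
# does not state the denominator.
CORE_GENE_TOTAL = 107


def percent_complete(core_genes_found: int, total: int = CORE_GENE_TOTAL) -> float:
    """Return genome completeness as a percentage of the core gene set."""
    if not 0 <= core_genes_found <= total:
        raise ValueError("core gene count must lie between 0 and the core set size")
    return 100.0 * core_genes_found / total


# Core gene counts from the table, in column order.
core_gene_counts = [104, 105, 104, 99, 100, 58, 52]
for count in core_gene_counts:
    print(f"{count} core genes: {percent_complete(count):.1f}% complete")
```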
Figure 1. Recruitment of Global Ocean Sampling metagenomic data to the SAR86 assembled genomes. In the top panels, only the metagenomic reads that aligned best to the SAR86A or SAR86B reference (to the exclusion of all complete and draft microbial and viral genomes available at NCBI and each other) are shown as a dot whose color is determined by the metagenomic sample from which it was identified. The pattern of recruitment of reads at greater than 90% across the entire genome (note that artificial gaps have been introduced between any two scaffolds) is largely consistent over the length of all scaffolds, and is qualitatively similar to recruitment plots seen for complete genomes (Rusch ). In the middle panels, mate-pairing information is presented to indicate the wealth of information supporting the orientation of the assembled scaffolds. In these plots, lines have been drawn connecting two good mated reads, where good is defined as mates that are in the correct orientation and separated by no more and no less than two standard deviations from the expected insert distance. The ends of the lines indicate the percent identity of each mate to the reference genome. The bottom panels indicate the number of good mate pairs found for each base pair of the genome assembly (coverage). Note that the gaps between contigs will reduce the number of good mate pairs, driving down coverage for the SAR86B genome. Based on our understanding of the Celera Assembler, we interpret these plots as indicating that these assemblies are the best-supported layout of the contigs given the available information. As these assemblies represent data from many different cells, it is possible that there are other valid layouts that are less prevalent in the data.
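The "good mate" criterion and the per-base mate-pair coverage described in this caption can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the `MatePair` record, the function names, and the reading of "correct orientation" as a forward-strand read followed by a reverse-strand read are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class MatePair:
    """Hypothetical record for a pair of mated reads aligned to a reference."""
    left_pos: int       # 0-based reference coordinate where the leftmost read starts
    right_pos: int      # reference coordinate just past the end of the rightmost read
    left_strand: str    # '+' or '-'
    right_strand: str   # '+' or '-'


def is_good_mate(pair: MatePair, expected_insert: float,
                 insert_sd: float, n_sd: float = 2.0) -> bool:
    """Good = correct orientation and separation within n_sd SDs of the expected insert."""
    # Assumption: "correct orientation" means the two reads face each other.
    correctly_oriented = pair.left_strand == "+" and pair.right_strand == "-"
    separation = pair.right_pos - pair.left_pos
    within_range = abs(separation - expected_insert) <= n_sd * insert_sd
    return correctly_oriented and within_range


def good_mate_coverage(pairs: Iterable[MatePair], genome_length: int,
                       expected_insert: float, insert_sd: float) -> List[int]:
    """Count, for every base, how many good mate pairs span it."""
    # Difference array: +1 where a good pair starts, -1 where it ends.
    # Coordinates are assumed to satisfy 0 <= left_pos < right_pos <= genome_length.
    diff = [0] * (genome_length + 1)
    for pair in pairs:
        if is_good_mate(pair, expected_insert, insert_sd):
            diff[pair.left_pos] += 1
            diff[pair.right_pos] -= 1
    coverage = []
    running = 0
    for delta in diff[:genome_length]:
        running += delta
        coverage.append(running)
    return coverage
```

A gap between contigs removes the pairs that would have spanned it, so the running count drops there, which is the effect the caption notes for SAR86B.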
Figure 2. Phylogeny and emergent ecotypes of SAR86. (a) 16S rRNA RAxML phylogeny of the four SAR86 genome assemblies presented here, several BACs and fosmids, and the closest marine microbial genomes. Node values are bootstrap support for 100 iterations. (b) A maximum likelihood phylogeny of seven concatenated proteins found in nearly all γ-proteobacterial genomes and the four SAR86 genomes. Here, node values are approximate likelihood ratio test support for each branch point (values below 0.5 were removed). These values provide the probability (from 0–1) that such a branch point exists in the real tree.
The most abundant genomes in the GOS data set
| | 163 465 | 192 515 | 18 |
| | 119 804 | 145 096 | 21 |
| | 48 213 | 70 083 | 45 |
| | 46 549 | 93 811 | 102 |
| | 28 811 | 1 128 240 | 3816 |
| SAR86A | 27 391 | 200 708 | 633 |
| | 26 071 | 35 269 | 35 |
| | 22 236 | 382 680 | 1621 |
| | 20 901 | 373 189 | 1686 |
| | 17 732 | 29 402 | 66 |
| | 9033 | 36 462 | 304 |
| SAR86B | 3579 | 84 868 | 2271 |
| Recruited by top 12 genomes | 5.30% | 27.90% | |
| Recruited by all the genomes ( | 5.60% | 35.20% | |
| Recruited by SAR86 | 0.31% | 2.80% | |
Abbreviation: GOS, Global Ocean Sampling.
Fragment recruitment was used to determine which sequenced microbial genomes recruit the most GOS metagenomic data. The dataset includes 10 073 000 Sanger reads and all available genomes at NCBI were used for the analysis. A best BLAST hit approach was used, that is, counts reflect only the best matches.
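A best-hit recruitment count of this kind can be sketched as below. The input format (tuples of read ID, genome name, and score), the use of a single numeric score to rank hits, and the function names are assumptions for illustration; the source does not specify which BLAST statistic defined the best match.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple

# Hypothetical hit record: (read_id, genome_name, score), higher score = better match.
Hit = Tuple[str, str, float]


def best_hit_counts(hits: Iterable[Hit]) -> Counter:
    """Assign each read to the genome of its single best-scoring hit and count per genome."""
    best: Dict[str, Tuple[str, float]] = {}
    for read_id, genome, score in hits:
        # Keep only the top-scoring hit per read; on ties the first hit seen wins.
        if read_id not in best or score > best[read_id][1]:
            best[read_id] = (genome, score)
    return Counter(genome for genome, _ in best.values())


def percent_recruited(counts: Counter, total_reads: int) -> float:
    """Percentage of all reads in the dataset that were recruited by any genome."""
    return 100.0 * sum(counts.values()) / total_reads


example_hits = [
    ("read1", "SAR86A", 200.0),
    ("read1", "other_genome", 150.0),  # lower score, so read1 counts only for SAR86A
    ("read2", "other_genome", 90.0),
    ("read3", "SAR86A", 50.0),
    ("read3", "SAR86B", 75.0),
]
counts = best_hit_counts(example_hits)
print(counts)
print(percent_recruited(counts, total_reads=10))
```

Because each read contributes to exactly one genome, the per-genome counts can be summed without double counting, which is what makes percentages such as those in the last rows of the table well defined.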
Figure 3. Emergent ecotypes of SAR86: recruitment of metagenomic reads at 90% nucleotide identity to each SAR86 assembly is shown for 73 GOS metagenomes. Also shown is a trace of the seawater temperature measured at the time of sampling and the geographic region. The complete dataset is available in Supplementary Table S2.
Figure 4. Metabolism of SAR86. A simplified and non-comprehensive schematic showing the salient metabolic features of SAR86 discussed in the text and determined by manual curation. Metabolic pathways where the genes are missing from SAR11 genomes are shown in red. DMSP, dimethylsulfoniopropionate; GBT, glycine betaine; GPX, glutathione peroxidase; GSH, glutathione.
Figure 5. SAR86 genome and population diversity of carbon assimilation. (a) Alignments of orthologous regions of the SAR86A and B genomes are shown to scale. Pink lines join the orthologs from each genome. The small green 'genes' and connecting green lines indicate the position of the non-coding RNA found in metatranscriptomic datasets. An alignment of this small RNA is shown in Supplementary Figure S5. (b) The abundances of the genes found in both genomes (glycolysis or glucose uptake) or just the SAR86B genome across 73 GOS metagenomes. The shared genes are clearly conserved, yet many genomes in natural populations lack the oxidative arm of the pentose phosphate pathway or the genes involved in pyruvate metabolism found in only SAR86B. The units on the axes are the number of reciprocal best BLAST hits to either 107 core genes (x axis) or the genes shown in the left panel (y axis), followed by normalization for the number of genes in that category.
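The normalization in panel (b) amounts to dividing a hit count by the size of the gene category, so that categories of different sizes can be compared on the same scale. A minimal sketch, with a hypothetical function name and made-up example counts (only the core set size of 107 comes from the caption):

```python
def normalized_abundance(reciprocal_best_hits: int, genes_in_category: int) -> float:
    """Reciprocal best BLAST hits per gene in the category."""
    if genes_in_category <= 0:
        raise ValueError("a category must contain at least one gene")
    return reciprocal_best_hits / genes_in_category


# Made-up counts for one metagenome, for illustration only.
core_abundance = normalized_abundance(reciprocal_best_hits=214, genes_in_category=107)
pathway_abundance = normalized_abundance(reciprocal_best_hits=5, genes_in_category=5)

# A ratio well below 1 would suggest that many genomes in the population
# lack the pathway genes relative to the core genes.
ratio = pathway_abundance / core_abundance
print(core_abundance, pathway_abundance, ratio)
```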