Daniel R Mende, Frank O Aylward, John M Eppley, Torben N Nielsen, Edward F DeLong.
Abstract
Assembling complete or near-complete genomes from complex microbial communities remains a significant challenge in metagenomic studies. Recent developments in single-cell amplified genomes (SAGs) have enabled the sequencing of individual draft genomes representative of uncultivated microbial populations. SAGs suffer from incomplete and uneven coverage due to artifacts that arise from multiple displacement amplification techniques. Conversely, metagenomic sequence data do not suffer from the same biases as SAGs, and significant improvements have been realized in the recovery of draft genomes from metagenomes. Nevertheless, the inherent genomic complexity of many microbial communities often impedes the generation of population genome assemblies from metagenomic data. Here we describe a new method for metagenomic-guided SAG assembly that leverages the advantages of both methods and significantly improves the completeness of initial SAG assemblies. We demonstrate that SAG assemblies of two cosmopolitan marine lineages (Marine Group 1 Thaumarchaeota and SAR324 clade bacterioplankton) were substantially improved using this approach. Moreover, the improved assemblies strengthened biological inferences. For example, the improved SAR324 clade genome assembly revealed the presence of many genes in phenylalanine catabolism and flagellar assembly that were absent in the original SAG.
Keywords: SAGs; genome assembly; metagenomics; microbial oceanography; single-cell genomics
Year: 2016 PMID: 26904016 PMCID: PMC4749706 DOI: 10.3389/fmicb.2016.00143
Source DB: PubMed Journal: Front Microbiol ISSN: 1664-302X Impact factor: 5.640
Comparison of assembly statistics of the original single-cell amplified genomes (SAGs) and the iSAGs presented here for SAR324 and MGI Thaumarchaeota.
| Statistic | SAR324 original SAG | SAR324 iSAG | MGI Thaumarchaeota original SAG | MGI Thaumarchaeota iSAG |
|---|---|---|---|---|
| Completeness (%) | 43.67 | 65.78 | 96.88 | 96.88 |
| Genome size (bp) | 2,264,488 | 2,379,063 | 1,104,470 | 1,093,884 |
| #Contigs | 672 | 13 | 32 | 4 |
| N50 (contigs) | 22,317 | 191,983 | 79,020 | 313,273 |
| Longest contig (bp) | 94,006 | 354,247 | 217,386 | 319,413 |
| GC (%) | 41.49 | 42.59 | 35.66 | 35.61 |
| GC std (contigs > 1 kbp) | 3.98 | 0.76 | 2.1 | 0.75 |
| Coding density (%) | 87.03 | 89.67 | 92.87 | 93.11 |
| #Predicted genes | 2,533 | 2,137 | 1,356 | 1,298 |
| #Complete genes | 1,827 | 2,120 | 1,307 | 1,290 |
| #Missing marker genes | 84 | 50 | 4 | 4 |
| #Marker genes in single copy | 104 | 141 | 138 | 142 |
| #Marker genes found multiple times | 3 | 0 | 4 | 0 |
| Contamination (CheckM) (%) | 0.64 | 0 | 2.07 | 0 |
| Contamination (ProDeGe) (% contigs) | 89.88 | 0 | 40.63 | 0 |
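
For readers less familiar with the contiguity metric in the table above: N50 is conventionally the contig length at which contigs of that length or longer together contain at least half of the assembled bases. The sketch below illustrates that standard definition only; the function name `n50` and the contig lengths are hypothetical and are not taken from this study, and the source does not state which tool computed its N50 values.

```python
def n50(contig_lengths):
    """Return the N50 (in bp) for a list of contig lengths.

    Contigs are visited from longest to shortest; the N50 is the length of
    the contig at which the running total first reaches half of the
    assembly size.
    """
    if not contig_lengths:
        raise ValueError("need at least one contig length")
    total = sum(contig_lengths)
    running = 0
    for length in sorted(contig_lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length


# Hypothetical contig lengths, for illustration only (not data from the paper).
example_lengths = [100, 80, 60, 40, 20]
print(n50(example_lengths))  # 100 + 80 = 180 >= 150, so this prints 80
```

A higher N50 with fewer contigs, as in the iSAG columns, indicates that more of the assembly sits in long contiguous sequences.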
Comparison of KEGG Orthology protein annotations of the original SAR324 SAG and the iSAG presented in this paper.
| Pathway | SAR324 SAG | SAR324 iSAG | New genes identified |
|---|---|---|---|
| Biosynthesis of amino acids | 76 | 92 | 16 |
| Phenylalanine metabolism | 7 | 21 | 14 |
| Flagellar assembly | 13 | 23 | 10 |
| Carbon metabolism | 59 | 68 | 9 |
| Two-component system | 23 | 32 | 9 |
| Oxidative phosphorylation | 21 | 29 | 8 |
| Purine metabolism | 40 | 48 | 8 |
| Cysteine and methionine metabolism | 15 | 23 | 8 |
| Ribosome | 31 | 39 | 8 |
| Glyoxylate and dicarboxylate metabolism | 14 | 21 | 7 |
| Carbon fixation pathways in prokaryotes | 16 | 23 | 7 |
| Aminoacyl-tRNA biosynthesis | 16 | 23 | 7 |
| 2-Oxocarboxylic acid metabolism | 12 | 18 | 6 |
| Pyrimidine metabolism | 26 | 32 | 6 |
| Phenylalanine, tyrosine and tryptophan biosynthesis | 15 | 21 | 6 |
| Folate biosynthesis | 4 | 10 | 6 |
| Glycerophospholipid metabolism | 6 | 11 | 5 |
| Glycine, serine and threonine metabolism | 24 | 29 | 5 |
| Ubiquinone and terpenoid-quinone biosynthesis | 4 | 9 | 5 |
| Propanoate metabolism | 9 | 13 | 4 |
| Protein-coding genes with KO annotations | 964 | 1,173 | 209 |
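
The "New genes identified" column appears to be the iSAG count minus the original SAG count for each pathway. A minimal check of that reading, using a few rows copied from the table above (the helper name `new_genes` is hypothetical):

```python
# (pathway, SAG count, iSAG count, reported new genes), copied from the table.
rows = [
    ("Biosynthesis of amino acids", 76, 92, 16),
    ("Phenylalanine metabolism", 7, 21, 14),
    ("Flagellar assembly", 13, 23, 10),
    ("Protein-coding genes with KO annotations", 964, 1173, 209),
]


def new_genes(sag_count, isag_count):
    """Number of annotations present in the iSAG beyond the original SAG."""
    return isag_count - sag_count


# Collect any row where the recomputed difference disagrees with the table.
mismatches = [
    name for name, sag, isag, reported in rows
    if new_genes(sag, isag) != reported
]
print(mismatches)  # an empty list means the sampled rows are consistent
```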