Paul Wilmes, Sheri L Simmons, Vincent J Denef, Jillian F Banfield.
Abstract
Community genomic data have revealed multiple levels of variation between and within microbial consortia. This variation includes large-scale differences in gene content between ecosystems as well as within-population sequence heterogeneity. In the present review, we focus specifically on how fine-scale variation within microbial and viral populations is apparent from community genomic data. A major unresolved question is how much of the observed variation is due to neutral vs. adaptive processes. Limited experimental data hint that some of this fine-scale variation may be in part functionally relevant, whereas sequence-based and modeling analyses suggest that much of it may be neutral. While methods for interpreting population genomic data are still in their infancy, we discuss current interpretations of existing datasets in the light of evolutionary processes and models. Finally, we highlight the importance of virus-host dynamics in generating and shaping within-population diversity.
Year: 2008 PMID: 19054116 PMCID: PMC2704941 DOI: 10.1111/j.1574-6976.2008.00144.x
Source DB: PubMed Journal: FEMS Microbiol Rev ISSN: 0168-6445 Impact factor: 16.408
Fig. 1. The microbial orchestra analogy showing relatedness of individual community members in acid mine drainage biofilms with corresponding instrumental groups.
Overview of random microbial community sequencing studies in chronological order
| Microbiome | Organism(s) of interest | Number of bases sequenced | Sequencing technique | Reference |
|---|---|---|---|---|
| Seawater | Viruses | NA | Small-insert library, Sanger sequencing | |
| Drinking water network | Bacteria | >2 Mbp | Small-insert library, Sanger sequencing | |
| Human feces | Viruses | 371 kbp | Small-insert library, Sanger sequencing | |
| Acid mine drainage | Archaea and bacteria | 76.2 Mbp | Small-insert library, Sanger sequencing | |
| Sargasso Sea | Archaea and bacteria | | Small-insert library, Sanger sequencing | |
| Near-shore marine sediment | Viruses | NA | Small-insert library, Sanger sequencing | |
| Whale falls | Archaea, bacteria and eukarya | 75 Mbp | Small-insert library, Sanger sequencing | |
| Farm soil | Archaea, bacteria and eukarya | 100 Mbp | Small-insert library, Sanger sequencing | |
| Equine feces | Viruses | 178 kbp | Small-insert library, Sanger sequencing | |
| Cave bear fossil | Cave bear | | Small-insert library, Sanger sequencing | |
| Human feces | Viruses | NA | Small-insert library, Sanger sequencing | |
| Mine water | Archaea, bacteria, eukarya, and viruses | | Pyrosequencing | |
| Ocean | Viruses | | Pyrosequencing | |
| Anammox sludge bioreactor | ‘ | NA | Medium- and large-insert library, Sanger sequencing | |
| North Pacific Subtropical Gyre | Archaea, bacteria, and viruses | 64 Mbp | Large-insert library, Sanger sequencing | |
| Mammoth fossil | Mammoth | 28 Mbp | Pyrosequencing | |
| Human distal gut | Archaea and bacteria | | Small-insert library, Sanger sequencing | |
| Seawater | RNA viruses | NA | Small-insert library, Sanger sequencing | |
| Phosphate removal sludges | ‘ | | Small- and medium-insert library, Sanger sequencing | |
| NA | NA | |||
| | Gamma- and deltaproteobacterial endosymbionts | 204 Mbp | Small- and large-insert library, Sanger sequencing | |
| Neanderthal fossil | | NA | Pyrosequencing | |
| Mouse gut | Archaea, bacteria, eukarya, and viruses | 199.5 Mbp | Small-insert libraries, Sanger sequencing; pyrosequencing | |
| Solar saltern | | | Large-insert library, Sanger sequencing and pyrosequencing | |
| Acid mine drainage | Archaea and bacteria | | Small-insert library, Sanger sequencing | |
| Ocean | Archaea and bacteria | 6.3 Gbp | Small- and large-insert libraries, Sanger sequencing | |
| Mediterranean Sea | Archaea and bacteria | 7.184 Mbp | Large-insert library, Sanger sequencing | |
| Honey bee | Archaea, bacteria, eukarya, and viruses | NA | Pyrosequencing | |
| Termite hindgut | Bacteria | 71 Mbp | Small- and medium-insert library, Sanger sequencing | |
| Human gut | Archaea and bacteria | 727 Mbp | Small-insert library, Sanger sequencing | |
| Coral | Archaea, bacteria, eukarya, and viruses | 32 Mbp | Pyrosequencing | |
| Soil | Viruses | NA | Small-insert library, Sanger sequencing | |
| Coastal seawater | Bacterioplankton | | Pyrosequencing | |
| Indoor air | Archaea, bacteria, eukarya, and viruses | | Small-insert library, Sanger sequencing | |
| Ocean | Viruses | NA | Small-insert libraries, Sanger sequencing | |
| Subterranean, hypersaline ponds, marine, freshwater, coral, microbialites, fish, terrestrial animals, mosquito | Archaea, bacteria, eukarya, and viruses | | Pyrosequencing | |
| | ‘ | | Small-insert library, Sanger sequencing | |
| Coral atolls | Archaea, bacteria, eukarya, and viruses | NA | Pyrosequencing | |
| Activated sludge | ‘ | 1.2 Gbp | Large-insert library, Sanger sequencing | |
| North Pacific subtropical gyre | Archaea, bacteria and viruses | 45 Mbp (DNA) and 14 Mbp (cDNA) | Pyrosequencing of DNA and cDNA | |
| Yellowstone hot springs | Viruses | 30 Mbp | Small-insert library, Sanger sequencing | |
| Peru Margin subseafloor sediments | Archaea and bacteria | 61.9 Mbp | MDA followed by pyrosequencing | |
| Controlled coastal ocean mesocosm | Archaea, bacteria, and viruses | 323 Mbp | Pyrosequencing of DNA and MDA-amplified cDNA |
Details not available.
MDA, multiple displacement amplification
Single nucleotide polymorphism (SNP) densities
| Organism | SNP density (%) | % of SNPs that are replicated | Average coverage | Environment | Reference |
|---|---|---|---|---|---|
| | 0.0006 (US)/0.002 (OZ) | Replicated only | 9.2–17.5 × (US)/5.36–7.68 × (OZ) | Sludge bioreactor | |
| | 0.004 | Replicated only | 25 × | Acid mine drainage | |
| | 0.007 | NS | 22 × | Anammox bioreactor | |
| | 0.01 | Replicated only | 3.3 × | Gutless marine worm | |
| | 0.04 | Replicated only | 5.2 × | Gutless marine worm | |
| | 0.08 | Replicated only | 8.4 × | Gutless marine worm | |
| | 0.09 | 38 | 20 × | Acid mine drainage | |
| | 0.1 | Replicated only | 3 × | Gutless marine worm | |
| ‘Iplasma’ | 0.27 | 12 | 20 × | Acid mine drainage | Unpublished data |
| | 0.29 | NS | 18.6 × | | |
| ‘Eplasma’ | 0.53 | 42 | 10 × | Acid mine drainage | Unpublished data |
| | 2.2 | NA | 10 × | Acid mine drainage | |
| | 3 | NA | 4.5 × | Acid mine drainage | |
| Archaeal virus contig from metagenomic library | 7.05 | NS | 11 × | Yellowstone hot springs | |
| Archaeal virus AMDV2 | 27 | 54 | 17.5 × | Acid mine drainage |
Noted in entry whether all polymorphisms or replicated polymorphisms only were counted.
Not specified whether all polymorphisms or just replicated polymorphisms were counted.
All bases with phrap sequence quality scores <25 were ignored in the polymorphism calculation.
Calculated only for a subset of genes.
Partial assembly.
Details not available.
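The SNP densities tabulated above can be illustrated with a minimal sketch. It is not the procedure used in any of the cited studies. It assumes ungapped reads already aligned to a reference, a hypothetical input format of `(start, bases, qualities)` tuples, and that "replicated" means the same variant base is seen in at least two reads. The quality cutoff of 25 follows the table footnote; the function name `snp_density` is illustrative only.

```python
from collections import Counter

MIN_QUALITY = 25  # bases scoring below this are ignored (cutoff taken from the footnote)


def snp_density(reference, reads, replicated_only=True):
    """Return (density_percent, snp_positions) for reads aligned to a reference.

    reads: list of (start, bases, qualities) tuples. This input format is an
    assumption made for illustration; alignments are ungapped.
    """
    # Tally high-quality non-reference bases observed at each reference position.
    variants = [Counter() for _ in reference]
    for start, bases, qualities in reads:
        for offset, (base, qual) in enumerate(zip(bases, qualities)):
            pos = start + offset
            if qual < MIN_QUALITY or base == reference[pos]:
                continue
            variants[pos][base] += 1

    # Assumption: a replicated SNP needs the same variant base in >= 2 reads.
    min_support = 2 if replicated_only else 1
    snp_positions = [
        pos for pos, counts in enumerate(variants)
        if counts and max(counts.values()) >= min_support
    ]
    density = 100.0 * len(snp_positions) / len(reference)
    return density, snp_positions


reference = "ACGTACGTAC"
reads = [
    (0, "ACGTTCGT", [30] * 8),        # variant T at position 4
    (2, "GTTCGTAC", [30] * 8),        # same variant T at position 4
    (0, "ACCTAC", [30] * 6),          # variant C at position 2, seen once
    (1, "CGTACGA", [30] * 6 + [10]),  # low-quality mismatch at position 7
]
print(snp_density(reference, reads))                         # replicated SNPs only
print(snp_density(reference, reads, replicated_only=False))  # all SNPs
```

Counting only replicated polymorphisms guards against sequencing errors inflating the estimate, at the cost of missing rare variants when coverage is low.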
Fig. 2. Examples of genome-wide fine-scale analysis of sequence variation in Leptospirillum group II 5-way CG. (a) Part of the Leptospirillum group II 5-way CG genome assembled from population genomic data. The first inner ring shows a moving average of SNP density. Dark red indicates local SNP density of >0.5%, while pink indicates <0.5%. The second inner ring shows a moving average of polymorphism frequency (scale 0–0.7%). Light-blue highlights indicate the location of substrains within the 5-way CG population (>99% sequence similarity). Purple highlights indicate the location of deeply sampled reads of more divergent strains incorporated into the population (c. 94% sequence similarity). (b) Closeup of the data used to generate the figure in (a). A screenshot of a contig from the program Strainer is shown, with individual reads shown as light-gray blocks. Strains defined by shared polymorphisms are shown in distinct colors, with the main strain in orange. The vertical dashed lines indicate regions within the main strain not overlapped by any substrain. (c) Overview of different sources of genomic variation over a 500-kb segment. In the outer ring, tRNAs are indicated with orange, transposons with red, and integrases with ‘Int.’ The location and length of strain variant paths (see main text) are shown in green in the first inner ring, and the locations of recombinant reads are shown in the second inner ring. The innermost ring shows nonsynonymous SNPs in blue, synonymous SNPs in purple, intergenic SNPs in red, and SNPs resulting in frameshifts in orange. The image was generated with Circos (M. Krzywinski, http://mkweb.bcgsc.ca/circos/). (d) Gene content variation from an assembly point of view. Alternate genome paths are shown at the top. The uppermost path shows the main genome path, and the bottom path shows the insertion of several genes (colored green, orange, and red). The lower part shows individual sequencing reads, with inserted regions indicated by dark blue. Mate-paired reads on the top line are separated by the presence of the insert.
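The moving average of SNP density described in panel (a) can be sketched as follows. This is a minimal illustration, not the code used to produce the figure. The window and step sizes are hypothetical, the genome is treated as linear rather than circular, and a density of exactly 0.5% (which the caption leaves unspecified) is assigned to the lower class.

```python
def moving_snp_density(snp_positions, genome_length, window=1000, step=500):
    """Return a list of (window_start, density_percent) for sliding windows.

    window and step are illustrative values, not taken from the figure.
    """
    results = []
    for start in range(0, max(genome_length - window, 0) + 1, step):
        end = start + window
        # Count SNPs falling inside the half-open interval [start, end).
        count = sum(1 for pos in snp_positions if start <= pos < end)
        results.append((start, 100.0 * count / window))
    return results


def density_colour(density_percent, threshold=0.5):
    """Map a local SNP density to the two colour classes named in the caption."""
    # Assumption: exactly 0.5% falls into the pink class.
    return "dark red" if density_percent > threshold else "pink"


snps = [10, 20, 30, 40, 50, 60, 1500]
for start, density in moving_snp_density(snps, genome_length=2000):
    print(start, density, density_colour(density))
```

Overlapping windows (step smaller than window) smooth the track, so a single dense cluster of SNPs appears in more than one window.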
Fig. 3. Experimental evidence of the role of the ‘flexible’ genome content. (a) Environmental transcriptomic data from Prochlorococcus MIT3901 from a Sargasso Sea sample (Frias-Lopez ). The cDNA levels, normalized using the levels of DNA found in the same sample, are shown for all identified genes of this particular strain (‘core genome’ genes present in all Prochlorococcus genomes: blue; ‘flexible genome’ genes present in at least one but not all genomes: pink). Hypervariable regions are highlighted with gray bars. While many ‘flexible’ genes are expressed, genes located in the hypervariable regions are underrepresented. (© 2008 The National Academy of Sciences of the USA). (b) Heterogeneous protein expression within activated sludge dominated by ‘Candidatus Accumulibacter phosphatis’ (Accumulibacter phosphatis; Wilmes , b). Orthologous proteins (90% amino acid identity; represented by individual blocks) from the US Phrap assembly (García-Martín ) are aligned against the A. phosphatis composite genome, which serves as the backbone. Unique spectral counts (identified peptides specific to a certain protein variant) are heat-mapped onto the alignment (gray blocks indicate absence of orthologs; black blocks indicate no unique peptide spectra identified). (c) Summary of the expression data of the Leptospirillum group II population from 27 samples from the Richmond Mine (Iron Mountain, CA) as determined by proteomics (V.J. Denef et al., unpublished data). The fraction of proteins never identified (0), identified in 1–3 samples (3), 4–6 samples (6), etc. is shown. The unique genes [as determined from comparative genomic analysis of the two available genomes, UBA-type (blue) and 5-way CG type (red)] are expressed in significantly fewer samples than the core genome complement (green).
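The normalization mentioned in panel (a), in which cDNA levels are divided by the DNA levels found in the same sample, can be illustrated with a minimal sketch. The per-gene read counts, the gene names, and the function name `normalized_expression` are hypothetical; the exact normalization used for the figure may differ.

```python
def normalized_expression(cdna_counts, dna_counts):
    """Return per-gene cDNA/DNA ratios for genes detected in the DNA sample.

    Both arguments map gene name -> read count (hypothetical input format).
    """
    ratios = {}
    for gene, dna in dna_counts.items():
        if dna == 0:
            # Without DNA coverage there is nothing to normalize against.
            continue
        ratios[gene] = cdna_counts.get(gene, 0) / dna
    return ratios


cdna = {"geneA": 40, "geneB": 5}
dna = {"geneA": 20, "geneB": 10, "geneC": 0, "geneD": 4}
print(normalized_expression(cdna, dna))
```

Dividing by DNA abundance corrects for how common each gene is in the sampled population, so a gene carried by few cells is not mistaken for one that is weakly expressed.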
Fig. 4. Continuum of variation with box text.
Fig. 5. The dynamic interplay between viruses and their hosts (Andersson & Banfield, 2008). (a) Population structure of the AMDV2 virus population, showing extensive recombination between closely related sequence variants. Putative genes are displayed on top. The pattern of nucleotide polymorphisms (SNPs, colored bars) is shown for a subset of sequencing reads within a region of the DNA polymerase gene. The region was divided into equally spaced blocks (A–L), and the alleles were numbered based on SNP patterns to the left of the label. In the summary table below, colors are assigned to alleles based on the read in which the allele first appears. (b) Schematic representation of the CRISPR locus of the corresponding host population, sampled 25 times, and characterized by an extensive diversity of spacer sequences (colored bars) between the repeats (black bars). CRISPR loci grow unidirectionally, with a new spacer being introduced to the left of the neighboring CRISPR-associated protein machinery (cas genes). Because every cell is exposed to different viruses, the CRISPR spacer content, which reflects the natural history of the cell and its ancestry, might be unique for every single cell in the population.