Yan Yan, Long H Nguyen, Eric A Franzosa, Curtis Huttenhower.
Abstract
The biological importance and varied metabolic capabilities of specific microbial strains have long been established in the scientific community. Strains have, in the past, been largely defined and characterized based on microbial isolates. However, the emergence of new technologies and techniques has enabled assessments of their ecology and phenotypes within microbial communities and the human microbiome. While it is now more obvious how pathogenic strain variants are detrimental to human health, the consequences of subtle genetic variation in the microbiome have only recently been exposed. Here, we review the operational definitions of strains (e.g., genetic and structural variants) as they can now be identified from microbial communities using different high-throughput, often culture-independent techniques. We summarize the distribution and diversity of strains across the human body and their emerging links to health maintenance, disease risk and progression, and biochemical responses to perturbations, such as diet or drugs. We list methods for identifying, quantifying, and tracking strains, utilizing high-throughput sequencing along with other molecular and "culturomics" technologies. Finally, we discuss implications of population studies in bridging experimental gaps and leading to a better understanding of the health effects of strains in the human microbiome.
Keywords: 16S; Amplicons; Metagenomics; Microbial communities; Microbial strains; Microbiome; Microbiome epidemiology
Year: 2020 PMID: 32791981 PMCID: PMC7427293 DOI: 10.1186/s13073-020-00765-y
Source DB: PubMed Journal: Genome Med ISSN: 1756-994X Impact factor: 11.117
Terminology for microbial community strain analysis
Strikingly, there is no universal definition of what constitutes a microbial strain (or, for that matter, species).
Fig. 1. Strain identification approaches for microbial communities. This review summarizes a variety of high-throughput, often (but not always) culture-independent methods for strain identification within microbial communities. (a) Amplicon sequencing (e.g., 16S rRNA gene regions) can now be processed to near-strain-level fidelity, resulting in unique markers such as amplicon sequence variants (ASVs). (b) Shotgun metagenomic sequencing, either via assembly or using reference-based approaches, can identify strains broadly based on their single-nucleotide variants (SNVs) or structural variants (gene gain and loss events). (c) Whole-community transcriptomes can amplify the effects of gene gains or losses, or the effects of small variants that result in differential expression. (d) Single-cell methods can isolate individual microbial genomes directly from within communities, either via cell sorting and amplification, or through synthetic long-read/linked-read techniques. (e) High-throughput “culturomics” can be combined with rapid turnaround approaches such as peptide fingerprinting to strain-type isolates or microcolonies. (f) Relatedly, any combination of traditional isolation and high-throughput cultivation (batch, serial, or continuous) can be combined with growth, phenotypic, or molecular readouts for strain identification. (g) Finally, a variety of other approaches can be used with communities, ranging from flow- or high-content microscopic imaging to systems such as gnotobiotic animal model physiology and phenotyping.
Tools for strain identification in community amplicon and shotgun metagenomic sequencing. Methods and brief summaries of their algorithms for detecting and quantifying strains (by various definitions) from 16S rRNA gene amplicon or shotgun metagenomic sequencing. These are currently the two most prevalent assays for culture-independent strain detection within microbial communities. Note that we have excluded other experimental protocols from this summary, including single-cell, long-read, and synthetic long-read sequencing, since they generally require more than application of a specific software pipeline. These alternatives, and non-sequencing-based approaches, are described in more detail in the text.
| Method | Platform | Authors’ description |
|---|---|---|
| Oligotyping | 16S rRNA gene amplicon | “oligotyping... focus[es] on the variable sites revealed by the entropy analysis to identify highly refined taxonomic units” |
| Sub-OTU clustering | 16S rRNA gene amplicon | “we combine error-model-based denoising and systematic cross-sample comparisons to resolve the fine (sub-OTU) structure of moderate-to-high-abundance community members” |
| MED | 16S rRNA gene amplicon | “MED uses information uncertainty among sequence reads to iteratively decompose a dataset until the maximum entropy criterion is satisfied for each final unit” |
| DADA2 | 16S rRNA gene amplicon | “DADA2 implements a new quality-aware model of Illumina amplicon errors. Sample composition is inferred by dividing amplicon reads into partitions consistent with the error model.” |
| Deblur | 16S rRNA gene amplicon | “Deblur … compares sequence-to-sequence Hamming distances within a sample to an upper-bound error profile combined with a greedy algorithm to obtain single-nucleotide resolution.” |
| UNOISE2 | 16S rRNA gene amplicon | “UNOISE2... cluster[s] the unique sequences in the reads. A cluster has a centroid sequence with higher abundance plus similar sequences having lower abundances.” |
| PathoScope | Shotgun metagenomic | “PathoID … reassign[s] ambiguously aligned sequencing reads and accurately estimate[s] read proportions from each genome in the sample.” |
| LSA | Shotgun metagenomic | “LSA... separates reads into biologically informed partitions and thereby enables assembly of individual genomes.” |
| PanPhlAn | Shotgun metagenomic | “PanPhlAn identifies which genes are present or absent within different strains of a species, based on the entire gene set of the species’ pangenome.” |
| MetaMLST | Shotgun metagenomic | “MetaMLST performs an in silico consensus sequence reconstruction of the allelic profile of the microbial strains in a metagenomics sample.” |
| MIDAS | Shotgun metagenomic | “MIDAS … is a computational pipeline that quantifies bacterial species abundance and intra-species genomic variation from shotgun metagenomes.” |
| ConStrains | Shotgun metagenomic | “ConStrains … exploits the polymorphism patterns in a set of universal bacterial and archaeal genes to infer strain-level structures in species populations.” |
| StrainPhlAn | Shotgun metagenomic | “StrainPhlAn … is based on reconstructing consensus sequence variants within species-specific marker genes and using them to estimate strain-level phylogenies.” |
| metaSNV | Shotgun metagenomic | “metaSNV … performs SNV calling for individual samples and across the whole data set, and generates various statistics for individual species” |
| DESMAN | Shotgun metagenomic | “DESMAN identifies variants in core genes and uses co-occurrence across samples to link variants into haplotypes and abundance profiles.” |
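The entropy analysis quoted for oligotyping in the table can be illustrated with a small sketch. It assumes reads that are already aligned and of equal length, computes the Shannon entropy of each alignment column, and groups reads by the bases found at the high-entropy columns. The function names (`column_entropy`, `oligotypes`) and the `min_entropy` threshold are hypothetical choices for illustration; this is not the oligotyping software's implementation, which handles quality filtering, noise, and iterative site selection.

```python
import math
from collections import Counter


def column_entropy(reads):
    """Shannon entropy (in bits) of each column of equal-length aligned reads."""
    entropies = []
    for pos in range(len(reads[0])):
        counts = Counter(read[pos] for read in reads)
        total = sum(counts.values())
        # A column with a single base has entropy 0; an even two-base split has 1 bit.
        h = -sum((c / total) * math.log2(c / total) for c in counts.values())
        entropies.append(h)
    return entropies


def oligotypes(reads, min_entropy=0.5):
    """Group reads by the bases at columns whose entropy is >= min_entropy.

    min_entropy is an illustrative threshold, not a recommended default.
    """
    entropies = column_entropy(reads)
    sites = [i for i, h in enumerate(entropies) if h >= min_entropy]
    counts = Counter("".join(read[i] for i in sites) for read in reads)
    return sites, counts


if __name__ == "__main__":
    toy_reads = ["ACGT", "ACGT", "ATGT", "ATGT"]  # made-up example data
    sites, counts = oligotypes(toy_reads)
    print(sites, dict(counts))
```

In this toy input only the second column varies, so the reads split into two groups defined by that single position. Real amplicon data would also need a way to separate true variation from sequencing error, which is the problem the denoising methods in the table address.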
Fig. 2. Microbial SNV, structural, and metatranscriptomic variants as features for genetic epidemiology in the human microbiome. Statistical approaches can link subspecies microbial features to human health phenotypes in several ways. (a) When microbial strains are identified using SNV genotypes (whether from genome bins, marker genes, core genes, etc.), any individual microbial SNV (or overall genotype) is typically of low prevalence and high variability. This means that it is extremely difficult to power significant associations with individual SNVs in reasonably sized human population studies. Instead, significant assortment of a host phenotype with strain phylogeny can be assessed, e.g., by PERMANOVA on per-species genetic distances [8] or by aggregating SNVs to genes or larger loci. (b) An extreme of this type of association test directly assesses the nonrandom assortment of genes’ presence or absence among microbial strain pangenomes in association with a phenotype of interest [66], since a gene loss (or gain) is essentially the “sum” of variants at every nucleotide within the gene. (c) Alternatively, even when no differences in genomic SNVs or structural variants are detectable at a study’s level of power, the transcriptional regulatory effects of these variants can be amplified, resulting in strain-specific differences in locus expression in association with a phenotype [156].
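As a rough illustration of the PERMANOVA test named in panel a, the sketch below computes a one-factor pseudo-F statistic from a matrix of pairwise strain genetic distances and estimates a p-value by permuting the host phenotype labels. The function names (`pseudo_f`, `permanova`), the `n_perm` and `seed` parameters, and the toy data are assumptions made for this example. It is not the analysis code of the cited study [8], and it omits covariates and stratification that a real population study would need.

```python
import random
from itertools import combinations


def pseudo_f(dist, labels):
    """One-factor PERMANOVA pseudo-F from a symmetric distance matrix.

    Assumes at least two groups and nonzero within-group distances.
    """
    n = len(labels)
    groups = sorted(set(labels))
    # Total sum of squares: squared distances over all sample pairs, divided by n.
    ss_total = sum(dist[i][j] ** 2 for i, j in combinations(range(n), 2)) / n
    # Within-group sum of squares: the same quantity computed inside each group.
    ss_within = 0.0
    for g in groups:
        idx = [i for i, lab in enumerate(labels) if lab == g]
        ss_within += sum(dist[i][j] ** 2 for i, j in combinations(idx, 2)) / len(idx)
    ss_among = ss_total - ss_within
    return (ss_among / (len(groups) - 1)) / (ss_within / (n - len(groups)))


def permanova(dist, labels, n_perm=999, seed=0):
    """Return (observed pseudo-F, permutation p-value)."""
    rng = random.Random(seed)
    observed = pseudo_f(dist, labels)
    shuffled = list(labels)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # break any link between phenotype and strain
        if pseudo_f(dist, shuffled) >= observed:
            hits += 1
    # The +1 terms count the observed labeling as one of the permutations.
    return observed, (hits + 1) / (n_perm + 1)


if __name__ == "__main__":
    # Made-up data: strains from cases are near-identical to each other,
    # as are strains from controls, but the two sets are far apart.
    labels = ["case"] * 5 + ["control"] * 5
    dist = [[0.0 if i == j else (0.01 if labels[i] == labels[j] else 0.5)
             for j in range(10)] for i in range(10)]
    f_stat, p_value = permanova(dist, labels)
    print(f_stat, p_value)
```

Testing the whole distance matrix at once sidesteps the power problem the caption describes: no single SNV has to reach significance, only the overall tendency of strains from the same phenotype group to be genetically closer. In practice an established implementation (for example `adonis2` in the R package vegan, or `permanova` in scikit-bio) would be the safer choice.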