Florian Plaza Onate, Jean-Michel Batto, Catherine Juste, Jehane Fadlallah, Cyrielle Fougeroux, Doriane Gouas, Nicolas Pons, Sean Kennedy, Florence Levenez, Joel Dore, S Dusko Ehrlich, Guy Gorochov, Martin Larsen.
Abstract
BACKGROUND: The biological and clinical consequences of the tight interactions between host and microbiota are rapidly being unraveled by next-generation sequencing technologies and sophisticated bioinformatics, also referred to as microbiota metagenomics. The recent success of metagenomics has created a demand to rapidly apply the technology to large case-control cohort studies and to studies of microbiota from various habitats, including habitats relatively poor in microbes. It is therefore of foremost importance to enable a robust and rapid quality assessment of metagenomic data from samples that challenge present technological limits (sample numbers and size). Here we demonstrate that the distribution of overlapping k-mers of metagenome sequence data predicts sequence quality as defined by gene distribution and efficiency of sequence mapping to a reference gene catalogue.
Year: 2015 PMID: 25887914 PMCID: PMC4373121 DOI: 10.1186/s12864-015-1406-7
Source DB: PubMed Journal: BMC Genomics ISSN: 1471-2164 Impact factor: 3.969
DNA quantity used for serial dilution library constructions
| Bacteria (log10) | Genomic dsDNA¹ | DNA used for ligation² | | Genomic dsDNA¹ | DNA used for ligation² | |
|---|---|---|---|---|---|---|
| 10 | 34.9 | 1.00 | 6 | 30.7 | 1.00 | 6 |
| 9 | 4.67 | 0.41 | 7 | 7.17 | 0.35 | 7 |
| 8 | 2.68 | 0.07 | 8 | 2.63 | 0.06 | 8 |
| 7 | 0.358 | <0.04 | 9 | 0.356 | <0.04 | 9 |
| 6 | 0.296 | <0.03 | 10 | 0.228 | <0.02 | 10 |
¹ Genomic dsDNA extracted from the indicated number of bacteria.
² Amount of sheared and size-purified genomic DNA used for ligation with P1 and P2 adaptor oligonucleotides.
Figure 1. 4-mer distribution analysis for complex microbiota metagenomes compared to individual bacterial genomes. A, Bar diagram of quantitative metagenomics of gut microbiota from two healthy volunteers, donor #1 (blue) and #2 (red), aggregated to express the frequency of selected taxonomic classes from the Bacteroidetes and Firmicutes phyla. B, Line graph showing the 4-mer distribution of metagenomic sequences from gut microbiota of donor #1 and #2. A histogram depicting the 4-mer abundance distribution is plotted to the right of the line graph. Distribution entropy is indicated (normalized Shannon entropy). C, Scatter plot showing the 4-mer distribution entropy for 28 bacterial genomes and two gut microbiota metagenomes. D, The 28 bacterial genomes are divided into 6 objective clusters by unsupervised agglomerative hierarchical cluster analysis of metagenomic 4-mer distributions based on Ward's minimum variance method.
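The normalized Shannon entropy of a 4-mer distribution referred to in this caption can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes that overlapping 4-mers are counted on a single strand, that windows containing characters other than A, C, G or T are skipped, and that the entropy is normalized by its maximum over all 4^4 = 256 possible 4-mers; the function names `kmer_counts` and `normalized_shannon_entropy` are hypothetical.

```python
import math
from collections import Counter
from itertools import product

ALPHABET = "ACGT"


def kmer_counts(seq, k=4):
    """Count overlapping k-mers in one sequence (single strand).

    Assumption: windows containing any non-ACGT character are skipped.
    """
    seq = seq.upper()
    counts = Counter()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if all(base in ALPHABET for base in kmer):
            counts[kmer] += 1
    return counts


def normalized_shannon_entropy(counts, k=4):
    """Shannon entropy of the k-mer frequencies, scaled to [0, 1].

    Assumption: normalization divides by log(4**k), the entropy of a
    perfectly uniform distribution over all possible k-mers.
    """
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no valid k-mers counted")
    entropy = 0.0
    for c in counts.values():
        if c > 0:
            p = c / total
            entropy += p * math.log(1.0 / p)
    return entropy / math.log(len(ALPHABET) ** k)


# A uniform distribution over all 256 4-mers gives an entropy of about 1.0;
# a homopolymer contains a single 4-mer and gives 0.0.
uniform = Counter("".join(p) for p in product(ALPHABET, repeat=4))
print(normalized_shannon_entropy(uniform))
print(normalized_shannon_entropy(kmer_counts("A" * 50)))
```

For a set of sequencing reads, the per-read counters can be summed before computing the entropy, so that the value describes the whole sample rather than a single read.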
Figure 2. Quantitative metagenomics of serially diluted gut microbiota. A, Scatter plot of gene frequencies derived from quantitative metagenomic profiles of undiluted gut microbiota on the x-axis versus colour-coded 10-, 100-, 1000- and 10,000-fold diluted gut microbiota on the y-axis (samples derived from donor #1 gut microbiota). B, Categorical line graph depicting Spearman rank correlation coefficients between gene frequencies from metagenomic analysis of undiluted gut microbiota and gene frequencies of 10-, 100-, 1000- and 10,000-fold diluted gut microbiota from donor #1 (blue) and donor #2 (red). C, Scatter plot of gene frequencies of undiluted samples from the two unrelated donors #1 (x-axis) and #2 (y-axis); their Spearman rank correlation is indicated as a dotted line in B. Genes present in the reference gene catalogue that are not detected in the samples are excluded from the analysis.
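The comparison of gene frequencies described in this caption can be sketched in Python as a Spearman rank correlation computed after dropping undetected genes. This is an illustration under stated assumptions, not the authors' pipeline. The caption does not say whether a gene is excluded when it is missing from one sample or from both, so the sketch assumes exclusion only when the frequency is zero in both samples. The function names are hypothetical, and ties receive average ranks.

```python
import math


def average_ranks(values):
    """Return 1-based ranks, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for t in range(i, j + 1):
            ranks[order[t]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length numeric lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        raise ValueError("correlation undefined for constant input")
    return cov / math.sqrt(vx * vy)


def spearman_detected(freq_a, freq_b):
    """Spearman correlation between two gene-frequency profiles.

    Assumption: a gene is excluded only if undetected (zero) in both.
    """
    pairs = [(a, b) for a, b in zip(freq_a, freq_b) if a > 0 or b > 0]
    if len(pairs) < 2:
        raise ValueError("need at least two detected genes")
    xs = [a for a, _ in pairs]
    ys = [b for _, b in pairs]
    return pearson(average_ranks(xs), average_ranks(ys))


# Hypothetical frequencies for five catalogue genes in two samples.
undiluted = [0.0, 1.0, 2.0, 3.0, 0.0]
diluted = [0.0, 10.0, 20.0, 40.0, 0.0]
print(spearman_detected(undiluted, diluted))
```

Computing Pearson correlation on average ranks is the standard definition of Spearman's coefficient in the presence of ties, which matter here because many low-abundance genes share the same frequency.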
Figure 3. 4-mer distribution analysis of raw metagenomic sequences of serially diluted gut microbiota. A, 4-mer abundance distribution (left panel) and individual frequency (right panel) of metagenomic sequences from the colour-coded dilution series of gut microbiota from donor #1 (upper panel) and #2 (lower panel). B, Bar plot showing the normalized Shannon entropy of the 4-mer distribution for undiluted and 10-, 100-, 1000- and 10,000-fold diluted gut microbiota metagenomics from donor #1 (blue) and #2 (red). C, Scatter plots depicting the correlation between 4-mer distributions of metagenomic sequences from undiluted gut microbiota (y-axis) and 4-mer distributions of metagenomic sequences from 10-, 100-, 1000- and 10,000-fold diluted gut microbiota (x-axis) for donor #1 (upper panel) and #2 (lower panel).
Figure 4. 4-mer distribution of microbiota metagenomes correlates with gene mapping efficiency to a reference gene catalogue. A, Line graphs depicting the frequency of gene mapping to a reference gene catalogue as a function of the normalized Shannon entropy of 4-mer distributions for undiluted and 10-, 100-, 1000- and 10,000-fold diluted gut microbiota metagenomics from donor #1 (blue) and #2 (red). B, Scatter plot illustrating the association between normalized Shannon entropy of 4-mer distributions and the frequency of gene mapping to a reference gene catalogue for 52 gut microbiota metagenomic profiles stratified according to small (red dots, <10^10 bacteria) and large (black dots, >10^10 bacteria) sample size. Spearman rank correlation statistics are indicated.