Veronika B Dubinkina, Dmitry S Ischenko, Vladimir I Ulyantsev, Alexander V Tyakht, Dmitry G Alexeev.
Abstract
BACKGROUND: A rapidly increasing flow of genomic data requires efficient methods for obtaining its compact representation. Feature extraction facilitates classification, clustering and model analysis for testing and refining biological hypotheses. A "shotgun" metagenome is an analytically challenging type of genomic data: it contains sequences of all genes from the entirety of a complex microbial community. Recently, researchers have started to analyze metagenomes using reference-free methods based on the oligonucleotide (k-mer) frequency spectrum, an approach previously applied to isolated genomes. However, little is known about how these methods correlate with existing approaches to metagenomic feature extraction, or about the limits of their applicability. Here we evaluated a metagenomic pairwise dissimilarity measure based on the short k-mer spectrum, using the example of human gut microbiota, a biomedically significant object of study.
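As an illustration of the kind of measure the abstract describes, the following minimal sketch builds a normalized k-mer frequency spectrum for each sample and compares two spectra with the Bray-Curtis dissimilarity (the "BC kmer" designation used in the figures). The function names `kmer_spectrum` and `bray_curtis`, the choice of `k`, the skipping of k-mers with ambiguous bases, and the absence of reverse-complement merging are all assumptions made for this example, not details taken from the study.

```python
from collections import Counter
from itertools import product


def kmer_spectrum(reads, k=3):
    """Return the normalized k-mer frequency vector of a set of reads.

    Hypothetical helper: k-mers containing characters other than A, C, G, T
    are skipped, and reverse complements are NOT merged (an assumption).
    """
    counts = Counter()
    for read in reads:
        seq = read.upper()
        for i in range(len(seq) - k + 1):
            kmer = seq[i:i + k]
            if set(kmer) <= set("ACGT"):
                counts[kmer] += 1
    total = sum(counts.values())
    # Fixed lexicographic ordering so that vectors from different samples align.
    all_kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    return [counts[m] / total if total else 0.0 for m in all_kmers]


def bray_curtis(u, v):
    """Bray-Curtis dissimilarity between two non-negative vectors."""
    numerator = sum(abs(a - b) for a, b in zip(u, v))
    denominator = sum(a + b for a, b in zip(u, v))
    return numerator / denominator if denominator else 0.0


# Toy "metagenomes": each sample is just a list of reads.
sample_a = ["ACGTACGTAC", "GGGTTTACGT"]
sample_b = ["ACGTACGTAC", "CCCCAAAATT"]
distance = bray_curtis(kmer_spectrum(sample_a), kmer_spectrum(sample_b))
```

Because both spectra are normalized to sum to one, the resulting value lies between 0 (identical spectra) and 1 (no shared k-mers), which makes it directly comparable across samples of different sequencing depth.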
Year: 2016 PMID: 26774270 PMCID: PMC4715287 DOI: 10.1186/s12859-015-0875-7
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Types of reference-based analyses used in the study
| Type of reference-based analysis | Method | Beta-diversity measure | Designation |
|---|---|---|---|
| Taxonomic profiling | Mapping to a reference catalog of 353 genomes of intestinal microbiota | Bray-Curtis | BC TAX (org), (genus) |
| | | Whole-genome version of weighted UniFrac | WG UniFrac |
| | Quantitative profiling of unique clade-specific marker genes (MetaPhlAn) | Bray-Curtis | BC MetaPhlAn |
| Functional profiling | Mapping to MetaHIT 3.9M catalog of genes, grouped by COGs | Bray-Curtis | BC COG |
Fig. 1 Variation of metagenomes using different dissimilarity measures. PCoA plots for different dissimilarity measures: (a) BC kmer, (b) BC COG, (c) WG UniFrac, (d) BC TAX (org), (e) BC MetaPhlAn (org). Three outlier samples are marked with asterisks. (f) Heatmap of the Spearman correlation coefficient between dissimilarity matrices obtained using different measures (the upper triangle of the matrix shows coefficients for China, the lower triangle for HMP)
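The comparison summarized in the heatmap of Fig. 1 can be sketched as follows: take the off-diagonal pairwise entries of two dissimilarity matrices computed on the same samples and correlate their ranks. This is a minimal standard-library sketch, assuming the correlation is computed over the upper-triangle entries; the helper names (`upper_triangle`, `average_ranks`, `spearman_between_matrices`) are hypothetical, and the study's actual procedure may differ.

```python
def upper_triangle(matrix):
    """Flatten the strictly upper triangle of a square matrix, row by row."""
    n = len(matrix)
    return [matrix[i][j] for i in range(n) for j in range(i + 1, n)]


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1  # mean of positions i..j, 1-based
        for t in range(i, j + 1):
            ranks[order[t]] = shared_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; raises ValueError if either input is constant."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (var_x * var_y) ** 0.5


def spearman_between_matrices(d1, d2):
    """Spearman correlation of the pairwise entries of two distance matrices."""
    return pearson(average_ranks(upper_triangle(d1)),
                   average_ranks(upper_triangle(d2)))


# Toy 3-sample matrices standing in for, e.g., BC kmer and BC TAX (org).
d_kmer = [[0.0, 0.1, 0.5], [0.1, 0.0, 0.3], [0.5, 0.3, 0.0]]
d_tax = [[0.0, 0.2, 0.9], [0.2, 0.0, 0.4], [0.9, 0.4, 0.0]]
rho = spearman_between_matrices(d_kmer, d_tax)
```

Pairwise distances within one matrix are not independent of each other, so a coefficient computed this way describes agreement between measures but does not by itself support a standard significance test.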
Fig. 2 Comparison of pairwise difference measures obtained by k-mer and reference-based methods. For each plot, the Y-axis represents the k-mer distance and the X-axis the distance obtained by one of the reference-based methods. The distribution of dissimilarity measures is shown for: (a) BC kmer for all reads and BC TAX (org); (b) BC kmer for all reads and BC COG; (c) BC kmer for reads mapped to the catalog of genomes and BC TAX (org); (d) BC kmer for reads mapped to the catalog of genes and BC COG
Fig. 3 Analysis of outlier samples. (a) Distribution of pairwise dissimilarity obtained using k-mer and taxonomic composition for the HMP cohort. Different colors indicate groups of dissimilarities for: all HMP pairs; outlier pairs, where at least one of the samples belonged to the phage-enriched group; and CP-filtered pairs, i.e. extreme outliers (all pairs with k-mer distance > 0.5) after removal of k-mers from reads mapped to the crAssphage (CP) genome. (b) Composition of sample SRS062427 according to the combined results from two analyses (mapping to genome catalog and DIAMOND + MEGAN)