| Literature DB >> 32612579 |
Jiuhong Dong, Shuai Liu, Yaran Zhang, Yi Dai, Qi Wu.
Abstract
The comparison of metagenomes is crucial for studying the relationship between microbial communities and environmental factors. Libra, a recently published alignment-free whole metagenome comparison method based on k-mer frequencies, showed higher resolution than Mash, currently the fastest method, on whole metagenomic sequencing reads, but it did not perform as well on assembled contigs. Here, we developed a new alignment-free tool, KmerFreqCalc, for the comparison of whole metagenomic data. It first calculates the frequencies of both the forward and reverse complementary sequences of k-mers, as Mash does, and then computes the cosine distance between samples from their k-mer frequency vectors, as Libra does. We applied KmerFreqCalc to the assembled contigs of the gut microbiomes of wild giant pandas and compared the results with those of Libra and Mash. KmerFreqCalc was able to detect the subtle differences between giant panda samples caused by seasonal diet change, and it clustered the samples better than Libra and Mash. KmerFreqCalc therefore has high resolution and accuracy on assembled contigs, which makes it well suited to comparing samples with low dissimilarity.
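The two steps named in the abstract (strand-aware k-mer counting, then cosine distance between frequency vectors) can be sketched as follows. This is a minimal illustration, not the KmerFreqCalc implementation: the function names are hypothetical, merging each k-mer with its reverse complement under the lexicographically smaller ("canonical") form is an assumption about how the two strands are combined, and raw counts are used without any weighting or normalization.

```python
from collections import Counter
from math import sqrt

# Translation table for complementing DNA bases.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def canonical_kmer_counts(sequences, k):
    """Count k-mers over an iterable of sequences (e.g. contigs).

    Assumption: a k-mer and its reverse complement are counted under one
    key, the lexicographically smaller of the two.
    """
    counts = Counter()
    for seq in sequences:
        seq = seq.upper()
        for i in range(len(seq) - k + 1):
            kmer = seq[i:i + k]
            # Skip k-mers containing ambiguous bases such as N.
            if set(kmer) - set("ACGT"):
                continue
            counts[min(kmer, reverse_complement(kmer))] += 1
    return counts


def cosine_distance(a, b):
    """Cosine distance (1 - cosine similarity) between two count vectors."""
    dot = sum(count * b.get(kmer, 0) for kmer, count in a.items())
    norm_a = sqrt(sum(count * count for count in a.values()))
    norm_b = sqrt(sum(count * count for count in b.values()))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cannot compare a sample that has no valid k-mers")
    return 1.0 - dot / (norm_a * norm_b)


# Hypothetical toy "samples"; real input would be assembled contigs.
sample_a = canonical_kmer_counts(["ACGGTTAACCGGA"], k=4)
sample_b = canonical_kmer_counts([reverse_complement("ACGGTTAACCGGA")], k=4)
distance = cosine_distance(sample_a, sample_b)
```

Because both strands map to the same keys, a contig and its reverse complement yield identical count vectors, so `distance` here is zero up to floating-point error. That strand independence is what matters for assembled contigs, whose orientation is arbitrary.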
Keywords: cosine distance; gut microbiomes; k-mer frequencies; reverse complementary sequence; whole metagenome comparison; wild giant pandas
Year: 2020; PMID: 32612579; PMCID: PMC7309450; DOI: 10.3389/fmicb.2020.01061
Source DB: PubMed; Journal: Front Microbiol; ISSN: 1664-302X; Impact factor: 5.640
FIGURE 1. KmerFreqCalc workflow and data description. (A) Overview of the KmerFreqCalc workflow: (1) calculating the k-mer frequencies of each sample, (2) computing the distance between each pair of samples. (B) Two published metagenomic datasets that include samples from wild giant pandas. The diagram of the three stages and four food categories over 1 year in the Qinling Mountains was adapted from “Seasonal variation in nutrient utilization shapes gut microbiome structure and function in wild giant pandas” (Wu et al., 2017) with permission. The four food categories are shown above the timeline and the three stages below it; the leaf, shoot, and transition stages are shown in green, orange, and gray, respectively.
FIGURE 2. Whole metagenome comparisons of samples in the QIN dataset using Mash, Libra, and KmerFreqCalc. In the NJ trees, the clades of the shoot stage and the leaf stage are highlighted in light orange and light green, respectively. The diet stages are indicated by green circles (Bfa leaf), orange regular triangles (Bfa shoot), orange inverted triangles (Fqi shoot), and gray squares (transition). (A) NJ tree based on the distance calculated by Mash (k = 15). (B) NJ tree based on the distance calculated by Libra (k = 21). (C) NJ tree based on the distance calculated by KmerFreqCalc (k = 21). (D) PCoA analysis using the cosine distance calculated by KmerFreqCalc (k = 21). (E) Variation within the different stages (Bfa leaf, Bfa shoot, Fqi shoot, and transition), determined from the cosine distance calculated by KmerFreqCalc (k = 21). Mean values ± standard errors of the means are shown. ***p < 0.001 (rank-sum test).
FIGURE 3. Whole metagenome comparisons of samples in the XXL dataset and of all giant panda samples in the two datasets using KmerFreqCalc (k = 21). (A) NJ tree of samples in the XXL dataset. (B) PCoA analysis using the cosine distance calculated by KmerFreqCalc (k = 21). (C) NJ tree of all giant panda samples in the two datasets, clearly separating the QIN and XXL datasets into two groups, highlighted in light orange and light green, respectively.