Yaowen Chen, Pu Liu, Runyan Liu, Shuofeng Hu, Zhen He, Guohua Dong, Chao Feng, Sijing An, Xiaomin Ying.
Abstract
Liver cirrhosis (LC) has been associated with gut microbes. However, the strain diversity of species and its association with LC have received little attention. Here, we constructed a computational framework to study strain heterogeneity in the gut microbiome of patients with LC. Only Faecalibacterium prausnitzii shows different single-nucleotide polymorphism (SNP) patterns between the LC and healthy control (HC) groups. Strain diversity analysis revealed that although most F. prausnitzii genomes are more depleted in the LC group than in the HC group at the strain level, a subgroup of 19 F. prausnitzii strains showed no sensitivity to LC, which is inconsistent with the species-level result. The functional differences between this subgroup and other strains may involve short-chain fatty acid production and chlorine-related pathways. These findings demonstrate functional differences among F. prausnitzii subgroups, which extend current knowledge about strain heterogeneity and the relationships between F. prausnitzii and LC at the strain level.
IMPORTANCE
Most metagenomic studies focus on microbes at the species level, thus ignoring the different effects that different strains of the same species can have on the host. In this study, we explored strain-level differences between the intestinal microbes of patients with liver cirrhosis and those of healthy people. Previous studies have shown that the species Faecalibacterium prausnitzii has a lower abundance in patients with liver cirrhosis than in healthy people. However, we found multiple F. prausnitzii strains that do not decrease in abundance in patients with liver cirrhosis. Selecting appropriate strains as indicators distinguishes disease samples from control samples more sensitively than using the entire species as an indicator. We clustered multiple F. prausnitzii strains and discussed the functional differences among the clusters. Our findings suggest that more attention should be paid to metagenomic studies at the strain level.
Keywords: Faecalibacterium prausnitzii; gut microbiome; human metagenomics; liver cirrhosis; single-nucleotide polymorphisms; species heterogeneity; strain diversity; strain-level analysis; within-species variation
Year: 2021 | PMID: 34342541 | PMCID: PMC8407477 | DOI: 10.1128/mSystems.00775-21
Source DB: PubMed | Journal: mSystems | ISSN: 2379-5077 | Impact factor: 6.496
FIG 1 Framework of strain diversity analysis of disease-related microbes. (A) Interpretive pipeline of our strain diversity analysis tool. Red dots represent mismatches against the reference, short straight lines represent reads, black reads were assigned to the genomes below them, gray reads were assigned to other genomes, and dashed lines connect the same reads. (B) Differences in single-nucleotide polymorphism (SNP) density between the healthy control and liver cirrhosis groups for 13 prevalent strains. (C) Performance of our strain diversity analysis tool with synthetic data. (Top) Correlations between actual coverages and estimated coverages. (Bottom left) Correlations between actual abundances and estimated abundances. (Bottom right) Correlations between actual depths and estimated depths.
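FIG 1 reports estimated coverage, depth, and relative abundance per genome, as well as SNP density. The following Python sketch shows one plausible way to derive such quantities once each read has already been assigned to a single reference genome (the step the panel A schematic illustrates). The `Alignment` record, the `strain_metrics` function, and the metric definitions in its docstring are assumptions for illustration, not the authors' tool. In particular, it does not resolve multi-mapping reads and treats every mismatch as a candidate SNP, with no base-quality or minimum-support filtering.

```python
from dataclasses import dataclass


@dataclass
class Alignment:
    """Hypothetical record: one read already assigned to one reference genome."""
    genome: str
    start: int               # 0-based leftmost reference position
    length: int              # number of reference bases the read covers
    mismatches: tuple = ()   # 0-based reference positions where the read differs


def strain_metrics(alignments, genome_lengths):
    """Per-genome coverage, depth, relative abundance and SNP density.

    Assumed definitions (illustrative only, not taken from the paper):
      coverage    = fraction of reference positions covered by >= 1 read
      depth       = total aligned bases / genome length
      abundance   = depth normalised so all genomes in the sample sum to 1
      snp_density = distinct mismatch positions per covered position
    """
    covered = {g: set() for g in genome_lengths}
    bases = {g: 0 for g in genome_lengths}
    snps = {g: set() for g in genome_lengths}
    for aln in alignments:
        # Record every reference position this read spans
        covered[aln.genome].update(range(aln.start, aln.start + aln.length))
        bases[aln.genome] += aln.length
        # Naive: any mismatch counts as a candidate SNP (no error filtering)
        snps[aln.genome].update(aln.mismatches)

    depth = {g: bases[g] / n for g, n in genome_lengths.items()}
    total_depth = sum(depth.values())
    result = {}
    for g, n in genome_lengths.items():
        n_cov = len(covered[g])
        result[g] = {
            "coverage": n_cov / n,
            "depth": depth[g],
            "abundance": depth[g] / total_depth if total_depth else 0.0,
            "snp_density": len(snps[g]) / n_cov if n_cov else 0.0,
        }
    return result
```

For example, two overlapping 10-base reads on a 100-base genome cover 15 positions, so the sketch reports a coverage of 0.15 and a depth of 0.2 for that genome.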
FIG 2 Heterogeneity of F. prausnitzii strains in the disease and healthy groups. (A) The distributions of estimated coverages, depths, and relative abundances of the 136 F. prausnitzii strains in real samples. (B) Estimated read coverage for the 136 F. prausnitzii strains. (C) Clustering of F. prausnitzii strains according to their prevalence in samples. (D) Estimated coverage distributions of strain clusters in the healthy control and liver cirrhosis groups (left) and in the healthy control and Crohn’s disease groups (right).
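Panel C of FIG 2 groups strains by their prevalence across samples. The caption does not specify the clustering method, so the sketch below swaps in one common choice: average-linkage agglomerative clustering on Jaccard distances between the sets of samples in which each strain is detected. The function names, the presence-set input, and the stopping `threshold` are hypothetical.

```python
def jaccard_distance(a, b):
    """1 - |a & b| / |a | b| for two sets of sample IDs."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def cluster_by_prevalence(presence, threshold):
    """Average-linkage agglomerative clustering (an assumed method).

    presence:  dict mapping strain -> set of sample IDs where it is detected
    threshold: stop merging once the closest pair of clusters is farther
               apart than this distance
    """
    clusters = [[s] for s in sorted(presence)]

    def linkage(c1, c2):
        # Mean pairwise Jaccard distance between members of two clusters
        dists = [jaccard_distance(presence[x], presence[y]) for x in c1 for y in c2]
        return sum(dists) / len(dists)

    while len(clusters) > 1:
        # Find the closest pair of clusters (first pair wins on ties)
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = linkage(clusters[i], clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        if d > threshold:
            break
        # j > i, so deleting j leaves index i valid
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters
```

Average linkage is used here only because it is less prone to chaining than single linkage; a real analysis would likely rely on a library implementation and a principled way of choosing the number of clusters.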
FIG 3 Gene Ontology (GO) terms deficient in and specific to cluster 4 (C4) strains. A blue square indicates that the corresponding GO term on the right is annotated in the cluster.
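FIG 3 marks each GO term as annotated or not annotated in each cluster. Under an assumed reading of "deficient" (annotated in every other cluster but absent from the target cluster) and "specific" (annotated only in the target cluster), both sets follow from simple set operations, as in this hypothetical Python sketch. The term labels in the usage example are placeholders, not real GO identifiers.

```python
def deficient_and_specific(go_by_cluster, target):
    """Return (deficient, specific) GO-term sets for one cluster.

    go_by_cluster: dict mapping cluster name -> iterable of annotated GO terms
    Assumed definitions:
      deficient = terms annotated in every other cluster but not in `target`
      specific  = terms annotated in `target` and in no other cluster
    """
    target_terms = set(go_by_cluster[target])
    others = [set(terms) for name, terms in go_by_cluster.items() if name != target]
    in_any_other = set().union(*others)
    in_all_others = set.intersection(*others) if others else set()
    specific = target_terms - in_any_other
    deficient = in_all_others - target_terms
    return deficient, specific
```

For instance, with clusters `{"C1": {"term_a", "term_b"}, "C2": {"term_a", "term_b", "term_c"}, "C4": {"term_b", "term_d"}}`, the sketch reports `term_a` as deficient in C4 and `term_d` as specific to C4.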
FIG 4 Performance of machine learning models in distinguishing disease states. (A) Performance of different combinations of strain-level data and machine learning models. “Combined” represents the combination of coverage, depth, and abundance. (B) Receiver operating characteristic (ROC) curves for support vector machine (SVM) models that classify samples using the estimated coverage of GCA_001406615.2. (C) ROC curves for SVM models that classify samples using the estimated coverage of GCA_902388275.1. (D) ROC curves for SVM models that classify samples using the abundance of the F. prausnitzii species from MetaPhlAn2 results.
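The ROC curves in FIG 4 come from SVM models, which this sketch does not reproduce. It only shows how an ROC curve and its area under the curve (AUC) are computed from a single per-sample score, such as the estimated coverage of one strain, using the rank-based (Mann-Whitney) formulation of AUC. The function names are hypothetical, and `labels` uses 1 for the positive class and 0 for the negative class.

```python
def roc_auc(scores, labels):
    """ROC AUC as the probability that a random positive outscores a random negative.

    Equivalent to the Mann-Whitney U statistic divided by n_pos * n_neg;
    tied scores count as half a correct ordering.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def roc_points(scores, labels):
    """(false positive rate, true positive rate) pairs from a threshold sweep.

    A sample is called positive when its score is >= the current threshold.
    """
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = len(labels) - n_pos
    points = [(0.0, 0.0)]
    for t in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 0)
        points.append((fp / n_neg, tp / n_pos))
    return points
```

Because positives must receive higher scores, a strain assumed to be depleted in disease would be scored with its negated coverage when LC is the positive class.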