Leandro Benevides1,2, Sriti Burman3, Rebeca Martin2, Véronique Robert2, Muriel Thomas2, Sylvie Miquel2,4, Florian Chain2, Harry Sokol2,5,6, Luis G Bermudez-Humaran2, Mark Morrison3, Philippe Langella2, Vasco A Azevedo1, Jean-Marc Chatel2, Siomar Soares7.
Abstract
Faecalibacterium prausnitzii is a commensal bacterium, ubiquitous in the gastrointestinal tracts of animals and humans. This species is a functionally important member of the microbiota and studies suggest it has an impact on the physiology and health of the host. F. prausnitzii is the only identified species in the genus Faecalibacterium, but a recent study clustered strains of this species in two different phylogroups. Here, we propose the existence of distinct species in this genus through the use of comparative genomics. Briefly, we performed analyses of 16S rRNA gene phylogeny, phylogenomics, whole genome Multi-Locus Sequence Typing (wgMLST), Average Nucleotide Identity (ANI), gene synteny, and pangenome to better elucidate the phylogenetic relationships among strains of Faecalibacterium. For this, we used 12 newly sequenced, assembled, and curated genomes of F. prausnitzii, which were isolated from feces of healthy volunteers from France and Australia, and combined these with published data from 5 strains downloaded from public databases. The phylogenetic analysis of the 16S rRNA sequences, together with the wgMLST profiles and a phylogenomic tree based on comparisons of genome similarity, all supported the clustering of Faecalibacterium strains in different genospecies. Additionally, the global analysis of gene synteny among all strains showed a highly fragmented profile, whereas the intra-cluster analyses revealed larger and more conserved collinear blocks. Finally, ANI analysis substantiated the presence of three distinct clusters (A, B, and C) composed of five, four, and four strains, respectively. The pangenome analysis of each cluster corroborated the classification of these clusters into three distinct species, each containing less variability than that found within the global pangenome of all strains.
Here, we propose that comparison of pangenome subsets and their associated α values may be used as an alternative approach, together with ANI, in the in silico classification of new species. Altogether, our results provide evidence not only for the reconsideration of the phylogenetic and genomic relatedness among strains currently assigned to F. prausnitzii, but also the need for lineage (strain-based) differentiation of this taxon to better define how specific members might be associated with positive or negative host interactions.
Keywords: 16S rRNA gene phylogeny; Average Nucleotide Identity; Faecalibacterium prausnitzii; gene synteny; genome sequencing; new species; pangenome; phylogenomic analysis
Year: 2017 PMID: 28970823 PMCID: PMC5609107 DOI: 10.3389/fmicb.2017.01790
Source DB: PubMed Journal: Front Microbiol ISSN: 1664-302X Impact factor: 5.640
Genomic features of F. prausnitzii genomes.
| Genome ID | Country | Contigs | Size (bp) | GC (%) | CDS |
| --- | --- | --- | --- | --- | --- |
| 411485.10 | United Kingdom | 25 | 3,125,761 | 56.3 | 2,776 |
| 657322.3 | United Kingdom | 1 | 3,214,418 | 54.81 | 3,052 |
| 718252.3 | United Kingdom | 1 | 3,321,367 | 55.57 | 3,232 |
| 748224.3 | USA | 139 | 2,907,000 | 56.27 | 2,783 |
| 853.123 | Australia | 85 | 3,019,317 | 57.36 | 3,201 |
| 853.124 | Australia | 63 | 2,879,169 | 56.82 | 2,933 |
| 853.62 | France | 48 | 3,043,568 | 55.7 | 3,206 |
| 853.63 | France | 78 | 2,822,838 | 58.11 | 2,825 |
| 853.64 | France | 106 | 2,914,466 | 55.83 | 3,071 |
| 853.65 | France | 22 | 3,080,452 | 56.2 | 3,223 |
| 853.66 | France | 71 | 2,808,526 | 55.98 | 2,907 |
| 853.67 | France | 244 | 3,422,520 | 54.88 | 3,611 |
| 853.68 | France | 83 | 3,275,249 | 55.9 | 3,479 |
| 853.69 | France | 38 | 3,088,985 | 56.26 | 3,249 |
| 853.70 | France | 37 | 3,006,602 | 57.51 | 3,077 |
| 853.71 | France | 36 | 2,915,240 | 56.37 | 3,019 |
| 853.73 | France | 1 | 3,110,044 | 56.33 | 3,231 |
Figure 1. Phylogenetic analysis based on 16S rRNA gene sequences. Evolutionary history was inferred using the maximum-likelihood method based on the Kimura 2-parameter model (Kimura, 1980). The topology of the tree with the highest log likelihood (−3,562.92) is shown. The percentage of trees in which the associated taxa clustered together is shown next to the branches. A discrete Gamma distribution was used to model evolutionary rate differences among sites [5 categories (+G, parameter = 0.1122)]. The tree is drawn to scale, with branch lengths measured as the number of substitutions per site. The analysis involved 76 nucleotide sequences. All positions containing gaps and missing data were eliminated. The bootstrap analysis was performed with 1,000 replicates. Evolutionary analyses were conducted in MEGA7 (Kumar et al., 2016). Accession numbers of 16S rRNA sequences are given in parentheses. Filled circles indicate the strains newly sequenced for this study and open circles indicate the strains retrieved from PATRIC for genomic analysis.
Figure 2. (A) Heatmap and (B) distance-matrix-based phylogenetic network of F. prausnitzii. The numbers in the heatmap show the percentage of similarity between genomes; the colors vary from red (low similarity) to green (high similarity). The network was constructed using SplitsTree software with NeighborNet and equal angle methods, based on a distance matrix from Gegenees software.
Figure 3. Dendrogram constructed with wgMLST profiles for 17 F. prausnitzii genomes. The PGAdb profile from the genomes was used to construct a wgMLST tree using the Build_wgMLSTtree module (Liu et al., 2016). Bootstrap values are shown next to the nodes. The dendrogram was constructed with the UPGMA clustering algorithm.
Average nucleotide identity.
| Strain | CNCM_I_4546 | CNCM_I_4573 | CNCM_I_4644 | M21-2 | SL3-3 | A2-165_PacBio | CNCM_I_4543 | CNCM_I_4574 | HMI-19 | CNCM_I_4540 | CNCM_I_4542 | CNCM_I_4544 | KLE1255 | AHMP-21 | CNCM_I_4541 | CNCM_I_4575 | L2-6 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| CNCM_I_4546 | 100 | 97.37 | 95.03 | 97.33 | 97.36 | 86.23 | 86.76 | 86.85 | 86.02 | 86.47 | 87.6 | 86.88 | 87.44 | 85.48 | 86.57 | 86.81 | 85.72 |
| CNCM_I_4573 | 97.37 | 100 | 95.02 | 97.13 | 97.19 | 86.79 | 86.2 | 85.99 | 86.66 | 87.11 | 86.86 | 86.88 | 87.21 | 86.01 | 86.24 | 88.42 | 84.64 |
| CNCM_I_4644 | 95.03 | 95.02 | 100 | 95.09 | 94.99 | 86.28 | 85.88 | 85.81 | 85.6 | 85.81 | 86.44 | 86.34 | 86.57 | 85.62 | 85.91 | 86.48 | 84.87 |
| M21-2 | 97.33 | 97.13 | 95.09 | 100 | 97.36 | 86.75 | 86.03 | 85.9 | 85.78 | 86.48 | 86.8 | 86.92 | 87.35 | 85.58 | 86.52 | 86.79 | 85.43 |
| SL3-3 | 97.36 | 97.19 | 94.99 | 97.36 | 100 | 86.16 | 86.47 | 86.44 | 86.21 | 86.03 | 86.67 | 86.85 | 87.05 | 85.6 | 86.3 | 86.33 | 85.34 |
| A2-165_PacBio | 86.23 | 86.79 | 86.28 | 86.75 | 86.16 | 100 | 98.08 | 97.99 | 97.12 | 85.21 | 86.11 | 85.47 | 86.22 | 86.1 | 85.28 | 86.32 | 86.09 |
| CNCM_I_4543 | 86.76 | 86.2 | 85.88 | 86.03 | 86.47 | 98.08 | 100 | 99.9 | 97.08 | 84.82 | 85.58 | 85.42 | 85.75 | 85.85 | 85.2 | 85.9 | 85.71 |
| CNCM_I_4574 | 86.85 | 85.99 | 85.81 | 85.9 | 86.44 | 97.99 | 99.9 | 100 | 97.1 | 84.77 | 85.71 | 85.26 | 85.63 | 86 | 85.02 | 85.85 | 85.99 |
| HMI-19 | 86.02 | 86.66 | 85.6 | 85.78 | 86.21 | 97.12 | 97.08 | 97.1 | 100 | 86.08 | 85.56 | 85.59 | 85.59 | 85.96 | 85.47 | 86.1 | 85.93 |
| CNCM_I_4540 | 86.47 | 87.11 | 85.81 | 86.48 | 86.03 | 85.27 | 84.82 | 84.77 | 86.08 | 100 | 97.62 | 97.57 | 97.52 | 85.03 | 87.65 | 85.7 | 85.61 |
| CNCM_I_4542 | 87.6 | 86.86 | 86.44 | 86.6 | 86.67 | 86.11 | 85.58 | 85.71 | 85.56 | 97.62 | 100 | 98.46 | 98.1 | 85.62 | 88.09 | 85.97 | 86.13 |
| CNCM_I_4544 | 86.88 | 86.88 | 86.34 | 86.92 | 86.85 | 85.47 | 85.42 | 85.26 | 85.59 | 97.57 | 98.46 | 100 | 98.14 | 85.63 | 88.05 | 85.87 | 86.03 |
| KLE1255 | 87.44 | 87.21 | 86.57 | 87.35 | 87.05 | 86.22 | 85.75 | 85.63 | 85.59 | 97.52 | 98.1 | 98.14 | 100 | 85.73 | 87.94 | 86.52 | 86.47 |
| AHMP-21 | 85.48 | 86.01 | 85.61 | 85.58 | 85.6 | 86.1 | 85.85 | 86 | 85.96 | 85.03 | 85.62 | 85.64 | 85.73 | 100 | 85.14 | 88.31 | 86.21 |
| CNCM_I_4541 | 86.57 | 86.24 | 85.91 | 86.52 | 86.3 | 85.25 | 85.2 | 85.02 | 85.47 | 87.65 | 88.09 | 88.05 | 87.94 | 85.14 | 100 | 85.25 | 85.22 |
| CNCM_I_4575 | 86.81 | 88.42 | 86.48 | 86.79 | 86.33 | 86.32 | 85.9 | 85.85 | 86.1 | 85.7 | 85.97 | 85.87 | 86.52 | 88.31 | 85.25 | 100 | 86.87 |
| L2-6 | 85.72 | 84.64 | 84.87 | 85.43 | 85.34 | 86.09 | 85.71 | 85.99 | 85.93 | 85.61 | 86.13 | 86.03 | 86.47 | 86.21 | 85.22 | 86.87 | 100 |
The colors purple, blue, and green correspond to clusters A, B, and C, respectively.
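The grouping into clusters A, B, and C can be reproduced from the ANI values in the table. The following is a minimal sketch, not the authors' pipeline: it assumes a 95% ANI cutoff (a commonly cited species boundary; the text here does not state which threshold was used) and a simple single-linkage grouping on the mean of the two reciprocal values. The function name `cluster_by_ani` is illustrative.

```python
# ANI values (%) copied from the table; columns follow the same strain order as rows.
ANI_TABLE = """
CNCM_I_4546 100 97.37 95.03 97.33 97.36 86.23 86.76 86.85 86.02 86.47 87.6 86.88 87.44 85.48 86.57 86.81 85.72
CNCM_I_4573 97.37 100 95.02 97.13 97.19 86.79 86.2 85.99 86.66 87.11 86.86 86.88 87.21 86.01 86.24 88.42 84.64
CNCM_I_4644 95.03 95.02 100 95.09 94.99 86.28 85.88 85.81 85.6 85.81 86.44 86.34 86.57 85.62 85.91 86.48 84.87
M21-2 97.33 97.13 95.09 100 97.36 86.75 86.03 85.9 85.78 86.48 86.8 86.92 87.35 85.58 86.52 86.79 85.43
SL3-3 97.36 97.19 94.99 97.36 100 86.16 86.47 86.44 86.21 86.03 86.67 86.85 87.05 85.6 86.3 86.33 85.34
A2-165_PacBio 86.23 86.79 86.28 86.75 86.16 100 98.08 97.99 97.12 85.21 86.11 85.47 86.22 86.1 85.28 86.32 86.09
CNCM_I_4543 86.76 86.2 85.88 86.03 86.47 98.08 100 99.9 97.08 84.82 85.58 85.42 85.75 85.85 85.2 85.9 85.71
CNCM_I_4574 86.85 85.99 85.81 85.9 86.44 97.99 99.9 100 97.1 84.77 85.71 85.26 85.63 86 85.02 85.85 85.99
HMI-19 86.02 86.66 85.6 85.78 86.21 97.12 97.08 97.1 100 86.08 85.56 85.59 85.59 85.96 85.47 86.1 85.93
CNCM_I_4540 86.47 87.11 85.81 86.48 86.03 85.27 84.82 84.77 86.08 100 97.62 97.57 97.52 85.03 87.65 85.7 85.61
CNCM_I_4542 87.6 86.86 86.44 86.6 86.67 86.11 85.58 85.71 85.56 97.62 100 98.46 98.1 85.62 88.09 85.97 86.13
CNCM_I_4544 86.88 86.88 86.34 86.92 86.85 85.47 85.42 85.26 85.59 97.57 98.46 100 98.14 85.63 88.05 85.87 86.03
KLE1255 87.44 87.21 86.57 87.35 87.05 86.22 85.75 85.63 85.59 97.52 98.1 98.14 100 85.73 87.94 86.52 86.47
AHMP-21 85.48 86.01 85.61 85.58 85.6 86.1 85.85 86 85.96 85.03 85.62 85.64 85.73 100 85.14 88.31 86.21
CNCM_I_4541 86.57 86.24 85.91 86.52 86.3 85.25 85.2 85.02 85.47 87.65 88.09 88.05 87.94 85.14 100 85.25 85.22
CNCM_I_4575 86.81 88.42 86.48 86.79 86.33 86.32 85.9 85.85 86.1 85.7 85.97 85.87 86.52 88.31 85.25 100 86.87
L2-6 85.72 84.64 84.87 85.43 85.34 86.09 85.71 85.99 85.93 85.61 86.13 86.03 86.47 86.21 85.22 86.87 100
"""

rows = [line.split() for line in ANI_TABLE.strip().splitlines()]
strains = [r[0] for r in rows]
ani = [[float(v) for v in r[1:]] for r in rows]


def cluster_by_ani(strains, ani, threshold=95.0):
    """Single-linkage grouping: link two genomes when the mean of their
    two reciprocal ANI values is >= threshold (assumed cutoff, in %)."""
    parent = list(range(len(strains)))

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(strains)):
        for j in range(i + 1, len(strains)):
            if (ani[i][j] + ani[j][i]) / 2 >= threshold:
                parent[find(i)] = find(j)

    groups = {}
    for i, name in enumerate(strains):
        groups.setdefault(find(i), []).append(name)
    # Largest groups first; ties keep table order.
    return sorted(groups.values(), key=len, reverse=True)


clusters = cluster_by_ani(strains, ani)
for members in clusters:
    print(len(members), members)
```

With the table values and the assumed 95% cutoff, this yields groups of five, four, and four strains, matching clusters A, B, and C, and leaves AHMP-21, CNCM_I_4541, CNCM_I_4575, and L2-6 ungrouped. CNCM_I_4644 sits right at the cutoff (94.99 to 95.09% against the other cluster A strains), so its placement is sensitive to the threshold chosen.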
Figure 4. Genomic synteny and gene conservation among the genomes of F. prausnitzii. The left side of the figure (A) shows the locally collinear blocks (LCBs) of all genomes studied here. The right side depicts the LCBs of the genomes within each of the three clusters previously obtained from ANI analysis: top right (B), cluster A; middle right (C), cluster B; bottom right (D), cluster C.
Figure 5. Diagram depicting the subsets of the Faecalibacterium pangenome. The numbers represent the coding sequences belonging to each subset. Upper left chart (A): pangenome subsets from an analysis based on all 17 genomes of Faecalibacterium. Upper right chart (B): subset based on analysis of 5 genomes from group A. Lower left chart (C): subset based on analysis of 4 genomes from group B. Lower right chart (D): subset based on analysis of 4 genomes from group C.
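As an illustration of what pangenome subsets are, the sketch below splits gene families into a core genome (present in every genome), singletons (present in exactly one), and a shared remainder. It is a simplified stand-in under stated assumptions: the genome names and gene-family IDs are hypothetical, and real analyses first need an orthology step to define the gene families, which is not shown.

```python
from collections import Counter


def pangenome_subsets(genomes):
    """Split gene families into core, shared, and singleton subsets.

    genomes: dict mapping a genome name to the set of gene-family IDs it carries.
    """
    counts = Counter(fam for families in genomes.values() for fam in families)
    n = len(genomes)
    core = {fam for fam, c in counts.items() if c == n}        # in all genomes
    singletons = {fam for fam, c in counts.items() if c == 1}  # in exactly one
    shared = set(counts) - core - singletons                   # in 2..n-1 genomes
    return core, shared, singletons


# Hypothetical toy input, not data from the study.
toy = {
    "genome_1": {"famA", "famB", "famC"},
    "genome_2": {"famA", "famB", "famD"},
    "genome_3": {"famA", "famE"},
}
core, shared, singletons = pangenome_subsets(toy)
print(sorted(core), sorted(shared), sorted(singletons))
```

Running the same split on all genomes and then on each cluster separately is the kind of comparison the figure summarizes: a cluster whose members are closely related should show a proportionally larger core and fewer singletons.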
Figure 6. Pangenome development. Upper left chart: pangenome development based on permutations of all 17 genomes of Faecalibacterium. Upper right chart: development based on permutations of 5 genomes from group A. Lower left chart: development based on permutations of 4 genomes from group B. Lower right chart: development based on permutations of 4 genomes from group C.
Figure 7. Development of core genome and singletons. Upper left chart: core-genome and singleton development based on permutations of all 17 genomes of Faecalibacterium. Upper right chart: development based on permutations of 5 genomes from group A. Lower left chart: development based on permutations of 4 genomes from group B. Lower right chart: development based on permutations of 4 genomes from group C.
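The abstract refers to α values associated with each pangenome but does not define them here. A minimal sketch follows, assuming the Heaps' law convention used in many pangenome studies, in which pangenome size grows as n = κ·N^γ with N genomes and α = 1 − γ. The fitting routine is a plain log-log least-squares regression, and the input numbers are synthetic, generated only to show that the fit recovers known parameters; they are not results from the study.

```python
import math


def fit_heaps(n_genomes, pan_sizes):
    """Fit n = kappa * N**gamma by least squares on log-transformed data.

    Returns (kappa, gamma, alpha), with alpha = 1 - gamma (assumed convention).
    """
    xs = [math.log(x) for x in n_genomes]
    ys = [math.log(y) for y in pan_sizes]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    gamma = sxy / sxx
    kappa = math.exp(mean_y - gamma * mean_x)
    return kappa, gamma, 1.0 - gamma


# Synthetic curve (hypothetical kappa = 3000, gamma = 0.35) over 1..17 genomes.
counts = list(range(1, 18))
sizes = [3000 * n ** 0.35 for n in counts]
kappa, gamma, alpha = fit_heaps(counts, sizes)
print(round(kappa, 2), round(gamma, 3), round(alpha, 3))
```

Under this convention, an α below 1 is usually read as an open pangenome that keeps gaining genes as genomes are added, and a higher α indicates a pangenome closer to saturation. Comparing the α fitted for each cluster against the α fitted for all strains together is one way to express the abstract's point that each cluster contains less variability than the global set.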