José Luis Maturana, Juan P Cárdenas.
Abstract
Blautia, a genus established in 2008, is a notably abundant taxonomic group in the microbiome of the human and other mammalian gastrointestinal (GI) tracts. Several described (or proposed) Blautia species are available to date. However, despite growing knowledge about Blautia, its diversity is still poorly understood. The increasing availability of Blautia genomic sequences in public databases makes it possible to study this genus from a genomic perspective. Here we report the pangenome analysis and phylogenomic study of 225 Blautia genomes available in RefSeq. We found 33 different potential species at the genomic level, 17 of them previously undescribed; we also confirmed by genomic standards the status of 4 previously proposed new Blautia species. Comparative genomic analyses suggest that the Blautia pangenome is open, with a relatively small core genome (∼700-800 gene families). Using a set of representative genomes, we fitted a gene family gain/loss model for the genus. It shows that, although terminal nodes underwent more massive gene gain events than internal nodes (i.e., predicted ancestors), some ancestors were predicted to have gained a substantial number of gene families, some of them associated with the possible acquisition of metabolic abilities. Gene loss events remained lower than gain events in most cases. We discuss general aspects of pangenome composition and gene gain/loss events, and propose changes in the taxonomic assignment of B. coccoides TY as well as a new species, "B. pseudococcoides".
Keywords: Blautia; diversity; gene gain/loss; genomic species; pangenome; phylogenomics
Year: 2021 PMID: 33981291 PMCID: PMC8107234 DOI: 10.3389/fmicb.2021.660920
Source DB: PubMed Journal: Front Microbiol ISSN: 1664-302X Impact factor: 5.640
Genomic species groups found in the Blautia dataset from the use of ANI, TETRA and AAI data.
| Species cluster | Proposed species assignment | Includes sequenced type strain? | Number of genomes | Comments |
| 1 | | Yes | 1 | It includes type strain KCTC 15426 genome (GCF_003287895.1) |
| 2 | | Yes | 10 | It includes type strain SG-772 genome (GCF_003011855.2) |
| 3 | | Yes | 1 | It includes type strain KGMB01111 genome (GCF_004123145.1) |
| 4 | “ | Yes # | 2 | It includes proposed type strain BX17 genome (GCF_014287535.1); representative rRNA sequence (MT905180.1) had 97.78% identity with NR_109014.1 ( |
| 5 | [unknown group A] | No | 1 | Representative rRNA sequence (locus B5F53_RS19410) had 92.88% identity with NR_026312.1 ( |
| 6 | [unknown group C] | No | 1 | The only representative does not contain a suitable 16S rRNA gene |
| 7 | [unknown group D] | No | 1 | Representative rRNA sequence (locus_tag G5B11_RS18600) had 95.44% identity with |
| 8 | [unknown group E] | No | 1 | Representative rRNA sequence (locus_tag G5A70_RS15400) had 97.45% identity with |
| 9 | “ | Yes # | 1 | It includes the proposed type strain M29 genome (GCF_014297245.1) |
| 10 | | Yes | 1 | It includes type strain DSM 20583 genome (GCF_002222595.2) |
| 11 | | Yes | 3 | It includes the genomes of |
| 12 | | Yes | 3 | It includes type strain DSM 10507 genome (GCF_000157975.1) |
| 13 | | Yes | 1 | It includes type strain DSM 14534 genome (GCF_009707925.1) |
| 14 | “ | Yes # | 5 | It includes proposed type strain 2744 genome (GCF_014297355.1) |
| 15 | [unknown group B] | No | 1 | The genome of the unique member of this clade was labeled as “ |
| 16 | | Yes | 25 | It includes type strain GD9 genome (GCF_001487165.1) |
| 17 | [unknown group F] | No | 2 | Representative RNA genes had <98.5% identity with other RNA genes from known species. This suggests that it could be a new species |
| 18 | | Yes | 36 | It includes type strain ATCC 29174 genome (GCF_000153905.1) |
| 19 | | Yes | 5 | It includes |
| 20 | [unknown group J] | No | 6 | Representative RNA genes had <98.5% identity with other RNA genes from known species. This suggests that it could be a new species |
| 21 | [unknown group G] | No | 4 | Representative RNA genes had <98.5% identity with other RNA genes from known species. This suggests that it could be a new species |
| 22 | [unknown group H] | No | 2 | Representative RNA genes had <98.5% identity with other RNA genes from known species. This suggests that it could be a new species |
| 23 | [unknown group K] | No | 2 | Representative RNA genes were very close to other strains, but <98.5% identity. This suggests that it could be a new species |
| 24 | | No | 5 | Representative rRNA sequences (e.g., locus G4470_RS18450) had >99% identity with |
| 25 | [unknown group L] | No | 3 | Representative RNA genes had <98.5% identity with other RNA genes from known species. This suggests that it could be a new species |
| 26 | [unknown group I] | No | 23 | Representative RNA genes had <98.5% identity with other RNA genes from known species. This suggests that it could be a new species |
| 27 | | No | 2 | Representative rRNA sequence (locus G4948_RS15000) had 99.35% identity with |
| 28 | “ | Yes # | 5 | It includes strain NSJ-34 genome (GCF_014287615.1); representative rRNA sequence (MT905182) had 98.26% identity with NR_163637.1 ( |
| 29 | | Yes | 66 | It includes strain DSM 19850 genome (GCF_000484655.1) |
| 30 | [unknown group X] | No | 2 | Representative rRNA sequences from this group have >99% identity with type strain |
| 31 | [unknown group Y] | No | 3 | It includes |
| 32 | [unknown group W] | No | 1 | Representative RNA genes were very close to |
| 33 | [unknown group Z] | No | 1 | Representative RNA genes had <97% identity with other RNA genes from known species. This suggests that it could be a new species |
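The grouping of genomes into genomic species clusters can be illustrated with a small sketch. The version below uses single-linkage clustering on pairwise ANI values only; the 95% cutoff, the genome names, and the ANI values are all assumptions chosen for illustration, and the sketch ignores the TETRA and AAI data that the table also draws on. It is not the procedure used to build the table.

```python
def cluster_by_ani(genomes, ani_pairs, threshold=95.0):
    """Single-linkage clustering: genomes joined by any ANI >= threshold share a cluster.

    The 95% default is an assumed, commonly quoted species cutoff,
    not a value taken from the table above.
    """
    parent = {g: g for g in genomes}

    def find(g):
        # Follow parent links to the cluster representative, compressing the path.
        while parent[g] != g:
            parent[g] = parent[parent[g]]
            g = parent[g]
        return g

    for (a, b), ani in ani_pairs.items():
        if ani >= threshold:
            parent[find(a)] = find(b)

    clusters = {}
    for g in genomes:
        clusters.setdefault(find(g), []).append(g)
    return sorted(sorted(members) for members in clusters.values())


# Hypothetical genomes and ANI values (percent), for illustration only.
GENOMES = ["g1", "g2", "g3", "g4"]
ANI = {
    ("g1", "g2"): 98.7,
    ("g2", "g3"): 96.1,
    ("g1", "g3"): 94.8,  # below the cutoff, but g1 and g3 are linked through g2
    ("g1", "g4"): 81.2,
    ("g2", "g4"): 80.9,
    ("g3", "g4"): 80.5,
}

CLUSTERS = cluster_by_ani(GENOMES, ANI)
print(CLUSTERS)
```

Single linkage is the most permissive choice: one pair above the cutoff is enough to merge two groups. A stricter scheme (for example, requiring every pair in a cluster to pass) could split borderline clusters differently.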
FIGURE 1. Phylogenomic tree of selected representative genomes from the clusters of genomic species in the Blautia dataset. The tree was computed by maximum likelihood from the concatenated sequence alignment of 190 conserved single-copy orthogroups. The numeric data in the format n/n represent bootstrap support (UF-bootstrap) and approximate likelihood-ratio test (SH-aLRT) values. The tree is rooted using Robinsoniella peorensis as the outgroup. The arbitrarily designated lineages I to VI are also shown. The color of each taxon name reflects the status of the current representative genome: red, undescribed potential new species; blue, type strain of a validated species; green, strain of a validated species, but not a type strain; brown, proposed type strain of a species not officially accepted; black, current strains reassigned to another species. See text for more details.
Number of genes per pangenome category.
| Program | Core genes | Soft core genes | Shell genes | Cloud genes | Total genes |
| Peppan | 821 | 136 | 4452 | 38136 | 43545 |
| PanX | 722 | 173 | 3433 | 31109 | 35437 |
| Roary | 117 | 36 | 5875 | 82534 | 88562 |
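As a quick consistency check on the table above, the following minimal Python sketch recomputes each category's share of the total gene count for every program. The counts are copied from the table; the dictionary and function names are illustrative and not part of the original analysis.

```python
# Gene counts per category, copied from the table above.
TABLE = {
    "Peppan": {"core": 821, "soft_core": 136, "shell": 4452, "cloud": 38136, "total": 43545},
    "PanX": {"core": 722, "soft_core": 173, "shell": 3433, "cloud": 31109, "total": 35437},
    "Roary": {"core": 117, "soft_core": 36, "shell": 5875, "cloud": 82534, "total": 88562},
}


def category_shares(counts):
    """Return each category's percentage of the total, after checking the row sum."""
    parts = {name: n for name, n in counts.items() if name != "total"}
    if sum(parts.values()) != counts["total"]:
        raise ValueError("category counts do not add up to the reported total")
    return {name: 100.0 * n / counts["total"] for name, n in parts.items()}


SHARES = {program: category_shares(row) for program, row in TABLE.items()}

for program, shares in SHARES.items():
    summary = ", ".join(f"{name}: {pct:.2f}%" for name, pct in shares.items())
    print(f"{program}: {summary}")
```

In all three rows the four categories add up to the reported total, and the cloud category accounts for the large majority of gene families regardless of the program used.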
FIGURE 2. Rarefaction curves for "pan" (A) and "core" (B) genes. The curves were fitted to the median values of 1,000 iterations following the power law previously described (Tettelin et al., 2008). The shaded areas indicate 95% confidence intervals. (C) Distribution of the 43,545 pan genes among the 224 genomes of the Blautia dataset. The bar representing the core genes is highlighted in red. Inset: percentages for each gene category in the pangenome; see Table 2 for definitions.
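The power-law fit mentioned in the caption can be sketched as follows, assuming the common Heaps'-law form n = κ·N^γ, where N is the number of genomes sampled and n is the pangenome size. The sketch fits the exponent by ordinary least squares on log-transformed values, which is a simplification and not necessarily the fitting procedure used for the figure. The input data are synthetic and purely illustrative.

```python
import math


def fit_power_law(n_genomes, n_genes):
    """Least-squares fit of log(n_genes) = log(kappa) + gamma * log(n_genomes)."""
    xs = [math.log(n) for n in n_genomes]
    ys = [math.log(g) for g in n_genes]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    gamma = sxy / sxx
    kappa = math.exp(mean_y - gamma * mean_x)
    return kappa, gamma


# Synthetic median pangenome sizes generated from a known curve (NOT data
# from the study), so the fit should recover kappa = 4000 and gamma = 0.45.
genomes = [1, 2, 5, 10, 50, 100, 200]
pan_sizes = [4000.0 * n ** 0.45 for n in genomes]

kappa, gamma = fit_power_law(genomes, pan_sizes)
print(f"kappa = {kappa:.1f}, gamma = {gamma:.3f}")
```

Under this formulation, an exponent above zero means the pangenome keeps growing as genomes are added, which is the usual reading of an "open" pangenome.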
FIGURE 3. Heatmap of KEGG module completeness for the core pangenomes of four selected genomic species groups, clustered by category of metabolism. All clades have at least 23 genomes and are composed of genomes belonging exclusively to one cluster defined by ANI. Only modules with 50% completeness or more were considered.
FIGURE 4. Gain/loss profile among different ancestors and lineages in the representative set of Blautia genomic species. The same tree (with the same members) shown in Figure 1 is presented as a cladogram, annotated with the results of the Count analysis of gene gain/loss under the Wagner parsimony model for 12,691 gene families. The Roman numerals near each branch indicate the proposed lineages. Black numbers give the number of shared families for a given node (i.e., predicted ancestor); green and red numbers give the number of gene families gained and lost, respectively, by that ancestor. The green and red numbers above each terminal branch give the same quantities for the members of the tree (the terminal nodes). See text for more details.
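To make the gain/loss counts in the caption more concrete, the sketch below applies parsimony to a single gene family on a toy four-leaf tree. It is a binary presence/absence simplification with an adjustable gain penalty. It is not the Count implementation, which is not reproduced here. The tree, the profile, the tie-breaking rule at the root (absent on ties), and all function names are assumptions made for illustration.

```python
INF = float("inf")


def subtree_costs(node, profile, gain_cost, loss_cost):
    """Return (cost if the family is absent at node, cost if it is present)."""
    if isinstance(node, str):  # leaf: the observed state is free, the other impossible
        return (INF, 0.0) if profile[node] else (0.0, INF)
    absent = present = 0.0
    for child in node:
        c_abs, c_pres = subtree_costs(child, profile, gain_cost, loss_cost)
        absent += min(c_abs, c_pres + gain_cost)   # absent parent, present child: gain
        present += min(c_pres, c_abs + loss_cost)  # present parent, absent child: loss
    return absent, present


def count_events(node, state, profile, gain_cost, loss_cost, events):
    """Walk down from a node with a fixed state, tallying gains and losses per branch."""
    if isinstance(node, str):
        return
    for child in node:
        c_abs, c_pres = subtree_costs(child, profile, gain_cost, loss_cost)
        if state == 0:
            child_state = 0 if c_abs <= c_pres + gain_cost else 1
            events["gain"] += child_state
        else:
            child_state = 1 if c_pres <= c_abs + loss_cost else 0
            events["loss"] += 1 - child_state
        count_events(child, child_state, profile, gain_cost, loss_cost, events)


def wagner_events(tree, profile, gain_cost=1.0, loss_cost=1.0):
    """Most parsimonious gain/loss counts for one family (root absent on ties)."""
    absent, present = subtree_costs(tree, profile, gain_cost, loss_cost)
    root_state = 0 if absent <= present else 1
    events = {"gain": 0, "loss": 0}
    count_events(tree, root_state, profile, gain_cost, loss_cost, events)
    return events


# Toy tree ((A,B),(C,D)) and a hypothetical family found only in A and B.
TREE = (("A", "B"), ("C", "D"))
PROFILE = {"A": 1, "B": 1, "C": 0, "D": 0}

print(wagner_events(TREE, PROFILE))                 # equal gain and loss costs
print(wagner_events(TREE, PROFILE, gain_cost=2.0))  # gains penalized more heavily
```

With equal costs the family is explained by one gain on the branch leading to the (A,B) ancestor. Doubling the gain penalty makes it cheaper to place the family at the root and explain its absence in C and D by a single loss, which shows how the penalty shifts events between the gain and loss tallies.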
FIGURE 5. Taxonomic abundance profile of the top five HGT donors for the Blautia genomes from the representative dataset. Each genome was analyzed with HGTector, and the frequencies of taxonomic assignments for the top five putative donors were plotted for each case. The color code corresponds to the donors listed in the legend (upper right side).