Stijn Wittouck, Sander Wuyts, Conor J Meehan, Vera van Noort, Sarah Lebeer.
Abstract
There are more than 200 published species within the Lactobacillus genus complex (LGC), the majority of which have sequenced type strain genomes available. Although genome-based species delimitation cutoffs are accepted as the gold standard by the community, these are seldom actually checked for new or already published species. In addition, the availability of genome data is revealing inconsistencies in the species-level classification of many strains. We constructed a de novo species taxonomy for the LGC based on 2,459 publicly available genomes, using a 94% core nucleotide identity cutoff. We reconciled these de novo species with published species and subspecies names by (i) identifying genomes of type strains and (ii) comparing 16S rRNA genes of the genomes with 16S rRNA genes of type strains. We found that genomes within the LGC could be divided into 239 de novo species that were discontinuous and exclusive. Comparison of these de novo species to published species led to the identification of nine sets of published species that can be merged and one species that can be split. Further, we found at least eight de novo species that constitute new, unpublished species. Finally, we reclassified 74 genomes on the species level and identified for the first time the species of 98 genomes. Overall, the current state of LGC species taxonomy is largely consistent with genome-based species delimitation cutoffs. There are, however, exceptions that should be resolved to evolve toward a taxonomy where species share a consistent diversity in terms of sequence divergence.
IMPORTANCE The Lactobacillus genus complex is a group of bacteria that constitutes an important source of strains with medical and food applications. The number of bacterial whole-genome sequences available for this taxon has been increasing rapidly in recent years. Despite this wealth of information, the species within this group are still largely defined by older techniques. Here, we constructed a completely new species-level taxonomy for the Lactobacillus genus complex based on ∼2,500 whole-genome sequences. As a result of this effort, we found that many genomes are not classified to their correct species, and we were able to correct these. In addition, we found that some published species are abnormally large, while others are too small. Finally, we discovered at least eight completely novel species that have not been published before. Our work will help the field to evolve toward a more meaningful and complete taxonomy, based on whole-genome sequences.
Keywords: Lactobacillus; core genome; genomics; species delimitation; taxonomy
Year: 2019 PMID: 31481601 PMCID: PMC6722421 DOI: 10.1128/mSystems.00264-19
Source DB: PubMed Journal: mSystems ISSN: 2379-5077 Impact factor: 6.496
FIG 1 Genome quality control based on single-copy core gene (SCG) completeness and redundancy. (A) Density of genome completeness. For each genome, the percentage of SCGs that were present is shown. (B) Density of genome redundancy. For each genome, the percentage of SCGs with one or more extra copies is shown. The small vertical bars at the bottom of the graphs represent individual genomes and are shown to visualize outliers more clearly.
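The two quality measures in this caption can be illustrated with a minimal Python sketch. It assumes a per-genome table of SCG copy numbers is already available; the function name `scg_quality` and the gene names are hypothetical, and the sketch follows only the definitions given in the caption, not necessarily the authors' pipeline.

```python
from typing import Dict, Tuple


def scg_quality(copy_numbers: Dict[str, int]) -> Tuple[float, float]:
    """Return (completeness %, redundancy %) for one genome.

    copy_numbers maps each SCG name to the number of copies found in the
    genome; 0 means the gene is absent. (Hypothetical input format.)
    """
    n_scgs = len(copy_numbers)
    if n_scgs == 0:
        raise ValueError("at least one SCG is required")
    # Completeness: percentage of SCGs present at least once.
    present = sum(1 for n in copy_numbers.values() if n >= 1)
    # Redundancy: percentage of SCGs with one or more extra copies.
    extra = sum(1 for n in copy_numbers.values() if n >= 2)
    return 100.0 * present / n_scgs, 100.0 * extra / n_scgs


# Toy genome: three of four SCGs present, one of them duplicated.
example = {"scg_a": 1, "scg_b": 1, "scg_c": 2, "scg_d": 0}
completeness, redundancy = scg_quality(example)
print(completeness, redundancy)
```

For the toy genome, three of the four SCGs are present and one has an extra copy, so the sketch reports 75% completeness and 25% redundancy.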
FIG 2 Pairwise genome distance values between LGC genomes. (A) All pairwise core nucleotide identity (CNI) similarities between LGC genomes. The gray area indicates the CNI values of >94% (the same-species range). (B) All pairwise fastANI similarities between LGC genomes. The gray area indicates the fastANI values of >93% (the hypothetical same-species range). (C) CNI versus ANI values. Only genome pairs with fastANI values of >75% are shown, since the fastANI tool does not compute values under 75%.
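To illustrate how a similarity cutoff such as the >94% CNI range can turn pairwise values into species-level clusters, here is a minimal Python sketch. It assumes a precomputed dictionary of pairwise CNI values and groups genomes by connected components (single linkage) with a small union-find structure; this is one simple way to apply a cutoff and is not presented as the authors' exact procedure. The function name `cluster_by_cutoff` and the genome identifiers are hypothetical.

```python
from typing import Dict, Iterable, List, Tuple


def cluster_by_cutoff(
    genomes: Iterable[str],
    similarities: Dict[Tuple[str, str], float],
    cutoff: float = 0.94,
) -> List[List[str]]:
    """Group genomes linked by at least one pair with similarity > cutoff.

    Every genome named in `similarities` must also appear in `genomes`.
    """
    genomes = list(genomes)
    parent = {g: g for g in genomes}

    def find(g: str) -> str:
        # Follow parent links to the root, compressing the path on the way.
        while parent[g] != g:
            parent[g] = parent[parent[g]]
            g = parent[g]
        return g

    for (a, b), sim in similarities.items():
        if sim > cutoff:  # strict ">" mirrors the ">94%" range in the caption
            root_a, root_b = find(a), find(b)
            if root_a != root_b:
                parent[root_b] = root_a

    clusters: Dict[str, List[str]] = {}
    for g in genomes:
        clusters.setdefault(find(g), []).append(g)
    return sorted(sorted(members) for members in clusters.values())


# Toy data (invented values, for illustration only).
toy_cni = {
    ("g1", "g2"): 0.97,
    ("g2", "g3"): 0.95,
    ("g1", "g3"): 0.93,
    ("g1", "g4"): 0.80,
    ("g2", "g4"): 0.81,
    ("g3", "g4"): 0.79,
}
toy_clusters = cluster_by_cutoff(["g1", "g2", "g3", "g4"], toy_cni)
print(toy_clusters)
```

In the toy data, g1 and g3 end up in the same cluster through g2 even though their direct similarity is below the cutoff. Single linkage allows this kind of chaining, which is why properties such as discontinuity and exclusivity of the resulting clusters are worth checking separately.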
Inconsistencies between published and de novo species
| Cluster | Merger or split | Type strain(s) present | Minimum |
|---|---|---|---|
| 135 | Merger | 1 | |
| 218 | Merger | 0.981 | |
| 179 | Merger | 0.979 | |
| 178 | Merger | 0.974 | |
| 173 | Merger | 0.964 | |
| 238 | Merger | 0.954 | |
| 236 | Merger | 0.952 | |
| 225 | Merger | 0.946 | |
| 233 | Merger | 0.946 | |
| 141 | Merger | 0.944 | |
| 64 | Split | 1 | |
| 149 | Split | 0.953 | |
Genome clusters without a type strain genome
| Cluster | NCBI species | No. of 16S | 16S hits | Species |
|---|---|---|---|---|
| 155 | | 28 | | |
| 185 | | 6 | | |
| 4 | | 1 | | |
| 175 | | 12 | | |
| 165 | | 1 | | |
| 7 | | 4 | | |
| 157 | | 32 | | |
| 176 | | 25 | | |
| 129 | | 1 | | |
| 12 | | 25 | | |
| 184 | | 93 | | |
| 11 | | 1 | | |
| 167 | | 10 | | |
| 164 | NA | 1 | NA | New species 1 |
| 166 | NA | 1 | NA | New species 2 |
| 168 | NA | 5 | NA | New species 3 |
| 169 | NA | 5 | NA | New species 4 |
| 192 | NA | 5 | NA | New species 5 |
| 211 | NA | 1 | NA | New species 6 |
| 25 | NA | 1 | NA | New species 7 |
| 3 | NA | 1 | NA | New species 8 |
| 206 | | 0 | NA | Unidentified species 1 |
| 207 | | 0 | NA | Unidentified species 2 |
| 1 | NA | 0 | NA | Unidentified species 3 |
| 132 | NA | 10 | | Unidentified species 4 |
| 190 | NA | 4 | | Unidentified species 5 |
| 196 | NA | 4 | | Unidentified species 6 |
| 2 | NA | 5 | | Unidentified species 7 |
| 217 | NA | 1 | | Unidentified species 8 |
Leuc., Leuconostoc; NA, not available.
A question mark in parentheses indicates that this species is a “best guess” (see “Comparison with published genomes” in Results).
FIG 3 Maximum likelihood phylogenetic tree of all genome clusters of the LGC. The tree was inferred on a nucleotide supermatrix of 100 SCGs and one representative genome per species. The genes and representative genomes were selected to maximize completeness of the supermatrix. The names and terminal branches of new and unidentified species are shown in orange and green, respectively. The names of “type species” of genera or phylogroups within Lactobacillus (2) are shown in darker blue. Weak clades (with a bootstrap value of <70) are indicated with open circles. The root position was taken from the literature; the outgroup tip is artificial, and its branch length was chosen in order to optimally visualize the tree topology.
Isolation sources, phylogroups, and predicted lifestyles of new species
| Species | Isolation source | Phylogroup | Phylogroup lifestyle |
|---|---|---|---|
| New species 1 | A | | Vertebrate-adapted |
| New species 2 | Urine catheter | | Vertebrate-adapted |
| New species 3 | Kimchi | | Nomadic |
| New species 4 | Kimchi | | Nomadic |
| New species 5 | Gut of | | Insect-adapted |
| New species 6 | Cow rumen | | Vertebrate-adapted |
| New species 7 | | | Insect-adapted |
| New species 8 | Human gut | | Vertebrate-adapted |
FIG 4 LGC genome reclassifications. (A) Classification of genomes that are currently unclassified in the NCBI database, using the CNI-based genome clusters. (B) Reclassification of genomes that have an NCBI species label available but were found in a different CNI species cluster. Species names that are attached to an identical set of genomes in the NCBI and CNI classifications are not shown.
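The two panels correspond to two kinds of label comparison, which the following minimal Python sketch illustrates. It assumes two hypothetical mappings from genome identifier to species label, one from NCBI (with `None` for unclassified genomes) and one from the CNI-based clusters; the function name `compare_labels` and all labels are placeholders, not data from the study.

```python
from typing import Dict, List, Optional, Tuple


def compare_labels(
    ncbi: Dict[str, Optional[str]],
    cni: Dict[str, str],
) -> Tuple[List[Tuple[str, str]], List[Tuple[str, str, str]]]:
    """Split genomes into newly classified and reclassified ones.

    Returns (newly_classified, reclassified):
      newly_classified: (genome, CNI species) for genomes with no NCBI label
      reclassified:     (genome, NCBI species, CNI species) where labels differ
    Genomes whose labels agree appear in neither list.
    """
    newly_classified: List[Tuple[str, str]] = []
    reclassified: List[Tuple[str, str, str]] = []
    for genome in sorted(cni):
        old = ncbi.get(genome)  # None if unclassified or missing
        new = cni[genome]
        if old is None:
            newly_classified.append((genome, new))
        elif old != new:
            reclassified.append((genome, old, new))
    return newly_classified, reclassified


# Placeholder labels, for illustration only.
ncbi_labels = {"g1": "species A", "g2": None, "g3": "species A", "g4": "species B"}
cni_labels = {"g1": "species A", "g2": "species A", "g3": "species B", "g4": "species B"}
new_calls, changed_calls = compare_labels(ncbi_labels, cni_labels)
print(new_calls)
print(changed_calls)
```

Here g2 would belong to panel A (previously unclassified) and g3 to panel B (moved from one species to another), while g1 and g4 keep their labels and are left out.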