Abstract
BACKGROUND: Identification and classification of highly similar microbial strains is a challenging issue in microbiology, ecology and evolutionary biology. Among the various available approaches, gene content analysis is at the core of microbial taxonomy. However, no threshold has been determined for grouping microorganisms into different taxonomic levels, and it is still unclear to what extent genomic fluidity must occur to form a microbial taxonomic group.
Keywords: Gene content dissimilarity; Genomic fluidity; Highly similar strains; Microbial subclassification
Year: 2016 PMID: 27530250 PMCID: PMC4988056 DOI: 10.1186/s12864-016-2991-9
Source DB: PubMed Journal: BMC Genomics ISSN: 1471-2164 Impact factor: 3.969
Fig. 1. The flowchart of applying gene content dissimilarity for microbial delineation and classification. Three main steps were included. First, orthologous gene profiles were obtained for all selected microbial genomes by searching against the eggNOG database. Second, pairwise gene content dissimilarity, as measured by Bray-Curtis dissimilarity, was calculated for all pairs of microbial strains. Third, microbial strains were clustered into different groups
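The second and third steps of the flowchart can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the caption does not say which clustering method was used, so single-linkage grouping at a fixed cutoff stands in for it here, and the strain names, count profiles, and the helper `group_strains` are hypothetical.

```python
from itertools import combinations


def bray_curtis(counts_a, counts_b):
    """Bray-Curtis dissimilarity between two equal-length gene count vectors."""
    shared = sum(min(a, b) for a, b in zip(counts_a, counts_b))
    total = sum(counts_a) + sum(counts_b)
    if total == 0:
        raise ValueError("both profiles are empty")
    return 1 - 2 * shared / total


def group_strains(profiles, cutoff):
    """Group strains by single linkage (an assumed stand-in clustering method).

    Two strains end up in the same group if a chain of pairwise
    dissimilarities at or below `cutoff` connects them.
    """
    parent = {name: name for name in profiles}

    def find(name):
        # Follow parent links to the group representative, compressing the path.
        while parent[name] != name:
            parent[name] = parent[parent[name]]
            name = parent[name]
        return name

    # Step 2: pairwise dissimilarity for every pair of strains.
    for a, b in combinations(profiles, 2):
        # Step 3: merge groups whose members are close enough.
        if bray_curtis(profiles[a], profiles[b]) <= cutoff:
            parent[find(a)] = find(b)

    groups = {}
    for name in profiles:
        groups.setdefault(find(name), []).append(name)
    return sorted(sorted(members) for members in groups.values())


# Hypothetical orthologous-group count profiles (one count per group).
profiles = {
    "strain_A": [1, 2, 4, 0, 2],
    "strain_B": [1, 2, 4, 1, 2],
    "strain_C": [5, 0, 0, 6, 0],
}
print(group_strains(profiles, cutoff=0.2))
# [['strain_A', 'strain_B'], ['strain_C']]
```

Real profiles would come from the eggNOG search in the first step and would span thousands of orthologous groups rather than five.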
Fig. 2. Distribution of gene content dissimilarity for the retrieved microbial genomes at different taxonomic levels, including species (a), genus (b), family (c), and order (d). Cutoffs of 0.2 and 0.4 were recommended for microbial species and family delineation, respectively
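The two recommended cutoffs can be read as a simple decision rule on a pairwise dissimilarity value. The sketch below is an illustration only: the function name `suggested_level`, the output labels, and the choice to treat values exactly at a cutoff as falling inside the group are assumptions, not something the caption specifies.

```python
SPECIES_CUTOFF = 0.2  # recommended cutoff for species delineation
FAMILY_CUTOFF = 0.4   # recommended cutoff for family delineation


def suggested_level(dissimilarity):
    """Map a gene content dissimilarity to the relationship the cutoffs suggest."""
    if not 0.0 <= dissimilarity <= 1.0:
        raise ValueError("Bray-Curtis dissimilarity must lie in [0, 1]")
    if dissimilarity <= SPECIES_CUTOFF:  # inclusive boundary is an assumption
        return "same species"
    if dissimilarity <= FAMILY_CUTOFF:
        return "same family"
    return "different families"


for value in (0.1, 0.3, 0.5):
    print(value, suggested_level(value))
```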
Fig. 3. Comparison of 16S rRNA gene identity, ANI, and gene content dissimilarity in microbial species delineation. A gene content dissimilarity cutoff of 0.2 corresponded to 98 % 16S rRNA gene identity and 94 % ANI in species delineation. A total of 5008 intra-species and 8642 intra-genus comparisons were plotted. Red dots represent intra-species comparisons, and blue dots indicate intra-genus comparisons. Red dots falling in the Q1 quadrant were mostly from several Clostridium strains, for which misclassification may have occurred
Fig. 4. Application of gene content dissimilarity in classifying microbial strains belonging to Enterobacteriaceae. (a) PCoA clustering of all selected microbial strains belonging to Enterobacteriaceae. (b) PCoA clustering of highly similar microbial strains including E. coli and Shigella. A clear separation of Shigella and E. coli O157:H7 from other E. coli strains could be observed
Fig. 5. Application of gene content dissimilarity in classifying Streptococcus strains. Clear separation of different species into different groups could be observed. Highly similar strains belonging to S. mitis, S. oralis, and S. pneumoniae were also well separated
An example showing how Bray-Curtis dissimilarity was calculated between strain I and strain J. (Note: the calculation in a real case would be larger, because typical microbial genomes comprise thousands of genes)
| Orthologous groups | # genes mapped in strain I | # genes mapped in strain J | C_OG* | C_IJ# | S_I$ | S_J% | BC& |
|---|---|---|---|---|---|---|---|
| OG1 | 1 | 1 | 1 | 7 | 9 | 11 | 0.3 |
| OG2 | 2 | 4 | 2 | | | | |
| OG3 | 4 | 2 | 2 | | | | |
| OG4 | 0 | 2 | 0 | | | | |
| OG5 | 2 | 2 | 2 | | | | |
* C_OG was the number of common genes assigned to the same orthologous group; the lesser of the two mapped gene counts was used
# C_IJ was the sum of C_OG over all orthologous groups, and represented the total number of genes assigned to common orthologous groups between strain I and J
$ S_I was the total number of genes in strain I mapped to orthologous groups in eggNOG database
% S_J was the total number of genes in strain J mapped to orthologous groups in eggNOG database
& BC is the Bray-Curtis dissimilarity between strain I and J, BC = 1 - 2 × C_IJ / (S_I + S_J); here 1 - 2 × 7 / (9 + 11) = 0.3
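The worked example in the table can be reproduced with a short Python sketch. It uses the standard Bray-Curtis definition, one minus twice the shared count divided by the sum of the two totals; the dictionary layout and the function name `bray_curtis` are illustrative assumptions rather than the authors' implementation.

```python
def bray_curtis(profile_i, profile_j):
    """Bray-Curtis dissimilarity between two orthologous-group count profiles.

    Each profile maps an orthologous group ID to the number of genes
    from that strain mapped to the group.
    """
    groups = set(profile_i) | set(profile_j)
    # Shared genes per group: the lesser of the two counts, summed over groups.
    shared = sum(min(profile_i.get(g, 0), profile_j.get(g, 0)) for g in groups)
    total_i = sum(profile_i.values())  # genes of strain I mapped to any group
    total_j = sum(profile_j.values())  # genes of strain J mapped to any group
    if total_i + total_j == 0:
        raise ValueError("both profiles are empty")
    return 1 - 2 * shared / (total_i + total_j)


# Counts taken from the table above.
strain_i = {"OG1": 1, "OG2": 2, "OG3": 4, "OG4": 0, "OG5": 2}
strain_j = {"OG1": 1, "OG2": 4, "OG3": 2, "OG4": 2, "OG5": 2}

print(round(bray_curtis(strain_i, strain_j), 3))  # 0.3
```

Identical profiles give 0, and profiles with no orthologous group in common give 1.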