Saidi Wang, Minerva Ventolero, Haiyan Hu, Xiaoman Li.
Abstract
Universal single-copy genes (USCGs) are widely used for species classification and taxonomic profiling. Despite many studies on USCGs, our understanding of USCGs in bacterial genomes may be out of date. Open questions include how much the USCG sets from different studies differ, how well a set of USCGs can distinguish two bacterial species, and whether USCGs can separate different strains of the same species. To fill this gap, we studied USCGs in the most up-to-date complete bacterial genomes. We showed that different USCG sets differ substantially even though their genes come from highly similar functional categories. We also found that although USCGs occur once in almost all bacterial genomes, each USCG occurs multiple times in certain genomes. We demonstrated that USCGs are reliable markers for distinguishing species, whereas they cannot distinguish the strains of most bacterial species. Our study sheds new light on the usage and limitations of USCGs, which will facilitate their applications in evolutionary, phylogenomic, and metagenomic studies.
Year: 2022 PMID: 36008577 PMCID: PMC9411617 DOI: 10.1038/s41598-022-18762-z
Source DB: PubMed Journal: Sci Rep ISSN: 2045-2322 Impact factor: 4.996
Figure 1. USCG sets and functional categories. (A) Overlap of the seven USCG sets. (B) The percentage of the functional categories of the USCGs in the seven sets. For each set, the percent of USCGs from the functional categories J, L, H, O, F, and others is shown in order.
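Panel B reduces each USCG set to the share of its genes in each functional category, and panel A compares sets by the genes they share. The following is a minimal Python sketch of both tallies. It assumes each USCG is already annotated with a single one-letter category code. The gene identifiers, annotations, and function names below are hypothetical placeholders, not data or code from the study.

```python
from collections import Counter

# Category order used in Figure 1B; any other code is pooled into "others".
CATEGORY_ORDER = ["J", "L", "H", "O", "F"]


def category_percentages(gene_to_category: dict[str, str]) -> dict[str, float]:
    """Percentage of USCGs in one set that fall in each functional category.

    Assumes one category code per gene (hypothetical input format).
    """
    counts = Counter(
        cat if cat in CATEGORY_ORDER else "others"
        for cat in gene_to_category.values()
    )
    total = len(gene_to_category)
    return {
        cat: 100.0 * counts.get(cat, 0) / total
        for cat in CATEGORY_ORDER + ["others"]
    }


def shared_genes(set_a: set[str], set_b: set[str]) -> int:
    """Number of USCGs two sets have in common (a pairwise overlap count)."""
    return len(set_a & set_b)


# Hypothetical example set: gene IDs and category labels are placeholders.
example = {"g1": "J", "g2": "J", "g3": "L", "g4": "O", "g5": "M"}
print(category_percentages(example))
print(shared_genes({"g1", "g2", "g3"}, {"g2", "g3", "g4"}))
```

Pooling unlisted codes into "others" keeps the percentages summing to 100 for every set, so the stacked bars in panel B stay comparable across sets of different sizes.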
Universalism and uniqueness of USCGs.
| | Mean (C) | Mean (W) | Mean (A) | SD (C) | SD (W) | SD (A) | Minimum (C) | Minimum (W) | Minimum (A) | Median (C) | Median (W) | Median (A) | Maximum (C) | Maximum (W) | Maximum (A) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| fetchMG | 99.0 | 98.6 | 99.0 | 0.5 | 2.7 | 0.8 | 97.7 | 83.9 | 95.9 | 99.3 | 99.3 | 99.3 | 99.6 | 99.7 | 99.6 |
| BLAST | 96.1 | 88.9 | 94.2 | 3.0 | 7.0 | 4.5 | 87.1 | 58.9 | 76.7 | 96.7 | 90.9 | 96.0 | 99.7 | 96.8 | 99.0 |
| universal | 100 | 99.8 | 99.9 | 3.4 | 9.4 | 5.3 | 99.7 | 93.4 | 99.0 | 100 | 99.9 | 100 | 100 | 100 | 100 |
| unique | 98.5 | 97.3 | 98.2 | 1.0 | 3.9 | 2.2 | 95.4 | 82.6 | 89.0 | 99.0 | 99.1 | 99.1 | 99.5 | 99.6 | 99.5 |
Subcolumns C, W, and A give the percentages for the USCG sets of Creevey et al., Wu et al., and Alneberg et al., respectively.
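The table appears to summarize, for each USCG set, a distribution of per-gene percentages with five statistics. The following is a minimal sketch of that reduction using Python's standard `statistics` module. It assumes the per-gene percentages have already been computed. The source does not say whether the sample or the population standard deviation was reported, so using `statistics.stdev` (sample SD) here is an assumption. The function name and input values are hypothetical.

```python
import statistics


def summarize(percentages: list[float]) -> dict[str, float]:
    """Mean, SD, minimum, median, and maximum of per-USCG percentages.

    Uses the sample standard deviation (statistics.stdev); this is an
    assumption, since the table does not state which SD was reported.
    """
    return {
        "Mean": statistics.mean(percentages),
        "SD": statistics.stdev(percentages),
        "Minimum": min(percentages),
        "Median": statistics.median(percentages),
        "Maximum": max(percentages),
    }


# Hypothetical per-gene percentages for one USCG set (not study data).
values = [96.0, 100.0, 98.0, 97.0, 99.0]
summary = summarize(values)
print({name: round(v, 2) for name, v in summary.items()})
```

Swapping in `statistics.pstdev` would give the population SD instead. For sets of dozens of genes, the two differ only slightly.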
Figure 2. The universalism and uniqueness of the three USCG sets. Universalism is the percentage of genomes that contain at least one copy of a USCG; uniqueness is the percentage of genomes that contain exactly one copy of a USCG.
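The two definitions in the Figure 2 caption can be written directly as counts over genomes. The following is a minimal sketch, assuming the copy number of one USCG has already been determined in every genome. The function names and copy numbers are hypothetical illustrations.

```python
def universalism(copy_counts: list[int]) -> float:
    """Percentage of genomes carrying at least one copy of the USCG."""
    return 100.0 * sum(1 for c in copy_counts if c >= 1) / len(copy_counts)


def uniqueness(copy_counts: list[int]) -> float:
    """Percentage of genomes carrying exactly one copy of the USCG."""
    return 100.0 * sum(1 for c in copy_counts if c == 1) / len(copy_counts)


# Hypothetical copy numbers of one USCG across eight genomes:
# one genome lacks it, and one carries two copies.
counts = [1, 1, 1, 0, 1, 2, 1, 1]
print(universalism(counts))  # 87.5
print(uniqueness(counts))    # 75.0
```

Both use all genomes as the denominator, so uniqueness can never exceed universalism. The gap between them is the share of genomes in which the gene occurs more than once.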
Figure 3. The cumulative distribution of the PID of species pairs and strain pairs. (A) Species pairs. (B) Strain pairs.
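Figure 3 plots an empirical cumulative distribution: for each PID value x, the fraction of pairs whose PID is at most x. The following is a minimal sketch of that function. It assumes the PID of every species or strain pair has already been computed; how PID is derived is not described here. The function name and values are hypothetical.

```python
from bisect import bisect_right
from collections.abc import Callable


def ecdf(values: list[float]) -> Callable[[float], float]:
    """Return F(x) = fraction of values <= x (empirical cumulative distribution)."""
    ordered = sorted(values)
    n = len(ordered)

    def F(x: float) -> float:
        # bisect_right counts how many sorted values are <= x.
        return bisect_right(ordered, x) / n

    return F


# Hypothetical PID values (in %) for five genome pairs.
pids = [97.0, 99.5, 98.2, 99.9, 96.4]
F = ecdf(pids)
print(F(98.2))  # 0.6
print(F(99.9))  # 1.0
```

Building one such function for species pairs and one for strain pairs gives the two curves in panels A and B. The two curves can then be compared at any PID cutoff.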