Lucia Graña-Miraglia1, César Arreguín-Pérez2, Gamaliel López-Leal3, Alan Muñoz1, Angeles Pérez-Oseguera1, Estefan Miranda-Miranda2, Raquel Cossío-Bayúgar2, Santiago Castillo-Ramírez1.
Abstract
Although genome sequencing has become a very promising approach to microbial taxonomy, few labs have the resources to afford it, especially when dealing with data sets of hundreds to thousands of isolates. The goal of this study was to identify the most adequate loci for inferring the phylogeny of the species within the genus Staphylococcus, so that those who cannot afford whole genome sequencing can use these loci to carry out species assignment confidently. We retrieved 177 orthologous groups (OGs) by using a genome-based phylogeny and an average nucleotide identity analysis. The top 26 OGs showed topologies similar to the species tree, and their concatenation yielded a topology almost identical to that of the species tree. Furthermore, a phylogeny of just the top seven OGs could be used for species assignment. We sequenced four Staphylococcus isolates to test the 26 OGs and found that these OGs were far superior to commonly used markers for this genus. On the whole, our procedure allowed identification of the most adequate markers for inferring the phylogeny within the genus Staphylococcus. We anticipate that this approach will be employed to identify the most suitable markers for other bacterial genera and can be very helpful in sorting out poorly classified genera.
Keywords: Bacterial species; Evolutionary biology; Infectious diseases; Phylogenomics; Staphylococcus
Year: 2018 PMID: 30386709 PMCID: PMC6203942 DOI: 10.7717/peerj.5839
Source DB: PubMed Journal: PeerJ ISSN: 2167-8359 Impact factor: 2.984
Figure 1. Maximum likelihood phylogenetic tree based on the super alignment of the orthologous groups.
Maximum likelihood phylogenetic tree based on the super alignment of the 177 non-recombinant SGFs. Strains of the same species are labelled with the same colour and black stars indicate the type strain for some species. Green and violet rectangles highlight strains with conflicting clade assignment, S. warneri and S. saccharolyticus respectively. Bootstrap values above 80 are shown as orange dots. The scale bar shows the number of substitutions per site. The code for the strain identifiers is in Table S1.
Figure 2. Heat map of the Average Nucleotide Identity (ANI) analysis of the 269 strains from Staphylococcus spp.
The reddish cells show identity percentages above 95%, implying that the strains belong to the same species, whereas non-reddish colors denote identity percentages below 95% (see ANI key, A). The rows on and by the heat map (B) show the species assignment (see Species key, C) and the dendrograms show the clustering of the strains. Strain identifiers (D) are as in Fig. 1 and the color-coding is as follows: blue marks the type strains, whereas red marks the strains with misclassification issues. This analysis indicates 44 clearly discernible Staphylococcus spp.
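The 95% ANI rule in the caption above can be illustrated with a minimal sketch. It assumes that strains are grouped by single linkage (any pair at or above the threshold ends up in the same group); the source does not state which clustering method produced the dendrograms. The function name, strain labels and ANI values are all hypothetical.

```python
from itertools import combinations


def species_clusters(strains, ani, threshold=95.0):
    """Group strains by single linkage: a pair with ANI >= threshold shares a cluster."""
    parent = {s: s for s in strains}

    def find(s):
        # Follow parent links to the cluster representative, compressing the path.
        while parent[s] != s:
            parent[s] = parent[parent[s]]
            s = parent[s]
        return s

    for a, b in combinations(strains, 2):
        # ANI values may be stored under either ordering of the pair.
        value = ani.get((a, b), ani.get((b, a)))
        if value is not None and value >= threshold:
            parent[find(a)] = find(b)

    clusters = {}
    for s in strains:
        clusters.setdefault(find(s), []).append(s)
    return sorted(sorted(members) for members in clusters.values())


# Hypothetical strains and ANI percentages, for illustration only.
strains = ["A1", "A2", "B1", "B2", "C1"]
ani = {
    ("A1", "A2"): 98.7,
    ("B1", "B2"): 96.2,
    ("A1", "B1"): 82.0,
    ("A2", "B2"): 81.5,
    ("A1", "C1"): 79.9,
}
print(species_clusters(strains, ani))
```

With these toy values the sketch yields three groups, one of them a singleton. Counting such groups across a real ANI matrix is one way to arrive at a number of discernible species.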
Figure 3. Similarity to the species tree and nucleotide diversity.
Percentage of similarity between gene trees and the PSTT and the nucleotide diversity for each of the 177 SGFs (green dots). The most similar gene tree topologies correspond to genes with high levels of nucleotide diversity. The commonly used marker genes dnaJ, rpoB and tuf are highlighted (orange dots); they all show low to moderate nucleotide diversity values and none of them shows a similarity percentage to the PSTT above 60%.
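The two per-gene statistics reported alongside tree similarity, nucleotide diversity and the proportion of variable sites, can be sketched as follows. The source does not say which software or estimator was used. This sketch assumes the plain definition of nucleotide diversity as the mean proportion of differing sites over all sequence pairs, and it skips gap characters. The function names and the toy alignment are hypothetical.

```python
from itertools import combinations


def nucleotide_diversity(seqs):
    """Mean proportion of differing sites over all pairs of aligned sequences."""
    total, pairs = 0.0, 0
    for a, b in combinations(seqs, 2):
        # Compare only columns where neither sequence has a gap.
        compared = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
        if compared:
            total += sum(x != y for x, y in compared) / len(compared)
            pairs += 1
    return total / pairs if pairs else 0.0


def proportion_variable_sites(seqs):
    """Fraction of alignment columns with more than one nucleotide (gaps ignored)."""
    columns = list(zip(*seqs))
    if not columns:
        return 0.0
    variable = sum(len(set(col) - {"-"}) > 1 for col in columns)
    return variable / len(columns)


# Hypothetical three-sequence alignment, for illustration only.
alignment = ["ATGCATGC", "ATGCATGA", "ATGTATGA"]
print(nucleotide_diversity(alignment))
print(proportion_variable_sites(alignment))
```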
Table 1. Top 26 genuine orthologous genes.
These are the most adequate markers for inferring the species phylogeny.
| Description | NCBI reference sequence | GC content | Nucleotide diversity | Proportion of variable sites | Similarity with species tree, % (Robinson-Foulds distance) |
|---|---|---|---|---|---|
| | | 0.342 | 0.2370335 | 0.412 | 65.4135 (184) |
| | | 0.345 | 0.2638535 | 0.588 | 65.4135 (184) |
| | | 0.361 | 0.3404309 | 0.760 | 64.6617 (188) |
| | | 0.339 | 0.2515735 | 0.688 | 64.6617 (188) |
| | | 0.366 | 0.2683997 | 0.696 | 63.9098 (192) |
| | | 0.340 | 0.2553319 | 0.623 | 63.9098 (192) |
| | | 0.381 | 0.1488365 | 0.442 | 63.5338 (194) |
| Pyruvate kinase ( | | 0.361 | 0.2139191 | 0.584 | 63.5338 (194) |
| DNA polymerase III subunit alpha ( | | 0.336 | 0.2895834 | 0.737 | 63.1579 (196) |
| UDP-N-acetylmuramate–L-alanine ligase ( | | 0.335 | 0.2352931 | 0.622 | 63.1579 (196) |
| Fibronectin-binding domain-containing protein ( | | 0.338 | 0.2644582 | 0.667 | 62.782 (198) |
| UDP-N-acetylmuramoyl-L-alanyl-D-glutamate-L-lysine ligase ( | | 0.374 | 0.2366963 | 0.638 | 62.0301 (202) |
| Homoserine dehydrogenase ( | | 0.344 | 0.2720552 | 0.579 | 62.0301 (202) |
| Acetyl-coenzyme A carboxylase carboxyl transferase subunit alpha ( | | 0.357 | 0.2383029 | 0.584 | 62.0301 (202) |
| Molybdopterin molybdenumtransferase ( | | 0.386 | 0.2844009 | 0.679 | 62.0301 (202) |
| S-adenosylmethionine:tRNA ribosyltransferase-isomerase ( | | 0.357 | 0.2468871 | 0.620 | 61.6541 (204) |
| ATP-dependent DNA helicase (RecQ) | | 0.345 | 0.2654353 | 0.659 | 61.2782 (206) |
| hypothetical protein ( | | 0.363 | 0.3498756 | 0.723 | 60.9023 (208) |
| DNA polymerase III subunit tau ( | | 0.368 | 0.2727606 | 0.645 | 60.9023 (208) |
| Ribonuclease R (rnr) | | 0.367 | 0.2284088 | 0.626 | 60.5263 (210) |
| Ktr system potassium uptake protein B ( | | 0.342 | 0.2674265 | 0.646 | 60.5263 (210) |
| Dihydrolipoyl dehydrogenase ( | | 0.353 | 0.3373007 | 0.759 | 60.5263 (210) |
| 3-oxoacyl-[acyl-carrier-protein] synthase 3 (fabH) | | 0.377 | 0.2367141 | 0.601 | 60.1504 (212) |
| Signal recognition particle receptor ( | | 0.366 | 0.1979450 | 0.506 | 60.1504 (212) |
| 6-phosphogluconolactonase (hypothetical) | | 0.358 | 0.3401093 | 0.737 | 60.1504 (212) |
| Aspartate aminotransferase ( | | 0.342 | 0.2853785 | 0.678 | 60.1504 (212) |
Notes.
Gene names were taken from the UniProt database (http://www.uniprot.org) when possible; otherwise the proteins were searched with BLAST against the NCBI protein database (http://www.ncbi.nlm.nih.gov) and a well-annotated Staphylococcus species was used for annotation.
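The last column of the table pairs a similarity percentage with a Robinson-Foulds (RF) distance. The source does not spell out how one is derived from the other. The tabulated values are, however, consistent with normalising the RF distance by its maximum for unrooted binary trees, 2(n − 3), with n = 269 strains (maximum 532). The sketch below encodes that inferred relationship; the function name is hypothetical.

```python
def rf_similarity_percent(rf_distance, n_taxa):
    """Similarity (%) = 100 * (1 - RF / max RF), where max RF = 2 * (n_taxa - 3).

    The normalisation is an assumption inferred from the tabulated values,
    not a formula stated in the source.
    """
    max_rf = 2 * (n_taxa - 3)
    return 100.0 * (1.0 - rf_distance / max_rf)


# RF distances taken from the first and last rows of the table, with 269 strains.
print(round(rf_similarity_percent(184, 269), 4))
print(round(rf_similarity_percent(212, 269), 4))
```

Under this assumption, RF distances of 184 and 212 reproduce the tabulated 65.4135% and 60.1504%.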
Figure 4. Phylogenies of the top orthologous groups.
Phylogenetic trees based on the concatenated alignments of the 26 orthologous groups listed in Table 1 (A) and on the concatenated alignments of the top seven orthologous groups (B). The tree in A has a percentage of similarity with the PSTT of 86% and the tree in B of 79%. Bootstrap values above 80 are shown as blue dots. The scale bar shows the number of substitutions per site.
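Concatenating per-gene alignments into a single super alignment, as used for the trees above, can be sketched as follows. This is a minimal illustration, not the authors' pipeline. It assumes each gene alignment is a mapping from strain identifier to an aligned sequence of equal length, and it pads any strain missing from a gene with gaps. The function name and toy data are hypothetical.

```python
def concatenate_alignments(alignments):
    """Join per-gene alignments (dicts of strain -> aligned sequence) end to end."""
    strains = sorted(set().union(*alignments))
    parts = {s: [] for s in strains}
    for aln in alignments:
        # All sequences in one gene alignment are assumed to share this length.
        length = len(next(iter(aln.values())))
        for s in strains:
            # A strain absent from this gene is represented by gaps (assumption).
            parts[s].append(aln.get(s, "-" * length))
    return {s: "".join(p) for s, p in parts.items()}


# Hypothetical two-gene, two-strain example, for illustration only.
gene1 = {"s1": "ATG", "s2": "ATA"}
gene2 = {"s1": "CC", "s2": "CT"}
print(concatenate_alignments([gene1, gene2]))
```

The resulting super alignment would then be passed to a tree-building program; that step is outside the scope of this sketch.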