Iva Budimir1, Enrico Giampieri2, Edoardo Saccenti3, Maria Suarez-Diez3, Martina Tarozzi2, Daniele Dall'Olio1, Alessandra Merlotti1, Nico Curti2, Daniel Remondini1, Gastone Castellani4, Claudia Sala2.
Abstract
The ability to detect and characterize bacteria within a biological sample is crucial for monitoring infections and epidemics, as well as for studying human health and its relationship with commensal microorganisms. A commonly used technique for this purpose is targeted sequencing of the 16S rRNA gene. PCR-amplified 16S sequences derived from the sample of interest are usually clustered into so-called Operational Taxonomic Units (OTUs) based on pairwise similarities. Representative OTU sequences are then compared with reference (human-made) databases to derive their phylogeny and taxonomic classification. Here, we propose a new reference-free approach to define the phylogenetic distance between bacteria based on protein domains, which are the evolving units of proteins. We extract the protein domain profiles of 3368 bacterial genomes and use an ecological approach to model their Relative Species Abundance (RSA) distribution. Based on the model parameters, we then derive a new measurement of phylogenetic distance. Finally, we show that this model-based distance can detect differences between bacteria in cases in which the 16S rRNA-based method fails, providing a potentially complementary approach that is particularly promising for the analysis of bacterial populations measured by shotgun sequencing.
Year: 2022 PMID: 36198716 PMCID: PMC9534902 DOI: 10.1038/s41598-022-21036-3
Source DB: PubMed Journal: Sci Rep ISSN: 2045-2322 Impact factor: 4.996
Figure 1. Fit of protein domains RSA. (a) Example of protein domains Preston plot fitted with three different distributions: the Poisson Log-Normal, the Negative Binomial and the Log-Series. Results refer to the bacterial genome . The Negative Binomial and the Log-Series fits overlap. This implies that the dispersion parameter of the Negative Binomial distribution (see Eq. (6)) is close to zero. The mean and the median of the dispersion parameter obtained for the 3368 bacterial genomes are and , in agreement with the observed overlap. (b) Distribution of the difference between the AIC obtained with the Poisson Log-Normal model (PL) and the Log-Series (LS) or the Negative Binomial (NB) model, considering all the 3368 bacterial genomes.
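The AIC comparison in panel (b) can be illustrated with a minimal sketch for one of the three candidate models, the Log-Series. The function names, the bisection-based fit and the toy abundance counts are assumptions made for illustration and are not taken from the paper; the Poisson Log-Normal and Negative Binomial fits are omitted. The sketch uses the standard definition AIC = 2k - 2 ln L, where k is the number of fitted parameters.

```python
import math


def logseries_mean(p):
    """Mean of the Log-Series distribution P(n) = -p**n / (n * ln(1 - p)), n >= 1."""
    return -p / ((1.0 - p) * math.log1p(-p))


def fit_logseries(counts):
    """Maximum-likelihood estimate of p, found by matching the sample mean.

    For the Log-Series (a power series distribution) the MLE equates the
    theoretical mean to the sample mean; the mean is increasing in p,
    so a simple bisection is enough.
    """
    sample_mean = sum(counts) / len(counts)
    if sample_mean <= 1.0:
        raise ValueError("sample mean must exceed 1 for a Log-Series fit")
    lo, hi = 1e-12, 1.0 - 1e-12
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if logseries_mean(mid) < sample_mean:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def logseries_loglik(counts, p):
    """Log-likelihood of the observed abundances under the Log-Series."""
    log_norm = math.log(-math.log1p(-p))
    return sum(n * math.log(p) - math.log(n) - log_norm for n in counts)


def aic(loglik, n_params):
    """Akaike Information Criterion: 2k - 2 ln L."""
    return 2 * n_params - 2 * loglik


# Hypothetical domain abundances (copies of each protein domain in one genome).
abundances = [1, 1, 1, 2, 2, 3, 5, 8, 13, 40]
p_hat = fit_logseries(abundances)
aic_ls = aic(logseries_loglik(abundances, p_hat), n_params=1)
```

Given the AIC of a second model fitted to the same data, the quantity plotted in panel (b) would be the difference between the two AIC values; the model with the lower AIC is the preferred one.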
Figure 2. Distribution of species according to the model parameters. Scatter plot of the two Poisson Log-Normal parameters obtained by fitting the protein domain RSAs. Only species represented by at least 10 different strains were included in the plot, for a total of 1173 bacterial genomes belonging to 48 different species. Different colors represent different species, as indicated in the legend.
Figure 3. Comparison between the three clustering results at different taxonomic levels. NMI scores (y-axis) are calculated as a measurement of agreement between clusters based on: RSA method and taxonomy (blue), 16S rRNA gene and taxonomy (red), RSA method and 16S rRNA gene (green). Different taxonomic levels are considered for the comparison: phylum, class, order, family, genus and species (x-axis). The box plots represent the baselines of NMI score and are based on simulations.
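The NMI score used in this comparison can be sketched as follows. This is a minimal illustration, not the authors' implementation: the arithmetic-mean normalization and the label-shuffling baseline are assumptions, since the caption does not state which NMI variant was used or how the baseline simulations were generated.

```python
import math
import random
from collections import Counter


def entropy(labels):
    """Shannon entropy (natural log) of a label assignment."""
    n = len(labels)
    return -sum((c / n) * math.log(c / n) for c in Counter(labels).values())


def mutual_information(a, b):
    """Mutual information between two label assignments of the same items."""
    n = len(a)
    count_a, count_b, count_ab = Counter(a), Counter(b), Counter(zip(a, b))
    return sum(
        (n_ij / n) * math.log(n_ij * n / (count_a[i] * count_b[j]))
        for (i, j), n_ij in count_ab.items()
    )


def nmi(a, b):
    """Normalized Mutual Information with arithmetic-mean normalization (assumed)."""
    h_a, h_b = entropy(a), entropy(b)
    if h_a == 0.0 and h_b == 0.0:
        return 1.0  # both partitions are a single cluster
    return mutual_information(a, b) / (0.5 * (h_a + h_b))


def shuffled_baseline(a, b, n_sim=200, seed=0):
    """NMI scores after randomly permuting one labelling (hypothetical baseline)."""
    rng = random.Random(seed)
    shuffled = list(b)
    scores = []
    for _ in range(n_sim):
        rng.shuffle(shuffled)
        scores.append(nmi(a, shuffled))
    return scores


# Toy example: cluster labels versus taxonomic labels for six genomes.
clusters = [0, 0, 1, 1, 2, 2]
taxa = ["x", "x", "y", "y", "z", "z"]
score = nmi(clusters, taxa)
baseline = shuffled_baseline(clusters, taxa)
```

An NMI close to 1 means the two partitions group the genomes identically up to relabelling, while a value near the shuffled baseline indicates agreement no better than chance.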
Figure 4. Hierarchical clustering of bacteria at the intraspecies level, comparing solutions obtained by the RSA and 16S rRNA methods. Each subplot shows a tanglegram with the RSA-based dendrogram on the left and the 16S rRNA-based dendrogram on the right. Lines connect the same bacterium in the two dendrograms. The color or type of each line represents a feature of the bacterium it connects. (a) 22 strains of Xanthomonas citri belong to two different pathovars: A (orange) and (purple). (b) 10 strains of Chlamydia pneumoniae are isolated from different tissues: conjunctival (yellow), respiratory (magenta) and vascular (violet). The 9 strains represented with a solid line are human (Homo sapiens) pathogens, while the one strain represented by a dashed line is a koala (Phascolarctos cinereus) pathogen. (c) 14 strains of Vibrio cholerae are colored based on their karyotype. 11 strains have two circular chromosomes, Chr1 ( Mb) and Chr2 ( Mb) (magenta). 2 strains have one 4 Mb long circular chromosome (yellow). One strain has three chromosomes, Chr1 (3 Mb), Chr2 (1 Mb) and Chr3 (1 Mb) (violet).