Pascal Lapébie, Vincent Lombard, Elodie Drula, Nicolas Terrapon, Bernard Henrissat.
Abstract
Unlike proteins, glycan chains are not directly encoded by DNA, but by the specificity of the enzymes that assemble them. Theoretical calculations have proposed an astronomical number of possible isomers (> 10^12 hexasaccharides) but the actual diversity of glycan structures in nature is not known. Bacteria of the Bacteroidetes phylum are considered primary degraders of polysaccharides and they are found in all ecosystems investigated. In Bacteroidetes genomes, carbohydrate-degrading enzymes (CAZymes) are arranged in gene clusters termed polysaccharide utilization loci (PULs). The depolymerization of a given complex glycan by Bacteroidetes PULs requires bespoke enzymes; conversely, the enzyme composition in PULs can provide information on the structure of the targeted glycans. Here we group the 13,537 PULs encoded by 964 Bacteroidetes genomes according to their CAZyme composition. We find that collectively Bacteroidetes have elaborated a few thousand enzyme combinations for glycan breakdown, suggesting a global estimate of diversity of glycan structures much smaller than the theoretical one.
Year: 2019 PMID: 31053724 PMCID: PMC6499787 DOI: 10.1038/s41467-019-10068-5
Source DB: PubMed Journal: Nat Commun ISSN: 2041-1723 Impact factor: 14.919
Fig. 1 Schematic view of the approach taken to estimate the number of CAZyme combinations in PULs. The depolymerization of a given complex glycan by Bacteroidetes PULs requires bespoke enzymes secreted in the periplasm and the extracellular milieu. Conversely, the enzyme composition of PULs provides information on the structure of the targeted glycan. The enumeration of PULs according to their composition in encoded CAZymes gives an estimate of the diversity of glycans degraded by Bacteroidetes
Fig. 2 The PUL analysis pipeline. a Selection and sorting of data from PULDB. b Clustering of PULs according to their CAZyme composition. The distance between each pair of PULs was calculated according to their composition in enzyme (sub)families. Hierarchical clustering with different distance thresholds (from 0 to 50% mismatch in CAZyme composition) yields a number of unique PULs between ~1200 and ~2900
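The clustering step in panel b can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the record does not give the exact distance measure or linkage method, so this sketch assumes Jaccard distance between sets of CAZyme (sub)families as the "mismatch" and single-linkage merging at a fixed threshold. The function names and the toy PULs are hypothetical.

```python
from itertools import combinations


def mismatch(a, b):
    """Jaccard distance between two sets of CAZyme (sub)families.

    Assumption: the source does not specify its distance measure.
    """
    a, b = set(a), set(b)
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def count_unique_puls(puls, threshold):
    """Count clusters after merging PULs whose mismatch is <= threshold.

    Assumption: single-linkage merging, implemented with union-find.
    """
    parent = list(range(len(puls)))

    def find(i):
        # Follow parent links to the cluster root, compressing the path.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in combinations(range(len(puls)), 2):
        if mismatch(puls[i], puls[j]) <= threshold:
            parent[find(i)] = find(j)
    return len({find(i) for i in range(len(puls))})


# Invented toy data: each PUL is the set of CAZyme families it encodes.
toy_puls = [
    {"GH13", "GH97"},
    {"GH13", "GH97"},
    {"GH13", "GH97", "GH31"},
    {"GH2", "GH43", "PL1"},
]

for threshold in (0.0, 0.2, 0.5):
    print(threshold, count_unique_puls(toy_puls, threshold))
```

Raising the threshold can only merge clusters, never split them, which matches the caption's range of unique-PUL counts shrinking as the tolerated mismatch grows. The all-pairs loop is quadratic in the number of PULs, so it is suitable for illustration only.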
Fig. 3 Phylogenetic trees of SusC and SusD proteins encoded by tandem-repeat susC/D loci. SusC and SusD form color-coded congruent clades in the two phylogenetic trees. Each member of each clade has the same genomic position in the repeat, revealing a strict synteny within each trsusC/D group
Fig. 4 Number of unique PULs according to the number of genomes analyzed. The number of PULs was calculated by randomly resampling an increasing number of genomes from our data set (x-axis). The resampling was performed ten times; the median value is represented on the y-axis. A second-order polynomial regression gives the trend of two sets of values corresponding to 0% (red) and 20% (blue) mismatch used during PUL clustering
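The resampling procedure behind this curve can be sketched in Python as follows. This is an illustrative reconstruction under stated assumptions, not the authors' code: it handles only the 0% mismatch case (unique PULs are counted as distinct CAZyme compositions), the function names and toy genomes are hypothetical, and the quadratic fit uses plain least squares via the normal equations.

```python
import random
from statistics import median


def unique_puls(genomes):
    """Count distinct CAZyme compositions (0% mismatch) across genomes."""
    return len({frozenset(pul) for genome in genomes for pul in genome})


def rarefaction(genomes, repeats=10, seed=0):
    """Median unique-PUL count for each sample size from 1 to len(genomes)."""
    rng = random.Random(seed)
    curve = []
    for n in range(1, len(genomes) + 1):
        counts = [unique_puls(rng.sample(genomes, n)) for _ in range(repeats)]
        curve.append((n, median(counts)))
    return curve


def fit_quadratic(points):
    """Least-squares fit of y = a*x^2 + b*x + c; returns (a, b, c).

    Needs at least three distinct x values, otherwise the system is singular.
    """
    # Power sums for the 3x3 normal equations, stored as an augmented matrix.
    s = [sum(x ** k for x, _ in points) for k in range(5)]
    t = [sum(y * x ** k for x, y in points) for k in range(3)]
    m = [
        [s[4], s[3], s[2], t[2]],
        [s[3], s[2], s[1], t[1]],
        [s[2], s[1], s[0], t[0]],
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(3):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [u - f * v for u, v in zip(m[r], m[col])]
    return tuple(m[i][3] / m[i][i] for i in range(3))


# Invented toy data: each genome is a list of PULs (sets of CAZyme families).
toy_genomes = [
    [{"GH13", "GH97"}, {"GH2", "GH43"}],
    [{"GH13", "GH97"}],
    [{"GH2", "GH43"}, {"PL1", "GH28"}],
    [{"GH16"}],
]

curve = rarefaction(toy_genomes)
a, b, c = fit_quadratic(curve)
print(curve)
print(a, b, c)
```

A negative fitted `a` would indicate that the curve is flattening, i.e. that each additional genome contributes fewer new PUL compositions. For the 20% mismatch series, `unique_puls` would have to be replaced by a clustering-based count.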