PULDB: the expanded database of Polysaccharide Utilization Loci.

Nicolas Terrapon^1,2, Vincent Lombard^1,2, Élodie Drula^1,2, Pascal Lapébie^1,2, Saad Al-Masaudi^3, Harry J Gilbert^4, Bernard Henrissat^1,2,3.

Abstract

The Polysaccharide Utilization Loci (PUL) database was launched in 2015 to present PUL predictions in ∼70 Bacteroidetes species isolated from the human gastrointestinal tract, as well as PULs derived from the experimental data reported in the literature. In 2018, PULDB offers access to 820 genomes, sampled from various environments and covering a much wider taxonomical range. A Krona dynamic chart was set up to facilitate browsing through taxonomy. Literature surveys now allow the presentation of the most recent (i) PUL repertoires deduced from RNAseq large-scale experiments, (ii) PULs that have been subjected to in-depth biochemical analysis and (iii) new Carbohydrate-Active enzyme (CAZyme) families that contributed to the refinement of PUL predictions. To improve PUL visualization and genome browsing, the previous annotation of genes encoding CAZymes, regulators, integrases and SusCD has now been expanded to include functionally relevant protein families whose genes are significantly over-represented in the vicinity of PULs: sulfatases, proteases, ROK repressors, epimerases, and ATP-Binding Cassette and Major Facilitator Superfamily transporters. To cope with cases where susCD may be absent due to incomplete assemblies/split PULs, we present 'CAZyme cluster' predictions. Finally, a PUL alignment tool, operating on the tagged families instead of amino-acid sequences, was integrated to retrieve PULs similar to a query of interest. The updated PULDB website is accessible at www.cazy.org/PULDB_new/.

Year: 2018 PMID: 29088389 PMCID: PMC5753385 DOI: 10.1093/nar/gkx1022

Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971

INTRODUCTION

Polysaccharides constitute the main source of carbon for most organisms on Earth. Because of their enormous structural diversity, polysaccharide deconstruction requires the concerted action of large numbers of specific enzymes. While most bacteria break down polysaccharides by exporting their carbohydrate-active enzymes (CAZymes) into the extracellular milieu and importing the simple sugars produced, an inventive solution operates in Gram-negative bacteria of the Bacteroidetes phylum. The genomes of these bacteria feature Polysaccharide Utilization Loci, or PULs. A PUL comprises a single genomic locus that encodes the necessary proteins to bind a given polysaccharide at the cell surface, to perform an initial cleavage to large oligosaccharides, to import these oligosaccharides into the periplasmic space, to complete the degradation into monosaccharides and to regulate PUL gene expression. Some Bacteroidetes species contain up to 100 PULs, with almost 20% of their genome dedicated to these systems (1), explaining their evolutionary success as primary glycan degraders in the human gut microbiota (2). Bacteroidetes are found in almost all environments, and the last decade has seen a continuous acceleration of published PUL analyses, notably by RNAseq experiments and in-depth biochemistry. To facilitate individual PUL analysis, in 2015 we launched PULDB to present PULs predicted solely from genome sequences along with those reported in the literature (3). The principle of the PUL prediction is to start from every susCD-like gene pair, and then to extend PUL boundaries to operonic genes (based on intergenic distances between genes on the same strand (4)) and to more distant regulators and CAZyme-coding genes, whose products catalyze polysaccharide breakdown. While we previously focused mainly on the algorithm and presented a limited number of genomes with a recognized bias towards human gut species/strains, we present here a major update of PULDB.
This release includes a 10-fold increase in analyzed genomes that offers a much deeper coverage of the Bacteroidetes phylum and of different environments. A tool has been integrated into the web interface to facilitate taxonomy browsing in PULDB. This release also adds the most recent literature-derived PULs and CAZyme families. Additional protein families relevant in a PUL context are now displayed and used in a PUL aligner that allows the user to retrieve the most conserved modular PUL organizations.
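The prediction principle described above (seed on a susCD-like pair, then extend to operonic neighbours) can be illustrated with a minimal Python sketch. It is not the PULDB implementation: the `Gene` record, the tag names, the requirement that susC and susD lie on the same strand, and the 100 bp intergenic threshold are all assumptions made for illustration, and the further extension to more distant regulators and CAZyme-coding genes is left out.

```python
from dataclasses import dataclass


@dataclass
class Gene:
    name: str
    start: int
    end: int
    strand: str  # "+" or "-"
    tag: str     # e.g. "susC", "susD", "GH43", "unknown" (hypothetical labels)


MAX_INTERGENIC = 100  # hypothetical threshold in bp; the text gives no value


def is_suscd_pair(a, b):
    """Two neighbouring genes form a susCD-like seed (same strand assumed)."""
    return {a.tag, b.tag} == {"susC", "susD"} and a.strand == b.strand


def operonic(a, b, max_gap=MAX_INTERGENIC):
    """Neighbours count as operonic if co-stranded and close together."""
    return a.strand == b.strand and (b.start - a.end) <= max_gap


def predict_puls(genes, max_gap=MAX_INTERGENIC):
    """Return one list of gene names per predicted locus."""
    genes = sorted(genes, key=lambda g: g.start)
    puls = []
    i = 0
    while i < len(genes) - 1:
        if is_suscd_pair(genes[i], genes[i + 1]):
            left, right = i, i + 1
            # Extend the boundaries in both directions over operonic genes.
            while left > 0 and operonic(genes[left - 1], genes[left], max_gap):
                left -= 1
            while right < len(genes) - 1 and operonic(
                genes[right], genes[right + 1], max_gap
            ):
                right += 1
            puls.append([g.name for g in genes[left:right + 1]])
            i = right + 1
        else:
            i += 1
    return puls
```

With a toy contig in which a glycoside hydrolase gene sits 50 bp upstream of a susC/susD pair and another 60 bp downstream, all on the same strand, the function returns a single locus containing those four genes and excludes distant or opposite-strand neighbours.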

10-FOLD INCREASE IN CAZy-ANALYZED SPECIES

To achieve a >10-fold increase in PULDB, we analyzed 820 complete genome sequences (∼3 million genes), mostly of the Bacteroidetes phylum, downloaded from JGI (http://genome.jgi.doe.gov/) and NCBI (https://www.ncbi.nlm.nih.gov/nuccore) servers. Our PUL prediction procedure relies on genomic data but also requires the semi-manual expert annotation of CAZymes (5). We identified 153 202 CAZyme modules in the 820 genomes, mostly glycoside hydrolases (53%) and glycosyltransferases (31%), classified according to the sequence-based families that are described in the CAZy database. Then the 820 genomes were subjected to the PUL predictions as described earlier. Compared to the 2015 PULDB dataset (3), the new genome sampling expands far beyond the human gastrointestinal tract (now represented by ∼80 species), and notably includes 64 rumen gut species, as well as many bacterial species from soil or marine environments. The coverage of Bacteroidetes taxonomical diversity also drastically increased. The 2015 dataset almost exclusively consisted of species from the Bacteroidales order (70% belonging to the Bacteroides genus). In the current dataset, Bacteroidales represents only 40% (only half being from the Bacteroides genus), a proportion comparable to that of the Flavobacteriales order, while three additional orders (Cytophagales, Sphingobacteriales and Chitinophagales) are now also represented. Moreover, the presence of the susCD gene tandem that is fundamental to PULs now allows the prediction of PULs beyond the Bacteroidetes phylum, namely in the Gemmatimonadetes and Ignavibacteriae phyla (which group with Bacteroidetes in the FCB group), and also in the Balneolaeota phylum. To facilitate navigation across the various taxonomical levels, and to identify species of interest, we implemented a new browsing tool in PULDB. We adapted the Krona multilayered pie-chart, introduced for metagenomics analysis (6), to represent the hierarchical aspects of the taxonomy (Figure 1).
Implemented using the latest HTML5 and JavaScript interactive technology, Krona allows zooming in and out very efficiently and can be easily customized by the user for the desired taxonomic depth or font size, allowing the production of high-quality publication-ready pictures. It also offers text searches and improved navigation. We also added a color scale indicative of the number of PULs per genome (estimated for ancestral taxa by a simple arithmetic mean), which immediately offers an overview of the PUL diversity at the different taxonomical levels. Finally, in the upper-right part, where Krona provides genome statistics for each taxon, we added several hyperlinks to the species list, to the NCBI taxonomy and to PULDB predicted PULs in this group/species.
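As an illustration of the color-scale values, the sketch below averages PUL counts over every taxon of a lineage. It assumes that the "simple arithmetic mean" is taken over all genomes below a taxon (rather than over the means of its child taxa); the function name and the input layout are hypothetical, and the counts in the usage note are invented.

```python
from collections import defaultdict


def mean_puls_per_taxon(genomes):
    """Average the PUL count of all genomes found below each taxon.

    `genomes` is an iterable of (lineage, pul_count) pairs, where a lineage
    is a tuple of taxon names from the most general to the most specific.
    """
    totals = defaultdict(lambda: [0, 0])  # lineage prefix -> [sum, genome count]
    for lineage, n_puls in genomes:
        for depth in range(1, len(lineage) + 1):
            node = lineage[:depth]
            totals[node][0] += n_puls
            totals[node][1] += 1
    return {node: total / count for node, (total, count) in totals.items()}
```

For example, two Bacteroidales genomes with 90 and 70 PULs and one Flavobacteriales genome with 20 PULs give a mean of 80 for Bacteroidales, 20 for Flavobacteriales and 60 for their common Bacteroidetes ancestor.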

Figure 1.

Krona multilayered pie-charts of taxonomy in PULDB. The top-left corner includes classical web-browser features (text search area and buttons for browsing back and forward), and display features (depth of the taxonomy, font and chart sizes). The bottom-left corner displays the color scale that represents the number of PULs per species (averaged in ancestral nodes). The top-right corner indicates the selected taxonomic level and its related information: the number of species (with a link to the listing), a link to PULDB to visualize all PULs for this taxon, a link to the NCBI taxonomy, etc. (A) Initial display of the most general taxonomic level, labeled ALL at the center, with a search for the character string ‘frigo’ highlighting the taxa having a positive result. (B) Display of the Sphingobacteriaceae level, resulting from a zoom-in by double-clicking on the ‘Sphingobacteriaceae’ area in chart (A). Going back to (A) or intermediary levels is possible through the lineage links at the center.

LITERATURE-DERIVED PULs, COGNATE SUBSTRATES AND NEW CAZyme FAMILIES

The study of polysaccharide degradation by PUL-encoded systems is a highly active research field. A continuous literature survey enabled us to complete the PULDB data with literature-derived PUL data (previously called experimentally-validated PULs). Notably, recent high-throughput experiments led to the delineation of PULs in Bacteroides cellulosilyticus WH2 (7), Bacteroides thetaiotaomicron 7330 (8) and Zobellia galactanivorans (9). Attempts to define PUL boundaries in the absence of expression data were also reported in the genome publication of Capnocytophaga canimorsus Cc5 (10). Moreover, several specific analyses have focused on the degradation of defined polysaccharides by their corresponding PULs, including plant (fructan (11), pectin (12), xylan (13,14), xyloglucan (15) and type II rhamnogalacturonan (RGII) (16)) and non-plant (α-mannan (17), galactomannan (18), 1,6-β-glucan (19), mucin (20), sialoglycoconjugates (21), N-glycan (17,22–24), heparin and heparan sulfate (25), chitin (26), alginate and laminarin (27)) polysaccharides. To facilitate the retrieval of characterized PULs by their cognate substrate, a new field on the PULDB homepage allows searching for a given character substring within the PUL substrate labels. Finally, the recent RGII publication notably reported the biochemical characterization of seven new glycoside hydrolase families that were immediately added to the CAZy database, designated GH137 to GH143. Similarly, other publications led to the creation of new CAZyme families: GH136, GH144, GH145, PL24 to PL27 (28–34). All new CAZy families have also been added to PULDB. As a consequence, the PUL predictions are improved by these new families, which allow refinement of both PUL boundaries and prediction confidence, as illustrated with the JBrowse view (35) of the homologous RGII-PUL in Terrimonas ferruginea DSM 30193 (Figure 2).

Figure 2.

Example of the improved PUL predictions by the inclusion of recently created CAZyme families in the RGII PUL of Terrimonas ferruginea DSM 30193. Panel (A) displays the JBrowse view (35) of the region before the creation of families GH137-GH143 (16). The predicted PUL is depicted at the bottom of the panel by a green, yellow and red line, according to confidence levels as previously described (3). Panel (B) displays the same region with the genes belonging to these seven families now annotated and highlighted by black boxes. These annotations lead to a PUL prediction with improved confidence (left and middle arrows), and improved PUL boundaries (right arrow), compared to (A).

ADDITIONAL DISPLAY OF SULFATASES, PROTEASES, EPIMERASES, ROKs AND TRANSPORTERS

In PULDB, simplified representations of PULs are proposed as trains whose wagons, the constitutive proteins, are colored/tagged if their protein function is relevant in the PUL context. We initially focused on SusC outer-membrane transporters (purple), SusD outer-membrane binding proteins (orange), several regulators (light blue), integrases which sometimes join adjacent PULs (dark gray) and CAZyme families (mainly glycoside hydrolases in light pink, polysaccharide lyases in dark pink, carbohydrate-binding modules in green, carbohydrate esterases in brown). All other proteins remained tagged as ‘unknown’ (light gray). To increase the readability of these PUL representations, we searched for additional protein families with relevant function in PULs, based on (i) the literature, (ii) over-representation in PUL contexts and (iii) reliability of Pfam domain annotation (36). These new families are now tagged/colored in the new PULDB release. The most important accessory enzymes that directly assist polysaccharide degradation are the sulfatases, which remove sulfate groups from algal and mammalian-host glycans (25,37,38). Sulfatases now appear colored in yellow in PULDB and are labeled according to their SulfAtlas family classification (39). Proteins in the Major Facilitator Superfamily (MFS) are inner-membrane transporters that participate in carbohydrate metabolism after polysaccharide depolymerization (40). Their presence in the vicinity of PULs and their participation in species growth have been demonstrated (41). MFS transporters, as well as ATP-Binding Cassette transporters, are thus colored in purple in PULDB, like SusC transporters. Even though PULDB has not been designed to annotate carbohydrate (monosaccharide) metabolism, in which a large variety of protein functions are involved, we intend to provide users with some indicators that several ‘unknown’ genes in a given PUL may not contribute to polysaccharide deconstruction.
Thus, we colored in light blue and tagged domains of the ROK family (Repressors, ORFs and Kinases), as well as some epimerases (42,43) that are frequently found in PULs. Finally, proteases have been shown to appear in some operons with susCD genes and to participate in the degradation of non-glycan substrates (20), raising the question of the extension of the PUL paradigm beyond glycans. The observation of their high frequency in some PULs without CAZyme genes motivates the integration of proteases in PULDB (gold-colored), labeled with the clan information of the MEROPS classification (44). All tags that can be searched and displayed in PULDB are shown in Figure 3, and are available at www.cazy.org/PULDB/tags.html.

Figure 3.

Module tags in PULDB. (A) Examples of predicted PULs including the newly tagged modules (highlighted in black boxes). (B) Complete list of tagged modules which can be searched and displayed in PULDB (listed at www.cazy.org/PULDB/tags.html).

CAZyme CLUSTERS

While most PULs resemble simple operonic systems, some substrates have been shown to activate the concerted action of several PULs, e.g. RGII (16), and sometimes of a PUL and an additional gene cluster devoid of susCD genes, which thus fails to fulfill the standard PUL paradigm. This was exemplified by the xylan degradation system of Bacteroides xylanisolvens (26). Indeed, when the complexity of the substrate increases, more enzymes are required for its breakdown and thus a ‘longer’ PUL needs to be maintained. Constraining all necessary enzymes within a single locus/regulatory system represents a challenge for bacteria. Comparative genomics analysis of homologous PULs for the breakdown of RGII (16), the most complex known polysaccharide, revealed many species with several scattered loci, one containing susCD genes and several others made of three or more clustered CAZyme genes. To cope with such detached gene clusters, the present PULDB update introduces the display of so-called ‘CAZyme clusters’. To predict CAZyme clusters, we apply exactly the same algorithm as in PUL prediction, but instead of initiating the prediction around susCD genes, we start from a core of at least three adjacent CAZyme genes, not necessarily on the same strand, separated by a maximum of one single inserted gene. The display of CAZyme clusters in the PULDB web interface is accessible via a checkbox. CAZyme clusters will also help in PUL annotation of fragmented genomes. For example, despite an incomplete genome assembly, Bacteroides ovatus ATCC 8483 became a model Bacteroidetes species thanks to RNA analysis conducted by Martens and coworkers (45). The complete genome sequence obtained later, however, reveals that the incomplete initial assembly prevented the delineation of a large PUL (Bovatus_02505 to Bovatus_02540). This was because the locus was scattered across four different short scaffolds, for which CAZyme cluster definition would have reported at least two of the three split clusters.
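The seeding rule for CAZyme clusters can be sketched as follows. This is a minimal illustration under one reading of the rule, namely that consecutive CAZyme genes of a core may have at most one other gene between them; the function name and the boolean input are assumptions, and the subsequent boundary extension (shared with PUL prediction) is not shown.

```python
def cazyme_cluster_cores(is_cazyme, min_cazymes=3, max_inserted=1):
    """Find candidate cores in a gene-ordered list of CAZyme flags.

    `is_cazyme[i]` is True when gene i (in genomic order, strand ignored)
    carries a CAZyme annotation. Returns (first, last) gene indices of every
    run holding at least `min_cazymes` CAZyme genes in which consecutive
    CAZyme genes are separated by at most `max_inserted` other genes.
    """
    cores = []
    run = []  # indices of the CAZyme genes in the current candidate core
    for idx, flag in enumerate(is_cazyme):
        if not flag:
            continue
        # Too many inserted genes since the last CAZyme: close the run.
        if run and idx - run[-1] - 1 > max_inserted:
            if len(run) >= min_cazymes:
                cores.append((run[0], run[-1]))
            run = []
        run.append(idx)
    if len(run) >= min_cazymes:
        cores.append((run[0], run[-1]))
    return cores
```

On a contig with the pattern CAZyme, other, CAZyme, CAZyme, other, other, CAZyme, CAZyme, other, CAZyme, the two inserted genes in the middle split the region into two cores, spanning genes 0 to 3 and genes 6 to 9.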

THE PUL ALIGNER

A new tool is presented in this PULDB release to allow a user to search for and identify PULs that are similar to a PUL of interest; it is accessible from the web pages dedicated to each PUL. This tool is a PUL aligner which allows the retrieval of conserved modular organizations. Inspired by the RADS modular alignment method for proteins (46), this tool produces local alignments of a query PUL (or CAZyme cluster) against all PULs (and CAZyme clusters) in PULDB. However, instead of aligning concatenated amino-acid sequences of proteins, it treats each protein relevant to PUL function as one character. Implementing the classical Needleman–Wunsch algorithm (47), it requires a substitution-scoring matrix between modules, as well as gap costs. A simple scheme based on the most relevant features of PULs was empirically designed. Matches of identical glycoside hydrolase and polysaccharide lyase families are given a score of +200 because they are the main determinants of polysaccharide breakdown specificity, matches of all other protein families a score of +100, and a match of the susCD pair a value of only +50, due to its presence in all predicted PULs. Proteins tagged as unknown are ignored. Given that a mutation of a protein domain into another is a less likely evolutionary event than a substitution between amino acids, our scoring scheme also favors gaps over substitutions by giving the following penalties: internal gap opening/extension: −20/−10 and terminal gap opening/extension: −10/−5; substitution: −50. As a result, the alignment scores allow the ranking of similar PULs from the most similar (syntenic) to the most rearranged. Figure 4 shows the results of a search starting from the xyloglucan PUL of B. ovatus ATCC 8483 (15) as the query and three aligned PULs with various conservation levels.
The PUL aligner can also help in comparative genomics studies of a PUL, (i) by estimating its spread among strains of the same species, among its genus, and beyond, and (ii) by identifying the rearrangement (deletion/insertion) events that occurred during the evolution of a particular PUL.
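The scoring scheme can be made concrete with a short Python sketch that aligns two PULs written as lists of module tags. It uses the match and substitution scores quoted above in a plain Needleman–Wunsch global recurrence, but, as a deliberate simplification, replaces the separate internal/terminal gap opening and extension costs with a single linear gap penalty of −20 per skipped module. The tag strings ("susCD", "GH5", "unknown", ...) and function names are hypothetical, and this is not the PULDB code.

```python
def module_score(a, b):
    """Score a pair of aligned module tags with the values given in the text."""
    if a != b:
        return -50   # substitution of one module by another
    if a == "susCD":
        return 50    # present in all predicted PULs, hence weakly informative
    if a.startswith(("GH", "PL")):
        return 200   # glycoside hydrolase / polysaccharide lyase families
    return 100       # any other tagged family


def align_score(query, subject, gap=-20):
    """Global alignment score of two module lists (linear gaps, simplified)."""
    # Modules tagged as unknown are ignored, as stated in the text.
    q = [m for m in query if m != "unknown"]
    s = [m for m in subject if m != "unknown"]
    prev = [j * gap for j in range(len(s) + 1)]
    for i in range(1, len(q) + 1):
        cur = [i * gap]
        for j in range(1, len(s) + 1):
            cur.append(max(
                prev[j - 1] + module_score(q[i - 1], s[j - 1]),  # match/substitution
                prev[j] + gap,      # module of the query left unaligned
                cur[j - 1] + gap,   # module of the subject left unaligned
            ))
        prev = cur
    return prev[-1]
```

With the query `["susCD", "GH5", "unknown", "GH31", "CE1"]`, an identical subject scores 550, a subject lacking GH31 scores 330, and a subject carrying GH43 in place of GH31 scores 310, because two gaps (−40) cost less than one substitution (−50). This ordering reflects the stated preference for gaps over substitutions.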

Figure 4.

Illustration of the PUL aligner output from the literature-derived PUL 113 for xyloglucan utilization in Bacteroides ovatus ATCC 8483. The top-left panel displays the result summary, viz. the list of similar PULs (with links to each corresponding PUL webpage) ranked according to their scores, with links to the corresponding pairwise PUL alignment. The other panels present three pairwise alignments that obtained different scores in the top-left panel (highlighted by black arrows). The PUL modular organizations are displayed vertically with the query on the left and the subject on the right. Matching modules are separated by central green rectangles, while gaps are depicted by red rectangles and unaligned ‘unknown’ modules remain uncolored.

46 in total

1. Characterization of a gene cluster for sialoglycoconjugate utilization in Bacteroides fragilis.

Authors: Haruyuki Nakayama-Imaohji; Minoru Ichimura; Tomoya Iwasa; Natsumi Okada; Yoshinari Ohnishi; Tomomi Kuwahara
Journal: J Med Invest Date: 2012

2. Biochemical and structural analyses of a bacterial endo-β-1,2-glucanase reveal a new glycoside hydrolase family.

Authors: Koichi Abe; Masahiro Nakajima; Tetsuro Yamashita; Hiroki Matsunaga; Shinji Kamisuki; Takanori Nihira; Yuta Takahashi; Naohisa Sugimoto; Akimasa Miyanaga; Hiroyuki Nakai; Takatoshi Arakawa; Shinya Fushinobu; Hayao Taguchi
Journal: J Biol Chem Date: 2017-03-07 Impact factor: 5.157

3. New Family of Ulvan Lyases Identified in Three Isolates from the Alteromonadales Order.

Authors: Moran Kopel; William Helbert; Yana Belnik; Vitaliy Buravenkov; Asael Herman; Ehud Banin
Journal: J Biol Chem Date: 2016-01-13 Impact factor: 5.157

4. Lacto-N-biosidase encoded by a novel gene of Bifidobacterium longum subspecies longum shows unique substrate specificity and requires a designated chaperone for its active expression.

Authors: Haruko Sakurama; Masashi Kiyohara; Jun Wada; Yuji Honda; Masanori Yamaguchi; Satoru Fukiya; Atsushi Yokota; Hisashi Ashida; Hidehiko Kumagai; Motomitsu Kitaoka; Kenji Yamamoto; Takane Katayama
Journal: J Biol Chem Date: 2013-07-10 Impact factor: 5.157

5. Sialic acid (N-acetyl neuraminic acid) utilization by Bacteroides fragilis requires a novel N-acetyl mannosamine epimerase.

Authors: Christopher Brigham; Ruth Caughlan; Rene Gallegos; Mary Beth Dallas; Veronica G Godoy; Michael H Malamy
Journal: J Bacteriol Date: 2009-03-20 Impact factor: 3.490

6. Genetic determinants of in vivo fitness and diet responsiveness in multiple human gut Bacteroides.

Authors: Meng Wu; Nathan P McNulty; Dmitry A Rodionov; Matvei S Khoroshkin; Nicholas W Griffin; Jiye Cheng; Phil Latreille; Randall A Kerstetter; Nicolas Terrapon; Bernard Henrissat; Andrei L Osterman; Jeffrey I Gordon
Journal: Science Date: 2015-10-02 Impact factor: 47.728

7. The N-glycan glycoprotein deglycosylation complex (Gpd) from Capnocytophaga canimorsus deglycosylates human IgG.

Authors: Francesco Renzi; Pablo Manfredi; Manuela Mally; Suzette Moes; Paul Jenö; Guy R Cornelis
Journal: PLoS Pathog Date: 2011-06-30 Impact factor: 6.823

8. Matching the Diversity of Sulfated Biomolecules: Creation of a Classification Database for Sulfatases Reflecting Their Substrate Specificity.

Authors: Tristan Barbeyron; Loraine Brillet-Guéguen; Wilfrid Carré; Cathelène Carrière; Christophe Caron; Mirjam Czjzek; Mark Hoebeke; Gurvan Michel
Journal: PLoS One Date: 2016-10-17 Impact factor: 3.240

9. A polysaccharide utilization locus from Flavobacterium johnsoniae enables conversion of recalcitrant chitin.

Authors: Johan Larsbrink; Yongtao Zhu; Sampada S Kharade; Kurt J Kwiatkowski; Vincent G H Eijsink; Nicole M Koropatkin; Mark J McBride; Phillip B Pope
Journal: Biotechnol Biofuels Date: 2016-11-28 Impact factor: 6.040

10. Twenty years of the MEROPS database of proteolytic enzymes, their substrates and inhibitors.

Authors: Neil D Rawlings; Alan J Barrett; Robert Finn
Journal: Nucleic Acids Res Date: 2015-11-02 Impact factor: 16.971
