Nicolas Terrapon1,2, Vincent Lombard1,2, Élodie Drula1,2, Pascal Lapébie1,2, Saad Al-Masaudi3, Harry J Gilbert4, Bernard Henrissat1,2,3.
Abstract
The Polysaccharide Utilization Loci (PUL) database was launched in 2015 to present PUL predictions in ∼70 Bacteroidetes species isolated from the human gastrointestinal tract, as well as PULs derived from the experimental data reported in the literature. In 2018, PULDB offers access to 820 genomes, sampled from various environments and covering a much wider taxonomical range. A Krona dynamic chart was set up to facilitate browsing through taxonomy. Literature surveys now allow the presentation of the most recent (i) PUL repertoires deduced from RNAseq large-scale experiments, (ii) PULs that have been subjected to in-depth biochemical analysis and (iii) new Carbohydrate-Active enzyme (CAZyme) families that contributed to the refinement of PUL predictions. To improve PUL visualization and genome browsing, the previous annotation of genes encoding CAZymes, regulators, integrases and SusCD has now been expanded to include functionally relevant protein families whose genes are significantly found in the vicinity of PULs: sulfatases, proteases, ROK repressors, epimerases and ATP-Binding Cassette and Major Facilitator Superfamily transporters. To cope with cases where susCD may be absent due to incomplete assemblies/split PULs, we present 'CAZyme cluster' predictions. Finally, a PUL alignment tool, operating on the tagged families instead of amino-acid sequences, was integrated to retrieve PULs similar to a query of interest. The updated PULDB website is accessible at www.cazy.org/PULDB_new/.
Year: 2018 PMID: 29088389 PMCID: PMC5753385 DOI: 10.1093/nar/gkx1022
Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971
Figure 1. Krona multilayered pie-charts of taxonomy in PULDB. The top-left corner includes classical web-browser features (text search area and buttons for browsing back and forward), and display features (depth of the taxonomy, font and chart sizes). The bottom-left corner displays the color scale that represents the number of PULs per species (averaged in ancestral nodes). The top-right corner indicates the selected taxonomic level and its related information: the number of species (with a link to the listing), a link to PULDB to visualize all PULs for this taxon, a link to the NCBI taxonomy, etc. (A) Initial display of the most general taxonomic level, labeled ALL at the center, with a search for the character string ‘frigo’ highlighting the taxa having a positive result. (B) Display of the Sphingobacteriaceae level, resulting from a zoom-in by double-clicking on the ‘Sphingobacteriaceae’ area in chart (A). Going back to (A) or intermediary levels is possible through the lineage links at the center.
Figure 2. Example of the improved PUL predictions by the inclusion of recently created CAZyme families in the RGII PUL of Terrimonas ferruginea DSM 30193. Panel (A) displays the JBrowse view (35) of the region before the creation of families GH136-GH143 (16). The predicted PUL is depicted at the bottom of the panel by a green, yellow and red line, according to confidence levels as previously described (3). Panel (B) displays the same region with the genes belonging to these seven families now annotated and highlighted by black boxes. These annotations lead to a PUL prediction with improved confidence (left and middle arrows), and improved PUL boundaries (right arrow), compared to (A).
Figure 3. Module tags in PULDB. (A) Examples of predicted PULs including the newly tagged modules (highlighted in black boxes). (B) Complete list of tagged modules which can be searched and displayed in PULDB (listed at www.cazy.org/PULDB/tags.html).
Figure 4. Illustration of the PUL aligner output from the literature-derived PUL 113 for xyloglucan utilization in Bacteroides ovatus ATCC 8483. The top-left panel displays the result summary, viz. the list of similar PULs (with links to each corresponding PUL webpage) ranked according to their scores, with links to the corresponding pairwise PUL alignment. The other panels present three pairwise alignments that obtained different scores in the top-left panel (highlighted by black arrows). The PUL modular organizations are displayed vertically with the query on the left and the subject on the right. Matching modules are separated by central green rectangles, while gaps are depicted by red rectangles and unaligned ‘unknown’ modules remain uncolored.
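The idea of aligning PULs on their tagged families rather than on amino-acid sequences can be illustrated with a minimal sketch. This is not the PULDB aligner: it assumes a plain Needleman-Wunsch global alignment over lists of module tags, and the scores (`MATCH`, `MISMATCH`, `GAP`), the function name `align_modules` and the example tag lists are all hypothetical choices made for illustration only.

```python
from typing import List, Optional, Tuple

# Hypothetical scoring scheme (an assumption, not the one used by PULDB).
MATCH, MISMATCH, GAP = 2, -1, -1

Pair = Tuple[Optional[str], Optional[str]]


def align_modules(query: List[str], subject: List[str]) -> Tuple[int, List[Pair]]:
    """Globally align two PULs given as ordered lists of module tags.

    Returns the alignment score and the aligned pairs; None marks a gap.
    """
    n, m = len(query), len(subject)

    def sub(i: int, j: int) -> int:
        # Substitution score for query[i-1] against subject[j-1].
        return MATCH if query[i - 1] == subject[j - 1] else MISMATCH

    # Dynamic-programming table, first row and column filled with gap costs.
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * GAP
    for j in range(1, m + 1):
        score[0][j] = j * GAP
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + sub(i, j),  # align the two modules
                score[i - 1][j] + GAP,            # gap in the subject
                score[i][j - 1] + GAP,            # gap in the query
            )

    # Trace back from the bottom-right cell to recover one optimal alignment.
    pairs: List[Pair] = []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub(i, j):
            pairs.append((query[i - 1], subject[j - 1]))
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + GAP:
            pairs.append((query[i - 1], None))
            i -= 1
        else:
            pairs.append((None, subject[j - 1]))
            j -= 1
    pairs.reverse()
    return score[n][m], pairs


# Made-up tag lists, not taken from any PULDB entry.
query = ["GH5", "SusC", "SusD", "GH31"]
subject = ["GH5", "SusC", "SusD", "GH43", "GH31"]
total, aligned = align_modules(query, subject)
for q, s in aligned:
    print(f"{q or '-':>6} | {s or '-'}")
print("score:", total)
```

Because the comparison works on family tags, two PULs with the same modular organization score highly even when their protein sequences have diverged. Ranking candidate PULs by such a score would give a result list comparable in spirit to the summary panel described in Figure 4.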