Elodie Drula, Marie-Line Garron, Suzan Dogan, Vincent Lombard, Bernard Henrissat, Nicolas Terrapon.
Abstract
Thirty years have elapsed since the emergence of the classification of carbohydrate-active enzymes in sequence-based families that became the CAZy database over 20 years ago, freely available for browsing and download at www.cazy.org. In the era of large-scale sequencing and high-throughput biology, it is important to examine the position of this specialist database, which is deeply rooted in human curation. The three primary tasks of the CAZy curators are (i) to maintain and update the family classification of this class of enzymes, (ii) to classify sequences newly released by GenBank and the Protein Data Bank and (iii) to capture and present functional information for each family. The CAZy website is updated once a month. Here we briefly summarize the increase in novel families and the annotations conducted during the last 8 years. We present several important changes that facilitate taxonomic navigation and allow the entirety of the annotations to be downloaded. Most importantly, we highlight the considerable amount of work that accompanies the analysis and reporting of biochemical data from the literature.
Year: 2022 PMID: 34850161 PMCID: PMC8728194 DOI: 10.1093/nar/gkab1045
Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971
Growth of the CAZy database during the past 8 years. For the distinct protein classes (rows), columns indicate: the total number of modules annotated in CAZy (‘Modules’); the number of modules which have been biochemically characterized (‘Characterized’); the number of modules with at least one 3D-structure in the PDB (‘With Structure’); and the number of created families in each class (‘Families’)
| Protein class | Modules, Dec-2013 | Modules, Sept-2021 | Characterized, Dec-2013 | Characterized, Sept-2021 | With Structure, Dec-2013 | With Structure, Sept-2021 | Families, Dec-2013 | Families, Sept-2021 |
|---|---|---|---|---|---|---|---|---|
| GH | 162 550 | 995 295 | 6 094 | 7 248 | 790 | 1 567 | 133 | 171 |
| GT | 122 853 | 849 449 | 1 507 | 1 862 | 137 | 298 | 94 | 114 |
| PL | 4 114 | 31 710 | 246 | 357 | 52 | 99 | 23 | 42 |
| CE | 16 467 | 97 226 | 198 | 216 | 61 | 99 | 16 | 19 |
| CBM | 33 793 | 277 412 | - | - | 269 | 408 | 68 | 88 |
| AA | 4 921 | 18 935 | 134 | 255 | 56 | 118 | 11 | 17 |
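The growth figures in the table can be turned into ratios with a few lines of arithmetic. The following is a minimal sketch, not part of CAZy or its tooling: the dictionary names and the helper functions `fold_increase` and `characterized_share` are illustrative assumptions, and the numbers are copied by hand from the table above.

```python
# Module counts copied from the table above: (Dec-2013, Sept-2021).
# All names here are hypothetical helpers, not a CAZy API.
modules = {
    "GH": (162_550, 995_295),
    "GT": (122_853, 849_449),
    "PL": (4_114, 31_710),
    "CE": (16_467, 97_226),
    "CBM": (33_793, 277_412),
    "AA": (4_921, 18_935),
}

# Biochemically characterized modules in Sept-2021.
# CBM is omitted because the table reports no value for it.
characterized_2021 = {"GH": 7_248, "GT": 1_862, "PL": 357, "CE": 216, "AA": 255}


def fold_increase(before: int, after: int) -> float:
    """Ratio of the later count to the earlier count."""
    return after / before


def characterized_share(protein_class: str) -> float:
    """Percentage of Sept-2021 modules that are biochemically characterized."""
    total_2021 = modules[protein_class][1]
    return 100 * characterized_2021[protein_class] / total_2021


if __name__ == "__main__":
    for protein_class, (old, new) in modules.items():
        line = f"{protein_class}: {fold_increase(old, new):.1f}x modules"
        if protein_class in characterized_2021:
            share = characterized_share(protein_class)
            line += f", {share:.2f}% characterized in Sept-2021"
        print(line)
```

Read this way, the table shows that the number of annotated modules grew several-fold in every class, while characterized modules remain a small fraction of the total.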
Figure 1. The new headers of CAZy (sub)family webpages. Boxes highlight the novel features: (1) direct access to the form to report functional characterization(s); (2) download link to the complete list of protein accessions and their CAZy modules; (3) for subfamilies with characterized members, only the subfamily-specific functions are now listed; (4) taxonomic tabs have been removed and a ‘Download’ tab gives access to a text file corresponding to the previous ‘ALL’ tab, with the complete list of protein accessions belonging to the (sub)family; (5) a ‘Taxonomic display’ tab links to a Krona visualization of the family members, as illustrated in Figure 2; (6) links to the publications that describe the functions of the enzymes are now given (preferentially PubMed, otherwise DOI or occasionally URL).
Figure 2. Krona chart browsing of family GH5. After navigation through the taxonomic levels, the figure shows the Basidiomycota, named in bold at the center. The center also recalls the results of a text search performed from the top form with the string ‘mi’. Results are highlighted within the Basidiomycota, where the two matching genomes at right have their sector highlighted and the ‘mi’ string on a yellow background, and also in the levels above: for example, the center shows 11 results in Dikarya. All sectors are color-coded according to the number of GH5 modules in the complete genome, and one sector has been selected, the genome of Ceratobasidium sp. AG-Ba JN. Once a sector is selected, various links to CAZy or NCBI appear at the top right, and multiple pie charts illustrate the representativeness of this genome in its taxonomic lineage.