Le Huang1, Han Zhang1, Peizhi Wu1, Sarah Entwistle2, Xueqiong Li2, Tanner Yohe3, Haidong Yi1, Zhenglu Yang1, Yanbin Yin2.
Abstract
Carbohydrate-active enzymes (CAZymes) are not only the most important enzymes for the bioenergy and agricultural industries, but are also very important for human health, because the human gut microbiota encode hundreds of CAZyme genes in their genomes for degrading various dietary and host carbohydrates. We have built an online database, dbCAN-seq (http://cys.bios.niu.edu/dbCAN_seq), to provide pre-computed CAZyme sequence and annotation data for 5,349 bacterial genomes. Compared to the other CAZyme resources, dbCAN-seq has the following new features: (i) a convenient download page to allow batch download of all the sequence and annotation data; (ii) an annotation page for every CAZyme to provide the most comprehensive annotation data; (iii) a metadata page to organize the bacterial genomes according to species metadata such as disease, habitat, oxygen requirement, temperature and metabolism; (iv) a very fast tool to identify physically linked CAZyme gene clusters (CGCs); and (v) a powerful search function to allow fast and efficient data query. With these unique utilities, dbCAN-seq will become a valuable web resource for CAZyme research, with a focus complementary to dbCAN (automated CAZyme annotation server) and CAZy (CAZyme family classification and reference database).
Year: 2018 PMID: 30053267 PMCID: PMC5753378 DOI: 10.1093/nar/gkx894
Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971
CAZymes are most abundant in plants and plant-associated microbes
| Organism | # of CAZymes | % of gene content |
|---|---|---|
|  | 1,134 | 4.1% |
|  | 506 | 3.9% |
|  | 391 | 8.2% |
|  | 106 | 2.6% |
|  | 340 | 1.7% |
Figure 1. dbCAN has been cited in various research fields.
Figure 2. CAZymes in different phyla. Pie charts (left): the relative fraction of different CAZyme classes, which include GTs (glycosyltransferases), GHs (glycoside hydrolases), PLs (polysaccharide lyases), CEs (carbohydrate esterases), AAs (enzymes of the auxiliary activities), and CBMs (carbohydrate-binding modules). Box plots (right): the percentage of CAZymes in different bacterial phyla. The number in the parentheses is the number of genomes.
Figure 3. CAZyme gene cluster (CGC) definition and example. (A) Definition of CGC: One CGC must contain at least one CAZyme (red). Two other signature gene classes could also be present: TF (green) and TC (blue) genes. A small number of non-signature genes (gray) can be inserted between two neighboring signature genes. (B) An example CGC from Paludibacter propionicigenes WB4, which has 42 genes in one cluster including 35 CAZymes (red), 1 TC (blue), 2 TFs (green) and 4 other genes (gray). The CAZyme gene labels are based on CAZyme domain assignment. The TC gene labels are based on the best hit in TC-DB. The TF gene labels are based on the best hit in a few TF databases (see main text).
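The CGC definition in the Figure 3 caption can be illustrated with a minimal Python sketch. It is not the dbCAN-seq implementation. It assumes genes are given in chromosomal order on a single contig and are already labeled as "CAZyme", "TC", "TF", or something else. The function name `find_cgcs` and the parameters `max_gap` (how many non-signature genes may sit between two neighboring signature genes) and `min_signature_genes` are hypothetical; the caption only says "a small number", so both default values are assumptions.

```python
from typing import List, Tuple

# Signature gene classes named in the Figure 3 caption.
SIGNATURE = {"CAZyme", "TC", "TF"}


def find_cgcs(
    gene_types: List[str],
    max_gap: int = 2,              # assumed value; the caption says "a small number"
    min_signature_genes: int = 2,  # assumed; set to 1 to accept a lone CAZyme
) -> List[Tuple[int, int]]:
    """Return inclusive (start, end) gene indices of candidate CGCs.

    A candidate is a run of signature genes in which neighboring signature
    genes are separated by at most `max_gap` non-signature genes, and which
    contains at least one CAZyme.
    """
    clusters: List[Tuple[int, int]] = []
    run: List[int] = []  # indices of signature genes in the currently open run

    def flush() -> None:
        # Keep the run only if it has enough signature genes and a CAZyme.
        if len(run) >= min_signature_genes and any(
            gene_types[i] == "CAZyme" for i in run
        ):
            clusters.append((run[0], run[-1]))

    for i, gene_type in enumerate(gene_types):
        if gene_type not in SIGNATURE:
            continue
        # Too many non-signature genes since the last signature gene:
        # close the open run and start a new one.
        if run and i - run[-1] - 1 > max_gap:
            flush()
            run = []
        run.append(i)
    flush()
    return clusters


# Toy gene order (made up for illustration, not data from the paper).
example = [
    "other", "CAZyme", "other", "TC", "TF",
    "other", "other", "other",
    "CAZyme", "CAZyme", "other", "TF", "other",
]
print(find_cgcs(example))  # [(1, 4), (8, 11)]
```

With `max_gap=2`, the three consecutive non-signature genes at indices 5 to 7 split the toy contig into two clusters; raising `max_gap` to 3 merges them into one. A run that contains only TF and TC genes is discarded because it has no CAZyme.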