Alexander Ekstrom1, Rahil Taujale1, Nathan McGinn1, Yanbin Yin2.
Abstract
PlantCAZyme is a database built upon dbCAN (database for automated carbohydrate active enzyme annotation), aiming to provide pre-computed sequence and annotation data of carbohydrate active enzymes (CAZymes) to plant carbohydrate and bioenergy research communities. The current version contains data of 43,790 CAZymes of 159 protein families from 35 plants (including angiosperms, gymnosperms, lycophyte and bryophyte mosses) and chlorophyte algae with fully sequenced genomes. Useful features of the database include: (i) a BLAST server and a HMMER server that allow users to search against our pre-computed sequence data for annotation purposes, (ii) a download page that allows batch downloading of data for a specific CAZyme family or species and (iii) protein browse pages that provide easy access to the most comprehensive sequence and annotation data. DATABASE URL: http://cys.bios.niu.edu/plantcazyme/
Year: 2014 PMID: 25125445 PMCID: PMC4132414 DOI: 10.1093/database/bau079
Source DB: PubMed Journal: Database (Oxford) ISSN: 1758-0463 Impact factor: 3.451
Thirty-five plant and algal genomes that are included in the PlantCAZyme database
| Clade | Source | # of genes | # of CAZyme genes | % of CAZyme genes |
|---|---|---|---|---|
| Chlorophyte | Phytozome | 14 971 | 198 | 1.32 |
| Chlorophyte | Phytozome | 20 497 | 285 | 1.39 |
| Bryophyta | Phytozome | 21 173 | 857 | 4.05 |
| Lycophyta | Phytozome | 22 285 | 919 | 4.12 |
| Gymnosperm | Congenie | 71 158 | 1843 | 2.59 |
| Dicot | Phytozome | 24 823 | 1099 | 4.43 |
| Dicot | Phytozome | 32 670 | 1232 | 3.77 |
| Dicot | Phytozome | 27 416 | 1224 | 4.46 |
| Dicot | Phytozome | 40 905 | 1812 | 4.43 |
| Dicot | Phytozome | 26 521 | 1211 | 4.57 |
| Dicot | Phytozome | 27 769 | 845 | 3.04 |
| Dicot | Phytozome | 24 553 | 1098 | 4.47 |
| Dicot | Phytozome | 25 379 | 1083 | 4.27 |
| Dicot | Phytozome | 21 503 | 1008 | 4.69 |
| Dicot | Phytozome | 36 376 | 1711 | 4.70 |
| Dicot | Phytozome | 65 662 | 1105 | 1.68 |
| Dicot | Phytozome | 54 175 | 2354 | 4.35 |
| Dicot | Phytozome | 37 505 | 1648 | 4.39 |
| Dicot | Phytozome | 43 471 | 2018 | 4.64 |
| Dicot | Phytozome | 63 514 | 2220 | 3.50 |
| Dicot | Phytozome | 30 666 | 1442 | 4.70 |
| Dicot | Phytozome | 44 135 | 1173 | 2.66 |
| Dicot | Phytozome | 26 718 | 1271 | 4.76 |
| Dicot | Phytozome | 27 197 | 1351 | 4.97 |
| Dicot | Phytozome | 41 335 | 1751 | 4.24 |
| Dicot | Phytozome | 27 864 | 1288 | 4.62 |
| Dicot | Phytozome | 31 221 | 1135 | 3.64 |
| Dicot | Phytozome | 26 351 | 1132 | 4.30 |
| Dicot | Phytozome | 26 346 | 1096 | 4.16 |
| Monocot | Phytozome | 26 552 | 1243 | 4.68 |
| Monocot | Phytozome | 39 234 | 1363 | 3.47 |
| Monocot | Phytozome | 65 878 | 2624 | 3.98 |
| Monocot | Phytozome | 35 471 | 1487 | 4.19 |
| Monocot | Phytozome | 27 608 | 1334 | 4.83 |
| Monocot | Phytozome | 39 656 | 1475 | 3.72 |
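The percentage column is the CAZyme gene count divided by the total gene count for each genome. A minimal sketch of that arithmetic, checked against the two chlorophyte rows of the table; the function name `cazyme_percent` is illustrative and not part of PlantCAZyme.

```python
def cazyme_percent(n_cazyme_genes: int, n_genes: int) -> float:
    """Return CAZyme genes as a percentage of all genes, rounded to 2 decimals."""
    if n_genes <= 0:
        raise ValueError("n_genes must be positive")
    return round(100.0 * n_cazyme_genes / n_genes, 2)


# (total genes, CAZyme genes) for the two chlorophyte rows of the table
rows = [(14971, 198), (20497, 285)]
percents = [cazyme_percent(cazymes, genes) for genes, cazymes in rows]
print(percents)
```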
Figure 1. Evaluation of the impact of the E-value and coverage parameters on the accuracy of pre-computed PlantCAZyme sequence data for Arabidopsis and rice; x-axis (horizontal): E-value, y-axis (vertical): F-measure, z-axis: coverage. For both species, E-value < 1e-23 and coverage > 0.2 gave the highest F-measure. The detailed calculations are provided in Supplementary Tables S1 and S2.
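The two cutoffs named in the caption can be read as a simple filter over domain hits. The sketch below is illustrative only: the `DomainHit` record, the protein identifiers and the hit values are hypothetical, and coverage is assumed here to mean the fraction of the family HMM spanned by the alignment.

```python
from dataclasses import dataclass


@dataclass
class DomainHit:
    protein_id: str   # hypothetical identifier
    family: str       # CAZyme family assigned by the HMM search
    evalue: float
    hmm_start: int    # 1-based, inclusive
    hmm_end: int      # inclusive
    hmm_length: int   # length of the family HMM

    @property
    def coverage(self) -> float:
        # Assumption: coverage = aligned HMM span / HMM length
        return (self.hmm_end - self.hmm_start + 1) / self.hmm_length


def passes_cutoffs(hit: DomainHit,
                   max_evalue: float = 1e-23,
                   min_coverage: float = 0.2) -> bool:
    """Keep a hit only if it is below the E-value cutoff and above the coverage cutoff."""
    return hit.evalue < max_evalue and hit.coverage > min_coverage


hits = [
    DomainHit("protA", "GH5", 1e-40, 10, 250, 300),   # strong hit, coverage ~0.80
    DomainHit("protB", "GT2", 1e-10, 1, 150, 170),    # E-value too weak
    DomainHit("protC", "CBM20", 1e-30, 1, 15, 100),   # coverage 0.15, too low
]
kept = [h.protein_id for h in hits if passes_cutoffs(h)]
print(kept)
```

The defaults mirror the cutoffs reported in the caption; stricter or looser values can be passed per call.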
The E-value and coverage cutoffs that lead to the best F-measure in Arabidopsis
| # of CAZyme families | E-value | Coverage | F-measure | Sensitivity | Precision |
|---|---|---|---|---|---|
| 98 | 1.00 | 0.2 | 0.909236762 | 0.894071914 | 0.924924925 |
| 43 | 1.00E | 0.25 | 0.937634409 | 0.947826087 | 0.927659574 |
| 36 | 1.00 | 0.05 | 0.974811083 | 0.969924812 | 0.979746835 |
| 5 | 1.00 | 0.95 | 0.945741134 | 0.917647059 | 0.975609756 |
| 2 | 1.00 | 0.25 | 0.970588235 | 0.970588235 | 0.970588235 |
| 10 | 1.00 | 0.75 | 0.79613773 | 0.821428571 | 0.772357724 |
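The numbers in this table are consistent with the standard F-measure, the harmonic mean of sensitivity and precision: in the first row, 0.894071914 and 0.924924925 combine to 0.909236762. A short sketch of that check follows; the function name `f_measure` is illustrative.

```python
def f_measure(sensitivity: float, precision: float) -> float:
    """Harmonic mean of sensitivity (recall) and precision."""
    total = sensitivity + precision
    if total == 0:
        return 0.0
    return 2 * sensitivity * precision / total


# Sensitivity and precision from the first and last rows of the Arabidopsis table
first_row = f_measure(0.894071914, 0.924924925)
last_row = f_measure(0.821428571, 0.772357724)
print(round(first_row, 9), round(last_row, 8))
```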
The E-value and coverage cutoffs that lead to the best F-measure in rice
| # of CAZyme families | E-value | Coverage | F-measure | Sensitivity | Precision |
|---|---|---|---|---|---|
| 97 | 1.00 | 0.2 | 0.845169681 | 0.840619308 | 0.849769585 |
| 44 | 1.00 | 0.35 | 0.906381793 | 0.908931699 | 0.903846154 |
| 35 | 1.00 | 0.1 | 0.92415331 | 0.91745283 | 0.930952381 |
| 5 | 1.00 | 0.95 | 0.913545252 | 0.905660377 | 0.921568627 |
| 2 | 1.00 | 0.7 | 0.827586207 | 0.75 | 0.923076923 |
| 9 | 1.00 | 0.45 | 0.716031632 | 0.857142857 | 0.614814815 |
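Choosing the cutoffs that maximize the F-measure amounts to a grid search over (E-value, coverage) pairs, scoring each pair against a reference annotation. The sketch below uses made-up hits and a made-up reference set to show the idea; it is not the authors' evaluation pipeline, whose details are in the Supplementary Tables.

```python
from itertools import product

# Hypothetical best hit per protein: protein -> (E-value, coverage)
hits = {
    "p1": (1e-60, 0.95),
    "p2": (1e-30, 0.60),
    "p3": (1e-25, 0.30),
    "p4": (1e-10, 0.80),   # not a true CAZyme in the toy reference
    "p5": (1e-30, 0.10),   # not a true CAZyme in the toy reference
}
# Hypothetical reference annotation; p6 has no hit and is always missed
reference = {"p1", "p2", "p3", "p6"}


def score(max_evalue: float, min_coverage: float) -> float:
    """F-measure of the proteins predicted under one pair of cutoffs."""
    predicted = {p for p, (e, c) in hits.items()
                 if e < max_evalue and c > min_coverage}
    tp = len(predicted & reference)
    fp = len(predicted - reference)
    fn = len(reference - predicted)
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    total = sensitivity + precision
    return 2 * sensitivity * precision / total if total else 0.0


evalue_grid = [1e-5, 1e-23, 1e-50]
coverage_grid = [0.2, 0.5, 0.9]
best = max(product(evalue_grid, coverage_grid), key=lambda pair: score(*pair))
best_f = score(*best)
print(best, round(best_f, 3))
```

On this toy data the loosest E-value admits a false positive and the stricter settings lose true positives, so a middle setting scores best.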
Figure 2. A schematic architecture of the PlantCAZyme database