William Helbert, Laurent Poulet, Sophie Drouillard, Sophie Mathieu, Mélanie Loiodice, Marie Couturier, Vincent Lombard, Nicolas Terrapon, Jeremy Turchetto, Renaud Vincentelli, Bernard Henrissat.
Abstract
Over the last two decades, the number of gene/protein sequences gleaned from sequencing projects of individual genomes and environmental DNA has grown exponentially. Only a tiny fraction of these predicted proteins has been experimentally characterized, and the function of most proteins remains hypothetical or only predicted based on sequence similarity. Despite the development of postgenomic methods, such as transcriptomics, proteomics, and metabolomics, the assignment of function to protein sequences remains one of the main challenges in modern biology. As in all classes of proteins, the growing number of predicted carbohydrate-active enzymes (CAZymes) has not been accompanied by a systematic and accurate attribution of function. Taking advantage of the CAZy database, which groups CAZymes into families and subfamilies based on amino acid similarities, we recombinantly produced 564 proteins selected from subfamilies without any biochemically characterized representatives, from distant relatives of characterized enzymes and from nonclassified proteins that show little similarity with known CAZymes. Screening these proteins for activity on a wide collection of carbohydrate substrates led to the discovery of 13 CAZyme families (two of which were also discovered by others during the course of our work), revealed three previously unknown substrate specificities, and assigned a function to 25 subfamilies.
Keywords: CAZymes; polysaccharides; screening
Year: 2019 PMID: 30850540 PMCID: PMC6442616 DOI: 10.1073/pnas.1815791116
Source DB: PubMed Journal: Proc Natl Acad Sci U S A ISSN: 0027-8424 Impact factor: 11.205
Fig. 1. Overexpression results. (A) Results presented according to three broad classes: (i) proteins from uncharacterized subfamilies within known families (GH/PL subfamilies), (ii) proteins classified into a CAZy family but only distantly related to characterized members (GH/PL distant), and (iii) remote homologs whose similarity is too low for inclusion in existing CAZy families (highly remote). “Soluble” refers to overexpressed proteins purified by nickel-affinity chromatography and detected using gel electrophoresis; “not observed” refers to proteins that did not bind to any affinity column (e.g., inclusion bodies, misfolded proteins). (B) Absolute frequency of overexpressed enzymes according to their molecular mass.
Assignment of function to 25 subfamilies
| CAZy subfamily | GenBank accession no. | Substrate | Organism |
| GH5_13 | ZP_02065960.1 | pNP-β- | |
| GH5_13 | WP_018627464.1 | pNP-α- | |
| GH5_18 | ACU71175.1 | pNP-β- | |
| GH5_35 | ACT02895.1 | Arabinoxylan | |
| GH5_40 | SCG47572.1 | Konjac glucomannan | |
| GH5_41 | ABD80383.1 | β-mannan | |
| GH5_43 | ADI04784.1 | pNP-β- | |
| GH5_45 | SDT09889.1 | pNP-α- | |
| GH5_45 | ACO76963.1 | pNP-β- | |
| GH13_38 | WP_029428030.1 | pNP-α- | |
| GH13_38 | ABD79820.1 | pNP-α- | |
| GH30_6 | WP_028726386.1 | pNP-β- | |
| GH43_2 | ACU61943.1 | pNP-α- | |
| GH43_2 | SDS19757.1 | pNP-α- | |
| GH43_3 | WP_007211145.1 | pNP-β- | |
| GH43_8 | EIY66405.1 | pNP-β- | |
| GH43_9 | AMX03466.1 | pNP-α- | |
| GH43_17 | ADQ05609.1 | pNP-α- | |
| GH43_18 | WP_029328006.1 | pNP-α- | |
| GH43_18 | WP_029427512.1 | pNP-α- | |
| GH43_18 | WP_018628786.1 | pNP-α- | |
| GH43_18 | AHF90946.1 | pNP-α- | |
| GH43_20 | SCF26596.1 | pNP-α- | |
| GH43_20 | CBG71495.1 | pNP-α- | |
| GH43_23 | ADO69162.1 | pNP-α- | |
| GH43_30 | SCG78792.1 | pNP-β- | |
| GH43_30 | ADD39925.1 | pNP-β- | |
| GH43_31 | AFL85801.1 | pNP-β- | |
| GH43_32 | ACB77177.1 | pNP-β- | |
| GH43_32 | SDH69004.1 | pNP-β- | |
| GH43_34 | WP_044096317.1 | pNP-α- | |
| GH43_34 | ZP_02066340.1 | pNP-β- | |
| GH43_34 | ACS99115.1 | pNP-β- | |
| GH43_37 | ADJ47124.1 | pNP-β- | |
| PL7_4 | ACU70527.1 | β-glucuronan | |
| PL14_2 | AAC96919.1 | Alginate | |
| PL15_2 | ALJ58962.1 | Heparan sulfate | |
Enzyme activities (substrate specificities) were established using colorimetric and/or chromatography assays. The substrates used as well as the organism of origin of the protein are indicated. “Weak” indicates limited cleavage.
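Because several proteins in the table above belong to the same subfamily, the abstract's figure of 25 subfamilies is a count of distinct labels, not of rows. The short Python sketch below is an illustrative cross-check, not part of the original study: it copies the 37 subfamily labels from the table and splits each one on the CAZy naming pattern (class prefix, family number, underscore, subfamily number). The helper name `parse_label` and the regular expression are hypothetical choices made for this example.

```python
import re
from collections import Counter

# Subfamily labels copied from the table above, one per characterized protein (37 rows).
ROWS = [
    "GH5_13", "GH5_13", "GH5_18", "GH5_35", "GH5_40", "GH5_41", "GH5_43",
    "GH5_45", "GH5_45", "GH13_38", "GH13_38", "GH30_6",
    "GH43_2", "GH43_2", "GH43_3", "GH43_8", "GH43_9", "GH43_17",
    "GH43_18", "GH43_18", "GH43_18", "GH43_18", "GH43_20", "GH43_20",
    "GH43_23", "GH43_30", "GH43_30", "GH43_31", "GH43_32", "GH43_32",
    "GH43_34", "GH43_34", "GH43_34", "GH43_37",
    "PL7_4", "PL14_2", "PL15_2",
]

# Hypothetical pattern: letters (class), digits (family), "_", digits (subfamily).
LABEL = re.compile(r"^(?P<cls>[A-Z]+)(?P<family>\d+)_(?P<sub>\d+)$")


def parse_label(label: str) -> tuple[str, int, int]:
    """Split a subfamily label such as 'GH43_18' into ('GH', 43, 18)."""
    m = LABEL.match(label)
    if m is None:
        raise ValueError(f"not a CAZy subfamily label: {label!r}")
    return m["cls"], int(m["family"]), int(m["sub"])


# Deduplicate: several proteins share one subfamily.
subfamilies = {parse_label(r) for r in ROWS}
# Count distinct subfamilies per enzyme class (GH or PL).
per_class = Counter(cls for cls, _, _ in subfamilies)

print(len(subfamilies), sorted(per_class.items()))
```

Run as is, the sketch reports 25 distinct subfamilies (22 GH and 3 PL), in agreement with the abstract.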
Activity of enzymes distantly related to the described GH or PL (GH/PLxx_dist) families
| Distant CAZy family | GenBank accession no. | Substrate | Organism |
| GH2_dist | WP_029427454.1 | pNP-β- | |
| GH2_dist | WP_029428707.1 | Tamarind gum (new) | |
| GH2_dist | WP_029428765.1 | pNP-β- | |
| GH2_dist | WP_018628801.1 | pNP-β- | |
| GH3_dist | AJG33435.1 | pNP-β- | |
| GH5_dist | ZP_06241352.1 | pNP-β- | |
| GH10_dist | EMS72420.1 | pNP-β- | |
| GH16_dist | ZP_02063674.1 | pNP-β- | |
| GH20_dist | AEV99795.1 | pNP-β- | |
| GH20_dist | AHF94523.1 | pNP-β- | Opitutaceae bacterium TAV5 |
| GH31_dist | EIY61740.1 | pNP-α- | |
| GH36_dist | EIY66649.1 | pNP-α- | |
| GH36_dist | ACS99969.1 | pNP-α- | |
| GH36_dist | ACS99975.1 | pNP-α- | |
| GH36_dist | ZP_06242255.1 | pNP-α- | |
| GH42_dist | EIY59668.1 | pNP-α- | |
| GH49_dist | EDY96541.1 | | |
| GH49_dist | EDY96565.1 | | |
| GH51_dist | WP_084555785.1 | Lichenan (new) | |
| GH76_dist | ADO68190.1 | pNP-α- | |
| GH106_dist | WP_018627535.1 | pNP-α- | |
| GH106_dist | ACT02314.1 | pNP-α- | |
| GH117_dist | WP_010134686.1 | pNP-β- | Flavobacteriaceae bacterium S85 |
This set encompasses enzymes that fall outside of established subfamilies or that are only distantly related to biochemically characterized enzymes. “New” designates novel specificity in the family. CWP, cell wall polysaccharide.
Substrate specificity of new CAZy families
| New family | GenBank accession no. | Substrate | Activity | Organism |
| GH147 | WP_029428318.1 | β-galactan | Endo-β-(1,4)-galactanase | |
| GH147 | EFI37897.1 | β-galactan | Endo-β-(1,4)-galactanase | |
| GH148 | AGN79260.1 | Konjac glucomannan | Endo-β-(1,4)-glucosidase | |
| GH148 | ACR13278.1 | Konjac glucomannan | Endo-β-(1,4)-glucosidase | |
| GH157 | WP_029429093.1 | CM-curdlan | Endo-β-glycosidase | |
| GH158 | ZP_06243608.1 | CM-curdlan | Endo-β-glycosidase | |
| GH159 | WP_007210837.1 | pNP-β- | β- | |
| GH160 | AEI51087.1 | EPS | Endo-β-(1,4)-galactosidase | |
| PL30 | WP_029426181.1 | Hyaluronan | Endo-hyaluronan lyase | |
| PL31 | ABD82242.1 | β-glucuronan | Endo-β-(1,4)-glucuronan lyase | |
| PL31 | AGF62897.1 | β-glucuronan | Endo-β-(1,4)-glucuronan lyase | |
| PL32 | EIY62149.1 | β-mannuronan | Endo-mannuronan lyase | |
| PL33 | ALJ61728.1 | Hyaluronan | Endo-hyaluronan lyase | |
| PL33 | AHF90976.1 | Gellan (new) | Endo-gellan lyase | Opitutaceae bacterium TAV5 |
| PL33 | AHF90672.1 | Chondroitin sulfate | Endo-chondroitin sulfate lyase | Opitutaceae bacterium TAV5 |
| PL33 | AHF90411.1 | Gellan (new) | Endo-gellan lyase | Opitutaceae bacterium TAV5 |
| PL34 | AHF91913.1 | Alginate | Endo-alginate lyase | Opitutaceae bacterium TAV5 |
| PL35 | ZP_06241351.1 | Chondroitin | Endo-chondroitin lyase | |
| PL36 | WP_084332190.1 | β-mannuronan | Endo-mannuronan lyase | |
The substrate and the modality of substrate degradation are specified. “New” designates novel specificity not reported previously. Note that families GH147 and GH148 were reported by other groups during the course of our work (30, 31). CM, carboxymethyl.
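The table of new families above can be reconciled with the abstract's statement that 13 families were discovered, two of which were also reported by others. The Python sketch below is an illustrative tally, not the authors' method. It copies the (family, accession) pairs from the table, removes duplicate families, and subtracts GH147 and GH148, the two families the note identifies as reported elsewhere. The variable names are hypothetical.

```python
from collections import Counter

# (new family, GenBank accession) pairs copied from the table above (19 rows).
NEW_FAMILY_ROWS = [
    ("GH147", "WP_029428318.1"), ("GH147", "EFI37897.1"),
    ("GH148", "AGN79260.1"), ("GH148", "ACR13278.1"),
    ("GH157", "WP_029429093.1"), ("GH158", "ZP_06243608.1"),
    ("GH159", "WP_007210837.1"), ("GH160", "AEI51087.1"),
    ("PL30", "WP_029426181.1"),
    ("PL31", "ABD82242.1"), ("PL31", "AGF62897.1"),
    ("PL32", "EIY62149.1"),
    ("PL33", "ALJ61728.1"), ("PL33", "AHF90976.1"),
    ("PL33", "AHF90672.1"), ("PL33", "AHF90411.1"),
    ("PL34", "AHF91913.1"), ("PL35", "ZP_06241351.1"),
    ("PL36", "WP_084332190.1"),
]
# Families the table note says were also reported by other groups (refs. 30, 31).
REPORTED_ELSEWHERE = {"GH147", "GH148"}

# Distinct families, regardless of how many proteins were tested in each.
families = {fam for fam, _ in NEW_FAMILY_ROWS}
# Both class prefixes here ("GH", "PL") are two letters long.
by_class = Counter(fam[:2] for fam in families)
first_reported_here = families - REPORTED_ELSEWHERE

print(len(families), sorted(by_class.items()), len(first_reported_here))
```

The tally gives 13 families (6 GH and 7 PL), 11 of which were not reported by other groups, which matches the abstract.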