Geizecler Tomazetto1,2, Agnes C Pimentel3, Daniel Wibberg4, Neil Dixon2, Fabio M Squina5.
Abstract
Lignocellulose is one of the most abundant renewable carbon sources and an alternative to petroleum for the production of fuels and chemicals. Nonetheless, saccharification, the process that releases sugars from lignocellulose for downstream applications, remains one of the main economic barriers to its use. The synergism required among the various carbohydrate-active enzymes (CAZymes) for efficient lignocellulose breakdown is often not achieved with an enzyme mixture from a single strain. To overcome this challenge, enrichment strategies can be applied to develop microbial communities with an efficient CAZyme arsenal whose complementary and synergistic properties improve lignocellulose deconstruction. We report a comprehensive, in-depth analysis of an enriched rumen anaerobic consortium (ERAC) established on sugarcane bagasse (SB). The lignocellulolytic abilities of the ERAC were confirmed by analyzing bagasse depolymerization with scanning electron microscopy, enzymatic assays, and mass spectrometry. Taxonomic analysis based on 16S rRNA gene sequencing traced the community enrichment process, which was marked by a higher abundance of Firmicutes and Synergistetes species. Shotgun metagenomic sequencing of the ERAC yielded 41 metagenome-assembled genomes (MAGs) harboring cellulosomal genes and polysaccharide utilization loci (PULs), along with a high diversity of CAZymes. The amino acid sequences of most predicted CAZymes (60% of the total) shared less than 90% identity with sequences in public databases.
Additionally, a clostridial MAG identified in this study produced, during consortium development, proteins with scaffoldin domains and CAZymes appended to dockerin modules, thus representing a novel cellulosome-producing microorganism.

IMPORTANCE The lignocellulolytic ERAC displays a unique set of plant polysaccharide-degrading enzymes (with multimodular characteristics), cellulosomal complexes, and PULs. The MAGs described here expand the known genetic repertoire of rumen bacterial genomes dedicated to plant polysaccharide degradation and therefore provide a valuable resource for developing biocatalytic toolbox strategies for lignocellulose-based biorefineries.
Keywords: anaerobic consortium; lignocellulose degradation; metagenome; metasecretome; polysaccharide utilization loci; rumen
Year: 2020 PMID: 32680862 PMCID: PMC7480376 DOI: 10.1128/AEM.00199-20
Source DB: PubMed Journal: Appl Environ Microbiol ISSN: 0099-2240 Impact factor: 4.792
FIG 1. Biochemical assays using the enriched rumen anaerobic consortium (ERAC) metaproteome against nine different substrates. Reducing sugars were released from reactions of the ERAC metaproteome against xylan, lichenan, β-glucan, rye arabinoxylan, xyloglucan, rhamnogalacturonan, pectin, mannan, and carboxymethyl cellulose sodium salt (CMC).
FIG 2. Scanning electron microscopy images of the sugarcane bagasse prior to incubation (A and B) and after 7 days of incubation (C and D) with the enriched rumen anaerobic consortium (ERAC).
FIG 3. Relative abundance (%) of phylum-level (A) and family-level (B) taxa identified in the cow rumen sample and in the enriched rumen anaerobic consortium (ERAC). Abundances were determined from the 16S rRNA gene amplicon sequences. Phyla each representing less than 1% and families each representing less than 3% of the total reads were combined into the groups “Phyla < 1%” and “Families < 3%,” respectively.
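The pooling rule in the FIG 3 legend (phyla below 1% and families below 3% of total reads merged into one group) can be sketched as follows. This is a minimal illustration rather than the authors' pipeline: the read counts are hypothetical, and the function name `collapse_low_abundance` and the assumption that the threshold is applied within a single sample are ours.

```python
def collapse_low_abundance(
    counts: dict[str, int], threshold_pct: float, other_label: str
) -> dict[str, float]:
    """Convert read counts to percentages and pool taxa below threshold_pct.

    Taxa whose share of total reads is below threshold_pct are summed into a
    single group named other_label (e.g. "Phyla < 1%").
    """
    total = sum(counts.values())
    if total <= 0:
        raise ValueError("counts must contain at least one read")
    result: dict[str, float] = {}
    pooled = 0.0
    for taxon, n in counts.items():
        pct = 100.0 * n / total
        if pct < threshold_pct:
            pooled += pct  # too rare: add to the pooled group
        else:
            result[taxon] = pct
    if pooled > 0:
        result[other_label] = pooled
    return result


# Hypothetical phylum-level read counts for one sample (not data from the study).
example_counts = {
    "Firmicutes": 600,
    "Bacteroidetes": 250,
    "Synergistetes": 140,
    "Spirochaetes": 6,
    "Tenericutes": 4,
}

if __name__ == "__main__":
    pooled = collapse_low_abundance(example_counts, 1.0, "Phyla < 1%")
    for taxon, pct in pooled.items():
        print(f"{taxon}: {pct:.1f}%")
```

The same function would be called with `threshold_pct=3.0` and `other_label="Families < 3%"` for family-level counts.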
The most common CAZyme modules predicted in the total ERAC metagenome and their relative abundance in ERACgs, according to their representation in the CAZy database
| Family | No. of CAZyme modules (total metagenome) | No. of CAZyme modules (ERACgs) |
|---|---|---|
| Most common GH families | | |
| GH13 | 181 | 147 |
| GH3 | 126 | 100 |
| GH2 | 117 | 101 |
| GH43 | 117 | 84 |
| GH23 | 84 | 56 |
| GH5 | 69 | 47 |
| GH25 | 67 | 47 |
| GH77 | 56 | 41 |
| GH31 | 52 | 39 |
| Most common CBM families | | |
| CBM50 | 187 | 139 |
| CBM32 | 98 | 80 |
| CBM48 | 66 | 53 |
| CBM6 | 33 | 15 |
| CBM67 | 29 | 27 |
| Most common CE families | | |
| CE1 | 223 | 139 |
| CE10 | 173 | 122 |
| CE4 | 148 | 110 |
| CE3 | 92 | 60 |
| CE9 | 43 | 28 |
| Most common PL families | | |
| PL12 | 18 | 13 |
| PL22 | 14 | 14 |
| PL1 | 11 | 7 |
| Most common AA families | | |
| AA6 | 136 | 83 |
| AA3 | 17 | 10 |
Abbreviations: ERAC, enriched rumen anaerobic consortium; ERACgs, enriched rumen anaerobic consortium genomes; CAZyme, carbohydrate-active enzyme; CAZy, Carbohydrate-Active enZYmes database; GH, glycoside hydrolase; CBM, carbohydrate-binding module; CE, carbohydrate esterase; PL, polysaccharide lyase; AA, auxiliary activity.
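The table gives two counts per family, so the share of each family's modules recovered in binned genomes (ERACgs) is the ratio of the two columns. The sketch below computes that ratio for a few rows copied from the table; reading this ratio as the "relative abundance in ERACgs" mentioned in the table title is our interpretation, and the helper name `binned_fraction` is hypothetical.

```python
# Counts copied from the table above: family -> (total metagenome, ERACgs).
# Only a subset of families is listed here.
MODULE_COUNTS = {
    "GH13": (181, 147),
    "GH43": (117, 84),
    "CBM50": (187, 139),
    "CBM6": (33, 15),
    "CE1": (223, 139),
    "PL22": (14, 14),
    "AA6": (136, 83),
}


def binned_fraction(total: int, in_bins: int) -> float:
    """Fraction of a family's modules that lie on binned contigs (ERACgs)."""
    if total <= 0:
        raise ValueError("total module count must be positive")
    if not 0 <= in_bins <= total:
        raise ValueError("binned count must lie between 0 and the total")
    return in_bins / total


if __name__ == "__main__":
    for family, (total, in_bins) in MODULE_COUNTS.items():
        frac = binned_fraction(total, in_bins)
        print(f"{family}: {in_bins}/{total} = {frac:.1%} in ERACgs")
```

A low ratio (for example, CBM6) flags families whose modules sit mostly on nonbinned contigs.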
FIG 4. Distribution of the percent identity of the carbohydrate-active enzyme (CAZyme) sequences predicted in the enriched rumen anaerobic consortium (ERAC) against the four classes of the CAZy database. Only the maximum percent identity for each ERAC CAZyme was considered. GH, glycoside hydrolase; PL, polysaccharide lyase; CE, carbohydrate esterase; AA, auxiliary activity.
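FIG 4 keeps only the best (maximum) percent identity for each ERAC CAZyme, and the abstract summarizes the same quantity as the share of CAZymes below 90% identity. Below is a minimal sketch of that reduction. It assumes hits arrive as (query, percent identity) pairs, such as those produced by a tabular BLAST- or DIAMOND-style search. The search tool, its settings, the sequence IDs, and the 10%-wide bins are all assumptions made for illustration, not details taken from the paper.

```python
from collections import Counter
from typing import Iterable


def max_identity_per_query(hits: Iterable[tuple[str, float]]) -> dict[str, float]:
    """Keep only the highest percent identity reported for each query sequence."""
    best: dict[str, float] = {}
    for query, identity in hits:
        if not 0.0 <= identity <= 100.0:
            raise ValueError(f"percent identity out of range for {query}: {identity}")
        if identity > best.get(query, -1.0):
            best[query] = identity
    return best


def fraction_below(best: dict[str, float], threshold: float = 90.0) -> float:
    """Fraction of queries whose best identity is strictly below threshold."""
    if not best:
        raise ValueError("no queries to summarize")
    return sum(1 for v in best.values() if v < threshold) / len(best)


def identity_histogram(best: dict[str, float], width: float = 10.0) -> Counter:
    """Count queries per bin k covering [k*width, (k+1)*width); 100% goes in the top bin."""
    top_bin = int(100.0 // width) - 1
    return Counter(min(int(v // width), top_bin) for v in best.values())


# Hypothetical (query, percent identity) hits; several hits per query are allowed.
EXAMPLE_HITS = [
    ("cazA", 45.0), ("cazA", 72.5),
    ("cazB", 98.1),
    ("cazC", 88.0), ("cazC", 60.2),
    ("cazD", 91.4),
]

if __name__ == "__main__":
    best = max_identity_per_query(EXAMPLE_HITS)
    print(f"{fraction_below(best):.0%} of queries below 90% identity")
    print(sorted(identity_histogram(best).items()))
```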
Genomic features of ERACgs from ERAC metagenome shotgun sequencing
| ERACg identifier | Phyla-AMPHORA class | Phyla-AMPHORA predicted taxon | Completeness (%) | Contamination (%) | Genome size (Mb) | Predicted no. of genes | GC content (%) | No. of CAZymes |
|---|---|---|---|---|---|---|---|---|
| ERACg_2 | | | 96.6 | 0 | 2.65 | 2,421 | 38.2 | 106 |
| ERACg_3 | | | 95.3 | 0 | 2.76 | 2,608 | 49.4 | 63 |
| ERACg_5 | | | 89.2 | 0 | 2.29 | 1,970 | 51.2 | 50 |
| ERACg_9 | | | 92.6 | 0 | 3.53 | 3,317 | 57.5 | 186 |
| ERACg_11 | | | 95.9 | 0 | 2.99 | 2,705 | 43.6 | 156 |
| ERACg_12 | | | 91.9 | 0 | 2.36 | 2,169 | 52.5 | 52 |
| ERACg_13 | | | 78.4 | 0 | 2.42 | 2,356 | 62.9 | 81 |
| ERACg_15 | | | 83.8 | 0 | 2.30 | 2,277 | 55 | 50 |
| ERACg_16 | | | 92.6 | 0 | 1.97 | 1,904 | 59.5 | 42 |
| ERACg_21 | | | 84.5 | 0 | 2.78 | 2,576 | 56.4 | 99 |
| ERACg_23 | | | 81.7 | 2.5 | 3.56 | 3,485 | 31 | 101 |
| ERACg_25 | | | 95.3 | 0 | 2.57 | 2,387 | 39.1 | 49 |
| ERACg_26 | | | 93.2 | 0 | 2.35 | 2,275 | 60.2 | 61 |
| ERACg_32 | | | 93.9 | 0 | 3.09 | 2,790 | 45.2 | 168 |
| ERACg_42 | | | 95.3 | 0 | 2.79 | 2,496 | 48.3 | 180 |
| ERACg_45 | | | 91.2 | 0 | 2.31 | 2,292 | 30.5 | 65 |
| ERACg_48 | | | 88.5 | 0 | 1.75 | 1,665 | 47.6 | 55 |
| ERACg_50 | | | 91.9 | 0 | 2.59 | 2,624 | 28.3 | 50 |
| ERACg_57 | | | 93.9 | 0 | 4.48 | 3,824 | 42.4 | 135 |
| ERACg_58 | | | 91.2 | 0 | 1.57 | 1,409 | 56 | 12 |
| ERACg_41 | | | 97.3 | 0 | 2.05 | 1,823 | 51.2 | 56 |
| ERACg_8 | | | 96.6 | 0 | 3.11 | 2,772 | 53.9 | 50 |
| ERACg_1 | | | 87.8 | 0 | 1.39 | 1,245 | 32.2 | 43 |
| ERACg_19 | | | 62.8 | 0 | 1.81 | 1,468 | 52.7 | 115 |
| ERACg_30 | | | 83.1 | 0 | 2.25 | 1,861 | 49.1 | 164 |
| ERACg_35 | | | 91.9 | 0 | 2.23 | 1,924 | 50 | 113 |
| ERACg_37 | | | 84.5 | 0 | 3.14 | 2,574 | 46 | 136 |
| ERACg_43 | | | 92.6 | 0 | 3.95 | 3,136 | 46.6 | 336 |
| ERACg_55 | | | 79.1 | 0 | 2.54 | 2,081 | 56 | 128 |
| ERACg_56 | | | 68.2 | 0 | 2.05 | 1,650 | 53.6 | 140 |
| ERACg_14 | | | 85.8 | 0 | 2.63 | 2,307 | 54.9 | 78 |
| ERACg_31 | | | 81.8 | 0.7 | 3.13 | 2,774 | 36.5 | 112 |
| ERACg_36 | | | 81.8 | 1.7 | 2.59 | 2,431 | 50 | 82 |
| ERACg_52 | | | 80.4 | 0 | 2.76 | 2,364 | 38.3 | 81 |
| ERACg_4 | | | 65.5 | 0 | 4.51 | 3,901 | 43.3 | 358 |
| ERACg_38 | | | 89.9 | 0 | 4.07 | 3,935 | 44.5 | 72 |
| ERACg_49 | | | 91.2 | 0 | 2.24 | 2,154 | 41.5 | 47 |
| ERACg_18 | | | 76.4 | 1.22 | 2.21 | 2,175 | 57.4 | 41 |
| ERACg_34 | | | 85.8 | 0 | 2.69 | 2,260 | 64.7 | 61 |
| ERACg_54 | | | 89.2 | 0 | 3.35 | 3,152 | 66.5 | 99 |
| ERACg_46 | | | 85.8 | 0 | 2.49 | 2,428 | 60.8 | 43 |
Abbreviations: ERACg, enriched rumen anaerobic consortium genomes; ERAC, enriched rumen anaerobic consortium.
No. of CAZymes, total number of carbohydrate-active enzymes (CAZymes) predicted in each ERACg.
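Completeness and contamination are the usual first filter for judging MAGs. The sketch below tiers a few ERACgs from those two columns. It assumes the draft-genome cut-offs of the Minimum Information about a Metagenome-Assembled Genome (MIMAG) standard: more than 90% completeness with less than 5% contamination counts as high, and at least 50% completeness with less than 10% contamination counts as medium. The record does not state which thresholds the authors applied. MIMAG's high-quality tier also requires rRNA and tRNA genes, which this sketch does not check. The class and function names are hypothetical.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class MagQuality:
    identifier: str
    completeness: float  # percent, as reported in the table
    contamination: float  # percent, as reported in the table


def quality_tier(mag: MagQuality) -> str:
    """Tier a MAG from completeness/contamination only (assumed MIMAG-style cut-offs).

    rRNA/tRNA gene requirements of the MIMAG high-quality tier are NOT checked.
    """
    if mag.completeness > 90.0 and mag.contamination < 5.0:
        return "high"
    if mag.completeness >= 50.0 and mag.contamination < 10.0:
        return "medium"
    return "low"


# A few rows copied from the table above.
ROWS = [
    MagQuality("ERACg_2", 96.6, 0.0),
    MagQuality("ERACg_9", 92.6, 0.0),
    MagQuality("ERACg_41", 97.3, 0.0),
    MagQuality("ERACg_23", 81.7, 2.5),
    MagQuality("ERACg_18", 76.4, 1.22),
    MagQuality("ERACg_19", 62.8, 0.0),
]

if __name__ == "__main__":
    for mag in ROWS:
        print(mag.identifier, quality_tier(mag))
    print(Counter(quality_tier(m) for m in ROWS))
```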
FIG 5. (A) Distribution of the predicted carbohydrate-active enzymes (CAZymes) found in enriched rumen anaerobic consortium genomes (ERACgs) at the class level. (B) Total CAZymes found in the enriched rumen anaerobic consortium (ERAC) metagenome data. Red, nonbinned metagenome contigs; green, ERACgs.
FIG 6. Heat map displaying the distribution of the most abundant glycoside hydrolase (GH) families found in the ERACgs from the ERAC. GH families were grouped according to their action on components of the plant cell wall.
FIG 7. Examples of polysaccharide utilization loci (PULs) predicted in Bacteroidia ERACgs reconstructed from the enriched rumen anaerobic consortium metagenome. To facilitate visualization of the gene arrangements, genes are colored according to the predicted function of the encoded proteins: SusC, SusD, glycoside hydrolase (GH), polysaccharide lyase (PL), carbohydrate esterase (CE), peptidase, and regulators (AraC, MarR, LacI). Genes that do not encode PUL components or that encode hypothetical proteins are labeled as non-PUL genes. All PULs predicted in Bacteroidia ERACgs are presented in Data Set S2 in the supplemental material.
Putative cellulosomal proteins and SusC/SusD families identified by LC-MS/MS from ERAC grown on sugarcane bagasse
| ERACg identifier | Predicted protein | Modular architecture | Signal peptide | Total spectral count |
|---|---|---|---|---|
| | Cellulosomal protein | CBM6-CBM6-CBM6-CBM6-CBM2 | Yes | 14 |
| | Putative scaffoldin | 6× cohesin_I-CttA | Yes | 10 |
| | Putative scaffoldin | Cohesin | Yes | 23 |
| | Putative scaffoldin | Cohesin_III | Yes | 5 |
| | Putative scaffoldin | Cohesin_I-dockerin_I | Yes | 19 |
| | Putative scaffoldin | Dockerin_I-Cthe_2159-Cthe_2159 | Yes | 14 |
| | Putative scaffoldin | Dockerin_III-cohesin_III-Dockerin_I | Yes | 5 |
| | Putative scaffoldin C | No domain | Yes | 2 |
| | Putative scaffoldin | No domain | Yes | 49 |
| | Cellulosomal protein | Dockerin_I | Yes | 4 |
| | Cellulosomal protein | LRR_5-dockerin_I | Yes | 19 |
| | Cellulosomal protein | LRR_5-dockerin_I | Yes | 1 |
| | Cellulosomal protein | LRR_5-dockerin_I | Yes | 6 |
| | Cellulosomal protein | LRR_5-dockerin_I | Yes | 2 |
| | Cellulosomal protein | DUF4874-DUF4832-dockerin_I | Yes | 1 |
| | Peptidase | Dockerin_I-peptidase | Yes | 1 |
| | Cellulosomal protein | Dockerin_I | Yes | 6 |
| | SusD family protein | | No | 1 |
| | Starch binding associated with outer membrane | | No | 1 |
| | TonB-linked outer membrane protein, SusC/RagA family | | No | 1 |
| | TonB-linked outer membrane protein, SusC/RagA family | | No | 1 |
| | TonB-linked outer membrane protein, SusC/RagA family | | No | 6 |
| | TonB-linked outer membrane protein, SusC/RagA family | | No | 1 |
| | SusD family protein | | No | 1 |
| | TonB-linked outer membrane protein, SusC/RagA family | | No | 6 |
Abbreviations: cohesin_number, cohesin type number; dockerin_number, dockerin type number; Cthe_2159 represents a novel family of cellulose-binding beta-helix proteins from Clostridium thermocellum; LRR_5, leucine-rich repeats; PUL, polysaccharide utilization loci. Cohesin and dockerin domains are represented with the family number according to their representation in the dbCAN database. The protein set secreted by enriched rumen anaerobic consortium (ERAC) is given in Data Set S3 in the supplemental material.
Prediction of signal peptides based on SignalP analysis.
Metaproteome analysis based on spectral counting.
CAZy families identified by LC-MS/MS from ERAC grown on sugarcane bagasse
| ERACg identifier | Predicted protein | Modular architecture | EC no. | Secretion signal | Total spectral count |
|---|---|---|---|---|---|
| | Endoglucanase | CBM79-CBM79-GH5_4 | 3.2.1.4 | Yes | 8 |
| | Endoglucanase | GH5_1-dockerin_I | 3.2.1.4 | Yes | 7 |
| | Endoglucanase | GH5_1-dockerin_I | 3.2.1.4 | Yes | 17 |
| | Cellulase | GH9-CBM3-dockerin_I | 3.2.1.4 | Yes | 14 |
| | Cellulase; acetylxylan esterase | GH5_4-CBM22-CE3-dockerin_I | 3.2.1.4, 3.1.1.72 | Yes | 7 |
| | Cellulase | CBM4-CBM30-GH9-dockerin_I | 3.2.1.4 | Yes | 2 |
| | Cellulase | GH9-CBM3-dockerin_I | 3.2.1.4 | Yes | 8 |
| | Endoglucanase | GH9-CBM79-dockerin_I | 3.2.1.4 | Yes | 39 |
| | Cellulase | GH5_4-CBM80-dockerin_I | 3.2.1.4 | Yes | 69 |
| | Cellulase | GH9-CBM3-dockerin_I | 3.2.1.4 | Yes | 8 |
| | Cellulase | GH9-CBM3-dockerin_I | 3.2.1.4 | Yes | 32 |
| | Glycoside hydrolase family 44 | GH44-CBM76-dockerin_I | Not determined | Yes | 2 |
| | Cellulase; acetylxylan esterase | GH5_4-CBM22-CE3-dockerin_I | 3.2.1.4, 3.1.1.72 | Yes | 7 |
| | Mannan endo-1,4-β-mannosidase | CBM35-GH26-dockerin_I | 3.2.1.78 | Yes | 1 |
| | Xyloglucan-specific endo-β-1,4-glucanase | GH5_4-CBM22-dockerin_I | 3.2.1.151 | Yes | 14 |
| | Glucuronoarabinoxylan endo-1,4-β-xylanase; feruloyl esterase | GH5_4-CBM22-dockerin_I | 3.2.1.136, 3.1.1.73 | Yes | 11 |
| | Endo-1,4-β-xylanase; feruloyl esterase | GH10-CBM22-CE1 | 3.2.1.8, 3.1.1.73 | Yes | 11 |
| | Endo-1,4-β-xylanase; nonreducing end α-L-arabinofuranosidase | CBM22-GH10-CBM22-dockerin_I-GH43-CBM36 | 3.2.1.8, 3.2.1.55 | Yes | 37 |
| | Endo-1,4-β-xylanase | CBM22-GH10-dockerin_I | 3.2.1.8 | Yes | 3 |
| | Endo-1,4-β-xylanase; feruloyl esterase | GH43_10-CBM22-dockerin_I-CE1 | 3.2.1.37, 3.1.1.73 | Yes | 15 |
| | Xylan-1,4-β-xylosidase | GH43_29-CBM6-CBM22-dockerin_I | 3.2.1.37 | Yes | 2 |
| | Oligoxyloglucan reducing-end-specific cellobiohydrolase | GH74-dockerin_I | 3.2.1.150 | Yes | 12 |
| | Endo-β-1,4-xylanase; chitin deacetylase | GH11-CBM22-dockerin_I-CBM22-CE4 | 3.2.1.8, 3.5.1.41 | Yes | 1 |
| | Arabinan endo-1,5-α-L-arabinanase | GH43-CBM13-dockerin_I | 3.2.1.99 | Yes | 12 |
| | Mannan endo-1,4-β-mannosidase | CBM35-GH26-dockerin_I | 3.2.1.78 | Yes | 1 |
| | Acetylxylan esterase | Dockerin_I-CE2-CBM4 | 3.1.1.72 | Yes | 1 |
| | Putative glycoside hydrolase family 141 | GH141-CBM6-dockerin_I | Not determined | Yes | 1 |
| | β-Galactosidase | GH2-dockerin_I | 3.2.1.23 | Yes | 8 |
| | Carbohydrate esterase family 12 | CE12-CBM13-dockerin_I-CBM35-CE12 | 3.1.1.86 | Yes | 2 |
| | Rhamnogalacturonan endolyase | PL11-dockerin_I | 4.2.2.23 | Yes | 13 |
| | Pectate lyase | PL1-PL9-dockerin_I | 4.2.2.2, 4.2.2.9 | Yes | 3 |
| | Glycoside hydrolase family 18 | GH18 | Not determined | Yes | 1 |
Abbreviations: ERAC, enriched rumen anaerobic consortium; ERACg, enriched rumen anaerobic consortium genome; EC, Enzyme Commission; cohesin_number, cohesin type number; dockerin_number, dockerin type number; GH, glycoside hydrolase; CBM, carbohydrate-binding module; CE, carbohydrate esterases; PL, polysaccharide lyases; ND, not determined. CAZymes are represented with the family number according to their representation in the CAZy database.
Prediction of signal peptides based on SignalP analysis.
Metaproteome analysis based on spectral counting.
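Both LC-MS/MS tables encode each protein's modular architecture as hyphen-joined module names. The abstract's cellulosome argument rests on CAZymes appended to dockerin modules. Below is a minimal sketch of parsing those strings to flag dockerin-bearing proteins and sum their raw spectral counts. It assumes the hyphen convention seen in the tables, which is not a documented dbCAN output format, and it does not handle the multiplier notation (as in "6× cohesin_I-CttA"). Counts are used unnormalized. The function names are hypothetical, and only a few rows are copied from the CAZy table as an example.

```python
CATALYTIC_CLASSES = {"GH", "GT", "PL", "CE", "AA"}


def modules(architecture: str) -> list[str]:
    """Split a hyphen-joined architecture string (e.g. "GH9-CBM3-dockerin_I")."""
    return [m.strip() for m in architecture.split("-") if m.strip()]


def is_dockerin_bearing(architecture: str) -> bool:
    """True if any module is a dockerin (case-insensitive, any dockerin type)."""
    return any(m.lower().startswith("dockerin") for m in modules(architecture))


def catalytic_modules(architecture: str) -> list[str]:
    """Return catalytic CAZy modules (GH, GT, PL, CE, AA), keeping subfamily suffixes."""
    found = []
    for m in modules(architecture):
        family = m.split("_")[0]  # "GH5_4" -> "GH5"
        prefix = family.rstrip("0123456789")  # "GH5" -> "GH"
        if prefix in CATALYTIC_CLASSES:
            found.append(m)
    return found


def spectral_totals(rows: list[tuple[str, int]]) -> tuple[int, int]:
    """Sum raw spectral counts for dockerin-bearing vs. other proteins."""
    with_dockerin = sum(n for arch, n in rows if is_dockerin_bearing(arch))
    without = sum(n for arch, n in rows if not is_dockerin_bearing(arch))
    return with_dockerin, without


# (modular architecture, total spectral count) pairs copied from the CAZy table above.
EXAMPLE_ROWS = [
    ("GH5_1-dockerin_I", 7),
    ("GH9-CBM3-dockerin_I", 14),
    ("GH5_4-CBM80-dockerin_I", 69),
    ("GH10-CBM22-CE1", 11),
    ("CBM79-CBM79-GH5_4", 8),
    ("GH18", 1),
]

if __name__ == "__main__":
    for arch, n in EXAMPLE_ROWS:
        print(arch, catalytic_modules(arch), is_dockerin_bearing(arch), n)
    print(spectral_totals(EXAMPLE_ROWS))
```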