Literature DB >> 29025413

Carbohydrate-active enzymes in Trichoderma harzianum: a bioinformatic analysis bioprospecting for key enzymes for the biofuels industry.

Jaire Alves Ferreira Filho1,2, Maria Augusta Crivelente Horta1, Lilian Luzia Beloti1, Clelton Aparecido Dos Santos1, Anete Pereira de Souza3,4.   

Abstract

BACKGROUND: Trichoderma harzianum is used in biotechnology applications due to its ability to produce powerful enzymes for the conversion of lignocellulosic substrates into soluble sugars. Active enzymes involved in carbohydrate metabolism are defined as carbohydrate-active enzymes (CAZymes), and the most abundant family in the CAZy database is the glycoside hydrolases. The enzymes of this family play a fundamental role in the decomposition of plant biomass.
RESULTS: In this study, the CAZymes of T. harzianum were identified and classified using bioinformatic approaches after which the expression profiles of all annotated CAZymes were assessed via RNA-Seq, and a phylogenetic analysis was performed. A total of 430 CAZymes (3.7% of the total proteins for this organism) were annotated in T. harzianum, including 259 glycoside hydrolases (GHs), 101 glycosyl transferases (GTs), 6 polysaccharide lyases (PLs), 22 carbohydrate esterases (CEs), 42 auxiliary activities (AAs) and 46 carbohydrate-binding modules (CBMs). Among the identified T. harzianum CAZymes, 47% were predicted to harbor a signal peptide sequence and were therefore classified as secreted proteins. The GH families were the CAZyme class with the greatest number of expressed genes, including GH18 (23 genes), GH3 (17 genes), GH16 (16 genes), GH2 (13 genes) and GH5 (12 genes). A phylogenetic analysis of the proteins in the AA9/GH61, CE5 and GH55 families showed high functional variation among the proteins.
CONCLUSIONS: Identifying the main proteins used by T. harzianum for biomass degradation can ensure new advances in the biofuel production field. Herein, we annotated and characterized the expression levels of all of the CAZymes from T. harzianum, which may contribute to future studies focusing on the functional and structural characterization of the identified proteins.

Entities:  

Keywords:  CAZymes; Cellulases; Glycoside hydrolases; Phylogeny; RNA-Seq; Trichoderma harzianum

Mesh:

Substances:

Year:  2017        PMID: 29025413      PMCID: PMC5639747          DOI: 10.1186/s12864-017-4181-9

Source DB:  PubMed          Journal:  BMC Genomics        ISSN: 1471-2164            Impact factor:   3.969


Background

Brazil harbors a highly diverse fungal community and many of these species are of great biotechnological relevance [1]. Fungal species produce important enzyme classes that play a key role in the decomposition of organic materials, particularly those derived from plants. These enzymes can be applied in various industrial areas [2], such as the production of biofuels from vegetal biomass. Cellulose is a major component of the cell wall of plants. This structural polysaccharide is a complex polymer, and only a specific set of enzymes can breakdown cellulose [3]. Many of these enzymes are produced by fungi, such as those of the genera Aspergillus, Neurospora and Trichoderma. Trichoderma harzianum is a filamentous fungus [4] and a recognized biocontrol agent that is effective against phytopathogenic fungi [5]. T. reesei is the best-studied cellulolytic fungus for which there are available resolved crystallographic structures among enzymes of the cellulolytic complex [6, 7]. However, T. harzianum also produces enzymes capable of hydrolyzing and metabolizing the cellulose present in plant biomass [8, 9]. Studies on T. harzianum strains show that they are able to produce a cellulolytic complex with higher beta-glucosidase activity than that shown by T. reesei [8, 10]. Furthermore, significant endoglucanase, xylanase and cellobiohydrolase activities have also been detected [11]. Although the enzymes produced by T. harzianum harbor great biotechnological potential, most studies in this species have been directed toward the area of biological control [12, 13]. Thus, there is a shortage of studies exploring genomic data related to its capacity to degrade biomass and the regulation and expression of the genes involved in biodegradation. Enzymes active in carbohydrate metabolism are defined in the international literature as carbohydrate-active enzymes (CAZymes) [14]. The families of enzymes that compose this group can be found in the CAZy database (www.cazy.org). The protein families in the CAZy database are grouped into five different classes, as follows: glycoside hydrolases (GHs); glycosyl transferases (GTs); polysaccharide lyases (LPs); carbohydrate esterases (CEs); and auxiliary activities (AAs). The most abundant family in the CAZy database is the GHs; the enzymes in this family play a fundamental role in the decomposition of plant biomass and have therefore been the target of several studies on enzymatic hydrolysis [15, 16]. Recent studies have revealed the importance of the AA family as aids in the process of cellulose degradation [17]. In this work, we used the available structural genomic data from T. harzianum T6776 [18] to perform a complete functional annotation of the CAZyme content. In addition, we employed the data generated from a previous RNA-Seq study [19] to analyze the expression of this set of genes. To investigate the functional diversity of these proteins, phylogenetic analyses of the AA9/GH61, CE5 and GH55 families were performed. Based on our results, we delineated specific CAZyme clusters that act on biomass substrates for depolymerization. These data should contribute to the search for more efficient enzymatic systems for the biomass degradation process and for the functional and heterologous expression of important proteins with hydrolytic activity.

Methods

Data sources

The nucleotide and protein sequences of T. harzianum T6776 - Th (PRJNA252551), T. reesei RUT C-30 - Tr (PRJNA207855), T. atroviride IMI 206040 - Ta (PRJNA19867) and T. virens Gv29–8 - Tv (PRJNA19983) were downloaded from the NCBI database (www.ncbi.nlm.nih.gov). The T. harzianum IOC-3844 RNA-Seq reads are available in the NCBI Sequence Read Archive (SRA) under accession number PRJNA175485.

Functional annotation of CAZymes

Information derived from the CAZy database [14] was downloaded for each CAZyme family (www.cazy.org). The protein sequences of T. harzianum, T. reesei, T. atroviride and T. virens were used as queries in BLASTp (Basic local alignment search tool) searches against the locally built CAZy BLAST database. Only blast matches showing an e-value less than 10−11, identity greater than 30% and queries covering greater than 70% of the sequence length were retained and classified according CAZyme catalytic group as GHs, GTs, PLs, CEs, CBMs or AAs (Additional files 1, 2, 3 and 4). Annotations were performed with Blast2Go [20] using BLASTx. All of the protein sequences of the T. harzianum CAZymes were functionally annotated based on homology. The T. harzianum CAZymes were further aligned to the PFAM profiles [21] of the families through a search of the Conserved Domain Database (CDD) of NCBI [22] and a search against the PFAM v28.0 database. InterPro protein domains were predicted using InterProScan (http://www.ebi.ac.uk/interpro/) [23]. Signal peptides of the T. harzianum CAZymes were predicted using SignalP v.4.1 (http://www.cbs.dtu.dk/services/SignalP/) [24] with default parameters, and all of the proteins with signal peptides were analyzed for the presence of transmembrane domains using the web server TMHMM v.2.0 (http://www.cbs.dtu.dk/services/TMHMM/) [25] with default parameters. TargetP v.1.1 (http://www.cbs.dtu.dk/services/TargetP/) [26] with the following parameters: organism group – “Non-plant” and cutoffs – default, and Cello v.2.5 (http://cello.life.nctu.edu.tw/) [27] with default parameters, were employed for the prediction of subcellular localization.

Transcriptional analysis of T. harzianum CAZymes

The expression levels of the T. harzianum CAZymes were analyzed using RNA-Seq data obtained from a previous study [19] in which the transcripts were obtained following growth of the fungus on three different carbon sources, lactose (LAC), cellulose (CEL), and delignified sugarcane bagasse (DSB), to induce mycelial growth. The quality control of the reads were as follows: quality limit - 0.03; ambiguous limit - 2; minimum final number of nucleotides in read - 65; and phred scale - 15 [19]. The reads from the RNA-Seq library were mapped against the T. harzianum CAZymes using the CLC Genomics Workbench (CLC bio – v9.0; Finlandsgade, Dk) [28] with the following parameters: mapping settings (minimum length fraction = 0.9, minimum similarity fraction = 0.8, and maximum number of hits for a read = 15) and paired settings (minimum distance = 180 and maximum distance = 250, including the broken pairs counting scheme). The expression values were expressed in reads per kilobase of exon model per million mapped reads (RPKM) and the normalized value for each sample was calculated in transcripts per million (TPM) [29] according to the following formula: where RPKM is the expression value for each gene in a sample and ∑ RPKM is the sum of the RPKM values of all genes in a sample. A total of the TPM expression values was calculated for each GH family according to the summing the expression level of all the genes that were identified for that particular family. A hierarchical clustering analysis was conducted with CLC bio using the single linkage method and Euclidian distance according to the cluster features of the log2-transformed expression values.

Phylogenetic analysis of CAZymes

The CAZyme sequences from the AA9/GH61, CE5 and GH55 families from T. harzianum, T. reesei, T. atroviride, T. virens and ten other species of fungi (Additional file 5) were used as the basis for constructing the phylogenetic trees. The sequences were aligned using ClustalW [30], implemented in the Molecular Evolutionary Genetics Analysis (MEGA) software version 7.0 [31] with the following parameters: gap opening penalties of 10 and 3 and gap extension penalties of 0.1 and 5 for the pairwise alignment and multiple alignment, respectively. The phylogenetic analyses were performed in MEGA7 using the maximum likelihood (ML) method inference [32] based on the Jones-Taylor-Thornton (JTT) matrix-based model and 1000 bootstrap replicates [33] for each analysis. The initial tree is drawn to scale, obtained automatically by applying the Neighbor-joining and BioNJ algorithms to a matrix of pairwise distances estimated using the JTT model and the topology with superior log likelihood values. Pairwise deletion was employed to address alignment gaps and missing data. The trees were visualized and edited using the Figtree program (http://tree.bio.ed.ac.uk/software/figtree/).

Results

Identification of CAZymes in T. harzianum

In this study, protein prediction for T. harzianum T6776 was used as a tool to annotate and analyze the total CAZyme content of this fungus. The identified proteins were re-annotated based on their functional classification, protein domains and secretion signal prediction. In addition, an expression analysis was performed by mapping the reads obtained from RNA-Seq under three biological conditions, and three CAZyme families were employed for the analysis of phylogenetic diversity. Using genomic, transcriptomic and phylogenetic association data, this study describes and characterizes the total (and more complete) set of CAZymes for T. harzianum, which is a species that presents high potential in prospecting for hydrolytic enzymes for the degradation of lignocellulosic biomass. The initial set of CAZymes was identified by mapping all of the proteins of T. harzianum against the CAZy database using the BLASTp search tool. After removing the proteins that did not meet the filtering criteria, a total of 430 proteins were maintained, which corresponds to 3.7% of the total of 11,498 proteins predicted for this organism [18]. The total protein pool was grouped according to the classification criteria of the CAZy database. Thus, a total of 259 GHs, 101 GTs, 6 PLs, 22 CEs, and 42 AAs as well as an additional 46 proteins that also contained a CBM were identified (Fig. 1 and Tables 1 and Additional file 6: Table S1). A total of 63 GHs families were identified in the T. harzianum genome (Table 2).
Fig. 1

Number of CAZymes in T. harzianum, T. reesei, T. virens and T. atroviride. Total of the CAZyme classes for the evaluated species (a); number of GH families: GH18, GH16, GH3, GH2, GH5 and GH55 (b); number of CE families: CE1, CE5, CE3, CE9 and CE16 (c); number of AA families: AA3, AA9, AA11, AA7 and AA8 (d); and number of CBM modules: CBM1, CBM18, CBM50, CBM24 and CBM43 (e)

Table 1

Total CAZymes from T. harzianum and other fungal species

SpeciesGHGTPLCEAACBMTotalReference
T. harzianum 2591016224246430This study
T. reesei RUT C-30198925204035355This study
T. virens 2561045294768441This study
T. atroviride 2441057254157422This study
T. reesei QM6a2019952224327[39]
F. graminearum 25311021457287501*
A. nidulans 2669121303344441*
N. crassa 173784213542311*
S. cerevisiae 577302512137*
A. oryzae 30611723292937504*
A. niger 2531218686855518*

*CAZy database (www.cazy.org)

Table 2

Evaluation of the GH classes in T. harzianum

GH familyNumberPrincipal enzyme activitya EC numbera RNA-Seq (TPM)b
CELDSBLAC
GH14β-glucosidase3.2.1.2126,464.624,945.625,780.2
GH213β-mannosidase3.2.1.2510,109.910,652.810,615.0
GH317β-glucosidase3.2.1.2123,132.524,232.124,991.6
GH41α-glucosidase3.2.1.221405.51866.21500.3
GH512Endo- β-1,4-glucanase3.2.1.415,970.517,669.516,084.8
GH61cellobiohydrolase3.2.1.9135,003.846,112.348,148.5
GH72Endo- β-1,4-glucanase3.2.1.456,503.362,881.973,957.0
GH102Endo-1,4- β-xylanase3.2.1.846,748.061,201.953,469.7
GH114Endo-1,4- β-xylanase3.2.1.853,279.569,046.464,929.2
GH123Endo- β-1,4-glucanase3.2.1.410,362.012,937.410,779.4
GH138α-amylase3.2.1.15028.84708.14757.5
GH152Glucoamylase3.2.1.33776.33476.43103.1
GH1616Xyloglucanase3.2.1.15159,203.156,083.162,280.8
GH174Endo-1,3- β-glucosidase3.2.1.3911,593.911,757.213,578.5
GH1823Chitinase3.2.1.1463,673.684,188.293,866.4
GH203β-hexosaminidase3.2.1.5235,521.833,517.723,889.0
GH231Lysozyme3.2.1.171227.2849.41093.1
GH241Lysozyme3.2.1.172492.61450.61586.9
GH251Lysozyme3.2.1.171885.91657.91559.4
GH262β-mannanase3.2.1.78314.0315.2308.3
GH279α-galactosidase3.2.1.221524.01354.31373.8
GH283polygalacturonase3.2.1.1538,662.726,783.227,177.1
GH306Endo- β-1,4-xylanase3.2.1.84233.33652.14332.6
GH315α-glucosidase3.2.1.206161.95753.57524.8
GH322Invertase3.2.1.260.00.00.0
GH331neuraminidase3.2.1.180.00.00.0
GH351β-galactosidase3.2.1.231814.51099.71961.4
GH362α-galactosidase3.2.1.22308.4255.6240.4
GH372α-trehalase3.2.1.282379.51017.61640.6
GH381α-mannosidase3.2.1.24964.81116.1960.2
GH393β-xylosidase3.2.1.37648.2444.7446.0
GH435β-xylosidase3.2.1.373643.73004.44153.1
GH453Endoglucanase3.2.1.44845.54891.14353.0
GH476α-mannosidase3.2.1.1138278.36385.77684.9
GH491Dextranase3.2.1.110.00.00.0
GH542α-L-arabinofuranosidase3.2.1.551253.11234.02333.9
GH559Endo- β-1,3-glucanase3.2.1.393113.42662.33444.8
GH622α-L-arabinofuranosidase3.2.1.559919.78963.614,790.8
GH631α-glucosidase3.2.1.106812.5509.6521.8
GH643β-1,3-glucanase3.2.1.399176.48197.08943.6
GH651α-trehalase3.2.1.281585.31273.21085.3
GH672α-glucuronidase3.2.1.1391300.6732.91057.3
GH716α-1,3-glucanase3.2.1.5913,633.215,112.416,804.9
GH725β-1,3-glucanosyltransglycosylase2.4.1-26,604.325,458.026,995.4
GH755Chitosanase3.2.1.1323030.04812.34033.0
GH769α-1,6-mannanase3.2.1.10114,462.011,978.610,448.3
GH783α-L-rhamnosidase3.2.1.40111.944.655.3
GH793β-glucuronidase3.2.1.31299.1232.8219.4
GH812Endo- β-1,3-glucanase3.2.1.39244.0164.1137.3
GH882d-4,5-unsaturated β-glucuronyl hydrolase3.2.1263.6100.380.0
GH892α-N-acetylglucosaminidase3.2.1.501059.2760.0533.5
GH928Mannosyl-oligosaccharide α-1,2-mannosidase3.2.1.1137002.95376.45435.1
GH934Exo- α-L-1,5-arabinanase3.2.17736.14164.54140.9
GH955α-L-fucosidase3.2.1.51680.7554.0602.6
GH1051Unsaturated rhamnogalacturonyl hydrolase3.2.1.1720.00.00.0
GH1141Endo- α-1,4-polygalactosaminidase3.2.1.1092290.21929.71214.4
GH1151Xylan α-1,2-glucuronidase3.2.1.1313420.73848.93455.7
GH1252Exo- α-1,6-mannosidase3.2.13221.82177.32401.5
GH1271β-L-arabinofuranosidase3.2.1.185628.6399.2378.0
GH1284β-1,3-glucanase3.2.1.395748.13170.03356.8
GH1322Activity on β-1,3-glucan4211.05376.75835.3
GH1331Amylo- α-1,6-glucosidase3.2.1.331480.51058.11132.8
GH1352α-1,4-galactosaminogalactan hydrolase3.2.1153.7134.0121.9

TPM, transcripts per million

aCAZy database ()

bThe expression values for the different GH families were calculated by summing the level of TPM expression in each gene composing the family

Number of CAZymes in T. harzianum, T. reesei, T. virens and T. atroviride. Total of the CAZyme classes for the evaluated species (a); number of GH families: GH18, GH16, GH3, GH2, GH5 and GH55 (b); number of CE families: CE1, CE5, CE3, CE9 and CE16 (c); number of AA families: AA3, AA9, AA11, AA7 and AA8 (d); and number of CBM modules: CBM1, CBM18, CBM50, CBM24 and CBM43 (e) Total CAZymes from T. harzianum and other fungal species *CAZy database (www.cazy.org) Evaluation of the GH classes in T. harzianum TPM, transcripts per million aCAZy database () bThe expression values for the different GH families were calculated by summing the level of TPM expression in each gene composing the family Among the 430 CAZymes identified, 204 (47%) were predicted to harbor a signal peptide and were classified as potential secreted proteins, including 155 GHs, 17 CEs, 8 GTs, 6 PLs and 18 AAs. The proteins that contained a signal peptide were also investigated for the presence of transmembrane domains, resulting in the identification of 19 proteins containing such domains within this group. Searches for mitochondrial proteins were also performed in the CAZymes set, and 37 proteins were identified. The CAZyme contents of T. reesei RUT C-30 were also investigated, and 198 GHs, 92 GTs, 5 PLs, 20 CEs, 40 AAs and 35 CBMs were identified, totaling 355 CAZymes (Table 1 and Additional file 6: Table S1). In T. virens and T. atroviride, 441 and 422 CAZymes were identified, respectively. The GH18 family, represented mainly by chitinase-like proteins, was the most frequent in the four species, while the GH3 family, which includes enzymes important for the process of cellulose degradation, was the second most frequent in the studied species (Fig. 1).

Enzymes related to cellulose and hemicellulose degradation in T. harzianum

The GHs are the main class of enzymes within the CAZymes responsible for cellulose degradation, and the GH5, GH7, GH12, GH45, GH1, GH3 and GH6 families play particularly important roles (Table 3). Twenty-three CBM1 domains have been identified in T. harzianum. Two auxiliary families were identified that are related to the cellulose degradation mechanism in T. harzianum: AA8 (2 proteins) and AA9 (4 proteins).
Table 3

Cellulolytic enzymes of Trichoderma spp

SpeciesGH5GH7GH12GH45GH1GH3GH6Total
T. harzianum 12233417142
T. reesei 7221213128
T. atroviride 11231415137
T. virens 12242219142
Cellulolytic enzymes of Trichoderma spp The major families involved in the degradation of hemicellulose in T. harzianum are GH95, GH67, GH62, GH54, GH43, GH26, GH11, and GH10, and the total number of enzymes, including all families, is 24. The greatest number of enzymes belong to the GH43 and GH95 family (5 proteins) and the fewest to the GH67, GH62, GH54, GH26 and GH10 families (with 2 proteins each) (Table 4).
Table 4

Hemicellulose-degrading enzymes of Trichoderma spp

SpeciesGH95GH67GH62GH54GH43GH26GH11GH10Total
T. harzianum 5222524224
T. reesei 4112203114
T. atroviride 4222604121
T. virens 4232424223
Hemicellulose-degrading enzymes of Trichoderma spp

Expression analysis using T. harzianum RNA-Seq data

The relative expression of the 430 CAZymes of T. harzianum was studied through computational analysis with the T. harzianum IOC3844 RNA-Seq reads (Fig. 2a and Table 2 and Additional file 7: Table S2). The enzymes that exhibited expression greater than zero under the cellulose condition included 400 CAZymes, as follows: 243 GHs, 19 CEs, 4 PLs and 35 AAs. The GH families with the greatest number of expressed genes were GH18 (23 genes – 63,673.6 TPM in CEL), GH3 (17 genes – 11,593.9 TPM in CEL), GH16 (16 genes – 59,203.1 TPM in CEL), GH2 (13 genes – 5028.8 TPM in CEL) and GH5 (12 genes – 15,970.5 TPM in CEL).
Fig. 2

Evaluation of CAZyme expression in T. harzianum by means of RNA-Seq. Quantification of the expression of the main CAZyme classes (GH, GT, AA and CE) in TPM (a) and hierarchical clustering of the log2-transformed expression values of the 26 CAZymes of T. harzianum (b). TPM, transcripts per million

Evaluation of CAZyme expression in T. harzianum by means of RNA-Seq. Quantification of the expression of the main CAZyme classes (GH, GT, AA and CE) in TPM (a) and hierarchical clustering of the log2-transformed expression values of the 26 CAZymes of T. harzianum (b). TPM, transcripts per million The most highly expressed genes were an exoglucanase 1 (CBM1 module) from the GH7 family (KKO99004.1), under conditions of CEL and LAC induction, and an endo-1,4-β-xylanase from the GH10 family (KKP04658.1), under DSB induction. Among the 430 genes, 30 were not expressed in CEL, 17 were not expressed in DSB, and 16 were not expressed in LAC. The CAZyme genes related to the degradation of cellulose (GH1, GH3, GH5, GH6, GH45 and AA9), hemicellulose (GH31 and GH62), and glucan (GH17 and GH55) as well as esterase enzymes (CE5 and CE15) were further employed for an analysis of their expression levels by means of hierarchical grouping, and it was possible to observe five expression groups (Fig. 2b). Group I contained the genes expressed at lower levels, including the GH55 endo-1,3-β-glucosidase (KKO98539.1) and a subgroup consisting of a GH3 β-glucosidase H (KKP04308.1) and GH45 endoglucanase (KKO98959.1). Group II was composed of 7 β-glucosidase, including three β-glucosidase A genes belonging to the GH3 family (KKP00605.1, KKP05725.1, and KKP05758.1), a β-glucosidase F from the GH3 family (KKP01385.1), a β-glucosidase E from the GH3 family (KKO97125.1) and a β-glucosidase B from the GH5 family (KKP05604.1). Group III was formed by two subgroups. Subgroup III.a was composed of an endoglucanase II from the GH5 family (KKP03485.1) and a β-glucosidase M from the GH3 family (KKO97043.1), while subgroup III.b was composed of an endo-1,3-β-glucosidase M from the GH17 family (KKP05999.1) and a glucan-1,3-β-glucosidase from the GH55 family (KKP03210.1). Group IV consisted of an esterase with a CBM1 module from the CE5 family (KKP06491.1), an α-glucosidase from the GH31 family (KKP03969.1), a 1,3-β-glucosidase from the GH17 family (KKP05186.1), a methylesterase from the CE15 family (KKO96622.1), an endoglucanase from the GH45 family (KKP04958.1), a β-glucosidase 3A from the GH3 family (KKP07011.1) and an endoglucanase from the GH5 family (KKO98175.1). Group V consisted of two genes: an endoglucanase from the GH7 family (KKP04531.1) and an α-1-arabinofuranosidase from the GH62 family (KKP03811.1). A β-glucosidase 1B gene from the GH1 family (KKO98105.1), a cellobiohydrolase from the GH6 family (KKP03494.1) and an AA9/GH61 protein with a CBM1 module (KKP05760.1) were not grouped.

Phylogenetic analysis and functional diversity

A phylogenetic analysis was performed to study the functional diversity of the three CAZyme families from T. harzianum (GH55, AA9/GH61 and CE5) and nine other species of fungi. The GH55 family was formed by nine glucan 1,3-β-glucosidase proteins in T. harzianum (identified as: KKO97443.1, KKO99433.1, KKP07835.1, KKO98539.1, KKP02807.1, KKP03210.1, KKP01538.1, KKP04907.1 and KKP00524.1). Most of the T. harzianum GH55 family proteins formed a separate clade with species from the same genus, including T. reesei, T. atroviride and T. virens. However, KKP00524.1 formed an external group that was closest to the GH55 of Cordyceps militaris (EGX89976.1) (Fig. 3a).
Fig. 3

Phylogenetic tree of the GH55, AA9/GH61 and CE5 families. The tree includes the GH55 (a), AA9/GH61 (b), and CE5 (c) proteins from twelve species and the predicted functional domains of the major proteins of the families in T. harzianum (d). T. harzianum proteins are highlighted in red in the tree

Phylogenetic tree of the GH55, AA9/GH61 and CE5 families. The tree includes the GH55 (a), AA9/GH61 (b), and CE5 (c) proteins from twelve species and the predicted functional domains of the major proteins of the families in T. harzianum (d). T. harzianum proteins are highlighted in red in the tree The AA9/GH61 family (Copper-dependent lytic polysaccharide monooxygenases – LPMOs) was formed by three proteins in T. harzianum, KKO97863.1, KKP05760.1 and KKO97781.1, and it formed a separate clade with other AA9 proteins from T. reesei, T. atroviride and T. virens. KKO97781.1 was grouped with a T. virens AA9 (XP_013954128.1) protein with 92% bootstrap support (Fig. 3b). The CE5 family was formed by five proteins in T. harzianum, including two cutinase (KKP01664.1 and KKO97447.1) and three acetylxylan esterases (KKP00250.1, KKP06491.1 and KKP05135.1). All T. harzianum CE5 proteins were grouped with T. virens proteins throughout the tree. Wide phylogenetic diversity was observed among the CE5 family of T. harzianum, distributed in nearly all clades of the phylogenetic tree (Fig. 3c). Functional diversity was also analyzed using the functional domains and ontologies. The functional domains validated the CAZyme classifications and confirmed the characteristic module of each family (Fig. 3d and Additional file 8: Table S3). A total of 281 sequences were annotated with gene ontology (GO) terms (Fig. 4), and all of the proteins presented correspondence with the InterPro database (Additional file 9: Table S4). Catalytic activity was the main function under the molecular function term with 262 annotated sequences, while for the biological process term, the main functions were metabolic processes (202 sequences) and cellular processes (127 sequences).
Fig. 4

GO terms for the CAZymes of T. harzianum. The sequences were annotated according to three main GO terms: biological process (a), cellular components (b), and molecular functions (c)

GO terms for the CAZymes of T. harzianum. The sequences were annotated according to three main GO terms: biological process (a), cellular components (b), and molecular functions (c)

Discussion

In this study, we performed a comprehensive analysis of the total content of the CAZymes in T. harzianum T6776, which is widely used in biologic control and has recently been the focus of studies related to enzymes involved in the degradation of vegetal biomass. Furthermore, we performed a comparison with other Trichoderma spp. (T. reesei, T atroviride and T. virens), completed an expression analysis via RNA-Seq and explored functional diversity through a phylogenetic analysis. Thus, we present the most complete report on the total CAZymes annotated for the cellulolytic fungus T. harzianum. Until recently, most studies involving T. harzianum were restricted to investigating its high biocontrol capacity [5, 34]. However, our research group has performed many studies evaluating the biotechnological potential of this species. Horta et al. (2014) [19] determined the transcriptome profile of this fungus during biomass degradation and delimited groups of overexpressed genes. Crucello et al. (2015) [35] constructed the first bacterial artificial chromosome (BAC) library and found a cluster of CAZymes that were probably co-expressed, and Santos et al. (2016) [36] heterologously expressed a GH1 beta-glucosidase from T. harzianum in Escherichia coli and characterized its structure and function. Several previous studies have described the contents of CAZymes in bacteria [37, 38], fungi [39-41] and plants [42, 43]. We identified a large number of CAZymes in T. harzianum, T. reesei, T. atroviride and T. virens (430, 355, 441 and 422 CAZymes, respectively), with a high diversity of families (Fig. 1 and Table 1). These results demonstrate the importance and complexity of the system involved in the degradation of vegetal biomass that has developed in these fungi over evolutionary time [44]. The contents of CAZymes have been determined in several fungi of biotechnological importance, including Fusarium graminearum, Aspergillus nidulans, Neurospora crassa and A. niger, and 501, 441, 311 and 518 CAZy proteins were identified, respectively (CAZy database [14]). T. harzianum presented 259 GHs, a number that is greater than that found in T. reesei RUT C-30 (198 GHs) (Table 1). Although T. reesei presents a lower CAZyme content than T. harzianum, its high efficiency in the production of hydrolytic enzymes has been demonstrated in several studies [45, 46]. This result indicates that the number of enzymes is not related to the efficiency of the biomass degradation process. In addition, previous studies in which genomic comparisons of Trichoderma species were performed confirmed that T. reesei has suffered events involving loss of CAZy genes [44, 47]. The number of CAZymes identified in T. reesei RUT C-30 (198 GHs, 20 CEs and 5 PLs) in our analysis was close to that found in a study in which the CAZyme content of T. reesei QM6a was re-annotated [39], which identified 201 GHs, 22 CEs and 5 PLs. The CAZyme class with the greatest number of proteins among all species evaluated was the GHs. This class of enzymes is important in several metabolic routes in the fungus, including those for chitin and cellulose [48, 49]. Within this class, the most representative family was GH18, whose members are mainly involved in the degradation of chitin, which is an important route both biologically and economically, since they are applied in the biological control [50]. Another very representative family in the analysis was GH3, including enzymes showing beta-glucosidase and xylosidase activities, which are important in the metabolism of glucose and xylose [51], respectively. We analyzed the expression of the 430 CAZymes of T. harzianum by mapping previously obtained RNA-Seq data from three biological conditions (with cellulose, lactose or sugarcane bagasse as a carbon source) [19]. In total, 243 GHs were expressed in the cellulose condition, which was expected because a high proportion of the enzymes that act in the degradation of biomass are from the GH family [52]. In an experiment using T. harzianum grown in the presence of sugarcane bagasse, a large number of GH5 and GH16 family members were observed [19]. The AA family also exhibited a large number of expressed genes, and 35 of the 42 total genes from this group were expressed in the cellulose condition. This finding reinforces that auxiliary enzymes are of great importance in the efficiency of the synergism of the main enzymes that act in the process of cellulose degradation [53]. The phylogenetic analysis of three CAZyme families (GH55, CE5 and AA9) from T. harzianum showed high functional diversity in these groups of proteins. This functional diversity may be a reflection of changes in the functional domains of these proteins that imply different biological actions and efficiencies of these enzymes in biological processes [54]. In addition, a large majority of the enzymes of T. harzianum formed groups with enzymes from T. reesei, T. virens and T. atroviride, demonstrating the phylogenetic proximity of these species. This result reinforces the hypothesis that the CAZymes of Trichoderma spp. evolved from a common ancestor [47].

Conclusions

Searching for the main proteins used by T. harzianum in the degradation of biomass ensures new advances in the field of biofuel production. Herein, we annotated and characterized all of the CAZymes from T. harzianum at the expression level. The obtained data will contribute to future studies focusing on the functional and structural characterization of the identified proteins. We found a large variety of enzyme families that are related to the cellulose degradation process, and based on our results, groups of enzymes can be selected for testing enzymatic efficiency and characterizing functionality using heterologous expression. Through phylogenetic analysis of the three CAZyme families (AA9/GH61, CE5 and GH55), it was possible to observe high functional diversity within a given family, which may have implications in the process of choosing the most efficient enzymes. Protein sequence of the CAZymes in T. harzianum in FASTA format. (FASTA 264 kb) Protein sequence of the CAZymes in T. reesei in FASTA format. (FASTA 226 kb) Protein sequence of the CAZymes in T. virens in FASTA format. (FASTA 283 kb) Protein sequence of the CAZymes in T. atroviride in FASTA format. (FASTA 274 kb) Multiple sequence alignments used in the phylogenetic analysis of the GH55, AA9/GH61 and CE5 families. (PDF 1916 kb) Mapping of the Trichoderma spp. proteins against the CAZy database. (XLSX 172 kb) Mapping of RNA-Seq reads against the CAZymes of T. harzianum. (XLSX 121 kb) PFAM domains of the CAZymes from T. harzianum. (XLSX 70 kb) Gene ontology and InterPro domains of the CAZymes in T. harzianum. (XLSX 67 kb)
  48 in total

1.  Predicting transmembrane protein topology with a hidden Markov model: application to complete genomes.

Authors:  A Krogh; B Larsson; G von Heijne; E L Sonnhammer
Journal:  J Mol Biol       Date:  2001-01-19       Impact factor: 5.469

2.  Crystallization and preliminary X-ray diffraction analysis of endoglucanase III from Trichoderma harzianum.

Authors:  Marcelo Vizoná Liberato; Wesley Cardoso Generoso; Wilson Malagó; Flávio Henrique-Silva; Igor Polikarpov
Journal:  Acta Crystallogr Sect F Struct Biol Cryst Commun       Date:  2012-02-23

3.  Trichoderma: systematics, the sexual state, and ecology.

Authors:  Gary J Samuels
Journal:  Phytopathology       Date:  2006-02       Impact factor: 4.025

4.  Role and significance of beta-glucosidases in the hydrolysis of cellulose for bioethanol production.

Authors:  Reeta Rani Singhania; Anil Kumar Patel; Rajeev K Sukumaran; Christian Larroche; Ashok Pandey
Journal:  Bioresour Technol       Date:  2012-09-14       Impact factor: 9.642

5.  Quantitative secretomic analysis of Trichoderma reesei strains reveals enzymatic composition for lignocellulosic biomass degradation.

Authors:  Sunil S Adav; Lim Tze Chao; Siu Kwan Sze
Journal:  Mol Cell Proteomics       Date:  2012-02-20       Impact factor: 5.911

6.  Curation of characterized glycoside hydrolases of fungal origin.

Authors:  Caitlin Murphy; Justin Powlowski; Min Wu; Greg Butler; Adrian Tsang
Journal:  Database (Oxford)       Date:  2011-05-26       Impact factor: 3.451

7.  Re-annotation of the CAZy genes of Trichoderma reesei and transcription in the presence of lignocellulosic substrates.

Authors:  Mari Häkkinen; Mikko Arvas; Merja Oja; Nina Aro; Merja Penttilä; Markku Saloheimo; Tiina M Pakula
Journal:  Microb Cell Fact       Date:  2012-10-04       Impact factor: 5.328

8.  Closely related fungi employ diverse enzymatic strategies to degrade plant biomass.

Authors:  Isabelle Benoit; Helena Culleton; Miaomiao Zhou; Marcos DiFalco; Guillermo Aguilar-Osorio; Evy Battaglia; Ourdia Bouzid; Carlo P J M Brouwer; Hala B O El-Bushari; Pedro M Coutinho; Birgit S Gruben; Kristiina S Hildén; Jos Houbraken; Luis Alexis Jiménez Barboza; Anthony Levasseur; Eline Majoor; Miia R Mäkelä; Hari-Mander Narang; Blanca Trejo-Aguilar; Joost van den Brink; Patricia A vanKuyk; Ad Wiebenga; Vincent McKie; Barry McCleary; Adrian Tsang; Bernard Henrissat; Ronald P de Vries
Journal:  Biotechnol Biofuels       Date:  2015-08-01       Impact factor: 6.040

9.  The InterPro protein families database: the classification resource after 15 years.

Authors:  Alex Mitchell; Hsin-Yu Chang; Louise Daugherty; Matthew Fraser; Sarah Hunter; Rodrigo Lopez; Craig McAnulla; Conor McMenamin; Gift Nuka; Sebastien Pesseat; Amaia Sangrador-Vegas; Maxim Scheremetjew; Claudia Rato; Siew-Yit Yong; Alex Bateman; Marco Punta; Teresa K Attwood; Christian J A Sigrist; Nicole Redaschi; Catherine Rivoire; Ioannis Xenarios; Daniel Kahn; Dominique Guyot; Peer Bork; Ivica Letunic; Julian Gough; Matt Oates; Daniel Haft; Hongzhan Huang; Darren A Natale; Cathy H Wu; Christine Orengo; Ian Sillitoe; Huaiyu Mi; Paul D Thomas; Robert D Finn
Journal:  Nucleic Acids Res       Date:  2014-11-26       Impact factor: 16.971

10.  The Pfam protein families database: towards a more sustainable future.

Authors:  Robert D Finn; Penelope Coggill; Ruth Y Eberhardt; Sean R Eddy; Jaina Mistry; Alex L Mitchell; Simon C Potter; Marco Punta; Matloob Qureshi; Amaia Sangrador-Vegas; Gustavo A Salazar; John Tate; Alex Bateman
Journal:  Nucleic Acids Res       Date:  2015-12-15       Impact factor: 16.971

View more
  9 in total

1.  Genetic Transformation of Trichoderma spp.

Authors:  Feng Cai; Christian P Kubicek; Irina S Druzhinina
Journal:  Methods Mol Biol       Date:  2021

2.  Network of proteins, enzymes and genes linked to biomass degradation shared by Trichoderma species.

Authors:  Maria Augusta Crivelente Horta; Jaire Alves Ferreira Filho; Natália Faraj Murad; Eidy de Oliveira Santos; Clelton Aparecido Dos Santos; Juliano Sales Mendes; Marcelo Mendes Brandão; Sindelia Freitas Azzoni; Anete Pereira de Souza
Journal:  Sci Rep       Date:  2018-01-22       Impact factor: 4.379

3.  Temporal Expression Dynamics of Plant Biomass-Degrading Enzymes by a Synthetic Bacterial Consortium Growing on Sugarcane Bagasse.

Authors:  Diego Javier Jiménez; Maryam Chaib De Mares; Joana Falcão Salles
Journal:  Front Microbiol       Date:  2018-02-26       Impact factor: 5.640

4.  Genome-wide annotation, comparison and functional genomics of carbohydrate-active enzymes in legumes infecting Fusarium oxysporum formae speciales.

Authors:  Abhijeet Roy; Aiswarya Jayaprakash; Raja Rajeswary T; A Annamalai; Ptv Lakshmi
Journal:  Mycology       Date:  2020-01-04

5.  Low-Cost Cellulase-Hemicellulase Mixture Secreted by Trichoderma harzianum EM0925 with Complete Saccharification Efficacy of Lignocellulose.

Authors:  Yu Zhang; Jinshui Yang; Lijin Luo; Entao Wang; Ruonan Wang; Liang Liu; Jiawen Liu; Hongli Yuan
Journal:  Int J Mol Sci       Date:  2020-01-07       Impact factor: 5.923

6.  "Integrative genomic analysis of the bioprospection of regulators and accessory enzymes associated with cellulose degradation in a filamentous fungus (Trichoderma harzianum)".

Authors:  Jaire A Ferreira Filho; Maria Augusta C Horta; Clelton A Dos Santos; Deborah A Almeida; Natália F Murad; Juliano S Mendes; Danilo A Sforça; Claudio Benício C Silva; Aline Crucello; Anete P de Souza
Journal:  BMC Genomics       Date:  2020-11-02       Impact factor: 3.969

7.  Filamentous fungal applications in biotechnology: a combined bibliometric and patentometric assessment.

Authors:  Pamina Füting; Lars Barthel; Timothy C Cairns; Heiko Briesen; Stefan Schmideder
Journal:  Fungal Biol Biotechnol       Date:  2021-12-28

Review 8.  Factors regulating cellulolytic gene expression in filamentous fungi: an overview.

Authors:  Anu Jose Mattam; Yogesh Babasaheb Chaudhari; Harshad Ravindra Velankar
Journal:  Microb Cell Fact       Date:  2022-03-22       Impact factor: 5.328

9.  The genome sequence of the commercially cultivated mushroom Agrocybe aegerita reveals a conserved repertoire of fruiting-related genes and a versatile suite of biopolymer-degrading enzymes.

Authors:  Deepak K Gupta; Martin Rühl; Bagdevi Mishra; Vanessa Kleofas; Martin Hofrichter; Robert Herzog; Marek J Pecyna; Rahul Sharma; Harald Kellner; Florian Hennicke; Marco Thines
Journal:  BMC Genomics       Date:  2018-01-15       Impact factor: 3.969

  9 in total

北京卡尤迪生物科技股份有限公司 © 2022-2023.