Christian P Kubicek, Andrei S Steindorff, Komal Chenthamara, Gelsomina Manganiello, Bernard Henrissat, Jian Zhang, Feng Cai, Alexey G Kopchinskiy, Eva M Kubicek, Alan Kuo, Riccardo Baroncelli, Sabrina Sarrocco, Eliane Ferreira Noronha, Giovanni Vannacci, Qirong Shen, Igor V Grigoriev, Irina S Druzhinina.
Abstract
BACKGROUND: The growing importance of the ubiquitous fungal genus Trichoderma (Hypocreales, Ascomycota) requires an understanding of its biology and evolution. Many Trichoderma species are used as biofertilizers and biofungicides, and T. reesei is the model organism for the industrial production of cellulolytic enzymes. In addition, some highly opportunistic species devastate mushroom farms and can become pathogens of humans. A comparative analysis of the first three whole genomes revealed mycoparasitism as the innate feature of Trichoderma. However, the evolution of these traits is not yet understood.
Keywords: Ankyrin domains; CAZymes; Core genome; Environmental opportunism; Gene gain; Gene loss; Orphans; SSCPs
Year: 2019 PMID: 31189469 PMCID: PMC6560777 DOI: 10.1186/s12864-019-5680-7
Source DB: PubMed Journal: BMC Genomics ISSN: 1471-2164 Impact factor: 3.969
Fig. 1 Trichoderma spp. in nature. a A fallen log of dead wood colonized by other fungi represents the major ecological niche of Trichoderma spp. b Trichoderma atroviride on dead wood. c T. harzianum on soil. d T. simonsii on sporocarps of Stereum sp. Some species may also colonize soil and become endophytes. Scale bar in b and c corresponds to 1 cm
Fig. 2 Phylogeny of the genus Trichoderma and occurrence of the most common species. Phylogeny of Trichoderma based on Bayesian analysis of the rpb2 gene (see Methods for details). Only species with major abundance (> 100 nucleotide sequences deposited in NCBI GenBank, April 2018) are shown. The number of core nucleotide sequences deposited in GenBank is indicated by the size of the filled circles, with T. pleuroticola (N = 103) being the smallest shown. Sections Longibrachiatum and Trichoderma and the Harzianum/Virens clade are indicated by colored vertical bars. Rare Trichoderma spp. (< 100 nucleotide sequences in public databases) are not shown. The circles for T. reesei and T. viride likely represent inflated (false-positive) values: T. reesei is the most studied species, while T. viride is the oldest Trichoderma species name and was assigned to all strains before DNA barcoding became available
Properties of the Trichoderma genomes and gene distribution
| Clade | Species | Strain | Genome size (Mbp) | Total genes | Completeness (%) | Fragmented (%) | Missing (%) | Orthologs and paralogs |
|---|---|---|---|---|---|---|---|---|
| | | QM6a | 32.7 | 9877 | 96.9 | 2.4 | 0.7 | 8090 |
| | | RUT C30 | 34.2 | 10877 | | | | |
| | | ATCC18648 | 31.74 | 10938 | 86.3 | 7.9 | 5.8 | 8229 |
| | | TUCIM 6016 | 33.2 | 9737 | 94.1 | 3.1 | 2.8 | 7834 |
| | | CBS125925 | 32.07 | 9292 | 95.3 | 3.7 | 1 | 8328 |
| | | CBS 226.95 | 40.9 | 14095 | 98.1 | 1.4 | 0.5 | 9921 |
| | | TR257 | 39.4 | 13932 | 97.2 | 2 | 0.8 | 9870 |
| | | T6776 | 39.7 | 11297 | 95.1 | 1.8 | 3.1 | 9541 |
| | | NJAU4742 | 38.8 | 11297 | 98.3 | 1.2 | 0.5 | 9246 |
| | | Gv29–8 | 40.52 | 12427 | 97.8 | 1.9 | 0.3 | 9795 |
| | | IMI 206040 | 36.4 | 11863 | 97.5 | 2.1 | 0.4 | 9301 |
| | | T6085 | 37.9 | 10709 | 94.1 | 2.1 | 3.8 | 8825 |
| | | CBS433.97 | 37.66 | 12586 | 97.9 | 1.4 | 0.7 | 9143 |
| | | GD12 | 38.43 | 10520 | 98.6 | 0.7 | 0.7 | 9030 |
a numbers show data (from left to right) obtained in this paper and by Li et al. [27] (if available)
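The completeness, fragmented and missing columns appear to be complementary percentages of a benchmark gene set (the assessment tool is not named in this excerpt), so for every strain they should add up to 100%. A minimal Python sketch, assuming exactly that, checks the values copied from the table; the helper names `partition_total` and `inconsistent_rows` are hypothetical, and RUT C30 is skipped because its assessment cells are empty.

```python
# (strain, complete %, fragmented %, missing %) copied from the table above.
# RUT C30 is omitted because its assessment columns are empty.
ASSESSMENT = [
    ("QM6a", 96.9, 2.4, 0.7),
    ("ATCC18648", 86.3, 7.9, 5.8),
    ("TUCIM 6016", 94.1, 3.1, 2.8),
    ("CBS125925", 95.3, 3.7, 1.0),
    ("CBS 226.95", 98.1, 1.4, 0.5),
    ("TR257", 97.2, 2.0, 0.8),
    ("T6776", 95.1, 1.8, 3.1),
    ("NJAU4742", 98.3, 1.2, 0.5),
    ("Gv29-8", 97.8, 1.9, 0.3),
    ("IMI 206040", 97.5, 2.1, 0.4),
    ("T6085", 94.1, 2.1, 3.8),
    ("CBS433.97", 97.9, 1.4, 0.7),
    ("GD12", 98.6, 0.7, 0.7),
]


def partition_total(complete: float, fragmented: float, missing: float) -> float:
    """Sum of the three categories, rounded to one decimal like the table."""
    return round(complete + fragmented + missing, 1)


def inconsistent_rows(rows, tolerance: float = 0.15):
    """Strains whose three percentages do not sum to 100 within the tolerance."""
    return [name for name, c, f, m in rows if abs(c + f + m - 100.0) > tolerance]


for name, c, f, m in ASSESSMENT:
    print(f"{name:12s} {partition_total(c, f, m):6.1f}")
print("inconsistent:", inconsistent_rows(ASSESSMENT))
```

The small tolerance absorbs floating-point noise and one-decimal rounding in the published values.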
Pairwise genetic distance between orthologous proteins from 13 Trichoderma strainsᵃ
a Colors show relatively high (red), intermediate (yellow) and low (green) values
Fig. 3 Bayesian chronogram based on the concatenated alignment of 638 core orthologous proteins of Hypocreales and two other Sordariomycetes. All nodes were supported with PP = 1. Age estimates are given on a geological time scale in Mya, and the numbers give the corresponding node ages. Numbers with asterisks at nodes indicate calibration points against the origin of Hypocreales (see Methods for details). Bars correspond to the 95% confidence intervals of the time estimates based on the lognormal relaxed clock
Distribution of Trichoderma genes in sections, clades and species
| Clade | Species | Present in | Absent from | Total: | ||||
|---|---|---|---|---|---|---|---|---|
| All clusters with at least one gene from | At least two species from each clade | the clade only | SL | HV | ST | |||
| 13,089 | 7923 | 80/745/286 | 1083 | 68 | 335 | |||
| SL |
| 8775 | 8176 | 80 | 38 | 232 | 8526 | |
|
| 8636 | 7951 | 80 | 46 | 249 | 8326 | ||
|
| 9205 | 8436 | 81 | 48 | 275 | 8840 | ||
|
| 8757 | 8105 | 81 | 37 | 234 | 8457 | ||
| HV |
| 12737 | 9419 | 778 | 1038 | 300 | 11535 | |
|
| 12698 | 9392 | 763 | 1016 | 296 | 11467 | ||
|
| 10996 | 9036 | 533 | 826 | 230 | 10625 | ||
|
| 10811 | 8802 | 500 | 867 | 232 | 10401 | ||
|
| 11474 | 9341 | 410 | 875 | 276 | 10902 | ||
| ST |
| 10738 | 9009 | 267 | 841 | 54 | 10171 | |
|
| 10039 | 8501 | 267 | 780 | 31 | 9579 | ||
|
| 10595 | 8886 | 266 | 839 | 47 | 10038 | ||
|
| 10164 | 8694 | 241 | 833 | 29 | 9797 | ||
PFAM group members with more than 500 genes in the 13 Trichoderma isolates
| PFAM group | PFAM ID(s) | Clusters | Genes per cluster | Total genes | Genes/species | HV/SLᵃ | HV/STᵃ | ST/SLᵃ | Singletonsᵇ | Multiplesᶜ | C/Sᵈ (%) |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Zn2Cys6 transcriptional regulators | PF04082 | 238 | 12.3 | 2929 | 225.3 | 1.69 | 1.21 | 1.40 | 84 | 154 | 64.7 |
| MFS permeases | PF07690 | 172 | 17.3 | 2972 | 228.6 | 1.55 | 1.06 | 1.46 | 67 | 105 | 61.0 |
| short-chain dehydrogenases/reductases | PF00106 | 121 | 17.6 | 2129 | 163.8 | 1.68 | 1.18 | 1.42 | 46 | 75 | 62.0 |
| ankyrin-containing proteins | PF00023 | 106 | 14.4 | 1524 | 117.2 | 1.84 | 1.05 | 1.75 | 20 | 86 | 81.1 |
| alpha-beta-hydrolases | PF00561, 07859, 02230 | 96 | 14.4 | 1382 | 106.3 | 2.02 | 1.27 | 1.59 | 29 | 67 | 69.8 |
| protein kinases | PF00069 | 98 | 12.4 | 1218 | 93.7 | 1.15 | 1.11 | 1.03 | 60 | 38 | 38.8 |
| zinc-dependent alcohol dehydrogenases | PF00107 | 66 | 17.6 | 1159 | 89.2 | 1.89 | 1.17 | 1.61 | 15 | 51 | 77.3 |
| FAD-binding oxidases | PF01494, 01565 | 84 | 13.8 | 1158 | 89.1 | 1.56 | 1.21 | 1.29 | 25 | 59 | 70.2 |
| methyltransferases | PF00891 | 91 | 12.7 | 1157 | 89.0 | 1.24 | 1.09 | 1.14 | 46 | 45 | 49.5 |
| AAA+ ATPases | PF00004 | 85 | 13.3 | 1130 | 86.9 | 1.13 | 1.06 | 1.07 | 54 | 31 | 36.5 |
| cytochrome P450 monooxygenases | PF00067 | 59 | 15.9 | 940 | 72.3 | 1.65 | 1.69 | 0.97 | 14 | 45 | 76.3 |
| sugar transporters | PF00083 | 65 | 14.2 | 923 | 71.0 | 1.53 | 1.17 | 1.31 | 33 | 32 | 49.2 |
| ABC-transporters | PF00005 | 48 | 16.3 | 780 | 60.0 | 1.22 | 1.13 | 1.07 | 21 | 27 | 56.3 |
| vegetative heteroincompatibility (HET) proteins | PF06985, 07217, 17108 | 72 | 10.5 | 753 | 57.9 | 1.94 | 1.24 | 1.56 | 13 | 59 | 81.9 |
| aminotransferases | PF01490 | 49 | 13.8 | 675 | 51.9 | 1.47 | 1.18 | 1.25 | 21 | 28 | 57.1 |
| amino acid permeases | PF00324 | 48 | 13.1 | 630 | 48.5 | 1.17 | 1.10 | 1.06 | 30 | 18 | 37.5 |
| amidases | PF01979, 04909 | 37 | 17.0 | 629 | 48.4 | 1.93 | 1.20 | 1.62 | 12 | 25 | 67.6 |
| acetyltransferases | PF00583, 00797, 13302, 13523 | 49 | 12.2 | 600 | 46.2 | 1.26 | 1.10 | 1.15 | 23 | 26 | 53.1 |
| DEAD-box helicases | PF00270 | 42 | 13.5 | 567 | 43.6 | 0.97 | 0.96 | 1.01 | 33 | 9 | 21.4 |
| NmrA-like proteins, NAD-binding negative regulators of GATA-binding proteins | PF05368 | 41 | 13.5 | 552 | 42.5 | 3.01 | 1.11 | 2.70 | 10 | 31 | 75.6 |
| DnaJ molecular chaperone | PF00226 | 42 | 12.8 | 537 | 41.3 | 1.02 | 0.99 | 1.04 | 19 | 23 | 54.8 |
| RRM_1 RNA binding proteins | PF00076 | 42 | 12.8 | 537 | 41.3 | 1.00 | 0.99 | 1.01 | 40 | 2 | 4.8 |
a Ratio of the numbers of genes in all species belonging to the respective Trichoderma sections or clades
b Clusters in which genes are present in a single copy only
c Clusters in which genes occur in more than one copy in at least one species
d Percentage of clusters containing multiple gene copies (multiples/clusters × 100)
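Several columns of the PFAM table are derived from the raw counts: genes per cluster is total genes divided by clusters, genes/species is total genes divided by the 13 isolates, singletons and multiples partition the clusters, and C/S (%) is multiples as a share of clusters. The short Python sketch below recomputes these columns for four rows copied from the table; the `PfamGroup` class is a hypothetical container, not part of the authors' pipeline.

```python
from dataclasses import dataclass

N_ISOLATES = 13  # the table summarizes 13 Trichoderma isolates


@dataclass(frozen=True)
class PfamGroup:
    """One row of the PFAM table (only the columns needed here)."""
    name: str
    clusters: int
    total_genes: int
    singletons: int  # footnote b: clusters with single-copy genes only
    multiples: int   # footnote c: >1 copy in at least one species

    def genes_per_cluster(self) -> float:
        return round(self.total_genes / self.clusters, 1)

    def genes_per_species(self) -> float:
        return round(self.total_genes / N_ISOLATES, 1)

    def cs_percent(self) -> float:
        """Footnote d: percentage of clusters containing multiple gene copies."""
        return round(100 * self.multiples / self.clusters, 1)

    def is_consistent(self) -> bool:
        """Every cluster is counted either as a singleton or as a multiple."""
        return self.singletons + self.multiples == self.clusters


# Rows copied from the table above.
ROWS = [
    PfamGroup("Zn2Cys6 transcriptional regulators", 238, 2929, 84, 154),
    PfamGroup("MFS permeases", 172, 2972, 67, 105),
    PfamGroup("ankyrin-containing proteins", 106, 1524, 20, 86),
    PfamGroup("DEAD-box helicases", 42, 567, 33, 9),
]

for row in ROWS:
    print(row.name, row.genes_per_cluster(), row.genes_per_species(),
          row.cs_percent(), row.is_consistent())
```

For these rows the recomputed values reproduce the published columns, which supports reading C/S as multiples divided by clusters.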
OrthoMCL clusters shared between Trichoderma and other Sordariomycetes Fungi
Entomopat. - six species of entomopathogenic Hypocreales; Plant pat. - five species of plant-pathogenic Hypocreales; Eweb - Escovopsis weberi; Av ± sd – average ± standard deviation. For strain abbreviations, see Methods. T - Trichoderma; N - Nectriaceae. PFAM categories printed in bold are those that differ significantly (ANOVA coupled with Dunnett's post-test, P < 0.05) from Trichoderma species
Brown background: enriched in Trichoderma; blue background: less abundant in Trichoderma
Fig. 4 The structure of the Trichoderma core genome as revealed by 13 strains. Numbers of core-genome genes for which a KOG classification was obtained. The core genome comprises 7000 genes in total. Box size represents the abundance of genes within the main KOG classes (Cellular processes and signaling, green shades; Information storage and processing, violet shades; Metabolism, reddish shades; Poorly characterized, grey shade; predicted ORFs are shown in black). The numbers specify how many core-genome genes belong to each functional group
Fig. 5 The share of the Trichoderma core genome with the genomes of other fungi. Genes of the Trichoderma core genome that have orthologs in other fungi (a, functionally annotated genes; b, unknown genes). The analysis was performed with Intervene [39]. Vertical bars and numbers indicate the number of genes shared by the fungal groups marked by the circles below the graph. The horizontal bar over the fungal groups indicates the total number of genes with orthologs in Trichoderma. Hypocreales entomoparasites were represented by the genomes of Beauveria bassiana, Cordyceps militaris, Metarhizium acridum and M. robertsii; Hypocreales phytoparasites by the genomes of Fusarium graminearum, Nectria haematococca and F. verticillioides; Sordariales by the genomes of Neurospora crassa and Chaetomium globosum. Other Pezizomycotina were represented by the genomes of Cochliobolus heterostrophus, Exophiala xenobiotica, Aspergillus fumigatus, A. oryzae and Oidiodendron maius (see Methods for details)
Fig. 6 Genome evolution in Trichoderma. a Time-scaled evolutionary tree: red branches indicate gains only, blue branches losses only, and black branches both gains and losses. Numbers above the branches indicate the number of gene changes per Mya; numbers below the branches indicate the numbers of gains (red) and losses (blue). b Heat map of the Pfam domains identified in OrthoMCL clusters that were gained or lost in the course of Trichoderma evolution. Framed rectangles correspond to extant species; pale colors are used for hypothetical taxonomic units (HTUs, ancestral states). c Principal component analysis based on the number of genes per Pfam group affected by gene gain and loss in Trichoderma. Filled circles correspond to extant Trichoderma species as shown in a. Grey circles correspond to HTUs of each infrageneric group (see a and b). Bold-lined circles show the ancestral node of the respective section or clade. Circles in red/yellow and orange/blue show the ancestral nodes of the genus Trichoderma and of the SL/HV groups, respectively
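The branch annotations of Fig. 6a combine counts of gains and losses with a rate per Mya. The caption does not spell out the formula, so the Python sketch below assumes the rate is simply (gains + losses) divided by the branch duration in Mya, and maps the counts to the colour code of the legend. Both helper functions are hypothetical, the `"none"` label for a branch without changes is an invented extra, and the example numbers are made up rather than read from the figure.

```python
def changes_per_mya(gains: int, losses: int, branch_length_mya: float) -> float:
    """Assumed definition: total gene changes divided by branch duration (Mya)."""
    if branch_length_mya <= 0:
        raise ValueError("branch length must be positive")
    return (gains + losses) / branch_length_mya


def branch_colour(gains: int, losses: int) -> str:
    """Colour code of Fig. 6a; 'none' is a hypothetical label for unchanged branches."""
    if gains > 0 and losses == 0:
        return "red"    # gains only
    if losses > 0 and gains == 0:
        return "blue"   # losses only
    if gains > 0 and losses > 0:
        return "black"  # both gains and losses
    return "none"


# Hypothetical branch, not taken from Fig. 6: 120 gains and 30 losses over 5 Mya.
print(changes_per_mya(120, 30, 5.0), branch_colour(120, 30))
```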
Mating type genes in Trichoderma
| mating protein | mating protein | mating protein | mating protein | |
|---|---|---|---|---|
| MAT1-2-1 | MAT1-1-1 | MAT 1-1-2 | MAT 1-1-3 | |
|
|
| |||
|
| 1427955 | 1467528 | 1388533 | |
|
|
| |||
|
| -b | -b | OTA08401 | |
|
|
| |||
|
| 434806 | 863060 | 863056 | |
|
| OPB38549 | |||
|
| KKO 5631 | |||
|
| 60622 | |||
|
| 33998 | |||
|
| TGAM01v2_08385 | |||
|
| 64910 | 158842 | 451243 | |
|
| 12232 | 12231a | 12231a |
a annotated as one protein
b No ortholog detected by BLASTP against the NCBI database
Fig. 7 The glycosyl hydrolase (GH) inventory of Trichoderma. GHs are ordered according to their substrate (in a broader sense), which is given on the right side. GH families containing enzymes that act on different substrates (GH3, GH5, GH30) are indicated by extended numbers. GH18 chitinases are grouped into A (no binding domain), B (attached CBM1) and C (attached CBM18 and/or CBM50; see Fig. 8). Numbers indicate the number of genes belonging to the respective GH family in each genome. The cladogram on the left is based on the total number of genes per GH family (complete linkage, Euclidean distance)
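The cladogram of Fig. 7 groups genomes by their GH gene-count profiles using complete linkage on Euclidean distances. The pure-Python sketch below shows that technique from scratch: at each step it merges the two clusters whose farthest members are closest. It is an illustration only; the caption does not say which software the authors used, and in practice a library routine such as SciPy's hierarchical clustering would normally be preferred. The function names and any input profiles are hypothetical.

```python
import math
from itertools import combinations


def euclidean(a, b) -> float:
    """Euclidean distance between two equal-length count vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def complete_linkage(profiles: dict):
    """Agglomerative clustering with complete linkage.

    profiles maps a genome label to its vector of gene counts per GH family.
    Returns the merge history as (cluster_a, cluster_b, merge_height) tuples.
    """
    clusters = {name: (name,) for name in profiles}  # cluster label -> members
    merges = []
    while len(clusters) > 1:
        best = None
        for a, b in combinations(clusters, 2):
            # complete linkage: cluster distance = largest pairwise member distance
            d = max(euclidean(profiles[x], profiles[y])
                    for x in clusters[a] for y in clusters[b])
            if best is None or d < best[2]:
                best = (a, b, d)
        a, b, d = best
        clusters[f"({a},{b})"] = clusters.pop(a) + clusters.pop(b)
        merges.append((a, b, d))
    return merges
```

With made-up two-family profiles such as `{"A": [0, 0], "B": [0, 1], "C": [5, 5], "D": [5, 7]}`, A and B merge first (height 1), then C and D (height 2), and finally the two pairs at the distance between their farthest members.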
Fig. 8 Type and presence of carbohydrate-binding domains in Trichoderma. a Summary of domains: columns marked with an asterisk indicate individual domains, i.e. domains that occur as separate proteins and are not attached to another enzyme. b Patterns of CBMs in GH18 chitinases
Number and types of small secreted cysteine-rich proteins (SSCPs) in Trichoderma
| SSCPs | HFBs | Cerato-platanins | |||
|---|---|---|---|---|---|
| class II | pseudo-class I | ||||
| SL |
| 39 | 7 | 3 | |
|
| 89 | 7 | 3 | ||
|
| 50 | 7 | 3 | ||
|
| 27 | 7 | 3 | ||
| HV |
| 113 | 12 | 3 | 3 |
|
| 66 | 9 | 3 | 3 | |
|
| 44 | 10 | 2 | 3 | |
|
| 65 | 12 | 3 | 3 | |
| ST |
| 75 | 13 | 3 | 3 |
|
| 42 | 12 | 3 | 3 | |
|
| 125 | 11 | 3 | 3 | |
|
| 62 | 11 | 3 | 3 | |
Fig. 9 Numbers and clusters of orphan genes on the chromosomes of T. reesei. a Numbers and clusters; "chromosome end" denotes orphans located within 100 kb of either chromosome end, and "middle" denotes those located in the remaining region. b Number of orphan genes in clusters of varying size. The T. reesei chromosomes published by Li et al. [27] were used for this analysis
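The "chromosome end" versus "middle" split of Fig. 9a reduces to a distance test against a 100 kb window. The Python sketch below assumes 1-based inclusive gene coordinates and measures the distance from the gene to the nearer chromosome end; the caption does not state how the distance was measured, and the function names and example coordinates are hypothetical.

```python
from collections import Counter

END_WINDOW_BP = 100_000  # "within 100 kb from either chromosome end"


def orphan_region(start: int, end: int, chrom_length: int,
                  window: int = END_WINDOW_BP) -> str:
    """Classify a gene spanning [start, end] (1-based, inclusive) as in Fig. 9a."""
    if not 1 <= start <= end <= chrom_length:
        raise ValueError("gene coordinates must lie within the chromosome")
    # distance from the gene to the nearer chromosome end, in bp
    distance_to_end = min(start - 1, chrom_length - end)
    return "chromosome end" if distance_to_end <= window else "middle"


def count_regions(genes, chrom_lengths: dict) -> Counter:
    """genes: iterable of (chromosome, start, end); chrom_lengths: chromosome -> bp."""
    return Counter(orphan_region(s, e, chrom_lengths[c]) for c, s, e in genes)
```

For example, on a hypothetical 5 Mb chromosome, a gene at 50,000 to 52,000 bp falls into "chromosome end", while one at 2,000,000 to 2,001,000 bp falls into "middle".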