Aaron Weimann1,2, Yulia Trukhina1,2, Phillip B Pope3, Sebastian Ga Konietzny1,2, Alice C McHardy1,2.
Abstract
BACKGROUND: Understanding the biological mechanisms used by microorganisms for plant biomass degradation is of considerable biotechnological interest. Despite the growing number of sequenced (meta)genomes of plant biomass-degrading microbes, there is currently no technique for the systematic determination of the genomic components of this process from these data.
Year: 2013 PMID: 23414703 PMCID: PMC3585893 DOI: 10.1186/1754-6834-6-24
Source DB: PubMed Journal: Biotechnol Biofuels ISSN: 1754-6834 Impact factor: 6.040
Figure 1. Frequencies of the selected Pfam families in the individual genomes and metagenomes. The data for each entry are rescaled by the total number of Pfam domains annotated to the microbial genome or metagenome. The color scale from grey to black indicates domain families that are present in low to high amounts, respectively. White indicates absent protein domains. The signs “+” and “-” indicate whether a protein domain was chosen in the respective experiment.
Figure 2. Frequencies of selected glycoside hydrolase (GH) families and carbohydrate binding modules (CBMs) in the (meta-)genome sequences. The data for each entry are rescaled by the total number of GH and CBM domains annotated to the microbial genome or metagenome. The coloring from black to grey indicates domains that are present in high to low amounts, respectively. White indicates absent domain families (“A”, “a”, “B”, “b”, “C”, “c” as described in Table 1).
Misclassified species in the SVM analyses
Shown are species that were misclassified by the eSVMCAZY_B and the eSVMbPFAM classifiers. Contrary to previous beliefs [22], recent literature indicates, in agreement with our predictions, that T. curvata is a non-degrader. Furthermore, recent evidence supports that A. mirum is a lignocellulose degrader, which had not been described previously [23].
Accuracy of classifying microbes as lignocellulose-degraders or non-degraders
| 0.91 | 0.84 | 0.90 | 0.96 | 0.94 | 0.91 | 0.93 | 0.87 | |
| 0.86 | 0.73 | 0.81 | 0.94 | 0.90 | 0.88 | 0.88 | 0.79 | |
| 0.96 | 0.96 | 0.98 | 0.98 | 0.98 | 0.95 | 0.98 | 0.95 | |
L1-regularized SVMs were trained with Pfam domain or CAZy family (meta-)genome annotations. Capital letters denote classifiers trained on the presence or absence of CAZy families, and small letters denote classifiers trained on the relative abundances of CAZy families in the annotations. The abbreviations “A”, “a”, “B”, “b”, “C”, “c” denote the following: classifiers “A”, “a” were trained with annotations of all CAZy families for 16 microbial genomes; classifiers “B”, “b” were trained with annotations for all CAZy families, except for the GT family members (which were not annotated for the Tammar Wallaby metagenome), for 16 genomes and the TW metagenome of plant biomass degraders; classifiers “C”, “c” were trained with annotations for the GH families and CBMs for the 16 microbial genomes and three metagenomes of plant biomass degraders, as only these were annotated for the metagenomes. All CAZy-based classifiers were trained with available annotations for 64 genomes of non-biomass degraders. The Pfam-based classifiers were trained with 21 (meta-)genomes of biomass degraders and 82 microbial genomes of non-degraders. For more details on the experimental set-up and the evaluation measures shown, see the Methods section on performance evaluation.
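The two feature encodings distinguished above (presence/absence for the capital-letter classifiers, relative abundance for the small-letter ones) can be illustrated with a minimal sketch. The function names, the example family list, and the counts are hypothetical and not taken from the study; normalizing by the total count over the listed families is an assumption about how "relative abundance" is computed.

```python
from typing import Dict, List


def encode_presence(counts: Dict[str, int], families: List[str]) -> List[int]:
    """Binary encoding: 1 if a family is annotated at least once, else 0."""
    return [1 if counts.get(family, 0) > 0 else 0 for family in families]


def encode_relative_abundance(counts: Dict[str, int], families: List[str]) -> List[float]:
    """Abundance encoding: each family's count divided by the total count
    over the listed families (assumed normalization)."""
    total = sum(counts.get(family, 0) for family in families)
    if total == 0:
        # No annotated family at all: return an all-zero vector.
        return [0.0] * len(families)
    return [counts.get(family, 0) / total for family in families]


# Hypothetical annotation counts for one genome (illustrative values only).
families = ["GH5", "GH9", "GH10", "CBM6"]
genome_counts = {"GH5": 3, "GH10": 1}

presence_vector = encode_presence(genome_counts, families)
abundance_vector = encode_relative_abundance(genome_counts, families)
```

Either vector could then serve as one row of the training matrix for a sparse linear classifier; the fixed `families` list keeps the feature order identical across genomes.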
Prediction of the plant biomass degradation capabilities for 15 draft genomes
| eSVMCAZY_B | ++ | ++ | ++ | + | ++ | ++ | 0 | -- | -- | -- | -- | -- | -- | -- | -- |
| eSVMbPFAM | ++ | ++ | ++ | ++ | ++ | - | ++ | + | -- | - | -- | -- | -- | - | -- |
| CMC | GH5 (TW-33) | GH5 (TW-40) | GH10 (TW-34) | GH5 (TW-39) | | | | | | | | | | | |
| GH26 (TW-10) | |||||||||||||||
| GH10 (TW-8) | |||||||||||||||
| GH5 (MH-2) | |||||||||||||||
| XYL | | GH10 (TW-25) | GH10 (TW-30) | GH10 (TW-8) | | | | | | | | | | | |
| GH10 (TW-31) | |||||||||||||||
| GH10 (TW-37) | |||||||||||||||
| SWG | | GH5 (TW-40) | | | | | | | | | | | | | |
| GH5 (MH-2) | |||||||||||||||
| MIS | GH9 (TW-64) | GH5 (TW-40) | | GH5 (TW-39) | | | | | | | | | | | |
| GH5 (MH-2) | |||||||||||||||
| GH9 (TW-50) | |||||||||||||||
| AVI | GH9 (TW-64) | GH5 (TW-40) | | GH5 (TW-39) | | | | | | | | | | | |
| GH5 (MH-2) | |||||||||||||||
| GH9 (TW-50) | |||||||||||||||
| LIC | GH5 (TW-40) | GH5 (TW-39) | |||||||||||||
| GH5 (MH-2) | |||||||||||||||
| GH9 (TW-50) |
Genome reconstructions from the metagenome of a microbial community adherent to switchgrass in the cow rumen were obtained by taxonomic binning of assembled sequences in the original study. Symbols depict the prediction outcome of a voting committee of the 5 eSVMCAZY_B and the eSVMbPFAM classifiers with the best macro-accuracy (see text for the description of the classifiers). ++: genome classified as plant biomass degrader by all classifiers; +: genome classified as plant biomass degrader by 4 out of 5 classifiers; 0: ambiguous prediction; -: genome classified as not plant biomass degrader by 4 out of 5 classifiers; --: genome classified as not plant biomass degrader by all classifiers. For every draft genome, the presence of genes encoding glycoside hydrolases with verified enzymatic activity for different substrates in this study [14] is indicated. The genome and substrate names correspond to those of Figure 3 and Table S6 of the study.
Hydrolytic activity detected on:
(CMC) 1% (w/v) carboxymethyl cellulose agar.
(XYL) 1% (w/v) Xylan.
(SWG) 1% (w/v) IL-Switchgrass.
(MIS) 1% (w/v) IL-Miscanthus.
(AVI) 1% (w/v) IL-Avicel.
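The voting scheme described in the table note (a committee of 5 classifiers mapped to the symbols ++, +, 0, -, --) can be sketched as follows. The function name is hypothetical, and treating every 2-to-3 or 3-to-2 split as the ambiguous case "0" is an assumption, since the note only defines the unanimous and 4-out-of-5 outcomes explicitly.

```python
from typing import List


def committee_symbol(votes: List[bool]) -> str:
    """Map the votes of a 5-member committee to a prediction symbol.

    Each vote is True if that classifier labels the genome a plant
    biomass degrader, False otherwise.
    """
    if len(votes) != 5:
        raise ValueError("expected exactly 5 votes")
    positives = sum(votes)
    if positives == 5:
        return "++"  # degrader according to all classifiers
    if positives == 4:
        return "+"   # degrader according to 4 out of 5 classifiers
    if positives == 1:
        return "-"   # non-degrader according to 4 out of 5 classifiers
    if positives == 0:
        return "--"  # non-degrader according to all classifiers
    return "0"       # assumed: any 2:3 or 3:2 split counts as ambiguous


# Hypothetical committee output for a single draft genome.
example = committee_symbol([True, True, True, True, False])
```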