dbCAN2: a meta server for automated carbohydrate-active enzyme annotation.

Han Zhang¹, Tanner Yohe², Le Huang¹, Sarah Entwistle², Peizhi Wu¹, Zhenglu Yang¹, Peter K Busk³, Ying Xu⁴, Yanbin Yin².

Abstract

Complex carbohydrates of plants are the main food sources of animals and microbes, and serve as promising renewable feedstock for biofuel and biomaterial production. Carbohydrate active enzymes (CAZymes) are the most important enzymes for complex carbohydrate metabolism. With an increasing number of plant and plant-associated microbial genomes and metagenomes being sequenced, there is an urgent need for automatic tools for genomic data mining of CAZymes. We developed the dbCAN web server in 2012 to provide a public service for automated CAZyme annotation of newly sequenced genomes. Here, dbCAN2 (http://cys.bios.niu.edu/dbCAN2) is presented as an updated meta server, which integrates three state-of-the-art tools for CAZome (all CAZymes of a genome) annotation: (i) HMMER search against the dbCAN HMM (hidden Markov model) database; (ii) DIAMOND search against the CAZy pre-annotated CAZyme sequence database and (iii) Hotpep search against the conserved CAZyme short peptide database. Combining the three outputs and removing CAZymes found by only one tool can significantly improve the CAZome annotation accuracy. In addition, dbCAN2 now also accepts nucleotide sequence submission, and offers a service to predict physically linked CAZyme gene clusters (CGCs), which will be a very useful online tool for identifying putative polysaccharide utilization loci (PULs) in microbial genomes or metagenomes.

Year: 2018 PMID: 29771380 PMCID: PMC6031026 DOI: 10.1093/nar/gky418

Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971

INTRODUCTION

Importance of complex carbohydrates

Carbohydrates are one of the four major classes of large biopolymers found in all cells together with nucleic acids, proteins, and lipids. Carbohydrates include monosaccharides, oligosaccharides, and polysaccharides. Hybrid biopolymers with carbohydrates covalently linked to other biopolymers, such as glycoproteins and glycolipids, are called glycoconjugates. Complex carbohydrates and glycoconjugates are synthesized, degraded, and modified by carbohydrate active enzymes (CAZymes) in all organisms (1). Particularly, plants use photosynthesis to convert carbon dioxide and water into sugars, which are further turned into carbohydrates such as starches and celluloses with the help of CAZymes. Therefore, CAZymes are vitally important for plants and plant-associated animals and microbes, and not surprisingly CAZyme genes are particularly abundant in genomes of plants and plant-degrading microbes (2,3).

Importance of CAZymes

In addition to their significance in the bioenergy and agricultural industries (4), CAZymes are also extremely important for human health (5). This is because humans and other animals depend on bacteria living in their digestive tracts to degrade various indigestible carbohydrates and salvage nutrients (6). It has been shown that the genomes of animal gut bacteria encode hundreds of carbohydrate-degrading GH (glycoside hydrolase) genes, in contrast to only 17 digestive GH genes encoded in the human genome (7). Recent research has suggested that altering the dietary carbohydrate composition has a profound impact on the gut microbiota structure, which in turn influences human health (8,9).

CAZy database

Since the 1990s, over 360 CAZyme families have been defined and classified by the CAZy database (10), forming six major classes: glycosyltransferases [GTs], glycoside hydrolases [GHs], polysaccharide lyases [PLs], carbohydrate esterases [CEs], carbohydrate-binding modules [CBMs] and enzymes for the auxiliary activities [AAs]. CAZy also assigns GenBank proteins to CAZyme families, and these CAZy pre-annotated proteins are the foundation for sequence similarity-based CAZyme annotation.

Methods for CAZyme annotation

Owing to the importance of CAZymes, newly sequenced genomes are often analyzed for putative CAZymes (collectively named the CAZome). Two approaches to CAZome annotation exist in the literature: (A) Users contact the CAZy database for collaboration, and the CAZy team performs semi-automatic CAZome annotation for the users (11); as expert manual curation is involved, CAZy annotation is regarded as the gold standard method. (B) Users run automatic tools such as HMMER (12) or BLAST (13) by themselves for CAZome annotation on their own computers or on the web (see below). Before 2012, BLAST was often used to search against CAZy pre-annotated proteins on users’ own computers. In 2010, CAT (CAZyme Analysis Toolkit) was developed as a web server, which allows users to run both BLAST and HMMER searches remotely on the CAT web server (14). The HMMER search is run against Pfam HMMs (hidden Markov models) that are associated with CAZy pre-annotated CAZymes. In 2012, we developed dbCAN, a database of HMMs for CAZyme family-specific signature domains (4). Different from CAT, for each CAZyme family we retrieved its signature domains from CAZy pre-annotated members, by searching against the CDD (conserved domain database of NCBI) and by manual literature curation; we then built our own HMMs for most CAZyme families instead of using Pfam HMMs. We update dbCAN almost once a year, by creating HMMs for CAZyme families and subfamilies newly created in the CAZy database (Figure 1). Users can download our HMMs and run HMMER locally for automated CAZome annotation. We also provide a Perl script to help parse the HMMER output, which returns CAZyme signature domains, their boundaries, E-values, and HMM domain coverage. Such domain-based annotation is particularly useful for CAZymes, as they tend to be modular proteins with multiple CAZyme domains and sometimes domain repeats (e.g. multiple CBMs of the same family).
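The kind of filtering such a parsing step performs can be sketched in Python as follows. This is a minimal illustration, not the Perl script distributed with dbCAN: the `DomainHit` record, the `filter_hits` and `architecture` helpers and the example hits are all hypothetical, coverage is assumed to mean the fraction of the HMM spanned by the match, and the default cut-offs are simply the web-server thresholds quoted later in this paper.

```python
from dataclasses import dataclass


@dataclass
class DomainHit:
    """One HMM-versus-protein domain match (hypothetical record layout)."""
    protein_id: str
    family: str      # CAZyme family of the HMM, e.g. "GT5" or "CBM53"
    hmm_length: int  # total number of positions in the family HMM
    hmm_start: int   # first HMM position covered by the match (1-based)
    hmm_end: int     # last HMM position covered by the match (inclusive)
    seq_start: int   # domain start on the protein
    seq_end: int     # domain end on the protein
    evalue: float

    @property
    def coverage(self) -> float:
        # Assumed definition: fraction of HMM positions spanned by the match.
        return (self.hmm_end - self.hmm_start + 1) / self.hmm_length


def filter_hits(hits, max_evalue=1e-15, min_coverage=0.35):
    """Keep hits passing both cut-offs, ordered by protein and position."""
    kept = [h for h in hits
            if h.evalue < max_evalue and h.coverage > min_coverage]
    return sorted(kept, key=lambda h: (h.protein_id, h.seq_start))


def architecture(hits):
    """Render the hits of one protein as 'FAM(start-end) + FAM(start-end)'."""
    ordered = sorted(hits, key=lambda h: h.seq_start)
    return " + ".join(f"{h.family}({h.seq_start}-{h.seq_end})" for h in ordered)


# Invented example hits for a single protein (not real search output).
example_hits = [
    DomainHit("demo_protein", "GT5", 400, 10, 390, 200, 640, 1e-80),
    DomainHit("demo_protein", "CBM53", 90, 1, 85, 50, 140, 1e-20),
    DomainHit("demo_protein", "CBM53", 90, 1, 20, 150, 170, 1e-20),  # low coverage
    DomainHit("demo_protein", "GH9", 400, 1, 400, 700, 1000, 1e-5),  # weak E-value
]
kept = filter_hits(example_hits)
print(architecture(kept))
```

Because every retained hit keeps its own coordinates, repeated domains of the same family stay separate in the output instead of collapsing into a single family label.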

Figure 1.

dbCAN is updated every year and now has 575 HMMs. X-axis: year; Y-axis: number of HMMs of families (blue) and subfamilies (red).

To help users who do not have programming experience, we also developed a web server that allows users to submit protein sequences and run HMMER on our server to identify CAZymes. With the CAT website no longer maintained since 2013 and eventually obsolete in 2017, dbCAN has become the only web server that is still actively updated and offers an online CAZyme annotation service. In 2017, a new tool named Hotpep (15) was introduced, which annotates CAZymes by searching against the PPR (peptide pattern recognition) library for conserved short peptide motifs (16) present in different CAZyme families. In the PPR library, each CAZyme family has a set of 6-mer peptides that are conserved in that family, and Hotpep scans new proteins for the presence of these peptides in order to assign the query proteins to existing CAZyme families.
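The idea of scanning a protein for family-specific 6-mers can be illustrated with a short Python sketch. It is a simplified illustration under stated assumptions, not the Hotpep implementation: the `scan_peptides` function, the toy `library` and the per-peptide frequency weights are all hypothetical, and the real scoring may differ.

```python
def scan_peptides(sequence, family_peptides, k=6):
    """Count conserved k-mers of each family that occur in a protein.

    family_peptides maps a family name to a dict of {peptide: frequency},
    where frequency is an assumed per-peptide weight.
    Returns {family: (number_of_hits, sum_of_frequencies)} for families
    with at least one matching peptide.
    """
    # All distinct k-mers present in the query protein.
    kmers = {sequence[i:i + k] for i in range(len(sequence) - k + 1)}
    results = {}
    for family, peptides in family_peptides.items():
        matched = peptides.keys() & kmers
        if matched:
            results[family] = (len(matched), sum(peptides[p] for p in matched))
    return results


# Toy peptide library and query sequence (invented for illustration).
library = {
    "GH5": {"ACDEFG": 0.5, "CDEFGH": 0.25, "YYYYYY": 0.5},
    "GT2": {"WWWWWW": 0.75},
}
hits = scan_peptides("MACDEFGHK", library)
print(hits)
```

A query would then be assigned to a family only if its hit count and frequency sum exceed chosen cut-offs, which are left to the caller here.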

Importance of automated CAZyme annotation

It should be mentioned that approach B is actually also included in approach A, but can be fully automated and carried out by the users themselves. Benchmarking automated CAZyme annotation against CAZy pre-annotated CAZomes typically found >90% accuracy for model bacterial genomes (3). Clearly, as more and more genomes and metagenomes become available, such automated CAZome annotation has a clear advantage over annotation by CAZy through collaboration, in that users can quickly obtain the candidate CAZyme gene list by themselves as part of their bioinformatics pipeline for genome annotation. Indeed, the popularity of automated CAZome annotation is reflected in the citations of the two approaches. Specifically, ∼100 papers have been published since 2012 with CAZomes annotated by collaboration with CAZy (according to http://www.cazy.org/Genomes.html). In comparison, more than 300 papers have been published since 2012 using dbCAN for CAZome annotation (according to Google Scholar: https://scholar.google.com/scholar?cites=5112424923296812233, only counting papers that used the tool for finding CAZymes), and more than 100 papers have been published since 2012 using CAT for CAZome annotation (according to Google Scholar: https://scholar.google.com/scholar?cites=12948408578800903520, also only counting papers that used the tool for finding CAZymes). Lastly, the availability of dbCAN HMMs has also enabled other bioinformatics tools to incorporate a CAZyme annotation step into their data analysis workflows, e.g., MOCAT2 (17), DemaDb (18), proGenomes (19) and SACCHARIS (20).

NEW FUNCTIONS AND UPDATES

Figure 2 shows the overall design of dbCAN2, an updated meta server of dbCAN server, which has the following new functions: (i) allows submission of DNA sequences in addition to protein sequences; (ii) integrates three state-of-the-art tools/databases for automated CAZyme annotation; (iii) can identify transcription factors (TFs), transporters (TCs), and further CAZyme gene clusters (CGCs) using CGC-Finder (3); (iv) combines the results from the three tools, allows visualization as a Venn diagram and detailed results as graphs, and offers an easy solution to download results as text files.

Figure 2.

Overall design of dbCAN2 meta server. GCPU (gene cluster plot utility) and CGC-Finder (CAZyme gene cluster finder) are two tools developed for dbCAN2.

DNA sequence submission

In addition to protein submission, dbCAN2 now also accepts nucleotide sequences, e.g. the complete or draft genomes and metagenomes of prokaryotes. Protein sequences are predicted by calling Prodigal (21) if the query is a genome, or FragGeneScan (22) if the query consists of short DNAs from metagenomes, mRNAs or coding sequences of proteins. As eukaryotic gene prediction is more complex and often needs additional input data (e.g. transcriptome data), users should perform gene prediction for eukaryotic genomes elsewhere and only submit protein sequences to dbCAN2.

Meta server of three tools/databases

The dbCAN web server (http://csbl.bmb.uga.edu/dbCAN/) currently provides HMMER search against the dbCAN HMM database, and also DIAMOND (23) search against the CAZy pre-annotated CAZyme sequence database. However, the results from the two tools are presented on two separate pages and not integrated at any level. In dbCAN2, we have added a third tool: Hotpep search against the PPR short peptide library. We have also systematically compared the outputs of the three tools against the CAZy pre-annotated CAZomes (i.e. as the gold standard sets) of three bacterial genomes and three eukaryotic genomes (Supplementary Table S1), in order to: (i) find the best parsing thresholds (e.g. E-value) for each tool, (ii) evaluate the annotation performance of the three tools and (iii) find the best way to aggregate the three outputs to achieve the best annotation performance. The accuracy is calculated as an F-score = 2 × (Recall × Precision)/(Recall + Precision) for the three tools on each examined genome, following the method presented in our previous papers (2,3). We removed unclassified CAZymes (e.g. GH0) and families not in the PPR library when calculating F-scores. Supplementary Table S1 presents the best parsing thresholds that we selected to use for the web server: (i) for HMMER+dbCAN, we use E-value <1e-15 and coverage >0.35; (ii) for DIAMOND+CAZy, we use E-value <1e-102 and (iii) for Hotpep+PPR, we use the number of conserved peptide hits >6 and the sum of conserved peptide frequencies >2.6. Table 1 shows that DIAMOND+CAZy has the highest F-score (0.89) for bacteria but the lowest F-score for eukaryotes (0.84); in contrast, Hotpep+PPR has the highest F-score (0.94) for eukaryotes but the lowest F-score for bacteria (0.80). HMMER+dbCAN performs very well for both eukaryotes (0.86) and bacteria (0.88) and has a slightly higher overall F-score than the other two tools (Supplementary Table S1). In terms of running time, DIAMOND runs the fastest, followed by Hotpep and HMMER.
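The F-score defined above can be computed from a predicted protein set and a gold-standard set as in the following Python sketch. The `f_score` function name and the toy protein IDs are illustrative assumptions; precision and recall follow their standard definitions (true positives over predictions, and true positives over gold-standard members).

```python
def f_score(predicted, gold):
    """F-score = 2 * (Recall * Precision) / (Recall + Precision)."""
    predicted, gold = set(predicted), set(gold)
    true_positives = len(predicted & gold)
    if true_positives == 0:
        return 0.0  # also avoids division by zero for empty inputs
    precision = true_positives / len(predicted)
    recall = true_positives / len(gold)
    return 2 * (recall * precision) / (recall + precision)


# Toy example: 10 gold-standard CAZymes; the tool reports 8 of them
# plus 2 proteins that are not in the gold standard.
gold_set = {f"p{i}" for i in range(1, 11)}
predicted_set = {f"p{i}" for i in range(1, 9)} | {"x1", "x2"}
score = f_score(predicted_set, gold_set)
print(round(score, 2))
```

In this toy case precision and recall are both 8/10, so the F-score is also 0.8.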

Table 1.

Comparison of tools for automated CAZyme annotation

	Accuracy (F-score)
Tools + databases	Bacteria	Eukaryotes	Subfamily	Multi-family proteins	Domain repeats	Domain positions	Speed^c
HMMER+dbCAN	0.88	0.86	Yes^a	Yes	Yes	Yes	69
DIAMOND+CAZy	0.89	0.84	Yes^a	No	No	No	4
Hotpep+PPR	0.80	0.94	Yes^b	Yes	No	No	7
Predicted by ≥2 tools	0.93	0.92

a Twenty-four CAZyme families are classified into 207 subfamilies by phylogenetic clustering and CAZy expert curation (10).

b Three hundred and forty-two CAZyme families are classified into 7036 groups by PPR (15,16).

c The time is in seconds and calculated on the Escherichia coli K-12 MG1655 proteome (4140 proteins). The detailed calculations on accuracy and speed are available in Supplementary Table S1. No correspondence has been established between PPR groups and CAZy subfamilies, and in the dbCAN web server we only report CAZy subfamily annotation, whenever it is available.

More importantly, we found that the best performance of automated CAZyme annotation is achieved by aggregating the outputs of the three tools and keeping candidates found by at least two tools. Table 1 shows that the F-score can be increased to 0.93 for bacteria and 0.92 for eukaryotes when keeping proteins found by at least two tools. However, the above F-score calculation only considered whether a protein is found by any of the three tools. When considering whether a protein is assigned to the correct family or families, we found that the F-scores for all three tools dropped slightly (Supplementary Table S2), with Hotpep+PPR dropping the most (to 0.86 for eukaryotes and 0.70 for bacteria) and HMMER+dbCAN dropping the least (to 0.85 for eukaryotes and 0.82 for bacteria). Additionally, proteins can have multiple CAZyme domains, and it is also interesting to know where the domain boundaries are. Figure 3 shows two example CAZyme proteins found by all three tools. Both proteins have multiple CAZyme domains according to dbCAN annotation (Figure 3A). According to the HMMER+dbCAN output (Figure 3C), AT1G11720.1 is annotated as CBM53(154–237) + CBM53(329–423) + CBM53(496–584) + GT5(595–1038) and YP_002573728.1 as GH9(36–466) + CBM3(491–576) + CBM3(724–804) + CBM3(923–1003) + GH48(1134–1753), i.e. all the CAZyme domains and domain repeats and their positions are reported (Table 1). However, according to both Hotpep+PPR and DIAMOND+CAZy, AT1G11720.1 is annotated as GT5 + CBM53 and YP_002573728.1 as GH9 + GH48 + CBM3, i.e. proteins are assigned to the multiple families correctly, though without reporting domain repeats and positions (Table 1).
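The rule of keeping only candidates found by at least two tools amounts to a simple vote count, sketched below in Python. The `consensus` function and the toy protein IDs are hypothetical and only illustrate the aggregation step, not the server's actual code.

```python
from collections import Counter


def consensus(tool_predictions, min_tools=2):
    """Return (kept_proteins, votes) for predictions from several tools.

    tool_predictions maps a tool name to an iterable of protein IDs it
    reported as CAZymes; a protein is kept if at least min_tools tools
    reported it.
    """
    votes = Counter()
    for proteins in tool_predictions.values():
        votes.update(set(proteins))  # each tool votes once per protein
    kept = {protein for protein, n in votes.items() if n >= min_tools}
    return kept, votes


# Toy predictions (invented IDs).
predictions = {
    "HMMER": ["p1", "p2", "p3"],
    "DIAMOND": ["p1", "p2", "p4"],
    "Hotpep": ["p1", "p5"],
}
kept, votes = consensus(predictions)
print(sorted(kept))
```

The per-protein vote count is returned as well, since it corresponds to the "number of tools" value a user would sort or filter on.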

Figure 3.

Comparison of annotation results for multi-domain CAZymes using three different tools. (A) Two example proteins (AT1G11720.1 and YP_002573728.1) are illustrated with their CAZyme domain architecture based on dbCAN search. (B) DIAMOND search result for the two proteins showing the best CAZy protein hit; (C) HMMER search result against dbCAN HMM database, from which (A) is derived; (D) Hotpep search result against PPR library; Frequency means the sum of conserved peptide frequencies and Hits means the number of conserved peptide hits (15).

It should be mentioned that DIAMOND+CAZy has a much higher risk than the other two tools of giving a wrong CAZyme family annotation. For example, if a query protein only has a GT5 domain and has AAD30251.1 as its best CAZy hit, transferring the family assignment of AAD30251.1 (GT5 + CBM53) to the query would be wrong (as there is no CBM53 in the query). However, such mistakes will not happen in HMMER and Hotpep searches, as they are conserved domain-based and motif-based methods, respectively.

CAZyme gene clusters (CGCs)

Another important new function of dbCAN2 is that it allows identification of CGCs, when the genomic locations of all genes of the query genome are given. In the literature, CGCs are also known as polysaccharide utilization loci (PULs), which are defined as physically linked genes specializing in the degradation of various complex carbohydrates (24). Most experimentally characterized PULs are found in Bacteroidetes genomes (25), but PULs have also been reported in Proteobacteria and Firmicutes from various carbohydrate-rich environments (26). The PULDB of CAZy initially focused on susCD (starch utilization system C and D transporters) associated PULs, and more recently expanded to present CAZyme clusters (three or more CAZyme genes clustered in the genome) on its website (25). However, PULDB focuses on Bacteroidetes genomes and does not allow online genome submissions for PUL predictions. Recently, we defined CGCs as a more general term for PULs (3), which must contain three classes of signature genes: at least one CAZyme gene, one transporter (TC) gene, and one transcription factor (TF) gene. Between two adjacent signature genes, a certain number of non-signature genes can be inserted. We have developed a Python program (CGC-Finder) that can automatically identify CGCs (3). In the dbCAN2 job submission page, we provide the ‘Find CAZyme gene clusters’ option. When users submit a protein query file, they must also provide a gene position file in order to predict CGCs. This gene position file is not required if users submit a nucleotide query file, because the gene prediction programs can generate the gene position file internally. With protein sequences, our server will predict TFs and TCs by DIAMOND search against TF and TC databases (explained in (3)), and then CGC-Finder will be called to locate genes of CAZymes, TFs and TCs in the genome, and identify CGCs.
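The CGC definition above (signature genes separated by a limited number of non-signature genes) can be sketched in Python as follows. This is a simplified illustration, not the CGC-Finder program: the `find_clusters` function, its input format (genes of one contig in genomic order, each labelled "CAZyme", "TC", "TF" or None) and the toy gene list are assumptions, and the default gap of 2 is taken from the web-server default described later in this paper.

```python
def find_clusters(genes, max_gap=2, required=("CAZyme", "TC", "TF")):
    """Group signature genes into candidate clusters on one contig.

    genes: list of (gene_id, kind) in genomic order, where kind is
    "CAZyme", "TC", "TF" or None for a non-signature gene.
    Adjacent signature genes stay in the same run when at most max_gap
    non-signature genes lie between them. A run is reported only if it
    contains every class listed in required.
    """
    signature_idx = [i for i, (_, kind) in enumerate(genes) if kind is not None]
    runs, current = [], []
    for i in signature_idx:
        # Number of non-signature genes since the previous signature gene.
        if current and i - current[-1] - 1 > max_gap:
            runs.append(current)
            current = []
        current.append(i)
    if current:
        runs.append(current)

    clusters = []
    for run in runs:
        kinds = {genes[i][1] for i in run}
        if set(required) <= kinds:
            # Report every gene from the first to the last signature gene.
            clusters.append([gid for gid, _ in genes[run[0]:run[-1] + 1]])
    return clusters


# Toy contig (invented gene IDs and labels).
contig = [
    ("g1", None), ("g2", "CAZyme"), ("g3", None), ("g4", "TC"),
    ("g5", None), ("g6", None), ("g7", "TF"), ("g8", None),
    ("g9", None), ("g10", None), ("g11", "CAZyme"), ("g12", "TC"),
]
print(find_clusters(contig))
print(find_clusters(contig, required=("CAZyme", "TC")))
```

Keeping the required classes as a parameter makes it easy to compare the strict three-class definition with a looser CAZyme-plus-transporter setting.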

Web design

For the job submission page, we have options to allow users to specify whether they would: (i) use one of the three tools or all three tools for CAZyme annotation; (ii) use protein or nucleotide sequences as input; (iii) use CGC-Finder to predict CGCs. As shown in Figure 2, if nucleotide sequences are submitted, gene prediction programs will first be called to predict protein-coding genes and then the protein sequences will be used for CAZyme annotation. If the CGC-Finder option is selected, TFs and TCs will also be predicted and the gene location file will be used to predict CGCs. For the result page (Figure 4), five tabs are shown, each with a data table: (i) HMMER result table; (ii) DIAMOND result table; (iii) Hotpep result table; (iv) Overview table; (v) CGC-Finder table. Above the tabs, a Venn diagram is shown to illustrate the overlaps among the outputs of the three tools (Figure 4A). Clicking on any number in the diagram will open a pop-out window displaying the protein IDs in that region.

Figure 4.

Screenshots of dbCAN2 result pages. (A) Venn diagram to show overlaps among the results of the three tools; (B) CGC-Finder result tab; (C) Overview tab combining results from the three tools and SignalP; (D) genomic location plot of an example CGC (signature genes are in red, green and blue colors, while non-signature genes are in gray); (E) detailed information of an example CGC.

The Overview tab combines the results of the three CAZyme annotation tools plus the SignalP (27) prediction result (Figure 4C). The number of tools that find a CAZyme protein is also shown as a column, in addition to the CAZyme family assignment (for DIAMOND and Hotpep) and domain assignment (for HMMER). Users can sort the table according to the number of tools column and easily filter out proteins found by only one tool to get the most accurate CAZyme list. The CGC-Finder tab presents the CGCs identified in the query genome/proteome, with columns such as the genomic locations of the CGC and the three classes of signature genes in the CGCs (Figure 4B). The default parameters in running CGC-Finder include: (i) at least one CAZyme gene and one TC gene and (ii) the number of non-signature genes that are allowed to be inserted between two adjacent signature genes is ≤2. The two parameters can be changed underneath the CGC table to rerun CGC-Finder, and then the CGC-Finder tab will be updated to display the new CGC list. Clicking on each CGC opens a new page showing the CGC genomic context plot using GCPU (gene cluster plotting utility), a Python script we developed to plot the genes in the CGCs as arrows in different colors (Figure 4D). Below the plot is a table (Figure 4E), which shows the detailed genomic location of each member gene in the CGC, including the distance of a signature gene from its upstream signature gene (Upstream distance) and the distance from its downstream signature gene (Downstream distance), as well as their best DIAMOND hits in the CAZy, TF and TC databases. In all five tabs and the individual CGC page, links to tab-delimited plain text files are provided for users to conveniently download and open on their local computers in an Excel spreadsheet for further analysis. The Venn diagram and the CGC plot can also be downloaded as image files (e.g. SVG and PDF) and further edited by the users using Illustrator. Lastly, we also provide a web page for each CAZyme protein to plot its dbCAN domains and PPR conserved peptides in the sequence. We also allow users to download a master script to run all tools as well as the CGC-Finder program on their local computers.

CONCLUSIONS

dbCAN2 is a web server for automated carbohydrate-active enzyme annotation. It is an updated version of the original dbCAN web server, and has the following new features. First, dbCAN2 allows submission of nucleotide sequences, i.e. genomic sequences of prokaryotic draft genomes and metagenomes. Second, dbCAN2 integrates three state-of-the-art tools/databases for automated CAZyme annotation: (i) HMMER for determining CAZyme domain boundaries according to the dbCAN CAZyme domain HMM database; (ii) DIAMOND for fast BLAST-like hits in the CAZy database; (iii) Hotpep for short conserved motifs in the PPR library. Third, dbCAN2 can also identify transcription factors (TFs), transporters (TCs), and further CAZyme gene clusters (CGCs) using CGC-Finder if users submit protein sequences plus a gene location file or a genomic DNA sequence file. Fourth, dbCAN2 combines the results from the three tools and allows visualization of the overlaps as a Venn diagram and of the detailed results as graphs. The dbCAN2 meta server will be updated once a year to use the latest CAZy database, dbCAN HMM database and Hotpep peptide database.

25 in total

1. SignalP 4.0: discriminating signal peptides from transmembrane regions.

Authors: Thomas Nordahl Petersen; Søren Brunak; Gunnar von Heijne; Henrik Nielsen
Journal: Nat Methods Date: 2011-09-29 Impact factor: 28.547

Review 2. Polysaccharide Degradation by the Intestinal Microbiota and Its Influence on Human Health and Disease.

Authors: Darrell W Cockburn; Nicole M Koropatkin
Journal: J Mol Biol Date: 2016-07-06 Impact factor: 5.469

3. Functional genomic and metabolic studies of the adaptations of a prominent adult human gut symbiont, Bacteroides thetaiotaomicron, to the suckling period.

Authors: Magnus K Bjursell; Eric C Martens; Jeffrey I Gordon
Journal: J Biol Chem Date: 2006-09-12 Impact factor: 5.157

4. CAZymes Analysis Toolkit (CAT): web service for searching and analyzing carbohydrate-active enzymes in a newly sequenced organism using CAZy database.

Authors: Byung H Park; Tatiana V Karpinets; Mustafa H Syed; Michael R Leuze; Edward C Uberbacher
Journal: Glycobiology Date: 2010-08-09 Impact factor: 4.313

5. FragGeneScan: predicting genes in short and error-prone reads.

Authors: Mina Rho; Haixu Tang; Yuzhen Ye
Journal: Nucleic Acids Res Date: 2010-08-30 Impact factor: 16.971

6. dbCAN: a web resource for automated carbohydrate-active enzyme annotation.

Authors: Yanbin Yin; Xizeng Mao; Jincai Yang; Xin Chen; Fenglou Mao; Ying Xu
Journal: Nucleic Acids Res Date: 2012-05-29 Impact factor: 16.971

7. MOCAT2: a metagenomic assembly, annotation and profiling framework.

Authors: Jens Roat Kultima; Luis Pedro Coelho; Kristoffer Forslund; Jaime Huerta-Cepas; Simone S Li; Marja Driessen; Anita Yvonne Voigt; Georg Zeller; Shinichi Sunagawa; Peer Bork
Journal: Bioinformatics Date: 2016-04-08 Impact factor: 6.937

8. PULDB: the expanded database of Polysaccharide Utilization Loci.

Authors: Nicolas Terrapon; Vincent Lombard; Élodie Drula; Pascal Lapébie; Saad Al-Masaudi; Harry J Gilbert; Bernard Henrissat
Journal: Nucleic Acids Res Date: 2018-01-04 Impact factor: 16.971

9. PlantCAZyme: a database for plant carbohydrate-active enzymes.

Authors: Alexander Ekstrom; Rahil Taujale; Nathan McGinn; Yanbin Yin
Journal: Database (Oxford) Date: 2014-08-14 Impact factor: 3.451

10. SACCHARIS: an automated pipeline to streamline discovery of carbohydrate active enzyme activities within polyspecific families and de novo sequence datasets.

Authors: Darryl R Jones; Dallas Thomas; Nicholas Alger; Ata Ghavidel; G Douglas Inglis; D Wade Abbott
Journal: Biotechnol Biofuels Date: 2018-02-05 Impact factor: 6.040
