
BESC knowledgebase public portal.

Mustafa H Syed, Tatiana V Karpinets, Morey Parang, Michael R Leuze, Byung H Park, Doug Hyatt, Steven D Brown, Steve Moulton, Michael D Galloway, Edward C Uberbacher.

Abstract

The BioEnergy Science Center (BESC) is undertaking large experimental campaigns to understand the biosynthesis and biodegradation of biomass and to develop biofuel solutions. BESC is generating large volumes of diverse data, including genome sequences, omics data and assay results. The purpose of the BESC Knowledgebase is to serve as a centralized repository for experimentally generated data and to provide an integrated, interactive and user-friendly analysis framework. The Portal makes available tools for visualization, integration and analysis of data either produced by BESC or obtained from external resources. AVAILABILITY: http://besckb.ornl.gov.

Year:  2012        PMID: 22238270      PMCID: PMC3289919          DOI: 10.1093/bioinformatics/bts016

Source DB:  PubMed          Journal:  Bioinformatics        ISSN: 1367-4803            Impact factor:   6.937


1 INTRODUCTION

The United States Department of Energy (DOE) initiative to increase the production of renewable biofuels such as cellulosic ethanol has led researchers to explore various biomass feedstocks and microbes for alternative fuel solutions (Farrell et al., 2006; Gnansounou and Dauriat, 2010; Rubin, 2008; Stephanopoulos, 2007; Zacchi et al.). The BioEnergy Science Center (BESC) was established by DOE in 2007 as a multi-institutional partnership driving scientific efforts in this direction. BESC is undertaking large experimental campaigns to understand and mitigate the recalcitrance of biomass to cellulolytic degradation by enzymes and organisms, and to develop multitalented microbes for converting plant biomass into biofuels in a single step (Lynd et al., 2005). Researchers at BESC are focused on a comprehensive, system-level understanding of biomass and plant cell wall formation, degradation and biofuel production. As such, the center is generating large volumes of diverse data, including genome sequences, many types of omics data and various assay results related to biomass properties, structure and composition. Besides managing data generated by BESC researchers, the Knowledgebase integrates key reference data, such as genomes, gene annotations and metabolic annotations for bioenergy-relevant organisms, which provide a context for understanding experiments. In this article, we describe some important computational tools and data available from the BESC Knowledgebase.

2 BESC KNOWLEDGEBASE DATA AND TOOLS

Reference genomic data: the microbial domain of the knowledgebase currently contains 37 microbial genomes, and its reference microbial data comprise over 134 000 protein-coding genes. We continually add new organisms of interest and genome annotations for them, including BLAST hits, domain annotations, protein localization, transcription unit data, and enzyme and pathway annotations. The microbial index page provides a set of search interfaces to query the annotations and to download search results. Reference genomic data are also available for each gene through a unified interface, a gene card. The plant domain consists of 21 plant and algal genomes along with a rich set of annotations, including gene structures, protein products, homology-based functional prediction, domain structures, ortholog and paralog prediction, gene ontology, and metabolic and enzymatic pathways. Currently, the KB's reference plant data consist of over 500 000 coding genes, from which nearly 400 000 protein-coding genes with function predictions have been identified. The database keeps track of available gene model variations and alternative splicing variants.

Phenotype comparison toolkit (CBP): this toolkit provides tools to compare microbes across different phenotypes or genetic traits at various levels of organization, such as whole genome, biological process, enzyme family, domain or sequence level. Comparisons can be made between microbes with different phenotypes, such as aerobes and anaerobes, or thermophiles and mesophiles. For each phenotype, users can select a group of organisms and compare the groups in terms of their metabolic pathways, enzyme profiles, protein family domains and orthologs.

BeoCyc: we have a collection of Pathway/Genome DataBases (PGDBs) for BioEnergy relevant organisms, which we call 'BeoCyc'. We have reconstructed metabolic pathways using the Pathway Tools software (Karp et al.). Although each PGDB is generated automatically by the PathoLogic program from the Pathway Tools software, annotation of the genome with EC numbers was improved before the reconstruction by using enzyme prediction tools, such as KAAS (Moriya et al., 2007), and by searching for orthologous genes in model organisms. Pathway Tools software provides a diverse set of options to query and visualize information in the databases, overlay experimental data on metabolic maps and pathways, and perform comparative analysis.

Gbrowse: we have also configured Gbrowse, a popular genome browser (Stein et al., 2002), for all complete microbial genomes in the BESC knowledgebase. Besides genes, the browser provides operon predictions from BeoCyc/Pathway Tools software (Karp et al.) and from a database of prokaryotic operons (Mao et al.). SNPs/indels from resequencing of an ethanol-tolerant Clostridium thermocellum strain (Brown et al., 2011) are also available from the genome browser. Users can also upload their own data, such as SNPs/indels from a new strain, and compare them with data from BESC by drawing tracks parallel to the default tracks available from the browser.

Integration and analysis of omics data from BESC, GEO and ArrayExpress: the BESC Knowledgebase provides tools that allow the user to search for experiments in external resources such as NCBI GEO (Boyle, 2005) using keywords, bring them into the local analysis environment and integrate the datasets with genomic data in the BESC Knowledgebase. Additional tools have been developed in the framework for statistical analysis, including interactive scatterplots and heatmaps and the mapping of experimental data onto pathways.

CAZymes Analysis Toolkit: the BESC Knowledgebase also hosts the CAZymes Analysis Toolkit, CAT (Park et al., 2010), developed and published earlier. This toolkit provides methods to search and annotate CAZymes. It has already been used outside BESC to annotate genomes (Kikuchi et al., 2011).

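
The kind of group comparison offered by the Phenotype comparison toolkit can be illustrated with a small set-based sketch. This is not the toolkit's own code: the function name `group_specific_features`, the organism labels and the domain identifiers below are all hypothetical, and the rule used (present in every member of one group, absent from every member of the other) is only one simple way such a comparison could be defined.

```python
from typing import Dict, Iterable, Set


def group_specific_features(
    profiles: Dict[str, Set[str]],
    group_a: Iterable[str],
    group_b: Iterable[str],
) -> Set[str]:
    """Return features found in every organism of group_a and in none of group_b.

    `profiles` maps an organism name to its set of features (for example,
    protein family domains). Both groups are assumed to be non-empty.
    """
    # Features shared by all organisms in the first phenotype group.
    core_a = set.intersection(*(profiles[name] for name in group_a))
    # Features seen in at least one organism of the second phenotype group.
    seen_in_b = set().union(*(profiles[name] for name in group_b))
    return core_a - seen_in_b


# Hypothetical organisms and domain identifiers, for illustration only.
profiles = {
    "thermophile_1": {"DOM_A", "DOM_B", "DOM_C"},
    "thermophile_2": {"DOM_A", "DOM_C"},
    "mesophile_1": {"DOM_A", "DOM_D"},
    "mesophile_2": {"DOM_A", "DOM_B"},
}

thermophile_only = group_specific_features(
    profiles,
    ["thermophile_1", "thermophile_2"],
    ["mesophile_1", "mesophile_2"],
)
print(sorted(thermophile_only))
```

The same function applies unchanged to enzyme profiles, pathways or ortholog groups, since each can be represented as a set of identifiers per organism.
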
BESC KB Genome Resequencing toolkit: the target for resequencing is usually the genome of a mutant strain with a practically important phenotype evolved by adaptation of a wild-type strain. The toolkit provides a way to identify genomic modifications underlying the specific phenotype of the mutant and to understand potential biological effects of the mutations. The toolkit consists of the following tools, which can be applied either as a pipeline or independently. 'SNP-indel caller' finds changes in the genome of the mutant strain and their locations, given a file of high-confidence 454 reads as input. 'Mutant protein fasta' generates a FASTA file of proteins for the mutant strain. 'Mutant protein CDD' annotates the proteins of the mutant strain with protein family domains using the CDD pipeline. 'Mutation Mapper' annotates each identified change in the genomic sequence of the mutant strain with its position in the genome, its location within an intergenic region or gene, the type of change (synonymous or non-synonymous, multiple changes, insertions or deletions), and the amino acids and codons in the mutant (M) and wild-type (WT) strains. 'Regulation change predictor' produces the list of genes with mutational changes, either SNPs or indels, in their upstream intergenic regions. The entire sequence span between the coding sequences of two adjacent genes is used, without predicting transcription factor binding sites. 'Function change predictor' predicts potential changes in protein function, such as its gain or loss. Detailed documentation of the toolkit is available from the BESCKB website (http://cricket.ornl.gov/html/download/resequencing/ResequencingToolkitDocumentation_16Dec2011.pdf). The tools were applied to analyze the resequencing data for the ethanol-adapted strain of C. thermocellum ATCC 27405 (Brown et al., 2011).
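
The classification step described for 'Mutation Mapper' and 'Regulation change predictor' can be sketched as follows. This is a minimal illustration, not the toolkit's implementation: the names `Gene` and `annotate_snp`, the toy genome and the gene coordinates are hypothetical, only single-base substitutions are handled, and all genes are assumed to lie on the forward strand so that "upstream" simply means the intergenic span before the next gene start.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

# Standard genetic code, built from the conventional TCAG ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


@dataclass
class Gene:
    name: str
    start: int  # 0-based, inclusive
    end: int    # 0-based, exclusive; (end - start) is a multiple of 3


def annotate_snp(
    wt_genome: str, genes: List[Gene], pos: int, alt: str
) -> Dict[str, Optional[str]]:
    """Classify a single-base substitution at 0-based position `pos`."""
    for gene in genes:
        if gene.start <= pos < gene.end:
            offset = pos - gene.start
            codon_start = gene.start + 3 * (offset // 3)
            wt_codon = wt_genome[codon_start:codon_start + 3]
            idx = offset % 3
            mut_codon = wt_codon[:idx] + alt + wt_codon[idx + 1:]
            wt_aa, mut_aa = CODON_TABLE[wt_codon], CODON_TABLE[mut_codon]
            kind = "synonymous" if wt_aa == mut_aa else "non-synonymous"
            return {
                "location": gene.name, "type": kind,
                "wt_codon": wt_codon, "mut_codon": mut_codon,
                "wt_aa": wt_aa, "mut_aa": mut_aa,
            }
    # Intergenic change: report the next gene downstream, whose upstream
    # region contains the change (forward-strand assumption).
    downstream = [g for g in genes if g.start > pos]
    target = min(downstream, key=lambda g: g.start).name if downstream else None
    return {"location": "intergenic", "type": "intergenic", "upstream_of": target}


# Toy wild-type genome: geneA (ATG GCT TAA), a 4-base spacer, geneB (ATG AAA TAG).
genome = "ATGGCTTAA" + "CCCC" + "ATGAAATAG"
genes = [Gene("geneA", 0, 9), Gene("geneB", 13, 22)]

for pos, alt in [(5, "C"), (3, "C"), (10, "T")]:
    print(pos, alt, annotate_snp(genome, genes, pos, alt))
```

In the toy data, the change at position 5 turns GCT into GCC (both alanine), the change at position 3 turns GCT into CCT (alanine to proline), and the change at position 10 falls in the spacer upstream of geneB. A fuller treatment would also need reverse-strand genes, indels and multiple changes within one codon.
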

3 CONCLUSIONS

We have made available a suite of tools and diverse types of bioenergy-relevant data through the BESC public portal. The tools may be especially helpful in integrating a variety of data, such as genomic, phenotypic, metabolic and experimental data, and in gaining a comprehensive, system-level understanding of the cellular processes involved in plant biomass formation, degradation and biofuel production.

REFERENCES

1.  The generic genome browser: a building block for a model organism system database.

Authors:  Lincoln D Stein; Christopher Mungall; ShengQiang Shu; Michael Caudy; Marco Mangone; Allen Day; Elizabeth Nickerson; Jason E Stajich; Todd W Harris; Adrian Arva; Suzanna Lewis
Journal:  Genome Res       Date:  2002-10       Impact factor: 9.043

2.  Gene-Expression Omnibus integration and clustering tools in SeqExpress.

Authors:  John Boyle
Journal:  Bioinformatics       Date:  2005-03-03       Impact factor: 6.937

3.  Consolidated bioprocessing of cellulosic biomass: an update. (Review)

Authors:  Lee R Lynd; Willem H van Zyl; John E McBride; Mark Laser
Journal:  Curr Opin Biotechnol       Date:  2005-10       Impact factor: 9.740

4.  Ethanol can contribute to energy and environmental goals.

Authors:  Alexander E Farrell; Richard J Plevin; Brian T Turner; Andrew D Jones; Michael O'Hare; Daniel M Kammen
Journal:  Science       Date:  2006-01-27       Impact factor: 47.728

5.  Challenges in engineering microbes for biofuels production.

Authors:  Gregory Stephanopoulos
Journal:  Science       Date:  2007-02-09       Impact factor: 47.728

6.  Genomics of cellulosic biofuels. (Review)

Authors:  Edward M Rubin
Journal:  Nature       Date:  2008-08-14       Impact factor: 49.962

7.  Mutant alcohol dehydrogenase leads to improved ethanol tolerance in Clostridium thermocellum.

Authors:  Steven D Brown; Adam M Guss; Tatiana V Karpinets; Jerry M Parks; Nikolai Smolin; Shihui Yang; Miriam L Land; Dawn M Klingeman; Ashwini Bhandiwad; Miguel Rodriguez; Babu Raman; Xiongjun Shao; Jonathan R Mielenz; Jeremy C Smith; Martin Keller; Lee R Lynd
Journal:  Proc Natl Acad Sci U S A       Date:  2011-08-08       Impact factor: 11.205

8.  CAZymes Analysis Toolkit (CAT): web service for searching and analyzing carbohydrate-active enzymes in a newly sequenced organism using CAZy database.

Authors:  Byung H Park; Tatiana V Karpinets; Mustafa H Syed; Michael R Leuze; Edward C Uberbacher
Journal:  Glycobiology       Date:  2010-08-09       Impact factor: 4.313

9.  Genomic insights into the origin of parasitism in the emerging plant pathogen Bursaphelenchus xylophilus.

Authors:  Taisei Kikuchi; James A Cotton; Jonathan J Dalzell; Koichi Hasegawa; Natsumi Kanzaki; Paul McVeigh; Takuma Takanashi; Isheng J Tsai; Samuel A Assefa; Peter J A Cock; Thomas Dan Otto; Martin Hunt; Adam J Reid; Alejandro Sanchez-Flores; Kazuko Tsuchihara; Toshiro Yokoi; Mattias C Larsson; Johji Miwa; Aaron G Maule; Norio Sahashi; John T Jones; Matthew Berriman
Journal:  PLoS Pathog       Date:  2011-09-01       Impact factor: 6.823

10.  KAAS: an automatic genome annotation and pathway reconstruction server.

Authors:  Yuki Moriya; Masumi Itoh; Shujiro Okuda; Akiyasu C Yoshizawa; Minoru Kanehisa
Journal:  Nucleic Acids Res       Date:  2007-05-25       Impact factor: 16.971

