
BeetleBase: the model organism database for Tribolium castaneum.

Liangjiang Wang, Suzhi Wang, Yonghua Li, Martin S R Paradesi, Susan J Brown.

Abstract

BeetleBase (http://www.bioinformatics.ksu.edu/BeetleBase/) is an integrated resource for the Tribolium research community. The red flour beetle (Tribolium castaneum) is an important model organism for genetics, developmental biology, toxicology and comparative genomics, the genome of which has recently been sequenced. BeetleBase is constructed to integrate the genomic sequence data with information about genes, mutants, genetic markers, expressed sequence tags and publications. BeetleBase uses the Chado data model and software components developed by the Generic Model Organism Database (GMOD) project. This strategy not only reduces the time required to develop the database query tools but also makes the data structure of BeetleBase compatible with that of other model organism databases. BeetleBase will be useful to the Tribolium research community for genome annotation as well as comparative genomics.

Year:  2006        PMID: 17090595      PMCID: PMC1669707          DOI: 10.1093/nar/gkl776

Source DB:  PubMed          Journal:  Nucleic Acids Res        ISSN: 0305-1048            Impact factor:   16.971


INTRODUCTION

The red flour beetle (Tribolium castaneum) provides an excellent genetic model system for Coleoptera, the largest and most diverse order of eukaryotic organisms. Coleoptera includes many economically important species of crop pests causing major agricultural losses. Similar to Drosophila in the order Diptera, Tribolium has characteristics desired in a genetic model organism including ease of culture, short generation time, large brood sizes and efficacy of genetic manipulation. The potential of Tribolium for genetic analysis has been demonstrated through classical mutational studies (1,2), whole-genome molecular mapping (3) and RNA interference (4–6). Molecular genetic and genomic studies in Tribolium have been greatly facilitated by the recent completion of the genome sequence at the Human Genome Sequencing Center, Baylor College of Medicine. The genome sequence is currently being annotated by the Tribolium research community. In addition, large sets of expressed sequence tags (ESTs) have been generated from stage- and tissue-specific cDNA libraries by members of the Tribolium genome consortium. The sequence data provide useful information for identifying and characterizing the organization and function of beetle genes as well as their orthologues in other insect species. In particular, Tribolium is probably the most efficient model system for performing functional analysis of genes lost in the Drosophila lineage but conserved in other insects. Beetles (Coleoptera) and flies (Diptera) diverged close to 300 million years ago (7). Although Coleoptera is considered to occupy a basal phylogenetic position, Diptera is one of the most advanced insect orders and there is evidence that gene sequences in Drosophila may have evolved rapidly (7,8). As genome sequence data become available for Tribolium and other insect species, comparative genomics may reveal the genetic innovations that accompanied the evolution of higher insects. 
The rapid expansion of genomic research in Tribolium calls for a centralized database resource for data curation and integration. BeetleBase was developed to fill this role by providing searchable interfaces to access a variety of Tribolium data, including sequences, genes, genetic markers, mutants, publications and links to other databases. Various datasets have been collected from different sources and integrated into BeetleBase after curation. Importantly, BeetleBase implements the Chado data model developed by the Generic Model Organism Database (GMOD) project. This should facilitate the future expansion of BeetleBase to include additional data types (e.g. microarray gene expression data) and enhance its interoperability with other genome databases to support comparative genomics.

DATA ACQUISITION AND ANALYSIS

The current data entries in BeetleBase are summarized in Table 1. The datasets have been collected from public databases and the Tribolium research community. Assembled genomic sequence contigs were obtained from the Human Genome Sequencing Center (HGSC), Baylor College of Medicine. The genome has been sequenced to 7-fold coverage using a whole-genome shotgun approach. Currently, experts from the Tribolium research community are working with HGSC scientists to manually confirm and curate a subset of the predicted genes. We have been participating in the genome annotation, and have provided 9162 protein-coding genes that were predicted using the FGENESH program (9). This set of predicted genes is currently stored in BeetleBase, but will be replaced by information transferred from the HGSC at Baylor upon completion of the sequencing project.
Table 1

Data content in BeetleBase (August 2006)

• 2341 genomic sequence contigs (226 contigs with >100 kb)
• 9162 predicted genes with CDS and protein sequences
• 439 GenBank records
• 28785 BAC-end sequences
• 11254 ESTs aligned to genomic sequence contigs
• 424 genetic markers and their mapping results
• 81 mutants with phenotype descriptions
• 423 genetic stocks
• 615 PubMed references
• 18327 Gene Ontology terms
• 956 Sequence Ontology terms
Tribolium ESTs and complete cDNA sequences were retrieved from NCBI. These expressed sequences were aligned to the genomic sequence contigs using the BLAST program (10). A Perl program was developed to parse the BLAST search results for meaningful alignments. Both the expressed sequence data and the analytical results are stored in BeetleBase. The same approach has also been used to align bacterial artificial chromosome (BAC)-end sequences to the genomic sequence contigs. Although the BLAST-based approach has worked well for aligning EST and BAC-end sequences to genomic sequences, other available tools (11–13) will also be tested and compared with our approach in the future. Besides the sequence data, genetic markers and mapping results were extracted from a recent publication (3). Information about genetic stocks and descriptions of mutant phenotypes were obtained from the Tribolium Mutant Database. BeetleBase also stores PubMed references related to Tribolium research. In addition, two sets of controlled vocabulary terms are used: Gene Ontology (GO) terms for gene functional annotations and Sequence Ontology (SO) terms for specifying database object types. Use of standard terms for information indexing is an important component of the Chado data model.
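
The filtering step described above can be illustrated with a minimal sketch. It is written in Python rather than Perl and is not the authors' parser. It assumes BLAST was run with the standard 12-column tabular output, and the identity, length and E-value cutoffs are placeholder assumptions, since the paper does not state which criteria made an alignment "meaningful". The sequence and contig names in the sample are invented.

```python
import csv
import io
from typing import NamedTuple


class Hit(NamedTuple):
    query: str      # EST or BAC-end identifier
    subject: str    # genomic contig identifier
    pident: float   # percent identity
    length: int     # alignment length
    sstart: int     # start on the contig
    send: int       # end on the contig
    evalue: float
    bitscore: float


def parse_hits(handle):
    """Yield Hit records from 12-column BLAST tabular output."""
    for row in csv.reader(handle, delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        yield Hit(row[0], row[1], float(row[2]), int(row[3]),
                  int(row[8]), int(row[9]), float(row[10]), float(row[11]))


def best_hits(hits, min_pident=95.0, min_length=100, max_evalue=1e-20):
    """Keep the top-scoring hit per query that passes all cutoffs.

    The default cutoffs are illustrative assumptions, not values from the paper.
    """
    best = {}
    for h in hits:
        if h.pident < min_pident or h.length < min_length or h.evalue > max_evalue:
            continue
        if h.query not in best or h.bitscore > best[h.query].bitscore:
            best[h.query] = h
    return best


# Invented sample rows: one good hit, one low-identity hit, one short hit.
SAMPLE = (
    "EST0001\tContigA\t99.20\t250\t2\t0\t1\t250\t10400\t10649\t1e-130\t470\n"
    "EST0001\tContigB\t88.00\t120\t14\t0\t5\t124\t300\t419\t1e-30\t150\n"
    "EST0002\tContigC\t97.50\t60\t1\t0\t1\t60\t900\t841\t1e-25\t110\n"
)
BEST = best_hits(parse_hits(io.StringIO(SAMPLE)))
```

Keeping only the best-scoring hit per query is one simple policy; a spliced EST that spans several exons would need its separate high-scoring segments on the same contig to be chained instead, which is what the dedicated alignment tools cited in the text are designed for.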

SYSTEM DESIGN AND IMPLEMENTATION

The BeetleBase system consists of web interfaces, Perl CGI programs and a relational database. BeetleBase uses the Chado data model and the MySQL database management system. CGI programs have been developed to access the database in response to user queries and then generate web pages to present the query results. In addition, GMOD generic software components are used for data visualization (see below). Use of the GMOD software components allowed us to focus on data processing and management. Perl programs have been developed to load the various datasets into BeetleBase. Chado is designed as a modular schema so that new modules can be added for new data types. In BeetleBase, data have been loaded into five modules: the Sequence module for all the sequence-related data, the Genetic module for mutant phenotypes, the Organism module for taxonomic data, the Publication module for PubMed references, and the Controlled Vocabulary module for GO and SO terms. For future expansion of BeetleBase, the Expression module will be used to store microarray gene expression data, and the Companalysis module as well as the Organism module may be needed to support comparative genomics. The database was implemented within six months by a bioinformatics specialist assisted by three part-time graduate research assistants.
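
To make the role of the Sequence and Controlled Vocabulary modules more concrete, the following sketch builds a heavily simplified, Chado-style schema and runs one region query against it. It is an illustration under stated assumptions, not the BeetleBase schema: the real Chado tables carry many more columns and constraints, the example uses Python's built-in SQLite instead of MySQL so that it is self-contained, and all feature names and coordinates are invented.

```python
import sqlite3

# Simplified stand-ins for three Chado tables. Column names follow Chado,
# but most columns and constraints of the real schema are omitted.
SCHEMA = """
CREATE TABLE cvterm (
    cvterm_id INTEGER PRIMARY KEY,
    name      TEXT NOT NULL UNIQUE          -- e.g. an SO term such as 'gene'
);
CREATE TABLE feature (
    feature_id INTEGER PRIMARY KEY,
    uniquename TEXT NOT NULL UNIQUE,
    type_id    INTEGER NOT NULL REFERENCES cvterm(cvterm_id)
);
CREATE TABLE featureloc (
    featureloc_id INTEGER PRIMARY KEY,
    feature_id    INTEGER NOT NULL REFERENCES feature(feature_id),
    srcfeature_id INTEGER NOT NULL REFERENCES feature(feature_id),
    fmin          INTEGER NOT NULL,
    fmax          INTEGER NOT NULL,
    strand        INTEGER
);
"""


def term_id(conn, name):
    """Return the cvterm_id for a term name, creating the term if needed."""
    conn.execute("INSERT OR IGNORE INTO cvterm (name) VALUES (?)", (name,))
    row = conn.execute("SELECT cvterm_id FROM cvterm WHERE name = ?", (name,))
    return row.fetchone()[0]


def add_feature(conn, uniquename, so_type):
    """Insert a feature typed by a Sequence Ontology term; return its id."""
    cur = conn.execute(
        "INSERT INTO feature (uniquename, type_id) VALUES (?, ?)",
        (uniquename, term_id(conn, so_type)),
    )
    return cur.lastrowid


def features_in_region(conn, src_id, start, end):
    """List features located on src_id that overlap the interval [start, end)."""
    rows = conn.execute(
        """SELECT f.uniquename, t.name, l.fmin, l.fmax
           FROM featureloc l
           JOIN feature f ON f.feature_id = l.feature_id
           JOIN cvterm t ON t.cvterm_id = f.type_id
           WHERE l.srcfeature_id = ? AND l.fmin < ? AND l.fmax > ?
           ORDER BY l.fmin, f.uniquename""",
        (src_id, end, start),
    )
    return rows.fetchall()


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)

# Invented example data: one contig carrying a gene and an aligned EST.
contig = add_feature(conn, "contig_demo", "contig")
gene = add_feature(conn, "gene_demo", "gene")
est = add_feature(conn, "est_demo", "EST")
conn.executemany(
    "INSERT INTO featureloc (feature_id, srcfeature_id, fmin, fmax, strand) "
    "VALUES (?, ?, ?, ?, ?)",
    [(gene, contig, 1000, 5000, 1), (est, contig, 4500, 5200, 1)],
)

REGION = features_in_region(conn, contig, 4000, 4800)
```

The point of the design is that contigs, genes and ESTs all live in one `feature` table and are distinguished only by a controlled vocabulary term, so a new object type needs a new term rather than a new table.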

DATABASE QUERY TOOLS

BeetleBase provides search forms for sequences, genes, genetic markers, mutants, stocks and publications. The search results are summarized in a tabular format. Table entries may be clicked to retrieve more information from BeetleBase. The data entries are also linked to external databases where available. For example, published cDNA sequences are linked to GenBank, while Tribolium references are linked to PubMed. The GMOD software tools GBrowse (14) and CMap are used to visualize sequence and mapping data in BeetleBase. As shown in Figure 1, GBrowse is used to provide an integrated view of sequences and genetic markers. The predicted genes, ESTs and genetic markers are aligned to the genomic sequence contig (Contig3587_Contig1062) according to their relative positions, and are clickable for additional information. The GBrowse tool can be queried using sequence or marker identifiers and has been integrated with the BLAST search engine (see below). The graphical representation is useful for genome annotation. CMap is used for browsing the genetic map with information about the linkage groups and map locations of genetic markers.
Figure 1

An integrated view of BeetleBase data using GBrowse. A part of the contig, Contig3587_Contig1062, is shown together with the aligned ESTs, computed genes and genetic markers.

BeetleBase also provides a BLAST server for searching Tribolium sequences. On the results page of a BLAST search, each hit is linked to the GBrowse view of the sequence. This feature allows non-Tribolium sequences to be mapped on to the Tribolium genome for comparative analysis. Thus, the BLAST server is useful not only for the Tribolium research community but also for other scientists who are interested in identifying the Tribolium homologues for their sequences. In addition, we have set up an FTP site for downloading various datasets and software programs from BeetleBase.
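
As a rough illustration of how a BLAST hit might be turned into a genome browser link, the sketch below converts hit coordinates into a padded region string and appends it to a browser URL. The base URL is a placeholder, the 500 bp flank is an arbitrary choice, and the use of a `name` query parameter holding `landmark:start..end` is an assumption about how the GBrowse installation accepts regions; the paper does not describe how its links are built.

```python
from urllib.parse import urlencode

# Placeholder address, not the real BeetleBase GBrowse location.
GBROWSE_BASE = "http://example.org/cgi-bin/gbrowse/beetlebase_demo/"


def hit_region(subject, sstart, send, flank=500):
    """Return 'contig:start..end' for a hit, padded by `flank` bases.

    BLAST reports minus-strand hits with sstart > send, so the
    coordinates are sorted before padding.
    """
    lo, hi = sorted((sstart, send))
    return f"{subject}:{max(1, lo - flank)}..{hi + flank}"


def gbrowse_link(subject, sstart, send, flank=500, base=GBROWSE_BASE):
    """Build a browser URL showing the hit with some flanking context."""
    query = urlencode({"name": hit_region(subject, sstart, send, flank)})
    return base + "?" + query


# A minus-strand hit on an invented contig.
LINK = gbrowse_link("ContigA", 10649, 10400)
```

The padded end coordinate is not clipped to the contig length here, because the hit record alone does not carry that length.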

FUTURE DIRECTIONS

BeetleBase will be continually updated and expanded in the future. We plan monthly updates of the database, depending on the availability of new data. The immediate effort will be to populate the database with a comprehensive set of genes that have been predicted using different software tools and are currently being curated by the Tribolium research community. Additional ESTs that are being generated by the Tribolium genome consortium will soon be available in BeetleBase. The large number of ESTs will be assembled into non-redundant contigs, and then the EST contigs will be aligned to the genome sequence to assist gene functional annotation. BeetleBase will also be expanded to include microarray gene expression data from the Tribolium research community and genome sequences from other model insects to support comparative genomics. We will utilize the additional modules of the GMOD schema and, where necessary, develop new interfaces and tools to make the information accessible in an effective way.
  14 in total

1.  Parental RNAi in Tribolium (Coleoptera).

Authors:  Gregor Bucher; Johannes Scholten; Martin Klingler
Journal:  Curr Biol       Date:  2002-02-05       Impact factor: 10.834

2.  The generic genome browser: a building block for a model organism system database.

Authors:  Lincoln D Stein; Christopher Mungall; ShengQiang Shu; Michael Caudy; Marco Mangone; Allen Day; Elizabeth Nickerson; Jason E Stajich; Todd W Harris; Adrian Arva; Suzanna Lewis
Journal:  Genome Res       Date:  2002-10       Impact factor: 9.043

3.  The mosquito genome: the post-genomic era opens.

Authors:  Ennio De Gregorio; Bruno Lemaitre
Journal:  Nature       Date:  2002-10-03       Impact factor: 49.962

4.  A computer program for aligning a cDNA sequence with a genomic DNA sequence.

Authors:  L Florea; G Hartzell; Z Zhang; G M Rubin; W Miller
Journal:  Genome Res       Date:  1998-09       Impact factor: 9.043

Review 5.  Gapped BLAST and PSI-BLAST: a new generation of protein database search programs.

Authors:  S F Altschul; T L Madden; A A Schäffer; J Zhang; Z Zhang; W Miller; D J Lipman
Journal:  Nucleic Acids Res       Date:  1997-09-01       Impact factor: 16.971

6.  EST_GENOME: a program to align spliced DNA sequences to unspliced genomic DNA.

Authors:  R Mott
Journal:  Comput Appl Biosci       Date:  1997-08

7.  Genetic linkage maps of the red flour beetle, Tribolium castaneum, based on bacterial artificial chromosomes and expressed sequence tags.

Authors:  Marcé D Lorenzen; Zaldy Doyungan; Joel Savard; Kathy Snow; Lindsey R Crumly; Teresa D Shippy; Jeffrey J Stuart; Susan J Brown; Richard W Beeman
Journal:  Genetics       Date:  2005-04-16       Impact factor: 4.562

8.  Implications of the Tribolium Deformed mutant phenotype for the evolution of Hox gene function.

Authors:  S Brown; M DeCamillis; K Gonzalez-Charneco; M Denell; R Beeman; W Nie; R Denell
Journal:  Proc Natl Acad Sci U S A       Date:  2000-04-25       Impact factor: 11.205

9.  piggyBac-mediated germline transformation in the beetle Tribolium castaneum.

Authors:  M D Lorenzen; A J Berghammer; S J Brown; R E Denell; M Klingler; R W Beeman
Journal:  Insect Mol Biol       Date:  2003-10       Impact factor: 3.585

10.  Genome-wide acceleration of protein evolution in flies (Diptera).

Authors:  Joël Savard; Diethard Tautz; Martin J Lercher
Journal:  BMC Evol Biol       Date:  2006-01-25       Impact factor: 3.260
