We report a novel release of the GermOnline knowledgebase covering genes relevant for the cell cycle, gametogenesis and fertility. GermOnline was extended into a cross-species systems browser including information on DNA sequence annotation, gene expression and the function of gene products. The database covers eight model organisms and Homo sapiens, for which complete genome annotation data are available. The database is now built around a sophisticated genome browser (Ensembl), our own microarray information management and annotation system (MIMAS) used to extensively describe experimental data obtained with high-density oligonucleotide microarrays (GeneChips) and a comprehensive system for online editing of database entries (MediaWiki). The RNA data include results from classical microarrays as well as tiling arrays that yield information on RNA expression levels, transcript start sites and lengths as well as exon composition. Members of the research community are solicited to help GermOnline curators keep database entries on genes and gene products complete and accurate. The database is accessible at http://www.germonline.org/.
Molecular biologists and biomedical researchers are confronted with the complex task of processing and interpreting different types of high-throughput genome biological data. Our goal is to facilitate scientific hypothesis building on the basis of comprehensive and accurate knowledgebase entries associated with curated and relevant experimental data. To this end, we have developed the GermOnline cross-species database, an integrated solution for displaying annotated genome sequence information and microarray expression profiling data alongside manually curated knowledge about gene function (1–3). The current release is not a simple update of the previous versions but is based on a novel concept, a different software architecture and an improved graphical user interface. To achieve our aims we harness the extensive capabilities of existing open-source software including the Ensembl genome browser environment (4) with extensions for displaying expression data, the BioMart query system (5) and our own Microarray Information Management and Annotation System (MIMAS) (6). Finally, we implement the MediaWiki online page editing package originally developed for the community encyclopedia Wikipedia. Wikis have recently been discussed as a possible approach to biological data management by professional curators together with life scientists (7–9). GermOnline and GermOnline Wiki are available without restrictions.
THE SCOPE OF GERMONLINE
GermOnline focuses on information relevant for the cell cycle and gametogenesis in eight key model organisms and Homo sapiens. Since we now include the genome sequences in the database (as opposed to just storing gene identifiers), we selected nine species for which sequenced and fully annotated genomes are available including six (Saccharomyces cerevisiae, Arabidopsis thaliana, Drosophila melanogaster, Rattus norvegicus, Mus musculus and Homo sapiens) that were investigated in high-density oligonucleotide microarray expression profiling studies. The current version of GermOnline therefore lacks three species covered in earlier releases for which complete and fully annotated genome sequence is currently unavailable (Neurospora crassa, Zea mays, Xenopus laevis).
The Ensembl system lends itself well to extension and customization (4). Thus expression data and phylogenetic information could be integrated and displayed while retaining a uniform display. Importantly, moving into a genome-browser environment enabled us to show tiling array data in the context of exon-intron composition at the DNA and RNA level. This is important because tiling arrays contain probes covering the entire genome sequence of a target organism and they are therefore independent of any previous genome annotation. These arrays also detect non-coding RNAs that are difficult or even impossible to trace with classical arrays whose oligonucleotide probes are most often directed against 3′-untranslated sequences or the coding regions of defined genes (10,11).
Lastly, researchers are provided with manually curated database entries describing mechanistic knowledge about genes and gene products important for sexual reproduction. Scientists who work on these genes are solicited to update and extend the entries to help keep them accurate and complete. This process is meant to be carried out in cooperation with our professional scientific curators who establish entries, update them and provide online support for users. The system is based on MediaWiki. Following registration, a pre-requisite for online editing, extensive and convenient text editing options are available for community annotation.
HOW TO RETRIEVE INFORMATION
Search
GermOnline is extensively searchable using gene symbols, standard and systematic names as well as various identifiers from UniProt, Ensembl and GenBank. A complex search form, the GermOnline BioMart, allows users to retrieve gene lists on the basis of functional genomic and expression studies. Recently integrated studies include a comparative mammalian meiotic transcriptome analysis based on rodent and human samples (F. Chalmel, A. Rolland, C. Niederhauser-Wiederkehr, S.S. Chung, P. Demougin, A. Gattiker, J. Moore, J.J. Patard, D.J. Wolgemuth, B. Jégou and M. Primig, submitted). Queries are also possible using experiments with budding yeast based on classical arrays (12,13) as well as high-throughput gene deletion studies (14–17). This useful feature permits searches across different experiments yielding lists of genes with particular expression patterns and phenotypes.
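The kind of cross-experiment query described above can be illustrated with a minimal sketch. It does not use the GermOnline BioMart interface; the record structure, the gene names (GENE_A to GENE_D) and the pattern and phenotype labels are hypothetical placeholders chosen only to show how an expression criterion and a phenotype criterion might be combined into one gene list.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GeneRecord:
    """One gene with annotations drawn from two different experiments."""
    symbol: str
    expression_pattern: str   # hypothetical label from an expression study
    deletion_phenotype: str   # hypothetical label from a gene deletion study


def query_genes(records, pattern, phenotype):
    """Return the sorted symbols of genes that match both criteria."""
    return sorted(
        r.symbol
        for r in records
        if r.expression_pattern == pattern and r.deletion_phenotype == phenotype
    )


# Placeholder records; these are not GermOnline data.
RECORDS = [
    GeneRecord("GENE_A", "meiotic", "sporulation-defective"),
    GeneRecord("GENE_B", "meiotic", "wild-type"),
    GeneRecord("GENE_C", "meiotic", "sporulation-defective"),
    GeneRecord("GENE_D", "mitotic", "sporulation-defective"),
]

hits = query_genes(RECORDS, "meiotic", "sporulation-defective")
print(hits)
```

Only genes that satisfy both conditions are returned, which mirrors the idea of intersecting results from separate studies rather than querying each study in isolation.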
The locus report
Users can call up information on a given locus by using various queries including gene symbols and identification numbers. The report page covers the exon-intron structure, GeneChip probe target sequences as well as extensive graphical display of relevant microarray expression data. Examples of genomic DNA annotation data and GeneChip target sequences (for yeast S98) as well as GeneChip expression data are shown for REC8 from S.cerevisiae (Figure 1) and Rec8L1 from M.musculus (Figure 2). The expression data for the orthologues can be directly compared because they were obtained in budding yeast cells (12) and mammalian germ cells undergoing meiotic differentiation (F. Chalmel, A. Rolland, C. Niederhauser-Wiederkehr, S.S. Chung, P. Demougin, A. Gattiker, J. Moore, J.J. Patard, D.J. Wolgemuth, B. Jégou and M. Primig, submitted). In addition, a substantial number of new profiling studies using mouse and rat (18–21) testicular samples as well as sporulating yeast cultures (22) are available. The array experiments are extensively described in a MIAME-compliant manner and users can call up more detailed information by clicking on the sample name (Figure 2c). The database report page contains links to PubMed, GEO and ArrayExpress, as well as to supporting web portals whenever available, to provide information about the source of array data on display.
Figure 1
Genome annotation and S98 GeneChip expression profiling data for REC8 in budding yeast. (a) The chromosome number is indicated. On a genomic map, REC8 and its neighbouring genes as annotated by SGD, and the gene-specific S98 array probe target sequences (OLIGO YG-S98) are shown in red and green, respectively. (b) Normalized linear (or log2 transformed) expression signals obtained with S98 arrays and samples from budding yeast growing in rich medium with acetate as the sole carbon source (YPA) and sporulation medium at different time points as indicated, are shown in blue and green, respectively, in a bar diagram. Expression signals of individual genes are correlated with all other signals on the array via the percentile scale. The dotted red line indicates an empirical signal threshold level for background noise.
Figure 2
Genome annotation and MG 430 2.0 GeneChip expression profiling data for Rec8L1 in mouse. The chromosome number is indicated. (a) On a genomic map, Rec8L1 and its neighbouring genes as annotated by Ensembl, and the gene-specific MG 430 2.0 array probe target sequences (OLIGO Mouse430_2) are shown in red and green, respectively. (b) Normalized linear (or log2 transformed) expression signals obtained with MG 430 2.0 arrays and testicular mouse samples as indicated are shown in a bar diagram. Expression signals of individual genes are correlated with all other signals on the array via the percentile scale. The dotted red line indicates an empirical signal threshold level for background noise.
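The quantities named in the figure legends (log2 transformed signals, a percentile scale relating one gene to all other signals on the array, and an empirical background threshold) can be made concrete with a small sketch. The exact definitions used by GermOnline are not given here, so the percentile below is an assumption: the percentage of signals on the array that are less than or equal to the signal of a given gene. Gene names, signal values and the threshold are hypothetical.

```python
import math
from bisect import bisect_right


def log2_transform(signals):
    """Map each normalized linear signal to its log2 value."""
    return {gene: math.log2(value) for gene, value in signals.items()}


def percentile_ranks(signals):
    """Percentage of array signals that are <= each gene's signal (assumed definition)."""
    ordered = sorted(signals.values())
    n = len(ordered)
    return {
        gene: 100.0 * bisect_right(ordered, value) / n
        for gene, value in signals.items()
    }


def above_threshold(signals, threshold):
    """Sorted names of genes whose signal exceeds the background threshold."""
    return sorted(gene for gene, value in signals.items() if value > threshold)


# Hypothetical normalized linear signals for one array; not real data.
SIGNALS = {"GENE_A": 64.0, "GENE_B": 256.0, "GENE_C": 1024.0, "GENE_D": 16.0}

log_signals = log2_transform(SIGNALS)
ranks = percentile_ranks(SIGNALS)
expressed = above_threshold(SIGNALS, threshold=100.0)  # threshold value is illustrative
print(log_signals, ranks, expressed)
```

A percentile rank is independent of the absolute signal scale, which is why it lends itself to relating a gene to the rest of the array; whether GermOnline computes it in exactly this way is not stated in the text.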
The report page now includes data from novel tiling arrays whose probes cover the entire genome sequence of a target organism such as budding yeast. A typical signal from mitotically growing yeast is shown for ACT1 whose exons are detected by the array with remarkable accuracy (Figure 3) (10).
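To illustrate why annotation-independent probes reveal exon structure, the following sketch compares the mean signal of probes that fall inside annotated exons with the mean signal of probes that fall elsewhere in the same locus. It is not the analysis used for the tiling data shown here; the coordinates, probe positions and signal values are hypothetical and do not correspond to ACT1.

```python
from statistics import mean


def in_any_interval(position, intervals):
    """True if position lies inside one of the inclusive (start, end) intervals."""
    return any(start <= position <= end for start, end in intervals)


def exon_intron_means(probes, exons, locus):
    """Mean signal of probes inside exons and of probes inside the locus but outside exons.

    probes: list of (position, signal) pairs
    exons:  list of inclusive (start, end) intervals
    locus:  inclusive (start, end) interval spanning the gene
    """
    exon_signals = [s for p, s in probes if in_any_interval(p, exons)]
    intron_signals = [
        s
        for p, s in probes
        if locus[0] <= p <= locus[1] and not in_any_interval(p, exons)
    ]
    return mean(exon_signals), mean(intron_signals)


# Hypothetical two-exon gene and probe signals; not real coordinates or data.
EXONS = [(100, 110), (400, 900)]
LOCUS = (100, 900)
PROBES = [(105, 9.0), (200, 2.0), (300, 3.0), (500, 10.0), (700, 11.0), (950, 1.0)]

exon_mean, intron_mean = exon_intron_means(PROBES, EXONS, LOCUS)
print(exon_mean, intron_mean)
```

In this toy example the probe at position 950 lies outside the locus and is ignored, while the clear difference between the two means is what makes exon boundaries visible in a genome-browser display.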
Figure 3
Genome annotation and tiling array expression profiling data for ACT1 in budding yeast. The chromosome number is indicated. (a) On a genomic map, the two exons in ACT1 and the tiling array target sequences (TILING Scerevisiae_tlg) are shown in red and light green, respectively. Normalized and transformed expression signals obtained with tiling arrays and samples from budding yeast growing in rich medium (YPD) are shown for two different RNA isolation protocols as indicated (25).
GermOnline report pages now also include phylogenetic data from the Ensembl Compara database, and from the Orthologous Matrix Project (OMA) that aims at building phylogenetic trees of species using a conservative approach to identify ‘true’ orthologs (23,24). This feature provides efficient cross-connection between locus report pages of conserved genes from different species represented in the database. Users can therefore quickly call up genome annotation information, expression data and, where available, manually curated data on related genes that may well fulfill similar functions across species.
ONLINE ESTABLISHING AND EDITING OF DATABASE ENTRIES
Establishing a new entry
The GermOnline Wiki provides the research communities working on the cell cycle, gametogenesis and gamete function with a comprehensive online editing system to write database entries that cover relevant genes across species. Our goal is to build entries that are complete and accurate and that can be easily updated. Scientists are solicited to edit, amend and correct pages as they deem appropriate, using free text and references to appropriate peer-reviewed publications. We provide researchers with the MediaWiki system that is also used by the highly successful online information source Wikipedia. This community resource was recently found by experts to be as reliable as Encyclopaedia Britannica as far as scientific topics are concerned (8). Moreover, Wiki-like approaches to knowledge management in biomedical sciences are currently being discussed as a way of motivating the research community to participate in genome annotation (7,9). Registered users of the GermOnline Wiki can establish new pages for genes represented in the database for which no community annotation is available as yet. This is possible via the locus report page by following the link ‘contribute your knowledge to GermOnline’ located in the ‘Community annotation’ section. GermOnline Wiki prompts users to register via an extremely brief process and to log in, leading them to an empty text field. The information content of a page consists of free text and images or figures that can be copied from published papers. Most journals do not own the copyright and allow re-publication of images as long as they are appropriately referenced. In other cases, GermOnline has obtained permission to re-publish images.
Editing an existing entry
The GermOnline report page includes a prominent community annotation window enabling users to browse through the corresponding GermOnline Wiki entry. Scientists can edit the entry by following the ‘enlarge window’ link. Examples for REC8 from S.cerevisiae (Figure 4, a and b) and Rec8L1 from M.musculus are given (Figure 4c). Several hundred entries on genes that were written using the previous submission/curation system were migrated into the new environment and can be accessed and updated. A history page enables users to trace changes and the authors who made them, and extensive online help is available via the Wiki pages. Any edited page is visited by GermOnline curators within one business day for verification purposes and to prevent vandalism.
Figure 4
Gene annotation using GermOnline Wiki. (a) Examples for an insert into the locus report page and (b) full-size Wiki pages covering REC8 from yeast and (c) Rec8L1 from mouse are shown.
FUTURE DEVELOPMENT
We intend to integrate relevant classical and tiling microarray data as they get published for all species represented in the database. Furthermore, in coordination with the ongoing development of MIMAS, we will include genome-wide protein-DNA interaction data (ChIP-Chip assays) and, where available, high-throughput data on protein structures and protein interaction networks as well as pathways.
CONCLUSION
We report the development of an innovative approach to knowledge management that combines the advanced capabilities of specific solutions for genome browsing (Ensembl), phylogenetic analysis (OMA), microarray data annotation (MIMAS) and online editing of database entries (MediaWiki). Based on our previous experience with GermOnline we conclude that any significant input from life scientists requires an efficient and flexible system that permits researchers to do what they are trained to do: provide a concise and precise summary of mechanistic knowledge established for a given gene product using free text, figures and original references. Since time is a critical factor, it is crucial that any online editing system allows easy access and does not require extensive usage of complex forms. MediaWiki as implemented in GermOnline is an ideal solution for this purpose. The time for large-scale community annotation involving regular contributions by most, if not all, scientists may not yet have come. However, we believe that it is important to provide appropriate prototype tools early on to help change the way biological and biomedical researchers produce and disseminate knowledge in the not too distant future.