John Martin, Sahar Abubucker, Todd Wylie, Yong Yin, Zhengyuan Wang, Makedonka Mitreva.
Abstract
Nematode.net (http://nematode.net) is a publicly available resource dedicated to the study of parasitic nematodes. In 2000, the Genome Center at Washington University (GC) joined a consortium including the Nematode Genomics group in Edinburgh and the Pathogen Sequencing Unit of the Sanger Institute to generate expressed sequence tags (ESTs) as an inexpensive and efficient solution for gene discovery in parasitic nematodes. As of 2008, the GC, sampling key parasites of humans, animals and plants, has generated over 500,000 ESTs and 1.2 million genome survey sequences from more than 30 non-Caenorhabditis elegans nematodes. Nematode.net was implemented to offer user-friendly access to data produced by this project. In addition to sequence data, the site hosts: assembled NemaGene clusters in GBrowse views characterizing composition and protein homology; functional Gene Ontology annotations presented via the AmiGO browser; KEGG-based graphical display of NemaGene clusters mapped to metabolic pathways; codon usage tables; NemFam protein families, which represent conserved nematode-restricted coding sequences not found in public protein databases; a web-based WU-BLAST search tool that allows complex querying; and other assorted resources. The primary aim of Nematode.net is the dissemination of this diverse collection of information to the broader scientific community in a way that is useful, consistent, centralized and enduring.
Year: 2008 PMID: 18940860 PMCID: PMC2686480 DOI: 10.1093/nar/gkn744
Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971
Growth of data hosted by Nematode.net over the last 4 years
| | 2004 | 2008 |
|---|---|---|
| EST sequencing totals | 215 127 | 509 161 |
| GSS sequencing totals | 0 | 1 208 119 |
| Gene Ontology classifications | 7 species | 29 species |
| Codon usage tables | 2 species | 30 species |
| NemaGene clusters | 12 158 | 118 770 |
| GSS derived genes | 0 | 32 952 |
Figure 1. NemaPath pathway view. (A) A section of the Cysteine metabolism pathway. Green shaded boxes show gene products that have been putatively identified in the current organism by primary sequence homology to a member of KEGG's genes database with that same EC assignment. (B) Comparative view highlighting populated gene products for two user-selected species (yellow and blue) on the same section of this pathway map. (C) Phylogeny-based comparison highlighting clades that have gene products putatively encoding the specific ECs. The phylogeny is based on reference 9.
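The comparative view in panel B can be thought of as a set comparison over EC numbers. The following is a minimal sketch under that assumption, not the site's actual implementation: the function name `compare_ec_sets`, the species variables and the EC assignments are all hypothetical placeholders chosen for illustration.

```python
def compare_ec_sets(ecs_a, ecs_b):
    """Split EC numbers into shared and species-specific groups."""
    a, b = set(ecs_a), set(ecs_b)
    return {
        "shared": sorted(a & b),   # would be highlighted for both species
        "only_a": sorted(a - b),   # would be highlighted for species A only
        "only_b": sorted(b - a),   # would be highlighted for species B only
    }


# Hypothetical EC assignments for two user-selected species
# (illustrative values, not taken from Nematode.net).
species_a = {"2.5.1.47", "4.4.1.1", "2.6.1.1"}
species_b = {"2.5.1.47", "1.13.11.20"}

result = compare_ec_sets(species_a, species_b)
for group, ecs in result.items():
    print(group, ecs)
```

Sorting the results keeps the output stable across runs, which is convenient when the groups are used to color boxes on a pathway map.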
Figure 2. NemFam GBrowse screenshot. In this example, details for the family NF_203_1015 are displayed. The chosen track is displaying all available features compressed together (feature-specific tracks are also available). Also, some basic information about the family is printed near the top of the page.
Figure 3. This screenshot displays a typical annotation of a parasitic nematode genome via GBrowse. In this case we are looking at the annotation of a Trichuris suis contig made by the MAKER software application.
Figure 4. This flowchart depicts the movement of data from origination, through analyses, to its final display in the planned version of Nematode.net. Data generation begins with either genomic (GSS, WGS) or transcriptomic (EST, cDNA) sequence data that are assembled and then run through a robust process of gene identification. Once the proteome is determined, various structural annotations and functional classifications are computed, and an exhaustive enzymatic pathway reconstruction will be performed, with the initial focus on metabolic pathways but eventually including other pathways. This information is then spooled into a relational database whose schema supports multiple analysis-centric and species-centric views. As much as possible, views that are expected to be in high demand will be precomputed to keep responses to user requests fast. On top of this layer, a robust query engine will support data mining on a large number of attributes, including gene name, EC number, metabolic pathway, ontology term and/or cluster/contig/read name. Advanced queries will also allow data mining based on valid combinations of the above-mentioned attributes.
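Querying on combinations of attributes, as described for the planned query engine, could look roughly like the sketch below. It assumes a single flat table in an in-memory SQLite database; the table name `annotation`, its columns, the helper `build_query` and every row are hypothetical and are not the Nematode.net schema or data.

```python
import sqlite3

# Attributes a user may filter on (hypothetical column names).
ALLOWED = ("gene_name", "ec_number", "pathway", "go_term", "cluster_name")


def build_query(filters):
    """Build a parameterized SELECT from attribute=value filters."""
    unknown = set(filters) - set(ALLOWED)
    if unknown:
        raise ValueError(f"unsupported attributes: {sorted(unknown)}")
    # Column names come only from the whitelist above; values are bound
    # as parameters, so user input never reaches the SQL text.
    clauses = [f"{col} = ?" for col in filters]
    sql = "SELECT cluster_name, species FROM annotation"
    if clauses:
        sql += " WHERE " + " AND ".join(clauses)
    sql += " ORDER BY cluster_name"
    return sql, list(filters.values())


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE annotation (cluster_name TEXT, species TEXT, "
    "gene_name TEXT, ec_number TEXT, pathway TEXT, go_term TEXT)"
)
# Placeholder rows for illustration only.
rows = [
    ("cluster_001", "species A", "gene-1", "2.5.1.47", "Cysteine metabolism", "GO:0004124"),
    ("cluster_002", "species A", "gene-2", "2.6.1.1", "Cysteine metabolism", "GO:0008483"),
    ("cluster_003", "species B", "gene-3", "2.5.1.47", "Cysteine metabolism", "GO:0004124"),
]
conn.executemany("INSERT INTO annotation VALUES (?, ?, ?, ?, ?, ?)", rows)

sql, params = build_query(
    {"ec_number": "2.5.1.47", "pathway": "Cysteine metabolism"}
)
hits = conn.execute(sql, params).fetchall()
print(hits)
```

Restricting filters to a fixed whitelist is one simple way to enforce "valid combinations"; precomputed views for high-demand queries would sit alongside a table like this rather than replace it.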