959 Nematode Genomes: a semantic wiki for coordinating sequencing projects.

Sujai Kumar, Philipp H Schiffer, Mark Blaxter.

Abstract

Genome sequencing has been democratized by second-generation technologies, and even small labs can now sequence metazoan genomes. In this article, we describe '959 Nematode Genomes', a community-curated semantic wiki that coordinates the sequencing efforts of individual labs to collectively sequence 959 genomes spanning the phylum Nematoda. The main goal of the wiki is to track sequencing projects that have been proposed, are in progress, or have been completed. Wiki pages for species and strains are linked to pages for people and organizations, using machine- and human-readable metadata that users can query to see the status of their favourite worm. The site is based on the same platform that runs Wikipedia, with semantic extensions that allow the underlying taxonomy and data storage models to be maintained and updated more easily than in a conventional database-driven web site. The wiki also provides a way to track and share preliminary data that are not yet polished enough to be submitted to the official sequence repositories. In just over a year, this wiki has already fostered new international collaborations and attracted newcomers to the enthusiastic community of nematode genomicists. The wiki is available at www.nematodegenomes.org.

Year:  2011        PMID: 22058131      PMCID: PMC3245058          DOI: 10.1093/nar/gkr826

Source DB:  PubMed          Journal:  Nucleic Acids Res        ISSN: 0305-1048            Impact factor:   16.971


INTRODUCTION

In 1998, the nematode Caenorhabditis elegans became the first animal to have its genome completely sequenced (1). Since then, second-generation sequencing technologies have revolutionized and democratized the field of genome sequencing. Even small labs can now sequence their favourite nematodes in a few weeks for a few thousand dollars. By 2012, we anticipate that more than 100 nematode genomes will be sequenced, a happy state of affairs for those of us who study this most abundant and diverse Metazoan phylum. The only problem with rapid and inexpensive sequencing is that it is becoming harder to keep track of which genomes are being sequenced, who is sequencing them, what stage the genome projects are at, and where one can get early access to the data. The nucleotide sequence archives (GenBank/EMBL/DDBJ) (2) are the de facto storehouses for complete and published genomes. However, as the bottleneck of a genome project has shifted from sequencing to analysis, which can take months, it has become imperative to have a place to share information about the project before it is published. Inspired by ArthropodBase (www.arthropodgenomes.org), the 959 Nematode Genomes (959NG) wiki was created in early 2010 to meet this need and can be accessed at www.nematodegenomes.org. 959NG is unlike existing genome and transcriptome database web sites such as WormBase (3) and NemBase (4) because, instead of storing the relationships between genes, proteins and DNA sequences, it stores the relationships between people, institutions and sequencing projects at various stages of completion. The goal is to connect users, and make it easy for them to form collaborations and share data. The platform choice reflects this goal, as we describe in the ‘Software’ section.

Why (Only) 959NG?

Unlike the 1000 Human Genomes (www.1000genomes.org) or Genome 10K (genome10k.soe.ucsc.edu) sequencing projects, the effort to sequence as many nematodes as possible is a distributed, bottom-up enterprise. We picked 959 as an initial target because all adult female hermaphrodite C. elegans have exactly 959 somatic cells. The definition of the embryonic lineage of C. elegans from fertilized zygote to fertile adult was a milestone in C. elegans developmental biology. Just as the tree of the C. elegans embryonic lineage was a key underpinning of later work on this model nematode, we hope that a nematode phylogeny with 959 genome-sequenced taxa will underpin the investigation of nematode biology in general. Obviously, we do not limit the vision to these few genomes: with 23 000 species described, and an estimated 1–2 million species undescribed, the scope for genomic exploration of Nematoda is vast.

FEATURES

959NG is a wiki and thus very easy for end-users to edit and interact with. As it is based on the Semantic MediaWiki (SMW) platform, it also allows pages to store properties and relationships to other pages. These properties and relationships can be queried by anyone.

Editable Taxonomy

We offer a view of the taxonomy of the phylum Nematoda, pre-loaded with all species that have data present in EMBL/GenBank/DDBJ. Clicking on any node in the taxonomic tree of nematodes shows the sequencing status of all species below that taxon. Each node also provides links to the NCBI page for that taxon and the Expressed Sequence Tags (ESTs) available for any species within that taxon (Figure 1). The initial tree was populated using the NCBI taxonomy (www.ncbi.nlm.nih.gov/taxonomy) but the more widely used Blaxter clades (5) and Helder clades (6) were easy to incorporate into the tree because of the SMW architecture. Users can add new species. See the ‘Software’ section for more details.
Figure 1.

Systematic tree of Nematoda, with a few taxonomic nodes expanded to show how the Blaxter and Helder classifications were incorporated into the tree.

Species and Strain Information

For each species, several pieces of information are stored and displayed, such as a short description, its NCBI taxonomic identifier, a picture, as well as some facts about genome size and nucleotide frequency, if known. Species pages also store names of people interested in that species. Each species can have one or more strains with a genome and transcriptome sequencing status that includes links to the funding bodies and the sequencing centres contributing to the sequencing projects (Figure 2).
Figure 2.

Species page for C. elegans displaying information for the species as well as the status of strains that have been sequenced.

All page properties are stored internally as Resource Description Framework (RDF) triples, which are expressions with three parts: subject, predicate and object. An example of an RDF triple is ‘Brugia malayi TRS: Strain genome status: Published’. Although some properties are integer or text values, other properties define relationships to pages, such as ‘Trichinella spiralis: Has interested party: Makedonka Mitreva’, which links to a person page.
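The subject, predicate and object structure can be illustrated with a minimal Python sketch. It holds the two example triples quoted above in memory and filters them by pattern; the `Triple` type and `match` helper are hypothetical names chosen for illustration and are not part of SMW or of the 959NG site.

```python
from typing import List, NamedTuple, Optional


class Triple(NamedTuple):
    """One RDF-style statement: subject, predicate, object."""
    subject: str
    predicate: str
    obj: str


# The two example triples quoted in the text.
TRIPLES = [
    Triple("Brugia malayi TRS", "Strain genome status", "Published"),
    Triple("Trichinella spiralis", "Has interested party", "Makedonka Mitreva"),
]


def match(triples: List[Triple],
          subject: Optional[str] = None,
          predicate: Optional[str] = None,
          obj: Optional[str] = None) -> List[Triple]:
    """Return the triples whose parts equal every argument that is not None."""
    return [
        t for t in triples
        if (subject is None or t.subject == subject)
        and (predicate is None or t.predicate == predicate)
        and (obj is None or t.obj == obj)
    ]


# Which strains have a published genome?
published = match(TRIPLES, predicate="Strain genome status", obj="Published")
print([t.subject for t in published])
```

In the second triple the object is itself the name of a page, which is how a property can act as a link between a species page and a person page.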

Persons and Organizations

Because the main goal of 959NG is to connect users, people and organization pages are as important as species pages. These pages store personal and institutional URLs, contact information as well as relationships to the species such as ‘is genome contact for’ and ‘is interested in species’.

Queries

SMW sites allow users to add new properties that the original web site creators may not have thought of. These properties and relationships can be queried to generate useful dynamic tables. Using the species, strain, people and organization properties, any user can create queries to collate and display information. The following queries are already implemented and linked to from the home page as potentially useful starting points: species with published genomes; species with genomes being sequenced; and species for which sequencing has been proposed. In addition, clicking on a node in the taxonomic tree displays the result of the query ‘Species under this taxon that have their sequencing status set to anything other than “None”’ (Figure 3).
Figure 3.

Page for the taxonomic node Enoplia, showing NCBI Taxonomy and NCBI EST link-outs, as well as the results of the query ‘Species and strains under this taxon with their sequencing status set to anything other than “None”’.

New queries and information mash-ups can be added by users on any page if they know the SMW query syntax. For example, the following queries are trivial to run from the ‘Semantic Search’ page:

List of strains sequenced by the funding body NIH:
[[Strain_genome_funder::NIH]]

Species in Blaxter clade III with Adenine-Thymine content greater than 70%:
[[Category:Species]] [[Species_genome_at:: >70]] [[Species_bclade::Bclade_III]]

All the pages and the relationships in 959NG can also be exported in XML and RDF format, respectively, using the Special:Export and Special:ExportRDF sections of the web site.
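As a rough illustration of how such query strings are put together, the Python sketch below rebuilds the two example queries from their parts. The helper names (`category`, `prop`, `ask`) are hypothetical and only concatenate text; they do not contact the wiki. The property names are the ones quoted in the examples, and the generated strings omit the optional space before the comparator.

```python
def category(name: str) -> str:
    """Condition restricting results to pages in one category."""
    return f"[[Category:{name}]]"


def prop(name: str, value: str, comparator: str = "") -> str:
    """Condition on a property value, optionally with a comparator such as '>'."""
    return f"[[{name}::{comparator}{value}]]"


def ask(*conditions: str) -> str:
    """Join conditions with spaces; a page must satisfy all of them to be listed."""
    return " ".join(conditions)


# Strains whose genome funder is NIH.
funder_query = ask(prop("Strain_genome_funder", "NIH"))

# Species in Blaxter clade III with AT content above 70%.
at_rich_query = ask(
    category("Species"),
    prop("Species_genome_at", "70", comparator=">"),
    prop("Species_bclade", "Bclade_III"),
)

print(funder_query)
print(at_rich_query)
```

Each bracketed condition narrows the result set, so adding a condition to a query can only shorten the list of matching pages.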

Blast Server For Genomes in Progress

One of the most used features of 959NG is the BLAST (7) server for intermediate genome assemblies. Although generating sequence data is no longer the bottleneck in a sequencing project, quality checks, assembly, annotation and analysis of the data can take several months. The 959NG BLAST server provides a place to park intermediate data so that interested researchers can start looking for their genes or features of interest and speed up the process of research, especially in time-critical areas such as drug–target and vaccine–candidate discovery. Completed genomes will be submitted to centralized repositories (GenBank/EMBL/DDBJ) and to specialized databases such as WormBase, at which point the intermediate assemblies can be removed from the 959NG BLAST server.

SOFTWARE

SMW (semantic-mediawiki.org) is an extension to the popular MediaWiki (mediawiki.org) platform that powers Wikipedia. We chose it for the 959NG web site because (i) users are familiar with wikis and comfortable with creating and editing pages and (ii) we were not sure at the outset about the information we wanted to capture for each species and its genome sequencing status. As we show in this section, SMWs are better than traditional databases when the data model may change.

SMW Concepts

The initial setup requires an understanding of the following SMW concepts:

Categories: all pages on the site are in one of the following Categories: (i) Genome Sequencing Centre, (ii) Person, (iii) Species, (iv) Strain and (v) Taxon. A category would typically correspond to a table in a relational database.

Forms: each category normally has a specialized form to enter information for that type of page. For example, a Taxon form will have fields for ‘NCBI taxon id’ and ‘Taxon parent’, which are specific to Taxon pages.

Pages and Properties: a page is analogous to an object in a database or a row in a database table. Page properties in SMW are conceptually equivalent to object values or to columns in a database table.

Templates: templates display information about a page or a property. Each category will usually have a template that determines how the information for those types of pages should be displayed. Templates also transform values into displays. For example, the ‘PubmedID Linkout’ template takes a PubMed ID such as 20980554 and displays a link to that article on PubMed.
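The idea of a template that transforms a stored value into a display can be sketched in Python. This is only an analogy for what the ‘PubmedID Linkout’ template does, not its wiki source; the function name and the base URL are assumptions made for the example.

```python
def pubmed_linkout(pubmed_id, base_url="https://pubmed.ncbi.nlm.nih.gov/"):
    """Turn a stored PubMed ID into a link, as a display template would.

    base_url is an assumed address pattern, not taken from the 959NG site.
    """
    pmid = str(pubmed_id).strip()
    # Reject anything that is not a plain run of ASCII digits.
    if not (pmid.isascii() and pmid.isdigit()):
        raise ValueError(f"not a PubMed ID: {pubmed_id!r}")
    return f"{base_url}{pmid}/"


# The PubMed ID used as an example in the text.
print(pubmed_linkout(20980554))
```

The stored property stays a bare identifier, and only the display layer knows how to build the link, so a change of URL scheme would need one edit to the template rather than edits to every page.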

Advantages of SMW

Traditional database-driven web sites have fixed data models that are defined by the developers, and end-users typically only add data within the existing framework to such web sites. One of the main advantages of our SMW site is that, as sequencing technologies and needs change, even end-users can change the types of data stored for each entity (species, person, organization, etc.). For example, when we started the web site in early 2010, we did not have strain-specific pages because only one strain was sequenced per species. However, with sequencing becoming more accessible, different strains are now being sequenced for the same species, so we used the web interface to add a new ‘Strain’ category, created a new template and a new form for strains, and thus changed the fundamental data model without once touching a database table. The taxonomy tree is another example of how an end-user can change the data hierarchy without knowing anything about how the back-end is implemented. On our site, each taxon is a wiki page with a ‘Taxon parent’ property pointing to another taxon page, and the tree is generated dynamically based on this single property. Therefore, all we had to do to include additional sub-classifications such as Blaxter clades and Helder clades was to edit a few high-level taxon pages so that their ‘Taxon parent’ properties pointed to a new Blaxter or Helder clade page. Another advantage of SMW is that it sits on the MediaWiki platform, which is a mature and scalable engine for serving high-capacity web sites and has a large developer base. Setting up the initial web site took only three person-days, thanks to the examples of templates and forms on another similar site (arthropodgenomes.org). The BioDBCore description of the wiki is provided in the Supplementary Data section.
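The ‘Taxon parent’ mechanism described above can be sketched in a few lines of Python: every page records only its parent, the tree is derived from those pointers on demand, and a new clade level is inserted by re-pointing one parent. The taxon names (‘Taxon A’, ‘Clade X’ and so on) are placeholders rather than real classifications, and the functions are illustrative, not the code that runs on 959NG.

```python
from collections import defaultdict


def children_index(taxon_parent):
    """Invert the child -> parent mapping into parent -> list of children."""
    kids = defaultdict(list)
    for taxon, parent in taxon_parent.items():
        if parent is not None:
            kids[parent].append(taxon)
    return kids


def render(taxon_parent, root):
    """Return the tree below root as indented lines, built from parent pointers."""
    kids = children_index(taxon_parent)
    lines = []

    def walk(node, depth):
        lines.append("  " * depth + node)
        for child in sorted(kids.get(node, [])):
            walk(child, depth + 1)

    walk(root, 0)
    return lines


# Each entry plays the role of a page's 'Taxon parent' property (placeholder names).
taxon_parent = {
    "Nematoda": None,
    "Taxon A": "Nematoda",
    "Taxon B": "Nematoda",
    "Species A1": "Taxon A",
}
before = render(taxon_parent, "Nematoda")

# Insert a new clade level: create one page and re-point one parent.
taxon_parent["Clade X"] = "Nematoda"
taxon_parent["Taxon A"] = "Clade X"
after = render(taxon_parent, "Nematoda")

print("\n".join(after))
```

Nothing below ‘Taxon A’ had to be touched: its descendants move with it because the hierarchy is computed from the parent pointers rather than stored as a separate structure.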

FUTURE DIRECTIONS

As more genomes are sequenced and the 959NG site grows, we hope that the evolving data model for nematode genome sequencing projects will also inform other genome sequencing efforts. Most genomes these days are not finished, but are published as high-quality draft sequences, so we will need to not only store the Minimum Information about a Genome Sequence (gensc.org) and Minimum Information about a high-throughput SeQuencing Experiment (www.mged.org/minseqe), but also additional values such as CEGMA scores (8) to measure how complete the genome is. We will also develop descriptors for genome-scale genetic mapping data, derived from technologies such as restriction-site-associated DNA sequencing (RADSeq) (9), genotyping by sequencing (GBS) (10) and other methods (11), across many strains or isolates of a species. Currently, site visitors can interrogate intermediate draft assemblies of genomes in progress only through the BLAST server. In addition, we would like to provide a basic, automatic annotation service for these incomplete genomes using RNASeq alignments and gene predictors.

CONCLUSIONS

The 959 Nematode Genomes wiki has already inspired international collaborations to sequence, annotate and interpret the genomes of key species. We know of two cases where groups who did not know of each other's efforts are now merging expertise and effort in a unified project. As additional genomes are proposed, new collaborations can be forged and cross-species analyses coordinated. We also hope that the existence of the wiki, and the enthusiastic community behind it, will serve to attract new researchers into this field. As nematode genomics moves into population genomics, this register of strains and sources will become ever more useful. SMW technology builds a system that is easy to navigate, easy to edit and, importantly, easy to develop as needs, knowledge and possibilities change. Genomics research on nematodes (particularly C. elegans) has already delivered important information on core biological processes. Adding additional nematode genomes will allow the specific instance of C. elegans to be contextualized, and will, we hope, feed research on comparative genomics of nematodes, the evolutionary biology of genome change, and the biology of (many) parasitic nematodes, among other fields. We hope 959NG will become a one-stop site in which to forge collaborations, learn about best practice in assembly and annotation, record insights and advances and explore the genomic diversity of Nematoda.

FUNDING

This work was supported by the School of Biological Sciences at the University of Edinburgh. Funding for open access charge: Natural Environment Research Council (NERC).

Conflict of interest statement. None declared.
  11 in total

1.  Phylum-wide analysis of SSU rDNA reveals deep phylogenetic relationships among nematodes and accelerated evolution toward crown Clades.

Authors:  Martijn Holterman; Andre van der Wurff; Sven van den Elsen; Hanny van Megen; Tom Bongers; Oleksandr Holovachov; Jaap Bakker; Johannes Helder
Journal:  Mol Biol Evol       Date:  2006-06-21       Impact factor: 16.240

2.  CEGMA: a pipeline to accurately annotate core genes in eukaryotic genomes.

Authors:  Genis Parra; Keith Bradnam; Ian Korf
Journal:  Bioinformatics       Date:  2007-03-01       Impact factor: 6.937

3.  NEMBASE4: the nematode transcriptome resource.

Authors:  Benjamin Elsworth; James Wasmuth; Mark Blaxter
Journal:  Int J Parasitol       Date:  2011-04-21       Impact factor: 3.981

Review 4.  Genome-wide genetic marker discovery and genotyping using next-generation sequencing.

Authors:  John W Davey; Paul A Hohenlohe; Paul D Etter; Jason Q Boone; Julian M Catchen; Mark L Blaxter
Journal:  Nat Rev Genet       Date:  2011-06-17       Impact factor: 53.242

5.  Multiplexed shotgun genotyping for rapid and efficient genetic mapping.

Authors:  Peter Andolfatto; Dan Davison; Deniz Erezyilmaz; Tina T Hu; Joshua Mast; Tomoko Sunayama-Morita; David L Stern
Journal:  Genome Res       Date:  2011-01-13       Impact factor: 9.043

6.  A molecular evolutionary framework for the phylum Nematoda.

Authors:  M L Blaxter; P De Ley; J R Garey; L X Liu; P Scheldeman; A Vierstraete; J R Vanfleteren; L Y Mackey; M Dorris; L M Frisse; J T Vida; W K Thomas
Journal:  Nature       Date:  1998-03-05       Impact factor: 49.962

Review 7.  Genome sequence of the nematode C. elegans: a platform for investigating biology.

Authors: 
Journal:  Science       Date:  1998-12-11       Impact factor: 47.728

8.  GenBank.

Authors:  Dennis A Benson; Ilene Karsch-Mizrachi; David J Lipman; James Ostell; Eric W Sayers
Journal:  Nucleic Acids Res       Date:  2010-11-10       Impact factor: 16.971

9.  WormBase: a comprehensive resource for nematode research.

Authors:  Todd W Harris; Igor Antoshechkin; Tamberlyn Bieri; Darin Blasiar; Juancarlos Chan; Wen J Chen; Norie De La Cruz; Paul Davis; Margaret Duesbury; Ruihua Fang; Jolene Fernandes; Michael Han; Ranjana Kishore; Raymond Lee; Hans-Michael Müller; Cecilia Nakamura; Philip Ozersky; Andrei Petcherski; Arun Rangarajan; Anthony Rogers; Gary Schindelman; Erich M Schwarz; Mary Ann Tuli; Kimberly Van Auken; Daniel Wang; Xiaodong Wang; Gary Williams; Karen Yook; Richard Durbin; Lincoln D Stein; John Spieth; Paul W Sternberg
Journal:  Nucleic Acids Res       Date:  2009-11-12       Impact factor: 16.971

10.  Rapid SNP discovery and genetic mapping using sequenced RAD markers.

Authors:  Nathan A Baird; Paul D Etter; Tressa S Atwood; Mark C Currey; Anthony L Shiver; Zachary A Lewis; Eric U Selker; William A Cresko; Eric A Johnson
Journal:  PLoS One       Date:  2008-10-13       Impact factor: 3.240
