WormBase: Annotating many nematode genomes.

Kevin Howe, Paul Davis, Michael Paulini, Mary Ann Tuli, Gary Williams, Karen Yook, Richard Durbin, Paul Kersey, Paul W Sternberg.

Abstract

WormBase (www.wormbase.org) has been serving the scientific community for over 11 years as the central repository for genomic and genetic information for the soil nematode Caenorhabditis elegans. The resource has evolved from its beginnings as a database housing the genomic sequence and genetic and physical maps of a single species, and now represents the breadth and diversity of nematode research, currently serving genome sequence and annotation for around 20 nematodes. In this article, we focus on WormBase's role of genome sequence annotation, describing how we annotate and integrate data from a growing collection of nematode species and strains. We also review our approaches to sequence curation, and discuss the impact on annotation quality of large functional genomics projects such as modENCODE.

Keywords: Caenorhabditis elegans; annotation; community resource; genome; model organism database; nematode; parasitic nematode; sequence curation

Year: 2012 PMID: 24058818 PMCID: PMC3670165 DOI: 10.4161/worm.19574

Source DB: PubMed Journal: Worm ISSN: 2162-4046

Introduction

WormBase seeks to present an integrative view of nematode biology by in-depth curation of the research on C. elegans and other members of this animal family. To this end we integrate genomic sequences and annotations with curated data from genetic, developmental, physiological, behavioral and evolutionary studies. We provide multiple streams of access to the data, including the main website portal (www.wormbase.org), genome browsers, sequence search services, and application programming interfaces. WormBase aims to be the central repository and portal for nematode genomic data.

The activities of the WormBase consortium can be broadly classified into three groups: (1) curation of C. elegans literature and associated research and development; (2) user interface design, development and maintenance; and (3) genome sequence annotation, analysis and comparative genomics. The volume of nematode data has exploded in recent years, and WormBase has had to respond accordingly in all three of these areas. For example, as the volume and variety of information have increased, presenting it to the community in a clear and accessible way has required new forms of display. We have responded to this challenge by completely redesigning the WormBase web interface (Harris et al., manuscript in preparation). In this article, we focus on our remit to provide integrated, coherent genome annotation for a large (and growing) collection of nematode genome sequences and strains. We also summarize our release production cycle and analysis pipelines, and describe how they affect the timeline between data submission and its subsequent public release.

Integrating and Annotating Multiple Nematode Genomes

WormBase now hosts genomic data for nearly 20 nematodes (see Table 1, and refs. 3–14), representing species of evolutionary, biomedical and agricultural interest. Recent additions include the parasitic nematodes Trichinella spiralis, Ascaris suum and Bursaphelenchus xylophilus. The maturity of genome sequence and annotation in WormBase varies widely between species. At one end of the spectrum is the C. elegans genome, which was completed over a number of years using traditional physical mapping and clone-by-clone sequencing and finishing, and which has highly curated annotation. More recently we have seen a number of genome sequences generated by new high-throughput low-cost technologies and many of these genomes are inevitably fragmented and incomplete; additionally, there is relatively little published functional information about many of these species.

Table 1.

Nematode genomes in WormBase

Species	Clade^a	Mode of reproduction^b	Reference strain sequenced	Integrated into WormBase	WS230 assembly version^e	WS230 sequenced genome size (Mb)	WS230 top-level fragments	WS230 scaffold N50^f (bp)	WS230 CDS models (distinct loci)	WS230 gene set status
C. elegans	V	androdioecious	Bristol N2	WS1	WS230	100.3	6	17493793	25634 (20517)	Curated
C. briggsae	V	androdioecious	AF16	WS132	CAAC03000000	108.4	367	17485439	21961 (21936)	Curated
C. remanei	V	gonochoristic	PB4641	WS185	AAGD02000000	145.5	3670	461060	31476 (31471)	Curated
Brugia malayi	III	gonochoristic	TRS	WS185	AAQA01000000	95.8	27210	37841	21332 (18348)	External^g
Pristionchus pacificus	V	hermaphroditic	PS312	WS194	v2 (Sep. 2010)	172.5	18083	1244534	24217 (24216)	External
C. japonica	V	gonochoristic	DF5080	WS195	ABLE03000000	166.3	18817	94149	36105 (29962)	Curated
C. brenneri	V	gonochoristic	PB2801	WS196	ABEG02000000	190.4	3305	368319	30670 (30667)	Curated
Meloidogyne hapla	IV	gonochoristic	VW9	WS204	ABLG01000000	53.0	3452	84000	13072 (13072)	External
Meloidogyne incognita	IV	gonochoristic	Morelos	WS205	CABB01000000	82.1	9538	83000	-	External^h
Haemonchus contortus	V	gonochoristic	MHco3 (ISE)	WS208	v1 (Aug. 2008)	298.0	59707	13338	6201 (6201)	WormBase Predicted
C. angaria	V	gonochoristic	PS1010	WS218	AEHI01000000ⁱ	79.8	33559	9453	26265 (22622)	External
Trichinella spiralis	I	gonochoristic	ISS 195	WS225	ABIR02000000	63.5	6863	6373445	16380 (16380)	External
C. sp 9	V	gonochoristic	JU1422	WS226	v1 (June 2011)	204.3	7636	196652	45167 (45167)	WormBase predicted
C. sp 11	V	androdioecious	JU1373	WS226	AEKS01000000	79.3	665	20921866	27721 (22326)	External
Strongyloides ratti	IV	gonochoristic and parthenogenetic^c	ED321 Heterogonic	WS226	CACX01000000	52.6	2184	359029	8188 (8077)	WormBase Predicted
Ascaris suum	III	gonochoristic	Natural isolate	WS229	v1 (Aug. 2011)	272.8	29831	407899	18449 (18449)	External
Bursaphelenchus xylophilus	IV	gonochoristic	Ka4C1	WS229	CADV01000000	74.6	5527	1158000	18074 (18074)	External
Heterorhabditis bacteriophora	V	gonochoristic and hermaphroditic^c,d	M31e	WS229	ACKM01000000	77.0	1240	312328	-	External^h
C. sp 5	V	gonochoristic	DRD-2008 JU800	WS230	v1 (Jan. 2012)	131.8	15261	25228	46280 (34696)	External

Notes: (a) ref. 15; (b) refs. 16–25; (c) heterogonic; (d) sex also determined by the environment; (e) INSDC assembly accession where available; (f) http://www.broadinstitute.org/crd/wiki/index.php/N50; (g) author gene-set extended by additional isoform predictions from WormBase; (h) awaiting submission of gene set; (i) an improved C. angaria assembly will be available in WS231.

WormBase undertakes different responsibilities for each of these species, which can include (1) administration of the genome sequence; (2) curation of gene models and other sequence features; (3) curation of non-sequence-based data from the literature; and (4) tracking of identifiers forward through different versions of the genome sequence and annotation. The specific way in which we manage the data for a species depends primarily on whether we curate gene models and other features for it. It is therefore useful for the sake of discussion to classify the species into two groups: core (WormBase-curated gene models) and non-core. As of release WS230, the core species are C. elegans, C. briggsae, C. remanei, C. brenneri and C. japonica.

Analyzing and presenting data for an ever-increasing number of nematode genomes requires methods that scale well. We deploy a standard automatic analysis pipeline to annotate all the species we house (core and non-core), including repeat prediction, cDNA alignments, the determination of homology relationships, and protein domain identification. If a genome sequence for a non-core species is submitted without a gene-set, we also run an in-house gene prediction pipeline that uses CEGMA to accurately identify a small, universally conserved set of gene models. These are then used to train parameters for AUGUSTUS, which we then apply using protein homologies and any available RNASeq and other transcript data as supporting evidence. In some cases, these internally produced gene predictions are later replaced by a canonical set of models provided by the submitters.
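The scaffold N50 values in Table 1 (note f) summarize how contiguous each assembly is. The following is a minimal sketch of the calculation, assuming the commonly used definition (the largest length L such that sequences of length at least L together cover at least half of the assembly); the function name and the toy scaffold lengths are illustrative and are not taken from WormBase.

```python
from typing import Iterable


def n50(lengths: Iterable[int]) -> int:
    """Return the N50 of a collection of sequence lengths (in bp).

    Assumed definition: the largest length L such that sequences of
    length >= L together cover at least half of the total length.
    """
    ordered = sorted(lengths, reverse=True)
    if not ordered:
        raise ValueError("need at least one sequence length")
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length
    raise AssertionError("unreachable: the running total always reaches half")


# Toy assembly of nine scaffolds (54 bp in total). The three longest
# scaffolds (10 + 9 + 8 = 27 bp) are enough to reach half of the total.
toy_scaffolds = [2, 3, 4, 5, 6, 7, 8, 9, 10]
print(n50(toy_scaffolds))  # prints 8
```

A high N50 relative to genome size, as for C. elegans and C. briggsae in Table 1, indicates that most of the sequence sits in a few long top-level fragments.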
Updating an existing species in WormBase with a new assembly and/or gene-set presents additional challenges, because users rely on stable identifiers to track their entities of interest, and these identifiers must be propagated forward to the corresponding features in subsequent releases. For core species, identifiers are actively managed and tracked using our own curation software infrastructure. For non-core species, we use the Ensembl stable-identifier mapping software for this task.

The principal way in which we draw information from multiple species together is by connecting genes via orthology and paralogy relationships to genes in other species (both nematodes and other model organisms such as human, mouse and fly). As of WS230, we include relationships published by the following projects and resources: InParanoid (version 7); TreeFam (version 7); the Orthologous Matrix Project (OMA, August 2009 version); OrthoMCL; PantherDB (version 7); and Ensembl (version 65). In addition, we curate orthology calls from the literature (e.g., Hillier et al., ref. 8) and direct submissions. We also use data in eggNOG (version 3.0) to cluster genes into functionally characterized homologous groups.

These resources are inevitably based on snapshots of the gene models, taken at various times. For our core species, however, particularly C. elegans, the gene models are in a state of flux, being revised and improved on the basis of the latest evidence. In order to infer up-to-date nematode homology relationships for the latest gene models, we run the Ensembl Compara GeneTree pipeline as part of the preparation for every WormBase release. The resulting gene trees are used to infer current orthology relationships in addition to those obtained by import from the third-party resources and direct submission. One way in which we use the orthology relationships internally is to project WormBase-approved gene names onto orthologous gene(s) of other nematode species.
For this projection we adopt a conservative approach: each proposed gene name must be supported by an unambiguous one-to-one orthology relationship in the majority of the available source analyses.

We also use the Ensembl Compara DNA pipeline to produce whole-genome multiple alignments of all genomes in WormBase, together with derived genome conservation tracks (using GERP). However, as the genetic diversity of the species collection in WormBase continues to increase, a single multiple alignment for all nematodes becomes less appropriate. We therefore propose to replace it with a series of pairwise alignments, providing multiple alignments only for selected subsets of species.
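The conservative rule for projecting gene names described above can be sketched as follows. This is an illustration under stated assumptions, not the WormBase implementation: "majority" is taken to mean strictly more than half of the supplied source analyses, an orthology call is treated as unambiguous when each gene of the pair has exactly one partner within that source, and all source names, gene identifiers and gene names are hypothetical.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, List, Set, Tuple

Pair = Tuple[str, str]  # (named reference gene, gene in the other species)


def one_to_one_pairs(pairs: Iterable[Pair]) -> Set[Pair]:
    """Keep only the pairs in which each gene has exactly one partner."""
    unique = set(pairs)
    ref_partners = defaultdict(set)
    target_partners = defaultdict(set)
    for ref, target in unique:
        ref_partners[ref].add(target)
        target_partners[target].add(ref)
    return {
        (ref, target)
        for ref, target in unique
        if len(ref_partners[ref]) == 1 and len(target_partners[target]) == 1
    }


def project_names(
    calls_by_source: Dict[str, List[Pair]],
    names: Dict[str, str],
) -> Dict[str, str]:
    """Return {target gene: projected name} for majority-supported pairs."""
    support = Counter()
    for pairs in calls_by_source.values():
        support.update(one_to_one_pairs(pairs))
    needed = len(calls_by_source) / 2  # assumption: strict majority
    return {
        target: names[ref]
        for (ref, target), votes in support.items()
        if votes > needed and ref in names
    }


# Hypothetical orthology calls from three source analyses.
calls = {
    "source_A": [("ref1", "tgt1"), ("ref2", "tgt2"), ("ref2", "tgt3")],
    "source_B": [("ref1", "tgt1"), ("ref2", "tgt2")],
    "source_C": [("ref1", "tgt1")],
}
approved_names = {"ref1": "abc-1", "ref2": "xyz-2"}
print(project_names(calls, approved_names))  # prints {'tgt1': 'abc-1'}
```

In this toy input, ref2 has two partners in source_A, so its pairing with tgt2 is supported one-to-one by only one of the three sources and no name is projected.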

Sequence Curation

WormBase adopts an anomaly-driven approach to curation, whereby discrepancies between current gene models and alignment data are identified and flagged as curation targets. We have implemented a software application (CurationTool) that identifies these discrepancies and scores them according to their degree of discordance, presenting the results to the curator through a graphical user interface. An in-depth discussion of CurationTool and our anomaly-driven curation is presented elsewhere.

For protein-coding genes, WormBase curates only the protein-coding portion (CDS) of the full transcript. For our core species, we use the high-confidence subset of cDNA alignments overlaying the curated CDS models to infer a set of full-length transcripts (including 5′ and 3′ untranslated regions), using a custom algorithm (unpublished). In the past, the accuracy of this process has been sensitive to artifacts such as alignment errors or chimeric cDNAs, but we have recently improved the algorithm to take these factors into account.

The primary line of evidence for gene model curation is transcript data. In addition to cDNAs deposited in the nucleotide archives, we draw data from numerous resources, publications and direct submissions. We also align all RNASeq data deposited in the Short Read Archive (SRA) to our core species using TopHat, and infer gene expression estimates for a variety of life stages and environmental conditions using Cufflinks.

WormBase is committed to acting as the ultimate repository for data coming from the nematode half of the modENCODE project. Most data sets have been accessible via the genome browser since the summer of 2010. To extract the maximum utility from the data, it is integrated fully into our database by extending the data models where necessary and adding full cross-referencing and connectivity with existing WormBase objects.
To date, the focus for full integration has been on data sets with high impact on the curation of gene models and other sequence features, namely: trans-splice sites; poly-A cleavage sites and untranslated regions; large-scale EST sets (P. Green; data retrieved from nucleotide archives); mass-spectrometry peptide sequences; and RNASeq transcripts and derived gene predictions.

The data of highest impact for curation has been the RNASeq transcriptome, and this has been used in a number of different ways. First, the modENCODE “genelets” (fragmentary gene models constructed using RNASeq data from 14 life stages) have been used to produce a new anomaly type for CurationTool that highlights potential cases where adjacent genes could be merged. To date, over three hundred cases displaying this anomaly have been scrutinized, of which approximately 35% resulted in a merge, and a further 10% in some other change (for example, the movement of an exon from one gene to another). Second, we have revisited the source RNASeq data and analyzed it using the TopHat/Cufflinks pipeline to identify candidate “RNASeq-splice” features. These can be used both to confirm introns that are already part of curated gene models and to suggest changes to existing gene models or new isoforms. Third, the strand bias characteristic of the modENCODE RNASeq alignments has been extremely useful for curators in resolving ambiguities in the definition of the 5′ and 3′ ends of genes. Finally, the modENCODE RNASeq data has allowed us to make corrections to the C. elegans reference genome itself. By taking proposed errors and verifying them using data from a private submission of high-throughput sequencing (J. Ahringer and M. Berriman, pers. comm.), we have been able to make 156 genome sequence corrections (110 insertions, 44 deletions and 2 substitutions), resulting in the correction of 100 gene models.
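The second use of the RNASeq data described above, comparing candidate “RNASeq-splice” features with the introns of curated gene models, amounts to a set comparison on intron coordinates. The sketch below illustrates the idea only; the `Intron` record, the category names and the coordinates are hypothetical, and exact coordinate matching is an assumption rather than a description of how CurationTool works.

```python
from typing import Dict, Iterable, NamedTuple, Set


class Intron(NamedTuple):
    seq_id: str  # chromosome or scaffold name
    start: int   # first intronic base (1-based)
    end: int     # last intronic base (inclusive)
    strand: str  # "+" or "-"


def classify_junctions(
    curated: Iterable[Intron],
    rnaseq: Iterable[Intron],
) -> Dict[str, Set[Intron]]:
    """Split introns into confirmed, unconfirmed and novel sets."""
    curated_set = set(curated)
    rnaseq_set = set(rnaseq)
    return {
        # curated introns that have matching RNASeq splice evidence
        "confirmed": curated_set & rnaseq_set,
        # curated introns without RNASeq support
        "unconfirmed": curated_set - rnaseq_set,
        # RNASeq splices absent from curated models: candidate changes
        # to existing gene models or candidate new isoforms
        "novel": rnaseq_set - curated_set,
    }


# Toy coordinates for illustration only.
curated_introns = [Intron("I", 100, 200, "+"), Intron("I", 300, 400, "+")]
rnaseq_splices = [Intron("I", 100, 200, "+"), Intron("I", 300, 450, "+")]
result = classify_junctions(curated_introns, rnaseq_splices)
print(sorted(result["novel"]))
```

Here the first curated intron is confirmed, while the second is unconfirmed and competes with a novel RNASeq splice that shares its start but ends at a different position, the kind of discordance a curator would inspect.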
Additionally, since modENCODE data began to become available from the project Data Co-ordination Centre, the following data sets have been subjected to rigorous internal quality control and fully integrated into the database: ~300 Highly Occupied Target (HOT) regions; ~7,000 non-coding RNA genes; the probable parent for ~1,000 pseudogenes; and ~21,000 three-prime UTRs from the UTRome project. We will prioritise the incorporation of the transcription-factor binding site and chromatin accessibility data as soon as the final versions of these data sets are made available.

We have also worked with groups performing their own analysis of the modENCODE data. For example, a study of the modENCODE RNASeq reads (T. Blumenthal, pers. comm.) has resulted in significant improvements to the operon data set. This has involved identifying cases where fewer than 5% of the trans-splice leader reads for “internal” genes (i.e., genes other than the first) were SL2 type, and modifying the gene content of the operons accordingly.

In addition to modENCODE, we continue to draw in data from the scientific literature and direct submissions, often combining different data sources to assist in making correct predictions. The modENCODE poly-A site data has been supplemented with a corresponding data set from an independent study. These two data sets have only 25% redundancy, and over 80% of coding genes now have an annotated polyA site in WormBase. Gene predictions by genBlastG based on BLAST homologies to C. elegans proteins have also proved valuable for the curation of C. briggsae, C. brenneri, and C. remanei.

We can assess gene-model accuracy in the presence of fragmentary transcript evidence by measuring the proportion of curated introns that are confirmed by spliced cDNA evidence. For WS230, the proportion of C. elegans curated CDS introns confirmed by traditional cDNA, modENCODE RNASeq and mass-spectrometry evidence is 83%, 88% and 14% respectively. Overall, 93% of curated introns are confirmed and 82% of CDS models have all of their introns confirmed by at least one of these three lines of evidence; the corresponding measurements for the final release prior to modENCODE (WS200, February 2009) were 74% and 56%, demonstrating the value of the project in increasing the accuracy and confidence of C. elegans gene models.
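The two accuracy measures quoted above (the fraction of curated introns confirmed by any line of evidence, and the fraction of CDS models whose introns are all confirmed) can be computed as in the following sketch. It is illustrative only: the function name and toy data are hypothetical, and it assumes that CDS models without introns are excluded from the second measure, which the text does not specify.

```python
from typing import Dict, Set, Tuple

Intron = Tuple[str, int, int, str]  # (sequence, start, end, strand)


def confirmation_summary(
    introns_by_cds: Dict[str, Set[Intron]],
    evidence: Dict[str, Set[Intron]],
) -> Tuple[float, float]:
    """Return (fraction of introns confirmed, fraction of CDS fully confirmed)."""
    supported = set().union(*evidence.values())
    curated = set().union(*introns_by_cds.values())
    if not curated:
        raise ValueError("no curated introns to assess")
    # Assumption: single-exon CDS models are left out of the per-model measure.
    spliced_models = [introns for introns in introns_by_cds.values() if introns]
    intron_fraction = len(curated & supported) / len(curated)
    fully_confirmed = sum(introns <= supported for introns in spliced_models)
    return intron_fraction, fully_confirmed / len(spliced_models)


# Toy data: five curated introns across three CDS models.
i1, i2, i3 = ("I", 100, 200, "+"), ("I", 300, 400, "+"), ("II", 50, 90, "-")
i4, i5 = ("III", 10, 60, "+"), ("III", 120, 180, "+")
models = {"cds_a": {i1, i2}, "cds_b": {i3}, "cds_c": {i4, i5}}
evidence_sets = {"cdna": {i1, i3}, "rnaseq": {i1, i2, i4}, "mass_spec": set()}

intron_frac, model_frac = confirmation_summary(models, evidence_sets)
print(f"{intron_frac:.0%} of introns, {model_frac:.0%} of CDS models")
```

In the toy data four of the five introns are confirmed, but cds_c still has one unconfirmed intron, so only two of the three models count as fully confirmed.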

Intraspecies Variation

Similar to many other resources, WormBase captures within-species variation as differences (insertions, deletions and substitutions) with respect to the genome sequence of the reference strain. We expect variation data for many nematode species in the future, but at present almost all the data we house is for C. elegans. Historically, the majority of variation data we have processed has been from laboratory-manipulated strains. We maintain close working relationships and established data exchange protocols with the Caenorhabditis Genetics Center (CGC; www.cbs.umn.edu/CGC), the C. elegans Gene Knockout Consortium (GKC; www.celeganskoconsortium.omrf.org), and the National BioResource Project of Japan (NBRP; www.shigen.nig.ac.jp/c.elegans/index.jsp). We also curate variation data from individual user submissions, which, although time-consuming to process, are often biologically important.

There has recently been rapid growth in C. elegans variation data generated by whole-genome sequencing projects (refs. 50–54; Andersen et al., manuscript in preparation; Moerman and Waterston, manuscript in preparation). These data sets include an increasing number of variations from naturally occurring wild-isolate strains. Motivated by community feedback, we have increased the clarity of our representation and display of this information. Every variation object processed by WormBase is assigned a unique, stable identifier with prefix “WBVar.” For laboratory-induced variations, we also assign a more directly informative public name composed of a project/laboratory prefix (supplied by J. Hodgkin, pers. comm.) and a numerical suffix. For naturally occurring variations, the public name defaults to the WBVar identifier, making the distinction between these objects and the laboratory-induced variations obvious and immediate.

We now also collect non-sequence-based information for wild isolate strains (http://tazendra.caltech.edu/~azurebrd/cgi-bin/forms/wild_isolate.cgi).
Compared with laboratory-manipulated strains, there is additional information to capture about the wild isolates, such as the isolation location, the conditions in which each isolate was found, and details of how it was isolated. Many wild isolates are not stocked at the CGC, and WormBase acts as the central data repository for these strains.

WormBase does not have a mandate to act as a permanent repository for variation data, and as the volume of these data sets continues to increase rapidly, we become less adequately resourced to perform this function. Projects are therefore encouraged to submit their data to the NCBI’s Database of Short Genetic Variations (dbSNP), an established archive for variation data. We act as a submission broker in cases where a laboratory lacks the technical resources to conform to the dbSNP submission protocols. To date, data from six projects have been integrated into WormBase and submitted to dbSNP. WormBase adds value to these data sets by performing additional analysis and placing them into context with other data types (e.g., Gene).

Variations are most often submitted to WormBase as a molecular change at a given location in a specific version of the reference genome sequence. As part of the curation, we capture and record a short flanking sequence on either side of the variation feature, disassociating it from a specific version of the reference genome. For each release, we re-map all variations and re-calculate the potential consequences of the molecular changes (e.g., nonsense, missense or silent protein-coding mutation) on the latest gene models.
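Recording flanking sequences, as described above, allows a variation to be placed on any later version of the reference sequence by searching for the flanks rather than relying on stored coordinates. A minimal sketch of that idea follows. It assumes exact matching and requires each flank to occur exactly once, which is a simplification; the function names and toy sequences are hypothetical and this is not the WormBase re-mapping code.

```python
from typing import List, Optional, Tuple


def find_all(text: str, pattern: str) -> List[int]:
    """Return the start positions of all (possibly overlapping) matches."""
    hits = []
    pos = text.find(pattern)
    while pos != -1:
        hits.append(pos)
        pos = text.find(pattern, pos + 1)
    return hits


def remap_by_flanks(
    genome: str, left_flank: str, right_flank: str
) -> Optional[Tuple[int, int]]:
    """Locate the bases between two flanks on a genome sequence.

    Returns a 0-based, half-open (start, end) interval, or None when
    either flank is missing or ambiguous, or the flanks are out of order.
    """
    if not left_flank or not right_flank:
        raise ValueError("both flanks must be non-empty")
    left_hits = find_all(genome, left_flank)
    right_hits = find_all(genome, right_flank)
    if len(left_hits) != 1 or len(right_hits) != 1:
        return None
    start = left_hits[0] + len(left_flank)
    end = right_hits[0]
    return (start, end) if end >= start else None


# Toy example: a substitution site flanked by "ACCC" and "TTTG".
old_genome = "AAACCCGTTTGGG"
new_genome = "TT" + old_genome  # the new version gained two upstream bases
print(remap_by_flanks(old_genome, "ACCC", "TTTG"))  # prints (6, 7)
print(remap_by_flanks(new_genome, "ACCC", "TTTG"))  # prints (8, 9)
```

The same flanks place the variation two bases further along in the updated sequence, without any reference to the original coordinates. An ambiguous placement returns None, so it can be sent for manual review instead of being guessed.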

Release Cycle and Database Build

WormBase is released every two months, with the preparation for a release beginning three months in advance. This release cycle can give rise to variability in the time between a curator transaction (e.g., the update of a gene name, correction of an error, or the import of a new data set) and its availability on the WormBase website. The delay can be as short as three months (if the change is made immediately before we start building the release) and as long as five months (if made immediately after, in which case it will not be public until the following release).

Building a WormBase database release is a complicated process, the broad stages of which can be described as: (1) data freeze, where each contributing consortium partner takes a snap-shot of the database(s) in which their curation data are stored; (2) data collation, where the curation database snap-shots are brought together into a single database; (3) submission of updated annotation on core species to the International Nucleotide Sequence Database Collaboration, to ensure that the representation of core nematode data in the nucleotide and protein archives is up-to-date; (4) mapping of sequence data (e.g., cDNAs, microarray probes, sequence features, variations) to the genome; (5) establishing connections between objects of different types (e.g., RNAi to Gene), usually via genomic location; (6) the large-scale computational analyses discussed earlier, such as homology detection and whole-genome alignment; and (7) quality control and assurance.

For the more complicated parts of the build process, we deploy two components of the Ensembl system for the management and tracking of computational pipelines: ensembl-pipeline for homology analysis and eHive for comparative analysis. The key features of these systems are (1) automatic re-run of tasks that have failed; and (2) user-definition of a sub-task dependency graph for a process, allowing complex pipelines to be run with minimal user intervention.
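The two features highlighted above, a user-defined dependency graph and automatic re-running of failed tasks, can be illustrated with a small sketch built on the Python standard library (`graphlib`, Python 3.9 or later). It does not use or reproduce the ensembl-pipeline or eHive APIs; the task names loosely follow the build stages listed above, and the dependencies between them, the retry limit and the simulated failure are all assumptions made for the example.

```python
from graphlib import TopologicalSorter
from typing import Callable, Dict, List, Set


def run_pipeline(
    tasks: Dict[str, Callable[[], None]],
    depends_on: Dict[str, Set[str]],
    max_attempts: int = 3,
) -> List[str]:
    """Run tasks in dependency order, re-running a failed task if needed."""
    completed = []
    for name in TopologicalSorter(depends_on).static_order():
        for attempt in range(1, max_attempts + 1):
            try:
                tasks[name]()
                break  # task succeeded, move on to the next one
            except Exception:
                if attempt == max_attempts:
                    raise  # give up after the final attempt
        completed.append(name)
    return completed


# Hypothetical dependency graph: each task maps to its prerequisites.
depends_on = {
    "data_freeze": set(),
    "collation": {"data_freeze"},
    "mapping": {"collation"},
    "connections": {"mapping"},
    "homology": {"collation"},
    "quality_control": {"connections", "homology"},
}

attempts = {"mapping": 0}


def flaky_mapping() -> None:
    """Simulate a task that fails once and then succeeds."""
    attempts["mapping"] += 1
    if attempts["mapping"] == 1:
        raise RuntimeError("simulated transient failure")


tasks = {name: (lambda: None) for name in depends_on}
tasks["mapping"] = flaky_mapping

order = run_pipeline(tasks, depends_on)
print(order)
print("mapping attempts:", attempts["mapping"])
```

The exact order of independent tasks (here, mapping and homology) is not fixed by the graph; only the prerequisite relationships are guaranteed. A production workflow system would also run independent tasks in parallel and persist task state, which this sketch omits.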
The ensembl-pipeline and eHive systems are critical in enabling us to produce the database in a regular and timely manner.

Each stage of the database production is subject to a suite of integrity checks to ensure that it has completed cleanly and without error. For example, we compare the number of objects in each data class with the count at the corresponding stage in the previous release. Major discrepancies are flagged for investigation. This mechanism has proved to be extremely effective in catching errors and process failures as soon as they occur.
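The count-comparison check described above can be sketched as follows. The 10% threshold for a "major" discrepancy, the function name and the toy counts are assumptions made for illustration; the article does not state the criteria actually used.

```python
from typing import Dict, List, Tuple


def flag_count_discrepancies(
    previous: Dict[str, int],
    current: Dict[str, int],
    max_relative_change: float = 0.1,  # assumed threshold, not from the source
) -> List[Tuple[str, int, int]]:
    """Return (data class, previous count, current count) for large changes."""
    flagged = []
    for data_class in sorted(set(previous) | set(current)):
        before = previous.get(data_class, 0)
        after = current.get(data_class, 0)
        if before == 0:
            changed = after != 0  # a class that did not exist before
        else:
            changed = abs(after - before) / before > max_relative_change
        if changed:
            flagged.append((data_class, before, after))
    return flagged


# Toy object counts at the same build stage in two consecutive releases.
previous_counts = {"Gene": 1000, "Variation": 5000, "RNAi": 200}
current_counts = {"Gene": 1010, "Variation": 2500, "Strain": 50}

for data_class, before, after in flag_count_discrepancies(
    previous_counts, current_counts
):
    print(f"{data_class}: {before} -> {after}")
```

In this example the 1% growth in Gene objects passes, while the halved Variation count, the missing RNAi class and the newly appearing Strain class are all flagged for investigation.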

Summary

WormBase is facing a deluge of data from many nematode genome sequencing projects, and we have prepared for this by putting into place annotation and integration pipelines and workflows that will allow the data to be analyzed and presented in a timely and consistent manner. As ever, we welcome feedback and ideas from our user-base as part of the continued development of the resource. We are currently particularly interested in suggestions on how we can maximise the utility of housing a broad representation of the nematode phylum, and what comparative genomics services and views users would find most useful. Users can contact the developers at help@wormbase.org with their suggestions.

57 in total

1. The Ensembl analysis pipeline.

Authors: Simon C Potter; Laura Clarke; Val Curwen; Stephen Keenan; Emmanuel Mongin; Stephen M J Searle; Arne Stabenau; Roy Storey; Michele Clamp
Journal: Genome Res Date: 2004-05 Impact factor: 9.043

2. The Pristionchus pacificus genome provides a unique perspective on nematode lifestyle and parasitism.

Authors: Christoph Dieterich; Sandra W Clifton; Lisa N Schuster; Asif Chinwalla; Kimberly Delehaunty; Iris Dinkelacker; Lucinda Fulton; Robert Fulton; Jennifer Godfrey; Pat Minx; Makedonka Mitreva; Waltraud Roeseler; Huiyu Tian; Hanh Witte; Shiaw-Pyng Yang; Richard K Wilson; Ralf J Sommer
Journal: Nat Genet Date: 2008-09-21 Impact factor: 38.330

Review 3. The genomes of root-knot nematodes.

Authors: David McK Bird; Valerie M Williamson; Pierre Abad; James McCarter; Etienne G J Danchin; Philippe Castagnone-Sereno; Charles H Opperman
Journal: Annu Rev Phytopathol Date: 2009 Impact factor: 13.078

4. Ensembl 2011.

Authors: Paul Flicek; M Ridwan Amode; Daniel Barrell; Kathryn Beal; Simon Brent; Yuan Chen; Peter Clapham; Guy Coates; Susan Fairley; Stephen Fitzgerald; Leo Gordon; Maurice Hendrix; Thibaut Hourlier; Nathan Johnson; Andreas Kähäri; Damian Keefe; Stephen Keenan; Rhoda Kinsella; Felix Kokocinski; Eugene Kulesha; Pontus Larsson; Ian Longden; William McLaren; Bert Overduin; Bethan Pritchard; Harpreet Singh Riat; Daniel Rios; Graham R S Ritchie; Magali Ruffier; Michael Schuster; Daniel Sobral; Giulietta Spudich; Y Amy Tang; Stephen Trevanion; Jana Vandrovcova; Albert J Vilella; Simon White; Steven P Wilder; Amonida Zadissa; Jorge Zamora; Bronwen L Aken; Ewan Birney; Fiona Cunningham; Ian Dunham; Richard Durbin; Xosé M Fernández-Suarez; Javier Herrero; Tim J P Hubbard; Anne Parker; Glenn Proctor; Jan Vogel; Stephen M J Searle
Journal: Nucleic Acids Res Date: 2010-11-02 Impact factor: 16.971

5. OrthoMCL-DB: querying a comprehensive multi-species collection of ortholog groups.

Authors: Feng Chen; Aaron J Mackey; Christian J Stoeckert; David S Roos
Journal: Nucleic Acids Res Date: 2006-01-01 Impact factor: 16.971

6. TreeFam: a curated database of phylogenetic trees of animal gene families.

Authors: Heng Li; Avril Coghlan; Jue Ruan; Lachlan James Coin; Jean-Karim Hériché; Lara Osmotherly; Ruiqiang Li; Tao Liu; Zhang Zhang; Lars Bolund; Gane Ka-Shu Wong; Weimou Zheng; Paramvir Dehal; Jun Wang; Richard Durbin
Journal: Nucleic Acids Res Date: 2006-01-01 Impact factor: 16.971

7. InParanoid 7: new algorithms and tools for eukaryotic orthology analysis.

Authors: Gabriel Ostlund; Thomas Schmitt; Kristoffer Forslund; Tina Köstler; David N Messina; Sanjit Roopra; Oliver Frings; Erik L L Sonnhammer
Journal: Nucleic Acids Res Date: 2009-11-05 Impact factor: 16.971

8. TopHat: discovering splice junctions with RNA-Seq.

Authors: Cole Trapnell; Lior Pachter; Steven L Salzberg
Journal: Bioinformatics Date: 2009-03-16 Impact factor: 6.937

9. Phylogeny of the nematode genus Pristionchus and implications for biodiversity, biogeography and the evolution of hermaphroditism.

Authors: Werner E Mayer; Matthias Herrmann; Ralf J Sommer
Journal: BMC Evol Biol Date: 2007-07-02 Impact factor: 3.260

10. Comparison of C. elegans and C. briggsae genome sequences reveals extensive conservation of chromosome organization and synteny.

Authors: LaDeana W Hillier; Raymond D Miller; Scott E Baird; Asif Chinwalla; Lucinda A Fulton; Daniel C Koboldt; Robert H Waterston
Journal: PLoS Biol Date: 2007-07-03 Impact factor: 8.029
