WormBase: Annotating many nematode genomes.

Kevin Howe, Paul Davis, Michael Paulini, Mary Ann Tuli, Gary Williams, Karen Yook, Richard Durbin, Paul Kersey, Paul W Sternberg.

Abstract

WormBase (www.wormbase.org) has been serving the scientific community for over 11 years as the central repository for genomic and genetic information for the soil nematode Caenorhabditis elegans. The resource has evolved from its beginnings as a database housing the genomic sequence and genetic and physical maps of a single species, and now represents the breadth and diversity of nematode research, currently serving genome sequence and annotation for around 20 nematodes. In this article, we focus on WormBase's role of genome sequence annotation, describing how we annotate and integrate data from a growing collection of nematode species and strains. We also review our approaches to sequence curation, and discuss the impact on annotation quality of large functional genomics projects such as modENCODE.

Keywords:  Caenorhabditis elegans; annotation; community resource; genome; model organism database; nematode; parasitic nematode; sequence curation

Year:  2012        PMID: 24058818      PMCID: PMC3670165          DOI: 10.4161/worm.19574

Source DB:  PubMed          Journal:  Worm        ISSN: 2162-4046


Introduction

WormBase seeks to present an integrative view of nematode biology by in-depth curation of the research on C. elegans and other members of this animal family. To this end we integrate genomic sequences and annotations with curated data from genetic, developmental, physiological, behavioral and evolutionary studies. We provide multiple streams of access to the data, including the main website portal (www.wormbase.org), genome browsers, sequence search services, and application programming interfaces. WormBase aims to be the central repository and portal for nematode genomic data. The activities of the WormBase consortium can be broadly classified into three groups: (1) curation of C. elegans literature and associated research and development; (2) user interface design, development and maintenance; and (3) genome sequence annotation, analysis and comparative genomics. The volume of nematode data has exploded in recent years, and WormBase has had to respond accordingly in all three of these areas. For example, as the volume and variety of information has increased, its presentation to the community in a clear and accessible way requires new forms of display. We have responded to this challenge by completely redesigning the WormBase web-interface (Harris et al., manuscript in preparation). In this article, we focus on our remit to provide integrated, coherent genome annotation for a large (and growing) collection of nematode genome sequences and strains. We also summarize our release production cycle and analysis pipelines, and describe how they affect the timeline between data submission and its subsequent public release.

Integrating and Annotating Multiple Nematode Genomes

WormBase now hosts genomic data for nearly 20 nematodes (see Table 1, and refs. 3–14), representing species of evolutionary, biomedical and agricultural interest. Recent additions include the parasitic nematodes Trichinella spiralis, Ascaris suum and Bursaphelenchus xylophilus. The maturity of genome sequence and annotation in WormBase varies widely between species. At one end of the spectrum is the C. elegans genome, which was completed over a number of years using traditional physical mapping and clone-by-clone sequencing and finishing, and which has highly curated annotation. More recently we have seen a number of genome sequences generated by new high-throughput low-cost technologies and many of these genomes are inevitably fragmented and incomplete; additionally, there is relatively little published functional information about many of these species.

Table 1. Nematode genomes in WormBase. The columns from "Assembly version" onward give the status as of release WS230.

| Species | Clade (a) | Mode of reproduction (b) | Reference strain sequenced | Integrated into WormBase | Assembly version (e) | Sequenced genome size (Mb) | Top-level fragments | Scaffold N50 (f) | CDS models (distinct loci) | Gene set status |
|---|---|---|---|---|---|---|---|---|---|---|
| C. elegans | V | androdioecious | Bristol N2 | WS1 | WS230 | 100.3 | 6 | 17493793 | 25634 (20517) | Curated |
| C. briggsae | V | androdioecious | AF16 | WS132 | CAAC03000000 | 108.4 | 367 | 17485439 | 21961 (21936) | Curated |
| C. remanei | V | gonochoristic | PB4641 | WS185 | AAGD02000000 | 145.5 | 3670 | 461060 | 31476 (31471) | Curated |
| Brugia malayi | III | gonochoristic | TRS | WS185 | AAQA01000000 | 95.8 | 27210 | 37841 | 21332 (18348) | External (g) |
| Pristionchus pacificus | V | hermaphroditic | PS312 | WS194 | v2 (Sep. 2010) | 172.5 | 18083 | 1244534 | 24217 (24216) | External |
| C. japonica | V | gonochoristic | DF5080 | WS195 | ABLE03000000 | 166.3 | 18817 | 94149 | 36105 (29962) | Curated |
| C. brenneri | V | gonochoristic | PB2801 | WS196 | ABEG02000000 | 190.4 | 3305 | 368319 | 30670 (30667) | Curated |
| Meloidogyne hapla | IV | gonochoristic | VW9 | WS204 | ABLG01000000 | 53.0 | 3452 | 84000 | 13072 (13072) | External |
| Meloidogyne incognita | IV | gonochoristic | Morelos | WS205 | CABB01000000 | 82.1 | 9538 | 83000 | - | External (h) |
| Haemonchus contortus | V | gonochoristic | MHco3 (ISE) | WS208 | v1 (Aug. 2008) | 298.0 | 59707 | 13338 | 6201 (6201) | WormBase predicted |
| C. angaria | V | gonochoristic | PS1010 | WS218 | AEHI01000000 (i) | 79.8 | 33559 | 9453 | 26265 (22622) | External |
| Trichinella spiralis | I | gonochoristic | ISS 195 | WS225 | ABIR02000000 | 63.5 | 6863 | 6373445 | 16380 (16380) | External |
| C. sp 9 | V | gonochoristic | JU1422 | WS226 | v1 (June 2011) | 204.3 | 7636 | 196652 | 45167 (45167) | WormBase predicted |
| C. sp 11 | V | androdioecious | JU1373 | WS226 | AEKS01000000 | 79.3 | 665 | 20921866 | 27721 (22326) | External |
| Strongyloides ratti | IV | gonochoristic and parthenogenetic (c) | ED321 Heterogonic | WS226 | CACX01000000 | 52.6 | 2184 | 359029 | 8188 (8077) | WormBase predicted |
| Ascaris suum | III | gonochoristic | Natural isolate | WS229 | v1 (Aug. 2011) | 272.8 | 29831 | 407899 | 18449 (18449) | External |
| Bursaphelenchus xylophilus | IV | gonochoristic | Ka4C1 | WS229 | CADV01000000 | 74.6 | 5527 | 1158000 | 18074 (18074) | External |
| Heterorhabditis bacteriophora | V | gonochoristic and hermaphroditic (c, d) | M31e | WS229 | ACKM01000000 | 77.0 | 1240 | 312328 | - | External (h) |
| C. sp 5 | V | gonochoristic | DRD-2008 JU800 | WS230 | v1 (Jan. 2012) | 131.8 | 15261 | 25228 | 46280 (34696) | External |

Notes: (a) ref. 15; (b) refs. 16–25; (c) heterogonic; (d) sex also determined by the environment; (e) INSDC assembly accession where available; (f) http://www.broadinstitute.org/crd/wiki/index.php/N50; (g) author gene-set extended by additional isoform predictions from WormBase; (h) awaiting submission of gene set; (i) an improved C. angaria assembly will be available in WS231.
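
Note (f) points to a definition of N50. As a minimal sketch, assuming the common definition (the length of the shortest fragment in the smallest set of longest fragments that together cover at least half of the assembly), a value like those in the "Scaffold N50" column could be computed from a list of fragment lengths as follows. The function name `scaffold_n50` is illustrative and is not taken from WormBase code.

```python
from typing import Iterable


def scaffold_n50(lengths: Iterable[int]) -> int:
    """Return the N50 of a collection of fragment lengths.

    Fragments are visited from longest to shortest; the N50 is the length
    of the fragment at which the running total first reaches half of the
    summed length of all fragments.
    """
    ordered = sorted(lengths, reverse=True)
    if not ordered:
        raise ValueError("no fragment lengths given")
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length
    return ordered[-1]


if __name__ == "__main__":
    # Toy assembly of six fragments (total 290); 80 + 70 = 150 >= 145.
    print(scaffold_n50([80, 70, 50, 40, 30, 20]))  # 70
```

A high N50 relative to genome size, as for C. elegans in Table 1, indicates a contiguous assembly; a low value indicates a fragmented one.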

WormBase undertakes different responsibilities for each of these species, which can include (1) administration of the genome sequence; (2) curation of gene models and other sequence features; (3) curation of non-sequence-based data from the literature and (4) tracking of identifiers forward through different versions of the genome sequence and annotation. The specific way in which we manage the data for a species depends (primarily) on whether we curate gene models and other features for it. It is therefore useful for the sake of discussion to classify the species into two groups: core (WormBase curated gene models) and non-core. As of release WS230, the core species are C. elegans, C. briggsae, C. remanei, C. brenneri and C. japonica. Analyzing and presenting data for an ever-increasing number of nematode genomes requires methods that scale well. We deploy a standard automatic analysis pipeline to annotate all the species we house (core and non-core), including repeat prediction, cDNA alignments, the determination of homology relationships, and protein domain identification. If a genome sequence for a non-core species is submitted without a gene-set, we also run an in-house gene prediction pipeline that uses CEGMA to accurately identify a small, universally conserved set of gene models. These are then used to train parameters for AUGUSTUS, which we then apply using protein homologies and any available RNASeq and other transcript data as supporting evidence. In some cases, these internally-produced gene predictions are later replaced by a canonical set of models provided by the submitters.
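
The decision logic in the preceding paragraph can be summarised in a few lines. The following is only a sketch under stated assumptions: the step labels, the `Submission` type and the `plan_annotation` function are hypothetical names introduced for illustration, and the order of the steps in the returned list is not meant to reflect the real pipeline's execution order.

```python
from dataclasses import dataclass
from typing import List

# Core species as of release WS230, as listed in the text.
CORE_SPECIES = {"C. elegans", "C. briggsae", "C. remanei", "C. brenneri", "C. japonica"}

# Hypothetical labels for the standard analyses run on every species.
STANDARD_STEPS = [
    "repeat_prediction",
    "cdna_alignment",
    "homology_inference",
    "protein_domain_identification",
]

# Hypothetical labels for the in-house gene prediction (CEGMA then AUGUSTUS).
GENE_PREDICTION_STEPS = [
    "cegma_conserved_gene_models",
    "augustus_training",
    "augustus_prediction",
]


@dataclass
class Submission:
    species: str
    has_gene_set: bool


def plan_annotation(sub: Submission) -> List[str]:
    """List the analyses that apply to a submitted genome."""
    steps = list(STANDARD_STEPS)
    if sub.species in CORE_SPECIES:
        # Core species additionally receive curated gene models.
        steps.append("manual_gene_model_curation")
    elif not sub.has_gene_set:
        # Non-core genome submitted without a gene set: predict one in-house.
        steps.extend(GENE_PREDICTION_STEPS)
    return steps


if __name__ == "__main__":
    print(plan_annotation(Submission("C. sp 9", has_gene_set=False)))
```
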
Updating an existing species in WormBase with a new assembly and/or gene-set presents additional challenges, because users rely on stable identifiers to track their entities of interest, which must be propagated forward to corresponding features in subsequent releases. For core species, identifiers are actively managed and tracked using our own curation software infrastructure. For non-core species, we use the Ensembl stable-identifier mapping software for this task. The principal way in which we draw information from multiple species together is by connecting genes via orthology and paralogy relationships to genes in other species (both nematode and other model organisms such as human, mouse and fly). As of WS230, we include relationships published by the following projects and resources: InParanoid (version 7); TreeFam (version 7); the Orthologous Matrix Project (OMA, August 2009 version); OrthoMCL; PantherDB (version 7); and Ensembl (version 65). In addition, we curate orthology calls from the literature (e.g., Hillier et al., ref. 8) and direct submissions. We also use data in eggNOG (version 3.0) to cluster genes into functionally characterized homologous groups. These resources are inevitably based on snapshots of the gene models, taken at various times. For our core species however, particularly C. elegans, the gene models are in a state of flux, being revised and improved on the basis of the latest evidence. In order to infer up-to-date nematode homology relationships for the latest gene models, we run the Ensembl Compara GeneTree pipeline as part of the preparation for every WormBase release. The resulting gene trees are used to infer additional current orthology relationships to those obtained by import from the third-party resources and direct submission. One way in which we use the orthology relationships internally is to project WormBase-approved gene names onto orthologous gene(s) of other nematode species.
For this, a conservative approach is adopted: each proposed gene name is required to be supported by an unambiguous one-to-one orthology connection according to the majority of available source analyses. We also use the Ensembl Compara DNA pipeline to produce whole-genome multiple alignments of all genomes in WormBase and derived genome conservation tracks (using GERP). However, as the genetic diversity of the species collection in WormBase continues to increase, a single multiple alignment for all nematodes becomes less appropriate. We therefore propose to replace it with a series of pairwise alignments, providing multiple alignments only for selected subsets of species.
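
The conservative gene-name projection rule described in this section (a one-to-one orthology connection supported by a majority of source analyses) can be illustrated with a short sketch. The data layout, the function names and the gene identifiers below are assumptions made for the example, not WormBase's implementation; in particular, "majority" is taken here to mean more than half of all sources supplied.

```python
from collections import Counter
from typing import Dict, Optional, Set, Tuple

# One orthology call: (reference-species gene, target-species gene).
Pair = Tuple[str, str]


def one_to_one_partner(pairs: Set[Pair], target_gene: str) -> Optional[str]:
    """Return the reference gene linked one-to-one to target_gene, if any."""
    refs = {ref for ref, tgt in pairs if tgt == target_gene}
    if len(refs) != 1:
        return None  # no call, or many-to-one
    (ref,) = refs
    targets = {tgt for r, tgt in pairs if r == ref}
    return ref if targets == {target_gene} else None  # reject one-to-many


def project_gene_name(
    target_gene: str,
    calls_by_source: Dict[str, Set[Pair]],
    approved_names: Dict[str, str],
) -> Optional[str]:
    """Project an approved name onto target_gene under a majority rule."""
    votes: Counter = Counter()
    for pairs in calls_by_source.values():
        partner = one_to_one_partner(pairs, target_gene)
        if partner is not None:
            votes[partner] += 1
    if not votes:
        return None
    ref, count = votes.most_common(1)[0]
    if count * 2 <= len(calls_by_source):
        return None  # not supported by a strict majority of sources
    return approved_names.get(ref)


if __name__ == "__main__":
    calls = {
        "source_a": {("ref_gene_1", "target_gene_1")},
        "source_b": {("ref_gene_1", "target_gene_1")},
        "source_c": {("ref_gene_1", "target_gene_1"), ("ref_gene_1", "target_gene_2")},
    }
    names = {"ref_gene_1": "abc-1"}
    print(project_gene_name("target_gene_1", calls, names))  # abc-1
    print(project_gene_name("target_gene_2", calls, names))  # None
```

In the toy input, two of three sources call a one-to-one relationship for `target_gene_1`, so the name is projected; `target_gene_2` is only ever seen in a one-to-many call and receives no name.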

Sequence Curation

WormBase adopts an anomaly-driven approach to curation, whereby discrepancies between current gene models and alignment data are identified and flagged as curation targets. We have implemented a software application (CurationTool) that identifies these discrepancies and scores them according to their degree of discordance, presenting the results to the curator using a graphical user interface. An in-depth discussion of CurationTool and our anomaly-driven curation is presented elsewhere. For protein-coding genes, WormBase curates only the protein-coding portion (CDS) of the full transcript. For our core species, we use the high-confidence subset of cDNA alignments overlaying the curated CDS models to infer a set of full-length transcripts (including 5′ and 3′ untranslated regions), using a custom algorithm (unpublished). In the past, the accuracy of this process has been sensitive to artifacts such as alignment errors or chimeric cDNAs, but we have recently improved the algorithm to take these factors into account. The primary line of evidence for gene model curation is transcript data. In addition to cDNAs deposited in the nucleotide archives, we draw data from numerous resources, publications and direct submissions. We also align all RNASeq data deposited in the Short Read Archive (SRA) to our core species using TopHat, and infer gene expression estimates for a variety of life stages and environmental conditions using Cufflinks. WormBase is committed to act as the ultimate repository for data coming from the nematode half of the modENCODE project. Most data sets have been accessible via the genome browser since the summer of 2010. To extract the maximum utility from the data, it is integrated fully into our database, by extending the data models where necessary and adding full cross-referencing and connectivity with existing WormBase objects.
To date, the focus for full integration has been on data sets with high impact on gene model and other sequence feature curation, namely: trans-splice sites; poly-A cleavage sites and untranslated regions; large-scale EST sets (P. Green; data retrieved from nucleotide archives); mass-spectrometry peptide sequences; and RNASeq transcripts and derived gene predictions. The data of highest impact for curation has been the RNASeq transcriptome, and this has been used in a number of different ways. First, the modENCODE “genelets” (fragmentary gene models constructed using RNASeq data from 14 life stages) have been used to produce a new anomaly type for CurationTool that highlights potential cases where adjacent genes could be merged. To date, over three hundred cases displaying this anomaly have been scrutinized, of which approximately 35% resulted in a merge, and a further 10% some other change (for example the movement of an exon from one gene to another). Second, we have re-visited the source RNASeq data and analyzed it using the TopHat/Cufflinks pipeline, to identify candidate “RNASeq-splice” features. These can be used both to confirm introns already part of curated gene models, and also to suggest changes to existing gene models or new isoforms. Third, the strand bias characteristic of the modENCODE RNASeq alignments has been extremely useful for curators to resolve ambiguities in the definition of the 5′ and 3′ ends of genes. Finally, the modENCODE RNASeq data has allowed us to make corrections to the C. elegans reference genome itself. By taking proposed errors and verifying them using data from a private submission of high-throughput-sequencing (J. Ahringer and M. Berriman, pers. comm.), we have been able to make 156 genome sequence corrections (110 insertions, 44 deletions and 2 substitutions), resulting in the correction of 100 gene models.
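
As an illustration of the genelet-based merge anomaly mentioned above, the sketch below flags pairs of adjacent gene models that are both overlapped by a single genelet. This is an assumed criterion chosen for the example; the source does not describe how CurationTool actually detects or scores these cases. The `Interval` type and `merge_candidates` function are hypothetical, and all features are assumed to lie on the same sequence and strand.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Interval:
    name: str
    start: int  # 1-based, inclusive
    end: int    # inclusive


def overlaps(a: Interval, b: Interval) -> bool:
    """True if the two closed intervals share at least one base."""
    return a.start <= b.end and b.start <= a.end


def merge_candidates(
    genes: List[Interval], genelets: List[Interval]
) -> List[Tuple[str, str, str]]:
    """Return (left gene, right gene, genelet) for each genelet that
    overlaps two neighbouring gene models."""
    ordered = sorted(genes, key=lambda g: g.start)
    flagged = []
    for left, right in zip(ordered, ordered[1:]):
        for genelet in genelets:
            if overlaps(genelet, left) and overlaps(genelet, right):
                flagged.append((left.name, right.name, genelet.name))
    return flagged


if __name__ == "__main__":
    genes = [
        Interval("gene_A", 100, 500),
        Interval("gene_B", 700, 1200),
        Interval("gene_C", 2000, 2500),
    ]
    genelets = [Interval("genelet_1", 400, 800), Interval("genelet_2", 2100, 2300)]
    print(merge_candidates(genes, genelets))
```

Each flagged triple would still need inspection by a curator; as the text notes, only about a third of the scrutinized cases resulted in a merge.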
Additionally, since the data from modENCODE began to become available from the project Data Co-ordination Centre, the following data sets have been subjected to rigorous internal quality control and fully integrated into the database: ~300 Highly Occupied Target (HOT) regions; ~7,000 non-coding RNA genes; the probable parent for ~1,000 pseudogenes; and ~21,000 three-prime UTRs from the UTRome project. We will prioritise the incorporation of the transcription-factor binding site and chromatin accessibility data as soon as the final versions of these data sets are made available. We have also worked with groups performing their own analysis of the modENCODE data. For example, a study of the modENCODE RNASeq reads (T. Blumenthal, pers. comm.) has resulted in significant improvements to the operon data set. This has involved identifying cases where fewer than 5% of the trans-splice leader reads for “internal” genes (i.e., genes other than the first) were SL2 type, and modifying the gene content of the operons accordingly. In addition to modENCODE, we continue to draw in data from the scientific literature and direct submissions, often combining different data sources to assist in making correct predictions. The modENCODE poly-A site data has been supplemented with a corresponding data set from an independent study. These two data sets have only 25% redundancy, and over 80% of coding genes now have an annotated polyA site in WormBase. Gene predictions by genBlastG based on BLAST homologies to C. elegans proteins have also proved valuable for the curation of C. briggsae, C. brenneri, and C. remanei. We can assess gene-model accuracy in the presence of fragmentary transcript evidence by measuring the proportion of curated introns that are confirmed by spliced cDNA evidence. For WS230, the proportion of C. elegans curated CDS introns confirmed by traditional cDNA, modENCODE RNASeq and mass-spectrometry evidences is 83%, 88% and 14% respectively.
Overall, 93% of curated introns are confirmed and 82% of CDS models have all of their introns confirmed by at least one of these three lines of evidence; the corresponding measurements for the final release prior to modENCODE (WS200, February 2009) were 74% and 56%, demonstrating the value of the project in increasing the accuracy and confidence of C. elegans gene models.
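
The intron-confirmation measurements quoted above can be expressed as simple set operations. The sketch below is an assumed formulation, not the WormBase pipeline: an intron is represented as a (sequence, start, end, strand) tuple, an intron counts as confirmed only when an evidence intron matches it exactly, and CDS models without introns are left out of the "all introns confirmed" fraction. The function name `intron_confirmation` is illustrative.

```python
from typing import Dict, List, Set, Tuple

# (sequence name, start, end, strand)
Intron = Tuple[str, int, int, str]


def intron_confirmation(
    cds_introns: Dict[str, List[Intron]],
    evidence: Dict[str, Set[Intron]],
) -> Dict[str, object]:
    """Summarise how well curated introns are supported by evidence.

    cds_introns maps each CDS model to its curated introns; evidence maps
    each evidence type (e.g. cDNA, RNASeq) to the introns it supports.
    """
    curated = {intron for introns in cds_introns.values() for intron in introns}
    if not curated:
        raise ValueError("no curated introns supplied")
    supported = set().union(*evidence.values())
    per_source = {
        source: len(curated & introns) / len(curated)
        for source, introns in evidence.items()
    }
    spliced_models = [introns for introns in cds_introns.values() if introns]
    fully_confirmed = sum(
        all(intron in supported for intron in introns) for introns in spliced_models
    )
    return {
        "per_source": per_source,
        "overall": len(curated & supported) / len(curated),
        "fully_confirmed_models": fully_confirmed / len(spliced_models),
    }


if __name__ == "__main__":
    a, b = ("I", 100, 200, "+"), ("I", 300, 400, "+")
    c, d = ("II", 50, 90, "-"), ("II", 500, 650, "-")
    models = {"cds1": [a, b], "cds2": [c], "cds3": [d], "cds4": []}
    print(intron_confirmation(models, {"cdna": {a, b}, "rnaseq": {a, c}}))
```
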

Intraspecies Variation

Similar to many other resources, WormBase captures within-species variation as differences (insertions, deletions and substitutions) with respect to the genome sequence of the reference strain. We expect variation data for many nematode species in the future, but at present almost all the data we house is for C. elegans. Historically, the majority of variation data we have processed has been from laboratory-manipulated strains. We maintain close working relationships and established data exchange protocols with the Caenorhabditis Genetics Center (CGC; www.cbs.umn.edu/CGC), the C. elegans Gene Knockout Consortium (GKC; www.celeganskoconsortium.omrf.org), and the National BioResource Project of Japan (NBRP; www.shigen.nig.ac.jp/c.elegans/index.jsp). We also curate variation data from individual user submissions, which, although time-consuming, are often biologically important. There has recently been a rapid growth of C. elegans variation data generated by whole genome sequencing projects (refs. 50–54; Andersen et al., manuscript in preparation; Moerman and Waterston, manuscript in preparation). These data sets include an increasing number of variations from naturally-occurring wild-isolate strains. Motivated by community feedback, we have increased the clarity of our representation and display of this information. Every variation object processed by WormBase is assigned a unique, stable identifier with prefix “WBVar.” For laboratory-induced variations, we also assign a more directly informative public name consisting of a project/laboratory prefix (supplied by J. Hodgkin, pers. comm.) and a numerical suffix. For naturally occurring variations, the public name defaults to the WBVar identifier, making the distinction between these objects and the laboratory-induced variations obvious and immediate. We now also collect non-sequence-based information for wild isolate strains (http://tazendra.caltech.edu/~azurebrd/cgi-bin/forms/wild_isolate.cgi).
Compared with laboratory-manipulated strains, there is additional information to capture about the wild isolates, such as isolation location, the condition in which it was found, and details of how it was isolated. Many wild isolates are not stocked at the CGC, and WormBase acts as the central data repository for these strains. WormBase does not have a mandate to act as a permanent repository for variation data, and as the volume of these data sets continues to rapidly increase, we become less adequately resourced to perform this function. Projects are therefore encouraged to submit their data to the NCBI’s Database of Short Genetic Variations (dbSNP), an established archive for variation data. We act as a submission broker in cases where a laboratory lacks the technical resources to conform to the dbSNP submission protocols. To date, data from six projects have been integrated into WormBase and submitted to dbSNP. WormBase adds value to these data sets by performing additional analysis and placing them into context with other data types (e.g., Gene). Variations are most often submitted to WormBase as a molecular change at a given location in a specific version of the reference genome sequence. As part of the curation, we capture and record a short flanking sequence on either side of the variation feature, disassociating it from a specific version of the reference genome. Each release, we re-map all variations and re-calculate potential consequences of the molecular changes (e.g., nonsense, missense or silent protein-coding mutation) on the latest gene models.
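
The flank-based re-mapping described above can be sketched as a string search. This is a deliberately simplified illustration and not the WormBase mapper: it assumes exact flank matches on the forward strand only, and the function names and the `max_span` limit are hypothetical choices for the example.

```python
from typing import List, Optional, Tuple


def find_all(text: str, pattern: str) -> List[int]:
    """Return the start offsets of every (possibly overlapping) match."""
    hits = []
    pos = text.find(pattern)
    while pos != -1:
        hits.append(pos)
        pos = text.find(pattern, pos + 1)
    return hits


def remap_by_flanks(
    genome: str, left_flank: str, right_flank: str, max_span: int = 50
) -> Optional[Tuple[int, int]]:
    """Locate a variation from its flanking sequences.

    Returns the 0-based, half-open (start, end) of the region lying between
    the two flanks, or None when the placement is missing or ambiguous.
    """
    right_hits = find_all(genome, right_flank)
    candidates = []
    for left_pos in find_all(genome, left_flank):
        start = left_pos + len(left_flank)
        for right_pos in right_hits:
            if start <= right_pos <= start + max_span:
                candidates.append((start, right_pos))
    return candidates[0] if len(candidates) == 1 else None


if __name__ == "__main__":
    old = "GATTACAGCTTGACC"
    new = "CCCC" + old  # a later assembly with four extra upstream bases
    left, right = old[3:7], old[8:12]  # flanks around the base at index 7
    print(remap_by_flanks(old, left, right))  # (7, 8)
    print(remap_by_flanks(new, left, right))  # (11, 12)
```

Because the variation is stored with its flanks rather than a fixed coordinate, the same record resolves to the shifted position in the newer sequence without manual intervention.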

Release Cycle and Database Build

WormBase is released every two months, with the preparation for a release beginning three months in advance. This release cycle can give rise to variability in the time between a curator transaction (e.g., the update of a gene name, correction of an error, or the import of a new data set) and its availability on the WormBase website. The delay can be as short as three months (if the change is made immediately before we start building the release) and as long as five months (if made immediately after, in which case it will not be public until the following release). Building a WormBase database release is a complicated process, the broad stages of which can be described as: (1) data freeze, where each contributing consortium partner takes a snap-shot of the database(s) in which their curation data are stored; (2) data collation, where the curation database snap-shots are brought together into a single database; (3) submission of updated annotation on core species to the International Nucleotide Sequence Database Collaboration, to ensure that the representation of core nematode data in the nucleotide and protein archives is up-to-date; (4) mapping of sequence data (e.g., cDNAs, microarray probes, sequence features, variations) to the genome; (5) establishing connections between objects of different types (e.g., RNAi to Gene), usually via genomic location; (6) the large-scale computational analyses discussed earlier, such as homology detection and whole-genome alignment; and (7) quality control and assurance. For the more complicated parts of the build process, we deploy two components of the Ensembl system for the management and tracking of computational pipelines: ensembl-pipeline for homology analysis and eHive for comparative analysis. The key features of these systems are (1) automatic re-run of tasks that have failed; and (2) user-definition of a sub-task dependency graph for a process, allowing complex pipelines to be run with minimal user intervention. 
These systems are critical in enabling us to produce the database in a regular and timely manner. Each stage of the database production is subject to a suite of integrity checks to ensure that it has completed cleanly and without error. For example, we compare the number of objects in each data class with the count at the corresponding stage in the previous release. Major discrepancies are flagged for investigation. This mechanism has proved to be extremely effective in catching errors and process failures as soon as they occur.
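
The object-count integrity check described above can be sketched as a comparison of per-class counts between two releases. The 10% tolerance, the function name and the example class counts are assumptions made for illustration; the source does not state what WormBase treats as a major discrepancy.

```python
from typing import Dict, List


def flag_count_discrepancies(
    previous: Dict[str, int],
    current: Dict[str, int],
    tolerance: float = 0.10,
) -> List[str]:
    """Return the data classes whose object count changed by more than
    `tolerance` (as a fraction of the previous count), plus classes that
    newly appeared or disappeared."""
    flagged = []
    for data_class in sorted(set(previous) | set(current)):
        before = previous.get(data_class, 0)
        after = current.get(data_class, 0)
        if before == 0:
            if after != 0:
                flagged.append(data_class)  # class is new in this build
        elif abs(after - before) / before > tolerance:
            flagged.append(data_class)
    return flagged


if __name__ == "__main__":
    last_release = {"Gene": 1000, "Variation": 2000, "RNAi": 500}
    this_build = {"Gene": 1020, "Variation": 1200, "Strain": 10}
    print(flag_count_discrepancies(last_release, this_build))
    # ['RNAi', 'Strain', 'Variation']
```

Running such a check after every build stage, rather than only at the end, matches the stated aim of catching process failures as soon as they occur.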

Summary

WormBase is facing a deluge of data from many nematode genome sequencing projects, and we have prepared for this by putting into place annotation and integration pipelines and workflows that will allow the data to be analyzed and presented in a timely and consistent manner. As ever, we welcome feedback and ideas from our user-base as part of the continued development of the resource. We are currently particularly interested in suggestions on how we can maximise the utility of housing a broad representation of the nematode phylum, and what comparative genomics services and views users would find most useful. Users can contact the developers at help@wormbase.org with their suggestions.

References (57 in total; 10 shown)

1.  The Ensembl analysis pipeline.

Authors:  Simon C Potter; Laura Clarke; Val Curwen; Stephen Keenan; Emmanuel Mongin; Stephen M J Searle; Arne Stabenau; Roy Storey; Michele Clamp
Journal:  Genome Res       Date:  2004-05       Impact factor: 9.043

2.  The Pristionchus pacificus genome provides a unique perspective on nematode lifestyle and parasitism.

Authors:  Christoph Dieterich; Sandra W Clifton; Lisa N Schuster; Asif Chinwalla; Kimberly Delehaunty; Iris Dinkelacker; Lucinda Fulton; Robert Fulton; Jennifer Godfrey; Pat Minx; Makedonka Mitreva; Waltraud Roeseler; Huiyu Tian; Hanh Witte; Shiaw-Pyng Yang; Richard K Wilson; Ralf J Sommer
Journal:  Nat Genet       Date:  2008-09-21       Impact factor: 38.330

3.  The genomes of root-knot nematodes.

Authors:  David McK Bird; Valerie M Williamson; Pierre Abad; James McCarter; Etienne G J Danchin; Philippe Castagnone-Sereno; Charles H Opperman
Journal:  Annu Rev Phytopathol       Date:  2009       Impact factor: 13.078

4.  Ensembl 2011.

Authors:  Paul Flicek; M Ridwan Amode; Daniel Barrell; Kathryn Beal; Simon Brent; Yuan Chen; Peter Clapham; Guy Coates; Susan Fairley; Stephen Fitzgerald; Leo Gordon; Maurice Hendrix; Thibaut Hourlier; Nathan Johnson; Andreas Kähäri; Damian Keefe; Stephen Keenan; Rhoda Kinsella; Felix Kokocinski; Eugene Kulesha; Pontus Larsson; Ian Longden; William McLaren; Bert Overduin; Bethan Pritchard; Harpreet Singh Riat; Daniel Rios; Graham R S Ritchie; Magali Ruffier; Michael Schuster; Daniel Sobral; Giulietta Spudich; Y Amy Tang; Stephen Trevanion; Jana Vandrovcova; Albert J Vilella; Simon White; Steven P Wilder; Amonida Zadissa; Jorge Zamora; Bronwen L Aken; Ewan Birney; Fiona Cunningham; Ian Dunham; Richard Durbin; Xosé M Fernández-Suarez; Javier Herrero; Tim J P Hubbard; Anne Parker; Glenn Proctor; Jan Vogel; Stephen M J Searle
Journal:  Nucleic Acids Res       Date:  2010-11-02       Impact factor: 16.971

5.  OrthoMCL-DB: querying a comprehensive multi-species collection of ortholog groups.

Authors:  Feng Chen; Aaron J Mackey; Christian J Stoeckert; David S Roos
Journal:  Nucleic Acids Res       Date:  2006-01-01       Impact factor: 16.971

6.  TreeFam: a curated database of phylogenetic trees of animal gene families.

Authors:  Heng Li; Avril Coghlan; Jue Ruan; Lachlan James Coin; Jean-Karim Hériché; Lara Osmotherly; Ruiqiang Li; Tao Liu; Zhang Zhang; Lars Bolund; Gane Ka-Shu Wong; Weimou Zheng; Paramvir Dehal; Jun Wang; Richard Durbin
Journal:  Nucleic Acids Res       Date:  2006-01-01       Impact factor: 16.971

7.  InParanoid 7: new algorithms and tools for eukaryotic orthology analysis.

Authors:  Gabriel Ostlund; Thomas Schmitt; Kristoffer Forslund; Tina Köstler; David N Messina; Sanjit Roopra; Oliver Frings; Erik L L Sonnhammer
Journal:  Nucleic Acids Res       Date:  2009-11-05       Impact factor: 16.971

8.  TopHat: discovering splice junctions with RNA-Seq.

Authors:  Cole Trapnell; Lior Pachter; Steven L Salzberg
Journal:  Bioinformatics       Date:  2009-03-16       Impact factor: 6.937

9.  Phylogeny of the nematode genus Pristionchus and implications for biodiversity, biogeography and the evolution of hermaphroditism.

Authors:  Werner E Mayer; Matthias Herrmann; Ralf J Sommer
Journal:  BMC Evol Biol       Date:  2007-07-02       Impact factor: 3.260

10.  Comparison of C. elegans and C. briggsae genome sequences reveals extensive conservation of chromosome organization and synteny.

Authors:  LaDeana W Hillier; Raymond D Miller; Scott E Baird; Asif Chinwalla; Lucinda A Fulton; Daniel C Koboldt; Robert H Waterston
Journal:  PLoS Biol       Date:  2007-07-03       Impact factor: 8.029
