Guy Cochrane, Blaise Alako, Clara Amid, Lawrence Bower, Ana Cerdeño-Tárraga, Iain Cleland, Richard Gibson, Neil Goodgame, Mikyung Jang, Simon Kay, Rasko Leinonen, Xiu Lin, Rodrigo Lopez, Hamish McWilliam, Arnaud Oisel, Nima Pakseresht, Swapna Pallreddy, Youngmi Park, Sheila Plaister, Rajesh Radhakrishnan, Stephane Rivière, Marc Rossello, Alexander Senf, Nicole Silvester, Dmitriy Smirnov, Petra Ten Hoopen, Ana Toribio, Daniel Vaughan, Vadim Zalunin.
Abstract
The European Nucleotide Archive (ENA; http://www.ebi.ac.uk/ena/) collects, maintains and presents comprehensive nucleic acid sequence and related information as part of the permanent public scientific record. Here, we provide brief updates on ENA content developments and major service enhancements in 2012 and describe in more detail two important areas of development and policy that are driven by ongoing growth in sequencing technologies. First, we describe the ENA data warehouse, a resource for which we provide a programmatic entry point to integrated content across the breadth of ENA. Second, we detail our plans for the deployment of CRAM data compression technology in ENA.
Year: 2012 PMID: 23203883 PMCID: PMC3531187 DOI: 10.1093/nar/gks1175
Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971
Figure 1. Accumulation of raw data into ENA. The figure shows the accumulation of raw next generation sequence data into public data repositories. In addition to unrestricted data available in ENA, the figure also shows the accumulation of sequence data into the European Bioinformatics Institute's European Genome-phenome Archive, a repository for human molecular medicine research-related data requiring authorized access for ethical reasons (http://www.ebi.ac.uk/ega/). Data points for expected output of global cancer genome sequencing projects are shown.
Major service developments in 2012
| Development | Utility |
|---|---|
| 11 face-to-face training events and creation of new training materials | |
| Application-specific checklists | Framework for community developers to develop specifications for external applications and pipelines for ENA validation and presentation of data (e.g. |
| Checklists for annotated sequence submissions | Checklists that define Webin submission templates for annotated sequences ( |
| Creation of genome assembly storage layer | Improved and more granular support for assembly information including compliance with AGP2 ( |
| Creation of search warehouse and launch of ENA query builder and search | See below and |
| Extension of ENA sequence search to include reference sequence data directly from Ensembl and Ensembl Genomes | Provision of rapid interactive and programmatic sequence search service to all sequences in ENA and the Ensembl and Ensembl Genomes resources ( |
| Inclusion of capillary sequencing instruments into the raw sequence storage layer (SRA) | Raw capillary sequence submissions are now supported in the Sequence Read Archive (SRA) storage layer in ENA ( |
| Inclusion of sample checklist support for raw data submissions in Webin | Provision of tabular sample checklist functionality in the Webin system that greatly simplifies the submission of sample annotation for different sample types ( |
| Launch of raw sequence data file uploader/downloader utility | Support for simple and managed upload and download of raw sequence data files—available as interactive and command line clients (e.g., |
| New services for genome assembly submissions | Tools and workflow for the submission of assembly information ( |
| Release of CRAM sequence data compression technology | See below and |
| Simplification of transcript shotgun assembly submissions | Simplification of submission process for the growing body of transcriptomes assembled from shotgun data ( |
Search ‘domains’ and ‘results’
| Domain | Domain description | Result | Result data structure | Result description |
|---|---|---|---|---|
| Assembly | Genome and transcriptome assembly information | Assembly | EMBL-Bank and assembly database | Genome assemblies from contig upwards |
| Sequence | Assembled and (optionally) annotated sequence | Sequence_release | EMBL-Bank | Sequence from the latest EMBL-Bank release |
| | | Sequence_update | EMBL-Bank | Sequences from the EMBL-Bank update product, covering modified and new entries since the last release |
| Study | Information relating to a sequencing-based scientific investigation; a unit of scientific output | Study | SRA study | Large-scale sequencing project that comprises assembled/annotated sequences |
| Coding | Sequences believed to encode proteins | Sequence_coding | ENA-CDS | Submitter-annotated protein-coding sequences |
| Analysis | Primary interpretations of raw read data, such as alignments, OTU tables and genome tracks | Analysis | SRA analysis | Read analysis objects |
| | | Analysis_sample | SRA analysis | Read analysis objects grouped by sequenced sample |
| | | Analysis_study | SRA analysis | Read analysis objects grouped by study |
| Read | Raw sequencing data from next generation platforms | Read_run | SRA run | Read data |
| | | Read_experiment | SRA run | Read data grouped by experiment objects |
| | | Read_sample | SRA run | Read data grouped by sequenced samples |
| | | Read_study | SRA run | Read data grouped by study objects |
| Taxon | Information relating to taxa | Taxon | ENA-taxonomy | Data in the ENA Taxonomy |
| Trace | Raw sequencing data from capillary platforms | Read_trace | Trace archive | Raw capillary read data |
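
The domain-to-result relationship in the table above can be sketched as a simple lookup structure. The following is a minimal illustration only: the dictionary is transcribed from the table (names lowercased for matching), while `SEARCH_RESULTS` and `domain_of` are hypothetical names introduced here and are not part of any ENA API.

```python
# Search domains mapped to their result names, transcribed from the table.
# The structure and names below are illustrative assumptions, not ENA code.
SEARCH_RESULTS = {
    "assembly": ["assembly"],
    "sequence": ["sequence_release", "sequence_update"],
    "study": ["study"],
    "coding": ["sequence_coding"],
    "analysis": ["analysis", "analysis_sample", "analysis_study"],
    "read": ["read_run", "read_experiment", "read_sample", "read_study"],
    "taxon": ["taxon"],
    "trace": ["read_trace"],
}


def domain_of(result: str) -> str:
    """Return the domain that a result name belongs to (hypothetical helper)."""
    key = result.strip().lower()
    for domain, results in SEARCH_RESULTS.items():
        if key in results:
            return domain
    raise KeyError(f"unknown result: {result!r}")
```

For instance, `domain_of("Read_experiment")` resolves to the `read` domain, reflecting that several results can present the same underlying data grouped at different levels.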
Figure 2. Query builder interface screenshot. The figure shows the query builder graphical user interface for the ENA warehouse search service. This interface is available from the ‘Advanced search’ tab visible across the ENA browser.