Richard Gibson1, Blaise Alako2, Clara Amid2, Ana Cerdeño-Tárraga2, Iain Cleland2, Neil Goodgame2, Petra Ten Hoopen2, Suran Jayathilaka2, Simon Kay2, Rasko Leinonen2, Xin Liu2, Swapna Pallreddy2, Nima Pakseresht2, Jeena Rajan2, Marc Rosselló2, Nicole Silvester2, Dmitriy Smirnov2, Ana Luisa Toribio2, Daniel Vaughan2, Vadim Zalunin2, Guy Cochrane2.
Abstract
The European Nucleotide Archive (ENA; http://www.ebi.ac.uk/ena) is a repository for the submission, maintenance and presentation of nucleotide sequence data and related sample and experimental information. In this article we report on ENA in 2015 regarding general activity, notable published data sets and major achievements. This is followed by a focus on sustainable biocuration of functional annotation, an area which has particularly felt the pressure of sequencing growth. The importance of functional annotation, how it can be submitted and the shifting role of the biocurator in the context of increasing volumes of data are all discussed.
Year: 2015 PMID: 26615190 PMCID: PMC4702917 DOI: 10.1093/nar/gkv1311
Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971
Figure 1. Example of Feature Table annotation for an HLA-A gene (taken from ENA accession LN873232). See the text for details.
Annotation Checklists currently available for submitting simple annotations and marker sequences
| Type | Name | Description |
|---|---|---|
| Frequently used | rRNA gene | For ribosomal RNA genes from prokaryotic, nuclear or organellar DNA. All rRNAs are considered partial. |
| Frequently used | Single CDS genomic DNA | For complete or partial coding sequence (CDS) derived from genomic DNA. This checklist will not accept segmented genes (i.e. with intron regions), so it should be used for prokaryotic or organellar genes, or for submitting a single exon. |
| Frequently used | Single CDS mRNA | For complete or partial single coding sequence (CDS) derived from mRNA (via cDNA). Do NOT use for submission of VIRTUAL transcripts (TSA or Unigene clusters); use the TSA CDS Annotated checklist instead. |
| Frequently used | Multi-Exon Gene | For the submission of single complete or partial multi-exon genes from |
| Frequently used | MHC gene 1 exon | For partial MHC class I or II antigens containing one exon ONLY. |
| Frequently used | MHC gene 2 exons | For partial MHC class I or II antigens containing two exons ONLY. An intron feature should only be used when the intron region has actually been sequenced. If the intron has not been sequenced, or only partially sequenced, please fill the non-sequenced gap with 100 Ns. |
| Frequently used | ncRNA | For non-coding RNA (ncRNA) transcripts or single-exon genes of prokaryotic or eukaryotic origin, with the exception of ribosomal RNA (rRNA) and transfer RNA (tRNA). |
| Frequently used | Satellite DNA | For submission of Satellites, Microsatellites and Minisatellites. Complete or partial single polymorphic locus present in nuclear and organellar DNA that consists of short sequences repeated in tandem arrays. |
| Frequently used | Mobile Element | For the submission of a single complete or partial mobile element. This checklist captures the mobile element feature but does not allow for granular annotation of component parts, such as coding regions, repeat regions and miscellaneous features within the mobile element itself. If precise annotation or translation is required, please use an alternative submission route. |
| Frequently used | Gene Promoter | For submission of uni- or bi-directional gene promoter regions. Please note that CDS is not annotated; if you wish to include the start of the coding region(s), please leave a comment with the coordinates of the start site(s). |
| Marker sequence | COI gene | For mitochondrial cytochrome oxidase subunit 1 genes. |
| Marker sequence | ITS rDNA | For ITS rDNA region. This checklist allows generic annotation of the ITS components (18S rRNA, ITS1, 5.8S rRNA, ITS2 and 28S rRNA). For annotation of the rRNA component only, please use the rRNA gene checklist. |
| Marker sequence | trnK-matK locus | For complete or partial matK gene within the chloroplast trnK gene intron. |
| Marker sequence | Phylogenetic Marker | For the submission of the following markers: actin (act), tubulin (tuba or tubb), calmodulin (CaM), RNA polymerase II large subunits (RPB1 and RPB2), translation elongation factor 1-alpha (tef1a), glyceraldehyde 3-phosphate dehydrogenase (GAPDH) and histone 3 (H3) where the intron/exon boundaries are not known. |
| Marker sequence | Multi-Locus Marker | For the submission of multi-locus markers (e.g. tRNA + CDS + rRNA) from |
| Marker sequence | D-Loop | For mitochondrial D-loop (control region) sequences. All D-loops are considered partial. |
| Marker sequence | Intergenic Spacer, IGS | For intergenic spacer (IGS) sequences between neighbouring genes (e.g. psbA-trnH IGS, 16S-23S rRNA IGS). Inclusion of the flanking genes is allowed. |
| Marker sequence | Gene intron | For complete or partial single gene intron. |
| Marker sequence | External Transcribed Spacer (ETS) | For submission of External Transcribed Spacer (ETS) regions of the eukaryotic rDNA transcript; a region often used to study intrageneric relationships. |
| Marker sequence | 16S-23S Intergenic Spacer Region | For submission of the 16S-23S rRNA intergenic spacer region: the transcribed spacer between the 16S rRNA and 23S rRNA genes of rRNA operons, found in prokaryotes and organelles. |
| Virus-specific | Single Viral CDS | For complete or partial single coding sequence (CDS) from a viral gene. Please do not use for peptides processed from polyproteins or proviral sequences, as these are all annotated differently. |
| Virus-specific | Viral Polyprotein | For complete or partial viral polyprotein genes where the mature peptide boundaries remain undefined. This template is not suitable for proviral sequences. If the sequences contain ribosomal frameshifts, please contact us. |
| Virus-specific | ssRNA(-) Viral copy RNA | For complete or partial viral copy RNA (cRNA) sequences, complementary to ssRNA(-) virus genomes. Only one CDS can be added; further CDS information should be provided in the curator comments section. |
| Virus-specific | Viral Untranslated Region (UTR) | For complete or partial untranslated region (UTR) or nontranslated region (NTR) found at the termini of viral genomes. Please do not use this checklist for submitting virus genomes or viral coding genes. |
| Virus-specific | Alphasatellite sub-viral particle | For submission of circular single stranded DNA alphasatellite sequences associated with Begomovirus, Babuvirus and Nanovirus. |
| Virus-specific | Betasatellite sub-viral particle | For submission of circular single stranded DNA betasatellite sequences of the Begomovirus genus. |
| Virus-specific | Plant Viroid | For complete circular ssRNA plant viroid sequences. Please do not use for other circular viruses. |
| Standards-Compliant | BARCODE COI | For Metazoan mitochondrial cytochrome oxidase subunit 1 (COI) genes that provide unique species-level identification and conform to Consortium for the Barcode of Life (CBoL) standards. |
| Standards-Compliant | GSC MIMARKS-Survey 16S rRNA sequences | For the submission of 16S rRNA (gDNA) sequences compliant with the GSC MIMARKS 4.0 standard. Users of this checklist must first submit their samples here: |
| Large-scale data | Expressed Sequence Tag (EST) | For submission of Sanger-sequenced Expressed Sequence Tags (ESTs). ESTs are short transcripts, ≈500–800 bp long, usually of low quality as they are the result of single-pass reads only. No feature annotation is recorded on ESTs. |
| Large-scale data | Sequence Tagged Site (STS) | For submission of Sequence Tagged Sites (STS). The Sequence Tagged Site (STS) is a relatively short, easily PCR-amplified sequence (200–500 bp) which can be specifically amplified by PCR and detected in the presence of all other genomic sequences and whose location in the genome is mapped. |
| Large-scale data | Genome Survey Sequence (GSS) | For submission of Genome Survey Sequences (GSS). These are short DNA sequences which include: random single pass genome survey sequences, single pass reads from cosmid/BAC/YAC ends (may be chromosome specific), exon trapped genomic sequences, Alu PCR sequences and transposon-tagged sequences. |
| Large-scale data | Transcriptome Shotgun Assembly (TSA)—Unannotated | For submission of virtual transcript assemblies (TSA, EST clusters) without feature annotation. IMPORTANT INFORMATION: virtual transcripts can ONLY be hosted with supporting evidence from raw experimental data. The raw reads should therefore be submitted to the Read domain prior to the assembly being submitted as well as an alignment BAM file demonstrating how the raw reads are mapped to the transcripts. Please email |
| Large-scale data | Transcriptome Shotgun Assembly (TSA)—CDS Annotated | For submission of virtual transcript assemblies (TSA, EST clusters) with CDS annotation. IMPORTANT INFORMATION: virtual transcripts can ONLY be hosted with supporting evidence from raw experimental data. The raw reads should therefore be submitted to the Read domain prior to the assembly being submitted as well as an alignment BAM file demonstrating how the raw reads are mapped to the transcripts. Please email |
More complex annotation should be submitted in an ENA-format flat file using the ‘Entry Upload’ option.
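The "MHC gene 2 exons" checklist in the table asks submitters to fill an unsequenced (or only partially sequenced) intron with 100 Ns. The following is a minimal sketch of how a submitter might prepare such a sequence before submission. It is not part of any ENA tool: the function names, the example sequences and the choice of 1-based inclusive coordinates are assumptions made purely for illustration.

```python
GAP_LENGTH = 100  # gap size stated in the "MHC gene 2 exons" checklist description


def join_exons_with_gap(exon1: str, exon2: str, gap_length: int = GAP_LENGTH) -> str:
    """Join two exon sequences, filling the unsequenced intron with Ns.

    Hypothetical helper: it only concatenates strings and does not
    validate that the inputs are real nucleotide sequences.
    """
    return exon1 + "N" * gap_length + exon2


def feature_coordinates(exon1: str, exon2: str, gap_length: int = GAP_LENGTH) -> dict:
    """Return assumed 1-based inclusive (start, end) positions of each
    exon and of the N-filled gap within the joined sequence."""
    exon1_end = len(exon1)
    gap_end = exon1_end + gap_length
    return {
        "exon1": (1, exon1_end),
        "gap": (exon1_end + 1, gap_end),
        "exon2": (gap_end + 1, gap_end + len(exon2)),
    }


if __name__ == "__main__":
    # Made-up toy sequences, far shorter than real MHC exons.
    first_exon = "ATGGCC"
    second_exon = "GGTTAA"
    joined = join_exons_with_gap(first_exon, second_exon)
    print(len(joined))
    print(feature_coordinates(first_exon, second_exon))
```

Computing the coordinates alongside the joined sequence keeps the exon positions consistent with the padded gap, which matters when the positions are later entered into a checklist form or a flat file.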
Figure 2. Submission workflow for functional annotation submissions under sustainable biocuration. The top level shows the flow through the submitter interface. The bottom level shows those biocurator roles which directly influence functional annotation submissions in ENA. Linking the work of the biocurator with the submitter interface is the autonomous system represented by the middle level.