Paul Julian Kersey, James E Allen, Alexis Allot, Matthieu Barba, Sanjay Boddu, Bruce J Bolt, Denise Carvalho-Silva, Mikkel Christensen, Paul Davis, Christoph Grabmueller, Navin Kumar, Zicheng Liu, Thomas Maurel, Ben Moore, Mark D McDowall, Uma Maheswari, Guy Naamati, Victoria Newman, Chuang Kee Ong, Michael Paulini, Helder Pedro, Emily Perry, Matthew Russell, Helen Sparrow, Electra Tapanari, Kieron Taylor, Alessandro Vullo, Gareth Williams, Amonida Zadissia, Andrew Olson, Joshua Stein, Sharon Wei, Marcela Tello-Ruiz, Doreen Ware, Aurelien Luciani, Simon Potter, Robert D Finn, Martin Urban, Kim E Hammond-Kosack, Dan M Bolser, Nishadi De Silva, Kevin L Howe, Nicholas Langridge, Gareth Maslen, Daniel Michael Staines, Andrew Yates.
Abstract
Ensembl Genomes (http://www.ensemblgenomes.org) is an integrating resource for genome-scale data from non-vertebrate species, complementing the resources for vertebrate genomics developed in the Ensembl project (http://www.ensembl.org). Together, the two resources provide a consistent set of programmatic and interactive interfaces to a rich range of data including genome sequence, gene models, transcript sequence, genetic variation, and comparative analysis. This paper provides an update to the previous publications about the resource, with a focus on recent developments and expansions. These include the incorporation of almost 20 000 additional genome sequences and over 35 000 tracks of RNA-Seq data, which have been aligned to genomic sequence and made available for visualization. Other advances since 2015 include the release of the database in Resource Description Framework (RDF) format, a large increase in community-derived curation, a new high-performance protein sequence search, additional cross-references, improved annotation of non-protein-coding genes, and the launch of pre-release and archival sites. Collectively, these changes are part of a continuing response to the increasing quantity of publicly available genome-scale data, and the consequent need to archive, integrate, annotate and disseminate these using automated, scalable methods.
Year: 2018 PMID: 29092050 PMCID: PMC5753204 DOI: 10.1093/nar/gkx1011
Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971
Growth in Ensembl Genomes 2015–2017: number of genomes per division in each release. The number in brackets indicates genomes directly imported from INSDC.
| Release version | Date | Ensembl Bacteria | Ensembl Protists | Ensembl Fungi | Ensembl Plants | Ensembl Metazoa |
|---|---|---|---|---|---|---|
| 28 | August 2015 | 23 001 (23 001) | 133 (101) | 407 (359) | 41 (8) | 55 (1) |
| 37 | September 2017 | 44 048 (44 048) | 189 (157) | 811 (760) | 45 (12) | 68 (2) |
| Increase | | 21 047 (21 047) | 56 (56) | 404 (401) | 4 (4) | 13 (1) |
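The "Increase" row is the difference between release 37 and release 28, taken separately for the total count and for the bracketed INSDC-imported count. A minimal sketch of that arithmetic follows, using the figures from the table. The dictionary layout and the `increase` helper are illustrative assumptions, not part of any Ensembl Genomes software.

```python
# Genome counts per division, copied from the table above as
# (total genomes, genomes directly imported from INSDC).
release_28 = {
    "Bacteria": (23001, 23001),
    "Protists": (133, 101),
    "Fungi": (407, 359),
    "Plants": (41, 8),
    "Metazoa": (55, 1),
}
release_37 = {
    "Bacteria": (44048, 44048),
    "Protists": (189, 157),
    "Fungi": (811, 760),
    "Plants": (45, 12),
    "Metazoa": (68, 2),
}


def increase(old, new):
    """Return per-division (total, imported) differences between two releases.

    Hypothetical helper for illustration only.
    """
    return {
        division: (
            new[division][0] - old[division][0],
            new[division][1] - old[division][1],
        )
        for division in old
    }


growth = increase(release_28, release_37)
for division, (total, imported) in growth.items():
    print(f"Ensembl {division}: +{total} genomes (+{imported} imported from INSDC)")
```

Run as written, the computed differences match the "Increase" row for every division.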
RNA-seq alignment tracks by division
| Division | Tracks | Experiments | Species |
|---|---|---|---|
| Protists | 71 | 36 | 3 |
| Fungi | 6384 | 4822 | 24 |
| Plants | 29 836 | 1418 | 43 |
| Metazoa | 198 | 105 | 34 |
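As a quick cross-check, the per-division track counts can be summed and compared with the "over 35 000 tracks" figure quoted in the abstract. The sketch below uses only the numbers in the table. The variable names are illustrative assumptions.

```python
# RNA-seq alignment tracks per division, copied from the table above.
rnaseq_tracks = {
    "Protists": 71,
    "Fungi": 6384,
    "Plants": 29836,
    "Metazoa": 198,
}

# Total number of tracks across all four divisions.
total_tracks = sum(rnaseq_tracks.values())
print(f"Total RNA-seq tracks: {total_tracks}")  # 36489

# Share of the total contributed by each division, as a percentage.
for division, tracks in rnaseq_tracks.items():
    print(f"{division}: {100 * tracks / total_tracks:.1f}% of tracks")
```

The total of 36 489 is consistent with the abstract, and the breakdown shows that most of the tracks belong to Ensembl Plants.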
Figure 1. Data discovery and visualisation of alignment data in Ensembl Plants. While browsing the genome, a user can follow the ‘Custom tracks’ option in the left-hand menu (panel A). This gives access to a menu that searches the track hub registry for hubs that are anchored on this genome and match a specified metadata search term (panel B). Having selected a hub, the user is led through to a second menu where the tracks contained in the hub can be configured for display (panel C). When the configuration is complete, the selected tracks appear in the browser (panel D).
Figure 2. Hidden Markov Model search integrated into Ensembl Genomes. Results from a protein search using Hidden Markov Models, implemented using HMMER3, in Ensembl Genomes. Various options are available in a tabbed display, including (i) a description of the domain architecture of the query sequence and a graphical summary of the distribution of the significance scores of matches against the library (panel A), and (ii) a breakdown of matching library sequences ordered by domain architecture (panel B).