Fiona Cunningham, James E Allen, Jamie Allen, Jorge Alvarez-Jarreta, M Ridwan Amode, Irina M Armean, Olanrewaju Austine-Orimoloye, Andrey G Azov, If Barnes, Ruth Bennett, Andrew Berry, Jyothish Bhai, Alexandra Bignell, Konstantinos Billis, Sanjay Boddu, Lucy Brooks, Mehrnaz Charkhchi, Carla Cummins, Luca Da Rin Fioretto, Claire Davidson, Kamalkumar Dodiya, Sarah Donaldson, Bilal El Houdaigui, Tamara El Naboulsi, Reham Fatima, Carlos Garcia Giron, Thiago Genez, Jose Gonzalez Martinez, Cristina Guijarro-Clarke, Arthur Gymer, Matthew Hardy, Zoe Hollis, Thibaut Hourlier, Toby Hunt, Thomas Juettemann, Vinay Kaikala, Mike Kay, Ilias Lavidas, Tuan Le, Diana Lemos, José Carlos Marugán, Shamika Mohanan, Aleena Mushtaq, Marc Naven, Denye N Ogeh, Anne Parker, Andrew Parton, Malcolm Perry, Ivana Piližota, Irina Prosovetskaia, Manoj Pandian Sakthivel, Ahamed Imran Abdul Salam, Bianca M Schmitt, Helen Schuilenburg, Dan Sheppard, José G Pérez-Silva, William Stark, Emily Steed, Kyösti Sutinen, Ranjit Sukumaran, Dulika Sumathipala, Marie-Marthe Suner, Michal Szpak, Anja Thormann, Francesca Floriana Tricomi, David Urbina-Gómez, Andres Veidenberg, Thomas A Walsh, Brandon Walts, Natalie Willhoft, Andrea Winterbottom, Elizabeth Wass, Marc Chakiachvili, Bethany Flint, Adam Frankish, Stefano Giorgetti, Leanne Haggerty, Sarah E Hunt, Garth R Ilsley, Jane E Loveland, Fergal J Martin, Benjamin Moore, Jonathan M Mudge, Matthieu Muffato, Emily Perry, Magali Ruffier, John Tate, David Thybert, Stephen J Trevanion, Sarah Dyer, Peter W Harrison, Kevin L Howe, Andrew D Yates, Daniel R Zerbino, Paul Flicek.
Abstract
Ensembl (https://www.ensembl.org) is unique in its flexible infrastructure for access to genomic data and annotation. It has been designed to efficiently deliver annotation at scale for all eukaryotic life, and it also provides deep, comprehensive annotation for key species. Genomes representing a greater diversity of species are increasingly being sequenced. In response, we have focussed our recent efforts on expediting the annotation of new assemblies. Here, we report the release of the greatest annual number of newly annotated genomes in the history of Ensembl via our dedicated Ensembl Rapid Release platform (http://rapid.ensembl.org). We have also developed a new method to generate comparative analyses at scale for these assemblies and, for the first time, we have annotated non-vertebrate eukaryotes. Meanwhile, we continually improve, extend and update the annotation for our high-value reference vertebrate genomes and report the details here. We also provide dedicated software tools for specific tasks, such as the Ensembl Variant Effect Predictor (VEP) and the newly developed interface for the Variant Recoder. All Ensembl data, software and tools are freely available for download and are accessible programmatically.
Year: 2022 PMID: 34791404 PMCID: PMC8728283 DOI: 10.1093/nar/gkab1049
Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971
Figure 1. Homology annotation pipeline. (A) For each query genome, the pipeline starts by identifying the appropriate set of representative genomes based on the query genome's taxonomy. The query genome is then compared with the representative genomes using Diamond-Blast. The Diamond-Blast output is analysed to identify reciprocal best hits (RBHs) between the query genome and the representative genomes or, where no RBH is found, best hits (BHs). The RBH and BH homology relationships are then stored in a per-species Compara database and displayed in the homology view of the Rapid Release platform. (B) List of representative genomes shared by all the representative sets. (C) The representative genomes used for the bony fish (Actinopterygii) are shown here in a larger font. They were selected based on community usage and annotation quality. The red and blue branches define clusters of Actinopterygii subclades identified by their branch length; at least one representative was selected per subclade. (D) The sets of representative genomes currently available in Ensembl.
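The reciprocal-best-hit logic described in panel (A) can be sketched in Python. This is a minimal illustration under stated assumptions, not the Ensembl Compara implementation: it assumes hits arrive in the default 12-column BLAST tabular layout (as produced by, e.g., DIAMOND's `--outfmt 6`, with the bitscore in the last column), compares the query genome with one representative genome at a time, ranks hits by bitscore alone, and picks the representative set for the most specific clade in the query lineage. All function names and the toy gene identifiers are hypothetical.

```python
from typing import Dict, Iterable, List, Optional, Tuple

# (query_id, target_id, bitscore)
Hit = Tuple[str, str, float]


def parse_diamond_tabular(lines: Iterable[str]) -> List[Hit]:
    """Parse tabular hits in the default 12-column BLAST layout (outfmt 6)."""
    hits: List[Hit] = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        # Columns: qseqid sseqid pident length mismatch gapopen
        #          qstart qend sstart send evalue bitscore
        hits.append((fields[0], fields[1], float(fields[11])))
    return hits


def best_hits(hits: Iterable[Hit]) -> Dict[str, str]:
    """Map each query ID to its highest-bitscore target (ties -> smallest target ID)."""
    best: Dict[str, Tuple[float, str]] = {}
    for query, target, score in hits:
        current = best.get(query)
        if (current is None or score > current[0]
                or (score == current[0] and target < current[1])):
            best[query] = (score, target)
    return {query: target for query, (_, target) in best.items()}


def classify_homologies(forward: Iterable[Hit],
                        reverse: Iterable[Hit]) -> Dict[str, Tuple[str, str]]:
    """Label each query gene's best hit as 'RBH' (reciprocal) or 'BH' (one-way)."""
    fwd = best_hits(forward)  # query genome -> representative genome
    rev = best_hits(reverse)  # representative genome -> query genome
    return {
        query: (target, "RBH" if rev.get(target) == query else "BH")
        for query, target in fwd.items()
    }


def select_representative_set(lineage: List[str],
                              sets: Dict[str, List[str]]) -> Optional[List[str]]:
    """Return the set for the most specific clade in a root-to-leaf lineage."""
    for clade in reversed(lineage):
        if clade in sets:
            return sets[clade]
    return None


if __name__ == "__main__":
    def row(query: str, target: str, bits: float) -> str:
        # Build a 12-column line; only the IDs and the bitscore matter here.
        return "\t".join([query, target] + ["0"] * 9 + [str(bits)])

    forward = parse_diamond_tabular([row("q1", "r1", 200), row("q1", "r2", 150),
                                     row("q2", "r2", 120)])
    reverse = parse_diamond_tabular([row("r1", "q1", 200), row("r2", "q3", 130),
                                     row("r2", "q2", 120)])
    print(classify_homologies(forward, reverse))
    # {'q1': ('r1', 'RBH'), 'q2': ('r2', 'BH')}
```

In the toy run, `q1` and `r1` pick each other and form an RBH, whereas `r2` prefers `q3`, so `q2` keeps only a one-way BH. Ranking on bitscore alone keeps the sketch short; a production pipeline may also weigh e-value, coverage or identity.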
Figure 2. Entity viewer with transcript filtering and sorting. Screenshot of the entity viewer showing the protein-coding transcripts available for BRCA2 (ENSG00000139618.17, GRCh38), ordered by number of exons. Information about the selected gene is displayed in the right-hand menu, with cross-references available by clicking on ‘External references’. Multiple functions are available from the right-hand control strip, including gene search (magnifying glass icon), recently visited genes (bookmark icon) and sequence download (download icon). Additional information about each transcript can be accessed from the ‘More information’ link, and sequence data for a single transcript can be accessed from the ‘Download’ link.
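The filter-and-sort view in Figure 2 can be illustrated with a short Python sketch. It is not the entity viewer's code: the `Transcript` record, its fields and the placeholder IDs are hypothetical, and because the caption does not state the sort direction, descending exon count is assumed by default, with ties broken by transcript ID.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class Transcript:
    stable_id: str   # hypothetical placeholder IDs, not real Ensembl IDs
    biotype: str     # e.g. "protein_coding"
    exon_count: int


def protein_coding_by_exon_count(transcripts: Iterable[Transcript],
                                 descending: bool = True) -> List[Transcript]:
    """Keep protein-coding transcripts and order them by exon count."""
    coding = [t for t in transcripts if t.biotype == "protein_coding"]
    # Sort by ID first; Python's sort is stable (also with reverse=True),
    # so transcripts with equal exon counts stay in ID order.
    coding.sort(key=lambda t: t.stable_id)
    coding.sort(key=lambda t: t.exon_count, reverse=descending)
    return coding


if __name__ == "__main__":
    demo = [
        Transcript("TX-A", "protein_coding", 10),
        Transcript("TX-B", "retained_intron", 5),
        Transcript("TX-C", "protein_coding", 4),
        Transcript("TX-D", "protein_coding", 10),
    ]
    print([t.stable_id for t in protein_coding_by_exon_count(demo)])
    # ['TX-A', 'TX-D', 'TX-C']
```

Two stable sorts are used instead of a single composite key so that the ID tie-break stays ascending whichever direction is chosen for the exon count.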