Andrew D Yates1, James Allen1, Ridwan M Amode1, Andrey G Azov1, Matthieu Barba1, Andrés Becerra1, Jyothish Bhai1, Lahcen I Campbell1, Manuel Carbajo Martinez1, Marc Chakiachvili1, Kapeel Chougule2, Mikkel Christensen1, Bruno Contreras-Moreira1, Alayne Cuzick3, Luca Da Rin Fioretto1, Paul Davis1, Nishadi H De Silva1, Stavros Diamantakis1, Sarah Dyer1, Justin Elser4, Carla V Filippi1,5,6, Astrid Gall1, Dionysios Grigoriadis1, Cristina Guijarro-Clarke1, Parul Gupta4, Kim E Hammond-Kosack3, Kevin L Howe1, Pankaj Jaiswal4, Vinay Kaikala1, Vivek Kumar2, Sunita Kumari2, Nick Langridge1, Tuan Le1, Manuel Luypaert1, Gareth L Maslen1, Thomas Maurel1, Benjamin Moore1, Matthieu Muffato1, Aleena Mushtaq1, Guy Naamati1, Sushma Naithani4, Andrew Olson2, Anne Parker1, Michael Paulini1, Helder Pedro1, Emily Perry1, Justin Preece4, Mark Quinton-Tulloch1, Faye Rodgers7, Marc Rosello1, Magali Ruffier1, James Seager3, Vasily Sitnik1, Michal Szpak1, John Tate1, Marcela K Tello-Ruiz2, Stephen J Trevanion1, Martin Urban3, Doreen Ware2,8, Sharon Wei2, Gary Williams1, Andrea Winterbottom1, Magdalena Zarowiecki1, Robert D Finn1, Paul Flicek1.
Abstract
Ensembl Genomes (https://www.ensemblgenomes.org) provides access to non-vertebrate genomes and analysis, complementing vertebrate resources developed by the Ensembl project (https://www.ensembl.org). The two resources collectively present genome annotation through a consistent set of interfaces spanning the tree of life, presenting genome sequence, annotation, variation, transcriptomic data and comparative analysis. Here, we present our largest increase in plant, metazoan and fungal genomes since the project's inception, creating one of the world's most comprehensive genomic resources, and describe our efforts to reduce genome redundancy in our Bacteria portal. We detail our new efforts in gene annotation, our emerging support for pangenome analysis, our efforts to accelerate data dissemination through the Ensembl Rapid Release resource and our new AlphaFold visualization. Finally, we present details of our future plans, including updates on our integration with Ensembl and how we plan to improve our support for the microbial research community. Software and data are made available without restriction via our website, online tools platform and programmatic interfaces (available under an Apache 2.0 license). Data updates are synchronised with Ensembl's release cycle.
Year: 2022 PMID: 34791415 PMCID: PMC8728113 DOI: 10.1093/nar/gkab1007
Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971
Ensembl non-vertebrate growth/update 2019–2021 (number of genomes)
| Release | Date | Bacteria | Protists | Fungi | Plants | Metazoa |
|---|---|---|---|---|---|---|
| 45 | September 2019 | 44 048 | 237 | 1014 | 67 | 78 |
| 52 | October 2021 | 31 332 | 237 | 1505 | 119 | 123 |
| Change | | –12 716 | 0 | +491 | +52 | +45 |
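The Change row can be checked directly from the two release rows. The following is a minimal sketch of that arithmetic; the variable and function names are illustrative only and are not part of any Ensembl interface.

```python
# Genome counts per division, copied from the table above.
DIVISIONS = ["Bacteria", "Protists", "Fungi", "Plants", "Metazoa"]
release_45 = dict(zip(DIVISIONS, [44048, 237, 1014, 67, 78]))    # September 2019
release_52 = dict(zip(DIVISIONS, [31332, 237, 1505, 119, 123]))  # October 2021


def genome_change(old, new):
    """Return the per-division difference (new minus old)."""
    return {division: new[division] - old[division] for division in old}


change = genome_change(release_45, release_52)
for division in DIVISIONS:
    # The "+d" format prints an explicit sign, matching the table's Change row.
    print(f"{division:9s} {change[division]:+d}")
```

The result matches the table: a reduction in Bacteria, no change in Protists, and growth in Fungi, Plants and Metazoa.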
Figure 1. Change in Ensembl Bacteria's collection between releases 48 and 49, aggregated by our ten most represented phyla. Component A shows the overall change in genome numbers in each phylum, with over 15,000 genomes coming from three phyla. Component B demonstrates that overall family coverage within phyla has improved irrespective of the removal of genomes. Component C shows an increase in genomes without a known family, with the majority occurring in Proteobacteria.
Figure 2. EPO multiple genome alignment visualization of chromosome 1 in three rice genomes: Oryza sativa indica Group (top), Oryza sativa japonica Group (middle) and Oryza glaberrima (bottom). Orange discontinuous blocks represent the areas of alignment across all three genomes. Each genome displays its genes and can be used to identify regions of uniqueness in each genome and identify potential areas of mis-assembly or mis-annotation. This alignment can be browsed at http://plants.ensembl.org/Oryza_nivara/Location/Compara_Alignments/Image?align=9910;db=core;r=1:586653-632276.
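The browser link in the Figure 2 caption appears to follow a simple pattern: a species path, followed by semicolon-separated `align`, `db` and `r` parameters. As a rough sketch, a link of the same shape could be assembled for another region as below. This assumes the spaces around `=` in the caption's link are extraction artifacts and that the range separator is a plain hyphen. The function name `alignment_url` and its parameters are hypothetical, not an Ensembl API, and the meaning assigned to each parameter is inferred from the link alone.

```python
def alignment_url(species, align_id, chromosome, start, end,
                  host="http://plants.ensembl.org"):
    """Build a Compara alignment image URL shaped like the one in the caption.

    Hypothetical helper: the parameter meanings (alignment identifier,
    database name, region) are inferred from the caption's link only.
    """
    if start > end:
        raise ValueError("start must not exceed end")
    # Parameters in this link style are separated by semicolons, not ampersands.
    query = f"align={align_id};db=core;r={chromosome}:{start}-{end}"
    return f"{host}/{species}/Location/Compara_Alignments/Image?{query}"


# Rebuild the link given in the caption from its components.
url = alignment_url("Oryza_nivara", 9910, "1", 586653, 632276)
print(url)
```

Whether a given alignment identifier is valid for another species or release is not stated in the caption and would need to be checked in the browser.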
Figure 3. An AlphaFold 3D prediction for the Arabidopsis thaliana protein Q00958 (LFY: AT5G61850.1), displayed as a Richardson model using Mol*. The central panel colours the model from high confidence (blue) to low confidence (orange), with the protein sequence displayed above. The right-hand panel enables highlighting of one or more exons, variants and protein features, each toggled by clicking its eye icon. Variants can be turned on or off individually or according to how deleterious or tolerated they are. Only variants that result in protein changes and have SIFT scores are available for display.