Avianbase: a community resource for bird genomics.

Lél Eöry, M Thomas P Gilbert, Cai Li, Bo Li, Alan Archibald, Bronwen L Aken, Guojie Zhang, Erich Jarvis, Paul Flicek, David W Burt.   

Abstract

Giving access to sequence and annotation data for genome assemblies is important because, while facilitating research, it places both assembly and annotation quality under scrutiny, resulting in improvements to both. Therefore we announce Avianbase, a resource for bird genomics, which provides access to data released by the Avian Phylogenomics Consortium.

Year: 2015; PMID: 25723810; PMCID: PMC4310197; DOI: 10.1186/s13059-015-0588-2

Source DB: PubMed; Journal: Genome Biol; ISSN: 1474-7596; Impact factor: 13.583


Access to complete genome sequences is the first step towards understanding the biology of an organism. The genome is the template that underpins the phenotypic characteristics of individuals, and the accumulation and fixation of mutations in it over evolutionary timescales ultimately separates species. Among the available genomic datasets, birds, as our more distant relatives, have historically been underrepresented. Because sequencing and annotation were expensive in the past, data accumulated preferentially for species that are either established model organisms or economically significant (that is, chicken, turkey and duck, representing two sister orders within the Galloanseriformes clade of the large and diverse bird phylogeny). The recent release of genome assemblies and initial protein-coding gene predictions [1-4] for 44 bird species, including representatives from all major branches of the bird phylogeny, is therefore highly significant. One of the major challenges posed by the release of this many newly sequenced genomes, and of the many more to come [5], is how to make them available to the various research communities in a way that supports basic research. Providing the sequences and initial annotations only as text files limits the potential use of the data, because accessing and mining them (for example, searching for genes belonging to certain protein families or for orthologous genes) requires significant resources, including bioinformatics personnel and computing infrastructure. These overheads are a serious bottleneck that can hinder research and require concerted action by the relevant research communities.
Once genomes are submitted to public databases, genome-wide annotations are frequently generated and released either by the Ensembl project [6] or by the National Center for Biotechnology Information [7]. The sequence and annotation are then made available online in integrated visual views via the Ensembl or the University of California Santa Cruz (UCSC) genome browsers [8]. These systems provide search facilities, sequence alignment tools such as BLAT and BLAST, and various analysis tools that facilitate subsetting and programmatic retrieval of the data, including UCSC's Table Browser and Ensembl's Perl and REST APIs and BioMart system. Although these systems have become almost indispensable for research, not all sequenced genomes are annotated and displayed in genome browsers. Full genome annotation remains time-consuming and resource-intensive: a full evidence-based Ensembl genebuild takes approximately 4 months. Thus, the list of species represented is currently limited and depends on various factors, including the completeness of the assembled genome sequence and the overall demand for the resource in the scientific community, for example whether the species is a model organism (such as human or mouse), economically important (such as farmed animals) or of specific phylogenetic interest. Many of the recently sequenced bird genomes do not obviously fall within these categories.
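As an illustration of the kind of programmatic retrieval that Ensembl's REST API offers, the sketch below builds a gene lookup request and condenses the JSON record into a location string. It assumes the public Ensembl REST server at https://rest.ensembl.org and its `lookup/symbol/:species/:symbol` endpoint; it targets the main Ensembl site, not Avianbase, which at the time of writing offered no REST or Perl API access. The helper names (`lookup_url`, `summarise`, `fetch_gene`) are hypothetical, and `fetch_gene` needs network access.

```python
import json
import urllib.parse
import urllib.request

# Assumed base URL of the public Ensembl REST server (not an Avianbase service).
ENSEMBL_REST = "https://rest.ensembl.org"


def lookup_url(species: str, symbol: str) -> str:
    """Build a lookup/symbol request URL asking for a JSON response."""
    path = f"/lookup/symbol/{urllib.parse.quote(species)}/{urllib.parse.quote(symbol)}"
    return ENSEMBL_REST + path + "?content-type=application/json"


def summarise(record: dict) -> str:
    """Condense a lookup record into 'ID seq_region:start-end(strand)'."""
    # Ensembl reports strand as 1 (forward) or -1 (reverse).
    strand = "+" if record["strand"] == 1 else "-"
    return (f'{record["id"]} {record["seq_region_name"]}:'
            f'{record["start"]}-{record["end"]}({strand})')


def fetch_gene(species: str, symbol: str, timeout: float = 30.0) -> dict:
    """Send the HTTP GET request and decode the JSON body (requires network)."""
    with urllib.request.urlopen(lookup_url(species, symbol), timeout=timeout) as resp:
        return json.load(resp)
```

A call such as `summarise(fetch_gene("gallus_gallus", "BRCA2"))` would query the chicken annotation; whether a given symbol resolves depends on the gene names in the current Ensembl release. Keeping URL construction and parsing separate from the network call makes both easy to check offline.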

Bird genomics resource using Ensembl infrastructure

To support bird genomics by making the sequence and gene predictions generated by the Avian Phylogenomics Consortium (APC) more broadly available, and to support the research and conclusions in the published companion papers, we decided to make the initial data available within the Ensembl framework. We chose Ensembl for several reasons. First, Ensembl's open-access data model and open-source software infrastructure make it possible to reuse its data and source code for our purposes with minimal customization. The software infrastructure includes various analysis pipelines and implements the genome browser interface with its unique tool set. Second, the eHive analysis workflow management system [9] developed by the Ensembl team supports various computing infrastructures and greatly simplifies job management. Third, Ensembl runs a two-tier user support system that quickly and efficiently resolves, among other things, system-related problems, either through its email helpdesk or through direct access to its developers on a dedicated mailing list. Finally, the modular design of the existing software infrastructure makes it possible to extend the analysis pipelines with new software, to create pipelines for new data types, to provide services matching the available data and/or computing infrastructure, and, most importantly, to scale up data loading and analyses to the multispecies level. Here we present Avianbase, an Ensembl-based resource built primarily by and for the bird research communities to share and improve the existing data and annotation released by the consortium. In its current form, this Ensembl instance provides unique access to 44 newly sequenced bird genomes (Figure 1).
The data include the genome assemblies generated by BGI; full repeat annotations using dustmasker [10], Tandem Repeats Finder [11], homology-based repeat identification with RepeatMasker [12] and de novo repeat identification with RepeatModeler [13]; and GeneWise [14] gene predictions created by BGI from a set of selected transcripts from the chicken, zebra finch and human Ensembl genebuilds [1-4] (Figure 2). Avianbase also includes a mirror of four relevant Ensembl core databases (chicken, turkey, duck and zebra finch), because some of these birds served as templates for the gene predictions and because this set of 48 birds is the subject of the research described in many of the companion papers to the main APC papers [1,2]. In addition to visual displays of the sequences, gene models, transcripts and translations, we provide indexed search facilities for these birds, BLAST access to the genomic data and links to the original data files [15]. Users can also upload their own data and display it alongside the default annotations. We also plan to support data mining and analysis in future by giving access to the data via BioMart or the Perl API, and are actively considering how best to provide these options.
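Repeat annotations such as those described above are often distributed as soft-masked FASTA, in which bases falling inside annotated repeats are written in lowercase (the convention of Ensembl's `dna_sm` genome files). The sketch below assumes such a soft-masked file and reports, per sequence, the fraction of non-gap bases that are masked. The function names and the idea of applying it to an Avianbase download are illustrative assumptions, not part of the resource.

```python
from typing import Dict, Iterable, Iterator, Tuple


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (name, sequence) pairs; the name is the first word of the header."""
    name, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            header = line[1:].split()
            name, chunks = (header[0] if header else ""), []
        else:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def masked_fraction(seq: str) -> float:
    """Fraction of non-N bases that are soft-masked (lowercase)."""
    # N/n are treated as assembly gaps and excluded from the denominator.
    called = [b for b in seq if b not in "Nn"]
    if not called:
        return 0.0
    return sum(1 for b in called if b.islower()) / len(called)


def repeat_summary(lines: Iterable[str]) -> Dict[str, float]:
    """Map each sequence name to its soft-masked (repeat) fraction."""
    return {name: masked_fraction(seq) for name, seq in read_fasta(lines)}
```

With a hypothetical file, `with open("emperor_penguin.dna_sm.fa") as fh: print(repeat_summary(fh))` would list per-scaffold repeat fractions. A genome-wide figure should be computed from total masked and total called bases, not by averaging these per-sequence values.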
Figure 1

Avianbase: genome portal for bird genomics using the Ensembl infrastructure.

Figure 2

Location view with example gene model and repeat annotation for Emperor penguin using the Ensembl Genome Viewer.

Avianbase: genome portal for bird genomics using the Ensembl infrastructure. Location view with example gene model and repeat annotation for Emperor penguin using the Ensembl Genome Viewer.

Conclusions

Although the sequence data and annotations on our site are at present limited to those released by the APC, our bird portal can support avian research in many ways. One of our aims is to use this broad sample of bird genomes to generate an improved genome-wide functional map of selectively constrained sites that does not depend on predefined functional categories. Such a map will greatly improve our ability to link causative variants to genomic locations and thus to link particular genotypes with observed phenotypes. Until now, detailed maps of this kind have been available only for mammals [16]; we now have the opportunity to greatly enhance avian research, especially for species for which variation data are already available (see, for example, [17]). Our bird portal can be tailored to the needs of individual bird research communities. It can list available resources and support collaboration within and between research teams by providing and sharing data that can be used to improve the assembly (resequencing projects) or the annotation (variation and transcriptome data) of the genome of interest. We encourage these communities to contact us (avianbase@ed.ac.uk) and suggest improvements that would benefit their research. Avianbase, our Ensembl-based bird resource, is available at http://avianbase.narf.ac.uk and is hosted within the National Avian Research Facility (NARF), UK [18], which aims to support the study of avian biology, genetics, infection and disease.
References

1.  GeneWise and Genomewise.

Authors:  Ewan Birney; Michele Clamp; Richard Durbin
Journal:  Genome Res       Date:  2004-05       Impact factor: 9.043

2.  Genome 10K: a proposal to obtain whole-genome sequence for 10,000 vertebrate species.

Authors: 
Journal:  J Hered       Date:  2009-11-05       Impact factor: 2.645

3.  Tandem repeats finder: a program to analyze DNA sequences.

Authors:  G Benson
Journal:  Nucleic Acids Res       Date:  1999-01-15       Impact factor: 16.971

4.  Whole-genome resequencing reveals loci under selection during chicken domestication.

Authors:  Carl-Johan Rubin; Michael C Zody; Jonas Eriksson; Jennifer R S Meadows; Ellen Sherwood; Matthew T Webster; Lin Jiang; Max Ingman; Ted Sharpe; Sojeong Ka; Finn Hallböök; Francois Besnier; Orjan Carlborg; Bertrand Bed'hom; Michèle Tixier-Boichard; Per Jensen; Paul Siegel; Kerstin Lindblad-Toh; Leif Andersson
Journal:  Nature       Date:  2010-03-10       Impact factor: 49.962

5.  Comparative genomics reveals insights into avian genome evolution and adaptation.

Authors:  Guojie Zhang; Cai Li; Qiye Li; Bo Li; Denis M Larkin; Chul Lee; Jay F Storz; Agostinho Antunes; Matthew J Greenwold; Robert W Meredith; Anders Ödeen; Jie Cui; Qi Zhou; Luohao Xu; Hailin Pan; Zongji Wang; Lijun Jin; Pei Zhang; Haofu Hu; Wei Yang; Jiang Hu; Jin Xiao; Zhikai Yang; Yang Liu; Qiaolin Xie; Hao Yu; Jinmin Lian; Ping Wen; Fang Zhang; Hui Li; Yongli Zeng; Zijun Xiong; Shiping Liu; Long Zhou; Zhiyong Huang; Na An; Jie Wang; Qiumei Zheng; Yingqi Xiong; Guangbiao Wang; Bo Wang; Jingjing Wang; Yu Fan; Rute R da Fonseca; Alonzo Alfaro-Núñez; Mikkel Schubert; Ludovic Orlando; Tobias Mourier; Jason T Howard; Ganeshkumar Ganapathy; Andreas Pfenning; Osceola Whitney; Miriam V Rivas; Erina Hara; Julia Smith; Marta Farré; Jitendra Narayan; Gancho Slavov; Michael N Romanov; Rui Borges; João Paulo Machado; Imran Khan; Mark S Springer; John Gatesy; Federico G Hoffmann; Juan C Opazo; Olle Håstad; Roger H Sawyer; Heebal Kim; Kyu-Won Kim; Hyeon Jeong Kim; Seoae Cho; Ning Li; Yinhua Huang; Michael W Bruford; Xiangjiang Zhan; Andrew Dixon; Mads F Bertelsen; Elizabeth Derryberry; Wesley Warren; Richard K Wilson; Shengbin Li; David A Ray; Richard E Green; Stephen J O'Brien; Darren Griffin; Warren E Johnson; David Haussler; Oliver A Ryder; Eske Willerslev; Gary R Graves; Per Alström; Jon Fjeldså; David P Mindell; Scott V Edwards; Edward L Braun; Carsten Rahbek; David W Burt; Peter Houde; Yong Zhang; Huanming Yang; Jian Wang; Erich D Jarvis; M Thomas P Gilbert; Jun Wang
Journal:  Science       Date:  2014-12-11       Impact factor: 47.728

6.  A high-resolution map of human evolutionary constraint using 29 mammals.

Authors:  Kerstin Lindblad-Toh; Manuel Garber; Or Zuk; Michael F Lin; Brian J Parker; Stefan Washietl; Pouya Kheradpour; Jason Ernst; Gregory Jordan; Evan Mauceli; Lucas D Ward; Craig B Lowe; Alisha K Holloway; Michele Clamp; Sante Gnerre; Jessica Alföldi; Kathryn Beal; Jean Chang; Hiram Clawson; James Cuff; Federica Di Palma; Stephen Fitzgerald; Paul Flicek; Mitchell Guttman; Melissa J Hubisz; David B Jaffe; Irwin Jungreis; W James Kent; Dennis Kostka; Marcia Lara; Andre L Martins; Tim Massingham; Ida Moltke; Brian J Raney; Matthew D Rasmussen; Jim Robinson; Alexander Stark; Albert J Vilella; Jiayu Wen; Xiaohui Xie; Michael C Zody; Jen Baldwin; Toby Bloom; Chee Whye Chin; Dave Heiman; Robert Nicol; Chad Nusbaum; Sarah Young; Jane Wilkinson; Kim C Worley; Christie L Kovar; Donna M Muzny; Richard A Gibbs; Andrew Cree; Huyen H Dihn; Gerald Fowler; Shalili Jhangiani; Vandita Joshi; Sandra Lee; Lora R Lewis; Lynne V Nazareth; Geoffrey Okwuonu; Jireh Santibanez; Wesley C Warren; Elaine R Mardis; George M Weinstock; Richard K Wilson; Kim Delehaunty; David Dooling; Catrina Fronik; Lucinda Fulton; Bob Fulton; Tina Graves; Patrick Minx; Erica Sodergren; Ewan Birney; Elliott H Margulies; Javier Herrero; Eric D Green; David Haussler; Adam Siepel; Nick Goldman; Katherine S Pollard; Jakob S Pedersen; Eric S Lander; Manolis Kellis
Journal:  Nature       Date:  2011-10-12       Impact factor: 49.962

7.  Phylogenomic analyses data of the avian phylogenomics project.

Authors:  Erich D Jarvis; Siavash Mirarab; Andre J Aberer; Bo Li; Peter Houde; Cai Li; Simon Y W Ho; Brant C Faircloth; Benoit Nabholz; Jason T Howard; Alexander Suh; Claudia C Weber; Rute R da Fonseca; Alonzo Alfaro-Núñez; Nitish Narula; Liang Liu; Dave Burt; Hans Ellegren; Scott V Edwards; Alexandros Stamatakis; David P Mindell; Joel Cracraft; Edward L Braun; Tandy Warnow; Wang Jun; M Thomas Pius Gilbert; Guojie Zhang
Journal:  Gigascience       Date:  2015-02-12       Impact factor: 6.524

8.  Ensembl 2014.

Authors:  Paul Flicek; M Ridwan Amode; Daniel Barrell; Kathryn Beal; Konstantinos Billis; Simon Brent; Denise Carvalho-Silva; Peter Clapham; Guy Coates; Stephen Fitzgerald; Laurent Gil; Carlos García Girón; Leo Gordon; Thibaut Hourlier; Sarah Hunt; Nathan Johnson; Thomas Juettemann; Andreas K Kähäri; Stephen Keenan; Eugene Kulesha; Fergal J Martin; Thomas Maurel; William M McLaren; Daniel N Murphy; Rishi Nag; Bert Overduin; Miguel Pignatelli; Bethan Pritchard; Emily Pritchard; Harpreet S Riat; Magali Ruffier; Daniel Sheppard; Kieron Taylor; Anja Thormann; Stephen J Trevanion; Alessandro Vullo; Steven P Wilder; Mark Wilson; Amonida Zadissa; Bronwen L Aken; Ewan Birney; Fiona Cunningham; Jennifer Harrow; Javier Herrero; Tim J P Hubbard; Rhoda Kinsella; Matthieu Muffato; Anne Parker; Giulietta Spudich; Andy Yates; Daniel R Zerbino; Stephen M J Searle
Journal:  Nucleic Acids Res       Date:  2013-12-06       Impact factor: 16.971

9.  Whole-genome analyses resolve early branches in the tree of life of modern birds.

Authors:  Erich D Jarvis; Siavash Mirarab; Andre J Aberer; Bo Li; Peter Houde; Cai Li; Simon Y W Ho; Brant C Faircloth; Benoit Nabholz; Jason T Howard; Alexander Suh; Claudia C Weber; Rute R da Fonseca; Jianwen Li; Fang Zhang; Hui Li; Long Zhou; Nitish Narula; Liang Liu; Ganesh Ganapathy; Bastien Boussau; Md Shamsuzzoha Bayzid; Volodymyr Zavidovych; Sankar Subramanian; Toni Gabaldón; Salvador Capella-Gutiérrez; Jaime Huerta-Cepas; Bhanu Rekepalli; Kasper Munch; Mikkel Schierup; Bent Lindow; Wesley C Warren; David Ray; Richard E Green; Michael W Bruford; Xiangjiang Zhan; Andrew Dixon; Shengbin Li; Ning Li; Yinhua Huang; Elizabeth P Derryberry; Mads Frost Bertelsen; Frederick H Sheldon; Robb T Brumfield; Claudio V Mello; Peter V Lovell; Morgan Wirthlin; Maria Paula Cruz Schneider; Francisco Prosdocimi; José Alfredo Samaniego; Amhed Missael Vargas Velazquez; Alonzo Alfaro-Núñez; Paula F Campos; Bent Petersen; Thomas Sicheritz-Ponten; An Pas; Tom Bailey; Paul Scofield; Michael Bunce; David M Lambert; Qi Zhou; Polina Perelman; Amy C Driskell; Beth Shapiro; Zijun Xiong; Yongli Zeng; Shiping Liu; Zhenyu Li; Binghang Liu; Kui Wu; Jin Xiao; Xiong Yinqi; Qiuemei Zheng; Yong Zhang; Huanming Yang; Jian Wang; Linnea Smeds; Frank E Rheindt; Michael Braun; Jon Fjeldsa; Ludovic Orlando; F Keith Barker; Knud Andreas Jønsson; Warren Johnson; Klaus-Peter Koepfli; Stephen O'Brien; David Haussler; Oliver A Ryder; Carsten Rahbek; Eske Willerslev; Gary R Graves; Travis C Glenn; John McCormack; Dave Burt; Hans Ellegren; Per Alström; Scott V Edwards; Alexandros Stamatakis; David P Mindell; Joel Cracraft; Edward L Braun; Tandy Warnow; Wang Jun; M Thomas P Gilbert; Guojie Zhang
Journal:  Science       Date:  2014-12-12       Impact factor: 47.728

10.  The UCSC Genome Browser database: 2014 update.

Authors:  Donna Karolchik; Galt P Barber; Jonathan Casper; Hiram Clawson; Melissa S Cline; Mark Diekhans; Timothy R Dreszer; Pauline A Fujita; Luvina Guruvadoo; Maximilian Haeussler; Rachel A Harte; Steve Heitner; Angie S Hinrichs; Katrina Learned; Brian T Lee; Chin H Li; Brian J Raney; Brooke Rhead; Kate R Rosenbloom; Cricket A Sloan; Matthew L Speir; Ann S Zweig; David Haussler; Robert M Kuhn; W James Kent
Journal:  Nucleic Acids Res       Date:  2013-11-21       Impact factor: 16.971