FlyBase: genomes by the dozen.

Madeline A Crosby, Joshua L Goodman, Victor B Strelets, Peili Zhang, William M Gelbart.

Abstract

FlyBase (http://flybase.org/) is the primary database of genetic and genomic data for the insect family Drosophilidae. Historically, Drosophila melanogaster has been the most extensively studied species in this family, but recent determination of the genomic sequences of an additional 11 Drosophila species opens up new avenues of research for other Drosophila species. This extensive sequence resource, encompassing species with well-defined phylogenetic relationships, provides a model system for comparative genomic analyses. FlyBase has developed tools to facilitate access to and navigation through this invaluable new data collection.

Year:  2006        PMID: 17099233      PMCID: PMC1669768          DOI: 10.1093/nar/gkl827

Source DB:  PubMed          Journal:  Nucleic Acids Res        ISSN: 0305-1048            Impact factor:   16.971


A NEW LOOK TO FlyBase

Over the past 2 years, FlyBase has effected a complete migration and integration of its underlying databases into a PostgreSQL chado genome database (1). This has enabled a reimplementation from the ground up of the FlyBase public interface, with a complete redesign of the Web pages, queries and reports (Figure 1). So, if you do not recognize us, take a second look! Detailed descriptions of the new FlyBase website will appear elsewhere.
Figure 1

A FlyBase gene report. The new FlyBase design allows the user to choose which sections and subsections to view. In this example, the table showing classical alleles has been opened.


THE CHANGING CONTENT OF FlyBase

FlyBase is an integrated resource for a vast array of genetic and molecular data concerning the Drosophilidae, including interactive genomic maps, gene product descriptions, mutant allele phenotypes, genetic interactions, expression patterns, transgenic constructs and insertions of transgenic constructs, anatomy and images, and genetic stock collections (2). Data are captured from bulk data sources, by curation from the literature, and by annotation based on assessment of contributing evidence; data capture is organized around consistent attribution to primary sources. As far as possible, descriptive data are curated using controlled vocabularies (CV), including the Gene Ontology for molecular function, biological process and cellular component (3), the Sequence Ontology for sequence features (4) and an extensive CV for anatomical terms and developmental stages (available as part of the Open Biomedical Ontologies project). Although FlyBase has since its inception curated genetic and genomic information on the family Drosophilidae, it is only with the recent whole-genome shotgun (WGS) sequencing and assembly of 11 additional species that substantial amounts of non-melanogaster data have appeared in FlyBase. Indeed, it will be interesting to see how the availability of these WGS sequence assemblies will affect Drosophila research through the ability to perform genome-wide comparative analyses at the sequence, phenotypic and biological process levels.

THE DROSOPHILA GENOMES (EMPHASIS ON THE PLURAL)

The genome sequences of 12 species of Drosophila are now available. The species and their phylogeny are shown in the left-hand side of Figure 2. For the genome of the primary biological research species, Drosophila melanogaster, the euchromatic arms have now been finished to high quality by the BDGP (5). In the current release of the D.melanogaster genome assembly (Release 5; see Table 1), the arms include several megabases of centric heterochromatin as well as the entirety of the euchromatin. The heterochromatin, sequenced by the BDGP and the Drosophila Heterochromatin Genome Project (DHGP) (6), also includes several major scaffolds that are currently unattached to the arms. The fully annotated arms are available from FlyBase and GenBank; the DHGP-annotated heterochromatic scaffolds should be contributed to FlyBase and GenBank in late 2006.
Figure 2

The Muller element arm synteny table. The phylogenetic relationships of the 12 sequenced species are shown to the left, color-coded diagrammatic karyotypes and a table of syntenic relationships are shown to the right. There is not a simple one-to-one correspondence of arms to Muller elements for all of the species, due to fusion and inversion events relative to D.melanogaster in five of the species.

Table 1

Release 5 of the D.melanogaster genome assembly

Current annotation status    Arm    GenBank accession no.    Annotated genes
Annotated Release 5.1        X      AE014298.4               2332
Annotated Release 5.1        2L     AE014134.5               2756
Annotated Release 5.1        2R     AE013599.4               3028
Annotated Release 5.1        3L     AE014296.4               2808
Annotated Release 5.1        3R     AE014297.2               3553
Annotated Release 5.1        4      AE014135.3               91
Annotation in progress       Heterochromatic sequences not integrated onto arms
The other 11 species have all been sequenced in NHGRI-funded large-scale sequencing centers (Table 2), following the approval of three separate community-based white papers. The first white paper (7) proposed the sequencing of a second species, Drosophila pseudoobscura, to support the annotation of D.melanogaster (8). The second white paper (9) proposed the sequencing of several isolates of Drosophila simulans, a close relative of D.melanogaster, to understand the basis of variation within and between species, and the sequencing of a somewhat more distant member of the same species group, Drosophila yakuba, as an outgroup. The third white paper (10) proposed the sequencing of eight additional species. Six of these species (Drosophila ananassae, Drosophila erecta, Drosophila grimshawi, Drosophila mojavensis, Drosophila virilis and Drosophila willistoni) were proposed principally to provide additional branch length for comparative genomic analysis in support of the annotation of D.melanogaster, as well as for the study of gene and chromosome evolution on a whole-genome scale. The other two species, D.persimilis and D.sechellia, are sibling species of D.pseudoobscura and D.simulans, respectively; these were chosen because the sibling species pairs can form fertile F1 hybrids and have been used to study genetic variation that underlies speciation.
Table 2

Sequenced genomes of Drosophila species

Species            GenBank WGS accession no. (CA Freeze 1)    Assembly size (bp)    Large-scale sequencing center
D.melanogaster     See Table 1                                139 712 364           BDGP/Celera
D.ananassae        AAPP01000000                               230 993 012           Agencourt Biosciences
D.erecta           AAPQ01000000                               152 712 140           Agencourt Biosciences
D.grimshawi        AAPT01000000                               200 467 819           Agencourt Biosciences
D.mojavensis       AAPU01000000                               193 826 310           Agencourt Biosciences
D.persimilis       AAIZ00000000                               188 374 079           Broad Institute
D.pseudoobscura    AADE01000000                               148 799 920           Baylor HGSC
D.sechellia        AAKO01000000                               166 577 145           Broad Institute
D.simulans         (Pending)                                                        Washington University, GSC
D.virilis          AANI01000000                               206 026 697           Agencourt Biosciences
D.willistoni       AAQB01000000                               235 516 348           J. Craig Venter Institute
D.yakuba           AAEU02000000                               165 691 649           Washington University, GSC

REPRESENTATION OF THE DOZEN GENOMES IN FlyBase

A group called ‘Assembly, Annotation and Analysis’ (AAA) has been coordinating the community production and distribution of the relevant large datasets, the production of consensus annotation sets and the preparation of the initial reports of the results of these studies. By the end of 2006, it is expected that the major datasets will have been produced, publications submitted and data contributed to FlyBase and GenBank. For each species these data will include several independent homology-based and ab initio gene prediction sets, consensus mRNA and protein annotation sets, orthologies, gene family groupings, and syntenic relationships among the species, the latter extending the previously known large-scale syntenic conservations among the chromosome arms of the genus Drosophila (see Figure 2). The following discussion describes the major ways to interrogate and browse these genomes and their relationships to one another.

THE FlyBase BLAST TOOL: QUERIES ACROSS INSECT SPECIES

The FlyBase BLAST tool serves as a convenient entry point to data for the insect species for which genomic sequence data are available, including the 12 Drosophila species, mosquito (Anopheles and Aedes), silkworm, honey bee and Tribolium. The tool provides an array of options in an intuitive format (Figure 3). An extremely useful feature of the BLAST output is a set of links that go directly to a GBrowse view of the genomic region that corresponds to the BLAST hit.
Figure 3

The FlyBase BLAST tool. ‘BLAST Database’ selection options are shown in the ‘BLAST database availability key’ and include the whole-genome assemblies, annotated transposons, protein sequences or intergenic sequences, as well as GenBank and UniProt datasets. The species for which data are available are shown in a hierarchical tree; clicking on any node in the tree selects all descendants of that node. Not all types of data are available for each of the featured species: an availability code (colored letters) is shown after each species listing. Advanced BLAST options (out of view at bottom of page) allow customization of BLAST parameters and output format; an ON/OFF toggle is provided for the ‘Low Complexity Filter’; the ‘Expect’ threshold parameter may be adjusted at the top of the page. The BLAST output includes a ‘BLAST HIT on Genome Map’ or ‘Feature on Genome Map’ link shown at the top of an aligned segment; this allows immediate access to a GBrowse view of the genomic region that corresponds to the BLAST hit, with the aligned region indicated by a gray panel.
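The node-selection behavior described for the species tree (choosing a node selects all of its descendants) can be sketched as follows. This is a minimal illustration, not FlyBase's implementation: the names `TREE`, `find_node`, `collect_leaves` and `select` are hypothetical, and the tree is a simplified stand-in built only from the taxa named in the text, without the finer phylogenetic grouping shown on the actual page.

```python
# Simplified, hypothetical species tree: each key is a node name and each
# value is a dict of child nodes (an empty dict marks a leaf).
DROSOPHILA_SPECIES = [
    "D.melanogaster", "D.simulans", "D.sechellia", "D.yakuba",
    "D.erecta", "D.ananassae", "D.pseudoobscura", "D.persimilis",
    "D.willistoni", "D.mojavensis", "D.virilis", "D.grimshawi",
]

TREE = {
    "Insects": {
        "Drosophila": {name: {} for name in DROSOPHILA_SPECIES},
        "Mosquito": {"Anopheles": {}, "Aedes": {}},
        "Silkworm": {},
        "Honey bee": {},
        "Tribolium": {},
    }
}


def find_node(tree, target):
    """Return the children dict of the node named target, or None if absent."""
    for name, children in tree.items():
        if name == target:
            return children
        found = find_node(children, target)
        if found is not None:
            return found
    return None


def collect_leaves(name, children):
    """Return all leaf names below a node (the node itself if it is a leaf)."""
    if not children:
        return [name]
    leaves = []
    for child_name, grandchildren in children.items():
        leaves.extend(collect_leaves(child_name, grandchildren))
    return leaves


def select(tree, target):
    """Select every leaf that descends from the node named target."""
    children = find_node(tree, target)
    if children is None:
        raise KeyError(target)
    return collect_leaves(target, children)


mosquito_selection = select(TREE, "Mosquito")
all_selection = select(TREE, "Insects")
print(mosquito_selection)
print(len(all_selection))
```

Selecting an internal node such as "Drosophila" returns all 12 species at once, while selecting a leaf returns only that entry.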


THE GBrowse GENOME VIEWER: CUSTOMIZED VIEWS OF PREDICTIONS AND EVIDENCE

Interactive views of the data generated by the genomic sequencing projects are presented using a newly modified version of the GBrowse genome viewer (11). Entry to a specific genomic region may be accomplished by running a BLAST search first, as described above. The tool may also be accessed from the FlyBase home page or from the ‘Tools’ menu found in the top bar on all FlyBase reports. Once the species to be viewed is chosen and the region of interest specified, the data to be viewed can also be specified and its presentation customized (Figure 4). For the newly sequenced genomes, the default view shows alignments to D.melanogaster putative orthologs and the GLEANR consensus predictions. GC content, translation stops and additional prediction sets may be selected for viewing, and the view may be modified by zooming or scrolling or flipping. As more data become available, they will be incorporated into the GBrowse presentation. The sequence and selected datasets for the genomic extent being viewed may be downloaded as a decorated FASTA file, a GFF file or a table.
Figure 4

A GBrowse view. The region of the D.ananassae genome homologous to the D.melanogaster HP1c gene has been accessed via a link from the FlyBase BLAST output. The region of BLAST alignment is indicated by a gray panel. The default selection of data includes alignments of D.melanogaster putative orthologs and the GLEANR consensus predictions; in this view, additional gene prediction tracks have been selected. The presentation of the data options may be customized further by using a ‘Configure tracks’ option (out of view at bottom of page). The species being viewed may be changed at any time by choosing from the ‘Data Source’ menu. Once viewing the species of choice, a particular region can be specified using sequence coordinates or a ‘landmark’. For the newly sequenced species, the initial set of landmarks consists of the gene predictions, identified by their alphanumeric designations, and the D.melanogaster putative orthologs. The options for downloading the data shown are listed in the menu ‘Reports and Analysis’; the FASTA output may be customized by using the ‘Configure’ option.
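As an illustration of what a GC content track summarizes, the sketch below computes GC fraction in sliding windows over a sequence such as one downloaded for a viewed region. How GBrowse itself computes the track is not described here, so the window size, step, and the choice to ignore ambiguous bases are assumptions, and the function names `gc_content` and `windowed_gc` are hypothetical.

```python
def gc_content(seq):
    """Return the fraction of G and C among unambiguous bases (A, C, G, T)."""
    seq = seq.upper()
    unambiguous = sum(seq.count(base) for base in "ACGT")
    if unambiguous == 0:
        return 0.0  # assumption: report 0.0 when no unambiguous base is present
    return (seq.count("G") + seq.count("C")) / unambiguous


def windowed_gc(seq, window, step):
    """Return (start, gc_fraction) pairs for each full window along seq.

    start is a 0-based offset into seq; trailing bases that do not fill
    a whole window are skipped.
    """
    results = []
    for start in range(0, len(seq) - window + 1, step):
        results.append((start, gc_content(seq[start:start + window])))
    return results


# Made-up sequence for demonstration only.
example_seq = "AAGGCCTT"
profile = windowed_gc(example_seq, window=4, step=2)
for start, fraction in profile:
    print(start, fraction)
```

A smaller window gives a noisier but more local profile; a larger one smooths over short GC-rich or AT-rich stretches.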


BULK DATA DOWNLOADS

Data files for all classes of data in FlyBase are available for download by FTP in several formats, including GFF3 for sequence data. Links to the bulk data repositories may be accessed from the ‘Files’ menu, ‘Precomputed files’ option, at the top of all FlyBase pages; from there, the ‘Genomes: Annotation and Sequence’ section provides access to genome data for each (or all) of the sequenced species. In addition, bulk queries can be performed and downloaded via the ‘QueryBuilder’ tool, accessed from the top page or the ‘Tools’ menu.
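Once a GFF3 file has been downloaded, per-arm summaries such as the gene counts in Table 1 can be derived with a few lines of code. The sketch below relies only on the standard GFF3 layout of nine tab-separated columns (sequence ID, source, type, start, end, score, strand, phase, attributes); the function name `count_features` is hypothetical and the records are made up for demonstration, not taken from a FlyBase file.

```python
from collections import Counter


def count_features(gff3_lines, feature_type="gene"):
    """Count features of one type per sequence ID in an iterable of GFF3 lines."""
    counts = Counter()
    for line in gff3_lines:
        line = line.rstrip("\n")
        if line.startswith("##FASTA"):
            break  # an embedded sequence section follows; no more features
        if not line or line.startswith("#"):
            continue  # skip blank lines, directives and comments
        columns = line.split("\t")
        if len(columns) != 9:
            continue  # not a well-formed feature line
        seqid, feature = columns[0], columns[2]
        if feature == feature_type:
            counts[seqid] += 1
    return counts


# Made-up records for demonstration only.
example_lines = [
    "##gff-version 3",
    "2L\texample\tgene\t1000\t5000\t.\t+\t.\tID=gene0001",
    "2L\texample\tmRNA\t1000\t5000\t.\t+\t.\tID=mrna0001;Parent=gene0001",
    "2L\texample\tgene\t7000\t9000\t.\t-\t.\tID=gene0002",
    "X\texample\tgene\t200\t800\t.\t+\t.\tID=gene0003",
]

gene_counts = count_features(example_lines)
for seqid, n in sorted(gene_counts.items()):
    print(seqid, n)
```

For a real file, the same function can be given an open file handle, for example `count_features(open(path))`, where `path` points at the downloaded GFF3 file.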

MORE ON THE SPECIES OF FAMILY DROSOPHILIDAE

From the ‘Species’ menu on the top bar of the FlyBase home page and all report pages, additional information on the Drosophilidae may be accessed. At present there are four items to choose from: ‘Phylogeny’ links to an index of species, each linked to its position in the Drosophilidae phylogenetic tree; ‘Synteny table’ goes to the presentation of syntenic relationships of the chromosomal arms of the 12 sequenced species shown in Figure 2; ‘Drosophilidae’ links to a compilation of color images of species within this family, originally published by the University of Texas at Austin School of Biological Sciences; and ‘Abbreviations’ accesses a list of the four-letter genus–species codes for all species found in FlyBase. The ‘Species’ resources will be updated periodically, as appropriate community resources and data become available. FlyBase continues to curate and present traditional genetic data for all the Drosophilid species. Now, availability and integration of genomic data for 12 well-characterized species provide a powerful resource that will allow the research community to take full advantage of the family Drosophilidae as a model for comparative genomic and phylogenetic analyses.
  7 in total

1.  The Gene Ontology (GO) database and informatics resource.

Authors:  M A Harris; J Clark; A Ireland; J Lomax; M Ashburner; R Foulger; K Eilbeck; S Lewis; B Marshall; C Mungall; J Richter; G M Rubin; J A Blake; C Bult; M Dolan; H Drabkin; J T Eppig; D P Hill; L Ni; M Ringwald; R Balakrishnan; J M Cherry; K R Christie; M C Costanzo; S S Dwight; S Engel; D G Fisk; J E Hirschman; E L Hong; R S Nash; A Sethuraman; C L Theesfeld; D Botstein; K Dolinski; B Feierbach; T Berardini; S Mundodi; S Y Rhee; R Apweiler; D Barrell; E Camon; E Dimmer; V Lee; R Chisholm; P Gaudet; W Kibbe; R Kishore; E M Schwarz; P Sternberg; M Gwinn; L Hannick; J Wortman; M Berriman; V Wood; N de la Cruz; P Tonellato; P Jaiswal; T Seigfried; R White
Journal:  Nucleic Acids Res       Date:  2004-01-01       Impact factor: 16.971

2.  The generic genome browser: a building block for a model organism system database.

Authors:  Lincoln D Stein; Christopher Mungall; ShengQiang Shu; Michael Caudy; Marco Mangone; Allen Day; Elizabeth Nickerson; Jason E Stajich; Todd W Harris; Adrian Arva; Suzanna Lewis
Journal:  Genome Res       Date:  2002-10       Impact factor: 9.043

3.  Comparative genome sequencing of Drosophila pseudoobscura: chromosomal, gene, and cis-element evolution.

Authors:  Stephen Richards; Yue Liu; Brian R Bettencourt; Pavel Hradecky; Stan Letovsky; Rasmus Nielsen; Kevin Thornton; Melissa J Hubisz; Rui Chen; Richard P Meisel; Olivier Couronne; Sujun Hua; Mark A Smith; Peili Zhang; Jing Liu; Harmen J Bussemaker; Marinus F van Batenburg; Sally L Howells; Steven E Scherer; Erica Sodergren; Beverly B Matthews; Madeline A Crosby; Andrew J Schroeder; Daniel Ortiz-Barrientos; Catharine M Rives; Michael L Metzker; Donna M Muzny; Graham Scott; David Steffen; David A Wheeler; Kim C Worley; Paul Havlak; K James Durbin; Amy Egan; Rachel Gill; Jennifer Hume; Margaret B Morgan; George Miner; Cerissa Hamilton; Yanmei Huang; Lenée Waldron; Daniel Verduzco; Kerstin P Clerc-Blankenburg; Inna Dubchak; Mohamed A F Noor; Wyatt Anderson; Kevin P White; Andrew G Clark; Stephen W Schaeffer; William Gelbart; George M Weinstock; Richard A Gibbs
Journal:  Genome Res       Date:  2005-01       Impact factor: 9.043

4.  The Sequence Ontology: a tool for the unification of genome annotations.

Authors:  Karen Eilbeck; Suzanna E Lewis; Christopher J Mungall; Mark Yandell; Lincoln Stein; Richard Durbin; Michael Ashburner
Journal:  Genome Biol       Date:  2005-04-29       Impact factor: 13.583

5.  FlyBase: anatomical data, images and queries.

Authors:  Gary Grumbling; Victor Strelets
Journal:  Nucleic Acids Res       Date:  2006-01-01       Impact factor: 16.971

6.  Heterochromatic sequences in a Drosophila whole-genome shotgun assembly.

Authors:  Roger A Hoskins; Christopher D Smith; Joseph W Carlson; A Bernardo Carvalho; Aaron Halpern; Joshua S Kaminker; Cameron Kennedy; Chris J Mungall; Beth A Sullivan; Granger G Sutton; Jiro C Yasuhara; Barbara T Wakimoto; Eugene W Myers; Susan E Celniker; Gerald M Rubin; Gary H Karpen
Journal:  Genome Biol       Date:  2002-12-31       Impact factor: 13.583

7.  Finishing a whole-genome shotgun: release 3 of the Drosophila melanogaster euchromatic genome sequence.

Authors:  Susan E Celniker; David A Wheeler; Brent Kronmiller; Joseph W Carlson; Aaron Halpern; Sandeep Patel; Mark Adams; Mark Champe; Shannon P Dugan; Erwin Frise; Ann Hodgson; Reed A George; Roger A Hoskins; Todd Laverty; Donna M Muzny; Catherine R Nelson; Joanne M Pacleb; Soo Park; Barret D Pfeiffer; Stephen Richards; Erica J Sodergren; Robert Svirskas; Paul E Tabor; Kenneth Wan; Mark Stapleton; Granger G Sutton; Craig Venter; George Weinstock; Steven E Scherer; Eugene W Myers; Richard A Gibbs; Gerald M Rubin
Journal:  Genome Biol       Date:  2002-12-23       Impact factor: 13.583
