
FlyBase: genes and gene models.

Rachel A Drysdale, Madeline A Crosby.

Abstract

FlyBase (http://flybase.org) is the primary repository of genetic and molecular data of the insect family Drosophilidae. For the most extensively studied species, Drosophila melanogaster, a wide range of data are presented in integrated formats. Data types include mutant phenotypes, molecular characterization of mutant alleles and aberrations, cytological maps, wild-type expression patterns, anatomical images, transgenic constructs and insertions, sequence-level gene models and molecular classification of gene product functions. There is a growing body of data for other Drosophila species; this is expected to increase dramatically over the next year, with the completion of draft-quality genomic sequences of an additional 11 Drosophila species.

Year:  2005        PMID: 15608223      PMCID: PMC540000          DOI: 10.1093/nar/gki046

Source DB:  PubMed          Journal:  Nucleic Acids Res        ISSN: 0305-1048            Impact factor:   16.971


SCOPE OF FLYBASE

FlyBase includes information about the structure and function of genes and gene products of the Drosophila genome (1). Although the primary species represented is that workhorse of classic genetics, Drosophila melanogaster, the database currently includes records for genes of more than 400 other Drosophila species, and will house genomic information for the 11 additional species included in the Drosophila comparative genomics sequencing effort. Phenotypic and genetic interaction information about mutants, and wild-type gene and enhancer-trap expression patterns are linked to strains in the Drosophila Stock Centers, from which extensive collections of mutant and wild-type strains are available. Mutant phenotypes (2) and gene expression patterns are described using controlled vocabularies, including anatomical terms linked to illustrations in the Anatomy section of FlyBase. Data concerning chromosome aberrations, natural transposons, genetically engineered constructs and transgene insertions are presented with hyperlinks to affected genes and resulting mutant alleles. An overview of the classes of data found in FlyBase may be seen on the homepage (http://flybase.org; for further description see Supplementary Figure 1). Features recently added to FlyBase include an External Database Links section in Gene reports, expanded Batch query options and an extensive Drosophila Resources compilation (http://flybase.bio.indiana.edu/allied-data/resources.html), which provides a comprehensive list of links to both network resources (e.g. sequence analysis tools) and material resources (e.g. clone and microarray suppliers) external to the FlyBase project. Data are compiled by curators and annotators from sources including the scientific literature, large-scale genome sequencing projects and online resources such as the GenBank (NCBI)/EMBL/DDBJ nucleotide sequence databases and the UniProt (3) protein database. 
FlyBase curators work with curators of other databases, such as the Gene Ontology (GO) consortium (4), to ensure consistency of annotation across databases. The D.melanogaster genome annotation, Release 4.0 at the time of writing (5–7), has been enhanced by hand curation of all gene models (8,9), including integration of error reports submitted by the user community. Table 1 shows a snapshot of FlyBase content as of September 2004. The remainder of this paper will focus on genes and gene models in FlyBase.
Table 1.

Number of data records/statements in FlyBase: September 13, 2004

Data class                                                                   Records
Gene records (D.melanogaster)                                                 28 015
Gene records (species other than D.melanogaster)                              10 766
Genes with genome annotations (D.melanogaster)                                14 872
Genes with genome annotations, protein coding (D.melanogaster)                14 367
Genes with GO annotations (D.melanogaster)                                      9643
Genes with GO annotations (other species)                                        143
Mutant alleles (D.melanogaster)                                               66 279
Mutant alleles (other species)                                                  5058
Phenotypic data controlled vocabulary statements (D.melanogaster)            142 756
Genetic interaction controlled vocabulary statements (D.melanogaster)         76 580
Natural transposons (D.melanogaster)                                             145
Natural transposon insertions mapped to euchromatic genome (D.melanogaster)     1571
Genetically engineered transposons (D.melanogaster)                           14 075
DNA sequence accession records curated to genes (D.melanogaster)              77 629
DNA sequence accession records curated to genes (other species)               17 912
References                                                                   128 602
Stock Center stocks                                                           27 200
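
The counts in Table 1 support a few quick proportions, for example the share of annotated D.melanogaster genes that are protein coding. The short Python sketch below is illustrative only: the dictionary keys and the helper name are hypothetical labels chosen here, and the values are copied from Table 1.

```python
# Counts copied from Table 1 (September 13, 2004); key names are hypothetical.
TABLE_1 = {
    "gene_records_dmel": 28015,
    "annotated_genes_dmel": 14872,
    "annotated_protein_coding_dmel": 14367,
    "genes_with_go_dmel": 9643,
}


def percent(part: int, whole: int) -> float:
    """Return part/whole as a percentage, rounded to one decimal place."""
    return round(100.0 * part / whole, 1)


# Share of annotated genes that are protein coding.
protein_coding_share = percent(
    TABLE_1["annotated_protein_coding_dmel"], TABLE_1["annotated_genes_dmel"]
)

# Share of all D.melanogaster gene records that carry GO annotations.
go_share = percent(TABLE_1["genes_with_go_dmel"], TABLE_1["gene_records_dmel"])

# Annotated genes that are not counted as protein coding.
other_annotated = (
    TABLE_1["annotated_genes_dmel"] - TABLE_1["annotated_protein_coding_dmel"]
)

print(protein_coding_share, go_share, other_annotated)
```

On these figures, roughly 97% of annotated genes are protein coding, and about a third of all D.melanogaster gene records carry GO annotations.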

THE GENE REPORT

FlyBase provides several formats of gene report which differ by degree of completeness of data reported within the initial web page, the default being the Synopsis format. The Synopsis report for the maleless (mle) gene is shown in Figure 1. The Synopsis report displays commonly accessed gene information fields, an Available reports side panel to allow easy access to other report formats, and a text Summary generated automatically from the underlying data. The Abridged report format displays a wider range of information in the initial display than the Synopsis format, but collapses many of the details, such as individual Allele reports, into links in tables. The Full report format is the most comprehensive initial display.
Figure 1

FlyBase gene report, highlighting different format and subsection report options, automated gene summaries and the recently added External Database section.

FlyBase also offers Subsection reports selected by data type, for example, alleles of that gene, references that discuss the gene and sequences in the DNA and protein data banks that correspond to the gene. Links to these and other subreports are listed in the Subsections panel of the Synopsis report. Recent additions include the Gene Ontology subreport, the Genetic Interactions subreport and the Constructs & Insertions subreport. Gene reports now include an External Database Links section (http://flybase.bio.indiana.edu/allied-data/extdb/ExternalLinks.htm). This section houses links to databases external to FlyBase, to ease access to information about the gene that falls outside the scope of FlyBase data curation. The databases currently listed in this section include the BDGP In Situ Gene Expression Database (10), Drosophila melanogaster Exon Database (http://proline.bic.nus.edu.sg/dedb), PANTHER Protein Classification (11,12), Fly GRID Interaction Data (13), Hybrigenics PIMRider interactions (14), Interactive Fly (15), Yale Developmental Gene Expression (16) and NCBI's Gene Expression Omnibus (17). Not all genes have an entry in all these databases. The number of external links in place via this facility exceeds 76 500.

THE GENE ANNOTATION REPORT

Detailed information about the annotated transcripts and other sequence-level data for a particular gene is found in the Annotation Report. This may be accessed from the Gene Report page via the link ‘Genome Annotation’ or by a direct query using the ‘Gene Annotations’ option in the homepage search box. The Annotation Query Form (http://flybase.bio.indiana.edu/annot/fbannquery.hform) allows queries based on location, gene class, peptide length, mapped expressed sequence tags (ESTs) or cDNAs, GO terms, or terms within annotation comments. An example of an Annotation Report is shown in Figure 2. Notable features include a graphic representation of the transcript structures aligned with supporting evidence, information about each transcript and protein product, links to sequence data and information about other data mapped experimentally to the genomic sequence, such as point mutations, aberration breakpoints, rescue fragments and experimentally defined regulatory regions. Accompanying comments describe any unusual characteristics of the gene model, such as atypical splice donor or acceptor, non-AUG translation start, or dicistronic transcript. At the top of the report is a link to the peptide analysis that includes a graphic display of homologous proteins and known InterPro (3) protein motifs.
Figure 2

FlyBase annotation report. The panels show sequential extracts from the annotation report. At the top there are links to a Cytogenetic map, the GenBank scaffold sequence accession, and a Peptides ‘view analysis’ page showing alignments to related proteins and protein domain predictions. The ‘Sequence’ option allows the user to retrieve sequence for the gene region, transcripts, UTRs or proteins in a choice of formats. The Gene Annotation and Evidence panel shows two alternative transcripts and supporting EST, cDNA and protein (blastx) data. Note that the mle-RB transcript is based on data curated from the literature (not represented graphically), and that cDNA data supports an additional alternative transcript (to be added in the next annotation update). Details about the annotated transcripts and protein products are presented, and an ‘Other Features’ section describes mutational lesions, rescue fragments and other entities mapped onto the sequence level. These features appear on the GBrowse map, which may be accessed from a link at the top of the page.
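
The sequence classes offered for retrieval (transcript, UTRs, protein-coding portion) follow from the exon structure of a gene model together with its translation start and stop. The Python sketch below illustrates that partition; it is not FlyBase's own code. It assumes forward-strand, half-open coordinates, and the function name and the example coordinates are hypothetical.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # half-open [start, end) on the forward strand


def split_transcript(
    exons: List[Interval], cds_start: int, cds_end: int
) -> Tuple[List[Interval], List[Interval], List[Interval]]:
    """Partition exons into 5'-UTR, coding and 3'-UTR segments.

    Assumes a forward-strand transcript whose coding region spans
    [cds_start, cds_end) in the same coordinate system as the exons.
    """
    five_utr: List[Interval] = []
    coding: List[Interval] = []
    three_utr: List[Interval] = []
    for start, end in sorted(exons):
        # Portion of the exon upstream of the translation start.
        if start < cds_start:
            five_utr.append((start, min(end, cds_start)))
        # Portion of the exon inside the coding region, if any.
        lo, hi = max(start, cds_start), min(end, cds_end)
        if lo < hi:
            coding.append((lo, hi))
        # Portion of the exon downstream of the translation stop.
        if end > cds_end:
            three_utr.append((max(start, cds_end), end))
    return five_utr, coding, three_utr


# Hypothetical three-exon transcript with translation from 150 to 560.
example_exons = [(100, 200), (300, 400), (500, 650)]
five, cds, three = split_transcript(example_exons, 150, 560)
cds_length = sum(end - start for start, end in cds)
print(five, cds, three, cds_length)
```

For a gene on the reverse strand the roles of the two UTR lists would be swapped, which this sketch does not handle.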

GENE REGION MAPS: GBROWSE AND APOLLO

A molecular map of the region surrounding a gene may be accessed through the Gene Region Map (GBrowse) link on either the Gene Report page or the Gene Annotation Report. GBrowse (18) is a configurable genome viewer that allows the presentation of both molecularly mapped and cytologically mapped data (http://www.gmod.org/ggb/gbrowse.shtml; see Supplementary Figure 2). Annotations or larger genomic regions may also be viewed using the interactive viewing and editing tool, Apollo (19). Apollo is available for Windows, MacOSX or Unix systems and may be downloaded from the Apollo site (http://www.fruitfly.org/annot/apollo).

BULK DATA DOWNLOADS

FlyBase offers a variety of routes for bulk data retrieval; a recent addition is the Batch Download Reports by ID facility shown in Figure 3. This tool allows the user to query the genes dataset for many records at once, by valid symbol or by FlyBase identification number. Users can select the output type they wish to retrieve (HTML/Text, Spreadsheet or Database format). For HTML/Text outputs, the user can choose Report Content (from Synopsis, Abridged, Full, Summary, Alleles, Sequences, Reviews, References). For HTML/Text or Spreadsheet outputs, it is possible to filter output by field, using the ‘Select fields’ function. A related tool, Batch Download Sequences by ID, allows querying for sequences for many genes simultaneously. The sequence types that can be retrieved are Gene Region, Transcript, Translation, 3′-untranslated region (3′-UTR) and 5′-UTR. Both Batch Download forms can be accessed from the Genes data directory or from the Genome Annotation and Sequences page.
Figure 3

The FlyBase ‘Batch Download Reports by ID’ tool. In this example, seven genes are the subject of the query (listed in the ‘Enter List of Ids’ box), the user has selected ‘Document hypertext’ as the output format, and is in the process of selecting which data fields to retrieve.
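
Because the batch tool accepts either gene symbols or FlyBase identification numbers, it can help to tidy a mixed list before pasting it into the form. The Python sketch below is one way to do so and is not part of FlyBase. It assumes that gene identifiers have the form FBgn followed by seven digits; the function name is hypothetical and the identifiers in the example are placeholders.

```python
import re
from typing import List, Tuple

# Assumption: FlyBase gene identifiers look like "FBgn" plus seven digits.
FBGN_PATTERN = re.compile(r"^FBgn\d{7}$")


def prepare_batch_query(raw: str) -> Tuple[List[str], List[str]]:
    """Split a pasted list into (identifiers, symbols), dropping duplicates.

    Tokens are separated by whitespace or commas; input order is preserved.
    """
    ids: List[str] = []
    symbols: List[str] = []
    seen = set()
    for token in re.split(r"[\s,]+", raw.strip()):
        if not token or token in seen:
            continue
        seen.add(token)
        if FBGN_PATTERN.match(token):
            ids.append(token)
        else:
            symbols.append(token)
    return ids, symbols


# Placeholder identifiers mixed with gene symbols, including a duplicate.
pasted = "mle, FBgn0000001\nmle  w FBgn0000002"
id_list, symbol_list = prepare_batch_query(pasted)
print(id_list, symbol_list)
```

Keeping identifiers and symbols apart makes it easier to spot a mistyped identifier, which would otherwise be treated as an unknown symbol.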

In addition to bulk queries performed over the web interface, FlyBase data files are available for download by ftp from several of our mirror sites, in text, acode or XML format. Protocols are described in the FlyBase Reference Manual section D (http://flybase.org/docs/lk/refman/refman-D.html).

D.MELANOGASTER GENOME RELEASES

The genomic sequence of D.melanogaster continues to be refined and expanded (http://flybase.bio.indiana.edu/annot/release3.html); the Berkeley Drosophila Genome Project has made public Release 4.0 of the genome sequence (http://www.fruitfly.org/annot/release4.html), and is currently finishing Release 5.0. FlyBase makes regular corrections and additions to the gene model annotations based on new data submissions to the sequence databases, user error reports and literature curation. We anticipate that comparative genomic analyses will play an increasing role in annotation assessment and improvement. Annotation updates are indicated by decimal numbers appended to the release number: e.g. Release 4.0 and Release 4.1. The heterochromatic portion of the genome is being analyzed by members of the Drosophila Heterochromatin Genome Project (http://www.dhgp.org); the heterochromatin annotations are accessible through FlyBase.
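
The release numbering described above can be handled programmatically when comparing data sets. The Python sketch below is an illustration under one stated assumption: the integer part of the label identifies the genome sequence release and the decimal part counts annotation updates. The class and function names are hypothetical.

```python
import re
from typing import NamedTuple


class Release(NamedTuple):
    sequence: int    # genome sequence release, e.g. 4 in "Release 4.1"
    annotation: int  # annotation update within that release, e.g. 1


def parse_release(label: str) -> Release:
    """Parse labels such as 'Release 4.1' or '4.1' into a Release tuple."""
    match = re.fullmatch(r"\s*(?:Release\s+)?(\d+)\.(\d+)\s*", label)
    if match is None:
        raise ValueError(f"unrecognised release label: {label!r}")
    return Release(int(match.group(1)), int(match.group(2)))


def describe_change(old: str, new: str) -> str:
    """Say whether two labels differ by sequence release or annotation update."""
    a, b = parse_release(old), parse_release(new)
    if a.sequence != b.sequence:
        return "new genome sequence release"
    if a.annotation != b.annotation:
        return "annotation update"
    return "no change"


print(describe_change("Release 4.0", "Release 4.1"))
print(describe_change("Release 3.2", "Release 4.0"))
```

Parsing the two parts as integers rather than as one decimal number keeps the ordering correct if an update number reaches two digits: a label such as 4.10 then sorts after 4.2.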

ADDITIONAL DROSOPHILA GENOMES

The National Human Genome Research Institute (NHGRI) has recognized the importance of comparative genomic analysis for the annotation of D.melanogaster and for understanding how genomes evolved. Towards this end, the major NHGRI-funded sequencing centers are sequencing 11 additional species of Drosophila (pseudoobscura, yakuba, simulans, virilis, ananassae, erecta, willistoni, grimshawi, mojavensis, persimilis and sechellia; status of projects reported at http://genome.gov/page.cfm?pageID=10002154). The genome sequences, annotations, syntenic relationships and other data from these genome projects will be incorporated into FlyBase, consistent with FlyBase's long-term commitment to maintaining genomic and genetic data on the family Drosophilidae.

THE CHADO DATABASE SCHEMA

FlyBase has been operating since 1992 and is now in the process of developing and populating a new database structure, an integrated implementation of the chado generic genome database schema (http://www.gmod.org/schema/). The initial design of the chado schema was undertaken by FlyBase developers at Harvard and Berkeley to fully integrate the finished D.melanogaster genome sequence and annotation with the vast body of Drosophila genetic and phenotypic data produced over the last 100 years. The chado schema is an open software project and is being developed in cooperation with the GMOD initiative (http://www.gmod.org).
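
One way to picture a generic, feature-based genome schema is a small set of tables in which genes and transcripts are all stored as typed features and linked by typed relationships. The Python sketch below builds such a toy database in memory with sqlite3. It is loosely modelled on this idea and is not the actual chado table definitions: the table layout, column names, the relationship term and the transcript name mle-RA are simplifying assumptions made here for illustration.

```python
import sqlite3

# Toy schema for illustration only; NOT the real chado definitions.
SCHEMA = """
CREATE TABLE cvterm (
    cvterm_id INTEGER PRIMARY KEY,
    name      TEXT NOT NULL UNIQUE
);
CREATE TABLE feature (
    feature_id INTEGER PRIMARY KEY,
    uniquename TEXT NOT NULL UNIQUE,
    type_id    INTEGER NOT NULL REFERENCES cvterm (cvterm_id)
);
CREATE TABLE feature_relationship (
    subject_id INTEGER NOT NULL REFERENCES feature (feature_id),
    object_id  INTEGER NOT NULL REFERENCES feature (feature_id),
    type_id    INTEGER NOT NULL REFERENCES cvterm (cvterm_id)
);
"""


def build_demo() -> sqlite3.Connection:
    """Create the toy database with one gene and two transcripts."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany(
        "INSERT INTO cvterm VALUES (?, ?)",
        [(1, "gene"), (2, "mRNA"), (3, "part_of")],
    )
    conn.executemany(
        "INSERT INTO feature VALUES (?, ?, ?)",
        [(1, "mle", 1), (2, "mle-RA", 2), (3, "mle-RB", 2)],
    )
    # Each transcript (subject) is part_of the gene (object).
    conn.executemany(
        "INSERT INTO feature_relationship VALUES (?, ?, ?)",
        [(2, 1, 3), (3, 1, 3)],
    )
    return conn


def transcripts_of(conn: sqlite3.Connection, gene_name: str) -> list:
    """Return names of mRNA features that are part_of the named gene."""
    rows = conn.execute(
        """
        SELECT t.uniquename
        FROM feature AS g
        JOIN feature_relationship AS r ON r.object_id = g.feature_id
        JOIN cvterm AS rel ON rel.cvterm_id = r.type_id AND rel.name = 'part_of'
        JOIN feature AS t ON t.feature_id = r.subject_id
        JOIN cvterm AS ty ON ty.cvterm_id = t.type_id AND ty.name = 'mRNA'
        WHERE g.uniquename = ?
        ORDER BY t.uniquename
        """,
        (gene_name,),
    ).fetchall()
    return [row[0] for row in rows]


demo = build_demo()
mle_transcripts = transcripts_of(demo, "mle")
print(mle_transcripts)
```

Storing feature types and relationship types as rows in a vocabulary table, rather than as separate tables per data class, is what lets new kinds of data be added without changing the table structure.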

REFERENCING FLYBASE

We suggest FlyBase be referenced in publications by citing this publication and the FlyBase web address (http://flybase.org).

SUPPLEMENTARY MATERIAL

Supplementary Material is available at NAR Online.
  19 in total

1.  UniProt: the Universal Protein knowledgebase.

Authors:  Rolf Apweiler; Amos Bairoch; Cathy H Wu; Winona C Barker; Brigitte Boeckmann; Serenella Ferro; Elisabeth Gasteiger; Hongzhan Huang; Rodrigo Lopez; Michele Magrane; Maria J Martin; Darren A Natale; Claire O'Donovan; Nicole Redaschi; Lai-Su L Yeh
Journal:  Nucleic Acids Res       Date:  2004-01-01       Impact factor: 16.971

2.  Assessment of genome-wide protein function classification for Drosophila melanogaster.

Authors:  Huaiyu Mi; Jody Vandergriff; Michael Campbell; Apurva Narechania; William Majoros; Suzanna Lewis; Paul D Thomas; Michael Ashburner
Journal:  Genome Res       Date:  2003-09       Impact factor: 9.043

3.  PANTHER: a library of protein families and subfamilies indexed by function.

Authors:  Paul D Thomas; Michael J Campbell; Anish Kejariwal; Huaiyu Mi; Brian Karlak; Robin Daverman; Karen Diemer; Anushya Muruganujan; Apurva Narechania
Journal:  Genome Res       Date:  2003-09       Impact factor: 9.043

4.  The FlyBase database of the Drosophila genome projects and community literature.

Authors: 
Journal:  Nucleic Acids Res       Date:  2003-01-01       Impact factor: 16.971

5.  The GRID: the General Repository for Interaction Datasets.

Authors:  Bobby-Joe Breitkreutz; Chris Stark; Mike Tyers
Journal:  Genome Biol       Date:  2003-02-27       Impact factor: 13.583

Review 6.  Annotation of the Drosophila melanogaster euchromatic genome: a systematic review.

Authors:  Sima Misra; Madeline A Crosby; Christopher J Mungall; Beverley B Matthews; Kathryn S Campbell; Pavel Hradecky; Yanmei Huang; Joshua S Kaminker; Gillian H Millburn; Simon E Prochnik; Christopher D Smith; Jonathan L Tupy; Eleanor J Whitfied; Leyla Bayraktaroglu; Benjamin P Berman; Brian R Bettencourt; Susan E Celniker; Aubrey D N J de Grey; Rachel A Drysdale; Nomi L Harris; John Richter; Susan Russo; Andrew J Schroeder; Sheng Qiang Shu; Mark Stapleton; Chihiro Yamada; Michael Ashburner; William M Gelbart; Gerald M Rubin; Suzanna E Lewis
Journal:  Genome Biol       Date:  2002-12-31       Impact factor: 13.583

7.  Heterochromatic sequences in a Drosophila whole-genome shotgun assembly.

Authors:  Roger A Hoskins; Christopher D Smith; Joseph W Carlson; A Bernardo Carvalho; Aaron Halpern; Joshua S Kaminker; Cameron Kennedy; Chris J Mungall; Beth A Sullivan; Granger G Sutton; Jiro C Yasuhara; Barbara T Wakimoto; Eugene W Myers; Susan E Celniker; Gerald M Rubin; Gary H Karpen
Journal:  Genome Biol       Date:  2002-12-31       Impact factor: 13.583

8.  The transposable elements of the Drosophila melanogaster euchromatin: a genomics perspective.

Authors:  Joshua S Kaminker; Casey M Bergman; Brent Kronmiller; Joseph Carlson; Robert Svirskas; Sandeep Patel; Erwin Frise; David A Wheeler; Suzanna E Lewis; Gerald M Rubin; Michael Ashburner; Susan E Celniker
Journal:  Genome Biol       Date:  2002-12-23       Impact factor: 13.583

9.  Finishing a whole-genome shotgun: release 3 of the Drosophila melanogaster euchromatic genome sequence.

Authors:  Susan E Celniker; David A Wheeler; Brent Kronmiller; Joseph W Carlson; Aaron Halpern; Sandeep Patel; Mark Adams; Mark Champe; Shannon P Dugan; Erwin Frise; Ann Hodgson; Reed A George; Roger A Hoskins; Todd Laverty; Donna M Muzny; Catherine R Nelson; Joanne M Pacleb; Soo Park; Barret D Pfeiffer; Stephen Richards; Erica J Sodergren; Robert Svirskas; Paul E Tabor; Kenneth Wan; Mark Stapleton; Granger G Sutton; Craig Venter; George Weinstock; Steven E Scherer; Eugene W Myers; Richard A Gibbs; Gerald M Rubin
Journal:  Genome Biol       Date:  2002-12-23       Impact factor: 13.583

10.  Systematic determination of patterns of gene expression during Drosophila embryogenesis.

Authors:  Pavel Tomancak; Amy Beaton; Richard Weiszmann; Elaine Kwan; ShengQiang Shu; Suzanna E Lewis; Stephen Richards; Michael Ashburner; Volker Hartenstein; Susan E Celniker; Gerald M Rubin
Journal:  Genome Biol       Date:  2002-12-23       Impact factor: 13.583
