Fungal BLAST and Model Organism BLASTP Best Hits: new comparison resources at the Saccharomyces Genome Database (SGD).

Rama Balakrishnan, Karen R Christie, Maria C Costanzo, Kara Dolinski, Selina S Dwight, Stacia R Engel, Dianna G Fisk, Jodi E Hirschman, Eurie L Hong, Robert Nash, Rose Oughtred, Marek Skrzypek, Chandra L Theesfeld, Gail Binkley, Qing Dong, Christopher Lane, Anand Sethuraman, Shuai Weng, David Botstein, J Michael Cherry.

Abstract

The Saccharomyces Genome Database (SGD; http://www.yeastgenome.org/) is a scientific database of gene, protein and genomic information for the yeast Saccharomyces cerevisiae. SGD has recently developed two new resources that facilitate nucleotide and protein sequence comparisons between S.cerevisiae and other organisms. The Fungal BLAST tool provides directed searches against all fungal nucleotide and protein sequences available from GenBank, divided into categories according to organism, status of completeness and annotation, and source. The Model Organism BLASTP Best Hits resource displays, for each S.cerevisiae protein, the single most similar protein from several model organisms and presents links to the database pages of those proteins, facilitating access to curated information about potential orthologs of yeast proteins.

Year: 2005 PMID: 15608219 PMCID: PMC539977 DOI: 10.1093/nar/gki023

Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971

INTRODUCTION

The Saccharomyces Genome Database (SGD) collects and organizes biological information about genes and proteins of Saccharomyces cerevisiae, and presents this information on individual Locus Pages for each yeast gene (1,2). In addition to assembling a detailed library of information about S.cerevisiae, we continually strive to develop tools and resources that allow users to identify connections between S.cerevisiae genes and proteins and those from different species. These connections may help researchers studying other organisms to glean knowledge from more extensively studied S.cerevisiae genes, or may enhance the study of S.cerevisiae genes with data from other organisms. These resources include the Gene Ontology (3) as well as comparison resources such as PSI-BLAST analyses and the Synteny and Fungal Alignment Viewers (1). The Fungal BLAST and Model Organism BLASTP Best Hits tools, described here, are two new SGD resources that extend the users' reach beyond S.cerevisiae by allowing a variety of sequence comparisons. The Fungal BLAST tool may be used for comparison of any sequence of choice with a wide range of fungal nucleotide or protein sequences, while the Model Organism BLASTP Best Hits resource specifically makes connections between each S.cerevisiae protein and its best hit in protein sets from several other model organisms.

FUNGAL BLAST

The numerous publicly available fungal sequences in GenBank provide a rich source of information for the identification of conserved, functionally important coding and non-coding sequences. For example, recent large-scale comparisons among related fungal species have sparked new insights into the evolution of chromosome structure and regulatory sequences (4,5). This information also initiated revisions of the S.cerevisiae genome sequence (4–8). While bioinformatics approaches to sequence comparisons have been invaluable for gaining a broad understanding of genomes, single gene comparisons across species are often useful to researchers focused on particular areas of biology. The Fungal BLAST tool is designed to put these sorts of comparisons into the hands of researchers who concentrate on single loci or gene families. The Fungal BLAST tool uses the WU-BLAST software (9) to compare any query nucleotide or protein sequence to fungal sequence datasets at GenBank. These include genome sequences from multiple Saccharomyces species (including S.cerevisiae, S.bayanus, S.castellii, S.kluyveri, S.kudriavzevii, S.mikatae, S.paradoxus) as well as sequences from genome projects, ESTs and other available sequences from all phyla in the kingdom Fungi. The sequences are updated periodically, and new fungal sequence datasets are added. The current list of species whose genomic sequences are included in Fungal BLAST analysis at SGD is provided on the Fungal BLAST help page (http://www.yeastgenome.org/help/fungal-blast.html). Although these sequences are available for searching at NCBI, the Fungal BLAST tool facilitates faster, directed searching by dividing the sequences into searchable subcategories according to organism, the status of genome sequencing and annotation (Complete Genomes, Annotated Genomes, Assembled Genomes, etc.), and type of sequence (e.g. Mitochondrial, EST, etc.).

In order to accommodate different types of query and target sequence, the Fungal BLAST tool offers four BLAST programs: BLASTN compares a nucleotide query sequence against a nucleotide sequence dataset; TBLASTN compares a protein query sequence against a nucleotide sequence dataset dynamically translated in all six reading frames (both strands); BLASTP compares an amino acid query sequence against a protein sequence dataset; and BLASTX compares the six-frame conceptual translation products of a nucleotide query sequence (both strands) against a protein sequence dataset.

The Fungal BLAST search is accessible from the ‘Comparison Resources’ pull-down menu on each Locus Page of all S.cerevisiae genes and structural features such as ARS and CEN elements as ‘BLASTN vs fungi’ or ‘TBLASTN vs fungi’. When the form is accessed from a Locus Page, the query sequence box contains the nucleotide or protein sequence of that locus. In addition, links to the interface from both the Analysis & Tools and the Homology & Comparisons contents pages allow input of any sequence of interest, either by simply pasting a text sequence into the dialog box, or by uploading a sequence file. The interface accepts sequences in FASTA, GCG or raw text formats. The ability to analyze this broad spectrum of fungal sequence data provides a powerful means to identify sequences conserved through evolution and presumably important for the biology of the organisms. Limiting the genomes to those from fungi allows researchers to identify signatures in fungal-specific genes and gene products that might not be discernible by comparing sequences across kingdoms.
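
The choice among the four programs depends only on whether the query and the target dataset are nucleotide or protein sequences. The following Python sketch illustrates that mapping, together with the six reading frames (three offsets on each strand) that TBLASTN and BLASTX consider. The function names and the string labels are hypothetical and are not part of the SGD interface or of WU-BLAST; the frame extraction assumes an unambiguous A/C/G/T sequence.

```python
# Illustrative sketch only; names and labels are assumptions, not SGD or WU-BLAST code.

def choose_blast_program(query_type: str, database_type: str) -> str:
    """Return the BLAST program for a query/database type pair.

    Both arguments must be either "nucleotide" or "protein".
    """
    programs = {
        ("nucleotide", "nucleotide"): "BLASTN",
        ("protein", "nucleotide"): "TBLASTN",
        ("protein", "protein"): "BLASTP",
        ("nucleotide", "protein"): "BLASTX",
    }
    return programs[(query_type, database_type)]


# Translation table that maps each base to its complement.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def six_reading_frames(dna: str) -> list[str]:
    """Return the six reading frames of a DNA sequence as nucleotide strings.

    Frames 1-3 start at offsets 0, 1 and 2 of the given strand; frames 4-6
    start at the same offsets of the reverse complement. Each frame is
    trimmed to a whole number of codons.
    """
    forward = dna.upper()
    reverse = forward.translate(_COMPLEMENT)[::-1]
    frames = []
    for strand in (forward, reverse):
        for offset in range(3):
            usable = len(strand) - offset
            frames.append(strand[offset:offset + usable - usable % 3])
    return frames
```

For example, a protein query searched against a genomic nucleotide dataset maps to TBLASTN, which is why that option is offered from Locus Pages alongside BLASTN.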

MODEL ORGANISM BLASTP BEST HITS

Comparison of sequences across diverse taxa is a powerful technique for finding universally conserved domains. If a curated database page exists for a protein of another organism to which a S.cerevisiae protein has similarity, the significance of the sequence conservation may become clear. To help our users find curated information concerning protein sequences conserved between S.cerevisiae and other organisms, SGD has developed the Model Organism BLASTP Best Hits page. The Model Organism BLASTP Best Hits page (Figure 1) displays the results of NCBI BLASTP analyses, with the default parameters, using each S.cerevisiae protein sequence as the query against the complete set of predicted protein sequences from several model organisms. The single best BLASTP hit with an E-value of ≤0.01 is shown for each organism (more than one hit may be shown if the top hits have identical E-values). Protein datasets used for comparison are limited to completely sequenced and annotated genomes where curated database web pages are available for the individual proteins. As of September 2004, BLASTP analyses had been run against predicted protein sequences from six model organisms (Table 1). S.cerevisiae is one of the model organisms used for comparison, but in this case the target sequence identical to the query sequence is excluded from the Best Hits display, and the next best hit is shown.
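
The selection rule described above (E-value cutoff of 0.01, ties at the top all shown, and the query's own sequence skipped when searching S.cerevisiae) can be sketched in Python as follows. This is a minimal illustration, not the SGD pipeline: the `Hit` record, the function name and the target identifiers in the usage note are hypothetical, and self-hits are recognized here by matching identifiers, which is an assumption standing in for "the target sequence identical to the query sequence".

```python
from typing import NamedTuple

# Illustrative sketch only; the record layout and names are assumptions.

class Hit(NamedTuple):
    target_id: str   # identifier of the protein that was hit
    evalue: float    # BLASTP expectation value for the alignment


E_VALUE_CUTOFF = 0.01  # threshold stated in the text


def best_hits(query_id: str, hits: list[Hit], cutoff: float = E_VALUE_CUTOFF) -> list[Hit]:
    """Return the best hit(s) for one query against one organism.

    Hits above the cutoff and the self-hit are discarded; every remaining
    hit that shares the lowest E-value is returned, so ties are preserved.
    """
    candidates = [
        hit for hit in hits
        if hit.target_id != query_id and hit.evalue <= cutoff
    ]
    if not candidates:
        return []
    lowest = min(hit.evalue for hit in candidates)
    return [hit for hit in candidates if hit.evalue == lowest]
```

With a query `"YER179W"` and hits `[Hit("YER179W", 0.0), Hit("targetA", 1e-50), Hit("targetB", 1e-50), Hit("targetC", 0.5)]`, the sketch drops the self-hit and the weak hit and returns both `targetA` and `targetB`, since their E-values are identical.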

Figure 1

Summary table of the Model Organism BLASTP Best Hits page. A summary table similar to this representative table is generated for each locus having a ‘hit’ in one or more model organism databases. In this figure, Saccharomyces protein Yer179wp results are shown as an example. Columns of the table are as follows: species of the hit protein; name of the database for the hit protein, hyperlinked to the home page of that database; name of the hit protein from its database, hyperlinked to the database page of that protein or its gene; description of the hit protein, as found in its database; E-value (expectation value), reflecting the number of hits expected to be found by chance; percent aligned, showing the percentage of the length of the query protein over which it aligns with the hit protein; source range, showing the amino acid coordinates of the region of the S.cerevisiae query protein that was aligned; and target range, showing the amino acid coordinates of the region of the ‘hit’ protein that was aligned with the S.cerevisiae query protein.
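
The ‘percent aligned’ column can be related to the ‘source range’ column with a short calculation. The Python sketch below assumes that source range coordinates are 1-based and inclusive and that the percentage is taken over the full query length; the caption does not state these conventions explicitly, and the function name is hypothetical.

```python
# Illustrative sketch only; the coordinate convention is an assumption.

def percent_aligned(source_start: int, source_end: int, query_length: int) -> float:
    """Percentage of the query protein length covered by the aligned region.

    Coordinates are treated as 1-based and inclusive, so a range of 1..50
    covers 50 residues.
    """
    if not 1 <= source_start <= source_end <= query_length:
        raise ValueError("source range must lie within the query protein")
    aligned_residues = source_end - source_start + 1
    return 100.0 * aligned_residues / query_length
```

Under these assumptions, a source range of 11 to 110 in a 400-residue query covers 100 residues, giving 25% aligned.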

Table 1.

Summary of best BLASTP hits for S.cerevisiae proteins in selected model organism databases

Organism (database name)	Total predicted proteins	Predicted proteins similar to S.cerevisiae query proteins	S.cerevisiae proteins with a hit in the target organism
Drosophila melanogaster (FlyBase)	18 746	2949 (15.73%)	2929 (44.44%)
Ashbya gossypii (AGD)	4726	4231 (89.53%)	4980 (75.56%)
Caenorhabditis elegans (WormBase)	22 254	2176 (9.77%)	2834 (43.00%)
Arabidopsis thaliana (TAIR)	29 161	2718 (9.32%)	3109 (47.17%)
Homo sapiens (ENSEMBL)	29 802	2730 (9.16%)	3137 (47.60%)
Saccharomyces cerevisiae (SGD)	6703	2246 (33.51%)	2984 (45.27%)

For each model organism, the table displays: total number of predicted proteins, as of September 2004; predicted proteins that are ‘hit’ (E-value ≤ 0.01) by an S.cerevisiae query protein, expressed as the number of proteins and as the percentage of total proteins for that organism; and S.cerevisiae proteins that find a hit in the predicted proteins of that model organism, expressed as the number of proteins and as the percentage of total S.cerevisiae open reading frames. The S.cerevisiae set of predicted protein sequences comprised 6591 open reading frames predicted as of September 2004. For comparisons to S.cerevisiae, the best hit is defined as the most similar protein not identical to the query sequence.
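
The two percentage columns in Table 1 use different denominators: the first is relative to the target organism's total predicted proteins, the second to the 6591 S.cerevisiae open reading frames. The Python sketch below recomputes both for two rows of the table; the function and variable names are hypothetical, and the counts are copied from Table 1.

```python
# Illustrative sketch only; counts are copied from Table 1.

S_CEREVISIAE_ORFS = 6591  # S.cerevisiae ORFs predicted as of September 2004

# organism: (total predicted proteins, target proteins hit, S.cerevisiae proteins with a hit)
TABLE1_ROWS = {
    "Drosophila melanogaster": (18746, 2949, 2929),
    "Ashbya gossypii": (4726, 4231, 4980),
}


def table1_percentages(total: int, target_hit: int, yeast_hit: int) -> tuple[float, float]:
    """Return (percent of target proteins hit, percent of yeast ORFs with a hit)."""
    percent_of_target = round(100 * target_hit / total, 2)
    percent_of_yeast = round(100 * yeast_hit / S_CEREVISIAE_ORFS, 2)
    return percent_of_target, percent_of_yeast
```

Because several S.cerevisiae proteins can share the same best hit, the count of yeast proteins with a hit can exceed the count of distinct target proteins hit, as in the Ashbya gossypii row.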

Of the 6591 protein-coding S.cerevisiae genes, 5368 have a hit in at least one model organism database (MOD), including SGD, while 2387 ORFs have a hit in all five MODs other than SGD. Only 78 of the 814 dubious ORFs (considered unlikely to encode a protein) had any hit: five had hits in the Ashbya gossypii dataset and the rest had hits only within the S.cerevisiae protein dataset (Table 2).

Table 2.

BLASTP best hits for S.cerevisiae proteins, sorted by ORF classification

ORF classification	Total number in the S.cerevisiae genome	S.cerevisiae ORFs with a hit in one or more of the MODs (including SGD)	S.cerevisiae ORFs with a hit in all 5 MODS (excluding SGD)
Verified	4231	4069 (96.17%)	2105 (49.75%)
Uncharacterized	1546	1221 (78.98%)	282 (18.24%)
Dubious	814	78 (9.58%)	0
Total	6591	5368 (81.44%)	2387 (36.22%)

ORFs are classified at SGD as verified, uncharacterized or dubious, depending on the likelihood that they are expressed as protein products. For each ORF class and for the whole ORF set, the table displays: the total number of ORFs in that set; the ORFs that find a hit in one or more of the model organism protein sequence datasets including S.cerevisiae proteins, expressed as the number of ORFs and as the percentage of total ORFs in that set; and the ORFs that find a hit in all of the model organism protein datasets excluding S.cerevisiae proteins, expressed as the number of ORFs and as the percentage of total ORFs in that set.
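
As a quick consistency check, the three ORF classes in Table 2 should sum to the ‘Total’ row in every column. The Python sketch below performs that check and recomputes the percentages of the total row; the variable names are hypothetical and the counts are copied from Table 2.

```python
# Illustrative sketch only; counts are copied from Table 2.

# classification: (total ORFs, ORFs with a hit in >=1 MOD incl. SGD, ORFs with a hit in all 5 MODs excl. SGD)
TABLE2_ROWS = {
    "Verified": (4231, 4069, 2105),
    "Uncharacterized": (1546, 1221, 282),
    "Dubious": (814, 78, 0),
}

# Sum each column over the three ORF classes.
column_totals = tuple(
    sum(row[column] for row in TABLE2_ROWS.values()) for column in range(3)
)

total_orfs, any_mod_hits, all_mod_hits = column_totals

# Percentages of the whole ORF set, rounded as in the table.
percent_any_mod = round(100 * any_mod_hits / total_orfs, 2)
percent_all_mods = round(100 * all_mod_hits / total_orfs, 2)
```

The column sums reproduce the ‘Total’ row of Table 2 (6591, 5368 and 2387 ORFs).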

The Model Organism BLASTP Best Hits results page (Figure 1) shows the best hits in the other organisms for the S.cerevisiae query protein along with details about the alignments and links to the relevant database pages for the proteins from other organisms. The BLASTP analyses are run periodically and the model organisms included in these analyses will be updated as new datasets are available from other model organism databases. The Model Organism BLASTP Best Hits page can be accessed from the ‘Comparison Resources’ pull-down menu on the right-hand side of the Locus Page and from a link on the Homology & Comparisons contents page. A file containing all of the Best Hits data is available for download from our FTP site (ftp://ftp.yeastgenome.org/yeast/).

SUMMARY

SGD is continually expanding its resources to increase the ease of access to information about genes and proteins from fungi and other organisms. The Fungal BLAST and the Model Organism BLASTP Best Hits resources allow easy identification and examination of the conserved sequence regions in fungal genomes and facilitate the use of S.cerevisiae as a model organism and reference for comparison with other species. This will further aid in understanding the function and evolution of these sequences.

REFERENCES

1. Saccharomyces Genome Database (SGD) provides secondary gene annotation using the Gene Ontology (GO).

Authors: Selina S Dwight; Midori A Harris; Kara Dolinski; Catherine A Ball; Gail Binkley; Karen R Christie; Dianna G Fisk; Laurie Issel-Tarver; Mark Schroeder; Gavin Sherlock; Anand Sethuraman; Shuai Weng; David Botstein; J Michael Cherry
Journal: Nucleic Acids Res Date: 2002-01-01 Impact factor: 16.971

2. Local alignment statistics.

Authors: S F Altschul; W Gish
Journal: Methods Enzymol Date: 1996 Impact factor: 1.600

3. Genomic exploration of the hemiascomycetous yeasts: 4. The genome of Saccharomyces cerevisiae revisited.

Authors: G Blandin; P Durrens; F Tekaia; M Aigle; M Bolotin-Fukuhara; E Bon; S Casarégola; J de Montigny; C Gaillardin; A Lépingle; B Llorente; A Malpertuy; C Neuvéglise; O Ozier-Kalogeropoulos; A Perrin; S Potier; J Souciet; E Talla; C Toffano-Nioche; M Wésolowski-Louvel; C Marck; B Dujon
Journal: FEBS Lett Date: 2000-12-22 Impact factor: 4.124

4. Saccharomyces Genome Database (SGD) provides tools to identify and analyze sequences from Saccharomyces cerevisiae and related sequences from other organisms.

Authors: Karen R Christie; Shuai Weng; Rama Balakrishnan; Maria C Costanzo; Kara Dolinski; Selina S Dwight; Stacia R Engel; Becket Feierbach; Dianna G Fisk; Jodi E Hirschman; Eurie L Hong; Laurie Issel-Tarver; Robert Nash; Anand Sethuraman; Barry Starr; Chandra L Theesfeld; Rey Andrada; Gail Binkley; Qing Dong; Christopher Lane; Mark Schroeder; David Botstein; J Michael Cherry
Journal: Nucleic Acids Res Date: 2004-01-01 Impact factor: 16.971

5. The Ashbya gossypii genome as a tool for mapping the ancient Saccharomyces cerevisiae genome.

Authors: Fred S Dietrich; Sylvia Voegeli; Sophie Brachat; Anita Lerch; Krista Gates; Sabine Steiner; Christine Mohr; Rainer Pöhlmann; Philippe Luedi; Sangdun Choi; Rod A Wing; Albert Flavier; Thomas D Gaffney; Peter Philippsen
Journal: Science Date: 2004-03-04 Impact factor: 47.728

6. Finding functional features in Saccharomyces genomes by phylogenetic footprinting.

Authors: Paul Cliften; Priya Sudarsanam; Ashwin Desikan; Lucinda Fulton; Bob Fulton; John Majors; Robert Waterston; Barak A Cohen; Mark Johnston
Journal: Science Date: 2003-05-29 Impact factor: 47.728

7. Sequencing and comparison of yeast species to identify genes and regulatory elements.

Authors: Manolis Kellis; Nick Patterson; Matthew Endrizzi; Bruce Birren; Eric S Lander
Journal: Nature Date: 2003-05-15 Impact factor: 49.962

8. Saccharomyces genome database: underlying principles and organisation.

Authors: Selina S Dwight; Rama Balakrishnan; Karen R Christie; Maria C Costanzo; Kara Dolinski; Stacia R Engel; Becket Feierbach; Dianna G Fisk; Jodi Hirschman; Eurie L Hong; Laurie Issel-Tarver; Robert S Nash; Anand Sethuraman; Barry Starr; Chandra L Theesfeld; Rey Andrada; Gail Binkley; Qing Dong; Christopher Lane; Mark Schroeder; Shuai Weng; David Botstein; J Michael Cherry
Journal: Brief Bioinform Date: 2004-03 Impact factor: 11.622

9. Reinvestigation of the Saccharomyces cerevisiae genome annotation by comparison to the genome of a related fungus: Ashbya gossypii.

Authors: Sophie Brachat; Fred S Dietrich; Sylvia Voegeli; Zhihong Zhang; Larissa Stuart; Anita Lerch; Krista Gates; Tom Gaffney; Peter Philippsen
Journal: Genome Biol Date: 2003-06-25 Impact factor: 13.583
