Cornelia Hedeler, Han Min Wong, Michael J Cornell, Intikhab Alam, Darren M Soanes, Magnus Rattray, Simon J Hubbard, Nicholas J Talbot, Stephen G Oliver, Norman W Paton.
Abstract
BACKGROUND: The number of sequenced fungal genomes is ever increasing, with about 200 genomes already fully sequenced or in progress. Only a small percentage of those genomes have been comprehensively studied, for example using techniques from functional genomics. Comparative analysis has proven to be a useful strategy for enhancing our understanding of evolutionary biology and of the less well understood genomes. However, the data required for these analyses tend to be distributed across various heterogeneous data sources, making systematic comparative studies a cumbersome task. Furthermore, comparative analyses benefit from close integration of derived data sets that cluster genes or organisms in a way that makes it easier to express requests that clarify points of similarity or difference between species.

DESCRIPTION: To support systematic comparative analyses of fungal genomes we have developed the e-Fungi database, which integrates a variety of data for more than 30 fungal genomes. Publicly available genome data, functional annotations, and pathway information have been integrated into a single data repository and complemented with results of comparative analyses, such as MCL and OrthoMCL cluster analysis, and predictions of signaling proteins and the sub-cellular localisation of proteins. To access the data, a library of analysis tasks is available through a web interface. The analysis tasks are motivated by recent comparative genomics studies, and aim to support the study of evolutionary biology as well as community efforts for improving the annotation of genomes. Web services for each query are also available, enabling the tasks to be incorporated into workflows.
Year: 2007 PMID: 18028535 PMCID: PMC2242804 DOI: 10.1186/1471-2164-8-426
Source DB: PubMed Journal: BMC Genomics ISSN: 1471-2164 Impact factor: 3.969
Figure 1. Overview of diversity of available databases.
Genomes in e-Fungi with associated data sources
| Taxonomic group | Pathogenicity | Growth form | Data source |
| --- | --- | --- | --- |
| Oomycete | plant pathogen | filamentous | JGI |
| Oomycete | plant pathogen | filamentous | JGI |
| Zygomycota – Mucorales | animal pathogen | filamentous | Broad |
| Basidiomycete – Ustilaginomycota | plant pathogen | dimorphic | Broad |
| Basidiomycete – Homobasidiomycota | non-pathogen | filamentous | JGI |
| Ascomycete – Schizosaccharomycetes | non-pathogen | yeast – fission | Entrez |
| Ascomycete – Saccharomycetes | non-pathogen | yeast – dimorphic | Entrez |
| Ascomycete – Saccharomycetes | non-pathogen | yeast | SGD |
| Ascomycete – Saccharomycetes | non-pathogen | yeast | Entrez |
| Ascomycete – Saccharomycetes | non-pathogen | yeast | SGD |
| Ascomycete – Saccharomycetes | non-pathogen | yeast | SGD |
| Ascomycete – Saccharomycetes | non-pathogen | yeast | SGD |
| Ascomycete – Saccharomycetes | non-pathogen | yeast | SGD |
| Ascomycete – Saccharomycetes | animal pathogen | pseudohyphae – dimorphic | Entrez |
| Ascomycete – Saccharomycetes | non-pathogen | yeast | Entrez |
| Ascomycete – Saccharomycetes | non-pathogen | yeast | SGD |
| Ascomycete – Saccharomycetes | non-pathogen | yeast | Entrez |
| Ascomycete – Saccharomycetes | plant pathogen | filamentous | Entrez |
| Ascomycete – Saccharomycetes | animal pathogen | pseudohyphae – dimorphic | Entrez |
| Ascomycete – Saccharomycetes | non-pathogen | yeast – dimorphic | Entrez |
| Ascomycete – Saccharomycetes | animal pathogen | yeast – dimorphic | Broad |
| Ascomycete – Eurotiomycetes | animal pathogen | filamentous | Broad |
| Ascomycete – Eurotiomycetes | non-pathogen | filamentous | DOGAN |
| Ascomycete – Eurotiomycetes | non-pathogen | filamentous | JGI |
| Ascomycete – Eurotiomycetes | animal pathogen | filamentous | CADRE |
| Ascomycete – Eurotiomycetes | animal pathogen | filamentous | Broad |
| Ascomycete – Eurotiomycetes | non-pathogen | filamentous | Broad |
| Ascomycete – Dothideomycetes | plant pathogen | filamentous | Broad |
| Ascomycete – Leotiomycetes | plant pathogen | filamentous | Broad |
| Ascomycete – Leotiomycetes | plant pathogen | filamentous | Broad |
| Ascomycete – Sordariomycetes | non-pathogen | filamentous | JGI |
| Ascomycete – Sordariomycetes | plant pathogen | filamentous | Broad |
| Ascomycete – Sordariomycetes | plant pathogen | filamentous | Broad |
| Ascomycete – Sordariomycetes | animal pathogen | filamentous | Broad |
| Ascomycete – Sordariomycetes | non-pathogen | filamentous | Broad |
| Microsporidia | animal pathogen | microsporidia | Entrez |
Figure 2. Distribution of five of the most frequently found Pfam domains.
Figure 3. Distribution of the most frequently assigned PSort predictions.
Figure 4. Distribution of the most frequently assigned Wolf-PSort predictions.
Figure 5. Overview of e-Fungi architecture.
Figure 6. Loading infrastructure. Schematic overview of the loading infrastructure employed to integrate primary and derived data into the e-Fungi database.
Canned query groups currently provided
| Query group | Description |
| --- | --- |
| Annotation of proteins in clusters | Queries in this group retrieve the annotation of all the proteins in particular clusters. The annotation consists of PSort, Wolf-PSort and SignalP predictions, GO annotations, Pfam domains, enzyme annotations and pathways for each protein, as well as its assignment to a particular MCL and OrthoMCL cluster. Clusters can be selected either by cluster identifier or by the proteins they contain, such as proteins with a particular GO annotation or a particular cellular localisation as predicted by PSort or Wolf-PSort. |
| Cellular localisation analysis | This group of queries retrieves the cellular localisation of proteins as predicted by PSort and Wolf-PSort. It also retrieves proteins with a particular predicted cellular localisation. |
| EST analysis | Collection of general EST analyses. The information available includes the group/hierarchy structure of ESTs and genes, as well as the number of homologs of genes in all genomes in the database. |
| Essential yeast genes cluster analysis | Queries to retrieve MCL clusters containing proteins of a given genome and proteins of essential or non-essential yeast genes. |
| Essential yeast genes orthology analysis | This group of queries analyses clusters containing proteins of a given genome and proteins of essential or non-essential yeast genes in terms of the number of genomes present in those clusters. |
| Functional annotation analysis | Queries in this group retrieve the Gene Ontology or Pfam annotation for a given protein, or the proteins with a given annotation. |
| Genomics analysis | Collection of queries for general genomic analyses, such as retrieving the exons of a particular gene. |
| MCL cluster analysis | Queries in this group provide a general analysis of the MCL clusters in the database. Clusters containing proteins of a given genome, or of a group of genomes such as plant pathogens or filamentous fungi, can be retrieved. Clusters that contain more or less than a given percentage of proteins of a given genome can also be obtained. |
| OrthoMCL cluster analysis | This group of queries provides a general analysis of the OrthoMCL clusters in the database. The queries in this group are similar in scope to those in the MCL cluster analysis group. |
| Pathway analysis | Queries in this group retrieve the pathways and enzyme annotations for a particular protein, as well as all the proteins in a given pathway or with a particular enzyme annotation. |
| Redundancy analysis | The query in this group analyses the redundancy in a given species. Genome redundancy is determined by counting the number of proteins of that genome in MCL clusters. |
| Secretome analysis | Queries in this group retrieve the SignalP prediction for a given protein, or the proteins with a given SignalP prediction (i.e., secretory or non-secretory proteins). |
| Transcript abundance | Collection of queries for transcript abundance analyses. These queries enable the identification of genes that may be highly expressed under a particular growth condition. Information on these genes and conditions can also be retrieved. |
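The cluster-oriented queries described in the table (MCL cluster analysis and Redundancy analysis) can be illustrated with a small sketch. This is not the e-Fungi implementation: the in-memory `CLUSTERS` data, the protein identifiers, and the function names are all hypothetical, and the real queries run against the database rather than a Python dictionary.

```python
# Hypothetical toy data: cluster id -> list of (genome, protein id) members.
CLUSTERS = {
    "c1": [("A. nidulans", "p1"), ("A. nidulans", "p2"), ("S. cerevisiae", "p3")],
    "c2": [("A. nidulans", "p4"), ("N. crassa", "p5")],
    "c3": [("S. cerevisiae", "p6"), ("S. pombe", "p7")],
}


def clusters_with_genome(clusters, genome):
    """Return ids of clusters containing at least one protein of `genome`."""
    return sorted(
        cid for cid, members in clusters.items()
        if any(g == genome for g, _ in members)
    )


def genome_share(members, genome):
    """Percentage of a cluster's proteins that belong to `genome`."""
    if not members:
        return 0.0
    return 100.0 * sum(1 for g, _ in members if g == genome) / len(members)


def clusters_by_share(clusters, genome, threshold, above=True):
    """Return ids of clusters whose share of `genome` proteins is strictly
    above (or, with above=False, strictly below) `threshold` percent."""
    selected = []
    for cid, members in clusters.items():
        share = genome_share(members, genome)
        if (share > threshold) if above else (share < threshold):
            selected.append(cid)
    return sorted(selected)


def proteins_per_cluster(clusters, genome):
    """Count the proteins of `genome` in each cluster that contains any,
    the raw counts on which a redundancy analysis would be based."""
    counts = {}
    for cid, members in clusters.items():
        n = sum(1 for g, _ in members if g == genome)
        if n:
            counts[cid] = n
    return counts
```

With this toy data, `clusters_by_share(CLUSTERS, "A. nidulans", 60)` selects only the cluster in which two of three proteins come from that genome. How the per-cluster counts are summarised into a single redundancy figure is not specified in the table, so the sketch stops at the counts.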
Figure 7. Screenshot of parameterisation of a canned query. Screenshot of the web interface showing the parameterisation of the query 'Get clusters with proteins of a given genome.', which is part of the group 'MCL cluster analysis'.
Figure 8. Screenshot of the query result. Screenshot of the web interface showing a subset of the MCL clusters with Aspergillus nidulans proteins. The three clusters shown contain only proteins of filamentous genomes and none of yeast-like genomes, whereas the remaining 7593 clusters contain both.
Figure 9. Screenshot of Advanced search. Screenshot of the Advanced search feature of the web interface. This feature enables the filtering of objects of a particular type and can be used to retrieve the exact value of names or identifiers of which only the beginning or end is known.
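The kind of lookup the Advanced search caption describes, recovering a full name or identifier from its known beginning or end, can be sketched as a simple prefix and suffix filter. This is only an illustration under stated assumptions: the identifiers in `IDENTIFIERS` and the function names are hypothetical, and the caption does not say how the web interface performs the match.

```python
# Hypothetical identifiers; real e-Fungi names and identifiers will differ.
IDENTIFIERS = ["cluster_0001", "cluster_0002", "protein_0001"]


def find_by_prefix(values, prefix):
    """Return all values that start with the known beginning `prefix`."""
    return sorted(v for v in values if v.startswith(prefix))


def find_by_suffix(values, suffix):
    """Return all values that end with the known ending `suffix`."""
    return sorted(v for v in values if v.endswith(suffix))
```

For example, `find_by_prefix(IDENTIFIERS, "cluster_")` returns both cluster identifiers, from which the exact value of interest can then be picked.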
Figure 10. Sample workflow. Workflow schema describing multi-sequence alignment and visualisation using web services.
Clustering of 450 proteins
| 35 | 18 | 6 | |
| 31 | 17 | 3 | |
| 49 | 8 | 4 | |
| 20 | 15 | 0 | |
| 150 | 25 | 10 | |
| 2 | 2 | 0 | |
| 17 | 7 | 1 | |
| 3 | 3 | 0 | |
| 3 | 3 | 0 | |
| 3 | 3 | 0 | |
| 2 | 2 | 0 | |
| 3 | 3 | 0 | |
| 3 | 3 | 0 | |
| 3 | 3 | 0 | |
| 3 | 3 | 0 | |
| 5 | 5 | 0 | |
| 3 | 3 | 0 | |
| 3 | 3 | 0 | |
| 19 | 7 | 0 | |
| 9 | 6 | 0 | |
| 8 | 7 | 0 | |
| 44 | 32 | 6 | |
| 155 | 86 | 28 | |
| 150 | 86 | 20 | |
| 72 | 50 | 5 | |
| 116 | 67 | 25 | |
| 119 | 79 | 15 | |
| 148 | 83 | 37 | |
| 92 | 70 | 13 | |
| 79 | 53 | 18 | |
| 71 | 43 | 10 | |
| 107 | 65 | 14 | |
| 89 | 68 | 14 | |
| 133 | 67 | 30 | |
| 39 | 33 | 2 | |
| 0 | 0 | 0 |