
FGDB: revisiting the genome annotation of the plant pathogen Fusarium graminearum.

Philip Wong, Mathias Walter, Wanseon Lee, Gertrud Mannhaupt, Martin Münsterkötter, Hans-Werner Mewes, Gerhard Adam, Ulrich Güldener.

Abstract

The MIPS Fusarium graminearum Genome Database (FGDB) was established as a comprehensive genome database on one of the most devastating fungal plant pathogens of wheat, barley and maize. The current version, FGDB v3.1, provides information on the full, manually revised gene set based on the Broad Institute assembly FG3 genome sequence. The results of gene prediction tools were integrated with the help of comparative data on related species, yielding a set of 13 718 annotated protein coding genes. This rigorous approach involved adding or modifying gene models and represents a coding sequence gold standard for the genus Fusarium. The gene loci improvements result in 2461 genes which either are new or have different structures compared to the Broad Institute assembly 3 gene set. Moreover, the database serves as a convenient entry point to explore expression data results and to obtain information on the Affymetrix GeneChip probe sets. The resource is accessible at http://mips.gsf.de/genre/proj/FGDB/.

Year:  2010        PMID: 21051345      PMCID: PMC3013644          DOI: 10.1093/nar/gkq1016

Source DB:  PubMed          Journal:  Nucleic Acids Res        ISSN: 0305-1048            Impact factor:   16.971


INTRODUCTION

The ascomycete Fusarium graminearum (anamorph Gibberella zeae) is the causal agent of several plant diseases of world-wide economic importance (1). Fusarium head blight of cereals and Fusarium ear rot of maize lead to severe yield losses and quality problems. Most importantly, mycotoxins (2) produced by the pathogen contaminate infected plant material and derived food and feed products, leading to a health risk. To protect consumers and to avoid a negative impact on farm animals, maximum tolerated levels for Fusarium toxins have been enacted in many countries and costly mycotoxin monitoring programs have been implemented. The most sustainable solution to the problem seems to be breeding resistant plants. Yet this is difficult, because the molecular basis of quantitative resistance differences is not understood (3). The pathogen has a very broad host range and seems to be able to suppress plant defense responses in ways that are currently not understood, or understood only to a very limited extent (4). The elucidation of fungal virulence mechanisms and the identification of virulence genes that can be targeted by breeding or biotechnological approaches are the main goals of a large research community. As a first step in the development of genomics tools for F. graminearum and as a basis for functional genomics approaches, the full genome sequence of one F. graminearum strain was determined (5). The setup of the first version of the FGDB (6) was supported by a project funded by the Austrian genome initiative GEN-AU and was based on the first genome assembly. It already focused on manual improvements of gene calls. The intuitive user interface allowed access to the data through various search and browsing methods. Input from the research community enhanced the annotation effort and established the resource as a key tool for F. graminearum genomics (5).
The current FGDB v3.1 (http://mips.gsf.de/genre/proj/FGDB/) aims to provide a comprehensive resource for the international research community based on the latest assembly of the genome sequence and on a manually revised set of 13 718 genes, 319 tRNAs and genetic markers with a detailed functional annotation and bioinformatic analysis. In addition, the database was expanded to provide convenient access to available GeneChip expression data.

SOURCE DATA AND CONTENT OF THE DATABASE

The source data for FGDB were provided by the F. graminearum sequencing project at the Broad Institute, which is supported by the National Research Initiative, part of the US Department of Agriculture’s (USDA’s) Cooperative State Research, Education and Extension Service. The current content of FGDB v3.1 is based on the Broad assembly 3, which comprises 31 supercontigs (7). The Broad Institute used the previous FGDB version 1 with its manually revised gene calls to improve their current gene set. Based on this set, all gene loci in FGDB v3.1 were re-annotated using a pipeline including (i) Fgenesh with different matrices (www.softberry.com); (ii) GeneMark-ES (8); (iii) Augustus with ESTs, previously annotated Fusarium models and/or Neurospora crassa protein sequences as training data or as hints for the predicted model structure (9); and (iv) EST data as well as Blastx data of related Fusarium species (F. verticillioides, F. oxysporum and F. solani). The different models were displayed in GBrowse (10), allowing comprehensive manual validation of the coding sequences (CDSs). The best fitting model per locus was selected manually and, where changes were required, the respective gene calls were manually corrected using Apollo (11). The gene identifiers have been retained unchanged from the Broad FG3 gene set if the model was identical. All altered (1770) or newly added gene calls (691) are named FGSG_15xxx and above. The outdated draft identifiers used for the Affymetrix GeneChip design (fgdxx-xxx, 13 938 genes) (12) and the corresponding FG1 identifiers (fgxxxxx, 11 640 genes) are listed as aliases in the entry pages and are linked to Pedant databases for details (13). The ORF data and resulting protein sequences are imported into the Pedant system for a detailed functional and structural bioinformatic analysis. The core results are re-imported into FGDB for convenient display and indexing. The Pedant analysis details are inter-linked with each FGDB entry.
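The identifier convention described above (gene calls altered or added in FGDB v3.1 are named FGSG_15xxx and above) can be turned into a simple check. The following is a minimal sketch, not part of FGDB itself: the function name, the returned labels and the assumption of a five-digit numeric suffix are illustrative choices, and FGSG_00001 is a made-up example identifier.

```python
import re

# Threshold from the text: altered or newly added gene calls are named
# FGSG_15xxx and above. A five-digit suffix is an assumption based on the
# identifiers quoted in the text.
REVISED_ID_START = 15000
_FGSG_PATTERN = re.compile(r"^FGSG_(\d{5})$")


def classify_fgsg_id(gene_id: str) -> str:
    """Label an identifier as revised/new, retained from FG3, or not FGSG."""
    match = _FGSG_PATTERN.match(gene_id)
    if match is None:
        return "not_fgsg"
    number = int(match.group(1))
    if number >= REVISED_ID_START:
        return "revised_or_new"
    return "retained_from_FG3"


if __name__ == "__main__":
    # FGSG_17357 appears in the text; the other two are illustrative inputs.
    for gene_id in ("FGSG_17357", "FGSG_00001", "fgd122-100_at"):
        print(gene_id, classify_fgsg_id(gene_id))
```

Such a check only separates identifiers by number range; it cannot tell an altered model from a newly added one, since both share the same range.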
The assembly 1 data were used for the design of an Affymetrix GeneChip (12). The single probes were mapped on the supercontigs using Blat at 100% identity. Probe sets corresponding to gene loci are searchable and visualized in the GBrowse viewer. The initial expression analysis results are integrated for a brief overview of the expression of single genes. Similarity-based data (e.g. homology between protein pairs) are retrieved from and interlinked with the Similarity Matrix of Proteins (SIMAP), which is updated at monthly intervals (14). Comparison of the Broad FG3 and FGDB v3.1 annotated gene sets indicates that 11 257 genes (82%) are exactly the same in terms of exon/intron structure. A total of 2461 genes in the Broad set either have a different structure or are absent from FGDB. A total of 2056 genes in FGDB either have a different structure or are absent from the Broad data. With the evidence of protein similarity to related species, 26 genes in the Broad set have been split into two or more genes in FGDB, while 147 genes in FGDB were merged from two or more genes of the Broad set. Overall, FGDB v3.1 contains 383 more introns than the Broad set, with a decrease in mean intron length from 83.4 to 76.6 nt. Both annotation sets have ∼65% of genes annotated with at least one putative InterPro domain (15). The average number of domains annotated per gene for both Broad and FGDB is ∼1.7. As judged by confirmation of introns by available ESTs, both Broad and FGDB are of similar quality, indicating that the validation of gene calls by available EST data was similarly efficient for both pipelines. There are 103, 55 and 1651 proteins predicted only in FGDB, only in Broad and in both annotation sets as part of the secretory pathway [TargetP, RC < 4 (16)], respectively.
In particular, both Broad and FGDB models now enable secretion prediction of FGSG_17357 (related to inorganic pyrophosphatase IPP1) and FGSG_12369 (related to catalase 2) as identified previously in an extracellular proteomics study (17) on models without SignalP signals (18). In addition, FGDB predictions help confirm the secretory pathway membership of hypothetical protein FGSG_16372 as identified in that study.
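The gene set comparison reported in this section rests on deciding whether two gene models have exactly the same exon/intron structure. A minimal sketch of that idea follows, assuming each model is reduced to its contig, strand and ordered coding exon coordinates. The data layout, the function name and the toy models are hypothetical and do not reproduce the pipeline actually used for FGDB. The final lines check two figures quoted in the text.

```python
from typing import Dict, Tuple

# A gene model: (contig, strand, ((exon_start, exon_end), ...)).
GeneModel = Tuple[str, str, Tuple[Tuple[int, int], ...]]


def compare_gene_sets(set_a: Dict[str, GeneModel],
                      set_b: Dict[str, GeneModel]) -> Dict[str, int]:
    """Count models with identical structure and models unique to each set."""
    structures_a = set(set_a.values())
    structures_b = set(set_b.values())
    shared = structures_a & structures_b
    return {
        "identical": len(shared),
        "only_a": len(structures_a - shared),  # different or absent in B
        "only_b": len(structures_b - shared),  # different or absent in A
    }


# Toy data (hypothetical identifiers and coordinates).
set_a = {
    "a1": ("contig_1", "+", ((100, 200), (260, 400))),
    "a2": ("contig_1", "-", ((900, 1200),)),
    "a3": ("contig_2", "+", ((50, 300),)),
}
set_b = {
    "b1": ("contig_1", "+", ((100, 200), (260, 400))),
    "b2": ("contig_1", "-", ((900, 1100), (1150, 1200))),
}
result = compare_gene_sets(set_a, set_b)
print(result)

# Figures quoted in the text: 1770 altered + 691 new gene calls, and
# 11 257 identical models out of 13 718 FGDB genes.
print(1770 + 691, round(100 * 11257 / 13718))
```

Comparing by structure rather than by identifier matters here because identifiers are only retained when the model is unchanged.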

RETRIEVAL OF INFORMATION

The database interface provides basic search options on the sidebar which allow full text search across gene codes, gene symbols and gene descriptions. In addition, the annotation catalogs FunCat (19), Enzyme Class (20), InterPro (15) and Protein Class are browsable. The advanced search page also offers access to invalid gene models which disagree with known evidence, details on the GeneChip data like probe and probe set names and their location (12), tables on tRNAs and a customizable table on protein molecular weights and isoelectric points. The ORF / contig DNA and protein sequences are searchable by Blast. The single entry page of a gene locus lists information on outdated gene models, alias names and protein classification (six classes from known to hypothetical). Besides physical features like contig coordinates, molecular weight, etc., the hierarchical, functional classification FunCat (19) and EC-number classes (20) as well as InterPro IDs (15) and TargetP (16) results are provided. SIMAP-based protein homology data can be retrieved using links grouped by NCBI-based taxonomic categories. The Pedant links shown in the individual gene records forward to the respective Pedant report pages including alternative views on the DNA level as well as a graphic protein feature view. A small contig pictogram on the right side of each individual gene report page is linked to a GBrowse view allowing graphical browsing of genes, GeneChip probes, EST data and outdated gene models on their corresponding contigs. To get a brief overview of the initial expression analysis data (12,21–23) for single genes, the ‘Expression Data’ link placed below the contig pictogram provides a brief description of experiments and presents the expression data for all matching probe sets. In addition, a more comprehensive overview of the most recent expression data is provided by a link to the ‘PLEXdb GeneOscilloScope’ (24).
The advanced query option (Index Search) on the left panel can be used to retrieve a list of the current FGDB entries based on complex queries including InterPro domains, TargetP results and, for example, probe set names (e.g. “fgd122-100_at”[pgs]|“fgd122-620_at”[pgs]). For this purpose, the major database fields are indexed, which allows a fast and combined ‘index search’ (http://mips.gsf.de/genre/proj/FGDB/Search/Gise/).
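Query strings of the form shown in the probe set example above can be assembled programmatically. The sketch below is an illustration only: the helper name is hypothetical, straight double quotes are assumed to be accepted in place of the typographic quotes printed above, and the `|` operator is assumed to combine alternatives. Only the `pgs` field tag is attested in the text; no other field tags are implied.

```python
from typing import Iterable


def build_index_query(values: Iterable[str], field: str) -> str:
    """Join quoted terms tagged with one field into a single '|' query."""
    terms = [f'"{value}"[{field}]' for value in values]
    if not terms:
        raise ValueError("at least one search value is required")
    return "|".join(terms)


# Probe set names and the 'pgs' field tag are taken from the text.
query = build_index_query(["fgd122-100_at", "fgd122-620_at"], "pgs")
print(query)
```

The resulting string would be pasted into the Index Search form; the helper does not contact the database.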

DOWNLOAD/LINKS

The data can be downloaded from ftp://ftpmips.gsf.de/FGDB/. Besides the protein, contig and chromosome sequence files in FASTA format, the ORF data are provided in GFF3 format. Functional data like FunCat, TargetP and InterPro are accessible in tab-delimited files.
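As an illustration of working with the GFF3 download, the sketch below derives intron lengths from the coding segments of each gene model, the kind of statistic behind the mean intron length reported earlier. It relies only on the standard nine-column, tab-separated GFF3 layout with 1-based inclusive coordinates. That the FGDB file uses `CDS` features carrying a `Parent` attribute is an assumption, and the sample records are invented.

```python
from collections import defaultdict
from statistics import mean
from typing import Iterable, List


def intron_lengths(gff3_lines: Iterable[str],
                   feature_type: str = "CDS") -> List[int]:
    """Return the gap lengths between consecutive segments of each parent."""
    segments = defaultdict(list)
    for line in gff3_lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # skip blank lines, directives and comments
        cols = line.split("\t")
        if len(cols) != 9 or cols[2] != feature_type:
            continue
        attrs = dict(item.split("=", 1)
                     for item in cols[8].split(";") if "=" in item)
        parent = attrs.get("Parent")
        if parent is not None:
            segments[parent].append((int(cols[3]), int(cols[4])))
    lengths = []
    for parts in segments.values():
        parts.sort()
        for (_, prev_end), (next_start, _) in zip(parts, parts[1:]):
            gap = next_start - prev_end - 1  # coordinates are inclusive
            if gap > 0:
                lengths.append(gap)
    return lengths


# Invented sample records; contig and parent names are hypothetical.
SAMPLE = "\n".join([
    "##gff-version 3",
    "\t".join(["contig_1", ".", "CDS", "100", "200", ".", "+", "0",
               "ID=cds1;Parent=mRNA_A"]),
    "\t".join(["contig_1", ".", "CDS", "261", "400", ".", "+", "1",
               "ID=cds2;Parent=mRNA_A"]),
    "\t".join(["contig_1", ".", "CDS", "900", "1000", ".", "-", "0",
               "ID=cds3;Parent=mRNA_B"]),
    "\t".join(["contig_1", ".", "CDS", "1081", "1200", ".", "-", "0",
               "ID=cds4;Parent=mRNA_B"]),
])

lengths = intron_lengths(SAMPLE.splitlines())
print(lengths, mean(lengths))
```

For a real file, the same function can be fed an open file handle instead of the sample lines.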

CONCLUSIONS AND FUTURE DIRECTIONS

FGDB v3.1 is a comprehensive resource on the fungal plant pathogen Fusarium graminearum and provides user-friendly access to gene structure and functional data. Protein homology-based data from public genomes are routinely updated. Although the ORFeome has been completely revised in this version, updates on single gene structures are likely to come as new sequence data of further F. graminearum strains and closely related species, or EST data, become available in the future. We encourage any input of additional evidence to further improve the gene set and overall annotation of the genome. Submitted links to gene-specific publications, contact information on existing mutant strains and other details will also be included.

FUNDING

Austrian Science Fund FWF (special research project Fusarium, F3702 and F3705). Funding for open access charge: Helmholtz Zentrum München, German Research Center for Environmental Health, Ingolstädter Landstrasse 1, D-85764 Neuherberg, Germany. Conflict of interest statement. None declared.
  22 in total

1.  Improved prediction of signal peptides: SignalP 3.0.

Authors:  Jannick Dyrløv Bendtsen; Henrik Nielsen; Gunnar von Heijne; Søren Brunak
Journal:  J Mol Biol       Date:  2004-07-16       Impact factor: 5.469

2.  Comparative proteomics of extracellular proteins in vitro and in planta from the pathogenic fungus Fusarium graminearum.

Authors:  Janet M Paper; John S Scott-Craig; Neil D Adhikari; Christina A Cuomo; Jonathan D Walton
Journal:  Proteomics       Date:  2007-09       Impact factor: 3.984

3.  BarleyBase/PLEXdb.

Authors:  Roger P Wise; Rico A Caldo; Lu Hong; Lishuang Shen; Ethalinda Cannon; Julie A Dickerson
Journal:  Methods Mol Biol       Date:  2007

4.  Predicting subcellular localization of proteins based on their N-terminal amino acid sequence.

Authors:  O Emanuelsson; H Nielsen; S Brunak; G von Heijne
Journal:  J Mol Biol       Date:  2000-07-21       Impact factor: 5.469

5.  Development of a Fusarium graminearum Affymetrix GeneChip for profiling fungal gene expression in vitro and in planta.

Authors:  Ulrich Güldener; Kye-Yong Seong; Jayanand Boddu; Seungho Cho; Frances Trail; Jin-Rong Xu; Gerhard Adam; Hans-Werner Mewes; Gary J Muehlbauer; H Corby Kistler
Journal:  Fungal Genet Biol       Date:  2006-03-13       Impact factor: 3.495

6.  Gene expression shifts during perithecium development in Gibberella zeae (anamorph Fusarium graminearum), with particular emphasis on ion transport proteins.

Authors:  Heather E Hallen; Marianne Huebner; Shin-Han Shiu; Ulrich Güldener; Frances Trail
Journal:  Fungal Genet Biol       Date:  2007-05-08       Impact factor: 3.495

Review 7.  Action and reaction of host and pathogen during Fusarium head blight disease.

Authors:  Stephanie Walter; Paul Nicholson; Fiona M Doohan
Journal:  New Phytol       Date:  2009-10-06       Impact factor: 10.151

8.  PEDANT covers all complete RefSeq genomes.

Authors:  Mathias C Walter; Thomas Rattei; Roland Arnold; Ulrich Güldener; Martin Münsterkötter; Karamfilka Nenova; Gabi Kastenmüller; Patrick Tischler; Andreas Wölling; Andreas Volz; Norbert Pongratz; Ralf Jost; Hans-Werner Mewes; Dmitrij Frishman
Journal:  Nucleic Acids Res       Date:  2008-10-21       Impact factor: 16.971

9.  FGDB: a comprehensive fungal genome resource on the plant pathogen Fusarium graminearum.

Authors:  Ulrich Güldener; Gertrud Mannhaupt; Martin Münsterkötter; Dirk Haase; Matthias Oesterheld; Volker Stümpflen; Hans-Werner Mewes; Gerhard Adam
Journal:  Nucleic Acids Res       Date:  2006-01-01       Impact factor: 16.971

10.  AUGUSTUS: ab initio prediction of alternative transcripts.

Authors:  Mario Stanke; Oliver Keller; Irfan Gunduz; Alec Hayes; Stephan Waack; Burkhard Morgenstern
Journal:  Nucleic Acids Res       Date:  2006-07-01       Impact factor: 16.971
