CFGP 2.0: a versatile web-based platform for supporting comparative and evolutionary genomics of fungi and Oomycetes.

Jaeyoung Choi, Kyeongchae Cheong, Kyongyong Jung, Jongbum Jeon, Gir-Won Lee, Seogchan Kang, Sangsoo Kim, Yin-Won Lee, Yong-Hwan Lee.

Abstract

In 2007, the Comparative Fungal Genomics Platform (CFGP; http://cfgp.snu.ac.kr/) was made publicly available with 65 genomes corresponding to 58 fungal and Oomycete species. CFGP provided six bioinformatics tools, including a novel tool named BLASTMatrix that searches for genes homologous to query sequences in multiple species simultaneously. CFGP also introduced Favorite, a personalized virtual space for storing data and analyzing it with these six tools. Since 2007, CFGP has grown to archive 283 genomes corresponding to 152 fungal and Oomycete species, as well as 201 genomes corresponding to seven bacterial, 39 plant and 105 animal species. In addition, the number of tools in Favorite has increased to 27. The Taxonomy Browser of CFGP 2.0 allows users to navigate interactively through a large number of genomes according to their taxonomic positions. The user interface of BLASTMatrix was also improved to facilitate subsequent analyses of retrieved data. A newly developed genome browser, the Seoul National University Genome Browser (SNUGB), was integrated into CFGP 2.0 to support graphical presentation of diverse genomic contexts. Based on the standardized genome warehouse of CFGP 2.0, several systematic platforms designed to support studies of selected gene families have been developed. Most of them are connected through Favorite to allow data sharing across the platforms.

Year:  2012        PMID: 23193288      PMCID: PMC3531191          DOI: 10.1093/nar/gks1163

Source DB:  PubMed          Journal:  Nucleic Acids Res        ISSN: 0305-1048            Impact factor:   16.971


INTRODUCTION

Fungal genome sequencing has accelerated rapidly since the genome sequence of Saccharomyces cerevisiae was released in 1996 (1). With current and anticipated advances in sequencing technology (2,3), the rate of fungal genome sequencing will continue to increase. More than 300 fully sequenced fungal genomes are currently in the public domain (4,5), and many more species, as well as additional isolates of previously sequenced species, are being sequenced (6,7). In addition, the 1000 Fungal Genome project (F1000; http://1000.fungalgenomes.org/) will greatly help uncover the genomic underpinnings of fungal evolution and lifestyles through large-scale comparative genomics studies. Combining fungal genomes with those of plants and animals will also facilitate in-depth comparative genomics across multiple eukaryotic kingdoms (8–10). To support such large-scale, genome-based inquiries efficiently, it is critical to archive the available genome sequences and annotation information in a standardized format so that they can be retrieved and analyzed easily. To address this need, the first version of the Comparative Fungal Genomics Platform (CFGP) was released in 2007 with 65 fungal and Oomycete genomes (11). CFGP was built on a new user interface (UI) called the Data-driven User Interface (DUI), which made it easy and efficient to use its bioinformatics tools and to manage task histories. Since then, the numbers of archived genomes and bioinformatics tools have grown substantially (Supplementary Table S1 and Table 1). Furthermore, the standardized genome data warehouse of CFGP has supported the development of multiple platforms specialized for archiving and analyzing specific gene families and functional groups (Table 2). Some of these platforms share the Favorite of CFGP, which provides an efficient mechanism for sharing data with CFGP and enables the use of its bioinformatics tools for a variety of analyses.
In this article, we outline the improvements made in CFGP 2.0 and describe how its standardized genome warehouse has been exploited in the development of other comparative genomics platforms.
Table 1.

List of bioinformatics tools available in CFGP 2.0

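Category | Name | Input data | Reference
Sequence search | BLAST | Sequences | (12)
Sequence search | BLAST2 | Sequences | (12)
Sequence search | BLASTMatrix | Sequences | (11)
Identification of functional domains | InterPro Scan | Sequences | (13)
Phylogenetic analysis | ClustalW | Sequences | (14)
Phylogenetic analysis | DNAML | Alignment | (15)
Phylogenetic analysis | PROML | Alignment | (15)
Phylogenetic analysis | DNAPARS | Alignment | (15)
Phylogenetic analysis | PROTPARS | Alignment | (15)
Phylogenetic analysis | PHYML | Alignment | (16)
Identification of secreted proteins | SignalP 3.0 | Sequences | (17)
Identification of secreted proteins | SigCleave | Sequences | (18)
Identification of secreted proteins | SigPred | Sequences | (19)
Identification of secreted proteins | RPSP | Sequences | (20)
Identification of secreted proteins | SecretomeP | Sequences | (21)
Subcellular localization | PSortII | Sequences | (22)
Subcellular localization | predictNLS | Sequences | (23)
Subcellular localization | ChloroP | Sequences | (24)
Subcellular localization | TargetP | Sequences | (25)
Prediction of transmembrane helices | TMHMM2 | Sequences | (26)
Prediction of RNA secondary structure | tRNAScan-SE | Sequences | (27)
Prediction of RNA secondary structure | mFold 3.2 | Sequences | (28)
Post-translational modification | NetCGlyc | Sequences | (29)
Post-translational modification | NetNGlyc | Sequences |
Post-translational modification | NetOGlyc | Sequences | (30)
Post-translational modification | NetPhos | Sequences | (31)
Conserved motif search | MEME | Sequences | (32)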

A total of 27 tools in nine categories are available via the Favorite Browser of CFGP 2.0.

Table 2.

List of online platforms and tools supporting studies on specific gene families or functional groups[a]

Name | URL | Reference
Cyber-infrastructure for Fusarium | http://www.fusariumdb.org/ | (33)
Fungal Transcription Factor Database | http://ftfd.snu.ac.kr/ | (34)
Fungal Cytochrome P450 Database | http://p450.riceblast.snu.ac.kr/ | (35)
Seoul National University Genome Browser | http://genomebrowser.snu.ac.kr/ | (36)
The Systematic Platform for Identifying Mutated Proteins | http://pimp.starflr.info/ | (37)
Insect Mitochondrial Genome Database | http://www.imgd.org/ | (38)
Fungal Secretome Database | http://fsd.snu.ac.kr/ | (39)
Eukaryotic DNAJ/K Database | http://edd.snu.ac.kr/ |
Cell Wall-degrading Enzyme Database | http://www.cwde.org/ |

[a] These platforms were developed using the standardized genome warehouse and the web template engine of CFGP 2.0.

METHODS

System design

The structure and core databases of CFGP 2.0 are essentially the same as those of the first version. The system consists of databases, including a standardized genome warehouse; wrapper programs written in Perl and C; and the DUI. To balance server load and keep the system running efficiently, the core databases were distributed across multiple servers, and additional web servers were added. The MySQL relational database management system was used to manage and curate the data. The web interfaces were written in PHP and JavaScript, and analysis functions in the Favorite Browser were relayed by Perl scripts and coordinated automatically by the system monitoring servers.
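The text says only that Perl wrappers relay Favorite analyses and that monitoring servers coordinate them across several hosts. As a rough illustration of how such coordination could balance load, the Python sketch below sends each analysis job to the server with the shortest queue. The classes `Job`, `Server` and `Coordinator`, the `favorite_id` field and the least-loaded policy are all hypothetical; this is not the CFGP implementation.

```python
from dataclasses import dataclass, field


@dataclass
class Job:
    """A hypothetical Favorite analysis request: one tool run on one saved data set."""
    user: str
    tool: str         # tool name as listed in Table 1, e.g. "SignalP 3.0"
    favorite_id: str  # hypothetical identifier of the Favorite folder used as input


@dataclass
class Server:
    """A hypothetical analysis host whose queue the monitoring layer can inspect."""
    name: str
    queue: list = field(default_factory=list)


class Coordinator:
    """Sends each job to the server with the shortest queue (least-loaded first)."""

    def __init__(self, servers):
        self.servers = list(servers)

    def submit(self, job: Job) -> str:
        # min() returns the first server with the smallest queue, so ties go to list order.
        target = min(self.servers, key=lambda s: len(s.queue))
        target.queue.append(job)
        return target.name

    def finish(self, server_name: str) -> Job:
        # Called when a wrapper reports completion; removes that server's oldest job.
        server = next(s for s in self.servers if s.name == server_name)
        return server.queue.pop(0)
```

With two servers, three consecutive submissions land on the first, second and then first server again. Least-loaded assignment is only one simple policy; round-robin or weighting jobs by the expected runtime of each tool would be equally plausible choices.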

Mining orthologs

The source code of InParanoid 4.1 was used to identify orthologs in the archived proteomes. First, the genomes of 35 frequently used species were subjected to ortholog identification (Supplementary Table S1), and all pairwise comparisons among these 35 species were carried out. The latest version, InParanoid 7, provides orthologs from 100 eukaryotic genomes. However, some of the genomes we used are not included in that release; data for those species were therefore retrieved from CFGP 2.0 and subjected to ortholog identification.
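Comparing all 35 genomes pairwise means 35 × 34 / 2 = 595 unordered genome pairs. InParanoid builds its ortholog groups from seed pairs, proteins that are each other's best hit between two proteomes, and then adds in-paralogs around those seeds. The Python sketch below shows only that seed step on toy score tables; the input format (`{(query, subject): bit_score}`) and the tie handling are assumptions, and it does not reproduce InParanoid's in-paralog clustering or confidence scoring.

```python
from itertools import combinations
from math import comb


def genome_pairs(genomes):
    """Every unordered pair of genomes; each pair needs one two-way comparison."""
    return list(combinations(genomes, 2))


def best_hits(scores):
    """Map each query to its highest-scoring subject.

    scores: {(query, subject): bit_score} for one search direction (A against B).
    Ties keep the first pair encountered.
    """
    best = {}
    for (query, subject), score in scores.items():
        if query not in best or score > best[query][1]:
            best[query] = (subject, score)
    return {query: subject for query, (subject, _) in best.items()}


def mutual_best_hits(a_vs_b, b_vs_a):
    """Seed ortholog pairs: b is a's best hit in B and a is b's best hit in A."""
    forward = best_hits(a_vs_b)
    reverse = best_hits(b_vs_a)
    return sorted((a, b) for a, b in forward.items() if reverse.get(b) == a)


PAIR_COUNT = comb(35, 2)  # 35 * 34 / 2 = 595 genome pairs
```

In a toy example where proteins a2 and a3 both hit b2 best, but b2's best hit back is a2, only (a2, b2) becomes a seed; a3 would be a candidate in-paralog in the full method.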

EXPANDED GENOME DATA WAREHOUSE

The standardized genome warehouse of CFGP has been substantially expanded in both the number of species and taxonomic coverage. In addition to 283 genomes corresponding to 152 fungal and Oomycete species, 201 genomes from seven bacterial, 39 plant and 105 animal species have been archived (Figure 1 and Supplementary Table S2). The animal and plant genomes were incorporated to enable comparative and evolutionary genomics studies across multiple eukaryotic kingdoms.
Figure 1.

A diagram illustrating the system architecture and the content of the genomes archived in CFGP 2.0. Key features of CFGP 2.0 are depicted on the left. The web-based platforms developed on the standardized genome warehouse of CFGP 2.0 are listed on the right. Bidirectional arrows indicate platforms that support the Favorite Browser, which synchronizes with CFGP 2.0. Dashed arrows denote that SNUGB was integrated into two platforms, FSD and EDD. In the pie graph, the inner and outer circles represent the numbers of genomes and species for each taxon, respectively.

ENHANCED UTILITY AND NEW FEATURES

Improved UI

The UI of CFGP 2.0 was substantially improved to provide a better user experience. All modifications follow the HTML5 and CSS3 standards, which are supported by the most widely used web browsers. The three main frames (Favorite, presentation and application) were rearranged to make switching from one frame to another more intuitive, and the Favorite and presentation frames can be toggled for more flexible browsing (Figure 2). All web pages have been thoroughly tested with multiple web browsers, including Chrome, Firefox, Internet Explorer and Safari.
Figure 2.

Outline of the improved DUI of CFGP 2.0. The frames outlined in blue, orange and purple correspond to the Favorite, presentation and application frames, respectively. The buttons boxed in red hide or show the Favorite and presentation frames, respectively.

Taxonomy browser

As the number of species representing diverse taxa continues to increase, a browsing tool based on simple text search (e.g. by species name) is no longer sufficient for users to explore the available genomes efficiently. The Taxonomy Browser implemented in CFGP 2.0 provides a predictive text feature as well as a hierarchical, tree-based taxon structure that shows the taxonomic position of a chosen species. Once a species is selected, the number of genomes and the available sequence types are listed with direct links to the corresponding sequences.
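The two browsing aids described above, predictive text and a lineage-based tree, can be illustrated with a small in-memory index. The Python sketch below is a hypothetical stand-in, not the CFGP code: it assumes each species is stored with an abbreviated lineage list, completes typed prefixes by binary search over sorted lower-case names, and lists all species under a chosen taxon.

```python
from bisect import bisect_left


class TaxonomyIndex:
    """Hypothetical index: species name -> lineage list, e.g. [kingdom, ..., species]."""

    def __init__(self, lineages):
        self.lineages = dict(lineages)
        # Lower-case keys give case-insensitive completion; map back for display.
        self._display = {name.lower(): name for name in self.lineages}
        self._sorted = sorted(self._display)

    def suggest(self, prefix, limit=5):
        """Predictive text: up to `limit` species whose names start with `prefix`."""
        p = prefix.lower()
        out = []
        # All matches form a contiguous run starting at the bisection point.
        for key in self._sorted[bisect_left(self._sorted, p):]:
            if not key.startswith(p) or len(out) == limit:
                break
            out.append(self._display[key])
        return out

    def species_under(self, taxon):
        """Tree browsing: every species whose lineage passes through `taxon`."""
        return sorted(s for s, path in self.lineages.items() if taxon in path)

    def lineage(self, species):
        """Taxonomic position of one species, root first."""
        return " > ".join(self.lineages[species])


# Abbreviated, illustrative lineages (ranks between class and species omitted).
index = TaxonomyIndex({
    "Magnaporthe oryzae": ["Fungi", "Ascomycota", "Sordariomycetes", "Magnaporthe oryzae"],
    "Fusarium graminearum": ["Fungi", "Ascomycota", "Sordariomycetes", "Fusarium graminearum"],
    "Fusarium oxysporum": ["Fungi", "Ascomycota", "Sordariomycetes", "Fusarium oxysporum"],
    "Saccharomyces cerevisiae": ["Fungi", "Ascomycota", "Saccharomycetes", "Saccharomyces cerevisiae"],
    "Phytophthora infestans": ["Stramenopiles", "Oomycetes", "Phytophthora infestans"],
})
```

Binary search keeps each keystroke cheap even with hundreds of species; a production system backed by MySQL could instead use an indexed `LIKE 'prefix%'` query.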

Seoul National University Genome Browser

The first version of CFGP did not offer a graphical interface to present genomic context and notable features such as GC content, functional domains and signal peptides. The new genome browser implemented in CFGP 2.0, SNUGB, enables users to view such information for a chosen region. The region to view can be selected by marking its start and end positions with the mouse or by typing in its genome coordinates (Figure 3).
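One of the tracks mentioned above, GC content over a user-selected region, reduces to counting G and C in consecutive windows. The Python sketch below assumes 1-based inclusive start and end coordinates and a fixed window size; both the coordinate convention and the `window` default are assumptions, not documented SNUGB behavior. Ambiguous bases such as N are counted in the denominator.

```python
def gc_track(sequence, start, end, window=100):
    """GC fraction of consecutive windows covering [start, end].

    Coordinates are 1-based and inclusive (an assumed convention).
    Returns (window_start, gc_fraction) tuples; the last window may be shorter.
    """
    if not 1 <= start <= end <= len(sequence):
        raise ValueError("region lies outside the sequence")
    region = sequence[start - 1:end].upper()
    track = []
    for offset in range(0, len(region), window):
        chunk = region[offset:offset + window]
        gc = chunk.count("G") + chunk.count("C")
        track.append((start + offset, round(gc / len(chunk), 3)))
    return track
```

For the sequence GGGGATATCCAA with 4-bp windows over positions 1 to 12, the track is 1.0, 0.0 and 0.5 at window starts 1, 5 and 9.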
Figure 3.

A screenshot of SNUGB implemented in CFGP 2.0. SNUGB allows users to view 13 biological features and the gene structures in the selected genome region. These features include GC content, functional domains, nuclear localization signals, signal peptides and transmembrane helices.

New bioinformatics tools added to the Favorite Browser

Whereas the first version provided only six tools, CFGP 2.0 is equipped with 27 tools covering nine categories of data analysis or viewing (Table 1). These additions enable users to perform more analyses without leaving CFGP.

Ortholog browsing function

Finding orthologs of a specific gene in multiple species often requires numerous BLAST searches and validation steps. In the first version of CFGP, we tried to eliminate copying and pasting of sequences by incorporating the DUI (11). In CFGP 2.0, we simplified the identification and collection of orthologs by offering pre-computed data. Several ortholog identification programs exist, such as InParanoid (40), Ortholog, MSOAR (41) and THOR (42). We adopted InParanoid to identify orthologs via pairwise comparisons among 35 frequently accessed genomes. For every protein encoded by each of these 35 genomes, orthologous genes in the other 34 genomes are provided, giving a quick overview of the protein's distribution among these species and supporting further analyses by allowing the orthologs to be saved into a Favorite on the fly.
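Serving pre-computed orthologs amounts to indexing the pairwise results so that any protein can be looked up directly. The Python sketch below assumes the InParanoid output has been flattened into ortholog pairs of `(genome, protein_id)` tuples; that format, the function names and the protein IDs in the example are hypothetical. For a query protein it reports the orthologs in every other genome, with an empty list marking genomes that have none.

```python
from collections import defaultdict


def build_index(pairwise_orthologs):
    """Index ((genome_a, protein_a), (genome_b, protein_b)) ortholog pairs.

    The pair format is an assumed flattening of pre-computed InParanoid results.
    """
    index = defaultdict(lambda: defaultdict(set))
    for (ga, pa), (gb, pb) in pairwise_orthologs:
        # Store both directions so either protein can serve as the query.
        index[(ga, pa)][gb].add(pb)
        index[(gb, pb)][ga].add(pa)
    return index


def orthologs_of(index, genome, protein, all_genomes):
    """Orthologs of one protein in every other genome ([] where none was found)."""
    hits = index.get((genome, protein), {})
    return {g: sorted(hits.get(g, ())) for g in all_genomes if g != genome}


# Hypothetical genome codes and protein IDs, for illustration only.
pairs = [
    (("Mo", "MGG_1"), ("Fg", "FG_7")),
    (("Mo", "MGG_1"), ("Sc", "YAL1")),
    (("Fg", "FG_7"), ("Sc", "YAL1")),
]
ortholog_index = build_index(pairs)
```

Looking up `("Mo", "MGG_1")` against the genomes Mo, Fg, Sc and Pi returns one ortholog each in Fg and Sc and an empty list for Pi, which is the kind of distribution overview described above.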

FUNCTIONAL/EVOLUTIONARY GENOMICS PLATFORMS DEVELOPED BASED ON THE STANDARDIZED GENOME WAREHOUSE OF CFGP 2.0

Using the standardized genome warehouse of CFGP 2.0, a number of platforms that support comparative analyses of specific gene families and/or functional groups have been developed: (i) Cyber-infrastructure for Fusarium (CiF; http://www.fusariumdb.org/) (33), (ii) Fungal Transcription Factor Database (FTFD; http://ftfd.snu.ac.kr/) (34), (iii) Fungal Cytochrome P450 Database (FCPD; http://p450.riceblast.snu.ac.kr/) (35), (iv) Fungal Secretome Database (FSD; http://fsd.snu.ac.kr/) (39), (v) Eukaryotic DNAJ and DNAK Database (EDD; http://edd.snu.ac.kr/) (Cheong et al., manuscript in preparation) and (vi) Cell Wall-degrading Enzymes Database (CWDE; http://www.cwde.org/) (Choi et al., manuscript in preparation). The Seoul National University Genome Browser (SNUGB; http://genomebrowser.snu.ac.kr/) (36) was also implemented in FSD and EDD. The Insect Mitochondrial Genome Database (IMGD; http://www.imgd.org/) (38) employs the Species-driven UI, which enables intuitive and fast taxonomic browsing with multiple add-on analysis functions. Finally, the Systematic Platform for Identifying Mutated Proteins (SysPIMP; http://pimp.starflr.info/) (37) was developed to support the identification of mutations related to human diseases. The Favorite Browser of CFGP 2.0 is connected with many of these platforms to support efficient data exchange and sharing across them. All data saved in the Favorite Browser are synchronized in real time, so users can fully exploit the data and functions provided by these platforms.
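The paper states that Favorite data are synchronized in real time across the connected platforms but does not describe the mechanism. One simple design consistent with that description is a single shared store that every platform reads and writes, with change notifications so open views can refresh. The Python sketch below illustrates that assumption only; `FavoriteStore`, `Platform` and the folder and sequence identifiers are hypothetical names.

```python
from collections import defaultdict


class FavoriteStore:
    """Hypothetical shared store: every connected platform uses the same records."""

    def __init__(self):
        self._items = defaultdict(list)  # (user, folder) -> saved sequence IDs
        self._listeners = []             # callbacks registered by connected platforms

    def subscribe(self, callback):
        self._listeners.append(callback)

    def save(self, platform, user, folder, seq_id):
        self._items[(user, folder)].append(seq_id)
        # Notify every platform so that open views can refresh immediately.
        for notify in self._listeners:
            notify(platform, user, folder, seq_id)

    def folder(self, user, folder):
        return list(self._items[(user, folder)])


class Platform:
    """A hypothetical platform front end (e.g. CFGP 2.0 or FTFD) sharing one store."""

    def __init__(self, name, store):
        self.name = name
        self.store = store
        self.events = []  # saves made on other platforms that this one has seen
        store.subscribe(self._on_save)

    def _on_save(self, source, user, folder, seq_id):
        if source != self.name:
            self.events.append((source, seq_id))

    def save(self, user, folder, seq_id):
        self.store.save(self.name, user, folder, seq_id)
```

Because there is one copy of each Favorite folder rather than per-platform replicas, "synchronization" here is immediate by construction; the notification list only serves to update user interfaces that are already open.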

FUTURE DIRECTIONS

To keep pace with the rapid release and updating of eukaryotic genomes, CFGP 2.0 will be updated on a regular basis. We will integrate additional modules, software and interface improvements to continuously improve the environment for users conducting comparative and evolutionary genomics studies. To support efforts to uncover the possible functions of the many hypothetical genes, the ortholog database will be expanded with information from more species.

SUPPLEMENTARY DATA

Supplementary Data are available at NAR Online: Supplementary Tables 1 and 2.

FUNDING

National Research Foundation of Korea grants funded by the Korean government [2012-0001149 and 2012-0000141]; TDPAF [309015-04-SB020]; Next-Generation BioGreen 21 Program of the Rural Development Administration in Korea [PJ00821201]; a graduate fellowship through the Brain Korea 21 Program (to J.C., K.C. and J.J.). Funding for open access charge: Seoul National University.

Conflict of interest statement. None declared.

REFERENCES

1.  Life with 6000 genes.
Authors: A Goffeau; B G Barrell; H Bussey; R W Davis; B Dujon; H Feldmann; F Galibert; J D Hoheisel; C Jacq; M Johnston; E J Louis; H W Mewes; Y Murakami; P Philippsen; H Tettelin; S G Oliver
Journal: Science. Date: 1996-10-25.

12.  NCBI BLAST: a better web interface.
Authors: Mark Johnson; Irena Zaretskaya; Yan Raytselis; Yuri Merezhuk; Scott McGinnis; Thomas L Madden
Journal: Nucleic Acids Res. Date: 2008-04-24.

13.  InterPro: the integrative protein signature database.
Authors: Sarah Hunter; Rolf Apweiler; Teresa K Attwood; Amos Bairoch; Alex Bateman; David Binns; Peer Bork; Ujjwal Das; Louise Daugherty; Lauranne Duquenne; Robert D Finn; Julian Gough; Daniel Haft; Nicolas Hulo; Daniel Kahn; Elizabeth Kelly; Aurélie Laugraud; Ivica Letunic; David Lonsdale; Rodrigo Lopez; Martin Madera; John Maslen; Craig McAnulla; Jennifer McDowall; Jaina Mistry; Alex Mitchell; Nicola Mulder; Darren Natale; Christine Orengo; Antony F Quinn; Jeremy D Selengut; Christian J A Sigrist; Manjula Thimma; Paul D Thomas; Franck Valentin; Derek Wilson; Cathy H Wu; Corin Yeats
Journal: Nucleic Acids Res. Date: 2008-10-21.

21.  Feature-based prediction of non-classical and leaderless protein secretion.
Authors: Jannick Dyrløv Bendtsen; Lars Juhl Jensen; Nikolaj Blom; Gunnar Von Heijne; Søren Brunak
Journal: Protein Eng Des Sel. Date: 2004-04-28.

24.  ChloroP, a neural network-based method for predicting chloroplast transit peptides and their cleavage sites.
Authors: O Emanuelsson; H Nielsen; G von Heijne
Journal: Protein Sci. Date: 1999-05.

26.  A hidden Markov model for predicting transmembrane helices in protein sequences.
Authors: E L Sonnhammer; G von Heijne; A Krogh
Journal: Proc Int Conf Intell Syst Mol Biol. Date: 1998.

30.  NetOglyc: prediction of mucin type O-glycosylation sites based on sequence context and surface accessibility.
Authors: J E Hansen; O Lund; N Tolstrup; A A Gooley; K L Williams; S Brunak
Journal: Glycoconj J. Date: 1998-02.

35.  Fungal cytochrome P450 database.
Authors: Jongsun Park; Seungmin Lee; Jaeyoung Choi; Kyohun Ahn; Bongsoo Park; Jaejin Park; Seogchan Kang; Yong-Hwan Lee
Journal: BMC Genomics. Date: 2008-08-28.

37.  SysPIMP: the web-based systematical platform for identifying human disease-related mutated sequences from mass spectrometry.
Authors: Hong Xi; Jongsun Park; Guohui Ding; Yong-Hwan Lee; Yixue Li
Journal: Nucleic Acids Res. Date: 2008-11-26.

38.  IMGD: an integrated platform supporting comparative genomics and phylogenetics of insect mitochondrial genomes.
Authors: Wonhoon Lee; Jongsun Park; Jaeyoung Choi; Kyongyong Jung; Bongsoo Park; Donghan Kim; Jaeyoung Lee; Kyohun Ahn; Wonho Song; Seogchan Kang; Yong-Hwan Lee; Seunghwan Lee
Journal: BMC Genomics. Date: 2009-04-07.