Jongsun Park, Bongsoo Park, Kyongyong Jung, Suwang Jang, Kwangyul Yu, Jaeyoung Choi, Sunghyung Kong, Jaejin Park, Seryun Kim, Hyojeong Kim, Soonok Kim, Jihyun F Kim, Jaime E Blair, Kwangwon Lee, Seogchan Kang, Yong-Hwan Lee.
Abstract
Since the completion of the Saccharomyces cerevisiae genome sequencing project in 1996, the genomes of over 80 fungal species have been sequenced or are currently being sequenced. The resulting data provide opportunities for studying and comparing fungal biology and evolution at the genome level. To support such studies, the Comparative Fungal Genomics Platform (CFGP; http://cfgp.snu.ac.kr), a web-based multifunctional informatics workbench, was developed. The CFGP comprises three layers: the basal layer, the middleware and the user interface. The data warehouse in the basal layer contains standardized genome sequences of 65 fungal species. The middleware processes queries via six analysis tools: BLAST, ClustalW, InterProScan, SignalP 3.0, PSORT II and a newly developed tool named BLASTMatrix. BLASTMatrix permits the identification and visualization of genes homologous to a query across multiple species. The Data-driven User Interface (DUI) of the CFGP is built on a new concept of pre-collecting data and post-executing analysis, in contrast to the 'fill-in-the-form-and-press-SUBMIT' user interfaces used by most bioinformatics sites. Another novel feature of the DUI is a tool termed Favorite, which supports the management of encapsulated sequence data and provides users with a personalized data repository.
Year: 2007 PMID: 17947331 PMCID: PMC2238957 DOI: 10.1093/nar/gkm758
Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971
List of genome sequences stored in the data warehouse of the CFGP
| Species | Size (Mb) | No. of ORFs | Source | Reference |
|---|---|---|---|---|
| | ||||
| | 2.3 | 1727 | NCBI | ( |
| | 8.7 | 7769 | CBS | ( |
| | 9.0 | 7575 | CBS | |
| | ||||
| | 4.6 | 4311 | NCBI | ( |
| | 7.1 | 6137 | NCBI | ( |
| | ||||
| | 0.7 | 627 | CBS | ( |
| | ||||
| | 34.7 | 3241 | SGTC | ( |
| | ||||
| | ||||
| | ||||
| | 42.7 | 16 448 | BI | |
| | 38.3 | 14 522 | BI | |
| | 27.9 | 9119 | BI | |
| | 32.6 | 10 403 | BI | |
| | 36.8 | 12 587 | BI | |
| | 28.8 | 9926 | TIGR | ( |
| | 30.1 | 10 701 | BI | ( |
| | 37.1 | 12 062 | DOGAN | ( |
| | 29.3 | 10 406 | BI | |
| | 37.2 | 11 200 | JGI | |
| | 28.9 | 10 457 | BI | |
| | 55.6 | | BI | |
| | 28.9 | | BI | |
| | 27.4 | | BI | |
| | 28.1 | | BI | |
| | 33.0 | 9349 | BI | |
| | 22.3 | 7798 | BI | |
| | 34.9 | 11 124 | BI | |
| | 36.6 | 13 321 | BI | ( |
| | 15.1 | | BI | ( |
| | 61.4 | 17 608 | BI | |
| | 41.9 | 14 155 | BI | |
| | 51.3 | 15 707 | JGI | |
| | 41.6 | 12 841 | BI | ( |
| | 39.2 | 10 620 | BI | ( |
| | 35.7 | 9872 | IGM | |
| | 34.5 | 9997 | JGI | |
| | 32.0 | | WGSC | |
| | 38.0 | | BI | |
| | 41.9 | 11 395 | JGI | |
| | 73.4 | 10 313 | JGI | |
| | 37.2 | 16 597 | BI | |
| | ||||
| | 27.8 | 14 216 | SGTC | ( |
| | 14.5 | 6157 | BI | |
| | 14.5 | 6027 | SI | |
| | 12.3 | 5174 | CBS | ( |
| | 10.6 | 5920 | BI | |
| | 12.1 | 5941 | BI | |
| | 13.1 | | SI | |
| | 14.7 | 6258 | BI | |
| | 12.2 | 6354 | CBS | ( |
| | 8.7 | 4718 | NCBI | ( |
| | 10.7 | 5327 | Genoscope | |
| | 10.6 | 5214 | BI | ( |
| | 15.5 | 5796 | BI | |
| | 12.2 | 5898 | SGD | ( |
| | 11.7 | 5383 | BI | |
| | 11.9 | 5471 | SI | |
| | 11.5 | 9385 | BI | ( |
| | 11.4 | 4677 | VBI | ( |
| | 11.2 | 3768 | VBI | |
| | 11.0 | 2968 | WUGSC | ( |
| | 11.5 | 9016 | BI | ( |
| | 11.9 | 8939 | BI | ( |
| | 15.4 | 5839 | JGI | ( |
| | 20.5 | 6524 | CBS | ( |
| | ||||
| | 6.3 | 4020 | SI | |
| | 12.6 | 5005 | GeneDB | ( |
| | 11.3 | 5172 | BI | |
| | ||||
| | ||||
| | 90.9 | 17 173 | JGI | |
| | 30.0 | 10 048 | JGI | ( |
| | 36.3 | 13 544 | BI | |
| | 64.9 | 20 614 | JGI | |
| | 19.5 | 7302 | BI | |
| | 19.0 | 6870 | NCBI | |
| | 19.3 | 6578 | SGTC | ( |
| | 19.1 | 6475 | SGTC | ( |
| | ||||
| | 21.2 | 5536 | JGI | |
| | 88.7 | 20 567 | BI | |
| | ||||
| | 19.7 | 6689 | BI | ( |
| | 19.7 | | BI | ( |
| | ||||
| | 23.9 | 8818 | BI | |
| | ||||
| | 45.3 | 17 467 | BI | |
| | 55.9 | 14 792 | JGI | |
| | ||||
| | 2.5 | 1996 | Genoscope | ( |
| | 6.1 | 2606 | JBPC | |
| | ||||
| | ||||
| | 228.5 | 22 658 | BI | |
| | 86.0 | 19 276 | JGI | ( |
| | 66.7 | 16 066 | JGI | ( |
| | 83.8 | | VBI | |
| | ||||
| | ||||
| | 119.2 | 28 581 | TAIR | ( |
| | 370.8 | 37 555 | IRGSP | ( |
| | 426.3 | 49 710 | BGI | ( |
| | 485.5 | 58 036 | JGI | ( |
| | 251.7 | 40 567 | MTGSP | |
| | ||||
| | ||||
| | 287.8 | 15 802 | Ensembl | ( |
| | 118.4 | 19 389 | BDGP | ( |
| | ||||
| | 356.6 | 27 273 | JGI | ( |
| | ||||
| | 100.3 | 21 124 | NCBI | ( |
| | ||||
| | 173.5 | 19 744 | Ensembl | ( |
| | 177.0 | 20 150 | Ensembl | |
| | ||||
| | 1636.5 | 14 966 | Ensembl | |
| | 402.2 | 28 005 | Ensembl | |
| | 1510.9 | 28 305 | Ensembl | |
| | 3144.2 | 32 991 | Ensembl | |
| | 2519.8 | 30 308 | Ensembl | |
| | 1105.2 | 24 166 | Ensembl | |
| | 4295.0 | 39 648 | Ensembl | |
| | 2724.2 | 36 471 | Ensembl | ( |
| | 2718.9 | 32 543 | Ensembl | |
| | 3418.7 | 33 869 | Ensembl | ( |
| 28 984.2 | 1 353 360 |
a: SGTC, Stanford Genome Technology Center; SI, Sanger Institute; CBS, Center for Biological Sequences; BI, Broad Institute; WGSC (also WUGSC), Washington Univ. Genome Sequencing Center; JGI, DOE Joint Genome Institute; DOGAN, Database of the Genomes Analyzed at NITE; IGM, Institut de Génétique et Microbiologie; TAIR, The Arabidopsis Information Resource; IRGSP, International Rice Genome Sequencing Project; BDGP, Berkeley Drosophila Genome Project; BGI, Beijing Genomics Institute; VBI, Virginia Bioinformatics Institute; JBPC, Josephine Bay Paul Center for Comparative Molecular Biology and Evolution; MTGSP, Medicago truncatula Genome Sequencing Project.
b: Taxonomy based on (66).
c: Taxonomy based on (67).
d: Taxonomy based on (12).
e: Incomplete coverage of genome information.
Figure 1. Overall system architecture and data flow in the CFGP. The basal layer contains a data warehouse, Favorite (a personal data repository and management tool), and external databases, such as InterPro and GO, stored in the CFGP. The wrapper in the middle layer relays requests from the UI to both the internal and external programs. The task manager at the right side of the wrapper manages tasks by assigning them to servers. At the upper layer, the DUI, a template engine developed with PHP, operates. A ‘command’ from the user goes to the middle layer. The basal layer passes the data to the middle layer as ‘input’. At the middle layer, chosen programs generate results and pass them to the upper layer for ‘representation’ and to the basal layer for ‘storage’.
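The data flow in Figure 1 can be illustrated with a minimal Python sketch. It is not the CFGP implementation (the caption states only that the DUI is written in PHP); the class names `Wrapper` and `TaskManager`, the server names, the toy `length` tool and the least-loaded assignment policy are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class TaskManager:
    """Assigns each task to a server (assumed policy: fewest tasks so far)."""
    servers: List[str]
    load: Dict[str, int] = field(default_factory=dict)

    def assign(self) -> str:
        # min() returns the first server in list order when loads are tied.
        server = min(self.servers, key=lambda s: self.load.get(s, 0))
        self.load[server] = self.load.get(server, 0) + 1
        return server


@dataclass
class Wrapper:
    """Relays a 'command' from the UI to a registered analysis program."""
    task_manager: TaskManager
    tools: Dict[str, Callable[[List[str]], object]] = field(default_factory=dict)
    storage: List[dict] = field(default_factory=list)  # stand-in for the basal layer

    def register(self, name: str, tool: Callable[[List[str]], object]) -> None:
        self.tools[name] = tool

    def run(self, command: str, sequences: List[str]) -> dict:
        if command not in self.tools:
            raise KeyError(f"unknown tool: {command}")
        server = self.task_manager.assign()
        result = self.tools[command](sequences)  # 'input' comes from the basal layer
        record = {"command": command, "server": server, "result": result}
        self.storage.append(record)  # 'storage' in the basal layer
        return record                # 'representation' in the upper layer


# Hypothetical usage: a toy tool that reports sequence lengths.
wrapper = Wrapper(TaskManager(["node1", "node2"]))
wrapper.register("length", lambda seqs: [len(s) for s in seqs])
print(wrapper.run("length", ["MKTAYIAK", "MSL"]))
```

In this sketch the wrapper is the single entry point for every tool, so a new program is added by registering one callable rather than by changing the user interface.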
Figure 2. Structure of the DUI. (A) A screenshot showing data acquisition from the Contig Browser. On the left, the ‘Data Frame’ displays the list of Magnaporthe oryzae proteins; on the right, the ‘Manipulation Frame’ shows a list of Favorites. The ‘Collection Arrow’ in the middle transfers chosen sequences from the Data Frame to the Manipulation Frame. (B) Collected sequences can be analysed with the data analysis tools in Favorite. Users choose sequences by clicking the checkbox in front of each sequence. (C) A BLAST search output is shown with Favorite in the Manipulation Frame. From the BLAST result, users can transfer sequences to Favorite using the ‘Collection Arrow’.
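The collect-first, analyse-later workflow shown in Figure 2 can be sketched in a few lines of Python. This is a hypothetical model, not the CFGP code: the `Favorite` class, its method names and the sequence identifiers are assumptions chosen to mirror the caption (collection from a data source, checkbox selection, then analysis).

```python
class Favorite:
    """A personal repository of collected sequences (illustrative only)."""

    def __init__(self, name):
        self.name = name
        self.sequences = {}  # sequence id -> sequence, in collection order

    def collect(self, records):
        """Add (id, sequence) pairs, e.g. from a browser page or a BLAST result.

        An id that is already stored keeps its original sequence.
        """
        for seq_id, seq in records:
            self.sequences.setdefault(seq_id, seq)

    def analyze(self, tool, selected=None):
        """Run `tool` on the selected ids (all stored sequences if None)."""
        if selected is None:
            ids = list(self.sequences)
        else:
            ids = [i for i in selected if i in self.sequences]
        return tool({i: self.sequences[i] for i in ids})


# Hypothetical usage: collect twice, then analyse only the ticked sequence.
fav = Favorite("demo")
fav.collect([("prot_1", "MKTAYIAK"), ("prot_2", "MSL")])
fav.collect([("prot_3", "MAAV")])  # e.g. transferred from a BLAST result
lengths = fav.analyze(lambda seqs: {i: len(s) for i, s in seqs.items()},
                      selected=["prot_2"])
print(lengths)
```

The point of the design is that data selection and tool execution are decoupled: the same collected set can be passed to any analysis tool without re-entering sequences in a form.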
Figure 3. Format of BLASTMatrix output. An example of BLASTMatrix output generated using the aflatoxin gene cluster in Aspergillus nidulans as queries. The results are presented in a matrix format (A) and as a distribution based on e-value (B). Additionally, BLASTMatrix analyses the pattern of conservation in the BLASTMatrix dataset (such as ‘novel gene’, ‘highly conserved gene’ or ‘taxon-specific gene’) based on the distribution pattern of matched genes in all screened taxa.
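The matrix view and the conservation labels in Figure 3 can be approximated with the following Python sketch. It is an illustration under stated assumptions, not the BLASTMatrix algorithm: the e-value cutoff (1e-5), the 80% threshold for ‘highly conserved gene’, the fallback label ‘other’, and all query, species and taxon names are hypothetical.

```python
def build_matrix(queries, species, hits, evalue_cutoff=1e-5):
    """Return {query: {species: best e-value, or None if no hit passes}}.

    `hits` is an iterable of (query, species, e-value) tuples, such as
    could be parsed from tabular BLAST output.
    """
    matrix = {q: {sp: None for sp in species} for q in queries}
    for query, sp, evalue in hits:
        if query not in matrix or sp not in matrix[query]:
            continue
        if evalue > evalue_cutoff:
            continue
        current = matrix[query][sp]
        if current is None or evalue < current:
            matrix[query][sp] = evalue  # keep the best (lowest) e-value
    return matrix


def classify(row, taxon_of, conserved_fraction=0.8):
    """Label one matrix row by how its matches are distributed over taxa."""
    matched = [sp for sp, evalue in row.items() if evalue is not None]
    if not matched:
        return "novel gene"
    if len(matched) / len(row) >= conserved_fraction:
        return "highly conserved gene"
    if len({taxon_of[sp] for sp in matched}) == 1:
        return "taxon-specific gene"
    return "other"  # assumed fallback label, not taken from the source


# Hypothetical data: three queries screened against five species in two taxa.
species = ["sp1", "sp2", "sp3", "sp4", "sp5"]
taxon_of = {"sp1": "taxon_A", "sp2": "taxon_A",
            "sp3": "taxon_B", "sp4": "taxon_B", "sp5": "taxon_B"}
hits = [("q1", sp, 1e-50) for sp in species]
hits += [("q2", "sp1", 1e-20), ("q2", "sp2", 1e-30), ("q2", "sp3", 0.5)]
matrix = build_matrix(["q1", "q2", "q3"], species, hits)
for query, row in matrix.items():
    print(query, classify(row, taxon_of))
```

Keeping only the best e-value per query and species gives one cell per matrix position; the same values could then be binned to produce a distribution like panel (B).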