Literature DB >> 24165882

YeastNet v3: a public database of data-specific and integrated functional gene networks for Saccharomyces cerevisiae.

Hanhae Kim1, Junha Shin, Eiru Kim, Hyojin Kim, Sohyun Hwang, Jung Eun Shim, Insuk Lee.   

Abstract

Saccharomyces cerevisiae, i.e. baker's yeast, is a widely studied model organism in eukaryote genetics because of its simple protocols for genetic manipulation and phenotype profiling. The high abundance of publicly available data that has been generated through diverse 'omics' approaches has led to the use of yeast for many systems biology studies, including large-scale gene network modeling to better understand the molecular basis of the cellular phenotype. We have previously developed a genome-scale gene network for yeast, YeastNet v2, which has been used for various genetics and systems biology studies. Here, we present an updated version, YeastNet v3 (available at http://www.inetbio.org/yeastnet/), that significantly improves the prediction of gene-phenotype associations. The extended genome in YeastNet v3 covers up to 5818 genes (∼99% of the coding genome) wired by 362 512 functional links. YeastNet v3 provides a new web interface to run the tools for network-guided hypothesis generations. YeastNet v3 also provides edge information for all data-specific networks (∼2 million functional links) as well as the integrated networks. Therefore, users can construct alternative versions of the integrated network by applying their own data integration algorithm to the same data-specific links.

Entities:  

Mesh:

Substances:

Year:  2013        PMID: 24165882      PMCID: PMC3965021          DOI: 10.1093/nar/gkt981

Source DB:  PubMed          Journal:  Nucleic Acids Res        ISSN: 0305-1048            Impact factor:   16.971


INTRODUCTION

The complete mapping of gene-to-phenotype associations, which is a fundamental goal in the field of genetics, seems unconquerable due in part to the complex functional relationship among genes. However, the relationship among genes can be used to identify gene-to-phenotype associations based on the principle of guilt-by-association (1). Therefore, enormous efforts have been made to construct genome-scale gene networks to enable the effective prediction of novel gene functions and gene-to-phenotype associations in various organisms. This task requires the development of machine learning algorithms to infer and integrate functional links from a wide variety of biological data as well as efficient graph analysis algorithms to generate novel biological hypotheses from gene networks. Saccharomyces cerevisiae, i.e. baker’s yeast, has been widely studied for two reasons. First, S. cerevisiae is a single-cell eukaryote in which many human cellular machineries have been conserved through evolution. Second, the genetic manipulation and phenotype profiling of S. cerevisiae are simple protocols to execute. Therefore, numerous high-throughput studies have been conducted, which have resulted in the generation of large data sets that have been deposited into public databases. Access to this abundant data, which has been derived from diverse experimental and informational technologies, provides a unique opportunity to develop methods in network construction and analysis. Over the past several years, we have used these data to develop genome-scale gene networks for yeast in YeastNet (2,3). YeastNet v2 (3) significantly improved the original version of YeastNet (2) by removing the functional bias in the gold-standard links to enable functionally unbiased network training. YeastNet v2 included 5483 coding genes with 102 803 cofunctional links and showed high predictive power for independent pathway annotations and knockout (KO) phenotypes (4). YeastNet v2 proved to be useful for novel systems biology applications, including the network-based systematic reconstruction of the Gene Ontology of yeast genes (5), the prediction of the phenotypic effect of personal genome variations in yeast (6) and the prediction of epistasis (7). The success of YeastNet for these proof-of-concept novel systems biology studies has demonstrated the power of genome-scale gene networks in biological research. In the 5 years since the release of YeastNet v2, the availability of large-scale yeast data in public databases has continued to grow. The incorporation of these data into the existing network is expected to further improve the functionality of YeastNet. Here, we present an updated version of YeastNet, YeastNet v3, that substantially improves phenotype prediction via several new approaches: (i) an extended and less-biased set of gold-standard cofunctional links for more efficient learning; (ii) the incorporation of a large number of additional large-scale experimental data; (iii) improved algorithms to infer functional associations from each data type; (iv) an improved web interface to serve various routs of novel hypothesis generation based on the principle of guilt-by-association; and (v) the availability of ∼2 million data-specific functional links that can be used to construct alternative integrated networks with user-defined integration methods.

CONSTRUCTION OF YeastNet v3

The cofunctional links of nine different data types (CC, co-citation; CX, co-expression; DC, domain co-occurrence; GN, gene neighbor; GT, genetic interaction; HT, high-throughput protein–protein interaction; LC, literature curated protein–protein interaction; PG, phylogenetic profiles; TS, tertiary structure of protein) in YeastNet v3 are summarized in Table 1. YeastNet v3 includes 5818 genes (∼99% of the yeast coding genome) with 362 512 cofunctional links, which is ∼3.5 times the number of cofunctional links that were included in YeastNet v2. A total of 81 996 links (∼80% of YeastNet v2 links) were retained in the new network. The additional cofunctional links were inferred from new data and algorithms, which are described in the supplementary online methods.
Table 1.

The cofunctional links of nine data types in YeastNet v3

Network descriptionNumber of proteins (coverage of coding genome)Number of functional associations
Co-citation (CC)4355 (74%)82 427
Co-expression (CX)5730 (97%)242 504
Domain co-occurrence (DC)3679 (62%)29 880
Genomic neighbor (GN)1863 (32%)29 475
Genetic interaction (GT)4365 (74%)149 498
High-throughput PPIa (HT)5487 (93%)141 347
Literature curated PPIa (LC)5293 (90%)54 421
Phylogenetic profiles (PG)2463 (42%)54 496
Tertiary structure of protein (TS)1101 (19%)3510
YeastNet v35818 (99%)362 512

aPPI: protein–protein interaction.

The cofunctional links of nine data types in YeastNet v3 aPPI: protein–protein interaction.

ASSESSMENT OF YeastNet v3 FOR IMPROVED PREDICTABILITY

Assessment by pathway annotations

To determine whether the rewired and additional functional links of YeastNet v3 improved the prediction of pathways and phenotypes in yeast, we assessed the predictive power of YeastNet v3 on different data sets. First, YeastNet v3 was assessed by pathway annotations in the Kyoto Encyclopedia of Genes and Genomes (KEGG) (8) database to test whether the measured accuracy of cofunctional links can be generalized. KEGG provides 279 327 positive and 1 385 073 negative gold-standard links to test. Only 10 064 links in the KEGG database (12.5%) overlap with the network construction in YeastNet v3, which guarantees that the validation set is independent from the training set. Precision-recall curves for the networks generated from the individual data types and integrated network are summarized in Figure 1. These curves show a substantial improvement in YeastNet v3 compared with the networks constructed by the individual data types as well as the previous version of YeastNet. Because the same benchmarking and integration methods were used in the current and previous versions of YeastNet, the improvements in the performance of YeastNet v3 are likely due to two important changes in the current version. First, the method of linkage discovery was changed for the phylogenetic profile similarity, genomic neighborhood and genetic interaction data types, and domain co-occurrence analysis was incorporated. Second, the amount of input data has grown significantly since the previous version release. The number of Medline abstracts that cite yeast genes grew from 39 135 to 46 111, and the number of reference genomes for the genome context methods grew from 149 to 2144. We also included six new high-throughput protein-protein interaction (PPI) data sets in YeastNet v3, and 1639 microarray samples were used in the current version of the network compared with the 500 microarray samples incorporated in the earlier version of the network. Therefore, both the algorithm improvements and the data expansion have enhanced the quality of the inferred molecular network.
Figure 1.

Precision-recall curves for YeastNet v3, YeastNet v2 and incorporated individual functional networks for nine distinct data types (CC, co-citation; CX, co-expression; DC, domain co-occurrence; GN, gene neighbor; GT, genetic interaction; HT, high-throughput protein–protein interaction; LC, literature curated protein–protein interaction; PG, phylogenetic profiles; TS, tertiary structure of protein). Precision was calculated from cofunctional links that were derived from the KEGG pathway annotation database, which were largely independent from the links in the network training set. Recall was measured as the percentage of coverage of the 5887 validated coding genes in the yeast genome. Gene pairs for each functional network were ranked by log likelihood scores from the benchmarking process, as described in the text. Precision and recall were calculated in a cumulative manner in which every consecutive 1000 gene pairs were binned (as indicated by each symbol). The plot shows that YeastNet v3 outperforms all other networks, including YeastNet v2.

Precision-recall curves for YeastNet v3, YeastNet v2 and incorporated individual functional networks for nine distinct data types (CC, co-citation; CX, co-expression; DC, domain co-occurrence; GN, gene neighbor; GT, genetic interaction; HT, high-throughput protein–protein interaction; LC, literature curated protein–protein interaction; PG, phylogenetic profiles; TS, tertiary structure of protein). Precision was calculated from cofunctional links that were derived from the KEGG pathway annotation database, which were largely independent from the links in the network training set. Recall was measured as the percentage of coverage of the 5887 validated coding genes in the yeast genome. Gene pairs for each functional network were ranked by log likelihood scores from the benchmarking process, as described in the text. Precision and recall were calculated in a cumulative manner in which every consecutive 1000 gene pairs were binned (as indicated by each symbol). The plot shows that YeastNet v3 outperforms all other networks, including YeastNet v2.

Assessment by quantitative phenotype profile data

The holy grail of genetics is to understand the genetic organization of phenotypes. Therefore, we have pursued the development of gene networks as a platform to dissect the genetics of complex phenotypes. The previous version of YeastNet was able to predict the loss-of-function phenotype (4). We tested whether the improved quality of the current version of the network was able to generate a more sensitive and cost-effective genetic prediction for diverse phenotypes. First, we compared YeastNet v3 with the previous version of YeastNet using KO phenotype data. We used the same set of 100 KO phenotypes collected from the literature that was used for the previous network study (4) and assessed the predictive power of the networks by a receiver operating characteristic (ROC) curve analysis. Based on the leave-one-out method of cross-validation, we measured the prioritizing power of every test gene for each phenotype by the network links among member genes in the network. The resultant ROC curves were summarized by area under the curve (AUC) scores, which are summarized in Figure 2 for 100 tested KO phenotypes. We found that YeastNet v3 shows significant improvements in the prediction of the 100 KO phenotypes compared with the previous network (Figure 2a, P < 1.17 × 10−9, Wilcoxon signed rank sum test)
Figure 2.

Box-and-whisker plots summarize the predictive power of networks for various phenotype data sets: (a) 100 knockout phenotypes (KO); (b) 586 high-dimensional morphology parameters (HDM); and (c) 88 types of chemical/environmental sensitivities (CES). The predictive power of the phenotypes was measured by an ROC curve analysis and summarized as area under the curve (AUC). AUC scores present how well a network recovers the connectivity among genes for a given phenotype, where an AUC of 0.5 indicates a prediction based on chance and an AUC of 1 indicates a perfect prediction. In the given box-and-whisker plots, the boundaries of the box represent the first and third quartiles, the whiskers represent the 10th and 90th percentiles, and the black circles represent individual outliers.

Box-and-whisker plots summarize the predictive power of networks for various phenotype data sets: (a) 100 knockout phenotypes (KO); (b) 586 high-dimensional morphology parameters (HDM); and (c) 88 types of chemical/environmental sensitivities (CES). The predictive power of the phenotypes was measured by an ROC curve analysis and summarized as area under the curve (AUC). AUC scores present how well a network recovers the connectivity among genes for a given phenotype, where an AUC of 0.5 indicates a prediction based on chance and an AUC of 1 indicates a perfect prediction. In the given box-and-whisker plots, the boundaries of the box represent the first and third quartiles, the whiskers represent the 10th and 90th percentiles, and the black circles represent individual outliers. Most, if not all, phenotypes are quantitative such that the loss of a single gene does not shut down an entire system but provides some extent of phenotypic influence. The degree of phenotypic influence varies by gene and follows an approximately normal distribution. We used genome-wide high-dimensional morphology (HDM) profile data (9) and chemical/environmental sensitivity (CES) profile data (10) to test the predictive power of the network for quantitative phenotypes. There were 501 different HDM parameters that were profiled for 4718 genes with KO mutants. We generated 1002 test gene sets using genes with extreme morphological parameter values from both sides of the distribution (with a P-value threshold of 10−4). Among these test gene sets, we used 586 sets that each contained no less than five member genes to assess the predictive power of the network. From the CES database, we selected CES phenotypes that had no less than five sensitive genes with a P-value of 10−7 after five generations, for a total of 88 gene sets. For both quantitative phenotype data sets, we observed a significant improvement in YeastNet v3 compared with the previous network (Figure 2b and c, P < 2.2 × 10−16 for both HDM and CES, Wilcoxon signed rank sum test). From these results, we conclude that the improved network quality has enhanced the effectiveness of YeastNet v3 for predictive genetics research.

WEB-BASED SERVICES FOR YeastNet v3

Network-guided novel hypothesis generation

The principle of guilt-by-association has been an effective method to generate novel hypothesis based on molecular networks. The previous version of YeastNet was used to successfully identify novel ribosomal biogenesis genes (11) and shmoo localized proteins (12). Both the quality and usability of a network are important to promote network-guided predictive genetics among experimental biologists. Therefore, we completely re-implemented the web interface for YeastNet v3 to provide three options for hypothesis generation under ‘network search’ (Figure 3): (i) find new members of a pathway; (ii) infer functions from network neighbors; and (iii) find modulators for a cell state.
Figure 3.

A schematic figure of the three options for network-guided hypothesis generation that are implemented in the YeastNet v3 web server.

A schematic figure of the three options for network-guided hypothesis generation that are implemented in the YeastNet v3 web server. From the ‘Find new members of a pathway’ option, users can predict novel candidate genes for a query functional concept (e.g. function, pathway or phenotype) based on the network associations of the user-provided genes that are known to be related to the query functional concept. Users can also visualize a network for the user-provided genes and their neighbors on YeastNet v3 using Cytoscape web (13). From the ‘Infer functions from network neighbors’ option, users can also predict novel functions for uncharacterized genes. If a user submits a query gene, then YeastNet looks up its network neighbors, collects their annotated Gene Ontology biological process terms and lists the 30 most enriched annotations. YeastNet also provides a tool to predict modulators for a cell state. Yeast cells change their physiological state in response to harsh environments or chemical treatments. Many genetic modulators are involved in the transition between cell states. Although modulators are often transcriptionally activated, many modulators do not significantly alter their transcription levels during cell state modulation. Therefore, the expression profile may be insufficient to identify the genetic modulators for a cell state. Our hypothesis is that the cell state modulators are functionally coupled in YeastNet. Therefore, network links could identify modulators with no transcriptional change by their connections to genes with transcriptional changes. For the ‘Find modulators for a cell state’ option, the user submits a set of differentially expressed genes (DEGs) for a cell state. YeastNet v3 predefines gene sets for each gene in the network with their network neighbors such that the name of the gene set was assigned by the central hub gene. For the given query DEGs, the network server calculates the significance of the association between the DEGs and each gene set by hypergeometric probability. If a gene set is associated with the DEGs, its central hub gene is a candidate modulator for the query cell state. For example, we submitted 66 DEGs upon the treatment of an antifungal drug, sampangine (14). Because sampangine perturbs heme metabolism, the treatment of the drug shows a similar transcriptional response to hypoxia. The results showed that many genes that are required for anaerobic growth were highly ranked. However, many of the top candidates did not show a transcriptional response to sampangine, but did show strong associations to the DEGs by network links. For example, FLR1, a multidrug transporter, which might modulate drug response state was highly ranked (sixth) by YeastNet, but did not show a transcriptional response. Therefore, YeastNet provides a complement to DEG analysis to find cell state modulators.

Availability of data-specific functional link information

A researcher may want to investigate alternative methods to integrate networks or perform in-depth analysis for each type of functional association between genes. To meet these potential demands, YeastNet v3 releases network edge information for data-specific networks in addition to the integrated network. This information can be downloaded from the ‘Network-download’ page. YeastNet v3 provides networks for all individual data sets (50 sets for CX, 10 sets for GT and 12 sets for HT), which include a total of ∼2 million edges.

CONCLUSIONS

YeastNet v3 is a freely available database that provides a comprehensive cofunctional gene network for S. cerevisiae. This version of YeastNet has been substantially improved from previous versions by including novel linkage discovery algorithms, additional input data and versatile web-based analysis tools. YeastNet users will be able to generate diverse biological hypotheses with higher accuracy and sensitivity. We also expect that YeastNet v3 will improve other related networks and network construction tools using orthology-based links from YeastNet (15). We are confident that YeastNet v3 will provide a significant resource for predictive genetics and continue to facilitate new challenges in systems biology.

SUPPLEMENTARY DATA

Supplementary Data are available at NAR Online.

FUNDING

National Research Foundation of Korea [2010-0017649, 2012M3A9B4028641, 2012M3A9C7050151] and the Next-Generation BioGreen 21 Program [SSAC, PJ009029 to I.L.]. Funding for open access charge: National Research Grant. Conflict of interest statement. None declared.
  14 in total

1.  Predicting genetic modifier loci using functional gene networks.

Authors:  Insuk Lee; Ben Lehner; Tanya Vavouri; Junha Shin; Andrew G Fraser; Edward M Marcotte
Journal:  Genome Res       Date:  2010-06-09       Impact factor: 9.043

2.  High-dimensional and large-scale phenotyping of yeast mutants.

Authors:  Yoshikazu Ohya; Jun Sese; Masashi Yukawa; Fumi Sano; Yoichiro Nakatani; Taro L Saito; Ayaka Saka; Tomoyuki Fukuda; Satoru Ishihara; Satomi Oka; Genjiro Suzuki; Machika Watanabe; Aiko Hirata; Miwaka Ohtani; Hiroshi Sawai; Nicolas Fraysse; Jean-Paul Latgé; Jean M François; Markus Aebi; Seiji Tanaka; Sachiko Muramatsu; Hiroyuki Araki; Kintake Sonoike; Satoru Nogami; Shinichi Morishita
Journal:  Proc Natl Acad Sci U S A       Date:  2005-12-19       Impact factor: 11.205

3.  The chemical genomic portrait of yeast: uncovering a phenotype for all genes.

Authors:  Maureen E Hillenmeyer; Eula Fung; Jan Wildenhain; Sarah E Pierce; Shawn Hoon; William Lee; Michael Proctor; Robert P St Onge; Mike Tyers; Daphne Koller; Russ B Altman; Ronald W Davis; Corey Nislow; Guri Giaever
Journal:  Science       Date:  2008-04-18       Impact factor: 47.728

4.  A probabilistic functional network of yeast genes.

Authors:  Insuk Lee; Shailesh V Date; Alex T Adai; Edward M Marcotte
Journal:  Science       Date:  2004-11-26       Impact factor: 47.728

5.  Role of heme in the antifungal activity of the azaoxoaporphine alkaloid sampangine.

Authors:  Ameeta K Agarwal; Tao Xu; Melissa R Jacob; Qin Feng; Michael C Lorenz; Larry A Walker; Alice M Clark
Journal:  Eukaryot Cell       Date:  2007-12-21

6.  Cytoscape Web: an interactive web-based network browser.

Authors:  Christian T Lopes; Max Franz; Farzana Kazi; Sylva L Donaldson; Quaid Morris; Gary D Bader
Journal:  Bioinformatics       Date:  2010-07-23       Impact factor: 6.937

7.  An improved, bias-reduced probabilistic functional gene network of baker's yeast, Saccharomyces cerevisiae.

Authors:  Insuk Lee; Zhihua Li; Edward M Marcotte
Journal:  PLoS One       Date:  2007-10-03       Impact factor: 3.240

8.  Rational extension of the ribosome biogenesis pathway using network-guided genetics.

Authors:  Zhihua Li; Insuk Lee; Emily Moradi; Nai-Jung Hung; Arlen W Johnson; Edward M Marcotte
Journal:  PLoS Biol       Date:  2009-10-06       Impact factor: 8.029

9.  JiffyNet: a web-based instant protein network modeler for newly sequenced species.

Authors:  Eiru Kim; Hanhae Kim; Insuk Lee
Journal:  Nucleic Acids Res       Date:  2013-05-17       Impact factor: 16.971

10.  Broad network-based predictability of Saccharomyces cerevisiae gene loss-of-function phenotypes.

Authors:  Kriston L McGary; Insuk Lee; Edward M Marcotte
Journal:  Genome Biol       Date:  2007       Impact factor: 13.583

View more
  33 in total

1.  RiceNet v2: an improved network prioritization server for rice genes.

Authors:  Tak Lee; Taeyun Oh; Sunmo Yang; Junha Shin; Sohyun Hwang; Chan Yeong Kim; Hyojin Kim; Hongseok Shim; Jung Eun Shim; Pamela C Ronald; Insuk Lee
Journal:  Nucleic Acids Res       Date:  2015-03-26       Impact factor: 16.971

Review 2.  An overview of bioinformatics methods for modeling biological pathways in yeast.

Authors:  Jie Hou; Lipi Acharya; Dongxiao Zhu; Jianlin Cheng
Journal:  Brief Funct Genomics       Date:  2015-10-17       Impact factor: 4.241

3.  Compact Integration of Multi-Network Topology for Functional Analysis of Genes.

Authors:  Hyunghoon Cho; Bonnie Berger; Jian Peng
Journal:  Cell Syst       Date:  2016-11-23       Impact factor: 10.304

4.  Active Interaction Mapping Reveals the Hierarchical Organization of Autophagy.

Authors:  Michael H Kramer; Jean-Claude Farré; Koyel Mitra; Michael Ku Yu; Keiichiro Ono; Barry Demchak; Katherine Licon; Mitchell Flagg; Rama Balakrishnan; J Michael Cherry; Suresh Subramani; Trey Ideker
Journal:  Mol Cell       Date:  2017-01-26       Impact factor: 17.970

5.  YQFC: a web tool to compare quantitative biological features between two yeast gene lists.

Authors:  Wei-Sheng Wu; Lai-Ji Wang; Han-Chen Yen; Yan-Yuan Tseng
Journal:  Database (Oxford)       Date:  2020-01-01       Impact factor: 3.451

6.  Network-assisted genetic dissection of pathogenicity and drug resistance in the opportunistic human pathogenic fungus Cryptococcus neoformans.

Authors:  Hanhae Kim; Kwang-Woo Jung; Shinae Maeng; Ying-Lien Chen; Junha Shin; Jung Eun Shim; Sohyun Hwang; Guilhem Janbon; Taeyup Kim; Joseph Heitman; Yong-Sun Bahn; Insuk Lee
Journal:  Sci Rep       Date:  2015-03-05       Impact factor: 4.379

7.  EcoliNet: a database of cofunctional gene network for Escherichia coli.

Authors:  Hanhae Kim; Jung Eun Shim; Junha Shin; Insuk Lee
Journal:  Database (Oxford)       Date:  2015-02-02       Impact factor: 3.451

8.  The metabolic background is a global player in Saccharomyces gene expression epistasis.

Authors:  Mohammad Tauqeer Alam; Aleksej Zelezniak; Michael Mülleder; Pavel Shliaha; Roland Schwarz; Floriana Capuano; Jakob Vowinckel; Elahe Radmanesfahar; Antje Krüger; Enrica Calvani; Steve Michel; Stefan Börno; Stefan Christen; Kiran Raosaheb Patil; Bernd Timmermann; Kathryn S Lilley; Markus Ralser
Journal:  Nat Microbiol       Date:  2016-02-01       Impact factor: 17.745

9.  WormNet v3: a network-assisted hypothesis-generating server for Caenorhabditis elegans.

Authors:  Ara Cho; Junha Shin; Sohyun Hwang; Chanyoung Kim; Hongseok Shim; Hyojin Kim; Hanhae Kim; Insuk Lee
Journal:  Nucleic Acids Res       Date:  2014-05-09       Impact factor: 16.971

10.  Inferring gene ontologies from pairwise similarity data.

Authors:  Michael Kramer; Janusz Dutkowski; Michael Yu; Vineet Bafna; Trey Ideker
Journal:  Bioinformatics       Date:  2014-06-15       Impact factor: 6.937

View more

北京卡尤迪生物科技股份有限公司 © 2022-2023.