Kriston L McGary, Insuk Lee, Edward M Marcotte.
Abstract
We demonstrate that loss-of-function yeast phenotypes are predictable by guilt-by-association in functional gene networks. Testing 1,102 loss-of-function phenotypes from genome-wide assays of yeast reveals predictability of diverse phenotypes, spanning cellular morphology, growth, metabolism, and quantitative cell shape features. We apply the method to extend a genome-wide screen by predicting, then verifying, genes whose disruption elongates yeast cells, and to predict human disease genes. To facilitate network-guided screens, a web server is available at http://www.yeastnet.org.
Year: 2007 PMID: 18053250 PMCID: PMC2246260 DOI: 10.1186/gb-2007-8-12-r258
Source DB: PubMed Journal: Genome Biol ISSN: 1474-7596 Impact factor: 13.583
Figure 1. Overview of guilt-by-association phenotype prediction. Guilt-by-association phenotype prediction employs a functional gene network, represented here as circles (genes) connected by lines (functional linkages), and a seed set of genes (blue filled circles) whose disruption is known to give rise to the phenotype of interest. Neighboring genes in a functional gene network (red filled circles) are candidates for also giving rise to the phenotype. Candidates are prioritized by the sum of their network linkage weights to the set of seed genes. A gene strongly linked to multiple seed genes will thus rank more highly than a gene weakly linked to a single seed gene. Networks in Figures 1, 5, and 7 were drawn with Cytoscape [73].
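The prioritization rule described in Figure 1 can be illustrated with a minimal sketch. It assumes the network is available as a list of `(gene, gene, weight)` tuples; the function names and the toy genes below are hypothetical and are not taken from YeastNet or from the authors' software.

```python
from collections import defaultdict


def gba_scores(edges, seeds):
    """Score each non-seed gene by the sum of its linkage weights to seed genes."""
    seeds = set(seeds)
    scores = defaultdict(float)
    for gene_a, gene_b, weight in edges:
        # Linkages are undirected: credit whichever endpoint is not a seed.
        if gene_a in seeds and gene_b not in seeds:
            scores[gene_b] += weight
        elif gene_b in seeds and gene_a not in seeds:
            scores[gene_a] += weight
    return dict(scores)


def rank_candidates(edges, seeds):
    """Return candidate genes, highest summed linkage weight first."""
    scores = gba_scores(edges, seeds)
    # Ties are broken alphabetically so the ordering is deterministic.
    return sorted(scores, key=lambda gene: (-scores[gene], gene))


# Hypothetical toy network (illustrative names and weights only).
toy_edges = [
    ("SEED1", "GENE_A", 2.0),
    ("SEED2", "GENE_A", 1.5),
    ("SEED1", "GENE_B", 0.5),
    ("GENE_B", "GENE_C", 3.0),
]
toy_seeds = ["SEED1", "SEED2"]
toy_ranking = rank_candidates(toy_edges, toy_seeds)
```

In this toy case GENE_A, linked to both seeds, outranks GENE_B, which is weakly linked to one seed; GENE_C has no direct seed linkage and receives no score.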
Figure 2. Diverse yeast gene loss-of-function phenotypes are predictable using guilt-by-association in a functional gene network. Predictability is measured in a receiver operating characteristic plot of the true positive rate (sensitivity) versus false positive rate (1 - specificity) for predicting genes giving rise to ten specific loss-of-function phenotypes, as well as for essential genes whose disruption produces nonviable yeast [4]. For each phenotype, each gene in the yeast genome was prioritized by the sum of the weights of its network linkages to the seed genes associated with the phenotype. Genes with higher scores are more tightly linked to the seed set and therefore more likely to give rise to the phenotype. Each phenotype was evaluated using leave-one-out cross-validation, omitting genes from the seed set for the purposes of evaluation. More predictable phenotypes tend toward the top-left corner of the graph; random predictability is indicated by the diagonal. For clarity, the line connecting the final point of each graph to the top right corner has been omitted. FN, false negative; FP, false positive; TN, true negative; TP, true positive.
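One way to read the leave-one-out evaluation in Figure 2 is sketched below. The exact procedure is not spelled out in this record, so the sketch makes two assumptions: each seed gene is scored against the remaining seeds only, and the area under the ROC curve is computed as the fraction of (seed, non-seed) pairs in which the seed scores higher, with ties counted as one half. The `weights` dictionary keyed by `frozenset` gene pairs is a hypothetical representation.

```python
def linkage_sum(gene, seeds, weights):
    """Sum of linkage weights from `gene` to every seed other than itself."""
    return sum(
        weights.get(frozenset((gene, seed)), 0.0)
        for seed in seeds
        if seed != gene  # a held-out seed never contributes to its own score
    )


def leave_one_out_auc(genes, seeds, weights):
    """AUC for ranking held-out seed genes above non-seed genes."""
    seeds = set(seeds)
    positives = [linkage_sum(g, seeds, weights) for g in genes if g in seeds]
    negatives = [linkage_sum(g, seeds, weights) for g in genes if g not in seeds]
    if not positives or not negatives:
        raise ValueError("need at least one seed gene and one non-seed gene")
    # Pairwise (Mann-Whitney) form of the AUC; ties count one half.
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in positives
        for n in negatives
    )
    return wins / (len(positives) * len(negatives))


# Hypothetical toy data: seeds are tightly interlinked, non-seeds are not.
toy_genes = ["S1", "S2", "S3", "A", "B", "C"]
toy_weights = {
    frozenset(("S1", "S2")): 2.0,
    frozenset(("S2", "S3")): 1.0,
    frozenset(("S3", "A")): 0.5,
    frozenset(("B", "C")): 4.0,
}
toy_auc = leave_one_out_auc(toy_genes, ["S1", "S2", "S3"], toy_weights)
```

Under this reading, an AUC near 1 corresponds to curves hugging the top-left corner, and an AUC of 0.5 corresponds to the diagonal.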
Predictability of 100 yeast gene deletion phenotypes
| Phenotype^a | AUC | Seed genes with phenotype | Seed genes in network | Ref. |
| --- | --- | --- | --- | --- |
| Caspofungin sensitive | 0.996 | 20 | 18 | [36] |
| Increased resistance to calcofluor white | 0.982 | 10 | 10 | [33] |
| Unipolar budding | 0.941 | 10 | 10 | [68] |
| CPY secretion (3) | 0.937 | 46 | 44 | [34] |
| Cell cycle arrest defective | 0.930 | 8 | 8 | [74] |
| UVC sensitive (high) | 0.919 | 15 | 14 | [75] |
| Sensitivity at 15 generations in galactose | 0.908 | 17 | 14 | [4] |
| CANR mutator (high) | 0.904 | 18 | 18 | [76] |
| Haploinsufficient in rich medium (YPD) | 0.898 | 184 | 184 | [77] |
| Cellular chitin level increased (3) | 0.873 | 22 | 21 | [33] |
| Bleomycin resistant (3) | 0.871 | 5 | 4 | [37] |
| Morphology: branched (diploid) | 0.870 | 5 | 5 | [4] |
| Sensitivity at 15 generations in 1.5 M sorbitol | 0.867 | 6 | 4 | [4] |
| Caspofungin resistant | 0.866 | 8 | 8 | [36] |
| Inviable (essential) | 0.845 | 1100 | 1027 | [4,30] |
| Shortened telomeres (3) | 0.843 | 20 | 18 | [32] |
| Sensitivity at 15 generations in minimal +his +leu +ura medium | 0.843 | 77 | 70 | [4] |
| MMS sensitive (3) | 0.837 | 78 | 73 | [78] |
| Cellular chitin level reduced (2) | 0.835 | 17 | 17 | [33] |
| Petite | 0.833 | 179 | 166 | [79] |
| Sensitivity at 5 generations in minimal +his +leu +ura medium | 0.827 | 62 | 51 | [4] |
| Long telomeres (3) | 0.824 | 6 | 6 | [32] |
| Decreased calcofluor white resistance | 0.814 | 65 | 63 | [77,80] |
| Growth defect on a fermentable carbon source | 0.812 | 257 | 249 | [81] |
| Transposon cDNA expression changed (high) | 0.810 | 27 | 26 | [82] |
| Morphology: clumpy (3)(diploid) | 0.802 | 18 | 18 | [4] |
| Gamma radiation sensitive (3) | 0.793 | 31 | 31 | [83] |
| Cell cycle arrest defective and defective shmoo | 0.782 | 30 | 29 | [74] |
| Sensitivity at 5 generations in galactose | 0.781 | 11 | 10 | [4] |
| Small (haploid) | 0.778 | 215 | 192 | [84] |
| Retrotransposition reduced | 0.772 | 99 | 89 | [82] |
| K1 killer toxin sensitive (40%) | 0.770 | 72 | 72 | [80] |
| Increased iron uptake | 0.757 | 76 | 70 | [35] |
| Growth defect on a non-fermentable carbon source | 0.755 | 498 | 448 | [81] |
| Gentamycin sensitive (high) | 0.754 | 11 | 11 | [85] |
| Proteasome inhibitor sens (high) | 0.753 | 22 | 22 | [86] |
| Reduced fitness in rich medium (YPD) | 0.748 | 891 | 872 | [77] |
| Mycophenolic acid sensitive | 0.746 | 38 | 33 | [87] |
| Axial budding | 0.745 | 4 | 4 | [68] |
| Morphology: elongate (3) (diploid) | 0.739 | 77 | 73 | [4] |
| Sporulation deficient | 0.738 | 261 | 244 | [88] |
| Random budding (high) | 0.737 | 74 | 72 | [68] |
| Large (haploid) | 0.728 | 227 | 205 | [84] |
| Reduced sporulation (3) (normal respiration) | 0.722 | 136 | 119 | [89] |
| Bleomycin sensitive (4) | 0.721 | 58 | 55 | [37] |
| Sensitivity at 5 generations in synthetic complete - lys medium | 0.715 | 23 | 22 | [4] |
| Decreased rapamycin resistance | 0.707 | 272 | 256 | [90] |
| | 0.706 | 19 | 19 | [79] |
| Sensitivity at 5 generations in 1.5 M sorbitol | 0.704 | 13 | 11 | [4] |
| Decreased wortmannin resistance | 0.703 | 89 | 85 | [90] |
| Sensitivity at 20 generations in 1 M NaCl | 0.703 | 63 | 59 | [4] |
| K1 killer toxin resistant (40%) | 0.698 | 19 | 18 | [80] |
| Morphology: round (3) (diploid) | 0.696 | 105 | 99 | [4] |
| | 0.694 | 28 | 26 | [79] |
| Sensitivity at 5 generations in synthetic complete - trp medium | 0.694 | 48 | 45 | [4] |
| Sensitivity at 5 generations in 1 M NaCl | 0.693 | 60 | 56 | [4] |
| Rapamycin resist (2) | 0.692 | 26 | 26 | [91] |
| Reduced iron uptake | 0.688 | 5 | 5 | [35] |
| Rate of growth loss of growth in 0.85 M NaCl | 0.682 | 212 | 189 | [92] |
| Sensitivity at 5 generations in medium of pH 8 | 0.677 | 102 | 93 | [4] |
| Sensitivity at 15 generations in medium of pH 8 | 0.676 | 128 | 115 | [4] |
| Morphology: small (3)(diploid) | 0.672 | 79 | 69 | [4] |
| Sensitivity at 15 generations in 10 uM nystatin | 0.672 | 28 | 27 | [4] |
| Morphology: large (3)(diploid) | 0.669 | 88 | 80 | [4] |
| Reduced glycogen storage (2) | 0.666 | 44 | 41 | [93] |
| Sensitivity at 5 generations in 10 uM nystatin | 0.666 | 124 | 108 | [4] |
| Increased rapamycin resistance | 0.662 | 114 | 100 | [90] |
| Morphology: unusual shmoo (haploid) | 0.661 | 29 | 25 | [74] |
| Morphology: polarized bud growth (haploid) | 0.657 | 5 | 5 | [74] |
| Wortmannin resistant (5) | 0.656 | 25 | 23 | [94] |
| Sensitivity at 5 generations in synthetic complete - thr medium | 0.647 | 31 | 29 | [5] |
| Enhanced glycogen storage (2) | 0.645 | 61 | 55 | [93] |
| Proteasome inhibitor resistant | 0.642 | 7 | 6 | [86] |
| Reduced spores per ascus | 0.641 | 37 | 34 | [89] |
| Rate of growth sensitivity in 0.85 M NaCl | 0.629 | 209 | 191 | [92] |
| Morphology: football (3) (diploid) | 0.628 | 59 | 53 | [5] |
| Germination deficient | 0.627 | 158 | 147 | [88] |
| Sporulation promoting | 0.622 | 102 | 98 | [88] |
| 6AU sensitive (3) | 0.618 | 28 | 26 | [95] |
| Increased wortmannin resistance | 0.617 | 80 | 75 | [90] |
| Morphology: elongated (haploid) | 0.603 | 110 | 101 | [74] |
| Rapamycin sensitive (4) | 0.597 | 20 | 20 | [91] |
| Efficiency of growth sensitivity in 0.85 M NaCl | 0.597 | 65 | 58 | [92] |
| Decreased rapamycin resistance | 0.597 | 8 | 7 | [90] |
| Slow growth in YPD (16× below WT) | 0.585 | 23 | 22 | [4] |
| MPA sensitive (3) | 0.563 | 24 | 22 | [95] |
| Morphology: round (haploid) | 0.552 | 13 | 11 | [74] |
| Efficiency of growth resistance in 0.85 M NaCl | 0.541 | 44 | 40 | [92] |
| Sensitivity at 5 generations in synthetic complete medium | 0.531 | 88 | 78 | [5] |
| Morphology: large (haploid) | 0.527 | 23 | 21 | [74] |
| Adaptation time loss of growth in 0.85 M NaCl | 0.526 | 103 | 91 | [92] |
| Adaptation time sensitivity in 0.85 M NaCl | 0.521 | 284 | 259 | [92] |
| Decreased sensitivity to the anticancer drug, cisplatin | 0.512 | 22 | 19 | [96] |
| Morphology: chain (diploid) | 0.485 | 5 | 5 | [5] |
| Morphology: small (haploid) | 0.480 | 94 | 89 | [74] |
| Rate of growth resistance in 0.85 M NaCl | 0.479 | 59 | 49 | [92] |
| Morphology: clumped (haploid) | 0.479 | 32 | 28 | [74] |
| Adaptation time resistance in 0.85 M NaCl | 0.465 | 69 | 60 | [92] |
| Efficiency of growth loss of growth in 0.85 M NaCl | 0.464 | 23 | 21 | [92] |
| Morphology: pointed (haploid) | 0.453 | 99 | 88 | [74] |
^a Numbers in parentheses indicate the threshold applied to generate the seed set; for instance, '(3)' indicates '+++' or '---', as appropriate.
Figure 3. Loss-of-function phenotypes are predicted significantly better than random expectation. Predictability is measured as the area under a receiver operating characteristic (ROC) curve (AUC), computed for each of 100 yeast phenotypes observed in genome-wide screens; the resulting AUC distributions are plotted. Real phenotypes are significantly more predictable than size-matched random gene sets. At the left of each box-and-whisker plot, the center of the blue diamond indicates the AUC mean, the top and bottom of the diamond indicate the 95% confidence interval, and the accompanying solid vertical line indicates ± 2 standard deviations. The bottom, middle, and top horizontal lines of the box-and-whisker plots represent the first quartile, the median, and the third quartile of AUCs, respectively; whiskers indicate 1.5 times the interquartile range. Red plus signs represent individual outliers.
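The comparison against size-matched random gene sets can be sketched as follows. This is an illustration under assumptions, not the authors' procedure: `auc_fn` is a hypothetical callable that returns the AUC for a given seed set, and the empirical p-value with an add-one correction is one common way to summarize such a comparison, not a statistic reported in this record.

```python
import random


def random_set_aucs(genes, set_size, auc_fn, n_sets=1000, seed=0):
    """AUCs for `n_sets` random gene sets, each of size `set_size`."""
    rng = random.Random(seed)  # fixed seed keeps the null distribution reproducible
    gene_list = list(genes)
    # rng.sample draws without replacement, so each set has distinct genes.
    return [auc_fn(rng.sample(gene_list, set_size)) for _ in range(n_sets)]


def empirical_p_value(observed_auc, null_aucs):
    """Fraction of random-set AUCs at least as large as the observed AUC."""
    at_least = sum(1 for value in null_aucs if value >= observed_auc)
    # Add-one correction avoids reporting a p-value of exactly zero.
    return (at_least + 1) / (len(null_aucs) + 1)
```

A real phenotype whose AUC exceeds nearly all random-set AUCs of the same size would receive a small empirical p-value under this scheme.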
Figure 4. A plot of seed set size versus predictability of the phenotype shows no significant correlation. Thus, there does not appear to be an intrinsic limitation for applying network-guided reverse genetics even when seed set size is small. Each filled circle indicates the prediction strength (area under the receiver operating characteristic [ROC] curve, as calculated in Figure 3) of one of the 100 loss-of-function phenotypes relative to the number of genes in that seed set.
Figure 5. Relative predictive power of functional and physical protein networks. (a) Median values of predictive power (area under the receiver operating characteristic [ROC] curve [AUC]) across 100 loss-of-function phenotypes are plotted versus the median fraction of each seed gene set covered by a network (coverage; measured as the fraction of seed genes with at least one linkage in the network). Five networks are compared: the functional yeast network (YeastNet v. 2 [24]) and four versions of the network of yeast physical protein interactions (Database of Interacting Proteins [DIP] [45], Probabilistic Integrated Co-complex [PICO] [29], Munich Information Center for Protein Sequences [MIPS] physical complexes [44], and Collins and coworkers [43]). DIP, PICO, and YeastNet are each evaluated at two reported confidence thresholds. The YeastNet functional gene network shows considerably higher predictive power than the networks composed only of physical interactions; the full YeastNet shows higher predictive power than a more confident core set of the top 47,000 linkages, indicating that the lower confidence linkages nonetheless add predictive power. Error bars indicate the first and third quartiles. Panels b and c show example seed gene sets (green circles) and their network connections, indicating functional linkages in grey lines, physical interactions in thin black lines, and both functional and physical interactions in thick black lines. (b) Genes whose deletion increases cellular chitin levels [33] (AUC = 0.87), whose prediction relies upon a mix of physical and functional interactions. (c) Genes whose deletion confers sensitivity at 5 generations in synthetic complete medium lacking threonine [4] (AUC = 0.65), whose prediction derives predominantly from functional linkages.
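The coverage measure defined in panel (a), the fraction of seed genes with at least one linkage in the network, can be written as a short sketch. The edge-list representation and the function name are assumptions made for illustration.

```python
def seed_coverage(seeds, edges):
    """Fraction of seed genes that have at least one linkage in the network."""
    seeds = set(seeds)
    if not seeds:
        raise ValueError("seed set must not be empty")
    genes_in_network = set()
    for gene_a, gene_b, *_ in edges:  # any trailing fields (e.g. weight) are ignored
        genes_in_network.add(gene_a)
        genes_in_network.add(gene_b)
    return len(seeds & genes_in_network) / len(seeds)


# Hypothetical toy data: three of four seeds appear in at least one linkage.
toy_coverage = seed_coverage(
    ["S1", "S2", "S3", "S4"],
    [("S1", "X", 1.0), ("S2", "S3", 0.4)],
)
```

A sparse physical-interaction network would typically leave more seed genes unlinked, lowering this value even where the linkages it does contain are reliable.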
Figure 6. Lower probability linkages continue to improve predictive accuracy. The continued improvement of predictions, albeit with diminishing returns, is shown in a plot of the predictive accuracy (median area under the receiver operating characteristic [ROC] curve across the 100 phenotypes, calculated as in Figure 3) versus median network coverage of the 100 phenotype sets, as calculated for the top-ranked 20,000 (20 K), 40,000 (40 K), 60,000 (60 K), 80,000 (80 K), and 100,000 (100 K) linkages in YeastNet v. 2. This trend derives from the fact that all links in this network have at least a 60% probability of linking genes in the same pathway. The probabilistic nature of the network means that low confidence linkages are unlikely to undercut high confidence linkages during phenotype prediction because the links are weighted according to the strength of the evidence supporting them. Error bars indicate the first and third quartiles.
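Restricting a weighted network to its top-ranked linkages, as done for the 20 K to 100 K thresholds in Figure 6, can be sketched as below. The sketch assumes linkages are `(gene, gene, weight)` tuples in which a larger weight means higher confidence; the function name is hypothetical.

```python
import heapq


def top_k_linkages(edges, k):
    """Return the k highest-weight linkages, strongest first."""
    # heapq.nlargest avoids sorting the full edge list when k is small.
    return heapq.nlargest(k, edges, key=lambda edge: edge[2])


# Hypothetical toy network.
toy_network = [("A", "B", 0.9), ("B", "C", 0.2), ("A", "C", 0.6)]
toy_core = top_k_linkages(toy_network, 2)
```

Re-running the evaluation on each truncated network would then give one point per threshold on a plot of this kind.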
Figure 7. Network-guided extension of a genetic screen. Guilt-by-association (GBA) was applied to predict essential yeast genes whose disruption resulted in elongated yeast cells, based on the genes' network connectivity to a seed set of 77 nonessential genes already known to cause cell elongation when deleted [4]. (a) Five examples of successful predictions, observed in yeast strains carrying tetracycline downregulatable conditional alleles [47] of the essential genes TAF9, MED6, MED7, SWI1, and RPO21. In contrast, conditional downregulation of an unrelated essential gene, SCM3, caused no such cell elongation. (b) Sixteen out of 33 tested essential genes (yellow circles) showed elongated cell phenotypes on the basis of their connections to the seed set genes (green circles), with particular enrichment for genes associated with RNA polymerase II transcriptional initiation and the mediator complex. The color of the edge between two genes indicates the source of evidence supporting the functional link: thick black, multiple types of evidence; blue, affinity purification/mass spectrometry; green, literature mining by co-citation; cyan, gene neighbors or tertiary structure; pink, literature curated physical interaction; and red, genetic interaction.
Figure 8. Network-based prediction of quantitative cell morphology phenotypes. A wide variety of phenotypes based upon quantitative yeast cell shape and intracellular features [46] are predictable, as shown for the ten phenotypes in this receiver operating characteristic (ROC) analysis (selected from S. cerevisiae Morphology Database [SCMD] phenotypes with area under the ROC curve [AUC] > 0.68). For each of the features, the 40 genes whose deletion mutants show either the 40 highest or 40 lowest values for that quantitative feature (indicated by 'high' or 'low', respectively) were selected as the seed gene set. Predictability was evaluated using ROC analysis as in Figure 2, plotting the true positive prediction rate versus false positive rate, using leave-one-out cross-validation. For clarity, the line connecting the final point of each graph to the top right corner has been omitted. Labels of features are adapted for clarity from the SCMD [50]; the SCMD labels A, A1B, and C represent unbudded cells, budded cells with one nucleus in the mother cell, and large-budded post-mitotic cells with nuclei in both mother and daughter cell, respectively. Ratio measurements refer to proportions across a population of cells. FN, false negative; FP, false positive; TN, true negative; TP, true positive.
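The seed-set construction described in Figure 8, taking the 40 deletion strains with the highest or lowest values of a quantitative feature, can be sketched as follows. The mapping from gene name to feature value and the function name are assumptions for illustration; tie handling is not described in the caption, so ties are broken alphabetically here.

```python
def extreme_seed_sets(feature_values, n=40):
    """Return the n genes with the lowest and the n with the highest values."""
    # Sort by value, then by gene name so ties resolve deterministically.
    ranked = sorted(feature_values, key=lambda gene: (feature_values[gene], gene))
    if len(ranked) < 2 * n:
        raise ValueError("need at least 2 * n genes for non-overlapping seed sets")
    return {"low": ranked[:n], "high": ranked[-n:]}


# Hypothetical toy measurements for one morphological feature.
toy_values = {"g1": 5.0, "g2": 1.0, "g3": 3.0, "g4": 9.0, "g5": 2.0}
toy_sets = extreme_seed_sets(toy_values, n=2)
```

Each of the two resulting sets can then be treated as a seed set and evaluated in the same way as a qualitative phenotype.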
Figure 9. Quantitative cell morphology phenotypes are predicted significantly better than random expectation. In contrast, genes whose disruption decreases the population coefficient of variation (CV) were not predictable. (a) A histogram plotting the distribution of the area under the receiver operating characteristic (ROC) curve (AUC) values for 562 quantitative morphological phenotypes shows a significantly higher proportion of high AUC values than for 1,000 size-matched random gene sets. (b) Separate analyses of phenotypes associated with morphologic features and phenotypes associated with cell-to-cell variability in the morphologic features reveal asymmetry in predictability. Sets of genes whose disruption causes the 40 largest or smallest mean values of a morphological feature (middle plots) are significantly more predictable than random gene sets (left side). By contrast, although the sets of genes whose disruption most increases the CV tend to be predictable (high AUC), those that most decrease the CV are not (low AUC). Box-and-whisker plots are drawn as in Figure 3. (c) A comparison of the median phenotypic CVs observed for deletion strains versus replicate analyses of wild-type cells shows that deletion strains with the most reduced CVs are essentially wild-type-like in character, whereas those with the most increased CVs show significantly more cell-to-cell variability than wild-type cells. These latter knockout strains carry deletions for genes predominantly involved in maintaining genomic integrity. This trend is therefore likely to have arisen from nonclonal genetic variation in these strains, recapitulating the classic mutator phenotype.
Figure 10. Yeast genes with human orthologs linked to the same diseases are predicted better than random expectation. Predictability is measured as the area under a receiver operating characteristic (ROC) curve (AUC), as in Figure 3, computed for each of 28 human diseases reported in the Online Mendelian Inheritance in Man (OMIM) disease database [51] that have four or more yeast orthologs annotated in the yeast functional network; the resulting AUC distributions are plotted. Real disease gene sets are significantly more predictable than size-matched random gene sets drawn from the set of yeast-human orthologs. Box plots are drawn as in Figure 3.