T M Murali, Matthew D Dyer, David Badger, Brett M Tyler, Michael G Katze.
Abstract
HIV Dependency Factors (HDFs) are a class of human proteins that are essential for HIV replication, but are not lethal to the host cell when silenced. Three previous genome-wide RNAi experiments identified HDF sets with little overlap. We combine data from these three studies with a human protein interaction network to predict new HDFs, using an intuitive algorithm called SinkSource and four other algorithms published in the literature. Our algorithm achieves high precision and recall upon cross validation, as do the other methods. A number of HDFs that we predict are known to interact with HIV proteins. They belong to multiple protein complexes and biological processes that are known to be manipulated by HIV. We also demonstrate that many predicted HDF genes show significantly different programs of expression in early response to SIV infection in two non-human primate species that differ in AIDS progression. Our results suggest that many HDFs are yet to be discovered and that they have potential value as prognostic markers to determine pathological outcome and the likelihood of AIDS development. More generally, if multiple genome-wide gene-level studies have been performed at independent labs to study the same biological system or phenomenon, our methodology is applicable to interpret these studies simultaneously in the context of molecular interaction networks and to ask if they reinforce or contradict each other.
Year: 2011 PMID: 21966263 PMCID: PMC3178628 DOI: 10.1371/journal.pcbi.1002164
Source DB: PubMed Journal: PLoS Comput Biol ISSN: 1553-734X Impact factor: 4.475
Figure 1. Cross validation results on the unweighted human PPI network.
(a) Histograms of area under precision-recall curve for all algorithm-dataset combinations. Each group of vertical bars corresponds to one algorithm. Error bars indicate one standard deviation from the mean, computed over 10 independent runs of 2-fold cross validation. Algorithm abbreviations: Hopfield (H), Local (L), SinkSource (SS), FunctionalFlow with 1 phase (FF 1), FunctionalFlow with 7 phases (FF 7), Local without negative examples (L+), SinkSource without negative examples (SS+), and PRINCE (P). Dataset abbreviations: Brass (B), Konig (K), Zhou (Z), Brass or Konig or Zhou (BKZ). (b) Precision-recall curves for the SinkSource algorithm on the four datasets. At each value of recall, error bars indicate one standard deviation in the value of precision. (c) Precision-recall curves for the SinkSource+ algorithm on the four datasets. (d) Precision-recall curves for all algorithms on the BKZ dataset.
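The quantity summarized in panel (a) can be illustrated with a minimal Python sketch. It ranks proteins by predicted score, traces the precision-recall points, integrates them with the trapezoidal rule, and reports the mean and standard deviation over repeated runs. The function names, the trapezoidal integration, the lack of tie handling, and the use of the sample standard deviation are assumptions made for illustration; the caption does not say how the authors computed these values.

```python
from statistics import mean, stdev

def precision_recall_points(scores, labels):
    """Return (recall, precision) pairs from sweeping down the ranked list.

    labels are 1 for known HDFs (positives) and 0 otherwise.
    Ties in score are broken arbitrarily (an assumption of this sketch).
    """
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    total_pos = sum(labels)
    points, true_pos = [], 0
    for k, (_, label) in enumerate(ranked, start=1):
        true_pos += label
        points.append((true_pos / total_pos, true_pos / k))
    return points

def area_under_pr(points):
    """Trapezoidal area under the precision-recall curve, from recall 0."""
    area, prev_recall, prev_precision = 0.0, 0.0, points[0][1]
    for recall, precision in points:
        area += (recall - prev_recall) * (precision + prev_precision) / 2
        prev_recall, prev_precision = recall, precision
    return area

def summarize(areas):
    """Mean and sample standard deviation of per-run areas (e.g. 10 runs)."""
    return mean(areas), stdev(areas)
```

A perfect ranking, in which every positive is scored above every negative, yields an area of 1 under this definition.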
Statistics on the 10 clusters with the largest number of PPIs reported by MCODE.
| Ranking by #PPIs | #proteins | #PPIs | Density | Median rank | Minimum rank | Maximum rank | #HIV interactors | #BKZ HDFs |
| 1 | 112 | 5684 | 0.91 | 44 | 1 | 210 | 34 | 33 |
| 2 | 108 | 4701 | 0.81 | 408 | 164 | 588 | 11 | 12 |
| 3 | 60 | 1770 | 1 | 222 | 152 | 419 | 5 | 0 |
| 4 | 57 | 1596 | 1 | 138 | 107 | 230 | 2 | 10 |
| 5 | 29 | 331 | 0.81 | 507 | 452 | 659 | 6 | 3 |
| 6 | 24 | 273 | 0.99 | 812 | 730 | 978 | 4 | 1 |
| 7 | 26 | 264 | 0.81 | 76 | 31 | 178 | 2 | 11 |
| 8 | 20 | 182 | 0.96 | 141 | 80 | 239 | 35 | 20 |
| 9 | 37 | 304 | 0.46 | 264 | 69 | 584 | 46 | 9 |
| 10 | 56 | 443 | 0.29 | 854 | 779 | 998 | 11 | 0 |
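The Density column above is consistent with the ratio of observed PPIs to the number of possible protein pairs in the cluster. The short Python sketch below checks that reading against a few rows; the function name is hypothetical, and the formula is inferred from the table values rather than taken from the MCODE documentation.

```python
def cluster_density(num_proteins, num_ppis):
    """Observed PPIs divided by the number of unordered protein pairs."""
    possible_pairs = num_proteins * (num_proteins - 1) // 2
    return num_ppis / possible_pairs

# Rows 1, 3 and 10 of the table above.
print(round(cluster_density(112, 5684), 2))  # 0.91
print(round(cluster_density(60, 1770), 2))   # 1.0
print(round(cluster_density(56, 443), 2))    # 0.29
```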
The ten clusters with the largest number of PPIs reported by MCODE and the functions that each is the most enriched in.
| Ranking by #PPIs | #proteins | #HIV interactors | Highly enriched function | p-value | #proteins with function | #BKZ HDFs | #BKZ HDFs with function |
| 1 | 112 | 34 | RNA metabolic process | 1.4×10^−69 | 107 | 33 | 29 |
| | | | Spliceosomal complex | 2.7×10^−36 | 52 | | 14 |
| 2 | 108 | 11 | Ribosome | 7.1×10^−96 | 75 | 12 | 0 |
| | | | Translational elongation | 9.5×10^−88 | 75 | | 0 |
| 3 | 60 | 5 | Kinetochore | 2.2×10^−42 | 33 | 0 | 0 |
| 4 | 57 | 2 | Respiratory chain | 2.8×10^−80 | 47 | 10 | 9 |
| | | | NADH dehydrogenase complex | 2.9×10^−75 | 34 | | 6 |
| 5 | 24 | 4 | small GTPase mediated signal transduction | 1.6×10^−9 | 21 | 3 | 1 |
| 6 | 29 | 6 | DNA replication initiation | 3.6×10^−14 | 13 | 1 | 0 |
| 7 | 20 | 2 | Transcription factor binding | 3.4×10^−10 | 13 | 11 | 7 |
| | | | Transcription initiation | 5.3×10^−9 | 12 | | 6 |
| 8 | 60 | 35 | Proteasome complex | 6.8×10^−29 | 18 | 20 | 13 |
| 9 | 37 | 46 | Proteasome complex | 2.3×10^−33 | 22 | 9 | 0 |
| 10 | 39 | 11 | MHC protein complex | 9.2×10^−17 | 10 | 0 | 0 |
| | | | Cell cycle process | 5.2×10^−7 | 13 | | |
Some columns are repeated from Table 1 for the sake of convenience.
An asterisk (in the column titled “#BKZ HDFs”) indicates that the overlap of BKZ HDFs with the cluster computed by MCODE is statistically significant at the 0.05 level.
Figure 2. Plots of the fraction of BKZ or of predicted HDFs that are also differentially expressed in the AGM-PTM comparison: (a) SinkSource+ and (b) SinkSource.
There are six plots for each algorithm, with one plot for each tissue-day combination. In each plot, the x-axis corresponds to the rank of a predicted HDF. At each rank k on the x-axis, the y-axis plots the fraction of HDFs with the top k ranks that are also differentially expressed. Note that the scale of the y-axis changes from plot to plot. The red and green curves display the results for predicted HDFs, at different prediction ranks. Red values indicate statistically significant overlaps, at the 0.05 level, between predicted HDFs and differentially-expressed genes. Green values indicate overlaps that are not statistically significant. Figures S9 and S10 plot the corresponding p-values. The horizontal dotted blue line in each plot denotes the overlap of BKZ HDFs with the corresponding set of differentially-expressed genes.
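The y-axis quantity described in this caption, the fraction of the top k predicted HDFs that are also differentially expressed, can be sketched in a few lines of Python. The function name and gene identifiers are hypothetical, and the sketch omits the significance test that determines the red and green coloring.

```python
def fraction_curve(ranked_genes, differentially_expressed):
    """For each rank k, the fraction of the top-k genes that are in the DE set."""
    de_set = set(differentially_expressed)
    hits, curve = 0, []
    for k, gene in enumerate(ranked_genes, start=1):
        hits += gene in de_set  # True counts as 1
        curve.append(hits / k)
    return curve

# Toy ranking of four predicted HDFs, two of which are differentially expressed.
curve = fraction_curve(["g1", "g2", "g3", "g4"], {"g1", "g3"})
```

Each entry of `curve` corresponds to one point on a plot: index k-1 holds the value drawn at rank k on the x-axis.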
The overlap of the genes reported by each siRNA study with the set of human orthologs of essential mouse genes.
| Study name | #genes | #genes that are also essential | p-value |
| Brass | 275 | 5 | 0.807 |
| Konig | 296 | 14 | 0.013 |
| Zhou | 375 | 12 | 0.2 |
| Brass, Konig, or Zhou | 908 | 28 | 0.112 |
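One common way to assess an overlap of this kind is a one-sided hypergeometric test, sketched below in Python. This is an assumption: the table does not state which test produced its last column or what background gene set was used, so the sketch is not expected to reproduce those values. The function and parameter names are hypothetical.

```python
from math import comb

def overlap_p_value(universe, essential, study, overlap):
    """P(X >= overlap) for X ~ Hypergeometric(universe, essential, study).

    universe:  size of the background gene set (not given in the source)
    essential: number of essential genes in the background
    study:     number of genes reported by the siRNA study
    overlap:   number of study genes that are also essential
    """
    upper = min(essential, study)
    tail = sum(
        comb(essential, i) * comb(universe - essential, study - i)
        for i in range(overlap, upper + 1)
    )
    return tail / comb(universe, study)
```

A small p-value indicates that the study reports more essential genes than expected by chance under this model.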
The number of genes in each set and the number in each set that are also in the PPI network.
| Study name | #genes | #genes that are also in the PPI network |
| Brass (B) | 275 | 157 |
| Konig (K) | 296 | 199 |
| Zhou (Z) | 375 | 215 |
| Brass, Konig, or Zhou (BKZ) | 908 | 545 |
| Essential genes | 483 | 373 |
The seven algorithms tested, whether they use negative examples, the parameters they use, and the values of the parameters tested.
| Algorithm | Uses negative examples | Parameters | Values tested |
| SinkSource | Yes | None | |
| Local | Yes | None | |
| Hopfield | Yes | None | |
| Local+ | No | None | |
| SinkSource+ | No | λ = weight of edges incident on artificial negative example | 0.01, 0.1, 0.5, 1, 2, 10, and 100 |
| FunctionalFlow | No | Number of phases | 1, 3, 5, 7 |
| PRINCE | No | α = trade-off between contributions from neighbors and prior information | 0.1 to 0.9 in steps of 0.1 |
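To make the role of λ concrete, here is a minimal Python sketch of one plausible reading of SinkSource+. It assumes that known HDFs are fixed at score 1, that every other node is joined to a single artificial negative example (fixed at score 0) by an edge of weight λ, and that each remaining node takes the weighted average of its neighbors' scores. This reading is inferred from the parameter description in the table and may differ from the published algorithm; the function name, the adjacency format, and the fixed iteration count are hypothetical.

```python
def sinksource_plus(adjacency, positives, lam=1.0, iterations=100):
    """Iteratively score nodes; assumes lam > 0.

    adjacency: dict mapping node -> dict of neighbor -> edge weight
    positives: set of nodes fixed at score 1 (known HDFs)
    lam:       weight of each edge to the artificial negative (score 0)
    """
    scores = {v: (1.0 if v in positives else 0.0) for v in adjacency}
    for _ in range(iterations):
        updated = {}
        for v, neighbors in adjacency.items():
            if v in positives:
                updated[v] = 1.0
                continue
            weight_sum = sum(neighbors.values())
            # The artificial negative contributes lam * 0 to the numerator
            # and lam to the denominator.
            numerator = sum(w * scores[u] for u, w in neighbors.items())
            updated[v] = numerator / (weight_sum + lam)
        scores = updated
    return scores

# Toy path graph p - a - b with unit weights and one known HDF, "p".
graph = {"p": {"a": 1.0}, "a": {"p": 1.0, "b": 1.0}, "b": {"a": 1.0}}
scores = sinksource_plus(graph, {"p"}, lam=1.0)
```

Under these assumptions, a larger λ pulls every unlabeled score toward 0, so nodes far from known HDFs are penalized more strongly.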