Sandeep S Amberkar, Lars Kaderali.
Abstract
BACKGROUND: Big data is becoming ubiquitous in biology, and poses significant challenges in data analysis and interpretation. RNAi screening has become a workhorse of functional genomics, and has been applied, for example, to identify host factors involved in infection for a panel of different viruses. However, the analysis of data resulting from such screens is difficult, and the overlap between hit lists is often low, even when comparing screens targeting the same virus. This makes it a major challenge to select interesting candidates for further detailed, mechanistic experimental characterization.
Keywords: Network analysis; RNAi screening; Virus-host interactions
Year: 2015 PMID: 25691914 PMCID: PMC4331137 DOI: 10.1186/s13015-015-0035-7
Source DB: PubMed Journal: Algorithms Mol Biol ISSN: 1748-7188 Impact factor: 1.405
Figure 1. Overview of the data analysis pipeline. (1) Protein interactions from public databases are collated to build an integrated human PPI network. (2) Greedy unsupervised clustering is used to identify relevant, possibly overlapping, submodules in the PPI network. (3) Hits from one or several RNAi screens are mapped to these modules, and modules are filtered for significant enrichment. (4) Subnetworks are further filtered based on network topology and semantic similarity values. (5) The resulting modules are visualized as subnetworks, color-coded for hits and non-hits, and (6a,b) are then functionally characterized based on GO and Reactome pathway annotations. (6c) Lastly, using gene expression data from different tissues, tissue-specific putative novel host factors are predicted.
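Step (3) of the pipeline, filtering modules for significant enrichment with screening hits, can be illustrated with a small sketch. The record does not name the enrichment test, so the one-sided hypergeometric test below is an assumption, as are the function names, the `alpha` threshold, and the simplification that every hit lies in the network. No multiple-testing correction is applied here.

```python
from math import comb


def hypergeom_enrichment_p(network_size, total_hits, module_size, module_hits):
    """One-sided hypergeometric p-value P(X >= module_hits).

    Models drawing `module_size` proteins without replacement from a
    network of `network_size` proteins, `total_hits` of which are hits.
    (Assumed test; the source only says modules are filtered for
    significant enrichment.)
    """
    upper = min(module_size, total_hits)
    denominator = comb(network_size, module_size)
    tail = sum(
        comb(total_hits, k) * comb(network_size - total_hits, module_size - k)
        for k in range(module_hits, upper + 1)
    )
    return tail / denominator


def enriched_modules(modules, hits, network_size, alpha=0.05):
    """Return {module name: p-value} for modules with p < alpha.

    `modules` maps a hypothetical module name to its member proteins;
    all hits are assumed to be part of the network.
    """
    hit_set = set(hits)
    result = {}
    for name, members in modules.items():
        member_set = set(members)
        p = hypergeom_enrichment_p(
            network_size, len(hit_set), len(member_set), len(member_set & hit_set)
        )
        if p < alpha:
            result[name] = p
    return result
```

For example, `enriched_modules({"m1": ["a", "b", "c"], "m2": ["d", "e", "f"]}, ["a", "b", "c"], network_size=20)` keeps only `m1`, because all three of its members are hits while `m2` contains none.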
P-values of the Wilcoxon test used to determine the significance of mean network centrality and semantic similarity values for each subnetwork
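| Measure | HIV (cluster 1) | HIV (cluster 2) | HCV (cluster 1) | HCV (cluster 2) | Combined (cluster 1) | Combined (cluster 2) | Combined (cluster 3) |
|---|---|---|---|---|---|---|---|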
| Betweenness | < 0.0001 | 0.0131 | 0.0247 | 0.0005 | 0.0131 | 0.0131 | 0.0040 |
| Closeness | < 0.0001 | 0.0131 | 0.0247 | 0.0005 | 0.0131 | 0.0131 | 0.0040 |
| Clustering Coefficient | < 0.0001 | 0.0247 | 0.0001 | 0.0005 | 0.0131 | 0.0131 | 0.0040 |
| Eigenvector Centrality | < 0.0001 | 0.0131 | 1 | 1 | 1 | 0.0057 | 0.0057 |
| Node Degree | < 0.0001 | 1 | 0.0247 | 0.0005 | 0.0211 | 0.0131 | 0.0040 |
| Path Length | < 0.0001 | 0.0131 | 0.0247 | 0.0002 | 0.0131 | 0.0131 | 0.0040 |
| Dice Similarity | < 0.0001 | 0.0131 | 0.0247 | 0.0005 | 0.0131 | 0.0131 | 0.0040 |
| Wang Sim. (GO.BP) | 0.0004 | 0.0286 | 0.5926 | 0.6009 | 0.0284 | 0.0286 | 0.0136 |
| Wang Sim. (GO.CC) | 0.0004 | 0.0286 | 0.5926 | 0.0315 | 0.0284 | 0.0286 | 0.0136 |
| Wang Sim. (GO.MF) | 0.0004 | 0.0286 | 0.0498 | 0.7713 | 1 | 0.3429 | 0.1077 |
A Wilcoxon test was used to determine the significance of network centrality and semantic similarity measures for subnetworks significantly enriched with RNAi screening hits. For each measure, the average over all nodes in a given enriched cluster was tested against non-enriched subnetworks of comparable size to assess the significance of the difference between the means. Shown are the resulting p-values for two clusters for HIV, two clusters for HCV, and three combined clusters.
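The comparison described above can be sketched as a two-sample Wilcoxon rank-sum test. The record does not state which Wilcoxon variant was used, so the rank-sum form for two independent groups is an assumption, and the function names are illustrative. The sketch computes an exact two-sided p-value by enumerating all group assignments, which is only practical for small samples.

```python
from itertools import combinations


def midranks(values):
    """Assign 1-based ranks; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        i = j + 1
    return ranks


def rank_sum_exact_p(x, y):
    """Exact two-sided p-value of the Wilcoxon rank-sum test.

    `x` could hold a mean centrality value per enriched subnetwork and
    `y` the same measure for non-enriched subnetworks (assumed usage).
    """
    ranks = midranks(list(x) + list(y))
    n_x, n = len(x), len(x) + len(y)
    expected = n_x * (n + 1) / 2  # rank sum of x under the null hypothesis
    observed = abs(sum(ranks[:n_x]) - expected)
    extreme = total = 0
    # Enumerate every way of choosing which n_x observations form group x.
    for idx in combinations(range(n), n_x):
        total += 1
        if abs(sum(ranks[i] for i in idx) - expected) >= observed - 1e-12:
            extreme += 1
    return extreme / total
```

With fully separated groups such as `rank_sum_exact_p([1, 2, 3], [4, 5, 6])`, only 2 of the 20 possible assignments are as extreme as the observed one, giving a p-value of 0.1. For larger samples, a normal approximation or a library routine would be the more practical choice.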
Key results achieved for HIV-1 and HCV
| Virus | Subnetwork | Main novel findings |
|---|---|---|
| HIV | HIV_s52 | KDM4B - lysine-specific demethylase 4B |
| HIV | HIV_s66 | HNRNPK - Heterogeneous nuclear ribonucleoprotein K (hnRNP K) (Transformation up-regulated nuclear protein) (TUNP) |
| | | HNRNPL - Heterogeneous nuclear ribonucleoprotein L |
| | | HNRNPM - Heterogeneous nuclear ribonucleoprotein M |
| | | HNRNPU - Heterogeneous nuclear ribonucleoprotein U (hnRNP U) (Scaffold attachment factor A) (SAF-A) (p120) (pp120) |
| | | RBM11 - Splicing regulator RBM11 (RNA-binding motif protein 11) |
| | | RBM41 - RNA-binding protein 41 (RNA-binding motif protein 41) |
| | | RBM42 - RNA-binding protein 42 (RNA-binding motif protein 42) |
| | | RBM4B - RNA-binding protein 4B (RNA-binding motif protein 30) (RNA-binding motif protein 4B) (RNA-binding protein 30) |
| | | RBM7 - RNA-binding protein 7 (RNA-binding motif protein 7) |
| | | SRSF3 - Serine/arginine-rich splicing factor 3 (Pre-mRNA-splicing factor SRP20) (Splicing factor, arginine/serine-rich 3) |
| | | SRSF4 - Serine/arginine-rich splicing factor 4 (Pre-mRNA-splicing factor SRP75) (SRP001LB) (Splicing factor, arginine/serine-rich 4) |
| | | SRSF10 - Serine/arginine-rich splicing factor 10 (40 kDa SR-repressor protein) |
| HCV | HCV_s43 | Heat-shock proteins (HspB1, HspB2, HspB6, HspB7 and HspB8) |
| HCV | HCV_s64 | Tyrosine-protein phosphatase non-receptors, various types (PTP-1B, TCPTP, PTP-H1, PTPase MEG2) |
| | | Tankyrase-1 (Poly-ADP-ribosyltransferase) |
The table shows the main novel findings for HIV-1 and hepatitis C virus obtained by mapping RNAi data to protein interaction networks, and using the clustering and filtering procedure proposed here. Results for the combined analysis are given in Additional file 5.
Figure 2. HIV and HCV enrichment analysis. The figure shows Reactome pathway annotations significantly enriched with hits from the individual RNAi screens or significant clusters for (A) HIV and (B) HCV. Dot size indicates the percentage of genes in the respective annotation category that were significant in the screen; color encodes the statistical significance of enrichment.
Figure 3. Combi_s239 subnetwork, resulting from the analysis of all seven RNAi screens for three different viruses (HIV, HCV, WNV). Nodes represent proteins and node labels are UniProt identifiers. All colored nodes represent hits from an RNAi screen, white nodes represent proteins from the Dharmacon library, and black nodes are proteins in the Hu.PPI but not in the Dharmacon library.
Figure 4. The HCV_s64 subnetwork, including TNKS1, SERCA1 and JAK2. Tissue-specific expression data for hepatocytes from the Human Protein Atlas were overlaid on the network.