Jun Guo, Hanliang Guo, Zhanyi Wang.
Abstract
Affinity measure is a key factor that determines the quality of the analysis of a complex network. Here, we introduce a type of statistics, activation forces, to weight the links of a complex network and thereby develop a desired affinity measure. We show that the approach is superior in facilitating the analysis through experiments on a large-scale word network and a protein-protein interaction (PPI) network consisting of ∼5,000 human proteins. The experiment on the word network verifies that the measured word affinities are highly consistent with human knowledge. Further, the experiment on the PPI network verifies the measure and presents a general method for the identification of functionally similar proteins based on PPIs. Most strikingly, we find an affinity network that compactly connects the cancer-associated proteins to each other, which may reveal novel information for cancer study; this includes likely protein interactions and key proteins in cancer-related signal transduction pathways.
Year: 2011 PMID: 22355630 PMCID: PMC3216595 DOI: 10.1038/srep00113
Source DB: PubMed Journal: Sci Rep ISSN: 2045-2322 Impact factor: 4.379
Figure 1. Three ordinary nodes and their in- and out-links in the sparse W.
For every node (word), the strongest 6 and the weakest 1 in- and out-links are presented, showing how sharply the strengths descend and which links most forcefully constrain the meaning of each node. (a) “hands” (noun, 164 in-links and 141 out-links in total) is characterized by the forceful links of modifiers (his, her, your), corresponding verbs (shook, shake, shaking), and associates (pockets, knees, hips). (b) “live” (verb, 129, 153) is characterized by the links of subjects (who, people, we, they), syntactic restraints (to, in, with, here, alone, happily, where) and associates (births). (c) “scientific” (adjective, 70, 185) is characterized by the links of the words composing phrases (research, knowledge, method, interest, journals), near-synonyms (technological, mathematical), and syntactic restraints (of, the, a). The unbalanced link strengths can be seen, for example, in the contrast between the strong in-links of “hands” and the weak in-links of “scientific”. Note that the coloured strengths are at exponential scales.
Figure 2. Top 5 neighbours of words and clusters identified by such types of neighbourhoods.
The colours label parts of speech and the thickness of a link represents the strength of the affinity between its nodes; the length of a link carries no meaning. (a) Ten sample neighbourhoods show that the affinities are reasonably measured across different parts of speech. The central node in each neighbourhood is enlarged for ease of reading. The affinities range from 0.06 (inch∼mile) to 0.29 (two∼three). (b) Five sample clusters that were identified based on the top-5 neighbourhoods show the effectiveness of the clustering. The nodes for the initial words are the largest, their neighbours are of medium size, and the neighbours' neighbours are the smallest. The affinities range from 0.05 (math∼chemistry) to 0.23 (tea∼coffee).
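The neighbourhood construction described in this caption can be sketched as follows. This is a minimal illustration, not the authors' implementation: the dictionary layout, the function names `top_k_neighbours` and `two_hop_cluster`, and the toy affinity values are all assumptions made for the example, and the toy values are not taken from the study.

```python
from typing import Dict, List, Set

# Hypothetical layout: affinity[u][v] is the affinity between nodes u and v.
Affinity = Dict[str, Dict[str, float]]


def top_k_neighbours(affinity: Affinity, node: str, k: int = 5) -> List[str]:
    """Return the k neighbours of `node` with the highest affinity, strongest first."""
    scores = affinity.get(node, {})
    # Sort by descending affinity; break ties alphabetically for determinism.
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return [name for name, _ in ranked[:k]]


def two_hop_cluster(affinity: Affinity, seed: str, k: int = 5) -> Set[str]:
    """Collect the seed, its top-k neighbours, and the neighbours' top-k neighbours."""
    first_ring = top_k_neighbours(affinity, seed, k)
    cluster = {seed, *first_ring}
    for neighbour in first_ring:
        cluster.update(top_k_neighbours(affinity, neighbour, k))
    return cluster


# Made-up toy affinities, for illustration only.
toy_affinity: Affinity = {
    "a": {"b": 0.30, "c": 0.20, "d": 0.10},
    "b": {"a": 0.30, "e": 0.25},
    "c": {"a": 0.20, "f": 0.15},
    "d": {"a": 0.10},
    "e": {"b": 0.25},
    "f": {"c": 0.15},
}

print(top_k_neighbours(toy_affinity, "a", k=2))
print(sorted(two_hop_cluster(toy_affinity, "a", k=2)))
```

With k = 2, node "d" is left out of the cluster because it is neither among the two strongest neighbours of "a" nor among those of "b" or "c".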
Figure 3. Four typical nodes in P.
For every node (protein), the links whose weights are higher than the threshold (1.0e-5) are shown together with the corresponding proteins. As examples of link-rich nodes, the proteins BRCA1 and CANX have 99 and 51 links, respectively, while the ordinary proteins SLC4A1 and BSG have only 10 and 9 links, respectively. The sharp decrease at the high end of the link strengths of a node is striking. Note that the coloured strengths are at exponential scales, and the length of a link is not meaningful. *The proteins are named by gene symbols in this study.
A comparison of the clustering based on Betweenness and our affinity measure
| Method | Cluster 1 | Cluster 2 | Cluster 3 |
|---|---|---|---|
| Betweenness based method | ATM, BARD1, BCCIP, BCL3, BRCA1, CCNA2, CCNB1, CCND1, CCNE1, CDK1, CDK14, CDK2, CDK4, CDKN1A, CSTF1, ESR1, HAP1, LMO4, MAPK14, MDM2, MSH2, MSH6, MYC, NBN, | ALK, ARF1, EIF2AK2, | |
| | DNA repair (Score = 13) | Protein autophosphorylation (Score = 2) | Nucleosome (Score = 3) |
| | ATM, BARD1, BCCIP, BRCA1, MSH2, MSH6, NBN, PARP1, PCNA, POLR2A, RAD50, RBBP8, RPA1 | ALK, EIF2AK2 | HIST1H1A, HIST2H2BE, HIST3H3 |
| Affinity based method | ATM, BARD1, BCCIP, BCL3, BRCA1, CCNA2, CCNB1, CCND1, CCNE1, CDK1, CDK14, CDK2, CDK4, CDKN1A, CSTF1, ESR1, | ALK, ARF1, EIF2AK2, | GADD45GIP1, HIST1H1A, HIST2H2BE, HIST3H3, HIST4H4, MAP3K4 |
| | DNA repair (Score = 14) | Nucleotide binding (Score = 6) | Nucleosome (Score = 3) |
| | ATM, BARD1, BCCIP, BRCA1, GADD45A, MSH2, MSH6, NBN, NPM1, PCNA, POLR2A, RAD50, RBBP8, RPA1 | ALK, ARF1, EIF2AK2, NCL, PLK1, SREK1 | HIST1H1A, HIST2H2BE, HIST3H3 |
*The gene symbols in bold indicate the members that are absent from the corresponding cluster of the compared method. The featured GO descriptors were obtained with the Set Distiller tool of GeneDecks (http://www.genecards.org/).
Figure 4. The affinity networks of cancer-associated proteins.
The nodes of CAPs (red) and CAPCs (orange) are linked to each other by their affinities that are higher than T. The thickness of a link represents the affinity. (a) Eighty-two CAPCs are identified by setting T = 0.03 (p < 10⁻²) and T = 4. Incorporating the 82 CAPCs, 58 CAPs form an integral network with 445 links that are stronger than T (not including the links between CAPCs), leaving only 2 isolated CAPs. A link-dense portion is located at the bottom left and covers the CAPs RAD51, MLH1, MSH2, MSH6, BRCA2, ATM, CHEK2, BUB1 and BRCA1. The affinities range from 0.03 (AR∼PTPN1) to 0.31 (MSH6∼MSH2). (b) The core network of the affinities among the CAPs is revealed by raising T to 0.04 (p < 10⁻³), which includes 37 CAPs and 16 CAPCs. Eight CAPs are paired and 15 are isolated. The central portion, which consists of ATM, MSH2, MSH6, CHEK2 and BRCA1, is crucial, consistent with the results of previous studies [31–35]. The 16 CAPCs may be particularly meaningful for cancer study. The affinities range from 0.04 (MSH6∼CDX2) to 0.31 (MSH6∼MSH2).
Protein pairs with high A
| CAP | CAP/CAPC | A | Blast E value | CAP | CAP/CAPC | A | Blast E value |
|---|---|---|---|---|---|---|---|
| MSH6 | MSH2 | 0.31 | 4e-059 | ATM | ATR | 0.10 | 8e-063 |
| KRAS | NRAS | 0.22 | 4e-084 | BRAF | MDM4 | 0.10 | > 10 |
| MSH6 | RAD50 | 0.21 | > 10 | MSH6 | MRE11A | 0.10 | > 10 |
| PMS2 | MSH3 | 0.18 | > 10 | FAS | FASLG | 0.09 | > 10 |
| PTPRJ | FER | 0.17 | > 10 | RAD51 | BLM | 0.09 | > 10 |
| FGFR3 | FGFR4 | 0.16 | 0 | MSH2 | BLM | 0.09 | > 10 |
| ATM | MDC1 | 0.15 | > 10 | EGFR | SHC1 | 0.09 | > 10 |
| MSH2 | MLH1 | 0.13 | > 10 | MUTYH | MSH3 | 0.09 | > 10 |
| IRF1 | HMGA2 | 0.13 | > 10 | IRF1 | TACC2 | 0.09 | > 10 |
| MLH1 | BLM | 0.13 | > 10 | ATM | H2AFX | 0.09 | > 10 |
| BUB1B | BUB1 | 0.13 | 1e-027 | PTPRJ | MUC1 | 0.09 | > 10 |
| MLH1 | MSH6 | 0.12 | > 10 | NRAS | CNKSR1 | 0.09 | > 10 |
| KLF6 | PPP1R13L | 0.12 | > 10 | RAD54B | DMC1 | 0.09 | > 10 |
| ATM | NBN | 0.11 | > 10 | MSH6 | RFC1 | 0.08 | > 10 |
| IRF1 | ING4 | 0.11 | > 10 | BUB1 | BUB3 | 0.08 | > 10 |
| AR | ESR1 | 0.11 | 8e-040 | MUTYH | ERCC5 | 0.08 | > 10 |
| MSH6 | MDC1 | 0.11 | > 10 | PMS2 | MLH1 | 0.08 | 8e-015 |
| MLH1 | RFC1 | 0.11 | > 10 | MUTYH | FEN1 | 0.08 | > 10 |
| PHB | EFCAB6 | 0.11 | > 10 | PMS2 | RAD50 | 0.08 | > 10 |
| MSH2 | MDC1 | 0.11 | > 10 | MSH2 | RFC1 | 0.08 | > 10 |
| MLH1 | RAD50 | 0.11 | > 10 | BUB1B | BUB3 | 0.08 | > 10 |
| MSH2 | RAD50 | 0.10 | > 10 | NRAS | RRAS2 | 0.08 | 5e-024 |
Figure 5. Illustration of the computation of the affinity measure.
(a) The affinity between the central nodes u and v will be computed. The digits on the links are pafs, and the colours of the nodes are simply used for their identification. (b) From the mean in-link overlap rate (upper portion) and the mean out-link overlap rate (lower portion) to the geometric average of the two, i.e., the affinity.
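The computation outlined in this caption can be sketched in code. The caption states only that the affinity is the geometric average of the mean in-link overlap rate and the mean out-link overlap rate; the rest of the sketch is an assumption made for illustration. In particular, it assumes that the overlap rate of two link strengths is min/max, that a missing link contributes an overlap of 0, and that the mean is taken over the union of the two nodes' neighbours. The names `overlap_rate`, `mean_overlap`, `in_links` and `affinity`, and the toy weights, are hypothetical.

```python
import math
from typing import Dict

# Hypothetical layout: weights[src][dst] is the strength of the directed link src -> dst.
Weights = Dict[str, Dict[str, float]]


def overlap_rate(x: float, y: float) -> float:
    """Assumed overlap rate of two link strengths: min/max, or 0 if either link is absent."""
    if x <= 0.0 or y <= 0.0:
        return 0.0
    return min(x, y) / max(x, y)


def mean_overlap(links_u: Dict[str, float], links_v: Dict[str, float]) -> float:
    """Mean overlap rate over the union of the neighbours of u and v (an assumption)."""
    neighbours = set(links_u) | set(links_v)
    if not neighbours:
        return 0.0
    total = sum(
        overlap_rate(links_u.get(n, 0.0), links_v.get(n, 0.0)) for n in neighbours
    )
    return total / len(neighbours)


def in_links(weights: Weights, node: str) -> Dict[str, float]:
    """Collect the strengths of all links pointing into `node`."""
    return {src: out[node] for src, out in weights.items() if node in out}


def affinity(weights: Weights, u: str, v: str) -> float:
    """Geometric average of the mean in-link and mean out-link overlap rates."""
    mean_in = mean_overlap(in_links(weights, u), in_links(weights, v))
    mean_out = mean_overlap(weights.get(u, {}), weights.get(v, {}))
    return math.sqrt(mean_in * mean_out)


# Made-up toy link strengths, for illustration only.
toy_weights: Weights = {
    "k1": {"u": 0.4, "v": 0.2},
    "k2": {"u": 0.1},
    "u": {"l1": 0.3, "l2": 0.6},
    "v": {"l1": 0.3},
}

print(affinity(toy_weights, "u", "v"))
```

In the toy network, the in-links overlap at a mean rate of 0.25 and the out-links at 0.5, so the affinity is the square root of 0.125, about 0.35. Because the geometric average multiplies the two rates, two nodes that share no in-links or no out-links receive an affinity of 0 under these assumptions.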