Eleanor R Brush, David C Krakauer, Jessica C Flack.
Abstract
Biological and social networks are composed of heterogeneous nodes that contribute differentially to network structure and function. A number of algorithms have been developed to measure this variation. These algorithms have proven useful for applications that require assigning scores to individual nodes (from ranking websites to determining critical species in ecosystems), yet the mechanistic basis for why they produce good rankings remains poorly understood. We show that a unifying property of these algorithms is that they quantify consensus in the network about a node's state or capacity to perform a function. The algorithms capture consensus either by taking into account the number of a target node's direct connections and, when the edges are weighted, the uniformity of its weighted in-degree distribution (breadth), or by measuring net flow into a target node (depth). Using data from communication, social, and biological networks, we find that how an algorithm measures consensus (through breadth or depth) affects its ability to correctly score nodes. We also observe variation in sensitivity to source biases in interaction/adjacency matrices: errors arising from systematic error at the node level or from direct manipulation of network connectivity by nodes. Our results indicate that the breadth algorithms, which are derived from information theory, correctly score nodes (assessed using independent data) and are robust to errors. However, in cases where nodes "form opinions" about other nodes using indirect information, such as reputation, depth algorithms such as Eigenvector Centrality are required. One caveat is that Eigenvector Centrality is not robust to error unless the network is transitive or assortative. In these cases the network structure allows the depth algorithms to effectively capture breadth as well as depth. Finally, we discuss the algorithms' cognitive and computational demands.
This is an important consideration in systems in which individuals use the collective opinions of others to make decisions.
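The breadth/depth distinction can be made concrete with a small Python sketch. It assumes `W[i][j]` counts interactions sent by node `i` to node `j`. The breadth score here is 2 raised to the Shannon entropy (in bits) of a node's weighted in-degree distribution, i.e. an effective number of senders, which grows with the number of senders and with how evenly they contribute. The depth score is a damped eigenvector-style (random-walk flow) score with an arbitrarily chosen `alpha = 0.85`. Both are illustrative stand-ins, not the exact Shannon Consensus or Eigenvector Centrality definitions used by the authors, and all function names are hypothetical.

```python
import math


def breadth_scores(W):
    """Hypothetical breadth score: 2**H_j, where H_j is the Shannon entropy (bits)
    of the distribution of node j's received interactions over senders.
    Nodes that receive nothing score 0."""
    n = len(W)
    scores = []
    for j in range(n):
        total = sum(W[i][j] for i in range(n))  # weighted in-degree of j
        if total == 0:
            scores.append(0.0)
            continue
        h = -sum((W[i][j] / total) * math.log2(W[i][j] / total)
                 for i in range(n) if W[i][j] > 0)
        scores.append(2 ** h)
    return scores


def depth_scores(W, alpha=0.85, tol=1e-12, max_iter=10_000):
    """Hypothetical depth score: stationary flow of a random walk that follows
    row-normalized edges with probability alpha and otherwise jumps uniformly.
    Nodes that send nothing spread their mass uniformly."""
    n = len(W)
    out_w = [sum(row) for row in W]
    s = [1.0 / n] * n
    for _ in range(max_iter):
        dangling = sum(s[i] for i in range(n) if out_w[i] == 0)
        new = []
        for j in range(n):
            flow = sum(s[i] * W[i][j] / out_w[i] for i in range(n) if out_w[i])
            new.append((1 - alpha) / n + alpha * (flow + dangling / n))
        if sum(abs(a - b) for a, b in zip(new, s)) < tol:
            return new
        s = new
    return s


# Toy network: node 0 hears equally from nodes 1, 2, 3; node 3 hears only from node 1.
W = [[0, 0, 0, 0],
     [2, 0, 0, 6],
     [2, 0, 0, 0],
     [2, 0, 0, 0]]
print(breadth_scores(W))  # node 0 is about 3, node 3 is exactly 1
print(depth_scores(W))
```

In the toy matrix, nodes 0 and 3 receive the same total weight (6), but node 0 hears from three senders equally while node 3 hears only from node 1, so their breadth scores differ (about 3 versus 1) even though a plain weighted in-degree would tie them.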
Year: 2013 PMID: 23874167 PMCID: PMC3715438 DOI: 10.1371/journal.pcbi.1003109
Source DB: PubMed Journal: PLoS Comput Biol ISSN: 1553-734X Impact factor: 4.475
The interpretation of consensus about the state of a node, or its capacity to perform a behavior, depends on the type of interactions constituting edges in the network.
| Interaction | Node State | Functional Consequences |
| Subordination signal | Social power | Conflict management behavior |
| Collaboration | Scientific reputation | Grants awarded |
| Functional linkage | Gene importance | Growth/Fitness |
| Friendship | Popularity | Friends/Gifts |
| Citation | Influence | Grants/Positions |
| Trade | Quality of goods | Prevalence of goods |
We propose, and find, that consensus about this state predicts function. This result is strongest for the subordination signaling network, for which the mechanistic basis of consensus is best understood and for which the data strongly indicate that subordination signals are not proxies for power but direct measures of it.
Matrices used in the text.
| Name | Entries | Definition |
|  |  | redistribution probabilities used for Eigenvector Centrality; we use |
|  |  | binary interaction matrix |
|  |  | interaction matrix |
|  |  | shuffled interaction matrix |
|  |  | probabilities of direction of an interaction used for David's Score |
|  |  | column stochastic matrix used for Shannon Consensus |
|  |  | modified version of |
|  |  | votes used for Borda Count |
|  |  | row stochastic matrix used for Eigenvector Centrality |
|  |  | modified version of |
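Several of the matrices in this table can be derived from a single weighted interaction matrix. The Python sketch below assumes `W[i][j]` counts interactions sent from `i` to `j`, and assumes that receiving an interaction is the "winning" direction for David's Score, so `P[i][j]` is the share of the interactions between `i` and `j` that `i` received; both conventions are assumptions. It then applies the standard David's Score, DS = w + w2 - l - l2. The shuffled, vote, and modified matrices are omitted because their construction is not specified in the table, and the function names are hypothetical.

```python
def build_matrices(W):
    """Return (A, C, R, P) built from a weighted interaction matrix W,
    where W[i][j] = number of interactions sent from i to j (assumed)."""
    n = len(W)
    in_w = [sum(W[i][j] for i in range(n)) for j in range(n)]
    out_w = [sum(W[i]) for i in range(n)]
    # Binary interaction matrix: 1 wherever at least one interaction occurred.
    A = [[1 if W[i][j] > 0 else 0 for j in range(n)] for i in range(n)]
    # Column stochastic: each nonzero column j sums to 1 over senders i.
    C = [[W[i][j] / in_w[j] if in_w[j] else 0.0 for j in range(n)] for i in range(n)]
    # Row stochastic: each nonzero row i sums to 1 over receivers j.
    R = [[W[i][j] / out_w[i] if out_w[i] else 0.0 for j in range(n)] for i in range(n)]
    # Share of the i/j interactions that i received (assumed "win" direction).
    P = [[W[j][i] / (W[i][j] + W[j][i]) if i != j and W[i][j] + W[j][i] else 0.0
          for j in range(n)] for i in range(n)]
    return A, C, R, P


def davids_score(P):
    """Standard David's Score from a matrix of pairwise win proportions P."""
    n = len(P)
    w = [sum(P[i][j] for j in range(n)) for i in range(n)]   # wins
    l = [sum(P[j][i] for j in range(n)) for i in range(n)]   # losses
    w2 = [sum(P[i][j] * w[j] for j in range(n)) for i in range(n)]
    l2 = [sum(P[j][i] * l[j] for j in range(n)) for i in range(n)]
    return [w[i] + w2[i] - l[i] - l2[i] for i in range(n)]


# Node 0 sends 3 interactions to node 1; node 1 sends 1 back.
A, C, R, P = build_matrices([[0, 3], [1, 0]])
print(davids_score(P))  # [-0.5, 0.5]
```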
Variables used in the text.
| Variable | Definition |
| Δ |  |
|  | Shannon entropy |
| Π |  |
|  | weighted in-degree, or number of interactions received |
In the text, the subscript i on the algorithms is sometimes omitted, in which case the variable refers to the vector of scores rather than a node's score.
Figure 1. Fit of each algorithm to the functional data for the primate communication network.
The x-axis indicates which subset of nodes is being considered (1, top quartile; 2, top half; 3, top three quartiles; 4, all nodes; 5, bottom three quartiles; 6, bottom half; 7, bottom quartile). Because each algorithm ranks nodes differently, the quartiles may vary from algorithm to algorithm (see Section heterogeneity). The values for the three dependent variables are distinguished by color: support solicited (green), aggression used (blue), and intervention cost (purple). The multivariate values are shown in red. The number in each plot indicates the rank of each algorithm with respect to its performance in predicting the functional data. As expected, the consensus scores of the top-ranked nodes are the most predictive of the functional data (see Section Prediction heterogeneity).
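The seven x-axis subsets can be generated from any algorithm's scores, which is why the quartiles differ between algorithms. This Python sketch assumes floor-based cut points (`n // 4`, `n // 2`, `3 * n // 4`); how ties or node counts not divisible by four were handled is not stated in the caption, and `node_subsets` is a hypothetical name.

```python
def node_subsets(scores):
    """Map subset labels 1..7 to lists of node indices, ranked by descending score."""
    n = len(scores)
    ranked = sorted(range(n), key=lambda i: scores[i], reverse=True)
    q1, q2, q3 = n // 4, n // 2, 3 * n // 4  # assumed floor-based quartile cuts
    return {
        1: ranked[:q1],  # top quartile
        2: ranked[:q2],  # top half
        3: ranked[:q3],  # top three quartiles
        4: ranked,       # all nodes
        5: ranked[q1:],  # bottom three quartiles
        6: ranked[q2:],  # bottom half
        7: ranked[q3:],  # bottom quartile
    }


subsets = node_subsets([0.9, 0.1, 0.5, 0.7, 0.3, 0.8, 0.2, 0.6])
print(subsets[1], subsets[7])  # [0, 5] [6, 1]
```

Subsets 1 and 5 (and likewise 3 and 7) are complements, so every node appears in exactly one of each pair.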
Tables of the predictive value of the scores produced by each algorithm for all nodes on the three data sets.
| Algorithm | Support Solicited r² | p-value | Intensity of Aggression Used r² | p-value | Intervention Cost r² | p-value | Multivariate r² | p-value |
|  | 0.41 | <0.001 | 0.17 | 0.003 | 0.41 | <0.001 | 0.45 | <0.001 |
| Δ | 0.87 | <0.001 | 0.61 | <0.001 | 0.78 | <0.001 | 0.92 | <0.001 |
| Π | 0.86 | <0.001 | 0.58 | <0.001 | 0.79 | <0.001 | 0.91 | <0.001 |
|  | 0.86 | <0.001 | 0.58 | <0.001 | 0.78 | <0.001 | 0.90 | <0.001 |
|  | 0.37 | <0.001 | 0.15 | 0.005 | 0.41 | <0.001 | 0.44 | <0.001 |
|  | 0.83 | <0.001 | 0.58 | <0.001 | 0.78 | <0.001 | 0.89 | <0.001 |
|  | 0.52 | <0.001 | 0.25 | <0.001 | 0.51 | <0.001 | 0.56 | <0.001 |
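The r² values above summarize how much of the variance in a functional measure each algorithm's scores explain. Assuming a univariate ordinary least-squares fit of the functional measure on the scores (an assumption; the regression model is not stated here), r² equals the squared Pearson correlation, as in this Python sketch. The multivariate column and the p-values, which require a t or F distribution, are not reproduced, and `r_squared` is a hypothetical name.

```python
def r_squared(x, y):
    """Coefficient of determination of the least-squares line y ~ a + b*x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    if sxx == 0 or syy == 0:
        raise ValueError("r^2 is undefined when x or y is constant")
    return sxy * sxy / (sxx * syy)  # squared Pearson correlation


print(r_squared([1, 2, 3], [1, 3, 2]))  # 0.25
```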
Summary of data sets and the most highly predictive algorithms, in order of their performance predicting the functional data.
| Network | Functional Data | Most Predictive Algorithms |
| Primate communication network | support solicited, aggression used, intervention cost | Δ, Π, |
| Collaboration network | grants awarded |  |
| Gene functional linkage network | viability of mutants, competitive fitness |  |
Only algorithms that significantly predict the functional data are included. In many cases the differences in performance across algorithms are small. In addition, the r² values are small for the gene functional linkage network and the physicist collaboration network and large for the primate communication network. This difference probably arises because the subordination signals in the primate network are direct measures of power, whereas the edges in the other networks are either indirect (proxy) measures of reputation and importance or only one of many contributors to the variance.
Figure 2. Sensitivity of each algorithm to source bias in the interaction matrix.
For each algorithm, we report the drop in rank induced when a node receives all of its edges from one of its neighbors. Each point shows the mean correlation and the bars show plus or minus one standard deviation. The algorithms are ordered from left to right by their predictive power for the primate communication network. For the primate communication network we exhaust all possible pairs; for the collaboration and functional linkage networks we choose pairs at random. A. Primate communication network. B. Physicist collaboration network. C. Functional linkage network of genes.
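The source-bias perturbation described in this caption can be sketched in Python: all of a target node's received weight is reassigned to a single sender, and its rank is recomputed. The sketch assumes `W[i][j]` counts interactions from `i` to `j`, uses plain weighted in-degree and a hypothetical entropy-based "effective senders" score as the two example algorithms, and only computes the rank drop for one pair; the averaging over pairs and the summary statistic plotted in the figure are not reproduced. All function names are hypothetical.

```python
import math


def in_degree(W):
    """Weighted in-degree: total interactions received by each node."""
    n = len(W)
    return [sum(W[i][j] for i in range(n)) for j in range(n)]


def effective_senders(W):
    """Hypothetical breadth score: 2**(Shannon entropy of the in-degree distribution)."""
    n = len(W)
    scores = []
    for j, total in enumerate(in_degree(W)):
        if total == 0:
            scores.append(0.0)
            continue
        h = -sum((W[i][j] / total) * math.log2(W[i][j] / total)
                 for i in range(n) if W[i][j] > 0)
        scores.append(2 ** h)
    return scores


def rank_of(scores, node):
    """Rank 1 = highest score; tied nodes share the better rank."""
    return 1 + sum(1 for s in scores if s > scores[node])


def bias_source(W, target, source):
    """Copy of W in which target receives its entire in-weight from source
    (source is expected to be one of target's existing senders)."""
    W2 = [row[:] for row in W]
    total = sum(W[i][target] for i in range(len(W)))
    for i in range(len(W)):
        W2[i][target] = 0
    W2[source][target] = total
    return W2


def rank_drop(score_fn, W, target, source):
    """Positive values mean the target fell in the ranking after the perturbation."""
    before = rank_of(score_fn(W), target)
    after = rank_of(score_fn(bias_source(W, target, source)), target)
    return after - before


W = [[0, 0, 0, 0],
     [2, 0, 0, 3],
     [2, 0, 0, 3],
     [2, 0, 0, 0]]
print(rank_drop(in_degree, W, 0, 1), rank_drop(effective_senders, W, 0, 1))  # 0 1
```

Because the perturbation preserves a node's total received weight, weighted in-degree is unchanged by construction, whereas a score that depends on how evenly that weight is spread across senders can move.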