Zachary P Neal, Rachel Domagalski, Bruce Sagan.
Abstract
Projections of bipartite or two-mode networks capture co-occurrences, and are used in diverse fields (e.g., ecology, economics, bibliometrics, politics) to represent unipartite networks. A key challenge in analyzing such networks is determining whether an observed number of co-occurrences between two nodes is significant, and therefore whether an edge exists between them. One approach, the fixed degree sequence model (FDSM), evaluates the significance of an edge's weight by comparison to a null model in which the degree sequences of the original bipartite network are fixed. Although the FDSM is an intuitive null model, it is computationally expensive because it requires Monte Carlo simulation to estimate each edge's p value, and therefore is impractical for large projections. In this paper, we explore four potential alternatives to FDSM: fixed fill model, fixed row model, fixed column model, and stochastic degree sequence model (SDSM). We compare these models to FDSM in terms of accuracy, speed, statistical power, similarity, and ability to recover known communities. We find that the computationally fast SDSM offers a statistically conservative but close approximation of the computationally impractical FDSM under a wide range of conditions, and that it correctly recovers a known community structure even when the signal is weak. Therefore, although each backbone model may have particular applications, we recommend SDSM for extracting the backbone of bipartite projections when FDSM is impractical.
Year: 2021 PMID: 34907253 PMCID: PMC8671427 DOI: 10.1038/s41598-021-03238-3
Source DB: PubMed Journal: Sci Rep ISSN: 2045-2322 Impact factor: 4.379
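The Monte Carlo idea described in the abstract can be illustrated with a minimal Python sketch. It projects a small bipartite matrix and estimates an upper-tail p value for one edge weight by repeated sampling from a null model. The null used here, which shuffles each row independently and therefore preserves only row sums, is a simplified stand-in chosen for brevity and is not the FDSM (which also fixes column sums). The matrix `B`, the function names, and the number of trials are all hypothetical.

```python
import random

def project(B):
    """Co-occurrence counts between rows: P[i][k] = sum_j B[i][j] * B[k][j]."""
    n = len(B)
    return [[sum(a * b for a, b in zip(B[i], B[k])) for k in range(n)]
            for i in range(n)]

def shuffle_rows(B, rng):
    """Simplified null (assumption): permute each row independently,
    so row sums are preserved but column sums are not."""
    shuffled = []
    for row in B:
        r = list(row)
        rng.shuffle(r)
        shuffled.append(r)
    return shuffled

def upper_tail_p(B, i, k, trials=2000, seed=0):
    """Share of null projections whose (i, k) weight is >= the observed weight."""
    rng = random.Random(seed)
    observed = project(B)[i][k]
    hits = sum(project(shuffle_rows(B, rng))[i][k] >= observed
               for _ in range(trials))
    return hits / trials

# Hypothetical bipartite matrix: rows are agents, columns are artifacts.
B = [
    [1, 1, 1, 0, 0, 0],
    [1, 1, 1, 0, 0, 0],
    [0, 0, 0, 1, 1, 0],
]
P = project(B)
p_value = upper_tail_p(B, 0, 1)
print(P)
print(p_value)
```

Under this row-shuffling null, the first two rows coincide with probability 1/20, so the estimate should land near 0.05. Every edge needs its own estimate from the same ensemble of sampled matrices, which is why simulation-based models become costly for large projections.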
SDSM probabilities given agent and artifact degree sequences [1,1,2].
| Matrix 1 | Matrix 2 | Matrix 3 | Matrix 4 | Matrix 5 |
|---|---|---|---|---|
| 1 0 0 | 0 0 1 | 0 0 1 | 0 0 1 | 0 1 0 |
| 0 0 1 | 1 0 0 | 0 0 1 | 0 1 0 | 0 0 1 |
| 0 1 1 | 0 1 1 | 1 1 0 | 1 0 1 | 1 0 1 |
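The 0/1 entries tabulated above are consistent with the five binary 3×3 matrices whose row sums and column sums both equal [1,1,2]. A short brute-force sketch in Python can confirm that count and compute how often each cell equals 1 across the set. The function names are hypothetical, and the enumeration is only practical for tiny matrices.

```python
from itertools import product

def matrices_with_margins(row_sums, col_sums):
    """Enumerate every 0/1 matrix with the given row and column sums."""
    n_rows, n_cols = len(row_sums), len(col_sums)
    found = []
    for bits in product((0, 1), repeat=n_rows * n_cols):
        rows = [list(bits[r * n_cols:(r + 1) * n_cols]) for r in range(n_rows)]
        if [sum(row) for row in rows] != list(row_sums):
            continue
        if [sum(col) for col in zip(*rows)] != list(col_sums):
            continue
        found.append(rows)
    return found

def cell_frequencies(matrices):
    """Fraction of matrices in which each cell equals 1."""
    count = len(matrices)
    n_rows, n_cols = len(matrices[0]), len(matrices[0][0])
    return [[sum(m[r][c] for m in matrices) / count for c in range(n_cols)]
            for r in range(n_rows)]

matrices = matrices_with_margins([1, 1, 2], [1, 1, 2])
frequencies = cell_frequencies(matrices)
print(len(matrices))   # 5
for row in frequencies:
    print(row)         # [0.2, 0.2, 0.6], [0.2, 0.2, 0.6], [0.6, 0.6, 0.8]
```

These frequencies are the exact probabilities that each cell is 1 when a matrix is drawn uniformly from all matrices with both degree sequences fixed, which is the quantity a degree-preserving null model works with.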
Figure 1. (A) Accuracy and (B) speed of computation using different methods. Lines show means, while shaded regions show 95% confidence intervals.
Figure 2. Statistical power of SDSM. (A) Distribution of weights for the Paris-Milan edge in projections derived from FDSM and SDSM ensembles. (B) Similarity of an FDSM backbone extracted at a fixed significance level to SDSM backbones extracted at various significance levels, from an empirical bipartite network (green line) and from 100 synthetic bipartite networks (purple line = mean, purple region = percentile range).
Bipartite degree distributions, with examples in the context of a scholarly authorship bipartite network.
| Degree distribution | Authors (agents) | Papers (artifacts) |
|---|---|---|
| Right-tailed | Most write some papers, but a few are prolific (most departments) | Most papers are sole-authored, but some are written by large teams (e.g., sociology) |
| Left-tailed | Most are prolific, but some are inactive (elite departments) | Most papers are written by large teams, but some are sole-authored (e.g., physics) |
| Uniform | There is substantial diversity in scholarly output (e.g., interdisciplinary departments) | There is substantial diversity in the size of authorship teams (e.g., an entire university) |
| Constant | There are strong norms about how many papers an author should have (e.g., for performance evaluations) | There are strong norms about how many authors a paper should have (e.g., two: a senior author & a junior author) |
| Normal | Scholarly output varies around some typical level | Authorship teams vary around some typical size |
Figure 3. Jaccard similarity of a backbone extracted at a fixed significance level using the Fixed Degree Sequence Model and a backbone extracted using (A) the Fixed Fill Model, (B) the Fixed Row Model, (C) the Fixed Column Model, or (D) the Stochastic Degree Sequence Model. Each cell represents the mean over 100 instances of a bipartite network with given agent and artifact degree distributions.
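For reference, the Jaccard similarity of two backbones can be computed from their edge sets as the size of the intersection divided by the size of the union. The Python sketch below assumes undirected edges given as node pairs. The node labels and both example backbones are hypothetical, and treating two empty backbones as identical is an assumption made here.

```python
def normalize(edges):
    """Treat edges as undirected: (a, b) and (b, a) are the same edge."""
    return {frozenset(edge) for edge in edges}

def jaccard(edges_a, edges_b):
    """Jaccard similarity |A ∩ B| / |A ∪ B| of two edge sets."""
    a, b = normalize(edges_a), normalize(edges_b)
    union = a | b
    if not union:
        return 1.0  # assumption: two empty backbones count as identical
    return len(a & b) / len(union)

# Hypothetical backbones on nodes A-D.
backbone_one = [("A", "B"), ("B", "C"), ("C", "D")]
backbone_two = [("B", "A"), ("B", "C"), ("A", "D")]
similarity = jaccard(backbone_one, backbone_two)
print(similarity)  # 0.5: two shared edges out of four distinct edges
```

A value of 1 means the two backbones contain exactly the same edges, and a value of 0 means they share none.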
Figure 4. (A) Given agent and artifact degree distributions, there exists a statistical significance level that maximizes the similarity between an SDSM backbone extracted at this level and an FDSM backbone extracted at a fixed significance level, and (B) when used, this level yields an SDSM backbone that is very similar to the corresponding FDSM backbone.
Figure 5. (A) Synthetic bipartite networks with varying levels of block structure, from which (B) backbones extracted using different models exhibit varying modularity.