Bilal Khan, Kirk Dombrowski, Ric Curtis, Travis Wendel.
Abstract
This paper presents a new method for obtaining network properties from incomplete data sets. Problems associated with missing data are well-known stumbling blocks in Social Network Analysis. The method of "estimating connectivity from spanning tree completions" (ECSTC) is specifically designed for situations where only one or more spanning trees of a network are known, such as those obtained through respondent-driven sampling (RDS). Using repeated random completions derived from degree information, the method forgoes the usual step of trying to obtain final edge or vertex rosters, and instead estimates network-centric properties of vertices probabilistically from the spanning trees themselves. We discuss the problem of missing data, describe the protocols of our completion method, and report the results of an experiment in which ECSTC was used to estimate graph-dependent vertex properties from spanning trees sampled from a graph whose characteristics were known ahead of time. The results show that ECSTC methods hold more promise for obtaining network-centric properties of individuals from a limited set of data than researchers may have previously assumed. This approach breaks with past strategies for working with missing data, which have mainly sought to complete the graph; ECSTC instead estimates the network properties themselves without deciding on a final edge set.
Keywords: Missing Data; Network Imputation; Respondent-Driven Sampling; Spanning Tree Completions
Year: 2015 PMID: 25838988 PMCID: PMC4380167 DOI: 10.4236/sn.2015.41001
Source DB: PubMed Journal: Soc Netw ISSN: 2169-3285
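The completion idea described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' protocol: the stub-pairing rule in `random_completion`, the choice of betweenness centrality as the estimated vertex property, the function names, and the six-vertex toy data are all assumptions made for illustration.

```python
import random
from collections import deque

def random_completion(n, tree_edges, degree, rng):
    """Add random edges to a spanning tree until each vertex reaches its
    reported degree where possible (assumed rule: random stub pairing)."""
    adj = {v: set() for v in range(n)}
    for u, v in tree_edges:
        adj[u].add(v)
        adj[v].add(u)
    # One "stub" per unit of reported degree not yet explained by the tree.
    stubs = [v for v in range(n) for _ in range(max(0, degree[v] - len(adj[v])))]
    rng.shuffle(stubs)
    while len(stubs) >= 2:
        u = stubs.pop()
        for i in range(len(stubs) - 1, -1, -1):
            v = stubs[i]
            if v != u and v not in adj[u]:  # no self-loops, no duplicate edges
                adj[u].add(v)
                adj[v].add(u)
                stubs.pop(i)
                break
        # If no valid partner exists, the stub for u is simply dropped.
    return adj

def betweenness(adj):
    """Brandes' algorithm for unweighted graphs (unnormalized; in an
    undirected graph every vertex pair is counted in both directions)."""
    bc = {v: 0.0 for v in adj}
    for s in adj:
        stack, queue = [], deque([s])
        preds = {v: [] for v in adj}
        sigma = dict.fromkeys(adj, 0)
        dist = dict.fromkeys(adj, -1)
        sigma[s], dist[s] = 1, 0
        while queue:
            v = queue.popleft()
            stack.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = dict.fromkeys(adj, 0.0)
        while stack:
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                bc[w] += delta[w]
    return bc

def estimate_betweenness(n, tree_edges, degree, completions=10, seed=0):
    """Average the vertex property over repeated random completions."""
    rng = random.Random(seed)
    total = {v: 0.0 for v in range(n)}
    for _ in range(completions):
        bc = betweenness(random_completion(n, tree_edges, degree, rng))
        for v in total:
            total[v] += bc[v]
    return {v: total[v] / completions for v in total}

# Hypothetical toy input: a spanning tree plus self-reported degrees.
tree_edges = [(0, 1), (0, 2), (0, 3), (3, 4), (3, 5)]
degree = {0: 3, 1: 2, 2: 2, 3: 3, 4: 2, 5: 2}
estimate = estimate_betweenness(6, tree_edges, degree, completions=20)
print(sorted(estimate, key=estimate.get, reverse=True))  # vertices ranked by estimate
```

No single completion is treated as the true graph here; only the per-vertex average over many completions is kept, which mirrors the abstract's point about estimating properties without deciding on a final edge set.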
Figure 1. A 100-vertex BA graph.
Figure 2. ECSTC on a 100-node network.
Figure 3. ECSTC on a 500-node network.
Correlation (mean and standard deviation) over 25 trials.
Measure: BC

| Spanning trees | 1 completion | 10 completions | 30 completions | 50 completions |
|---|---|---|---|---|
| 1 tree | 0.954 | 0.977 | 0.979 | 0.979 |
| 10 trees | 0.979 | 0.981 | 0.981 | 0.982 |
| 30 trees | 0.981 | 0.982 | 0.982 | 0.982 |
| 50 trees | 0.981 | 0.982 | 0.982 | 0.982 |
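As a rough illustration of how a "correlation over trials" summary of this kind could be computed, the sketch below correlates estimated per-vertex values with known true values in each trial and then reports the mean and standard deviation across trials. The use of Pearson correlation, the function names, and the toy numbers are assumptions; the source does not state which correlation coefficient was used.

```python
import math
import statistics

def pearson(xs, ys):
    """Pearson correlation between two equal-length sequences."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

def summarize_trials(true_values, estimates_per_trial):
    """Mean and sample standard deviation of per-trial correlations."""
    rs = [pearson(true_values, est) for est in estimates_per_trial]
    return statistics.fmean(rs), statistics.stdev(rs)

# Hypothetical data: true vertex values and estimates from three trials.
true_values = [1.0, 2.0, 3.0, 4.0]
trials = [
    [1.1, 2.0, 2.9, 4.2],
    [0.8, 2.3, 3.1, 3.9],
    [1.0, 1.7, 3.4, 4.0],
]
mean_r, sd_r = summarize_trials(true_values, trials)
print(f"mean r = {mean_r:.3f}, sd = {sd_r:.3f}")
```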
Misclassification (mean and standard deviation) over 25 trials.
Measure: BC

| Spanning trees | 1 completion | 10 completions | 30 completions | 50 completions |
|---|---|---|---|---|
| 1 tree | 11.404 | 9.596 | 9.762 | 9.895 |
| 10 trees | 9.814 | 11.088 | 11.389 | 11.561 |
| 30 trees | 10.667 | 11.596 | 11.812 | 11.784 |
| 50 trees | 10.869 | 11.735 | 11.895 | 11.868 |