Luke J Matthews, Peter DeWan, Elizabeth Y Rula.
Abstract
Studies of social networks, mapped using self-reported contacts, have demonstrated the strong influence of social connections on the propensity for individuals to adopt or maintain healthy behaviors and on their likelihood to adopt health risks such as obesity. Social network analysis may prove useful for businesses and organizations that wish to improve the health of their populations by identifying key network positions. Health traits have been shown to correlate across friendship ties, but evaluating network effects in large coworker populations presents the challenge of obtaining sufficiently comprehensive network data. The purpose of this study was to evaluate methods for using online communication data to generate comprehensive network maps that reproduce the health-associated properties of an offline social network. In this study, we examined three techniques for inferring social relationships from email traffic data in an employee population using thresholds based on: (1) the absolute number of emails exchanged, (2) logistic regression probability of an offline relationship, and (3) the highest ranked email exchange partners. As a model of the offline social network in the same population, a network map was created using social ties reported in a survey instrument. The email networks were evaluated based on the proportion of survey ties captured, comparisons of common network metrics, and autocorrelation of body mass index (BMI) across social ties. Results demonstrated that logistic regression predicted the greatest proportion of offline social ties, thresholding on number of emails exchanged produced the best match to offline network metrics, and ranked email partners demonstrated the strongest autocorrelation of BMI. Since each method had unique strengths, researchers should choose a method based on the aspects of offline behavior of interest. 
Ranked email partners may be particularly useful for purposes related to health traits in a social network.
Year: 2013 PMID: 23418436 PMCID: PMC3572121 DOI: 10.1371/journal.pone.0055234
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Name generator survey questions.
| Which colleagues do you spend free time with, either in the work environment or outside? (please list up to 5) |
| Which colleagues would you feel comfortable discussing important personal (non work) matters with? (please list up to 5) |
| Which colleagues would you be most likely to discuss health matters with or ask for health advice? (please list up to 5) |
| Which colleagues would you be most likely to engage in an outside activity with, such as a walk, playing a sport or a game, volunteering, or taking a course? |
Parameters for the logistic regression model to predict Name Generator ties.
| Independent Variable | Definition | Dummy For Zeros? |
| z-score sent single | The logged single recipient emails sent by individual 1, standardized to a mean of 0 and a standard deviation of 1 based on all other email partners for individual 1 | |
| z-score sent multiple | The logged multiple recipient emails sent by individual 1, standardized to a mean of 0 and a standard deviation of 1 based on all other email partners for individual 1 | |
| single recipient | The number of single recipient emails between members of a pair | yes |
| multiple recipient | The number of multiple recipient emails between members of a pair | yes |
| sum file sizes | The sum of file size for all the emails between members of a pair | yes |
| shared contacts | The number of other individuals that both members of a pair have emailed | |
| secondary ties | The number of other individuals with whom the email sender has more than one “shared contacts” | |
| asymmetry single | The asymmetry index for single recipient emails | |
| asymmetry multiple | The asymmetry index for multiple recipient emails | |
Dummy For Zeros?: a "yes" in this column indicates that a dummy variable was also included. If a data point for the row variable was 0, the dummy took a value of 1; otherwise the dummy was 0. Row variables with blank entries did not exhibit over-dispersion of zeros and so did not require dummy variables.
Variable was log transformed to better meet generalized linear model assumptions.
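The z-score and zero-dummy variables described in the table above can be sketched as follows. This is a minimal illustration, not the authors' code: the function names, the choice of natural log, the population standard deviation, and standardizing across all of one sender's partners are assumptions.

```python
import math
from statistics import mean, pstdev


def zscore_logged_counts(sent_counts):
    """Standardize one sender's logged email counts across that sender's partners.

    sent_counts maps partner -> number of emails sent to that partner.
    Partners with a count of 0 are skipped because log(0) is undefined.
    """
    logged = {p: math.log(c) for p, c in sent_counts.items() if c > 0}
    if len(logged) < 2:
        # With fewer than two partners there is no spread to standardize by.
        return {p: 0.0 for p in logged}
    mu = mean(logged.values())
    sd = pstdev(logged.values())  # assumption: population standard deviation
    if sd == 0:
        return {p: 0.0 for p in logged}
    return {p: (v - mu) / sd for p, v in logged.items()}


def zero_dummy(value):
    """Return 1 when the row variable is 0, otherwise 0."""
    return 1 if value == 0 else 0
```

By construction the z-scores for one sender sum to roughly zero, so a positive value marks a partner who receives more email than that sender's typical partner.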
Network tie densities and correlations of email networks with the Name Generator.
| Email Network | Correlation with Name Generator | Tie density (No. ties/No. undirected pairwise relations) | Proportion Name Generator ties recovered |
| Single Recipient Network | 0.35 | 0.004 | 0.45 |
| Logistic Regression Network | 0.37 | 0.005 | 0.54 |
| Ranked Partner Network | 0.36 | 0.006 | 0.54 |
N = 1992 individuals who were common to both the email and Name Generator networks.
Tie density in the Name Generator Network equaled 0.003, N ties = 4989.
Single Recipient Network: average single recipient emails >2 per week.
Logistic Regression Network: logistic regression fitted value >0.09.
Ranked Partner Network: single recipient emails rank ≥ 13.
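The three thresholding rules and the two summary columns of the table can be sketched in a few lines. This is an illustrative sketch under stated assumptions: all function names, the `weeks` and `top_k` parameters, and the rule that a ranked tie exists when either person ranks the other among their top partners are hypothetical, and the fitted values are taken as given rather than estimated here.

```python
def tie_density(n_ties, n_individuals):
    """Number of ties divided by the number of undirected pairwise relations."""
    n_pairs = n_individuals * (n_individuals - 1) / 2
    return n_ties / n_pairs


def proportion_recovered(email_ties, survey_ties):
    """Share of survey (Name Generator) ties that also appear in the email network."""
    return len(email_ties & survey_ties) / len(survey_ties)


def count_threshold_ties(pair_counts, weeks, min_per_week=2.0):
    """Keep pairs whose average single recipient emails per week exceed the cutoff."""
    return {frozenset(p) for p, c in pair_counts.items() if c / weeks > min_per_week}


def fitted_value_ties(pair_probs, cutoff=0.09):
    """Keep pairs whose fitted probability of a survey tie exceeds the cutoff."""
    return {frozenset(p) for p, prob in pair_probs.items() if prob > cutoff}


def ranked_partner_ties(pair_counts, top_k):
    """Keep a pair if either member ranks the other among their top_k partners."""
    partners = {}
    for (a, b), c in pair_counts.items():
        partners.setdefault(a, []).append((c, b))
        partners.setdefault(b, []).append((c, a))
    ties = set()
    for person, ranked in partners.items():
        # Sort by email count, largest first; ties in count fall back to name order.
        for _, other in sorted(ranked, reverse=True)[:top_k]:
            ties.add(frozenset((person, other)))
    return ties
```

As a consistency check on the footnote above, `round(tie_density(4989, 1992), 3)` gives 0.003, since 1992 individuals form 1,983,036 undirected pairs.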
Figure 1. Ratios of false to true positives during threshold optimization of email networks on the Name Generator Survey network.
Networks used in all subsequent analyses were derived from the thresholds that produced the highest correlations between the Name Generator and email networks (vertical lines).
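The threshold selection described in this caption can be illustrated with a small search over candidate cutoffs. The sketch below assumes that the correlation between two networks is the Pearson correlation of their binary tie indicators over all undirected pairs; the source does not state the exact measure, and every name here is hypothetical.

```python
import math


def network_correlation(ties_a, ties_b, n_individuals):
    """Pearson correlation of two binary networks over all undirected pairs."""
    n_pairs = n_individuals * (n_individuals - 1) // 2
    p_a = len(ties_a) / n_pairs
    p_b = len(ties_b) / n_pairs
    p_both = len(ties_a & ties_b) / n_pairs
    denom = math.sqrt(p_a * (1 - p_a) * p_b * (1 - p_b))
    if denom == 0:
        # Undefined when a network is empty or complete; treated as 0.0 for simplicity.
        return 0.0
    return (p_both - p_a * p_b) / denom


def false_to_true_ratio(email_ties, survey_ties):
    """Ratio of email ties absent from the survey to email ties present in it."""
    true_pos = len(email_ties & survey_ties)
    false_pos = len(email_ties - survey_ties)
    return false_pos / true_pos if true_pos else float("inf")


def best_threshold(pair_scores, survey_ties, n_individuals, candidates):
    """Return (threshold, correlation) for the candidate with the highest correlation."""
    best = None
    for t in candidates:
        ties = {frozenset(p) for p, s in pair_scores.items() if s > t}
        r = network_correlation(ties, survey_ties, n_individuals)
        if best is None or r > best[1]:
            best = (t, r)
    return best
```

Here `pair_scores` can hold whichever quantity is being thresholded (email counts, fitted values, or ranks recoded so that larger means stronger), so the same search applies to each method.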
Figure 2. Distribution of the total emails exchanged between all pairs for the 5 month dataset used in the study.
Slope coefficients from network autoregressive model with Name Generator metrics as the dependent variable (displayed with 95% confidence interval).
| Egocentric Metric | Single Recipient Network | Logistic Regression Network | Ranked Partner Network |
| Betweenness | 0.32 (±0.039) | 0.34 (±0.037) | 0.25 (±0.041) |
| Closeness | 0.39 (±0.041) | 0.37 (±0.041) | 0.23 (±0.044) |
| Transitivity | 0.10 (±0.051) | 0.14 (±0.053) | 0.13 (±0.058) |
| Eigenvector centrality | 0.22 (±0.039) | 0.10 (±0.053) | 0.03 (±0.062) |
| Degree | 0.36 (±0.039) | 0.36 (±0.036) | 0.24 (±0.033) |
N = 1839 except for Transitivity, which had N = 1411.
N = 1922 except for Transitivity, which had N = 1574.
N = 1951 except for Transitivity, which had N = 1609.
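Two of the egocentric metrics in the table, degree and transitivity, can be computed per individual as sketched below. The sketch assumes that transitivity means the local clustering coefficient (the share of a person's contacts who are also tied to each other), which the text here does not spell out; the function names are hypothetical. Under that definition the metric is undefined for anyone with fewer than two ties, which is one plausible reason the Transitivity rows have a smaller N.

```python
def adjacency(ties):
    """Build an undirected adjacency map from an iterable of (a, b) pairs."""
    adj = {}
    for a, b in ties:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    return adj


def degree(adj, node):
    """Number of ties held by the node."""
    return len(adj.get(node, ()))


def local_transitivity(adj, node):
    """Share of the node's contact pairs that are themselves tied.

    Returns None when the node has fewer than two contacts, because
    there are no contact pairs to evaluate.
    """
    neighbours = adj.get(node, set())
    k = len(neighbours)
    if k < 2:
        return None
    # Count each unordered pair of contacts once (u < v) if they share a tie.
    linked = sum(1 for u in neighbours for v in neighbours if u < v and v in adj[u])
    return linked / (k * (k - 1) / 2)
```

Individuals for whom `local_transitivity` returns `None` would have to be dropped before fitting a model on that metric.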
Figure 3. Stability of Single Threshold and Ranked Partner networks as additional weeks of email are added to the analysis.
The y-axis shows the correlation of the network calculated from weeks 1 through N to the network calculated from weeks 1 through N+1.
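The stability measure on the y-axis can be sketched for the count-threshold network as follows. This is a minimal sketch under assumptions: the correlation is taken to be the Pearson correlation of binary tie indicators over all undirected pairs, the threshold is an average of more than `min_per_week` emails per week, and all names are hypothetical.

```python
import math


def network_correlation(ties_a, ties_b, n_pairs):
    """Pearson correlation of two binary networks over n_pairs undirected pairs."""
    p_a = len(ties_a) / n_pairs
    p_b = len(ties_b) / n_pairs
    p_both = len(ties_a & ties_b) / n_pairs
    denom = math.sqrt(p_a * (1 - p_a) * p_b * (1 - p_b))
    # Undefined for empty or complete networks; treated as 0.0 for simplicity.
    return (p_both - p_a * p_b) / denom if denom else 0.0


def stability_curve(weekly_counts, n_individuals, min_per_week=2.0):
    """Correlate the network from weeks 1..N with the network from weeks 1..N+1.

    weekly_counts is a list with one dict per week mapping (a, b) -> emails
    exchanged that week. Returns one correlation per consecutive pair of N.
    """
    n_pairs = n_individuals * (n_individuals - 1) // 2
    cumulative = {}
    networks = []
    for n_weeks, counts in enumerate(weekly_counts, start=1):
        for pair, c in counts.items():
            key = frozenset(pair)
            cumulative[key] = cumulative.get(key, 0) + c
        # Re-threshold on the average weekly count over all weeks seen so far.
        networks.append({p for p, c in cumulative.items() if c / n_weeks > min_per_week})
    return [
        network_correlation(networks[i], networks[i + 1], n_pairs)
        for i in range(len(networks) - 1)
    ]
```

A curve that approaches 1 as N grows indicates that adding another week of email barely changes which pairs count as tied.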