Brittany I Davidson, Simon L Jones, Adam N Joinson, Joanne Hinds.
Abstract
Online communities are virtual spaces where users share interests, support others, and exchange knowledge and information. Understanding user behavior is valuable to organizations and has applications from marketing to security, for instance, identifying leaders within a community or predicting future behavior. In the present research, we seek to understand the various roles that users adopt in online communities: for instance, who leads the conversation? Who are the supporters? We examine user role changes over time and the pathways that users follow. This allows us to explore the differences between users who progress to leadership positions and users who fail to develop influence. We also reflect on how user role proportions impact the overall health of the community. Here, we examine two online ideological communities, RevLeft and Islamic Awakening (N = 1631; N = 849), and provide a novel approach to identify various types of users. Finally, we study user role trajectories over time and identify community "leaders" from meta-data alone. Study One examined both communities using k-means cluster analysis of behavioral meta-data, which revealed seven user roles. We then mapped these roles against Preece and Shneiderman's (2009) Reader-to-Leader Framework (RtLF). Both communities aligned with the RtLF, where most users were "contributors", many were "collaborators", and few were "leaders". Study Two looked at one community over a two-year period and found that, despite a high churn rate of users, roles were stable over time. We built a model of user role transitions over the two years. This can be used to predict user role changes in the future, which will have implications for community managers and security-focused contexts (e.g., analyzing behavioral meta-data from forums and websites known to be associated with illicit activity).
Year: 2019 PMID: 31116767 PMCID: PMC6530841 DOI: 10.1371/journal.pone.0216932
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Fig 1. Reader-to-Leader Framework (RtLF).
Examples of roles identified in previous research.
| Author(s) | Roles Identified |
|---|---|
| Golder & Donath | Newbie, Celebrity, Lurker, Flamer, Troll, Ranter |
| Turner, Smith, Fisher, & Welser | Answer person, Questioner, Troll, Spammer, Binary poster, Flame warrior, Conversationalist |
| Campbell, Fletcher, & Greenhill | Big man, Sorcerer, Trickster |
| Chan, Hayes, & Daly | Joining conversationalist, Grunt, Taciturn, Popular participants, Popular initiator, Supporter, Ignored |
| Pfeil, Svangstu, Ang, & Zaphiris | Moderating supporter, Central supporter, Active member, Passive member, Technical expert, Visitor |
| Welser, Lin, Cosley, Dokshin, Smith, Kossinets, & Gay | Substantive experts, Technical editors, Counter vandalism contributors, Social networkers |
| Panteli | Emergent leaders, Appointed leaders, Community founder, Sustaining leaders |
| Arazy, Lifshitz-Assaf, Nov, Daxenberger, Balestra, & Coye | Role-Article Samplers, Role Embracing, Article Embracing, Role-Article Polymathing |
Metrics used in cluster analysis for communities A and B.
| Metric definition |
|---|
| Total number of unique network neighbors replying to (or quoting) a user |
| Total number of unique network neighbors receiving posts from (or being quoted by) a user |
| Mean average word count for all of a user’s posts |
| Percentage of a user’s posts that contain question marks (excluding within URLs) |
| Percentage of a user’s posts that contain URLs |
| Mean average number of thanks per post. Calculated as: Total number of thanks received / Total number of posts made |
| Calculated as: Number of threads initiated / Number of threads participated in |
| Total number of threads participated in |
| Total number of sub-forums participated in |
| Calculated as: Total number of posts / Number of sub-forums participated in |
| Calculated as: Total number of posts / Number of threads participated in |
| Calculated as: Number of neighbors that a user has both received posts from and posted replies to / Total number of unique network neighbors |
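The ratio-style metrics above can be illustrated with a small sketch. The record layout, the user names, and the function `user_metrics` are hypothetical and chosen only for illustration; the source does not describe how the metrics were actually computed in code.

```python
# Hypothetical post records (an assumption, not the study's data format):
# (author, thread_id, started_thread, replied_to_user, thanks_received)
posts = [
    ("ana", "t1", True, None, 2),
    ("ben", "t1", False, "ana", 0),
    ("ana", "t1", False, "ben", 1),
    ("cat", "t1", False, "ana", 0),
    ("ben", "t2", True, None, 3),
    ("ana", "t2", False, "ben", 0),
]


def user_metrics(posts, user):
    """Compute a few of the listed metrics for one user."""
    own = [p for p in posts if p[0] == user]
    if not own:
        raise ValueError(f"no posts found for user {user!r}")
    threads = {p[1] for p in own}                 # threads participated in
    initiated = {p[1] for p in own if p[2]}       # threads the user started
    out_neighbors = {p[3] for p in own if p[3] is not None}
    in_neighbors = {p[0] for p in posts if p[3] == user}
    all_neighbors = in_neighbors | out_neighbors
    both_ways = in_neighbors & out_neighbors
    return {
        "in_degree": len(in_neighbors),
        "out_degree": len(out_neighbors),
        "thanks_rate": sum(p[4] for p in own) / len(own),
        "initiation_ratio": len(initiated) / len(threads),
        "posts_per_thread": len(own) / len(threads),
        "pct_bidirectional": (
            len(both_ways) / len(all_neighbors) if all_neighbors else 0.0
        ),
    }


metrics = user_metrics(posts, "ana")
for name, value in metrics.items():
    print(name, value)
```

In this toy data, "ana" receives replies from two users but replies to only one of them, so half of her neighbors are bi-directional.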
Fig 2. Elbow plot or “Sum of Squared Errors” plot for communities A (A) and B (B).
The dotted red lines denote the upper and lower boundaries for the ideal number of clusters, k.
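As a rough illustration of how an elbow plot is produced, the sketch below runs a plain k-means (Lloyd's algorithm) for several values of k and records the sum of squared errors for each. The toy points, the seed, and the function names are assumptions; the study's actual software, feature scaling, and settings are not given in this excerpt.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, iterations=50, seed=0):
    """Plain Lloyd's algorithm; returns (centers, sum of squared errors)."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # initialise from k distinct data points
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: sq_dist(p, centers[i]))
            clusters[nearest].append(p)
        # Move each center to the mean of its cluster; keep it if the cluster is empty.
        new_centers = [
            tuple(sum(dim) / len(c) for dim in zip(*c)) if c else centers[i]
            for i, c in enumerate(clusters)
        ]
        if new_centers == centers:
            break
        centers = new_centers
    sse = sum(min(sq_dist(p, c) for c in centers) for p in points)
    return centers, sse


# Toy 2-D data with three visible groups (illustrative only).
points = [
    (1.0, 1.0), (1.2, 0.8), (0.9, 1.1),
    (5.0, 5.0), (5.1, 4.9), (4.8, 5.2),
    (9.0, 1.0), (9.2, 0.9), (8.9, 1.2),
]

sse_by_k = {k: kmeans(points, k)[1] for k in range(1, 6)}
for k, sse in sse_by_k.items():
    print(k, round(sse, 3))
```

Plotting `sse_by_k` against k gives the elbow curve: the candidate range for k is where additional clusters stop reducing the error substantially.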
Cluster descriptions for community A.
See Table 4 for more detailed information on cluster centers.
| Cluster/Role Name | Description |
|---|---|
| Newbie | High initiation rate, highest overall number of questions asked in the community, and typically long word counts in their posts. Lowest in- and out-degree and number of posts |
| Popular Supporters | High overall metrics, particularly in- and out-degree, bi-directional neighbor degree, and thanks rate. Users were similar to the Elite users, however, overall lower in each metric |
| Taciturn | Low in all metrics, largely not engaged, as reflected by their low activity (e.g., low number of posts and connectivity) |
| Conversationalist | High number of posts per thread and initiation rates, with high bi-directional neighbors. Low in most other metrics, particularly thanks rate |
| Elite | Highest in- and out-degree, thanks rate, number of posts, posts per subforum, and typically had posts with low word counts |
| Low Volume Supporter | Typically, moderate in all metrics, however, often a high number of questions per post and long posts |
| Information Provider | Longest posts by a substantial measure with the highest number of URL links and a high initiation rate. Moderate in most other metrics |
Cluster centers for community A.
Red to green coloring indicates the lowest to highest values per metric (row).
| Input Variable | Overall | Newbie | Popular Supporters | Taciturn | Conversationalist | Elite | Low Volume Supporter | Information Provider |
|---|---|---|---|---|---|---|---|---|
| In Degree | 40.68 | 6.26 | 83.43 | 8.75 | 9.22 | 234.83 | 14.96 | 11.58 |
| Out Degree | 42.41 | 3.97 | 87.38 | 8.76 | 8.11 | 255.16 | 14.46 | 10.08 |
| Total Posts | 78.52 | 6.40 | 133.74 | 10.00 | 15.96 | 610.97 | 18.12 | 22.32 |
| Mean Word Count | 107.76 | 157.44 | 91.42 | 75.39 | 107.01 | 98.90 | 115.44 | 279.35 |
| Thank Rate | 0.71 | 0.60 | 1.01 | 0.57 | 0.50 | 1.13 | 0.59 | 0.74 |
| % Questions per Post | 0.29 | 0.77 | 0.28 | 0.07 | 0.22 | 0.30 | 0.40 | 0.22 |
| % URLs per Post | 0.07 | 0.03 | 0.06 | 0.03 | 0.03 | 0.06 | 0.05 | 0.66 |
| Mean Posts Per Thread | 2.06 | 1.91 | 2.53 | 1.34 | 4.78 | 2.65 | 1.86 | 1.78 |
| Initiation Ratio | 0.21 | 0.55 | 0.09 | 0.16 | 0.56 | 0.08 | 0.17 | 0.37 |
| Mean Posts Per Subforum | 8.56 | 2.94 | 16.25 | 2.51 | 8.00 | 37.77 | 4.31 | 4.72 |
| % Bi-directional Neighbors | 0.21 | 0.12 | 0.32 | 0.09 | 0.59 | 0.45 | 0.18 | 0.18 |
Cluster descriptions for community B.
See Table 6 for more detailed information on cluster centers.
| Cluster/Role Name | Description |
|---|---|
| Elite | Highest in- and out-degree, thanks rate, number of posts, and posts per subforum. Typically, their posts had low word counts. |
| Newbie | Highest overall number of questions asked in the community, and typically long word counts in their posts. Low in all other metrics. |
| Low Volume Supporter | Moderately low in all metrics, however, a high initiation rate. |
| Popular Supporter | High overall metrics, particularly mean posts per subforum, in-, out-, and bi-directional neighbor degrees. Low initiation ratio and URLs in posts. Similar to Elite, however, overall lower in each metric. |
| Conversationalist | Highest initiation rate, high word counts, and bi-directional neighbors. Low in- and out-degrees, and thanks rate. |
| Taciturn | Lowest in all metrics, aside from slightly higher connectivity (in- and out-degree). |
| Information Provider | Longest posts, highest number of URLs per post and initiation rate. Lowest in all other metrics. |
Cluster centers for community B.
Red to green coloring indicates the lowest to highest values per metric (row).
| Cluster Input | ALL | Elite | Newbie | Low Volume Supporter | Popular Supporter | Conversationalist | Taciturn | Information Provider |
|---|---|---|---|---|---|---|---|---|
| In Degree | 28.66 | 149.05 | 8.24 | 12.96 | 45.37 | 5.38 | 8.61 | 1.95 |
| Out Degree | 29.44 | 158.89 | 8.32 | 10.97 | 46.90 | 3.65 | 9.16 | 0.87 |
| Total Posts | 72.24 | 489.44 | 12.75 | 22.74 | 92.15 | 13.64 | 11.74 | 5.23 |
| Mean Word Count | 118.04 | 96.00 | 150.16 | 118.02 | 116.49 | 163.88 | 67.08 | 217.03 |
| Thank Rate | 0.62 | 0.89 | 0.66 | 0.61 | 0.64 | 0.50 | 0.63 | 0.19 |
| % Questions per Post | 0.29 | 0.30 | 0.67 | 0.23 | 0.30 | 0.28 | 0.10 | 0.28 |
| % URLs per Post | 0.08 | 0.03 | 0.03 | 0.08 | 0.04 | 0.06 | 0.03 | 0.77 |
| Mean Posts Per Thread | 2.24 | 2.50 | 1.85 | 1.73 | 3.79 | 2.27 | 1.41 | 1.25 |
| Initiation Ratio | 0.28 | 0.13 | 0.12 | 0.44 | 0.10 | 0.95 | 0.04 | 0.77 |
| Mean Posts Per Subforum | 10.06 | 39.90 | 3.90 | 5.87 | 15.44 | 6.66 | 3.46 | 3.60 |
| % Bi-directional Neighbors | 0.26 | 0.51 | 0.19 | 0.19 | 0.42 | 0.31 | 0.12 | 0.09 |
Fig 3. Proportion of users in each cluster for community A and community B.
Fig 4. Proportion of users in each category of the Reader-to-Leader Framework (RtLF) for A (A) and B (B).
Each six-month time slice and number of users in community A.
| Time Period | Number of Users |
|---|---|
| -24 months to -18 months | 1293 |
| -18 months to -12 months | 1458 |
| -12 months to -6 months | 1495 |
| -6 months to time of data collection | 1631 |
Classification accuracy by cluster (%).
| TP Rate | FP Rate | Precision | Recall | F Measure | MCC | ROC Area | PRC |
|---|---|---|---|---|---|---|---|
| 0.87 | 0.06 | 0.58 | 0.87 | 0.69 | 0.68 | 0.95 | 0.87 |
| 0.87 | 0.03 | 0.88 | 0.87 | 0.88 | 0.85 | 0.98 | 0.93 |
| 0.88 | 0.05 | 0.88 | 0.88 | 0.88 | 0.83 | 0.98 | 0.96 |
| 0.60 | 0.01 | 0.79 | 0.60 | 0.68 | 0.67 | 0.97 | 0.73 |
| 0.99 | 0.01 | 0.89 | 0.99 | 0.94 | 0.93 | 1.00 | 0.96 |
| 0.62 | 0.07 | 0.76 | 0.62 | 0.68 | 0.59 | 0.92 | 0.78 |
| 0.92 | 0.01 | 0.77 | 0.92 | 0.84 | 0.84 | 0.96 | 0.87 |
| 0.80 | 0.05 | 0.81 | 0.80 | 0.80 | 0.76 | 0.96 | 0.88 |
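For readers unfamiliar with the column headings, the sketch below derives TP rate (recall), FP rate, precision, F measure, and MCC for one class from a multi-class confusion matrix, treating that class one-vs-rest. The 3×3 matrix is invented for illustration and the function name is hypothetical; it is not the study's data or tooling. Rows are actual classes and columns are predicted classes.

```python
import math


def per_class_metrics(matrix, cls):
    """One-vs-rest metrics for class index `cls` (rows actual, columns predicted)."""
    n = len(matrix)
    tp = matrix[cls][cls]
    fn = sum(matrix[cls]) - tp                           # actual cls, predicted other
    fp = sum(matrix[r][cls] for r in range(n)) - tp      # actual other, predicted cls
    tn = sum(map(sum, matrix)) - tp - fn - fp

    recall = tp / (tp + fn) if tp + fn else 0.0          # identical to the TP rate
    precision = tp / (tp + fp) if tp + fp else 0.0
    fp_rate = fp / (fp + tn) if fp + tn else 0.0
    f_measure = (
        2 * precision * recall / (precision + recall)
        if precision + recall else 0.0
    )
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {
        "tp_rate": recall,
        "fp_rate": fp_rate,
        "precision": precision,
        "recall": recall,
        "f_measure": f_measure,
        "mcc": mcc,
    }


# Invented example matrix with three classes.
confusion = [
    [8, 1, 1],
    [2, 6, 2],
    [0, 1, 9],
]

result = per_class_metrics(confusion, 0)
for name, value in result.items():
    print(name, round(value, 2))
```

TP rate and recall are the same quantity, which is why those two columns carry identical values in the table.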
Confusion matrix for classification (%).
Actual values as rows; predicted values as columns.
| 0 | 0 | 0.25 | 0 | 0.67 | 0.25 | |||
| 0 | 0.12 | 0.12 | 0.8 | 1.35 | 0.12 | |||
| 0.18 | 0.61 | 0 | 0 | 2.51 | 0.25 | |||
| 0.55 | 0.25 | 0.49 | 0 | 0.61 | 0.31 | |||
| 0 | 0.06 | 0 | 0 | 0 | 0 | |||
| 4.97 | 1.41 | 2.82 | 0.43 | 0 | 0.18 | |||
| 0 | 0.06 | 0.06 | 0.06 | 0.06 | 0.06 | |||
Fig 5. Percentage of users in each role, mapped to the RtLF (A) and unmapped to the RtLF (B).
Top ten most common pathways taken by users, based on all possible transitions (N = 7,712).
| Pathway (Clusters) | Number of Users | % of Users |
|---|---|---|
| Inactive → Inactive | 1082 | 14.03 |
| Inactive → Taciturn | 834 | 10.81 |
| Taciturn → Inactive | 780 | 10.11 |
| Low Volume Supporter → Inactive | 629 | 8.16 |
| Inactive → Low Volume Supporter | 480 | 6.22 |
| Newbie → Inactive | 367 | 4.76 |
| Inactive → Newbie | 335 | 4.34 |
| Popular Supporter → Popular Supporter | 324 | 4.20 |
| Inactive → Popular Supporter | 292 | 3.79 |
| Low Volume Supporter → Low Volume Supporter | 195 | 2.53 |
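The percentages in this table are each pathway's share of all counted transitions; for example, 1082 / 7712 ≈ 14.03%. A minimal sketch of that tally follows, assuming each user has one role label per six-month slice. The user IDs, role sequences, and function names are hypothetical, and the row-normalised probabilities at the end are one possible way to frame a transition model, not necessarily the one the authors built.

```python
from collections import Counter

# Hypothetical role labels per user, one per consecutive six-month slice.
roles_by_user = {
    "u1": ["Inactive", "Taciturn", "Taciturn", "Inactive"],
    "u2": ["Newbie", "Inactive", "Inactive", "Inactive"],
    "u3": ["Inactive", "Inactive", "Popular Supporter", "Popular Supporter"],
}


def transition_counts(roles_by_user):
    """Count (from_role, to_role) pairs between consecutive slices."""
    counts = Counter()
    for roles in roles_by_user.values():
        counts.update(zip(roles, roles[1:]))
    return counts


def transition_shares(counts):
    """Each pathway as a percentage of all transitions (as in the table)."""
    total = sum(counts.values())
    return {pair: 100 * n / total for pair, n in counts.items()}


def transition_probabilities(counts):
    """Row-normalised estimate of P(next role | current role)."""
    row_totals = Counter()
    for (src, _dst), n in counts.items():
        row_totals[src] += n
    return {(src, dst): n / row_totals[src] for (src, dst), n in counts.items()}


counts = transition_counts(roles_by_user)
shares = transition_shares(counts)
probs = transition_probabilities(counts)

for (src, dst), n in counts.most_common(3):
    print(f"{src} -> {dst}: {n} ({shares[(src, dst)]:.2f}%)")
```

The shares answer "how common is this pathway overall", while the row-normalised probabilities answer "given a user's current role, where do they tend to go next".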
Fig 6. Model showing the percentage of all user role transitions, mapped to the Reader-to-Leader Framework.
Fig 7. Model showing the top 25 user role transitions.
Demonstrating the linear and non-linear pathways users took.