Pablo Cleveland, Sebastian A Rios, Felipe Aguilera, Manuel Graña.
Abstract
Understanding the generation of content in an online social network (OSN) at the microscopic level is highly desirable for improved management of the OSN and for the prevention of undesirable phenomena such as online harassment. Content generation, i.e., the decision to post a contribution in the OSN, can be modeled by neurophysiological approaches on the basis of unbiased semantic analysis of the content already published in the OSN. This paper proposes a neuro-semantic model composed of (1) an extended leaky competing accumulator (ELCA) as the neural architecture implementing the user's concurrent decision process to generate content in a conversation thread of a virtual community of practice, and (2) a semantic model based on topic analysis of both users and conversation threads, carried out by latent Dirichlet allocation (LDA). We use the similarity between the user and thread semantic representations to build a model of the user's interest in the thread contents, which acts as the stimulus to contribute content to the thread. The semantic interest of users in discussion threads provides the external inputs for the ELCA, i.e., the external value assigned to each choice. We demonstrate the approach on a dataset extracted from a real-life web forum devoted to fans of tinkering with musical instruments and related devices. The neuro-semantic model achieves high performance in predicting content posting decisions (average F score 0.61), improving greatly over well-known machine learning approaches, namely random forest and support vector machines (average F scores 0.19 and 0.21).
Keywords: Information diffusion; Leaky competing accumulator; Microscopic model of social interaction; Multi-topic text preferences; Social interaction decision making
Year: 2022 PMID: 35756152 PMCID: PMC9214480 DOI: 10.1007/s00521-022-07307-0
Source DB: PubMed Journal: Neural Comput Appl ISSN: 0941-0643 Impact factor: 5.102
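
The decision mechanism summarized in the abstract can be illustrated with a standard leaky competing accumulator, in which each candidate thread has an accumulator driven by an external input and suppressed by leakage and by its competitors. The following is a minimal sketch, not the authors' extended model: cosine similarity as the user-thread similarity measure, the topic vectors, the parameter names (`leak`, `inhibition`, `noise_std`, `threshold`) and their values are all assumptions made for illustration.

```python
import math
import random


def cosine_similarity(u, v):
    """Cosine similarity between two topic vectors (assumed similarity measure)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm > 0 else 0.0


def simulate_lca(inputs, leak=0.2, inhibition=0.2, noise_std=0.0,
                 threshold=1.0, dt=0.01, max_steps=10_000, seed=0):
    """Euler simulation of a plain leaky competing accumulator.

    Returns (winner index or None, steps taken, final accumulator values).
    """
    rng = random.Random(seed)
    x = [0.0] * len(inputs)
    for step in range(1, max_steps + 1):
        total = sum(x)
        updated = []
        for xi, inp in zip(x, inputs):
            # External input minus leakage minus inhibition from competitors.
            drift = inp - leak * xi - inhibition * (total - xi)
            noise = noise_std * math.sqrt(dt) * rng.gauss(0.0, 1.0)
            # Accumulators are clipped at zero, as in the standard LCA.
            updated.append(max(xi + drift * dt + noise, 0.0))
        x = updated
        best = max(range(len(x)), key=x.__getitem__)
        if x[best] >= threshold:
            return best, step, x
    return None, max_steps, x


# Hypothetical LDA topic distributions for one user and three threads.
user_topics = [0.7, 0.2, 0.1]
thread_topics = [[0.6, 0.3, 0.1], [0.1, 0.1, 0.8], [0.3, 0.4, 0.3]]

# The user's semantic interest in each thread is the external input.
inputs = [cosine_similarity(user_topics, t) for t in thread_topics]
winner, steps, state = simulate_lca(inputs, noise_std=0.05, seed=1)
print("inputs:", [round(i, 3) for i in inputs])
print("winning thread index:", winner, "after", steps, "steps")
```

With `noise_std=0.0` the run is deterministic and the thread with the largest input crosses the threshold first; adding noise makes the choice stochastic, which is what lets accumulator models of this kind account for variability in decisions.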
Information diffusion modeling approaches found in the literature
| Ref./year | Model description | Results | Data set |
|---|---|---|---|
| [ | SIR model to estimate number of accesses to a site | N/A | “2 channel” web forum. Data: number of posters per 15 min, from 9 p.m. Jan 10, 2007 to 6 a.m. Jan 11, 2007 |
| [ | Topological properties of OSN graph | N/A | Flickr-like data |
| [ | Game theoretic diffusion of technologies model that allows for competition between agents | N/A | Not applicable to implicit networks |
| [ | Topic-based SIR model. Applied to violent topic diffusion | | Ummah data set, Dark Web Forum Portal by AI lab of U. of Arizona. 1,263,724 posts, 76,242 threads, 15,345 authors |
| [ | Probabilistic generative model of information emergence in networks, capturing internal and external exposures. URL diffusion | N/A | Tested on synthetic data and complete Twitter January 2011 data set. 3 billion tweets, 18,186 URLs |
| [ | SCIR model | N/A | Tested on synthetic data |
| [ | Event-driven SIR model | | Yahoo! Finance Walmart message board |
| [ | Deterministic model of competitive information diffusion on the Iterated Local Transitivity | N/A | Not applicable to implicit networks |
| [ | Evolutionary game theory model for diffusion dynamics | N/A | Twitter hashtag data set. 1000 Twitter hashtags, number of mentions per hour and time series |
| [ | SIS and SIR models with edge weights | N/A | Synthetic data |
| [ | Meme propagation model based on network topology | N/A | Tested on Higgs Twitter Network |
| [ | Adoption probability. Machine learning prediction | | Twitter hashtags and URLs 2009 |
| [ | Topic-level SIR model | | Yahoo! Finance Walmart message board (139,062 threads, 441,954 messages, 25,500 authors) and US Politics Online Breaking News in Politics (2192 threads, 130,850 messages, 1124 authors) |
| [ | SIR model with stifling and forgetting mechanisms | N/A | Synthetic data and the OSN Renren (9590 nodes, 89,873 edges) |
| [ | Hydrodynamic information diffusion prediction model | | 6500 video tweets from Sina-weibo |
| [ | Physical radiation transfer | N/A | Twitter dataset of about 9000 users |
| [ | Decision payoff modeling | Avg. precision: 0.7 | Sina Weibo and Flickr datasets |
| [ | Expectation maximization. Monte Carlo simulation | | SINA microblogging prediction of diffusion volume |
| [ | Bayesian logistic regression and random forests predictors | N/A | Twitter data crawled on informative and trending topics |
| [ | Modified forest fire | Num. spreaders | Twitter datasets |
http://socialnetworks.mpi-sws.org/datasets.html
N/A: not available
Fig. 1 Study computational pipeline
Plexilandia’s activity measured in number of content publications per relevant sub-forum per year
| Sub-forum | 2006 | 2007 | 2008 | 2009 | 2010 | 2011 | 2012 | 2013 | 2014 | Total |
|---|---|---|---|---|---|---|---|---|---|---|
| Amplifiers (SF 2) | 392 | 2165 | 2884 | 3940 | 3444 | 3361 | 2398 | 1252 | 985 | 20,821 |
| Effects (SF 3) | 184 | 1432 | 3362 | 3718 | 4268 | 5995 | 4738 | 2317 | 1331 | 27,345 |
| Luthier (SF 4) | 34 | 388 | 849 | 1373 | 1340 | 2140 | 926 | 699 | 633 | 8382 |
| General (SF 5) | 76 | 403 | 855 | 1200 | 2880 | 5472 | 3737 | 1655 | 1295 | 17,573 |
| Pro Audio (SF 6) | – | – | – | – | – | 342 | 624 | 396 | 219 | 1581 |
| Synthesizers (SF 7) | – | – | – | – | – | – | – | 104 | 92 | 196 |
Bold values correspond to summary values, either total or first order statistics, mean, min and max values
Fig. 2 Hierarchical topology of VCoP web forums
Sub-forum statistics (number of active users, number of active threads, number of posts) per month
| Month | SF 2 | SF 3 | SF 4 | SF 5 | SF 6 |
|---|---|---|---|---|---|
| 1 | (45, 25, 103) | (49, 43, 145) | (32, 40, 115) | (60, 37, 164) | (14, 11, 49) |
| 2 | (19, 10, 51) | (46, 29, 169) | (25, 8, 81) | (47, 27, 131) | (7, 5, 13) |
| 3 | (35, 20, 83) | (51, 46, 252) | (20, 13, 60) | (58, 30, 182) | (16, 6, 33) |
| 4 | (38, 27, 133) | (53, 43, 196) | (22, 15, 50) | (36, 23, 84) | (6, 5, 13) |
| 5 | (32, 22, 55) | (51, 44, 184) | (12, 8, 23) | (55, 28, 145) | (11, 9, 30) |
| 6 | (33, 22, 94) | (52, 38, 208) | (5, 3, 7) | (53, 36, 202) | (11, 5, 13) |
| 7 | (26, 14, 57) | (49, 32, 173) | (19, 10, 46) | (55, 35, 176) | (10, 7, 52) |
| 8 | (38, 24, 127) | (42, 37, 171) | (21, 17, 57) | (45, 29, 116) | (9, 3, 13) |
| 9 | (35, 17, 94) | (43, 33, 174) | (19, 10, 52) | (25, 19, 72) | (11, 7, 41) |
| 10 | (35, 23, 110) | (44, 29, 138) | (20, 9, 30) | (34, 25, 66) | (15, 5, 27) |
| 11 | (38, 22, 121) | (43, 24, 124) | (22, 9, 72) | (25, 13, 41) | (8, 5, 37) |
| 12 | (31, 19, 94) | (49, 38, 156) | (12, 8, 33) | (42, 25, 105) | (15, 6, 36) |
| 13 | (27, 14, 59) | (31, 30, 102) | (28, 17, 104) | (38, 24, 98) | (11, 6, 27) |
| IBR | 31.43 | 27.86 | 17.6 | 30.48 | 61.32 |
Bold values correspond to summary values, either total or first order statistics, mean, min and max values
The last row contains the imbalance ratio (IBR), computed as explained in the text
Fig. 3 Experimental setup of data exploitation for model validation. Red dots correspond to months with missing data. Blue dots correspond to months whose data is used for training. Green dots correspond to months whose data is used for testing (color figure online)
Fig. 4 Transformations applied to the semantic modeling of users and threads to obtain the input values for the extended LCA
Fig. 5 An instance of the long-tail distribution of thread utility for a user in a specific time period
Fig. 6 An instance of the evolution of the accumulators corresponding to a decision to post by a specific user
Fig. 7 Flowchart of the GA used for ELCA optimal parameter search
Optimal ELCA parameter values for each sub-forum found by independent GA searches over the training data (January 2013)
| Sub-forum | | | | | | | | | | | | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| SF 2 | 0.863 | 0.148 | 0.511 | 0.553 | 0.174 | 0.055 | 0.070 | 0.965 | 0.491 | 0.137 | 0.399 | 0.189 |
| SF 3 | 0.584 | 0.906 | 0.389 | 0.029 | 0.684 | 0.340 | 0.217 | 0.588 | 0.146 | 0.951 | 0.189 | 0.949 |
| SF 4 | 0.586 | 0.833 | 0.352 | 0.476 | 0.642 | 0.389 | 0.866 | 0.981 | 0.639 | 0.478 | 0.107 | 0.245 |
| SF 5 | 0.628 | 0.184 | 0.000 | 0.429 | 0.707 | 0.733 | 0.047 | 0.623 | 0.0935 | 0.864 | 0.847 | 0.640 |
| SF 6 | 0.516 | 0.126 | 0.490 | 0.595 | 0.287 | 0.692 | 0.087 | 0.401 | 0.956 | 0.869 | 0.044 | 0.315 |
Predictive performance results averaged over all test periods of the proposed ELCA approach per sub-forum
| | SF 2 | SF 3 | SF 4 | SF 5 | SF 6 |
|---|---|---|---|---|---|
| Mean recall | 0.55 | 0.48 | 0.63 | 0.51 | 0.83 |
| Mean accuracy | 0.92 | 0.93 | 0.89 | 0.93 | 0.92 |
| Mean precision | 0.57 | 0.50 | 0.67 | 0.53 | 0.85 |
| Mean F-measure | 0.56 | 0.49 | 0.65 | 0.52 | 0.84 |
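
The values in the final row of the table above are consistent with the F-measure, i.e., the harmonic mean of precision and recall. The short check below recomputes it from the mean precision and mean recall rows; the function name `f_measure` is an illustrative choice, and computing F from averaged precision and recall (rather than averaging monthly F values) is an assumption about how the row was obtained.

```python
def f_measure(precision, recall):
    """Harmonic mean of precision and recall (F1); 0.0 if both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (mean precision, mean recall) per sub-forum, copied from the table above.
reported = {
    "SF 2": (0.57, 0.55),
    "SF 3": (0.50, 0.48),
    "SF 4": (0.67, 0.63),
    "SF 5": (0.53, 0.51),
    "SF 6": (0.85, 0.83),
}

recomputed = {name: round(f_measure(p, r), 2) for name, (p, r) in reported.items()}
for name, value in recomputed.items():
    print(name, value)
```

Rounded to two decimals, the recomputed values agree with the last row of the table for all five sub-forums.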
Detailed F-measure results of the proposed ELCA per testing month and sub-forum
| Sub-forum / Month | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | Mean | Max | Min |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| SF 2 | 0.72 | 0.54 | 0.45 | 0.52 | 0.49 | 0.65 | 0.54 | 0.58 | 0.58 | 0.49 | 0.53 | 0.68 | |||
| SF 3 | 0.44 | 0.47 | 0.51 | 0.44 | 0.50 | 0.44 | 0.47 | 0.45 | 0.56 | 0.43 | 0.48 | 0.65 | |||
| SF 4 | 0.65 | 0.48 | 0.63 | 0.78 | *** | 0.61 | 0.67 | 0.66 | 0.71 | 0.68 | 0.72 | 0.55 | |||
| SF 5 | 0.49 | 0.44 | 0.56 | 0.46 | 0.39 | 0.47 | 0.47 | 0.68 | 0.63 | 0.65 | 0.51 | 0.45 | |||
| SF 6 | 0.82 | 0.81 | 0.86 | 0.84 | *** | 0.84 | 0.80 | 0.92 | 0.95 | 0.85 | 0.84 | 0.69 | |||
Bold values correspond to summary values, either total or first order statistics, mean, min and max values
Detailed F-measure results of the Random Forest approach per testing month and sub-forum
| Sub-forum / Month | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | Mean | Max | Min |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| SF 2 | 0.20 | 0.16 | 0.10 | 0.14 | 0.13 | 0.17 | 0.10 | 0.15 | 0.14 | 0.12 | 0.13 | 0.17 | |||
| SF 3 | 0.08 | 0.07 | 0.09 | 0.07 | 0.08 | 0.12 | 0.10 | 0.10 | 0.11 | 0.15 | 0.08 | 0.14 | |||
| SF 4 | 0.22 | 0.19 | 0.24 | 0.40 | *** | 0.30 | 0.19 | 0.22 | 0.27 | 0.26 | 0.31 | 0.17 | |||
| SF 5 | 0.13 | 0.10 | 0.11 | 0.11 | 0.07 | 0.08 | 0.09 | 0.14 | 0.14 | 0.22 | 0.10 | 0.11 | |||
| SF 6 | 0.43 | 0.30 | 0.55 | 0.30 | *** | 0.29 | 0.59 | 0.32 | 0.36 | 0.35 | 0.60 | 0.28 | |||
Bold values correspond to summary values, either total or first order statistics, mean, min and max values
Detailed F-measure results of the SVM approach per testing month and sub-forum
| Sub-forum / Month | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | Mean | Max | Min |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| SF 2 | 0.21 | 0.19 | 0.11 | 0.12 | 0.11 | 0.13 | 0.11 | 0.17 | 0.15 | 0.18 | 0.11 | 0.15 | |||
| SF 3 | 0.10 | 0.05 | 0.11 | 0.09 | 0.10 | 0.15 | 0.12 | 0.13 | 0.16 | 0.19 | 0.11 | 0.13 | |||
| SF 4 | 0.18 | 0.22 | 0.28 | 0.38 | *** | 0.33 | 0.22 | 0.19 | 0.23 | 0.25 | 0.28 | 0.18 | |||
| SF 5 | 0.11 | 0.13 | 0.15 | 0.11 | 0.11 | 0.10 | 0.11 | 0.13 | 0.17 | 0.26 | 0.12 | 0.16 | |||
| SF 6 | 0.39 | 0.31 | 0.45 | 0.31 | *** | 0.25 | 0.61 | 0.33 | 0.39 | 0.33 | 0.63 | 0.27 | |||
Bold values correspond to summary values, either total or first order statistics, mean, min and max values
*** entries correspond to non-convergent computation processes, i.e., a final value was not reached
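
The Mean, Max and Min columns of the three F-measure tables can be recomputed from the monthly values. The following is a minimal sketch using two rows copied from the ELCA table; it assumes that `***` entries are skipped rather than counted as zero, which the source does not state explicitly, and the helper name `summarize` is hypothetical.

```python
from statistics import mean

# Monthly F-measure values (months 2 to 13) copied from the ELCA table.
elca_rows = {
    "SF 2": ["0.72", "0.54", "0.45", "0.52", "0.49", "0.65",
             "0.54", "0.58", "0.58", "0.49", "0.53", "0.68"],
    "SF 4": ["0.65", "0.48", "0.63", "0.78", "***", "0.61",
             "0.67", "0.66", "0.71", "0.68", "0.72", "0.55"],
}


def summarize(row, missing="***"):
    """Return (mean rounded to 2 decimals, max, min), skipping missing entries."""
    values = [float(v) for v in row if v != missing]
    return round(mean(values), 2), max(values), min(values)


for name, row in elca_rows.items():
    row_mean, row_max, row_min = summarize(row)
    print(f"{name}: mean={row_mean:.2f} max={row_max:.2f} min={row_min:.2f}")
```

Under this skipping assumption, the recomputed means for SF 2 and SF 4 (0.56 and 0.65) agree with the per-sub-forum summary table of ELCA results, which suggests, but does not confirm, that non-convergent months were left out of the averages.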
Fig. 8 Example of a middle-performance result corresponding to the post publication graph of SF 4 for Month
Fig. 9 Best predictive performance, corresponding to the post publication graph of SF 6 for Month 10
Post publication decision rules for SF4-M4
| User | Posts in: | User | Posts in: | User | Posts in: |
|---|---|---|---|---|---|
| U1 | T384, T413 | U46 | T387 | U215 | T196 |
| U8 | T372, T402 | U67 | T266, T384, T414, T419, T438 | U229 | T37, T266, T413, T419 |
| U9 | T37, T367, T402, T413 | U111 | T402 | U233 | T266 |
| U13 | T365, T372 | U127 | T103, T367, T438 | U245 | T37, T196, T367, T413 |
| U14 | T372, T414 | U132 | T365, T367 | U248 | T103 |
| U15 | T369, T414 | U154 | T266, T367, T372 | U249 | T369 |
| U30 | T266, T369 | U198 | T365 | ||
| U43 | T372, T384 | U201 | T266, T367, T384, T387, T414 | | |
User = U**, conversation threads the user has published posts in = T***
Post publication decision rules for SF6-M10
| User | Posts in: | User | Posts in: | User | Posts in: |
|---|---|---|---|---|---|
| U1 | T46, T610, T840 | U151 | T788 | U229 | T610 |
| U9 | T840 | U163 | T840 | U237 | T46 |
| U16 | T610 | U180 | T703, T788 | U241 | T46 |
| U32 | T703 | U207 | T610 | U257 | T788 |
| U75 | T788 | U228 | T840 | U279 | T46, T610, T840 |
User = U**, conversation threads the user has published posts in = T***
Fig. 10 Worst result, corresponding to the publication graph of sub-forum 5 for Month 6
Fig. 11 Relationship between number of posts and F-measure score