Literature DB >> 34941953

Predicting online participation through Bayesian network analysis.

Elizaveta Kopacheva1.   

Abstract

Despite the fact that preconditions of political participation were thoroughly examined before, there is still not enough understanding of which factors directly affect political participation and which factors correlate with participation due to common background variables. This article scrutinises the causal relations between the variables associated with participation in online activism and introduces a three-step approach in learning a reliable structure of the participation preconditions' network to predict political participation. Using Bayesian network analysis and structural equation modeling to stabilise the structure of the causal relations, the analysis showed that only age, political interest, internal political efficacy and no other factors, highlighted by the previous political participation research, have direct effects on participation in online activism. Moreover, the direct effect of political interest is mediated by the indirect effects of internal political efficacy and age via political interest. After fitting the parameters of the Bayesian network dependent on the received structure, it became evident that given prior knowledge of the explanatory factors that proved to be most important in terms of direct effects, the predictive performance of the model increases significantly. Despite this fact, there is still uncertainty when it comes to predicting online participation. This result suggests that there remains a lot to be done in participation research when it comes to identifying and distinguishing factors that stimulate new types of political activities.

Entities:  

Mesh:

Year:  2021        PMID: 34941953      PMCID: PMC8699968          DOI: 10.1371/journal.pone.0261663

Source DB:  PubMed          Journal:  PLoS One        ISSN: 1932-6203            Impact factor:   3.240


Introduction

With the development of Internet technologies, new forms of political activities, e.g., online activism, become more popular and gain potential to replace traditional political participation types, e.g., contacting politicians, signing petitions, etc. [1, 2]. It has been argued that online participation attracts new groups of people into political action as social networking systems (SNSs) provide access to resources [3], expedite engendering of identities [4, 5] and expose users to recruitment [6]. In the meanwhile, political participation is an essential element of the system of checks and balances in democratic societies [7-9]. Thus, it is essential to understand what motivates people to participate in new forms of political activities to facilitate the mobilisation of people into political action. Despite the fact that scholars started to investigate preconditions of political participation several decades ago [e.g., 10, 11], there is still not enough understanding of which factors directly affect political participation and which factors correlate with the variable. The Civic Voluntarism Model (CVM), which is well-known in political participation research, proposes that political motivations, i.e., political information, interest, efficacy and party identification, resources and recruitment are the main factors to determine political participation [12]. Over the years, the list of mobilising factors extended radically. Political participation scholars highlight such characteristics as political [13, 14] and social trust [15-17], placement on the left-right scale [18], external [19] and internal political efficacy [20, 21] as factors associating with political participation. However, only a limited number of studies examined causality between those factors (often, within small-scale experimental studies [e.g., 22, 23]). Thus, e.g., [24] found that lower levels of political trust are positively associated with non-institutionalised political participation (i.e., activities that aim to influence political decision-making indirectly, e.g., petition signing, boycotting, protesting [25, p. 188]) but did not identify if participation is affected by political trust. Moreover, it has not been examined before if knowing some characteristics of a person, e.g., if a person has a high political interest or low political trust, can help to predict the probability of a person to participate in one or another political action. The reasons for the lack of such explications are the limitations of regression analysis as a method often used in political participation research [e.g., 12, 25]. When conducting regression analysis of the preconditions of political participation, researchers are forced to make an assumption that such characteristics as political interest, political efficacy, social trust, etc. are independent of each other, which is not necessarily the case. Another type of analysis used in political participation research is structural equation modeling (e.g., applied by [21]) that allows analysing structural relationships, which may be interpreted as causal relations between the variables. Structural equation modeling estimates the interrelated multiple dependencies given exogenously and does not provide tools to infer the relationships from the data, which in the case of preconditions of political participation, is essential as there is not enough evidence to suggest causal relations between factors associated with participation. Indeed, for a long time, experimental studies with interventions were considered the only type of analysis that allows inferring causal relations in a social context. Judea Pearl was one of the first researchers who proposed using Bayesian networks to infer causality [26]. Later, reasoning about causal relations based on the results of Bayesian network analysis was frequently discussed in the literature [e.g., 27–29]. It has been proposed that this method of research has the capacity to learn reliable structures of causal relations [e.g., see 30]. Moreover, Bayesian network analysis does not have limitations discussed in relation to the regression analysis (i.e., an assumption of independence between the variables), and decreases the number of assumptions to a minimum [31]. Overall, by being more flexible, Bayesian network analysis allows researchers to lessen the expenses related to experimental research and provides the ability to infer causality and predict events using available data, which is often collected within other studies and with other purposes. Hence, Bayesian network analysis is becoming more popular to infer causal relations even when examining complex social phenomena [e.g., 32, 33]. In this study, Bayesian network analysis is utilised in order to analyse preconditions of political participation and acquire a probability distribution table of online activism (i.e., citizens’ activities that aim to “raise awareness about political issues” and mobilise citizens to participate in other more traditional forms of political participation, e.g., petition signing, protesting, contacting politicians, to promote political or institutional reforms [34]), as an example of political participation. Relying on the European Social Survey data [35] and the theoretical suggestions of the earlier research [12, 14, 21, 36] to limit the number of explanatory variables (see S1 File for more details), this research is a unique analysis of the online activism precondition structure that gives not only a deeper understanding of what motivates people to participate online but also a capacity to predict political participation having limited prior knowledge about a person, e.g., person’s social or political trust, income, level of education, age, etc. The aims of this research are to answer the theoretical questions discussed before and more importantly to propose a methodological solution to the problem of inferring a reliable causal relation structure using available, however, restrictive survey data (in comparison to the data collected within experiments or specifically for the purposes of the study). In particular, this paper presents an innovative three-step approach of utilising the tools of Bayesian network analysis and structural equation modeling to acquire a reliable structure of causal relations between characteristics operationalised by survey questionnaires. This research emphasises the importance of using structural equation modeling once the causal structure has been learned with the tools of Bayesian network analysis. Structural equation modeling allows stabilising the variability in results when acquiring the causal structure using different Bayesian network algorithms, i.e., constrained-based, score-based and hybrid. Comparing received models applying structural equation modeling tools, it becomes possible to acquire a structure more reliable for further interference. When used in the context of preconditions of online participation, the three-step analysis, presented in the paper, allowed to conclude that only age, political interest and internal political efficacy, i.e., the belief of an individual that one can influence political decision-making [37], have direct effects on online activism. Moreover, the direct effect of political interest is mediated by the effects of age and internal political efficacy, which affect political participation directly and indirectly via political interest. Contrary to the previous suggestions [19, 38–42], political and social trust as well as external political efficacy, which is often understood as the responsiveness of the political system to the political actions of citizens [37], are independent of online political participation given internal political efficacy. The results also suggest that offline mobilisation (via non-governmental organisations, trade unions, work places) is independent of online political participation given internal political efficacy and political interest. This result is consistent with the suggestions of [3], who proposed that offline mobilisation is associated only with offline political participation.

Methods

Within the study, 30 ESS questions [35] that operationalise possible preconditions of political participation were analysed. The list of the variables includes such factors as party identification, political interest, internal and external political efficacy, highlighted in the state-of-the-art work on political participation preconditions of Verba, Schlozman and Brady [12]. In addition to those factors, the influence of other characteristics that are expected to be associated with political participation is examined. Thus, earlier, scholars suggested political trust to be negatively associated with participation in non-institutionalised activities [25, 38–41, 43] and social trust to be positively correlated with this type of participation [15, 44, 45]. Moreover, Verba, Schlozman and Brady [12] highlight the importance of resources and recruitment for any type of political participation suggesting that access to resources and mobilisation play a key role in stimulating political participation. Previous research also showed gender, nationality, income and educational level to be associated with political participation [12, 46], thus, those variables were also used for the analysis. The dependent variable, online political participation, measured within the 2018 European Social Survey by the survey question “Have you… posted or shared anything about politics online in the last 12 months” [35] and originally has a binary scale, where 1 represent the occurrence of online participation and 2—no occurrence. Table 1 shows how the rest of the variables were operationalised.
Table 1

Variables used for the analysis.

Variable nameMeaningOperationalisation ofValues
pstplonlPosted or shared anything about politics online in the last 12 monthsOnline activism1—Posted; 2—Did not post
sgnptitSigned a petition in the last 12 monthsSigning petition1—Signed; 2—Did not sign
contpltContacted a politician or government official during the last 12 monthsContacting politicians1—Contacted; 2—Did not contact
voteVoted in the last national electionParticipation in voting1–Voted; 2—Did not vote; 3—Not eligible
ppltrstMost people can be trusted or you can’t be too carefulSocial trustFrom 0 (“You can’t be too careful”) to 10 (“Most people can be trusted”)
pplfairMost people try to take advantage of you, or try to be fairSocial trustFrom 0 (“Most people try to take advantage”) to 10 (“Most people try to be fair”)
pplhlpMost of the time people helpful or mostly looking out for themselvesSocial trustFrom 0 (“mostly look out for themselves”) to 10 (“mostly try to be helpful”)
trstlglTrust in the legal systemPolitical trustFrom 0 (“No trust at all”) to 10 (“Complete trust”)
trstplcTrust in the policePolitical trustFrom 0 (“No trust at all”) to 10 (“Complete trust”)
trstpltTrust in politiciansPolitical trustFrom 0 (“No trust at all”) to 10 (“Complete trust”)
trstprtTrust in political partiesPolitical trustFrom 0 (“No trust at all”) to 10 (“Complete trust”)
trstprlTrust in country’s parliamentPolitical trustFrom 0 (“No trust at all”) to 10 (“Complete trust”)
polintrHow interested in politicsPolitical interest1—“Very interested”; 2—“Quite interested”; 3—“Hardly interested”; 4—“Not at all interested”
psppsgvaPolitical system allows people to have a say in what government doesExternal political efficacy1—“Not at all”; 2—“Very little”; 3—“Some”; 4—“A lot”; 5—“A great deal”
psppiplaPolitical system allows people to have influence on politicsExternal political efficacy1—“Not at all”; 2—“Very little”; 3—“Some”; 4—“A lot”; 5—“A great deal”
clsprtyIs there a particular political party you feel closer to than all the other parties?Party identification1—“Yes”; 2- “No”
lrscalePlacement on left right scalePlacement on the left-right scaleFrom 0 (left) to 10 (right)
actrolgaAble to take active role in political groupInternal political efficacy1—“Not at all able”; 2—“A little able”; 3—“Quite able”; 4—“Very able”; 5—“Completely able”
cptppolaConfident in own ability to participate in politicsInternal political efficacy1—“Not at all confident”; 2—“A little confident”; 3—“Quite confident”; 4—“Very confident”; 5—“Completely confident”
pdwrkDoing last 7 days: paid workRecruitment0—“Not marked”; 1—“Marked”
wrkorgWorked in another organisation or association last 12 monthsRecruitment1—“Yes”; 2—“No”
rlgblgBelonging to particular religion or denominationRecruitment1—“Yes”; 2—“No”
dscrgrpMember of a group discriminated against in this countryRecruitment1—“Yes”; 2—“No”
mbtruMember of trade union or similar organisationRecruitment1—“Yes, currently”; 2—“Yes, previously”; 3—“No”
eduyrsYears of full-time education completedEducationNumber of years
hincfelFeeling about household’s income nowadaysIncome1—“Living comfortably on present income”; 2—“Coping on present income”; 3—“Difficult on present income”; 4—“Very difficult on present income”
brncntrWere you born in country?Nationality1—“Yes”; 2—“No”
gndrGenderGender1—“Male”; 2—“Female”
ageaAge of respondent, calculatedAgeNumber of years
cntryCountryCountry of originCountry

Source: ESS 2018 [35]. N = 36 015 individuals in 19 countries.

Source: ESS 2018 [35]. N = 36 015 individuals in 19 countries. Exploratory factor analysis was applied to 12 out of 30 ESS questions to reduce the number of variables and operationalise social trust, political trust, external and internal political efficacy (see Table 2 for the factor loadings).
Table 2

Operationalisation of social trust, political trust, external and internal political efficacy.

Operationalisation ofVariablesFactor loadingsProportion of variance explained by the factor
Social trustppltrst0.7510.57
pplfair0.802
pplhlp0.711
Political trusttrstlgl0.7640.64
trstplc0.646
trstplt0.875
trstprt0.847
trstprl0.842
External political efficacypsppsgva0.7510.56
psppipla0.751
Internal political efficacyactrolga0.757
cptppola0.7570.57

Source: ESS 2018 [35]. N = 36 015 individuals in 19 countries.

Source: ESS 2018 [35]. N = 36 015 individuals in 19 countries. In order to perform Bayesian structure learning and acquire a reliable structure, all the variables, including those received as the result of the exploratory factor analysis, were discretised as suggested by [47] (see S1 File to know how the variables were originally measured and how they were transformed for the analysis). Despite the fact that discretisation caused some information loss, the step was necessary in order to follow conditional Gaussian distribution assumptions that apply when working with the mixed data, i.e., continuous and discrete variables. The conditional Gaussian distribution suggests that discrete nodes cannot have continuous parents [47, p. 128]. Following this assumption, we would be forced to presume that age cannot directly influence online political participation, which would affect the results. Another constrain of the Bayesian network structure learning as a method to analyse factors influencing online activism is associated with the nature of a Bayesian network (BN) as a directed acyclic graph (DAG). Due to the fact that the structure cannot contain cycles, Bayesian network analysis does not allow children to affect parents, thus, simplification takes place. It is worth mentioning that no other methods allow directed graphs to be cyclic and cycles are possible only in undirected graphs. Hence, when evaluating the results, it is important to understand that only strong links between the variables are retained as Bayesian network algorithms aim to increase the ability to predict online participation. When discussing the constraints of Bayesian network analysis to infer causal relations between the variables, the idea of causality must be also touched upon. Indeed, once interpreting edges as causal relations, one must be aware of the fact that only the relations between the variables as opposed to the relations between singular events are distinguished. Thus, Bayesian network analysis infers generic causality rather than the single-case one [48]. In other words, this article presents the dependencies between the variables instead of the causality in traditional understanding as event A leading to the occurrence of B [26]. Moreover, as for any other statistical inference method, the accuracy of Bayesian network structure learning is highly affected by missing or imprecise observations. Thus, estimating the performance of Bayesian network structure learning algorithms on several synthetic networks, [49] found that the accuracy of the structure learning decreases by 13%-28% if data contain from 5% to 10% missing values and by 18%-28% if there are 5%-10% inaccurate values. Thus, if both types of imprecise observations characterise the data, the decrease in accuracy ranges between 26% and 30% [49, p. 22]. For another thing, the same study showed that Bayesian network structure learning performs best on the datasets containing 100 000—1 000 000 observations [49, p. 15], which is often hard to reach when working with survey data. The minimum number of observations needed for a correct Bayesian network structure learning is an ongoing topic of research. Earlier, it was proposed that this number depends on both, the total of nodes in the examined network and the complexity of the data [50], i.e., the number of categories within each variable and the number of missing and inaccurate values. In that regard, within this study, several robustness tests were performed to distinguish arcs that are not learned correctly (see S1 File). A three-step analysis was conducted in order to learn a reliable structure of the network and acquire a set of conditional probability tables. The step-wise description of the procedure is presented in Fig 1.
Fig 1

The step-wise description of the procedure in analysing the data.

Step 1: Bayesian network structure learning

Firstly, a set of the BN structure learning algorithms was applied to the network. All structure learning algorithms aim to find such model that maximises where D is a dataset, B = (G, Θ) is a Bayesian network, in which the parameters of the global distribution of a set of variables X = X1, X2, …, X is denoted as Θ and G is a directed acyclic graph, Pr(G|D) is the posterior probability of the DAG, Pr(G) is the product of the prior distribution over the possible DAGs and Pr(D|G) is the probability of the data [47]. All structure learning algorithms approach the task of formula maximisation differently. Score-based algorithms apply heuristic optimisation techniques assigning to each structure candidate a network score (i.e., Bayesian Information criterion (BIC) [51] or Bayesian Dirichlet equivalent (BDE) uniform [52] scores), which shows the model’s goodness of fit, and trying to maximise it. Constraint-based algorithms use conditional independence tests to firstly, determine which pairs of variables are connected by an arc and cannot be d-separated; secondly, find v-structures and then, identify compelled arcs and their orientation. Finally, hybrid algorithms combine the approaches of the constraint-based and score-based algorithms trying to maximise the network score while restricting the results keeping only those structures, which satisfy certain conditions [47]. In this analysis, the performance of the following algorithms was tested: constraint-based Grow-Shrink [53], Incremental Association [54] and Max-Min Parents and Children [55]; score-based hill-climbing [47] and Tabu search [56] and hybrid Max-Min Hill Climbing [57] and Hybrid HPC [58]. Score-based and hybrid algorithms were successful in finding directed acyclic graph (DAG) structures. In the meanwhile, constraint-based algorithms found only partially directed structures, which is in line with the results previously reported by [49]. Earlier, it was suggested that score-based algorithms are “superior” to constraint-based algorithms when working with the data characterised by high volumes of noise [49, p. 24]. Thus, both TABU and hill-climbing (HC) showed high accuracy in learning the structures on the datasets including missing and inaccurate values [49]. Due to the fact that constraint-based algorithms learned partially directed structures (in comparison to the fully directed models found by the score-based and hybrid algorithms), only the results of the score-based and hybrid learning were used for further comparison. The structures learned by score-based and hybrid algorithms were compared by evaluating the predictive performance of the models using cross-validation (see Table 3). In addition to that, the performance of the DAG structures received using the same algorithms but also, applying model averaging was assessed. Model averaging is used as a technique to acquire a model that gives better predictive performance and reduces over-fitting [47]. This method is often used when learning network structures on the datasets characterised by a limited number of observations and a high volume of noise [e.g., 59, 60]. Thus, the models were learned on the sets of 5000 network structures.
Table 3

The results of cross-validation using different BN structures.

Algorithm typeScore-basedHybridScore-based (averaged)Hybrid (averaged)
AlgorithmTABUHCMMHCH2PCTABUHCMMHCH2PC
Sensitivity.844.843.842.835.845.844.840.844
Specificity.543.540.497.521.537.539.496.530
Precision.978.979.977.990.977.978.981.977
Recall.844.843.842.835.845.844.840.844
F1.906.906.904.906.906.906.905.906
Prevalence.961.962.961.982.959.961.968.959
Detection Rate.811.811.809.820.810.811.813.810
Detection Prevalence.829.829.829.829.829.829.829.829
Balanced Accuracy.694.692.669.678.691.691.668.687
BIC loss (SD)14.1388 (4e-04)14.139 (6e-04)14.2645 (8e-04)14.1529 (8e-04)14.1304 (6e-04)14.115 (7e-04)14.2859 (7e-04)14.1301 (7e-04)
BDE loss (SD)14.1386 (6e-04)14.1388 (5e-04)14.2646 (6e-04)14.1528 (8e-04)14.1313 (6e-04)14.1149 (5e-04)14.2858 (5e-04)14.1301 (7e-04)
Prediction error (SD)0.1681 (4e-04)0.1683 (4e-04)0.1718 (4e-04)0.1707 (3e-04)0.1682 (4e-04)0.1682 (3e-04)0.1711 (2e-04)0.1684 (4e-04)
Algorithm typeScore-based and hybrid (balanced)Averaged score-based and hybrid (balanced)
AlgorithmTABU + H2PCTABU + MMHCHC + MMHCHC + H2PCTABU + H2PCTABU + MMHCHC + MMHCHC + H2PC
Sensitivity.835.838.838.835.845.829.829.843
Specificity.530.546.532.510.543.7501.000.523
Precision.990.987.986.990.9771.0001.000.978
Recall.835.838.838.835.845.829.829.843
F1.906.906.906.906.906.906.906.905
Prevalence.982.976.976.982.9591.0001.000.962
Detection Rate.820.818.817.820.810.829.829.811
Detection Prevalence.829.829.829.829.829.829.829.829
Balanced Accuracy.683.692.685.673.694.790.914.683
BIC loss (SD)14.2078 (7e-04)14.1487 (8e-04)14.1488 (5e-04)14.2078 (5e-04)14.1899 (5e-04)14.2053 (7e-04)14.153 (8e-04)14.1391 (9e-04)
BDE loss (SD)14.2078 (5e-04)14.149 (7e-04)14.1489 (6e-04)14.2081 (5e-04)14.1899 (6e-04)14.2053 (7e-04)14.1537 (7e-04)14.1389 (8e-04)
Prediction error (SD)0.1711 (7e-04)0.1694 (4e-04)0.1693 (4e-04)0.1712 (5e-04)0.1683 (5e-04)0.1713 (1e-04)0.1712 (1e-04)0.1683 (5e-04)

Source: ESS 2018 [35]. N = 27 379 individuals in 19 countries. The 10-fold cross-validation is applied to evaluate the predictive performance of the models. The top rows show the predictive performance of the structures learned by score-based and hybrid algorithms. The bottom rows show the predictive performance of the models received applying ensemble learning.

Source: ESS 2018 [35]. N = 27 379 individuals in 19 countries. The 10-fold cross-validation is applied to evaluate the predictive performance of the models. The top rows show the predictive performance of the structures learned by score-based and hybrid algorithms. The bottom rows show the predictive performance of the models received applying ensemble learning. Comparing the predictive performance of all eight models (see Table 3), one can find TABU [56], HC [47] and H2PC [58] algorithms to outperform MMHC [57] on almost all dimensions of evaluation. This result is in line with the earlier observations of [49, 61]. In the meanwhile, model averaging, indeed, shows to slightly increase the accuracy and recall, while reducing the prediction error in almost all of the cases. In general, it is advised to consider the results of both score-based and hybrid algorithms when trying to learn the structures of networks consisting of categorical variables. While optimising the global model, score-based algorithms do not deal with local structure identification, which is considered when hybrid algorithms learn network structures. In that regard, it is worth recognising the results of the structure learning conducted by two types of algorithms, score-based and hybrid algorithms, as there is a higher risk of model over-fitting when using only score-based algorithms [49, p. 18]. Thus, the next step of the analysis was to apply structural equation modeling using the models found by the Bayesian structure learning algorithms to find significant paths. This was done with the goal to find the model that (1) provides high predictive performance and (2) reduced over-fitting and (3) that can serve as a theoretical foundation for future studies on online political participation predictors.

Step 2: Structural equation modeling

The second step of the analysis was to test for the significance of the paths found by any pair of score-based—hybrid algorithms. For each pair of score-based—hybrid models, the initial structure of the network, which was fitted into the data by the means of structural equation modeling, included those arcs that appeared in both structures. After that, arcs present in one structure (e.g., a score-based model) but absent in another (e.g., a hybrid structure) were introduced in the model. As a result of computing the chi-square tests to compare the models [62], the structure with the best model fit was found and its performance was evaluated. Thus, Table 3 shows that the combination of the averaged TABU and H2PC structures, which was balanced by the means of structural equation modeling, gives the highest recall (i.e., 0.845) and lowest prediction error (0.1683) when compared with the performance of other balanced models. Moreover, this balanced model can be compared only with the averaged TABU structure when analysing the predictive performance. Thus, both models reach the recall of 0.845, while the averaged TABU structure has a slightly lower prediction error. Still, the balanced structure scores higher on the balanced accuracy, which is 0.694, compared to 0.691 for the averaged TABU structure. Thus, balancing the structures allows reaching goal (1), which was identified in the previous section, i.e., allows finding the model providing high predictive performance. In order to test if balancing structures learned by score-based and hybrid algorithms decreases over-fitting, the method was used on the simulated dataset “A Logical Alarm Reduction Mechanism” (ALARM) [63] including 37 variables and 20 000 observations. Testing the method on the synthetic data allows finding the number of falsely identified arcs associated with each model. Thus, Fig 2 shows that in almost all of the cases, the accuracy of the models stays rather high, i.e., the number of correctly identified arcs stays with the range of 55%—65%. In the meanwhile, with the increase of the noise percentage in the data, the number of falsely identified arcs (i.e., false positives) rapidly grows. Moreover, as expected, score-based algorithms tend to over-fit the model.
Fig 2

Results of the over-fit testing on the synthetic dataset ALARM.

Source: [63]. N = 20 000 observations. Notes: The two-step approach was used to balance the structures. The over-fitting was tested on the original dataset and on the same dataset with an increased percentage of noise (i.e., 0.5%, 1%, 2%, 5%, 10% and 15% of noise). Eight rows on the top show the accuracy of the models in identifying the correct arcs. Thus, longer bars display higher accuracy. On the bottom, the rows show the number of falsely identified arcs. The shorter the bar the better as it signifies lower over-fitting.

Results of the over-fit testing on the synthetic dataset ALARM.

Source: [63]. N = 20 000 observations. Notes: The two-step approach was used to balance the structures. The over-fitting was tested on the original dataset and on the same dataset with an increased percentage of noise (i.e., 0.5%, 1%, 2%, 5%, 10% and 15% of noise). Eight rows on the top show the accuracy of the models in identifying the correct arcs. Thus, longer bars display higher accuracy. On the bottom, the rows show the number of falsely identified arcs. The shorter the bar the better as it signifies lower over-fitting. While balancing score-based—hybrid pair of algorithms decreases accuracy only slightly, the number of falsely identified arcs drops significantly. In particular, Fig 2 shows that balancing the combination of the averaged TABU and H2PC structures allows reaching both, (1) high predictive performance and (2) reduced over-fitting, which supports the results reported in Table 3. In that regard, ensemble learning allows reaching both of the methodological goals identified earlier, as well as decreases the number of false positive arcs while keeping the number of correctly identified arcs high, thus, provides a good theoretical foundation for future studies on online political participation predictors. Due to the fact that the balanced TABU + H2PC (both averaged on 5000 structures) model allowed reaching all of the identified methodological goals, this model was chosen for acquiring the set of conditional probability distribution tables.

Step 3: Fitting the parameters

The third step of the analysis was to fit the parameters of the Bayesian network dependent on the received structure. In order to avoid receiving missing parameter estimates in the case when configurations of the discrete parents are not observed in the data [64], the Bayesian parameter estimation method was used to fit the parameters of the network. As a result, a set of probability distribution tables was acquired. Based on this data, it became possible to estimate the probability of a person to participate in online activism having some prior knowledge, e.g., one’s level of political interest, political trust or placement of the left-right scale. All statistical analyses were performed using the R 4.0.1 platform [65] and a number of additional packages, including bnlearn [64], lavaan [62], gRain [66] and psych [67]. The R scripts of the analysis and additional tests are provided as supplementary materials (i.e., S2 and S3 Files).

Results

The results of the structure learning using the score-based Tabu and hybrid H2PC algorithms (averaged on 5000 structured) are illustrated by Fig 3.
Fig 3

Directed acyclic graphs of the relations between factors associated with participation in online activism.

Source: [35]. N = 27 379 individuals in 19 countries. Notes: Within Bayesian network analysis, score-based Tabu and hybrid H2PC algorithms were applied to analyse the data and learn the structure of the causal relations between the variables. Dashed blue lines represent false positives, i.e., edges that are not present in the structure learned by the Tabu algorithm but present in the structure learned by H2PC. Orange lines represent false negatives, i.e., edges that are present in the structure learned by the Tabu algorithm but absent in the structure learned by H2PC. The direction of each arc represents the orientation of the causality between two variables: e.g., in a relationship A → B, A is a parent and B is its child. All the edges from the other nodes to “Age”, “Gender” and “Born in the country” were blacklisted prior to learning the structure. No other edges were blacklisted. In the figure, those nodes that can only be parents have a darker blue color. The node “Country” (i.e., the country of the respondent’s residency) is present in the structure but not depicted by the figure to facilitate the apprehension of the relations between the nodes of interest. All variables are individual-level variables.

Directed acyclic graphs of the relations between factors associated with participation in online activism.

Source: [35]. N = 27 379 individuals in 19 countries. Notes: Within Bayesian network analysis, score-based Tabu and hybrid H2PC algorithms were applied to analyse the data and learn the structure of the causal relations between the variables. Dashed blue lines represent false positives, i.e., edges that are not present in the structure learned by the Tabu algorithm but present in the structure learned by H2PC. Orange lines represent false negatives, i.e., edges that are present in the structure learned by the Tabu algorithm but absent in the structure learned by H2PC. The direction of each arc represents the orientation of the causality between two variables: e.g., in a relationship A → B, A is a parent and B is its child. All the edges from the other nodes to “Age”, “Gender” and “Born in the country” were blacklisted prior to learning the structure. No other edges were blacklisted. In the figure, those nodes that can only be parents have a darker blue color. The node “Country” (i.e., the country of the respondent’s residency) is present in the structure but not depicted by the figure to facilitate the apprehension of the relations between the nodes of interest. All variables are individual-level variables. Fig 3 shows that both algorithms agree on the relations between the majority of the nodes. Some arcs, however, are present in the structure learned by the Tabu algorithm, while being absent in the structure learned by H2PC (in Fig 3 they are orange). Moreover, two structures do not seem to agree on the directions of some arcs, i.e., arcs between political and social trust (thus, TABU algorithm found political trust to affect social trust, i.e., the relationship political trust → social trust, where political trust is a parent and social trust is its child; in the meanwhile, H2PC found inverse causality, i.e., the relationship social trust → political trust, where social trust is a parent and political trust is its child), internal and external political efficacy, internal political efficacy and education, as well as the relations between gender and internal political efficacy was determined only by the H2PC algorithm. While conducting the robustness tests, i.e., learning the structures of the relations between the explanatory variables in regard to participation in signing petitions, contacting politicians and voting, other uncertainties in relation to the inverse or absent causality between participation in online activism and working in an NGO, absent causality between online participation and party identification, as well as in relation to a direct effect of internal political efficacy on party identification, arose. Those uncertainties were eliminated in the second step of the analysis by the means of structural equation modeling. As a result of the structural equation modeling and comparison of the models based on the chi-square tests, the structure with the best model fit was distinguished. This structure is illustrated by Fig 4.
Fig 4

Directed acyclic graph of the relations between factors associated with participation in online activism.

Source: ESS 2018 [35]. N = 27 379 individuals in 19 countries. Notes: Structural equation modeling was applied to analyse the data. Entities depicted in association with the edges are parameter estimates of the structural equation modeling. Sign.: *p < 0.05; **p < 0.01; ***p < 0.001. All variables are individual level variables.

Directed acyclic graph of the relations between factors associated with participation in online activism.

Source: ESS 2018 [35]. N = 27 379 individuals in 19 countries. Notes: Structural equation modeling was applied to analyse the data. Entities depicted in association with the edges are parameter estimates of the structural equation modeling. Sign.: *p < 0.05; **p < 0.01; ***p < 0.001. All variables are individual level variables. Based on the learned structure, participation in online activism is directly affected by only three explanatory variables, i.e., age, internal political efficacy and political interest, while political interest in itself is highly affected by internal political efficacy and age. In accordance with the set of probability distribution tables, acquired as the result of fitting the parameters of the Bayesian network dependent on the received structure, the probability of participation in online activism changes depending on the age, internal political efficacy and political interest of a person (see Fig 5).
Fig 5

Probability distribution table of participation in online activism.

Source: ESS 2018 [35]. N = 27 379 individuals in 19 countries. Notes: Bayesian parameter estimation, conditional on the acquired structure of the network, was applied to analyse the data. Entities are the probability of participation in online activism in percentage.

Probability distribution table of participation in online activism.

Source: ESS 2018 [35]. N = 27 379 individuals in 19 countries. Notes: Bayesian parameter estimation, conditional on the acquired structure of the network, was applied to analyse the data. Entities are the probability of participation in online activism in percentage. Fig 5 shows that the group of people that is more likely to participate in online activism are those between 15 and 30 years old with high political interest and internal political efficacy (about 44 out 100 people of that group are likely to participate in online activism). In the meanwhile, the probability of participation online in the age group least expected to participate, i.e., between 61 and 90 years old, grows from 2,3% to 17,99% with the increase in political interest and internal political efficacy. That suggests significant growth in the probability of online political participation dependent on the political interest and internal political efficacy that is partly in line with the previous suggestions of [12]. Fig 5 also suggests that political interest has a higher influence on online political participation than internal political efficacy in all age groups except for those between 61 and 90 years old. For the elderly, internal political efficacy, which is party operationalises access to resources, has a bigger influence on participation in online activism (the probability of online participation in the group of people with high internal political efficacy is 7,28%) than political interest (the probability of online participation in the group of people with high political interest is 5,92%). The acquired structure of the network (see Fig 4) suggests that the rest of the variables are independent of participation in online activism given age, internal political efficacy and political interest. The causal relations between the explanatory variables seem rather interesting regarding the results of earlier studies. Thus, for instance, nodes that are often referred to as recruitment variables [12], i.e. membership in a trade union, belonging to a particular religion, being in workforce and working in a non-governmental organisation, in practice, indicate the age of a person rather than affect online political participation. Moreover, only one of those variables, working in an NGO, depends on political motivation variables, i.e. political interest and political efficacy, and within robustness tests, showed to affect at most participation in contacting politicians while also was found to be dependent on participation in signing petitions. Party identification, placement on the left-right scale, self-identification with a discriminated group, political and social trust and external political efficacy also appear to be independent of participation in online activism given internal political efficacy and age. Despite the fact that the majority of the variables associated with participation in online activism does not have a direct influence on the response variable, having prior knowledge of some of those characteristics can suggest a higher or lower probability of a person to participate in online activism. Thus, without having any prior knowledge about a person, it is expected that there is a 17,03% chance that the person participates online. Fig 6 shows the probability distributions of all the network variables if there is no prior knowledge about a person.
Fig 6

Probability distribution of all factors associated with participation in online activism.

Source: ESS 2018 [35]. N = 27 379 individuals in 19 countries. Notes: Bayesian parameter estimation, conditional on the acquired structure of the network, was applied to analyse the data. Entities are the probabilities of events in percentage.

Probability distribution of all factors associated with participation in online activism.

Source: ESS 2018 [35]. N = 27 379 individuals in 19 countries. Notes: Bayesian parameter estimation, conditional on the acquired structure of the network, was applied to analyse the data. Entities are the probabilities of events in percentage. If there is some prior knowledge about a person, e.g., it is a person with the low income, graduate level of education, working in a non-governmental organisation, placing oneself on the left of the left-right scale and having a low political trust, the probability of online participation is expected to increase from 17,03% to 24,48%. Thus, even without having any prior knowledge about those factors that directly influence participation in online activism, i.e., age, internal political efficacy and political interest, the probability of participation is expected to be higher. In that case, the probabilities of other events would also change. Thus, the probability of having high internal political efficacy would grow from 45,67% to 80,64%, the probability of having a high political interest would increase from 49,02% to 74,99%, the probability of self-identification with a discriminated group would change from 7,30% to 12,41%, the probability of having high social trust would drop from 58,14% to 44,86% (see S6 Fig to know how the probability distributions of other variables would change).

Conclusion

While conducting the analysis of the survey data, it became evident that using Bayesian network analysis as the only method of research can produce unreliable results. This paper presents a three-step approach to acquire a reliable structure of causal relations between characteristics operationalised by survey questionnaires. The analysis showed that while agreeing on the majority of the causal relations of networks based on survey data, constrained-based, score-based and hybrid algorithms used for structure learning still do not agree on some relations. That being the case, it is necessary to refine the results of the Bayesian network structure learning comparing received structures by the means of structural equation modeling. In this analysis, the accuracy of the learned structure was also constrained by the imbalanced data. Thus, out of 27,379 observations, only 4,687 individuals (17,12%) participate in online activism. Hence, it became also necessary to conduct robustness tests adding participation in signing petitions, contacting politicians and voting as other outcome variables. That allowed to significantly increase the number of observations associated with political participation (e.g., out of 27,323 observations, 9,426 people (34,5%) participate in signing petitions and online activism) and redefine the causal relation between some of the nodes. Applying a three-step approach and conducting robustness and validity tests to analyse the survey data, it became possible to receive a reliable structure of the causal relations between the variables associated with participation in online activism. The acquired structure (see Fig 4) suggests causality dissimilar to that reported before. Hence, despite the fact that the structure partly supports the Civic Voluntarism Model (CVM) developed by [12] in relation to the effects of internal political efficacy and political interest, the effect of political interest is still mediated by the indirect effects of internal political efficacy and age via political interest. Furthermore, in regard to other factors, the causal relation is absent. For instance, recruitment does not seem to increase participation in online activism. When performing the robustness tests, it became evident that similar causal relations are also in place in regard to participation in signing petitions, contacting politicians and voting. Moreover, robustness tests showed that those who sign petitions also get mobilised into other activities, i.e., start working in an NGO, which is a reverse causality than the one expected by [12]. Political trust does seem to be independent of participation in online activism contrary to some of the previous suggestions [14, 38–41]. Furthermore, within the structure, the variable is only affected by the country of residence and independent of political interest in the European context. Fig 4 also shows that such resources as education and income do not contribute to increasing the probability of participation in online activism. However, due to the fact that internal political efficacy, which in this analysis, operationalises access to resources and acquired skills that allow a person to use such resources, has a direct effect on online participation, it is suggested that access to resources other than money or education increases the probability of online activism participation in Europe. That result is also partly in line with the CVM [12]. Despite the fact that Bayesian network analysis allowed to distinguish the structure of causal relations between the variables associated with participation in online activism and determine which variables directly affect participation, 44% is the highest probability of a person to participate in online activism given prior knowledge of the variables examined by political participation scholars (see Fig 5). Such a result may propose that rather than characteristics, personal motives [68] and emotions [69], factors highlighted by the social movement literature, or other factors, which are not yet reported, stimulate political participation. In that regard, there is much work to be done in order to distinguish the factors that directly affect political participation and allow predicting the last one. It seems necessary to examine such factors outside the European context as well since the present study showed that access to resources plays a big role in stimulating political participation as suggested by previous research [12, 68]. This paper showed how causal relations can be inferred using Bayesian network analysis in combination with structural equation modeling. Other methods can also be considered when stabilising the results of Bayesian network structure learning.

Directed acyclic graphs of the relationships between factors associated with participation in petition signing.

Source: [35]. N = 27 366 individuals in 19 countries. Notes: Within Bayesian network analysis, score-based Tabu and hybrid H2PC algorithms were applied to analyze the data and learn the structure of the causal relationships between the variables. Dashed blue lines represent false positives, i.e., edges that are not present in the structure learned by the Tabu algorithm but present in the structure learned by H2PC. Orange lines represent false negatives, i.e., edges that are present in the structure learned by the Tabu algorithm but absent in the structure learned by H2PC. All the edges from the other nodes to “Age”, “Gender” and “Born in the country” are blacklisted prior to learning the structure. In the figure, those nodes that can only be parents have a darker blue color. The node “Country” (i.e., the country of the respondent’s residency) is present in the structure but not depicted by the figure to facilitate the apprehension of the relationships between the nodes of interest. All variables are individual-level variables. (TIF) Click here for additional data file.

Directed acyclic graphs of the relationships between factors associated with participation in online activism and petition signing.

Source: [35]. N = 27 323 individuals in 19 countries. Notes: Within Bayesian network analysis, score-based Tabu and hybrid H2PC algorithms were applied to analyze the data and learn the structure of the causal relationships between the variables. Dashed blue lines represent false positives, i.e., edges that are not present in the structure learned by the Tabu algorithm but present in the structure learned by H2PC. Orange lines represent false negatives, i.e., edges that are present in the structure learned by the Tabu algorithm but absent in the structure learned by H2PC. All the edges from the other nodes to “Age”, “Gender” and “Born in the country” are blacklisted prior to learning the structure. In the figure, those nodes that can only be parents have a darker blue color. The node “Country” (i.e., the country of the respondent’s residency) is present in the structure but not depicted by the figure to facilitate the apprehension of the relationships between the nodes of interest. All variables are individual-level variables. (TIF) Click here for additional data file.

Directed acyclic graphs of the relationships between factors associated with participation in contacting politicians.

Source: [35]. N = 27 397 individuals in 19 countries. Notes: Within Bayesian network analysis, score-based Tabu and hybrid H2PC algorithms were applied to analyze the data and learn the structure of the causal relationships between the variables. Dashed blue lines represent false positives, i.e., edges that are not present in the structure learned by the Tabu algorithm but present in the structure learned by H2PC. Orange lines represent false negatives, i.e., edges that are present in the structure learned by the Tabu algorithm but absent in the structure learned by H2PC. All the edges from the other nodes to “Age”, “Gender” and “Born in the country” are blacklisted prior to learning the structure. In the figure, those nodes that can only be parents have a darker blue color. The node “Country” (i.e., the country of the respondent’s residency) is present in the structure but not depicted by the figure to facilitate the apprehension of the relationships between the nodes of interest. All variables are individual-level variables. (TIF) Click here for additional data file.

Directed acyclic graphs of the relationships between factors associated with participation in voting.

Source: [35]. N = 25 404 individuals in 19 countries. Notes: Within Bayesian network analysis, score-based Tabu and hybrid H2PC algorithms were applied to analyze the data and learn the structure of the causal relationships between the variables. Dashed blue lines represent false positives, i.e., edges that are not present in the structure learned by the Tabu algorithm but present in the structure learned by H2PC. Orange lines represent false negatives, i.e., edges that are present in the structure learned by the Tabu algorithm but absent in the structure learned by H2PC. All the edges from the other nodes to “Age”, “Gender” and “Born in the country” are blacklisted prior to learning the structure. In the figure, those nodes that can only be parents have a darker blue color. The node “Country” (i.e., the country of the respondent’s residency) is present in the structure but not depicted by the figure to facilitate the apprehension of the relationships between the nodes of interest. All variables are individual-level variables. (TIF) Click here for additional data file.

Directed acyclic graph of the relationships between factors associated with participation in online activism.

Source: [35]. N = 27 379 individuals in 19 countries. Notes: Structural equation modeling was applied to analyze the data. Only those arcs that were determined by both Tabu and H2PC algorithms are present in the model. Entities depicted in association with the edges are parameter estimates of the structural equation modeling. Sign.: *p < 0.05; **p < 0.01; ***p < 0.001. All variables are individual level variables. (TIF) Click here for additional data file. Source: ESS 2018 [35]. N = 27 379 individuals in 19 countries. Notes: Bayesian parameter estimation, conditional on the acquired structure of the network, was applied to analyse the data. Entities are the probabilities of events in percentage. The following conditional probability query was applied: education is “graduate”, placement on the left-right scale is “left”, work in an NGO is “yes”, political trust is “low” and income is “low”. (TIF) Click here for additional data file. Source: [35]. N = 27 379 individuals in 19 countries. Notes: Bayesian parameter estimation, conditional on the acquired structure of the network, was applied to analyze the data. Entities are the probabilities of events in percentage. The following conditional probability query was applied: age is “31–45”, political interest is “high”, political trust is “high”, social trust is “high”, internal political efficacy is “high” and born in the country of residence is “yes”. (TIF) Click here for additional data file.

Supplementary information.

The document provides additional information on the data handling, methods and results and complements the main text of the manuscript. (PDF) Click here for additional data file.

Supplementary R script.

The R script used for the analysis. (PDF) Click here for additional data file. The R script used to compare the predictive performance of the models and to test a two-fold approach in network structure learning, i.e., the combination of Bayesian structure learning and structural equation modeling, on simulated data. (PDF) Click here for additional data file. 1 Oct 2021 PONE-D-21-05594 Predicting online participation through Bayesian network analysis PLOS ONE Dear Dr. Kopacheva, Thank you for submitting your manuscript to PLOS ONE. After careful consideration, we feel that it has merit but does not fully meet PLOS ONE’s publication criteria as it currently stands. Therefore, we invite you to submit a revised version of the manuscript that addresses the points raised during the review process. Both reviewers have made comments on a number of issues requiring your attention. Specifically: - Clarification of methodological aspects (most comments by both reviewers): More details should be provided about the applied methodology. In this sense, pleasenotice PLOS ONE's publication criterion #3 (). - Results and their interpretation: Clarify the issues raised by Reviewer 1 in his comments #8, 9 and 10. Additionally, consider the request for more information on Bayesian networks made by Reviewer 2 in his/her comment #2. Please submit your revised manuscript by Nov 06 2021 11:59PM. If you will need more time than this to complete your revisions, please reply to this message or contact the journal office at plosone@plos.org. When you're ready to submit your revision, log on to https://www.editorialmanager.com/pone/ and select the 'Submissions Needing Revision' folder to locate your manuscript file. Please include the following items when submitting your revised manuscript: A rebuttal letter that responds to each point raised by the academic editor and reviewer(s). You should upload this letter as a separate file labeled 'Response to Reviewers'. A marked-up copy of your manuscript that highlights changes made to the original version. You should upload this as a separate file labeled 'Revised Manuscript with Track Changes'. An unmarked version of your revised paper without tracked changes. You should upload this as a separate file labeled 'Manuscript'. If you would like to make changes to your financial disclosure, please include your updated statement in your cover letter. Guidelines for resubmitting your figure files are available below the reviewer comments at the end of this letter. If applicable, we recommend that you deposit your laboratory protocols in protocols.io to enhance the reproducibility of your results. Protocols.io assigns your protocol its own identifier (DOI) so that it can be cited independently in the future. For instructions see: https://journals.plos.org/plosone/s/submission-guidelines#loc-laboratory-protocols. Additionally, PLOS ONE offers an option for publishing peer-reviewed Lab Protocol articles, which describe protocols hosted on protocols.io. Read more information on sharing protocols at https://plos.org/protocols?utm_medium=editorial-email&utm_source=authorletters&utm_campaign=protocols. We look forward to receiving your revised manuscript. Kind regards, Sergi Lozano Academic Editor PLOS ONE Journal Requirements: When submitting your revision, we need you to address these additional requirements. 1. Please ensure that your manuscript meets PLOS ONE's style requirements, including those for file naming. The PLOS ONE style templates can be found at https://journals.plos.org/plosone/s/file?id=wjVg/PLOSOne_formatting_sample_main_body.pdf and https://journals.plos.org/plosone/s/file?id=ba62/PLOSOne_formatting_sample_title_authors_affiliations.pdf. 2. Thank you for stating the following financial disclosure: "NO" At this time, please address the following queries: a) Please clarify the sources of funding (financial or material support) for your study. List the grants or organizations that supported your study, including funding received from your institution. b) State what role the funders took in the study. If the funders had no role in your study, please state: “The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.” c) If any authors received a salary from any of your funders, please state which authors and which funders. d) If you did not receive any funding for this study, please state: “The authors received no specific funding for this work.” Please include your amended statements within your cover letter; we will change the online submission form on your behalf. 3. Thank you for stating the following in your Competing Interests section: "NO" Please complete your Competing Interests on the online submission form to state any Competing Interests. If you have no competing interests, please state "The authors have declared that no competing interests exist.", as detailed online in our guide for authors at http://journals.plos.org/plosone/s/submit-now This information should be included in your cover letter; we will change the online submission form on your behalf. 4. Please upload a copy of Supporting Information Table 1 and 2 which you refer to in your text on page 9. [Note: HTML markup is below. Please do not edit.] Reviewers' comments: Reviewer's Responses to Questions Comments to the Author 1. Is the manuscript technically sound, and do the data support the conclusions? The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented. Reviewer #1: Yes Reviewer #2: Yes ********** 2. Has the statistical analysis been performed appropriately and rigorously? Reviewer #1: Yes Reviewer #2: Yes ********** 3. Have the authors made all data underlying the findings in their manuscript fully available? The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data—e.g. participant privacy or use of data from a third party—those must be specified. Reviewer #1: Yes Reviewer #2: Yes ********** 4. Is the manuscript presented in an intelligible fashion and written in standard English? PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here. Reviewer #1: Yes Reviewer #2: Yes ********** 5. Review Comments to the Author Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters) Reviewer #1: 1.- The paper reports the application of Bayesian networks to find causal relationships between variables affecting online political participation. The authors propose a three-step methodology to find these causal relationships: a. finding the network structure, b. finding the statistical significance of those relationships using structural equation modeling, and c. fitting the distributions or parameters. 2. I think the study represents an interesting methodological application for modeling the participation of individuals in political issues. As the authors point out, the use of Bayesian networks is limited to the availability of large amounts of data. Consequently, it is not often practical in the social science data domain. Nevertheless, the ESS provides an excellent opportunity to test this type of tool. The study stands out for its originality. I can currently point out a few improvements that the authors could consider and clarify some doubts regarding the methodology. Comments: 3.- As the authors rightly point out, the structure of a Bayesian network can be affected by imprecise or missing observations. In this regard, in the Results section, I found no information regarding missing data. How much did missing data were there?, on which variables? What was done with the observations in cases where there was a missing measurement or missing variable? Regarding measurement imprecision, it would be useful to clarify at least for the most important variables (online participation, political participation, political participation, etc.) how these variables were measured in the ESS. 4.- At the structure learning stage: A typical issue that needs to be dealt with is selecting the best structure for the Bayesian network. In this process, as mentioned by the authors, it is possible to use several algorithms. They use score-based and hybrid algorithms. Is it possible to report a table with more accurate information or some statistics of the goodness-of-fitness of the algorithms to compare the performance? On the other hand, as a means of robustness, I understand, they use several learned structures to predict the variable "online participation." The idea is to evaluate these models to get the best of them. Based on what performance measure do the authors evaluate the predictive ability? In summary, it would be nice to explain how they found the best set of network structures in more detail. 5.- how do you prevent the problem of overfitting? do you use any cross-validation scheme? 6.- According to this, step 2 (SEM) identifies those edges or arcs of the Bayesian network that are statistically significant. If we assume that we are only interested in the model that performs best in predicting the dependent variable, then what do we gain from step 2? I assume that the authors are not only interested in predictive ability, but also in "explaining" the online participation variability between observations. 7.- From the Bayesian network, we obtain causality information. In SEM we see if there is statistical significance in the network arcs. So how do we interpret causality between two variables that are not significant? How do we interpret a pair of variables whose relationship is significant but without connecting arcs in the network? 8.- The results section reports the results of the directed acyclic graphs. The section reports the Edges present in the structures and that are influential in online activism. It also reports the inference parameters (conditional probability distributions). However, I do not find results concerning "prediction ." In this regard, I would like to differentiate between the knowledge we gain in interpreting the network structure and the ability of this model to predict. What happens with the model's performance (Bayesian network learned) to predict the online activism variable in a different sample than the one used for the learning process? My understanding is that the suggested three-step approach allows adding robustness and validity to the results, especially for survey data. However, I wonder how much the predictive capacity increases between a model found with this methodology and another without this methodology. This is of interest because the title of this paper is precisely related to the "prediction" of online participation. 9.- In the results section on page 8, Lines 289-301, the authors report sensitivity. They indicate how much online participation could increase or decrease when the level of other variables changes. I believe that much could be gained from the results by specifying a more detailed sensitivity analysis that would allow us to understand the most critical factors on online participation on the dependent variable in terms of intensity and direction. Minor comments: 10.- This seems obvious, but for a reader not so knowledgeable in Bayesian networks, it might be nice to interpret the directionality (causality) of the Bayesian network edges. Reviewer #2: This article scrutinizes the causal relations between the variables associated with participation in online activism and introduces a three-step approach in learning a reliable structure of the participation preconditions’ network to predict political participation. The authors use Bayesian network analysis and structural equation modeling to stabilize the structure of the causal relations. Please consider the following points for revision: 1. clarify which algorithm was used for the discretization 2. Step 1 should include more information on Bayesian networks. mathematical model, different networks and a mathematical description of the algorithms used. https://www.sciencedirect.com/science/article/pii/S0167739X19303322 https://link.springer.com/article/10.1023/A:1007465528199 3. I do not understand how the authors choose the high scores of the networks. what is this? what is the class label used in the Bayesian network? 4. Authors should include more details of the data used, especially when it is used in the Bayesian network, it is not clear. ********** 6. PLOS authors have the option to publish the peer review history of their article (what does this mean?). If published, this will include your full peer review and any attached files. If you choose “no”, your identity will remain anonymous but your review may still be made public. Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy. Reviewer #1: Yes: Mauricio A. Valle Reviewer #2: No [NOTE: If reviewer comments were submitted as an attachment file, they will be attached to this email and accessible via the submission site. Please log into your account, locate the manuscript record, and check for the action link "View Attachments". If this link does not appear, there are no attachment files.] While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool, https://pacev2.apexcovantage.com/. PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Registration is free. Then, login and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email PLOS at figures@plos.org. Please note that Supporting Information files do not need this step. 5 Nov 2021 I thank the reviewers and the associate editor for their constructive comments. I have addressed all of them and modified the paper accordingly. Please, find the modified manuscript (i.e., Revised Manuscript with Track Changes. pdf) and the detailed answer to the questions in the file Response to reviewers.pdf. Thank you! Submitted filename: Response to reviewers.pdf Click here for additional data file. 25 Nov 2021 PONE-D-21-05594R1Predicting online participation through Bayesian network analysisPLOS ONE Dear Dr. Kopacheva, Thank you for submitting your manuscript to PLOS ONE. As you can see below, the reviewers' are satisfied with your revision of the manuscript. Still, Reviewer 2 has pointed out to two (very specific) issues. According to his/her comments, please check Fig. 2 and consider moving Table 3 to the Supplementary (as it does not contain specific information needed to follow the main text). We invite you to submit a revised version of the manuscript that addresses the points raised during the review process. Please submit your revised manuscript by Jan 09 2022 11:59PM. If you will need more time than this to complete your revisions, please reply to this message or contact the journal office at plosone@plos.org. When you're ready to submit your revision, log on to https://www.editorialmanager.com/pone/ and select the 'Submissions Needing Revision' folder to locate your manuscript file. Please include the following items when submitting your revised manuscript:If you would like to make changes to your financial disclosure, please include your updated statement in your cover letter. Guidelines for resubmitting your figure files are available below the reviewer comments at the end of this letter. A rebuttal letter that responds to each point raised by the academic editor and reviewer(s). You should upload this letter as a separate file labeled 'Response to Reviewers'. A marked-up copy of your manuscript that highlights changes made to the original version. You should upload this as a separate file labeled 'Revised Manuscript with Track Changes'. An unmarked version of your revised paper without tracked changes. You should upload this as a separate file labeled 'Manuscript'. If applicable, we recommend that you deposit your laboratory protocols in protocols.io to enhance the reproducibility of your results. Protocols.io assigns your protocol its own identifier (DOI) so that it can be cited independently in the future. For instructions see: https://journals.plos.org/plosone/s/submission-guidelines#loc-laboratory-protocols. Additionally, PLOS ONE offers an option for publishing peer-reviewed Lab Protocol articles, which describe protocols hosted on protocols.io. Read more information on sharing protocols at https://plos.org/protocols?utm_medium=editorial-email&utm_source=authorletters&utm_campaign=protocols. We look forward to receiving your revised manuscript. Kind regards, Sergi Lozano Academic Editor PLOS ONE Journal Requirements: Please review your reference list to ensure that it is complete and correct. If you have cited papers that have been retracted, please include the rationale for doing so in the manuscript text, or remove these references and replace them with relevant current references. Any changes to the reference list should be mentioned in the rebuttal letter that accompanies your revised manuscript. If you need to cite a retracted article, indicate the article’s retracted status in the References list and also include a citation and full reference for the retraction notice. [Note: HTML markup is below. Please do not edit.] Reviewers' comments: Reviewer's Responses to Questions Comments to the Author 1. If the authors have adequately addressed your comments raised in a previous round of review and you feel that this manuscript is now acceptable for publication, you may indicate that here to bypass the “Comments to the Author” section, enter your conflict of interest statement in the “Confidential to Editor” section, and submit your "Accept" recommendation. Reviewer #1: All comments have been addressed Reviewer #2: All comments have been addressed ********** 2. Is the manuscript technically sound, and do the data support the conclusions? The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented. Reviewer #1: Yes Reviewer #2: Yes ********** 3. Has the statistical analysis been performed appropriately and rigorously? Reviewer #1: Yes Reviewer #2: Yes ********** 4. Have the authors made all data underlying the findings in their manuscript fully available? The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data—e.g. participant privacy or use of data from a third party—those must be specified. Reviewer #1: Yes Reviewer #2: Yes ********** 5. Is the manuscript presented in an intelligible fashion and written in standard English? PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here. Reviewer #1: Yes Reviewer #2: Yes ********** 6. Review Comments to the Author Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters) Reviewer #1: (No Response) Reviewer #2: The manuscript has improved. I have some minor comments: 1. Figure 2 is illegible. Please improve the quality. 2. Table 3 is very long. Please, move table 3 to supplementary material. ********** 7. PLOS authors have the option to publish the peer review history of their article (what does this mean?). If published, this will include your full peer review and any attached files. If you choose “no”, your identity will remain anonymous but your review may still be made public. Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy. Reviewer #1: Yes: Mauricio A. Valle Reviewer #2: No [NOTE: If reviewer comments were submitted as an attachment file, they will be attached to this email and accessible via the submission site. Please log into your account, locate the manuscript record, and check for the action link "View Attachments". If this link does not appear, there are no attachment files.] While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool, https://pacev2.apexcovantage.com/. PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Registration is free. Then, login and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email PLOS at figures@plos.org. Please note that Supporting Information files do not need this step. 3 Dec 2021 I thank the reviewers and the associate editor for their comments. In particular, Reviewer 2 mentioned two specific issues: 1. That ``Figure 2 is illegible": I have addressed this issue and improved the quality of the figure; 2. That ``Table 3 is very long": I have moved Table 3 to the supplementary material file (S1 File). Thank you! Submitted filename: Response to reviewers.pdf Click here for additional data file. 9 Dec 2021 Predicting online participation through Bayesian network analysis PONE-D-21-05594R2 Dear Dr. Kopacheva, We’re pleased to inform you that your manuscript has been judged scientifically suitable for publication and will be formally accepted for publication once it meets all outstanding technical requirements. Within one week, you’ll receive an e-mail detailing the required amendments. When these have been addressed, you’ll receive a formal acceptance letter and your manuscript will be scheduled for publication. An invoice for payment will follow shortly after the formal acceptance. To ensure an efficient process, please log into Editorial Manager at http://www.editorialmanager.com/pone/, click the 'Update My Information' link at the top of the page, and double check that your user information is up-to-date. If you have any billing related questions, please contact our Author Billing department directly at authorbilling@plos.org. If your institution or institutions have a press office, please notify them about your upcoming paper to help maximize its impact. If they’ll be preparing press materials, please inform our press team as soon as possible -- no later than 48 hours after receiving the formal acceptance. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information, please contact onepress@plos.org. Kind regards, Sergi Lozano Academic Editor PLOS ONE Additional Editor Comments (optional): Reviewers' comments: 14 Dec 2021 PONE-D-21-05594R2 Predicting online participation through Bayesian network analysis Dear Dr. Kopacheva: I'm pleased to inform you that your manuscript has been deemed suitable for publication in PLOS ONE. Congratulations! Your manuscript is now with our production department. If your institution or institutions have a press office, please let them know about your upcoming paper now to help maximize its impact. If they'll be preparing press materials, please inform our press team within the next 48 hours. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information please contact onepress@plos.org. If we can help with anything else, please email us at plosone@plos.org. Thank you for submitting your work to PLOS ONE and supporting open access. Kind regards, PLOS ONE Editorial Office Staff on behalf of Dr. Sergi Lozano Academic Editor PLOS ONE
  6 in total

1.  Identifying characteristics that impact motor carrier safety using Bayesian networks.

Authors:  Steven Hwang; Linda Ng Boyle; Ashis G Banerjee
Journal:  Accid Anal Prev       Date:  2019-04-06

2.  Killing the bill online? Pathways to young people's protest engagement via social media.

Authors:  Timothy Macafee; J J De Simone
Journal:  Cyberpsychol Behav Soc Netw       Date:  2012-09-24

3.  Peer review and gender bias: A study on 145 scholarly journals.

Authors:  Flaminio Squazzoni; Giangiacomo Bravo; Mike Farjam; Ana Marusic; Bahar Mehmani; Michael Willis; Aliaksandr Birukou; Pierpaolo Dondio; Francisco Grimaldo
Journal:  Sci Adv       Date:  2021-01-06       Impact factor: 14.136

4.  Model averaging strategies for structure learning in Bayesian networks with limited data.

Authors:  Bradley M Broom; Kim-Anh Do; Devika Subramanian
Journal:  BMC Bioinformatics       Date:  2012-08-24       Impact factor: 3.169

5.  Bayesian network modelling study to identify factors influencing the risk of cardiovascular disease in Canadian adults with hepatitis C virus infection.

Authors:  Alaa Badawi; Giancarlo Di Giuseppe; Alind Gupta; Abbey Poirier; Paul Arora
Journal:  BMJ Open       Date:  2020-05-05       Impact factor: 2.692

6.  A hierarchical Bayesian Belief Network model of household water treatment behaviour in a suburban area: A case study of Palu-Indonesia.

Authors:  D Daniel; Mita Sirait; Saket Pande
Journal:  PLoS One       Date:  2020-11-06       Impact factor: 3.240

  6 in total

北京卡尤迪生物科技股份有限公司 © 2022-2023.