Remapping the foundations of morality: Well-fitting structural model of the Moral Foundations Questionnaire.

Michael Zakharin1, Timothy C Bates1.   

Abstract

Moral foundations theory posits five moral foundations; however, 5-factor models provide poor fit to the data. Here, in five studies with large samples (total N = 11,496), we construct and replicate a well-fitting model of the Moral Foundations Questionnaire (MFQ). In Study 1 (N = 2,271) we tested previously theorised models, confirming that none provides adequate fit. We then developed a well-fitting model of the MFQ. In this model, the fairness/reciprocity and harm/care foundations were preserved intact. The binding foundations, however, divided into five, rather than the original three, foundations. Purity/sanctity split into independent foundations of purity and sanctity. Similarly, ingroup/loyalty divided into independent factors of loyalty to clan and loyalty to country. Authority/respect was re-focussed on hierarchy, losing one item to the new sanctity foundation and another to loyalty to country. In addition to these 7 foundations, higher-level factors of binding and individualizing were supported, along with a general/acquiescence factor. Finally, a "moral tilt" factor corresponding to coordinated left-leaning vs. right-leaning moral patterns was supported. We validated the model in four additional studies, testing replication of the 7-foundation model in data including samples from the US, Australia, and China (total N = 9,225). The model replicated with good fit in all four samples. These findings demonstrate the first well-fitting, replicable model of the MFQ. They also highlight the importance of modelling measurement structure, and reveal important additional foundations as well as structure (binding, individualizing, tilt) above the foundations.


Year:  2021        PMID: 34679123      PMCID: PMC8535174          DOI: 10.1371/journal.pone.0258910

Source DB:  PubMed          Journal:  PLoS One        ISSN: 1932-6203            Impact factor:   3.240


Introduction

Moral psychology seeks to understand how people form moral judgments, why individuals differ in these judgments, and what the structure of these judgments is [1]. Moral Foundations Theory [MFT: 2] has emerged as the leading theory addressing these questions, suggesting that moral judgment arises from five universal foundations, each evolved to facilitate cooperative relations [2]. The Moral Foundations Questionnaire [MFQ: 3] was developed based on this theory and has been used in many hundreds of studies. To date, however, no well-fitting model of the MFQ has been produced [4, 5]. Here, we test the fit of existing models, confirming that these fit poorly. We then use a multi-trait multi-method approach to develop a well-fitting model of the MFQ, and demonstrate that this new model replicates in multiple large independent samples. Below, we briefly introduce the foundations and the MFQ measurement of these foundations, and outline previous tests of the model and proposed alternative models, before presenting Study 1, which develops a well-fitting model of the MFQ items.

Applying an evolutionary approach to previous theories of morality and human values [6-9], Haidt and colleagues argued that morality consists of at least five culturally universal moral domains, identified as follows. First, harm/care concerns avoidance of suffering for others and is experienced as compassion. Second, fairness/reciprocity concerns avoidance of unfairness or unequal treatment of self and others and failure to reciprocate. When triggered, it is experienced as personal anger if one is oneself the target of unfair action or being taken advantage of, or empathic anger if the target is another person. Third, ingroup/loyalty concerns maintenance of loyalty to one’s group by oneself and by other members of one’s group. When triggered, this foundation is experienced positively as feelings of unity with the members of one’s group, as feelings of guilt if one is tempted to betray one’s group and as feelings of treachery if others are disloyal. Fourth, authority/respect is concerned with recognizing, respecting and preserving societal hierarchy. It is triggered by the presence of hierarchy cues and by behavior lacking respect for hierarchy. When activated, it is reflected in feelings of deference and respect for the authority of these hierarchies. Patriotism or loyalty to country has been contrasted by Haidt [10] with globalism, creating a dimension running from caring for the people in one’s own country more than those of other countries to treating all people as identical in terms of their moral call upon us. Finally, purity/sanctity is a foundation proposed to have evolved both to avoid toxic and parasitic contamination and to promote beliefs and ritual: what Durkheim identified as “sacred things… things set apart and forbidden” [11/1912, p. 44]. Haidt and Graham [2] describe purity as "a guardian of the body in all cultures, responding to elicitors that are biologically or culturally linked to disease transmission" (p. 106). Separately, ritualised beliefs contrast virtues where "the soul is in charge of the body" against unnatural vices such as lust and gluttony seen as "debased, impure, and less than human" (p. 106). In each culture, morally impure practices violating purity/sanctity are experienced as feelings bearing some similarity to disgust and revulsion [2]. We should also note that liberty has been identified as a possible moral foundation [12], but is typically omitted in questionnaire studies of moral foundations theory.

Following Graham et al. [13], it is common to group the five moral foundations into two super-factors based on whether the locus of moral value is the individual or the group. In this scheme, the foundations of harm/care and fairness/reciprocity are grouped as “individualising” foundations, and the foundations of ingroup/loyalty, authority/respect and purity/sanctity are categorized as “binding” foundations because their locus of moral value is the group. The relative strength of the individualising versus the binding foundations in a particular individual is argued to underpin individual differences between liberal and conservative values [13-15].

A strength of moral foundations theory is that the authors created an open measurement instrument, allowing others to test the structure and predictions of the moral foundations model. The MFQ-30 [3] consists of 30 items plus 2 foil items used to filter out inattentive participants. It includes two distinct item formats: a section of 15 “relevance” items and a section of 15 “judgment” items, each containing three items for each of the five foundations. The inclusion of two measurement methods within the questionnaire is an under-exploited strength, allowing improved modelling accuracy [16]. Items in the “relevance” block measure the moral relevance of various aspects of behavior, posing an example (e.g. “Whether or not someone was cruel”) and asking participants to score how relevant this behavior would be to them in reaching a moral decision, from “not at all relevant” to “extremely relevant”. By contrast, judgment items assess the extent to which participants agree with a specific moral judgment (e.g. “I am proud of my country’s history”, scored from strongly agree to strongly disagree). All MFQ-30 items are measured on a 6-point Likert scale.

Since its development, the correlation of the MFQ with external measures has been widely studied, especially in the domain of politics. For example, moral foundations scores predict voting outcomes over and above traditional demographic predictors [17], and MFQ scores correlate with location on the right wing-left wing ideological divide [13, 18], as well as with people’s stances on other ‘culture war’ attitudes [19]. Fairness, authority, ingroup and purity account for significant variance in self-identification with the distinct religious orientations postulated by religious orientations theory [20]. In sacrificial dilemma studies, harm, purity and ingroup foundations are associated with endorsement of harmful action [21]. There has also been support for brain volumes being associated with MFQ responses [22].

Alongside this supportive research, both moral foundations theory and the MFQ measure have been criticised on both theoretical and empirical grounds (e.g. by papers using it as a trait of interest). Regarding the prediction of political orientation, in a meta-analysis of the relationship between moral foundations and political orientation, Kivikangas et al. [18] found that correlations between the five foundations and political orientation were close to zero in Black samples. At the level of taxonomy, Suhler and Churchland [23] have questioned the basis for the selection of the five foundations as foundational. In a similar vein, Curry et al. [24] argued that the theory includes content they consider non-moral (purity/sanctity and harm/care), collapses domains which in their view should be kept distinct, and entirely misses other moral domains (e.g. heroism). Smith et al. and Hatemi et al. [25, 26], using multiple samples, reported that the MFQ did not reliably factor into 5 dimensions, but rather into 2, with structure differing between the US and Australia. They also found that moral foundations are not stable across time and that MFQ scores reflect rather than cause political attitudes. Smith et al. [25] also reported that the MFQ foundations show no evidence of heritability. Regarding the foundational nature of the moral foundations, Strupp-Levitsky et al. [27] suggest that rather than being causal, the foundations build on other, more basic, variables such as empathy, need for closure and need for cognition. In a study manipulating partisan and group identity cues by embedding these in modified MFQ items, Ciuk [28] reported that item endorsement was affected by partisan alignment, supporting the conclusion that causality runs from political ideology to moral foundations.

At the psychometric level, Iurino and Saucier [5] tested measurement invariance of the 5-factor structure of the MFQ in 27 countries and concluded that there was little support for a five-factor solution for the questionnaire. Similarly, in a US sample, Davis et al. [29] tested measurement invariance of the MFQ in Black and White samples, concluding that the assumption of scalar invariance could not be supported. Jointly, these findings pose considerable challenges for a measure that aims to be culturally universal, especially the difficulty of finding a well-fitting model as a basis for prediction.

We next turn to the measurement implementation of moral foundations theory in the MFQ, reviewing previous work testing predicted theoretical structures. Specifically, we address the question of whether the MFQ shows the proposed structure of (at least) five distinct moral foundations aligned with the predicted item content. We then build and test a series of models, testing existing five-foundation models and, after showing that these do not fit, exploring alternative formulations that improve measurement modelling of the MFQ, before replicating the resulting model in a series of independent datasets.

Psychometric modelling of the Moral Foundations Questionnaire (MFQ)

In their seminal paper, Graham et al. [3] collected online responses from a large international sample, mostly from Western countries (total N = 34,476). They tested one to six-factor models as well as a hierarchical five-factor model with two super-factors. Fig 1 shows the structure of two-factor, five-factor, and hierarchical models. The best fitting model consisted of five correlated factors. This model fit better than competing models and showed adequate fit according to Root Mean Square Error of Approximation (RMSEA) = .046. However, by other important conventional indices, the model fit poorly. For instance, the Comparative Fit Index (CFI) for this model was only .824, well below the generally accepted value of 0.95 [30]. No reports, to our knowledge, have resulted in satisfactory fit (see Table 1 for a sample of fits for different MFQ models in different samples and cultures).
Fig 1

The three models previously applied to the MFQ: A: 2-factor model, B: 5-factor model, C: Hierarchical model.

Table 1

MFQ fit metrics obtained in the previous studies.

| Study | N | RMSEA | CFI | MFQ version | Best fitting model | Sample description |
|---|---|---|---|---|---|---|
| Curry et al. (2019) | 1,042 | 0.050 | 0.910 | MFQ-30 | 7 factors | UK online sample |
| Graham et al. (2011) | 413 | 0.043 | 0.876 | MFQ-30 | 5 factors | Eastern Europe online sample |
| Iurino & Saucier (2018) | 8,055 | 0.084 | 0.853 | MFQ-20 | 5 factors (modified) | Survey of World Views international sample |
| Yalçındağ et al. (2019) | 1,432 | 0.104 | 0.850 | MFQ-30 | 5 factors | Three Turkish samples |
| Graham et al. (2011) | 411 | 0.039 | 0.841 | MFQ-30 | 5 factors | Latin America online sample |
| Graham et al. (2011) | 299 | 0.042 | 0.838 | MFQ-30 | 5 factors | South Asia online sample |
| Davies et al. (2014) | 3,994 | 0.063 | 0.829 | MFQ-30 | 5 factors | New Zealand national probability sample |
| Graham et al. (2011) | 26,014 | 0.048 | 0.824 | MFQ-30 | 5 factors | US online sample |
| Graham et al. (2011) | 1,670 | 0.046 | 0.811 | MFQ-30 | 5 factors | Western Europe online sample |
| Kivikangas et al. (2017) | 874 | 0.078 | 0.749 | MFQ-30 | 5 factors | Finnish nationally representative sample |
| Ji & Janicke (2018) | 234 | 0.077 | 0.744 | MFQ-30 | Hierarchical | Chinese students |
| Kim, Kang & Yun (2012) | 478 | 0.068 | 0.681 | MFQ-30 | 5 factors | South Korean students |
| Nilsson & Erlandsson (2015) | 540 | 0.072 | 0.679 | MFQ-30 | 5 factors | Swedish students |
| Ji & Janicke (2018) | 204 | 0.078 | 0.658 | MFQ-30 | Hierarchical | US students |
| Harper & Rhodes (2021) | 322 | 0.080 | 0.77 | MFQ-30 | 5 factors | UK online sample |
| Hadarics & Kende (2017) | 403 | 0.091 | 0.681 | MFQ-30 | 5 factors | Hungarian student sample |
| Doğruyol et al. (2019) | 4,971 | 0.050 | 0.940 | REL-15 | 5 factors | Many Labs 2 Project, Western countries |
| Doğruyol et al. (2019) | 1,997 | 0.060 | 0.940 | REL-15 | 5 factors | Many Labs 2 Project, Non-Western countries |
| Du (2019) | 761 | 0.080 | 0.930 | REL-15 | 5 factors | Two Chinese samples |

MFQ-20 is the short 20-item version of the MFQ. REL-15 is the 15-item Relevance subscale from the MFQ-30. Iurino & Saucier (2018) used an alternative model derived from exploratory factor analysis of the original MFQ items.

Since the initial work by Graham et al. [3], several reports have been published aimed at replicating or improving the five-factor structure of the MFQ. Davies et al. [4] applied the models tested by Graham et al. [3] to new data collected in New Zealand (N = 3,994). They concluded that the 5-factor model fit better than models with more or fewer factors but that fit metrics, specifically the CFI, were unsatisfactory. Nilsson & Erlandsson [31] modified the hierarchical model, separating purity from other binding foundations and testing this in a sample of Swedish students (N = 540). However, they found that a five correlated-factors model showed the best fit in their sample although, again, no model showed adequate fit. More recently, Harper & Rhodes [32] tested the factor structure of the MFQ in two British samples (total N = 750), confirming that the proposed five-factor structure was not psychometrically sound according to accepted metrics. They also tested an extended MFQ, including the nine items of the sixth “Liberty” foundation proposed by Haidt and colleagues [12]. Adding the Liberty scale, however, did not lead to a well-fitting six-factor model; instead, the data were better explained by a three-factor model comprising “traditionalism”, “compassion” and “liberty”.

The structure of the MFQ has also been investigated in non-Western samples. Yalçındağ et al. [33] used three Turkish samples to test replication of the models reported by Graham et al. [3]. In each sample, the 5-factor model was best-fitting, but no model had adequate fit. Hadarics and Kende [34] tested the 5-factor structure in a sample of Hungarian students, finding that this model fit poorly. Iurino and Saucier [5] used the 20-item MFQ administered in the Survey of World Views [35], covering respondents from 27 countries (N = 8,055). An exploratory factor analysis supported a 5-factor model, though the item-factor loadings differed substantively from those proposed by Graham et al. [3] and formal fit of the model was below the threshold for acceptability.

Some researchers have tested models using just the relevance or the judgment items alone. Both formats yield similar structures. For example, Doğruyol et al. [36] compared the fit of 5-factor and hierarchical models in relevance items from the Many Labs project [37]. They found that in both Western and non-Western countries the 5-factor model fit better than did a hierarchical model. Du [38] fit models with 1 to 5 factors in a Chinese sample (N = 761), finding that the 5-factor model offered the best fit compared to models with fewer factors.

Interestingly, although the MFQ authors included these distinct measurement methods (the relevance and judgment items), to our knowledge only one analysis has used these to account for measurement variance in a model. Curry et al. [24] added relevance and judgment method factors to an item-level 5-correlated factor model (see Fig 2), reporting that this improved fit. Even so, this model did not meet the standard levels of acceptable fit as measured by CFI, leading Curry et al. [24] to reject the MFQ as failing to show a psychometrically valid measurement model.
Fig 2

Incorporating multitrait-multimethod into a 5-foundation model of the MFQ.

Summary and directions for improved modeling

Taken together, earlier attempts to model the MFQ-30 suggest that it is best modelled as including at least five factors, one for each of the proposed foundations. The imperfect fit of this model, however, indicates that, while a five-factor model may reasonably be considered as a good starting point to develop a better fitting structure of MFQ-30, significant elements of covariance structure are not captured by this model. At least three plausible explanations could account for the poor fit of models tested to date. One is that correlated-factor models cannot readily represent clustering within the foundations, thus failing to fully reflect binding and individualising effects which form part of moral foundations theory. Supporting this, Graham et al. [3] modelled binding and individualising as hierarchical superfactors which improved the model, but not to the level considered a good fit. An alternative approach, not attempted to date, would involve implementing the binding and individualizing factors in a bi-factor structure. This also raises the possibility of implementing the model at the level of the items (see Fig 3). To our knowledge, there have been no reported attempts to model the group factors in bi-factor structures in item-level models.
Fig 3

5-factor item-level model including individualising and binding domains.

A second explanation for poor fit may lie in the foundations themselves. One or more foundations may have sub-components which need to be modelled as distinct entities. Other foundations may be better modelled if collapsed together. Hence, splitting and collapsing of factors may be required. Third, and finally, general influences on responding, whether from a general morality factor or from response biases such as acquiescence [39] or social desirability [40], will induce covariance among items, lowering the fit of models that do not account for these effects. It will be valuable, therefore, to explore the impact of these effects when building a psychometrically well-fitting model of the MFQ. To our knowledge, no previous attempts to model the MFQ have investigated these types of hypotheses, even though they have been explored in personality models, where, for instance, social desirability has proven to be a useful measurement component of well-fitting models [41].

Summary

In the present paper, we used structural equation modelling to test existing models of the MFQ-30, and to build and validate a new, well-fitting structural model of this measure. In Study 1, we tested four MFQ models suggested in previous research. Specifically, we compared the fit of the two-factor, five-factor, hierarchical and Multitrait-Multimethod models in a novel dataset. We showed that the Multitrait-Multimethod model fit better than the three other designs but that no model fit well. Next, we developed a new model of the MFQ. Our approach was to develop models increasing in complexity from the simplest predicted model to more complex structures, as required to achieve good fit. Using the Multitrait-Multimethod model as the base model, we found that including binding and individualizing factors was required, and that modelling these at the item level resulted in better fit than including them as associations among foundations. Two additional foundations, loyalty to country and purity (separated from sanctity), were required. Finally, we showed that adding general morality and left-right tilt factors improved fit and that, together, these innovations yielded a well-fitting model.

In Studies 2 through 5 we replicated the model developed in Study 1 in four independent open-access samples, showing that the model proposed and replicated internally in Study 1 fits better than any competing model in each of these datasets and meets accepted criteria for good fit. The five samples used were: for Study 1, 2,271 students from the University of Edinburgh and members of the local community; for Study 2, 7,130 participants from Graham et al., study 3 [13]; for Study 3, 1,052 participants from Smith et al. [25]; for Study 4, 553 participants from O’Grady et al. [42]; and for Study 5, 452 participants from Wang et al. [43]. Together, the studies amount to a total N of 11,496 and four independent replications.

Study 1

Method

Participants

A total of 2,271 people from the UK participated in the study. We used two attention check questions to remove inattentive participants: (a) “It is better to do good than to do bad” and (b) “It is relevant to moral judgment whether or not someone was good at math”. Participants who responded with slightly, moderately or strongly disagree to the first question and somewhat, very or extremely relevant to the second question were excluded. The final sample consisted of 2,039 adults (1,404 females, 631 males; age M = 25.3, SD = 13.04). Participants were recruited from a volunteer pool consisting of students and members of the community. After providing basic demographic data, participants completed the MFQ-30 online.

Measures

The extent to which participants endorsed moral foundations was measured by the 30-item Moral Foundations Questionnaire (MFQ-30) [3].

Analytic approach

All analyses were conducted at the item level. We began by implementing and testing the fit of four theoretical structural models reported in earlier research, namely the 2-factor, 5-factor, hierarchical and Multitrait-Multimethod models (see Figs 1 and 2). Model fit was assessed using the Comparative Fit Index (CFI), Tucker-Lewis Index (TLI) and the root mean square error of approximation (RMSEA). The comparative fit of the models was assessed by the Akaike Information Criterion (AIC) [44], which penalises un-parsimonious models. Following Hu & Bentler [30] and Yu [45], we adopted criteria of TLI ≥ .95 and RMSEA ≤ .06.

After examining the fit of previously theorised models of the MFQ, we turned to building a well-fitting model. The full sample was randomly split into training (N = 1,020) and holdout (N = 1,019) datasets. New models were built in the training dataset, and the models achieving adequate fit were then tested in the holdout dataset. We used the Multitrait-Multimethod model as the base model for these new analyses. Modification attempts began with modelling binding and individualising factors within a bi-factor framework (i.e., these factors loaded directly on the items, rather than on the MFQ domain factors; see Fig 3). This choice was made a priori based on research suggesting that structuring these hierarchically does not improve model fit [3, 4, 24, 31, 33]. Paths from both binding and individualizing factors to their corresponding items were constrained to be positive to ensure that these factors could capture only variance common to all items within their respective clusters. Second, we investigated whether the model could be improved by changing the number of foundations. Finally, we investigated whether adding one or more general factors would improve the fit of the model.

There are several theoretical reasons why a general factor can be expected in a personality measurement scale such as the MFQ. First, a general factor can represent a genuine factor of personality. Second, a general factor can arise as a result of acquiescence bias, the tendency to agree with all questionnaire items [39]. Finally, self-enhancement or social desirability effects, the tendency to overestimate one’s positive personality traits, can also manifest as a general factor.
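The random training/holdout split described above can be sketched as follows. This is a minimal illustration, not the authors' procedure: the participant ids, the fixed seed and the function name `split_train_holdout` are all assumptions introduced here.

```python
import random


def split_train_holdout(ids, seed=0):
    """Shuffle participant ids and split them into two near-equal halves."""
    rng = random.Random(seed)  # fixed seed (an assumption) so the split is reproducible
    shuffled = list(ids)
    rng.shuffle(shuffled)
    n_train = (len(shuffled) + 1) // 2  # the larger half goes to the training set
    return shuffled[:n_train], shuffled[n_train:]


# Hypothetical ids for the 2,039 participants retained after attention checks.
participants = range(1, 2040)
train, holdout = split_train_holdout(participants)
print(len(train), len(holdout))
```

With 2,039 retained participants, an odd-sized sample necessarily yields halves that differ by one, which matches the training (N = 1,020) and holdout (N = 1,019) sizes reported in the text.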

Results

All statistical analyses were completed in R [46] and umx [47]. We first tested the hypothesis that binding and individualizing factors are sufficient to explain the variance in the MFQ (2-factor model). This resulted in unsatisfactory fit (χ2(404) = 5148.24, p < 0.001; CFI = 0.699; TLI = 0.676; RMSEA = 0.076). Next, we tested whether a five-factor model would offer better fit to the data. This also resulted in unsatisfactory fit (χ2(395) = 4752.32, p < 0.001; CFI = 0.724; TLI = 0.696; RMSEA = 0.074). Then, we tested whether adding two superfactors to the five-factor model (the hierarchical model) would improve the fit. This was not the case (χ2(394) = 4817.19, p < 0.001; CFI = 0.719; TLI = 0.69; RMSEA = 0.074). Finally, we tested the Multitrait-Multimethod model. The fit of this model was better than that of the three other models; however, the fit statistics (with the exception of RMSEA) were still below the acceptable level (χ2(365) = 2903.63, p < 0.001; CFI = 0.839; TLI = 0.808; RMSEA = 0.058). Although none of the models achieved acceptable fit, the Multitrait-Multimethod model offered the best fit to the data (see Table 2 for the model comparisons).
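As a rough check on how the RMSEA values above relate to the reported χ2 statistics, the sketch below applies the common point-estimate formula sqrt(max(χ2 − df, 0) / (df · (N − 1))). That formula, the helper name `rmsea`, and the use of N = 2,039 (the final Study 1 sample) are assumptions made here; the software used in the paper may compute the index slightly differently.

```python
import math


def rmsea(chi2, df, n):
    """Point estimate of RMSEA from a model chi-square, its df and sample size n."""
    return math.sqrt(max(chi2 - df, 0.0) / (df * (n - 1)))


N = 2039  # final Study 1 sample after exclusions

# (chi-square, degrees of freedom) as reported in the text for each model
study1_fits = {
    "2-factor": (5148.24, 404),
    "5-factor": (4752.32, 395),
    "hierarchical": (4817.19, 394),
    "multitrait-multimethod": (2903.63, 365),
}

rmsea_values = {
    name: round(rmsea(chi2, df, N), 3) for name, (chi2, df) in study1_fits.items()
}
for name, value in rmsea_values.items():
    print(f"{name}: RMSEA = {value:.3f}")
```

Under these assumptions the computed values agree with the reported RMSEA figures to three decimals (0.076, 0.074, 0.074 and 0.058).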
Table 2

Study 1 comparative model fits, reported in order of most complex to least complex.

| Model | EP | Δ -2LL | Δ df | p | AIC | Compare with Model |
|---|---|---|---|---|---|---|
| 1. Multitrait-multimethod | 130 | | | | 65049.98 | |
| 2. Hierarchical model | 101 | 1913.56 | 29 | < .001 | 66905.54 | Multitrait-multimethod |
| 3. 5-factor model | 100 | 1848.69 | 30 | < .001 | 66838.67 | Multitrait-multimethod |
| 4. 2-factor model | 91 | 2244.61 | 39 | < .001 | 67216.59 | Multitrait-multimethod |

AIC = Akaike information criterion. Lower AIC values indicate better fit. The best fitting model is the multitrait-multimethod model (model 1).
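The AIC column of Table 2 can be related to the other columns with a little arithmetic. Assuming AIC is defined as -2LL plus twice the number of estimated parameters, a simpler model that drops Δdf parameters and loses Δ(-2LL) in likelihood has AIC = base AIC + Δ(-2LL) − 2·Δdf. The sketch below applies this to the tabulated values; the function and variable names are illustrative only.

```python
BASE_AIC = 65049.98  # multitrait-multimethod model (Table 2, model 1)

# model: (delta -2LL, delta df), both relative to the multitrait-multimethod model
comparisons = {
    "hierarchical": (1913.56, 29),
    "5-factor": (1848.69, 30),
    "2-factor": (2244.61, 39),
}


def aic_of_simpler_model(base_aic, delta_m2ll, delta_df):
    """AIC of a nested, simpler model given its likelihood loss and parameters saved."""
    return base_aic + delta_m2ll - 2 * delta_df


derived_aic = {
    name: round(aic_of_simpler_model(BASE_AIC, d_m2ll, d_df), 2)
    for name, (d_m2ll, d_df) in comparisons.items()
}
for name, value in derived_aic.items():
    print(f"{name}: AIC = {value:.2f}")
```

The derived values match the AIC column of Table 2, which shows that the parsimony credit for the simpler models (at most 2 × 39 = 78 AIC units) is far smaller than their loss in likelihood.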

After confirming that existing proposed models did not fit adequately, and that a Multitrait-Multimethod model fit better than the alternatives, we proceeded to attempt to develop a better-fitting model based on this foundation. The fit statistics accompanying each step we took to improve the Multitrait-Multimethod model fit in the training dataset are shown in Table 3.
Table 3

Model fit comparisons for the training dataset in Study 1.

| Model | EP | CFI | TLI | RMSEA | AIC | Compare with Model |
|---|---|---|---|---|---|---|
| M1. Multitrait-Multimethod model | 130 | .843 | .813 | .057 | 32757.7 | Model 5 |
| M2. M1 + Binding & individualising | 161 | .901 | .871 | .048 | 32337.7 | Model 5 |
| M3. M2 + Two new foundations | 172 | .918 | .890 | .044 | 32212.9 | Model 5 |
| M4. M3 + One general factor | 202 | .950 | .926 | .036 | 31989.9 | Model 5 |
| M5. M4 + Second general factor | 232 | .963 | .939 | .033 | 31918.2 | |

AIC = Akaike information criterion. The best fitting model is model 5.

The first change made was to include factors representing binding and individualising foundations at the item level (Model 2). This led to a significant improvement in fit of the Multitrait-Multimethod model (χ2(334) = 1111.43, p < 0.001; CFI = 0.901; TLI = 0.871; RMSEA = 0.048). Next, we examined the residuals (unmodelled covariance among items) in this model. These indicated that two groups of items had covariance not accounted for by their membership of a foundation nor by the broader binding or individualising factors, suggesting a need for two additional factors. The first split the items in the sanctity/purity foundation, creating a factor loading on two sanctity/purity items ("Whether or not someone acted in a way that God would approve of" and "Chastity is an important and valuable virtue") and one authority item ("Men and women each have different roles to play in society"). We assigned the name “sanctity” to this factor. The four remaining items from the original sanctity/purity foundation involve attitudes towards avoiding disgusting or unnatural things, so we named this factor “purity”. The second factor split off two loyalty items ("Whether or not someone’s action showed love for his or her country" and "I am proud of my country’s history") and one authority item ("If I were a soldier and disagreed with my commanding officer’s orders, I would obey anyway because that is my duty"). We termed this factor “loyalty to country”. Adding these two additional foundations (Model 3) significantly improved fit relative to Model 2 (χ2(323) = 964.64, p < 0.001; CFI = 0.918; TLI = 0.89; RMSEA = 0.044). The fit of this model, while improved, still fell below modern standards.

Next, we explored the effects of adding a general factor to the model, with items allowed to load positively or negatively on this factor (see Model 4 in Table 3). Adding this unconstrained general factor improved fit compared to Model 3 (χ2(293) = 681.65, p < 0.001; CFI = 0.95; TLI = 0.926; RMSEA = 0.036). The general factor loaded positively on fairness and harm items and negatively on items related to authority and purity. Example fairness loadings included “I think it’s morally wrong that rich children inherit a lot of money while poor children inherit nothing” (β = .35) and “Whether or not some people were treated differently than others” (β = .29). The harm item “Whether or not someone cared for someone weak or vulnerable” loaded β = .27. Negative loadings on authority items included “Respect for authority is something all children need to learn” (β = -.54), “Men and women each have different roles to play in society” (β = -.53) and “I am proud of my country’s history” (β = -.53). Loadings on purity items were smaller but all negative. From these loadings, we concluded that this factor measured a domain running from liberal egalitarianism to conservatism. This would be consistent with the characterisation within moral foundations theory of the difference between liberal and conservative orientations. Alternatively, this factor may reflect social dominance [48] or social desirability bias. However, we could not test this with the present data.

Next, we investigated whether modifying Model 4 by adding a second general factor, in this case constrained to load positively on all items, would improve fit. Given that all items are scored in the same direction, this factor modelled overall greater or lower levels of moral concern/orientation, possibly representing acquiescence bias. We added this additional general/acquiescence factor to Model 4, with all paths from this factor to the items constrained to be positive (Model 5). This change improved fit compared to Model 4, and yielded the first well-fitting model by modern standards (χ2(263) = 552, p < 0.001; CFI = 0.963; TLI = 0.939; RMSEA = 0.033). Table 3 shows the comparison statistics for models 1 through 5.

Having generated a well-fitting model in the training dataset, we next tested replication of this model in the holdout dataset (N = 1,019) to test whether the model is reliable and not over-fitted. For additional confirmation, we also tested each of the intermediate models generated in Study 1, verifying that the improvements observed across each change also replicated in this independent sample, as well as replicating the final model. The results in the hold-out sample closely replicated those from the training dataset. As can be seen in Table 4, each modification which improved model fit in the training set also improved model fit in the holdout dataset. Importantly, the final model (a seven-foundation model with two general factors that also accounts for the two measurement methods as well as for the binding and individualising domains) fit well in the independent hold-out data. This strongly confirms the new model and raises confidence that the changes made in seeking a well-fitting model were not simply over-fitting noise in the initial dataset, but identifying important factors in the structure of moral foundations.
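The degrees of freedom reported with each χ2 in this section can be related to the EP column of Table 3. The sketch below assumes that EP counts estimated parameters and that each model is fit to the 30 item means plus the 465 unique variances and covariances of the 30 items (495 observed statistics in total); both the helper names and this reading of EP are assumptions, although they reproduce the reported values.

```python
N_ITEMS = 30  # MFQ-30 items analysed at the item level


def observed_statistics(p):
    """Unique variances/covariances, p(p + 1)/2, plus p item means."""
    return p * (p + 1) // 2 + p


def degrees_of_freedom(estimated_parameters, p=N_ITEMS):
    """Model df = observed statistics minus estimated parameters."""
    return observed_statistics(p) - estimated_parameters


# EP values as listed in Table 3 for the training-set models
estimated = {"M1": 130, "M2": 161, "M3": 172, "M4": 202, "M5": 232}
model_df = {name: degrees_of_freedom(ep) for name, ep in estimated.items()}
print(model_df)
```

Under these assumptions the sketch gives 365, 334, 323, 293 and 263 degrees of freedom for M1 to M5, the values attached to the χ2 statistics in the text, and each added block of factors can be read as spending 11 to 31 further parameters.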
Table 4

Model fit comparisons for the holdout dataset replication in Study 1.

| Model | EP | CFI | TLI | RMSEA | AIC |
|---|---|---|---|---|---|
| M1. Multitrait-Multimethod model | 130 | .828 | .795 | .061 | 32416.76 |
| M2. M1 + Binding and individualising | 161 | .883 | .847 | .053 | 32004.8 |
| M3. M2 + Two new foundations | 172 | .901 | .866 | .049 | 31873.44 |
| M4. M3 + One general factor | 202 | .936 | .904 | .042 | 31621.77 |
| M5. M3 + Second general factors | 232 | .956 | .927 | .036 | 31489.21 |

AIC = Akaike information criteria; Best fitting model is model 5, printed in bold.
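The AIC column of Table 4 can be read as a sequence of stepwise improvements. As a minimal sketch (the function and variable names are illustrative, and the values are copied from the table above), the drop in AIC at each modelling step can be computed as follows:

```python
# AIC values for the holdout dataset, copied from Table 4 (M1 to M5).
aic = {
    "M1": 32416.76,
    "M2": 32004.80,
    "M3": 31873.44,
    "M4": 31621.77,
    "M5": 31489.21,
}


def aic_drops(aic_by_model):
    """Return the AIC reduction from each model to the next (positive = lower AIC)."""
    names = list(aic_by_model)
    return {
        f"{prev}->{curr}": round(aic_by_model[prev] - aic_by_model[curr], 2)
        for prev, curr in zip(names, names[1:])
    }


drops = aic_drops(aic)
for step, drop in drops.items():
    print(f"{step}: AIC lower by {drop}")
```

Every step lowers the AIC, which is the pattern the text describes: each modification that helped in the training set also helped in the holdout set.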

The final model

Having shown that the model developed in the original dataset replicated in the hold-out sample, we reproduced this model in the combined discovery and holdout datasets for maximum precision of estimation of the effect sizes in the model. This model fit well (χ2(263) = 829.19, p < 0.001; CFI = 0.964; TLI = 0.941; RMSEA = 0.032). We present this model graphically to make clear the findings of study 1. For clarity the model is presented in two parts. Fig 4 shows the 7 foundations and two general factors identified in our final model. Fig 5 shows the measurement part of the model, which includes group factors. Full details of the model are tabulated in the OSF site for this paper.
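For readers who want to relate the reported χ2 to the RMSEA, the following is a minimal sketch of the textbook point estimate, sqrt(max(χ2 − df, 0) / (df · (N − 1))). It assumes N = 2,271 for the combined Study 1 sample (the figure given in the abstract) and is not the authors' code. Software that uses robust or scaled test statistics can return somewhat different values, so the result is only expected to land near the reported 0.032.

```python
import math


def rmsea(chi2: float, df: int, n: int) -> float:
    """Textbook RMSEA point estimate from a model chi-square, its df, and sample size."""
    return math.sqrt(max(chi2 - df, 0.0) / (df * (n - 1)))


# chi-square and df as reported for the final Study 1 model.
# N = 2271 is an assumption taken from the abstract, not from this section.
estimate = rmsea(chi2=829.19, df=263, n=2271)
print(f"RMSEA is approximately {estimate:.3f}")
```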
Fig 4

Study 1 best-fitting model showing only the seven moral foundations and two general-factors (binding/individualizing and method variance paths in Fig 5 for clarity).

Fig 5

Study 1 best-fitting model showing only the binding/individualizing and method variance paths (seven moral foundations and two general-factors in Fig 4 for clarity).

Discussion

Study 1 yielded three important findings. First, replicating previous studies (e.g. [3, 4, 33]), simpler models did not fit well: neither the hierarchical model nor the two- or five-factor models fit acceptably (though the five-factor model fit better than the other models in this simple class). Building upon these findings, we were able to generate a well-fitting model by utilising Multitrait-Multimethod approaches, adding group and general factors and two additional moral foundations. The final model was considerably improved compared to previous analyses and achieved acceptable fit for both RMSEA and CFI metrics. Second, this well-fitting model replicated in a holdout data set. In studies 2–5 we will test replication in more depth, but this initial replication suggests that the model uncovered a reliable basic structure, though this will require validation in a range of samples and cultures. The third finding is perhaps most important and highlights theoretical implications for understanding moral foundations. The harm/care and fairness/reciprocity foundations reproduced with perfect fidelity: that is, for each of these foundations, all 6 items loaded on a single factor in the 7-factor model, supporting the MFT. By contrast, the well-fitting model draws firm distinctions between sanctity and purity and between loyalty to country and loyalty to what we termed clan (combining family and community). This seven (rather than five) foundation model broke out items from the sanctity/purity, authority/respect, and ingroup/loyalty foundations to form independent sanctity and purity foundations, and independent foundations of loyalty to country and loyalty to clan. These changes also altered the nature of the authority foundation, leaving it more obviously aligned around hierarchy. The differences between the 5-factor and our 7-factor models are depicted in Fig 6.
We return to discuss these changes in more detail after testing replicability of the model in studies 2 through 5 below.
Fig 6

Movement of items from the five MFQ foundations to the well-fitting 7 factor model.

We also found that binding and individualizing factors were required, reflecting the covariation of item-responses driven by these two larger groupings. This supports a prediction of MFT and, again, we discuss this in more detail in the final discussion. Finally, the model also required two general factors. The first distinguishes the “tilt”, or correlated changes in multiple foundations, as one shifts along the liberal-to-conservative or left-to-right spectrum. This factor accounts for the otherwise inexplicable patterning of group and individual moral sensitivity along this dimension. The second is a general/acquiescence factor capturing a tendency towards overall higher or lower moral concern across the board, highlighting the need to include the dimension of moral vs. amoral in theorising on the moral foundations. These findings are discussed in more detail in the general discussion after we report additional replications. The results in the holdout dataset closely replicated the training dataset modelling: each step we took to improve the model in the training dataset also led to improvement in the holdout dataset. The fit metrics in the holdout dataset were comparable to those in the training dataset. The model thus met our first criterion for replicability: comparable fit in an independent holdout dataset. In studies 2 through 5, we test whether our best fitting model replicates in four large independent data samples.

Study 2

In each of studies 2–5 we used a new independent sample, testing the fit of the eight models tested or developed in study 1, including the critical final, well-fitting model from study 1.

Participants, measures and procedure

We used data from the 7,130 participants (2,815 females, 4,315 males; age M = 37.55, SD = 14.53) who participated in the Graham et al. study 3 [13]. Participants in this sample were adults, mostly from Western countries, who filled in the MFQ-30 questionnaire at www.yourmorals.org. We constructed the models from Study 1 in the new data for Study 2 and examined their fit, with no additional modifications made. The best-fitting model from Study 1 was also the best-fitting model in this dataset. See Table 5 for the model comparisons and fit statistics. The fit of Model 8 (our best model in Study 1) was satisfactory according to all three fit metrics in this sample. Fig 7 shows the parameter estimates of the model’s structural part in this dataset.
Table 5

Model fit comparisons for the replication dataset in Study 2.

| Model | EP | CFI | TLI | RMSEA | AIC | Compare with Model |
|---|---|---|---|---|---|---|
| M1. 2-Factor model | 91 | .750 | .731 | .078 | 252833.38 | Model 8 |
| M2. Hierarchical model | 101 | .809 | .789 | .070 | 248690.42 | Model 8 |
| M3. 5-Factor model | 100 | .813 | .794 | .069 | 248393.99 | Model 8 |
| M4. M3 + Method factors | 130 | .877 | .854 | .058 | 243862.97 | Model 8 |
| M5. M4 + Binding and individualising | 161 | .913 | .886 | .051 | 241367.97 | Model 8 |
| M6. M5 + Two new foundations | 172 | .931 | .906 | .046 | 240119.36 | Model 8 |
| M7. M6 + One general factor | 202 | .955 | .934 | .039 | 238379.91 | Model 8 |
| M8. M6 + Second general factor | 232 | .971 | .951 | .033 | 237332.63 | |

AIC = Akaike information criteria; Best fitting model is printed in bold.

Fig 7

Structural part of the final model in Study 2.

Study 3

The age and sex composition in this sample is similar to that of Study 2. However, all participants in Study 3 were either US citizens or US permanent residents, whereas participation in Study 2 was open to everyone around the world. Even though most participants in Study 2 were still from Western countries, the US is markedly different in some personality traits that may be relevant to moral judgment (e.g. higher religiosity, individualism [49]). It is therefore interesting to investigate whether restricting the sample to one country will affect the fit of the models developed in Study 1. We used data from the 1,052 participants (566 females, 466 males; age M = 39.66, SD = 12.46) who participated in the Smith et al. study [25]. The data were collected on the MTurk platform. To increase reliability of responses, only participants who had at least a 99% Human Intelligence Task approval rate on the MTurk platform could participate. We constructed the models from Study 1 in the new data for Study 3 and examined their fit, with no additional modifications made. The best-fitting model from Study 1 was also the best-fitting model in this dataset. See Table 6 for the model comparisons and fit statistics. The fit of the best-fitting model, model 8, was satisfactory according to RMSEA and CFI metrics, but not TLI. Fig 8 shows the parameter estimates of the model’s structural part in this dataset.
Table 6

Model fit comparisons for the replication dataset in the Study 3.

| Model | EP | CFI | TLI | RMSEA | AIC | Compare with Model |
|---|---|---|---|---|---|---|
| M1. 2-Factor model | 91 | .746 | .727 | .082 | 44888.66 | Model 8 |
| M2. Hierarchical model | 101 | .812 | .793 | .072 | 43950.13 | Model 8 |
| M3. 5-Factor model | 100 | .816 | .798 | .071 | 43887.89 | Model 8 |
| M4. M3 + Method factors | 130 | .875 | .851 | .061 | 43063.85 | Model 8 |
| M5. M4 + Binding and individualising | 161 | .910 | .882 | .054 | 42603.66 | Model 8 |
| M6. M5 + Two additional foundations | 172 | .929 | .904 | .049 | 42358.87 | Model 8 |
| M7. M6 + One general factor | 202 | .952 | .929 | .042 | 42017.3 | Model 8 |
| M8. M6 + Second general factor | 232 | .964 | .941 | .038 | 41917.68 | |

AIC = Akaike information criteria; Best fitting model is printed in bold.

Fig 8

Structural part of the final model in Study 3.

Study 4

The sample used in this study is comparable to that used in Study 3. Data were collected on the MTurk platform and all participants were from the US. Both studies also have similar age and sex composition. However, the data used in this study were collected in June-July 2018, whereas the data used in study 3 were collected in October 2014. This spread in time of almost 4 years spans some significant US events, including a contentious presidential election, which could potentially affect how participants perceive and respond to questions about moral issues. This sample, then, provides a useful test of the resilience of the new model to such changes. We used data from the 591 participants (267 females, 284 males; age M = 39.48, SD = 10.62) who participated in the O’Grady et al., Study 2 [42]. Participants were hired on the MTurk platform. To increase reliability of responses, only participants meeting the “Master” or expert qualification on the MTurk platform, suggesting that they are reliable and experienced workers, were allowed to participate. We constructed the models from Study 1 in the new data for Study 4 and examined their fit, with no additional modifications made. The best-fitting model from Study 1 was also the best-fitting model in this dataset. See Table 7 for the model comparisons and fit statistics. The fit of the best-fitting model, model 8, was satisfactory according to RMSEA and CFI metrics, but not TLI. Fig 9 shows the parameter estimates of the model’s structural part in this dataset.
Table 7

Model fit comparisons for the replication dataset in the Study 4.

| Model | EP | CFI | TLI | RMSEA | AIC | Compare with Model |
|---|---|---|---|---|---|---|
| M1. 2-Factor model | 91 | .698 | .674 | .103 | 19433.52 | Model 8 |
| M2. Hierarchical model | 101 | .793 | .771 | .087 | 18692.51 | Model 8 |
| M3. 5-Factor model | 100 | .801 | .781 | .085 | 18629.21 | Model 8 |
| M4. M3 + Method factors | 130 | .871 | .846 | .071 | 18105.93 | Model 8 |
| M5. M4 + Binding and individualising | 161 | .912 | .886 | .061 | 17808.67 | Model 8 |
| M6. M5 + Two additional foundations | 172 | .932 | .908 | .055 | 17667 | Model 8 |
| M7. M6 + One general factor | 202 | .964 | .946 | .042 | 17445.42 | Model 8 |
| M8. M6 + Second general factor | 232 | .969 | .948 | .041 | 17436.24 | |

AIC = Akaike information criteria; Best fitting model is printed in bold.

Fig 9

Structural part of the final model in Study 4.

Study 5

The sample used in study five differs in two regards from those used in studies 1–4. First, it is a non-Western sample (from China). Second, participants in this sample were significantly younger than participants in the other samples used in this paper. While relatively small, this sample, then, offers an opportunity to investigate whether our model developed in a Western sample fits well in a sample that differs in both age and culture. We used data from the 452 participants (355 females, 97 males; age M = 19.70, SD = 1.34) who participated in the Wang et al. study [43]. Participants were students from two Chinese universities. The Moral Foundations Questionnaire used in this study was translated into Chinese. We constructed the models from Study 1 in the new data for Study 5 and examined their fit, with no additional modifications made. The best-fitting model from Study 1 was also the best-fitting model in this dataset. See Table 8 for the model comparisons and fit statistics. The fit of the best-fitting model, model 8, was satisfactory according to RMSEA, but not CFI and TLI metrics. Fig 10 shows the parameter estimates of the model’s structural part in this dataset.
Table 8

Model fit comparisons for the replication dataset in the Study 5.

| Model | EP | CFI | TLI | RMSEA | AIC | Compare with Model |
|---|---|---|---|---|---|---|
| M1. 2-Factor model | 91 | .585 | .554 | .105 | 14639.74 | Model 8 |
| M2. Hierarchical model | 101 | .599 | .558 | .104 | 14582.19 | Model 8 |
| M3. 5-Factor model | 100 | .607 | .567 | .103 | 14546.8 | Model 8 |
| M4. M3 + Method factors | 130 | .794 | .754 | .078 | 13676.04 | Model 8 |
| M5. M4 + Binding and individualising | 161 | .835 | .785 | .073 | 13511.43 | Model 8 |
| M6. M5 + Two additional foundations | 172 | .871 | .826 | .065 | 13348.56 | Model 8 |
| M7. M6 + One general factor | 202 | .911 | .868 | .057 | 13183.86 | Model 8 |
| M8. M6 + Second general factor | 232 | .932 | .887 | .053 | 13115.09 | |

AIC = Akaike information criteria; Best fitting model is printed in bold.

Fig 10

Structural part of the final model in Study 5.

General discussion

The aim of the paper was to construct a well-fitting model of the MFQ, thus identifying accurately the structure of moral foundations, building and replicating the model over five studies. This final model preserved the fairness/reciprocity and harm/care foundations intact. However, the binding foundations divided into five, rather than three foundations. Purity/sanctity split into independent foundations of purity and sanctity; Loyalty/group divided into independent factors of loyalty to clan and loyalty to country. Finally, Authority/respect was re-focussed on hierarchy, losing one item to the new sanctity foundation and another into loyalty to country. In addition to these 7 foundations, higher-level factors of binding and individualizing were supported, along with a general/acquiescence factor. Finally, a “moral tilt” factor corresponding to coordinated left-leaning vs. right-leaning moral patterns was supported. This new model of the Moral Foundations Questionnaire has several implications for moral foundations theory and suggests additional directions for research. Each of these is discussed below.

New moral foundations

In our modelling, two additional moral foundations, sanctity and loyalty to country were added, splitting off items from the sanctity/purity, authority/respect, and ingroup/loyalty foundations. To use Durkheim’s [11] insight, sanctity focussed on “sacred things… things set apart and forbidden”. This factor involved acting in ways that would be approved by God, including being chaste and performing roles assigned to us in society. This combining of attitudes towards religion with restrictive reproductive morals has been reported in practice reliably and across cultures [50]. By contrast, purity loaded on items involving moral support for the avoidance of disgusting and unnatural things. This division of sanctity and purity is clearly recognised in the hyphenated name of the original, and our model formalises this division, allowing for distinct sensitivity to avoidance of disgusting or unnatural things: the "guardian of the body” [2] versus sanctification of “things set apart and forbidden” [11]. Individuals can independently respond weakly or strongly to these two distinct moral patterns, e.g. reacting strongly “to elicitors that are biologically or culturally linked to disease transmission" [2] while being less reactive to ritual and the sanctification of social structure. It will be of value in future studies to identify how this distinction sheds light on external phenomena such as complex multi-dimensional models of religiosity, in which elements of avoidance of vice and pursuit of virtue linked to sanctity might more tightly link to religiosity and emphases on purity may relate to distinctions among religions. Additional research might model how the distinction made here between sanctity and purity maps onto external measures such as complex multi-dimensional indices of religiosity [51]. The second novel foundation, which we identified as “loyalty to country”, absorbed items from the ingroup/loyalty and authority/respect foundations. 
It distinguished people who are proud of their country, showing love and pride in their nation’s history as well as duty to country from those who are not, including not picking and choosing which orders from leaders they would follow, but obeying out of a sense of duty to country despite disagreement with a specific policy. Patriotism or nationalism has previously been identified as a unique and important foundation by Haidt [10], contrasting this with globalism to create a dimension of caring for the people in one’s own country more than those of other countries, versus treating all people as identical in terms of their moral call upon us. The “loyalty to clan” foundation emerged from what remained of the original ingroup/loyalty foundation. Having lost one item to the new sanctity foundation and another migrating to loyalty-to-country, this foundation was refocussed almost exclusively around loyalty to family and immediate group. Such preference for kin is predicted from kin selection theory [52] and reciprocal altruism [53]. By contrast, in modern diverse and large-scale societies loyalty to country implies altruism towards non-kin, a different and evolutionarily novel mechanism. Future research should test whether the distinction between loyalty to clan and loyalty to country emerges in small monoethnic countries. Finally, the foundation we termed Hierarchy emerged from the original authority/respect foundation, refocussed tightly on respect for authority and tradition and preference for order over disorder and chaos. This preference for order and obedience to hierarchy was thus rendered distinct from loyalty per se, which was now moved to separate foundations. This distinction of loyalty to one’s group and obedience to hierarchy has a distinguished history in theory.
In his work on administrative behavior, Simon [54] identified loyalty and obedience as the two necessary conditions for the existence of organisations, defining loyalty as the capacity to introject organizational objectives in place of one’s own aims and obedience as choosing to make one’s default response be to follow requests of a superior: a definition which corresponds closely to notions of respecting the wishes of those in authority. The new model formalises these distinct aspects of moral concern which are merged in the 5-domain model of the MFQ, distinguishing concern for one’s countrymen (loyalty to country), concern for family and kin, and concern for the organization and obedience (authority). A future theory of moral judgment will need to propose and test hypotheses about the selective pressures on loyalty to country vs loyalty to kin and purity vs sanctity factors. In addition, constructing a modified measure of moral foundations may require generating new items to cover the larger number of domains.

Group factors

An early change which improved model fit greatly was the inclusion of group factors representing binding and individualising foundations. Rather than being implemented hierarchically, as has been done previously, these group factors were implemented as a bi-factor structure, at the item level. The additional free parameters of a bi-factor model relative to a comparable hierarchical model can, in exploratory cases, lead to model misspecification by modelling sample-specific variance. It is recommended, therefore, that bi-factor models should be validated in new data sets [55]. We did this, testing replication of the model in four independent datasets. Two points are worth discussion regarding these group factors. First, the group factors strongly confirm a predicted component of the moral foundation theory. They highlight the distinct role played in moral perception by the units of the individual, and of the group, organizing moral concerns separately around people and groups, and preserving these levels of value across related foundations. This highlights the second noteworthy aspect of modelling, namely that these group factors fit best when implemented as impacting the items directly rather than working via higher-level latent constructs. This suggests that, much as specialised aspects of the visual world are processed by regions specialised for colour or motion, concern for the individual and concern for the group may themselves be processed as distinct mechanisms in the “moral mind”, recognising, tagging and recruiting behavior across domains. These factors warrant additional study.
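The difference between the hierarchical and bi-factor implementations can be made concrete with a small sketch. In a hierarchical (second-order) model, an item's loading on the group factor is the product of its foundation loading and the foundation's path from the group factor, so group loadings are forced to be proportional to foundation loadings within each foundation. In a bi-factor model the group factor loads on each item directly and no such proportionality is imposed. All loadings below are hypothetical values chosen for illustration, not estimates from this paper, and the function names are likewise illustrative.

```python
# Hypothetical standardized loadings of items on their own foundation
# (illustrative values only, not estimates from the paper).
foundation_loadings = {
    "purity": [0.70, 0.60, 0.50],
    "hierarchy": [0.65, 0.55, 0.45],
}

# Hypothetical second-order paths from a "binding" group factor to each foundation.
gamma = {"purity": 0.8, "hierarchy": 0.6}


def hierarchical_group_loadings(item_loadings, second_order_paths):
    """Item loadings on the group factor implied by a second-order model."""
    return {
        f: [lam * second_order_paths[f] for lam in lams]
        for f, lams in item_loadings.items()
    }


def loading_ratios(group_loadings, item_loadings):
    """Ratio of each item's group loading to its foundation loading."""
    return {
        f: [round(g / lam, 6) for g, lam in zip(group_loadings[f], item_loadings[f])]
        for f in group_loadings
    }


def is_proportional(ratios):
    """True if the ratio is constant within every foundation."""
    return all(len(set(values)) == 1 for values in ratios.values())


# Hierarchical model: group loadings are implied, hence proportional.
implied = hierarchical_group_loadings(foundation_loadings, gamma)
hier_ratios = loading_ratios(implied, foundation_loadings)

# Bi-factor model: group loadings are free parameters (hypothetical values).
bifactor_group_loadings = {
    "purity": [0.30, 0.55, 0.10],
    "hierarchy": [0.50, 0.20, 0.40],
}
bif_ratios = loading_ratios(bifactor_group_loadings, foundation_loadings)

print("hierarchical proportional:", is_proportional(hier_ratios))
print("bi-factor proportional:", is_proportional(bif_ratios))
```

The extra flexibility of the bi-factor form is what allows it to fit better, and it is also why replication in new data is recommended before the structure is interpreted.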

General factors

Adding an unconstrained general factor to the model significantly improved fit. This factor instantiates a moral dimension linked to views across all the moral domains, organized in a mono-dimensional manner. This factor loaded positively on fairness and harm items, especially items emphasizing compassion and equality, and negatively on items related to authority and purity. Higher scorers were both somewhat more likely to respond positively to items such as “I think it’s morally wrong that rich children inherit a lot of money while poor children inherit nothing” and much less likely to agree that “Men and women have different roles to play in society” or to be “proud of my country’s history”. This pattern of loadings corresponds to a dimension of liberal-conservative or left-wing/right-wing views. As such, it can explain the difference in moral foundations identified by Haidt and Graham [2], moving from liberal egalitarianism to conservatism, including the subtle patterning of this move. Future study could usefully focus on understanding this factor, including relating it to other constructs, for instance one or other component of the social dominance construct [48], which was previously offered as an explanation of liberal-conservative differences [56]. Such an alignment could also arise from several causes, for instance philosophical differences, or unmeasured factors such as socio-economic self-interest which might align views and interests across multiple domains. Additional studies will be required to investigate the basis of this factor. The second general factor was constrained to load positively on all items. As such it functions as a general morality factor, loading on all moral attitudes and varying coherently from a low degree of concern for any of harm/care, fairness/reciprocity, ingroup/loyalty, authority/respect, purity/sanctity to high levels of regard for all these foundations.
One candidate for explaining such a factor with low concern for morality at the low end would be psychopathy [57]. This dimension maps onto reckless (unconscientious, now-focussed, goal-lessness) and disagreeable (angered if not getting one’s way) personality and may reflect effects of personality. Making these possibilities testable, for instance by using measures of psychopathy (e.g. [58]), is a benefit of the new model. While such an interpretation of this factor as representing a genuine “general factor” of moral judgment is possible, other explanations remain plausible, in particular factors related to modelling bias and acquiescence, and we turn to these important but more technical matters in the next section.

Modeling implications

There are at least two reasons why good fit of the model is important. From a theoretical standpoint, good fit strengthens confidence that the elements of the model represent true relationships. From a practical perspective, a well-fitting model can be expected to provide better predictions, just as a better-focused lens gives clearer vision. The MFQ incorporates two methods (relevance and judgment) to measure each foundation. Surprisingly, these are rarely modelled explicitly in the moral foundations literature, and our modelling clearly shows that treating these item-types as distinct and modelling this method variance significantly increased the fit of the model. This increase in fit is in line with Curry et al. [24], who also found that modeling measurement effects improved fit. Aspects of model design also affected the improvements in fit when modeling the group factors of binding and individualising. These clusters were proposed by the MFT creators [13], but previous attempts to incorporate group factors focussed on hierarchical implementations, where variance from the group factors must pass through the 5-factor structure. These, as covered in the introduction, failed. By contrast, we modelled these group factors at the item level and this improved model fit significantly. Similar improvements from bi-factor modeling have been reported for cognitive ability models [59]. This methodological move to a bi-factor implementation also suggests something about the mechanism of the binding and individualising factors: namely that the variance they capture does not work via the foundational domains but rather coordinates behavior directly. The second general factor was constrained to load positively on all items. As noted above, this may represent a substantive general “morality” factor. However, researchers are increasingly aware of the effects of acquiescence bias: the tendency of respondents to agree (or disagree) with all questionnaire items, regardless of their content.
Acquiescence bias could be eliminated from the questionnaire by including reverse-coding items but the MFQ does not contain such items. A useful research project would be to test this using new high-performance reverse coded items or measuring acquiescence and testing if the general factor is related to this. We must also suspect social desirability [40] as an influence on any measure linked to socially-evaluated outcomes. Social desirability emerges when subjects over-report desirable traits due either to a cognitive bias or conscious impression management [60]. Future studies should investigate this possibility, for instance by including a self-perception bias measure (e.g. [61]) and testing evidence for a relationship between the general factor of morality and bias tagged in this way. Constructing a better measure of moral judgment may also require rephrasing of some items and perhaps creating new ones to remove or at least minimise the effects of method variance and possible bias [62]. Another important opportunity for future research is external validation of the seven moral judgment factors. Here we used multiple datasets to establish that seven factors emerge even in demographically diverse samples. Future research should complement this by testing the ability of the new model to predict relevant outcomes such as social and political attitudes, in particular, political affiliation [13, 25], willingness to donate to charitable causes [63], attitudes towards religion [20] and one’s country patriotic symbols [19] with the prediction that scores from this 7-factor model will explain such outcomes better than five-factor scores can. Additional opportunities for development posed by the better-fitting model include the opportunity to generate additional items so that each of the facets of the seven-foundation model are equally well represented.

Limitations

It is important to note that while our final model was the best-fitting model in all samples, the precise path estimates differ between the samples. Such variation, however, is expected by chance. A strength of the paper is its use of multiple samples, not all from the same culture, though, given that we used only one non-Western sample, this requires validation. In studies 2–5, we extended the findings of study 1 by testing our model in four independent replication datasets including one non-Western sample. In all of them, our model performed better than any other model previously reported. In the first and largest replication dataset (Study 2), all three fit metrics we used (TLI, CFI and RMSEA) were satisfactory. In the second and third datasets (Studies 3 and 4) only RMSEA and CFI were satisfactory, whereas in the fourth dataset (Study 5) only RMSEA was acceptable, which may be explained by the non-Western nature of the sample. Lower fit in the non-Western sample may result from cultural differences; alternatively, some concepts or phrasings in the MFQ may not have exact counterparts in all languages. This sample also had a higher proportion of females (78%) compared to other samples. It also showed the largest model fit improvement from modelling two method factors. To distinguish between these possibilities, future studies may investigate whether responses to the English and foreign language versions of the MFQ differ in bilingual samples from non-Western populations. It is also worth mentioning that in Study 2 we used data provided by participants who voluntarily visited a moral psychology website and, presumably, were highly motivated. Similarly, in Studies 3–4 we used data provided only by MTurk participants with a high approval rate. This may have affected how representative of the general population these samples are. Notably, however, in all four replication samples the rank order of the models tested was the same, including the non-Western one.
This suggests that our model is robust to replication across sample characteristics.
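The pattern of satisfactory and unsatisfactory metrics across the replication samples can be tabulated mechanically. The sketch below uses the Model 8 values from Tables 5 to 8 and assumes the conventional cutoffs of CFI ≥ .95, TLI ≥ .95 and RMSEA ≤ .06. These cutoffs are an assumption, chosen because they reproduce the pattern described above; they are not stated in this section.

```python
# Model 8 fit statistics copied from Tables 5 to 8.
model8_fit = {
    "Study 2": {"CFI": 0.971, "TLI": 0.951, "RMSEA": 0.033},
    "Study 3": {"CFI": 0.964, "TLI": 0.941, "RMSEA": 0.038},
    "Study 4": {"CFI": 0.969, "TLI": 0.948, "RMSEA": 0.041},
    "Study 5": {"CFI": 0.932, "TLI": 0.887, "RMSEA": 0.053},
}

# Assumed cutoffs (conventional values, not taken from this section).
CUTOFFS = {"CFI": 0.95, "TLI": 0.95, "RMSEA": 0.06}


def satisfactory_metrics(fit):
    """Return the metrics that meet the assumed cutoffs, in table order."""
    passed = []
    for metric, value in fit.items():
        if metric == "RMSEA":
            ok = value <= CUTOFFS[metric]  # lower RMSEA indicates better fit
        else:
            ok = value >= CUTOFFS[metric]  # higher CFI/TLI indicates better fit
        if ok:
            passed.append(metric)
    return passed


summary = {study: satisfactory_metrics(fit) for study, fit in model8_fit.items()}
for study, passed in summary.items():
    print(f"{study}: {', '.join(passed)}")
```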

Summary

The five studies reported here involved five important changes in modeling of the moral foundations questionnaire and yielded a substantially improved, well-fitting and substantively distinct model of moral foundations. Of theoretical relevance, at the foundational level, two additional foundations were needed. The first formally implemented the distinction between the sanctity and purity domains of the purity/sanctity foundation, recognised in Durkheim’s “sacred things” [11] and in what Haidt and Graham [2] described as the "guardian of the body". The second involved distinguishing a foundation of loyalty to country in addition to a foundation of loyalty to clan consisting of four ingroup items. This, we suggested, maps onto important theorised roles of a moral dimension currently best displayed in support for the nation versus globalisation [10]. The distinct foundations of loyalty and respect, with their critical functions in allowing people to operate within organizations [54], were retained, but no longer overloaded with national-level moral concern. A third theoretically important change was the necessity of formal “individualizing” and “binding” factors, reflecting strong association among the foundations linked to compassion for the individual, and among domains concerning hierarchy, social norms, and survival of the group. The model also required a general factor loading with opposite sign on the binding and individualizing foundations. This theoretical novelty accounts for the otherwise inexplicable left-right dimension which “tilts” the moral foundations in a coordinated fashion [13]. Finally, a general/acquiescence morality factor loading positively on all domains was required for a good fit. A well-fitting model of personality should reveal factors which are theoretically important and which, more closely reflecting the causal structure, should better predict relevant outcomes compared to existing models.
The MFQ can be scored using the 7-factor model presented here, together with group factors and general effects to test its predictive validity in past and future studies in comparison to the classic two-cluster and five-foundation scoring systems. The new model may increase the variance MFT can account for in domains such as politics [13, 17], differences among religions in emphasis on ritual versus purity [20], studies of organizational behavior requiring institutional loyalty and respect [54], new work on patriotism and links to constructs such as globalization and the need for social or trait accounts for this domain, as well as studies linking generally low scores on the MFQ to research on psychopathy.

22 Jul 2021

PONE-D-21-17063
Remapping the foundations of morality: Well-fitting structural model of the Moral Foundations Questionnaire.
PLOS ONE

Dear Dr. Zakharin,

Thank you for submitting your manuscript to PLOS ONE. After careful consideration, we feel that it has merit but does not fully meet PLOS ONE’s publication criteria as it currently stands. Therefore, we invite you to submit a revised version of the manuscript that addresses the points raised during the review process. Please submit your revised manuscript by Sep 05 2021 11:59PM. If you will need more time than this to complete your revisions, please reply to this message or contact the journal office at plosone@plos.org. When you're ready to submit your revision, log on to https://www.editorialmanager.com/pone/ and select the 'Submissions Needing Revision' folder to locate your manuscript file. Please include the following items when submitting your revised manuscript: A rebuttal letter that responds to each point raised by the academic editor and reviewer(s). You should upload this letter as a separate file labeled 'Response to Reviewers'. A marked-up copy of your manuscript that highlights changes made to the original version.
You should upload this as a separate file labeled 'Revised Manuscript with Track Changes'. An unmarked version of your revised paper without tracked changes. You should upload this as a separate file labeled 'Manuscript' If you would like to make changes to your financial disclosure, please include your updated statement in your cover letter. Guidelines for resubmitting your figure files are available below the reviewer comments at the end of this letter. If applicable, we recommend that you deposit your laboratory protocols in protocols.io to enhance the reproducibility of your results. Protocols.io assigns your protocol its own identifier (DOI) so that it can be cited independently in the future. For instructions see: http://journals.plos.org/plosone/s/submission-guidelines#loc-laboratory-protocols. Additionally, PLOS ONE offers an option for publishing peer-reviewed Lab Protocol articles, which describe protocols hosted on protocols.io. Read more information on sharing protocols at https://plos.org/protocols?utm_medium=editorial-email&utm_source=authorletters&utm_campaign=protocols. We look forward to receiving your revised manuscript. Kind regards, Peter Karl Jonason Academic Editor PLOS ONE Journal Requirements: When submitting your revision, we need you to address these additional requirements. 1. Please ensure that your manuscript meets PLOS ONE's style requirements, including those for file naming. The PLOS ONE style templates can be found at https://journals.plos.org/plosone/s/file?id=wjVg/PLOSOne_formatting_sample_main_body.pdf and https://journals.plos.org/plosone/s/file?id=ba62/PLOSOne_formatting_sample_title_authors_affiliations.pdf Reviewers' comments: Reviewer's Responses to Questions Comments to the Author 1. Is the manuscript technically sound, and do the data support the conclusions?
The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented.
Reviewer #1: Yes
Reviewer #2: Yes

2. Has the statistical analysis been performed appropriately and rigorously?
Reviewer #1: Yes
Reviewer #2: Yes

3. Have the authors made all data underlying the findings in their manuscript fully available? The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data (e.g. participant privacy or use of data from a third party), those must be specified.
Reviewer #1: No
Reviewer #2: Yes

4. Is the manuscript presented in an intelligible fashion and written in standard English? PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here.
Reviewer #1: Yes
Reviewer #2: Yes

5. Review Comments to the Author
Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics.
(Please upload your review as an attachment if it exceeds 20,000 characters)

Reviewer #1: Review of “Remapping the foundations of morality”. After review, I think the paper requires revision. PLOS offers minor or major revision; I don’t know what those mean, and as an editor I stick with accept, revise or reject. The strength of the paper is its empirical analyses, but it requires substantial re-framing and engagement with the literature. So, while I’m supportive of revision for certain, there’s some work to be done. I think it is appropriate to disclose my identity given I’ve contributed directly to this area, so as to be transparent for any potential COI (Pete Hatemi). When Kevin Smith and I began looking into MFT, we found it both elegant and interesting, and started with the idea to provide evidence for genetic influences and one causal path for genetic influences on political attitudes. The data came out the other side; almost none of what Haidt and co proposed about the measures appeared to be empirically supported. And here is where the current paper seems really out of touch with the literature, almost treating the question of the MFQ validity as an open question, when in fact, in dozens of studies, it has been shown to be invalid. So it is not true to say that “the predictive validity of the MFQ has been generally supported, especially in the domain of politics”. Rather just the opposite. What has been supported are correlations between the two, that’s about it. So, let’s go back to the beginning. Kevin and I were perhaps among the first wave of people to find that the MFQ does not reliably factor into 5 dimensions in 10 populations (Smith et al- which btw your paper cites incorrectly), but at best 2, the structure differed by country (US and AU). This paper was not a critique and in fact we had no interest in finding a different factor structure. I suggest a reread of it.
Then came Iurino and Saucier and many others who hit this more definitely (see Harper and Rhodes and Davis et al who find MFT is not valid across non US/white populations, and just the other day Hadarics Márton’s paper, then of course, the heterogeneity in MFTs within and across ideologies identified by Frimer; then finding that the measure is not stable across time (2 years= Smith et al), and then the findings that MFT appears to be caused by, rather than cause, the traits it is proposed to influence (for attitudes or political orientation see Hatemi, Crabtree and Smith, Kivikangas et al, Ciuk, Everett et al., Strupp-Levitsky et al (Jost), and Márton and Kende among many others)). Finally it’s not heritable (Smith et al.). In this, I think the paper needs to redo the front end. The above papers are not critiques, so it is a bit disingenuous to frame them as such. Rather, most of them were studying a trait of interest, and in so doing found out the predictor variable (MFQ) was garbage. If the current paper wants to restructure the MFQ then it has to do two things. First it has to meaningfully engage the some 20+ papers that can’t replicate the proposed factor structure. The comparison of fits in the paper seems highly selective. So, a more clear and thorough review of MFQ/MFT’s serious shortcomings is needed to properly situate the paper and compare your factor structure with the 2, 5, 7, and other outcomes. Simply ignoring the works above, or attempting to frame them as critiques on the side is not good science in my view. It is not that a handful of scholars challenge MFT. It is that the most serious empirical explorations of the MFQ, with large samples and good data, find no evidence to support most of MFT’s claims regarding the measure. Doing this can be easily done. Read the papers, compare your approach and results, and place them in context. The second thing will take less leg work but a bit more thought.
What the current paper proposes is that Haidt’s MFT is simply wrong. One cannot simply just restructure the MFQ into different domains, without then updating the theory. A read of Haidt’s book here is critical. The MFQ is the proposed measure of Haidt’s MFT theory. It has specific logic, organizing principles, evolutionary roots, etc. that link the domains to each other and the MFQ questions. Certainly one can simply data drill, as Kevin and I did in the Human Nature paper without any theory. In this third paper Hatemi and Smith, we ran more EFT’s on the MFQ just to satisfy a reviewer to see if we could make the thing heritable, but that was in an SI. Here the paper is centered around offering a different factor structure to the data that conflicts with the theory as written. What are your competing hypotheses? What parts of your findings invalidate Haidt’s theory? What parts support it? What parts of Haidt’s theory require modification based on your results, what parts require abandonment? This is important because if one wanted to use this new formulation as a means to, let’s say, run different analyses, like behavior genetic ones for example, then this should be the paper to address those questions. Otherwise it is simply a data exercise – which is not a bad thing, but just limited in what it offers. So, yes, I’d like to see this paper in print. My suggestions: read the lit, reframe the paper by engaging it more appropriately, mainly it is not that there are critiques, but rather Haidt’s theory and measure simply don’t hold up in a lot of the data. It is well established that the factor structure certainly is not supported. In most every large and nationally rep study it doesn’t work, though it seems to work with Haidt and Graham’s student and internet samples. So, the question is what is the actual measure doing, what is the ideal factor structure and if it’s not what the theory proposes what does this say about the theory?
Based on your proposed structure, is it still moral foundations? One important and serious concern. You need permissions to use other people’s replication data for anything other than replication. Posting data for replication is for replication. Here you are using replication data for novel purposes. Since you used my data and I was never asked, I’m suspecting you did the same for others. It is an ethical question to take replication data in the manner you’re using it. Smart move here is to ask. It is very low cost, the price of an email, and usually results in goodwill. If you don’t, you risk your paper getting retracted. Not a smart play in my view. But risk is certainly a trait with individual differences and variation is genetically influenced at that. Feel free to take my comments, print them out and use them for TP, or to improve the paper as you see fit. Hopefully they help. P

Reviewer #2: This is one of the empirically strongest papers on the Moral Foundations Questionnaire, with strengths that the paper itself accurately touts. But there are some caveats and weaknesses of which the authors seem unaware. 1. The samples are – except for Study 5, the smallest sample – based on predominantly Western respondents (in fact, predominantly Anglo-American). Page 16 claims that study 1 identified a ‘reliable basic structure’ but this might be true only for certain populations. In general, the paper needs more caveats about potential inapplicability to non-Western populations. I don’t think we should be having a few Chinese university attendees representing (standing in for) the entire non-Western world as is done here. 2. With respect to the one non-Western sample, the paper fails to note important details (the sample was evidently almost 80% female) and there are some questions. What universities were the respondents from and how Westernized were they?
Why was this particular sample (among all non-Western samples administered the MFQ) chosen, and did that involve cherry-picking the sample most likely to be supportive of the select model? 3. One could also wonder if there is some degree of cherry-picked subjects in studies 1-4. The description of the study 1 sample does not rule out that these are all moral-psychology enthusiasts who volunteered. The study 2 was highly motivated – found their way to a website on moral psychology. Study 3 and 4 samples were from MTurk, but might be called ‘cream of the crop’ MTurkers based on the selection criteria; that’s well and good unless it means these results are dependent on using ‘elite questionnaire-response professionals’ and are ungeneralizable to a typical population. What percentage of MTurk participants meet the criteria specified, and were any MTurk participants recruited but then eliminated (in order to make favorable results more likely)? (A general take-home message might be that the intended structure of the MFQ is fragile, hard to locate in your typical noisy data.) 4. The models employed are entered in a particular order that was apparently set a priori (is there a pre-registration to verify this?), but the question is raised whether, when all is said and done, one (or more) of them might be unnecessary. That is, perhaps by backward removal of the element adding least, a more parsimonious outcome might be reached. This is analogous to the forward versus backward methods in stepwise regression. Related questions: How does it make sense to enter a hierarchical model (which includes the five-factor model) before one enters the five-factor model, and what is the difference between ‘two-factor model’ and ‘binding and individualizing’ foundations in, for example, Table 8. 5. 
Following that same thread: The conclusion (though unstated) would seem to be that one can use the MFQ, but must really analyze it or make sense of its scores in a complicated and onerous manner. That is perhaps a problem for construct validity. An important area of future implications would be this: What do these results suggest about how to construct a better moral ‘foundations’ measure (e.g., one that would not be so profoundly affected by method variance, or by the pull of an underlying two-factor model and of political leanings and of acquiescence or general morality vs. amorality? In other words, how can the measurement of moral foundations be cleaned up in these regards? Smaller matters: -The middle paragraph on page 26 leads to considerable head-scratching. The first sentence is incomprehensible. As for the 2nd, why is social dominance picked out so prominently among all possible alternatives? What is the ‘alignment’ referred to there? It would seem that one interpretation of the results is that ‘tilt’ and binding-individualizing make independent contributions because one cannot reduce the latter entirely to tilt (as some treatments of the topic have implied in the past), there being both liberal and conservative ways of endorsing binding AND of endorsing individualizing morality. -In Table 8 it is noteworthy that moving from model 3 to model 4 yielded a huge improvement in fit, which was not the case in other samples. Does this say something about how Chinese respondents characteristically handle morality or this questionnaire? -Page 13 identifies a set of three MFQ items and labels them as sanctity, but it is hard to apply this interpretation to the third item mentioned (about roles for men and women). One could just as easily label this factor as ‘traditional gender roles/expectations’ (that also would fit 2 of the 3 items). 
-Page 13-14 separate out a patriotism factor from a loyalty factor, but it would seem more informative to label one of them as loyalty to country and the other as loyalty to family/team/group (i.e., to smaller-scope entities). In reference to same on page 24, it is implied that Herbert Simon differentiated patriotism from loyalty, but this seems unlikely given how Simon’s viewpoint is stated. Clarity needed. -Until the very last part of the paper, there is a tendency to label one latent variable -- created by constraining all items to load positively on it – as General Morality although an equally plausible interpretation is Acquiescence. Would be better to mention both possible interpretations from the beginning. -In Table 1, the “(modified)” notation for the best-fitting model for the Iurino and Saucier (2018) paper needs some explanation. -The tendency to overestimate one’s personality traits is (on p. 12) mischaracterized as halo effect. Halo effect is more commonly used to refer to overestimation of someone else’s positive traits. Doing it to yourself is self-enhancement or social desirability bias. -It seems that some text on pages 17-18 is redundant with what was said before in the paper, this needs a check.

6. PLOS authors have the option to publish the peer review history of their article. If published, this will include your full peer review and any attached files. If you choose “no”, your identity will remain anonymous but your review may still be made public. Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy. Reviewer #1: No Reviewer #2: No [NOTE: If reviewer comments were submitted as an attachment file, they will be attached to this email and accessible via the submission site. Please log into your account, locate the manuscript record, and check for the action link "View Attachments".
If this link does not appear, there are no attachment files.] While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool, https://pacev2.apexcovantage.com/. PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Registration is free. Then, login and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email PLOS at figures@plos.org. Please note that Supporting Information files do not need this step. 5 Sep 2021 Comments from Reviewer 1 Comment: The strength of the paper is its empirical analyses, but it requires substantial re-framing and engagement with the literature. Response: We thank the reviewer for their positive evaluation of the empirical analyses and for their direction to additional citations in the literature. We have incorporated these below. Comment: And here is where the current paper seems really out of touch with the literature, almost treating the question of the MFQ validity as an open question, when in fact, in dozens of studies, it has been shown to be invalid. So it is not true to say that “the predictive validity of the MFQ has been generally supported, especially in the domain of politics”. Rather just the opposite. What has been supported are correlations between the two, that’s about it. Response: Thank you for this suggestion. We were happy to reword this sentence removing the implication of prediction. We now say “Since its development, the correlation of the MFQ with external measures has been widely studied, especially in the domain of politics.” (See lines 118-119 page 5) Comment: So, let’s go back to the beginning. 
Kevin and I were perhaps among the first wave of people to find that the MFQ does not reliably factor into 5 dimensions in 10 populations (Smith et al- which btw your paper cites incorrectly), but at best 2, the structure differed by country (US and AU). This paper was not a critique and in fact we had no interest in finding a different factor structure. I suggest a reread of it. Response: We apologise for not noting that Smith et al. (2017) and Hatemi et al. (2019) used multiple samples. Also, for using the Smith et al. (2017) reference to cover both Smith et al. (2017) and Hatemi et al. (2019) work. We have corrected the reference and now say (see line 136-140, pages 5-6) “Smith et al. and Hatemi et al. [25, 26], using multiple samples, reported that the MFQ did not reliably factor into 5 dimensions, but rather into 2, with structure differing between the US and Australia. They also found that moral foundations are not stable across time and that MFQ scores reflect rather than cause political attitudes. Smith et al. [26] also reported that the MFQ foundations show no evidence of heritability” Comment: Then came Iurino and Saucier and many others who hit this more definitely (see Harper and Rhodes and Davis et al who find MFT is not valid across non US/white populations, and just the other day Hadarics Márton’s paper, then of course, the heterogeneity in MFTs within and across ideologies identified by Frimer; then finding that measure is not stable across time (2 years= Smith et al), and then the findings that MFT appears to be caused by, rather than cause, the traits it is proposed to influence (for attitudes or political orientation see Hatemi Crabtree and Smith, Kivikangas et al, Ciuk, Everett et al., Strupp-Levitsky et al (Jost), and Márton and Kende among many others . Finally it’s not heritable (Smith et al.). Response: Thank you for this helpful comment: We now cite each of these papers mentioned in the revision. 
We say (See lines 141-151 page 6 and lines 185-191, page 9): “Strupp-Levitsky et al. [27] suggest that rather than being causal, the foundations build on other, more basic, variables such as empathy, need for closure and need for cognition. In a study manipulating partisan and group identity cues by embedding these in modified MFQ items, Ciuk [28] reported that item endorsement was affected by partisan alignment, supporting the conclusion that causality runs from political ideology to moral foundations. At the psychometric level, Iurino and Saucier [5] tested measurement invariance of the 5-factor structure of the MFQ in 27 countries and concluded that there was little support for a five-factor solution for the questionnaire. Similarly, in a US sample, Davis et al. [29] tested measurement invariance of the MFQ in Black and White samples, concluding that the assumption of scalar invariance could not be supported. Jointly, these pose considerable challenges for a measure that aims to be culturally universal, perhaps especially problems in finding a well-fitting model as a basis for prediction.” We continue “More recently, Harper & Rhodes [32] tested the factor structure of the MFQ in two British samples (total N = 750), confirming that the proposed five-factor structure was not psychometrically sound according to accepted metrics. They also tested an extended MFQ, including the nine items of the sixth “Liberty” foundation proposed by Haidt and colleagues [12]. Adding the Liberty scale, however, did not lead to a well-fitting six-factor model, and instead was better explained by a three-factor model comprising “traditionalism”, “compassion” and “liberty”. Comment: In this, I think the paper needs to redo the front end. The above papers are not critiques, so it a bit disingenuous to frame them as such. Rather, most of them were studying a trait of interest, and in so doing found out the predictor variable (MFQ) was garbage. 
Response: We have clarified that these papers use the MFQ as a trait of interest. We say “moral foundations theory and the MFQ measure have been criticized both on theoretical and empirical grounds (e.g. by papers using it as a trait of interest)”. See line 127-129, page 5. Comment: The comparison of fits in the paper seem highly selective. Response: We are not quite sure what this comment refers to. We added two new references to Table 1 (see page 8), citing Harper & Rhodes (2021) and Hadarics & Kende (2017) findings, both supporting the conclusion that MFQ falls short of the acceptable degree of model fit. In the text we now say “No reports, to our knowledge, have resulted in satisfactory fit (see Table 1 for a sample of fits for different MFQ models in different samples and cultures).”, lines 168-170, page 7. Comment: …One cannot simply just restructure the MFQ into different domains, without then updating the theory. …The MFQ is the proposed measure of Haidt MFT theory. It has specific logic, organizing principles evolutionary roots etc that link the domains to each other and the MFQ questions. [The present] paper is centered around offering a different factor structure to the data that conflicts with the theory as written. What are you competing hypotheses? What parts of your findings invalidate Haidt’s theory? What part support it? What parts of Haidt’s theory require modification based on your results, what part require abandonment? This is important because if one wanted to use this new formulation as a means to lets say, run different analyses, like behavior genetic ones for example, then this should be the paper to address those questions. Otherwise it is simply a data exercise – which is not a bad thing, but just limited in what it offers. 
Response: We agree that enumerating which parts of our findings invalidate Haidt’s theory and which support it, and, thus, which parts of Haidt’s theory require modification based on our results is important, and we strove in the discussion to do this. We have taken a second look at the discussion rewriting where possible to make the differences and similarities between the MFQ predictions and our seven-factor model clearer. Regarding the individualizing foundations we now say on lines 416-425, page 18: “The harm/care and fairness/reciprocity foundations reproduced with perfect fidelity: that is for each of these foundations, all 6 items loaded on a single factor in the 7-factor model, supporting the MFT.” Regarding three binding foundations we say: “By contrast, the well-fitting model draws firm distinctions between sanctity and purity and between loyalty to country and loyalty to what we termed clan (combining family and community)”. We have also added a figure showing the mapping from the 5 hyphenated original foundations to the 7-factor model. (Figure 6). This suggests a need to modify MFT to account for additional distinct evolutionary selection pressures which would cause distinct sanctity and purity systems (rather than a single system processing both these kinds of information) and distinct systems for loyalty to kin and to country (rather than a single system processing both these kinds of information). In this work we were constrained by the existing MFQ items which limits our ability to propose and test competing hypotheses. We did not enter with competing hypotheses, but rather generated these in interpreting the results of study 1, before replicating them in four independent datasets. 
We plan to test our model in future work and regarding this we now say (on lines 609-612, page 26): “A future theory of moral judgment will need to propose and test hypotheses about the selective pressures on loyalty to country vs loyalty to kin and purity vs sanctity factors. In addition, constructing a modified measure of moral foundations may require generating new items to cover the larger number of domains” Comment: One important and serious concern. You need permissions to use other peoples replication data for anything other than replication. Posting data for replication is for replication. Here you are using replication data for novel purposes. Since you used my data and I was never asked, I’m suspecting you did the same for others. It is an ethical question to take replication data in the manner you’re using it. Response: Thank you for noting this. We have emailed the corresponding authors informing them about our use of their data in our work, receiving positive replies from all of them. 
We note in the text the licensing of each of these open access datasets (Studies 2-5), which are covered by licenses allowing any use of the data, not only replication, as follows (see lines 766-775, pages 32-33):

Study 2: CC0 license (You can copy, modify, distribute and perform the work, even for commercial purposes, all without asking permission) https://dataverse.harvard.edu/dataset.xhtml?persistentId=doi:10.7910/DVN/SJTRBI
Study 3: CC0 license https://dataverse.harvard.edu/dataset.xhtml?persistentId=doi:10.7910/DVN/WTUGFZ
Study 4: CC BY 3.0 license (Free to remix, transform, and build upon the material for non-profit purpose) https://data.mendeley.com/datasets/sbmwmsynxk/1
Study 5: CC BY 4.0 license (Free to remix, transform, and build upon the material for any purpose) https://frontiersin.figshare.com/articles/dataset/Data_Sheet_1_The_Association_Between_Disgust_Sensitivity_and_Negative_Attitudes_Toward_Homosexuality_The_Mediating_Role_of_Moral_Foundations_xls/8234243

Comments from Reviewer 2 Comment: This is one of the empirically strongest papers on the Moral Foundations Questionnaire, with strengths that the paper itself accurately touts. But there are some caveats and weaknesses of which the authors seem unaware. Response: We thank the reviewer for this fulsome compliment. Comment: The samples are – except for Study 5, the smallest sample – based on predominantly Western respondents (in fact, predominantly Anglo-American). Page 16 claims that study 1 identified a ‘reliable basic structure’ but this might be true only for certain populations. In general, the paper needs more caveats about potential inapplicability to non-Western populations. I don’t think we should be having a few Chinese university attendees representing (standing in for) the entire non-Western world as is done here. Response: We agree that the lack of global and cultural coverage is a limitation of the study.
We now mention this in the Limitations section of the paper’s discussion, saying “though, given that we used only one non-Western sample, this requires validation”. See line 709, page 30. Comment: With respect to the one non-Western sample, the paper fails to note important details (the sample was evidently almost 80% female) and there are some questions. What universities were the respondents from and how Westernized were they? Why was this particular sample (among all non-Western samples administered the MFQ) chosen, and did that involve cherry-picking the sample most likely to be supportive of the select model? Response: Thank you for pointing this out. In addition to stating that the sample is 80% female in the participants’ section, we now note this in the Limitations section of the paper’s discussion. We say “This sample also had higher proportion of females (78 %) compared to other samples”. See lines 717-718, pages 30-31. We used this sample because it was the only non-Western sample with MFQ data available to us. Comment: One could also wonder if there is some degree of cherry-picked subjects in studies 1-4. The description of the study 1 sample does not rule out that these are all moral-psychology enthusiasts who volunteered. The study 2 was highly motivated – found their way to a website on moral psychology. Study 3 and 4 samples were from MTurk, but might be called ‘cream of the crop’ MTurkers based on the selection criteria; that’s well and good unless it means these results are dependent on using ‘elite questionnaire-response professionals’ and are ungeneralizable to a typical population. What percentage of MTurk participants meet the criteria specified, and were any MTurk participants recruited but then eliminated (in order to make favorable results more likely)? (A general take-home message might be that the intended structure of the MFQ is fragile, hard to locate in your typical noisy data.)
Response: We agree that this is a potential limitation of the study and have added it as a limitation. We now say in the Limitations section of the paper’s discussion (lines 721-725, page 31): “It is also worth mentioning that in Study 2 we used data provided by participants who voluntarily visited a moral psychology website and, presumably, were highly motivated. Similarly, in Study 3-4 we used data provided only by MTurk participants with high approval rate. This may have affected how representative of general population these samples are”. Comment: The models employed are entered in a particular order that was apparently set a priori (is there a pre-registration to verify this?), but the question is raised whether, when all is said and done, one (or more) of them might be unnecessary. That is, perhaps by backward removal of the element adding least, a more parsimonious outcome might be reached. This is analogous to the forward versus backward methods in stepwise regression. Related questions: How does it make sense to enter a hierarchical model (which includes the five-factor model) before one enters the five-factor model, and what is the difference between ‘two-factor model’ and ‘binding and individualizing’ foundations in, for example, Table 8. Response: Thank you for this suggestion about parsimony and reducing a five-factor model rather than adding to it. Reducing the five-factor model to a smaller number of factors made the model’s fit worse, not better. We also added a sentence describing our approach: “Our approach was to develop models increasing in complexity from the simplest predicted model, to more complex structures, as required to achieve good fit”. See page 11, lines 248-250. We did not, unfortunately, pre-register this method. Comment: Following that same thread: The conclusion (though unstated) would seem to be that one can use the MFQ, but must really analyze it or make sense of its scores in a complicated and onerous manner.
Response: We include an item-level scoring system and share the models, which can be used to generate scores, on the OSF page of the paper. However, we agree that it would be desirable to add items to balance the new scales.
Comment: That is perhaps a problem for construct validity. An important area of future implications would be this: What do these results suggest about how to construct a better moral ‘foundations’ measure (e.g., one that would not be so profoundly affected by method variance, or by the pull of an underlying two-factor model and of political leanings and of acquiescence or general morality vs. amorality)? In other words, how can the measurement of moral foundations be cleaned up in these regards?
Response: Thank you for this suggestion. We now include very similar wording in the modelling implications section (see lines 691-693, pages 29-30 and lines 609-612, page 26): “Constructing a better measure of moral foundations may also require rephrasing of some items and perhaps creating new ones to remove or at least minimize the effects of method variance and possible bias” and “A future theory of moral judgment will need to propose and test hypotheses about the selective pressures on loyalty to country vs loyalty to kin and purity vs sanctity factors. In addition, constructing a modified measure of moral foundations may require generating new items to cover the larger number of domains.”
Smaller matters:
Comment 7: The middle paragraph on page 26 leads to considerable head-scratching. The first sentence is incomprehensible. As for the 2nd, why is social dominance picked out so prominently among all possible alternatives? What is the ‘alignment’ referred to there?
It would seem that one interpretation of the results is that ‘tilt’ and binding-individualizing make independent contributions because one cannot reduce the latter entirely to tilt (as some treatments of the topic have implied in the past), there being both liberal and conservative ways of endorsing binding AND of endorsing individualizing morality.
Response: Thank you for this correction. We removed the first sentence and provided a reference for the possible link between the tilt and social dominance. We say: “Future study could usefully focus on understanding this factor, including relating it to other constructs, for instance one or other component of the social dominance construct [48], which was previously offered as an explanation of liberal-conservative differences [56]”. See lines 642-645, page 28.
Comment: In Table 8 it is noteworthy that moving from model 3 to model 4 yielded a huge improvement in fit, which was not the case in other samples. Does this say something about how Chinese respondents characteristically handle morality or this questionnaire?
Response: Thank you for pointing this out. The improvement from model 3 to model 4 (adding Judgment and Relevance method factors) was present in all samples. One explanation for why it was more pronounced in the Chinese sample could be the way self-reflection (relevance items) and evaluative judgments (judgment items) are presented in the Chinese language. However, since it was the only non-Western sample, we decided not to speculate about the cause of this effect. We mentioned this anomaly in the discussion, saying about this sample “It also showed the largest model fit improvement by modelling two method factors”. See lines 718-719, page 31.
Comment: Page 13 identifies a set of three MFQ items and labels them as sanctity, but it is hard to apply this interpretation to the third item mentioned (about roles for men and women).
One could just as easily label this factor as ‘traditional gender roles/expectations’ (that also would fit 2 of the 3 items).
Response: We agree with the reviewer that an alternative name for this factor is possible. However, we chose to keep the term sanctity, as this is the usage from Haidt, in which both gender roles and religion are sanctified, e.g. “If the body is a temple housing divinity within, then people should not be free to use their bodies in any way they please; rather, moral regulations should help people to control themselves and avoid sin and spiritual pollution in matters related to sexuality, food, and religious law more generally” (Haidt and Graham, 2007).
Comment: Pages 13-14 separate out a patriotism factor from a loyalty factor, but it would seem more informative to label one of them as loyalty to country and the other as loyalty to family/team/group (i.e., to smaller-scope entities).
Response: Thank you for this suggestion. We have re-worked the discussion for study 1 and the ongoing discussion to pay close attention to clarity in terminology, and adopted a clearer naming scheme. We say (see lines 420-424, page 18): “This seven (rather than five) foundation model broke-out items from the sanctity/purity, authority/respect, and ingroup/loyalty foundations to form independent sanctity and purity foundations, and independent foundations of loyalty to country and loyalty to clan. These changes also altered the nature of the authority foundation, leaving it more obviously aligned around hierarchy.” We have also added a figure showing the mapping from the 5 hyphenated original foundations to the 7-factor model (Figure 6).
Comment: In reference to same on page 24, it is implied that Herbert Simon differentiated patriotism from loyalty, but this seems unlikely given how Simon’s viewpoint is stated. Clarity needed.
Response: Thank you for pointing this out. We have clarified that this refers to hierarchy.
We say (see lines 599-605, page 26): “This distinction of loyalty to one’s group and obedience to hierarchy, has a distinguished history in theory. In his work on administrative behavior, Simon [54] identified loyalty and obedience as the two necessary conditions for the existence of organisations, defining loyalty as the capacity to introject organizational objectives in place of one’s own aims and obedience as choosing to make one’s default response be to follow requests of a superior: a definition which corresponds closely to notions of respecting the wishes of those in authority.” We also refer to this factor (consisting of 4 Authority items) as “hierarchy” throughout the paper.
Comment: Until the very last part of the paper, there is a tendency to label one latent variable (created by constraining all items to load positively on it) as General Morality, although an equally plausible interpretation is Acquiescence. It would be better to mention both possible interpretations from the beginning.
Response: Thank you for the suggestion. We now call this the general/acquiescence factor throughout the paper.
Comment: In Table 1, the “(modified)” notation for the best-fitting model for the Iurino and Saucier (2018) paper needs some explanation.
Response: Thank you for pointing this out. We now explain “modified” in the table’s footnote. We say: “Iurino & Saucier (2018) used alternative model derived from exploratory factor analysis of original MFQ items”. See Table 1, page 8.
Comment: The tendency to overestimate one’s personality traits is (on p. 12) mischaracterized as halo effect. Halo effect is more commonly used to refer to overestimation of someone else’s positive traits. Doing it to yourself is self-enhancement or social desirability bias.
Response: Thank you for this correction. We now describe these as “self-enhancement or social desirability”. See lines 302-303, page 13.
Comment: It seems that some text on pages 17-18 is redundant with what was said before in the paper; this needs a check.
Response: Thank you for pointing this out. We removed the redundant text.
Submitted filename: Response to Reviewers.docx
8 Oct 2021
Remapping the foundations of morality: Well-fitting structural model of the Moral Foundations Questionnaire.
PONE-D-21-17063R1
Dear Dr. Zakharin,
We’re pleased to inform you that your manuscript has been judged scientifically suitable for publication and will be formally accepted for publication once it meets all outstanding technical requirements. Within one week, you’ll receive an e-mail detailing the required amendments. When these have been addressed, you’ll receive a formal acceptance letter and your manuscript will be scheduled for publication. An invoice for payment will follow shortly after the formal acceptance. To ensure an efficient process, please log into Editorial Manager at http://www.editorialmanager.com/pone/, click the 'Update My Information' link at the top of the page, and double check that your user information is up-to-date. If you have any billing related questions, please contact our Author Billing department directly at authorbilling@plos.org.
If your institution or institutions have a press office, please notify them about your upcoming paper to help maximize its impact. If they’ll be preparing press materials, please inform our press team as soon as possible, no later than 48 hours after receiving the formal acceptance. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information, please contact onepress@plos.org.
Kind regards,
Peter Karl Jonason
Academic Editor
PLOS ONE
Additional Editor Comments (optional):
Reviewers' comments:
Reviewer's Responses to Questions
Comments to the Author
1.
If the authors have adequately addressed your comments raised in a previous round of review and you feel that this manuscript is now acceptable for publication, you may indicate that here to bypass the “Comments to the Author” section, enter your conflict of interest statement in the “Confidential to Editor” section, and submit your "Accept" recommendation.
Reviewer #1: (No Response)
**********
2. Is the manuscript technically sound, and do the data support the conclusions? The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented.
Reviewer #1: Yes
**********
3. Has the statistical analysis been performed appropriately and rigorously?
Reviewer #1: Yes
**********
4. Have the authors made all data underlying the findings in their manuscript fully available? The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data (e.g. participant privacy or use of data from a third party) those must be specified.
Reviewer #1: Yes
**********
5. Is the manuscript presented in an intelligible fashion and written in standard English? PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here.
Reviewer #1: Yes
**********
6. Review Comments to the Author
Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters)
Reviewer #1: First, well done on asking to use others’ data; Kevin let me know you did. I’d suggest you list the grants that funded all of the data you used in the acknowledgements; whether you wish to thank the PIs is of course up to you.
I consider reviews to be suggestive. Ultimately it is up to the authors to decide what to put in their papers. I only give hard rejects when the analyses or understanding of the literature is so wrong that there is no hope for a meaningful or valid contribution. Overall, I’m supportive of publication, as I was originally. That said, I am disappointed in the minimal revision made and I think the paper undersells some major points. So yes, publish; it is a fine empirical paper. Whether you want it to be a better paper is always a choice between investing more time and just getting it out.
I do think it is a missed opportunity to not engage Haidt’s theory here. If your main takeaway is that the factor structure of the MFQ is not the 5 dimensions that Haidt argues for, and the items don’t fit as advertised into their subdimensions, then you achieve that. But despite valiant efforts, this means that half of MFT is not supported by the measures or data. This seems a rather important point and one the paper appears unwilling to engage. If the factor structure is not valid, then the theory or measures are not valid. As it stands there is now a mismatch between MFT and MFQ. The current paper sidesteps this question, but in my view this should be the paper to actually engage it.
A single sentence of “let someone else do it” seems both a missed opportunity and also a sin. Well, Lindon would say that’s too strong a word, and I agree, but I don’t have his vocabulary, so I am going with it. But I’ll ask you this: what other paper would this be addressed in? Empiricism without a theoretical segment has value but is limited. Read the book, go at it straight on, remains my suggestion.
**********
7. PLOS authors have the option to publish the peer review history of their article (what does this mean?). If published, this will include your full peer review and any attached files. If you choose “no”, your identity will remain anonymous but your review may still be made public. Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.
Reviewer #1: No
13 Oct 2021
PONE-D-21-17063R1
Remapping the foundations of morality: Well-fitting structural model of the Moral Foundations Questionnaire.
Dear Dr. Zakharin:
I'm pleased to inform you that your manuscript has been deemed suitable for publication in PLOS ONE. Congratulations! Your manuscript is now with our production department.
If your institution or institutions have a press office, please let them know about your upcoming paper now to help maximize its impact. If they'll be preparing press materials, please inform our press team within the next 48 hours. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information please contact onepress@plos.org.
If we can help with anything else, please email us at plosone@plos.org.
Thank you for submitting your work to PLOS ONE and supporting open access.
Kind regards,
PLOS ONE Editorial Office Staff
on behalf of
Dr. Peter Karl Jonason
Academic Editor
PLOS ONE