
A Psychometric Evaluation of the DSM-IV Criteria for Antisocial Personality Disorder: Dimensionality, Local Reliability, and Differential Item Functioning Across Gender.

Muirne C S Paap¹, Johan Braeken², Geir Pedersen³,⁴, Øyvind Urnes⁴, Sigmund Karterud⁴, Theresa Wilberg⁴, Benjamin Hummelen⁴.

Abstract

This study aims to evaluate the psychometric properties of the antisocial personality disorder (ASPD) criteria in a large sample of patients, most of whom had one or more personality disorders (PDs). PD diagnoses were assessed by experienced clinicians using the Structured Clinical Interview for Diagnostic and Statistical Manual of Mental Disorders, 4th edition, Axis II PDs. Analyses were performed within an item response theory framework. The results indicated that ASPD is a unidimensional construct that can be measured reliably at the upper range of the latent trait scale. Differential item functioning across gender was restricted to two criteria and had little impact on the latent ASPD trait level. Patients fulfilling both the adult ASPD criteria and the conduct disorder criteria had latent trait distributions similar to those of patients fulfilling only the adult ASPD criteria. Overall, the ASPD items fit the purpose of a diagnostic instrument well, that is, distinguishing patients with moderate from those with high antisocial personality scores.


Keywords: antisocial personality disorder; conduct disorder; gender bias; item response theory; psychopathy

Year: 2017 PMID: 29284276 PMCID: PMC6906540 DOI: 10.1177/1073191117745126

Source DB: PubMed Journal: Assessment ISSN: 1073-1911

Antisocial personality disorder (ASPD), as described by the fifth edition of the Diagnostic and Statistical Manual of Mental Disorders (DSM-5; American Psychiatric Association [APA], 2013), is defined by a set of seven criteria, at least three of which must be fulfilled to establish the diagnosis. In addition, there should be evidence of conduct disorder (CD) with onset before age 15 years. The term antisocial personality was introduced in the DSM system in 1968 with the publication of the second edition (DSM-II; APA, 1968). According to this manual, a person with antisocial personality is grossly selfish, callous, irresponsible, impulsive, unable to feel guilt or to learn from experience and punishment, and has low frustration tolerance. In the third edition of the DSM and its revision (APA, 1980, 1987), more emphasis was placed on overt behavior in defining the ASPD criteria, with the intention of obtaining greater diagnostic reliability (Widiger et al., 1996). In DSM-IV (APA, 1994), ASPD is conceptualized using a “hybrid approach,” including criteria that are more personality-oriented and criteria that are more behavior-focused (Widiger et al., 1996; see also Table 1). The ASPD criteria were not changed from DSM-IV to DSM-5 (APA, 2013).

Table 1.

ASPD Criteria According to DSM-IV and DSM-5.

1. Failure to conform to social norms with respect to lawful behaviors as indicated by repeatedly performing acts that are grounds for arrest

2. Deception, as indicated by repeatedly lying, use of aliases, or conning others for personal profit or pleasure

3. Impulsivity or failure to plan ahead

4. Irritability and aggressiveness, as indicated by repeated physical fights or assaults

5. Reckless disregard for safety of self or others

6. Consistent irresponsibility, as indicated by repeated failure to sustain consistent work behavior or honor financial obligations

7. Lack of remorse, as indicated by being indifferent to or rationalizing having hurt, mistreated, or stolen from another

Note. ASPD = antisocial personality disorder. At least three criteria are required for an ASPD diagnosis, in addition to evidence of childhood conduct disorder.

Prevalence rates for ASPD in community samples range from 0.2% to 3.6% (Grant et al., 2005; Torgersen, Kringlen, & Cramer, 2001). This broad range may partly be due to differences in assessment procedures. For instance, Trull, Jahng, Tomko, Wood, and Sher (2010) demonstrated significant reductions of personality disorder (PD) prevalence rates in the study of Grant et al. (2005) by requiring that each PD criterion be associated with significant distress or impairment. In clinical settings, prevalence rates are strongly influenced by sample characteristics. For instance, Zimmerman, Rothschild, and Chelminski (2005) found a prevalence of 3.1% in a general clinical outpatient practice, whereas Mariani et al. (2008) found a prevalence of 17.3% in a sample of treatment-seeking cocaine- and cannabis-dependent individuals. Although ASPD may be relatively infrequent in general outpatient clinics, it is important to assess it reliably and effectively, as its presence may have important consequences for clinical decision making. ASPD is typically assessed using a subscale of a broader instrument encompassing multiple PDs, such as the Structured Clinical Interview for DSM-IV Axis II PDs (SCID-II; First, 1994). It is as yet unclear whether the SCID-II ASPD subscale, which is explicitly based on the DSM-IV ASPD criteria, taps into one or multiple underlying factors (e.g., a personality-oriented factor and a behavior-oriented factor).
Studies focusing on ASPD or psychopathy (a construct closely related to ASPD) have not been consistent with respect to factorial structure: whereas some studies found evidence for a one-dimensional structure (Harford et al., 2013; Jane, Oltmanns, South, & Turkheimer, 2007; Rosenström et al., 2017), others found support for two or more factors (Hare & Neumann, 2008; Kendler, Aggen, & Patrick, 2012; Marcus, Lilienfeld, Edens, & Poythress, 2006). Another important point of discussion in the assessment of ASPD has been whether ASPD should be scored along a continuum or as a categorical diagnosis. Early taxometric studies mostly suggested that ASPD has a latent categorical structure (Haslam, 2003). However, most subsequent taxometric studies found support for a continuum approach (Edens, Marcus, Lilienfeld, & Poythress, 2006; Guay, Ruscio, Knight, & Hare, 2007; Marcus et al., 2006). Such a continuum approach may be helpful from a clinical point of view: some treatment programs for PDs may tolerate patients with low-grade ASPD but not those who are severely disturbed (Bateman, O’Connell, Lorenzini, Gardner, & Fonagy, 2016). In the PD field, the overall number of PD criteria met is often taken as a measure of general PD severity (Hopwood et al., 2011). This is, however, not an optimal approach, since certain criteria may be stronger indicators of PD severity than others. The same applies to obtaining a score reflecting ASPD severity. A more suitable alternative would be to estimate latent ASPD severity scores using item response theory (IRT) models (Reise & Revicki, 2014); these models have the advantage that they can be used to evaluate item (criterion) properties and take these properties into account when estimating a latent severity score.
Several authors have suggested that measurement bias across gender might be present in items measuring ASPD, as some items seem to describe more male-specific behavior, for example, “Irritability and aggressiveness, as indicated by repeated physical fights or assaults” (Dolan & Vollm, 2009; Widiger, 1998). Measurement bias across gender can be investigated by testing whether items show differential item functioning (DIF; e.g., Holland & Wainer, 1993) for gender. DIF is present if the item parameters in one group differ from those in the other group (discrimination parameter and/or threshold parameter). In other words, gender-based DIF would imply that men would be more likely (or less likely) to obtain a given item score compared with women who exhibit a similar trait level. Jane et al. (2007) conducted a DIF analysis on the Structured Interview for DSM-IV Personality items (Pfohl, Blum, & Zimmerman, 1997), using a nonclinical sample (United States Air Force recruits and undergraduate college students). The respondents were assessed by doctoral-level clinical psychologists and graduate students in clinical psychology. Jane et al. (2007) found DIF for three ASPD items, all focused on behavior: Item 1 (failure to conform), Item 4 (aggressiveness), and Item 5 (reckless disregard). These items were more likely to be endorsed by men than by women with comparable trait levels. The authors concluded that their results “reinforce the possibility that the current ASPD criteria do not adequately reflect how the construct is expressed in women.” DIF for behavioral items was also found in a study using the Psychopathy Checklist–Revised, conducted by Bolt, Hare, Vitale, and Newman (2004). In this study, items that belonged to the antisocial/lifestyle domain (Factor 2) were more prone to display DIF than the affective/interpersonal items (Factor 1). 
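To make the definition of DIF concrete, the sketch below evaluates one cumulative item curve (of the form used by the graded response model described in the Method section) for a man and a woman at the same trait level. The discrimination (1.53) and the women's threshold (2.62) are the pooled Item 4 estimates from Table 2, borrowed purely for illustration; the men's threshold of 2.20 is a hypothetical value, not an estimate from this study.

```python
import math


def p_at_least(theta, a, b):
    """Cumulative logistic curve: P(item score >= category with threshold b | theta)."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


A = 1.53         # discrimination shared by both groups (pooled Item 4 estimate, Table 2)
B3_WOMEN = 2.62  # pooled Item 4 threshold for a score of 3, used here as the women's value
B3_MEN = 2.20    # HYPOTHETICAL lower threshold for men, chosen only to illustrate DIF

theta = 2.0  # a man and a woman with the same latent ASPD level
p_women = p_at_least(theta, A, B3_WOMEN)
p_men = p_at_least(theta, A, B3_MEN)
print(f"P(score 3 | theta = 2): women {p_women:.3f}, men {p_men:.3f}")
```

Because only the threshold differs between the groups, the gap in probabilities reflects uniform DIF; if the discriminations differed as well, the size of the gap would change along the trait scale (nonuniform DIF).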
The Bolt et al. (2004) study was based on a sample of criminal offenders, among whom female participants may exhibit more male-like antisocial behavior. The generalizability of the results obtained by Jane et al. (2007) and Bolt et al. (2004) has not been sufficiently tested. In the present study, we aim to extend the literature by carefully assessing whether gender-related DIF is found in the SCID-II ASPD subscale in a large clinical sample; in contrast to the study by Jane et al. (2007), the ASPD criteria were assessed by experienced clinicians. Among the specific PDs in DSM-IV and DSM-5, ASPD is the only one that requires the presence of a childhood precursor, that is, CD with onset before age 15 years. Although there seems to be good empirical evidence for continuity between CD and ASPD (Gelhorn, Sakai, Price, & Crowley, 2007; Moffitt et al., 2008; Robins, 1978), a substantial number of individuals fulfilling the adult ASPD criteria do not meet criteria for a prior CD diagnosis (Kim-Cohen et al., 2003), and most comparison studies so far have not found clinically significant differences between antisocial individuals with and without CD (Black & Braun, 1998; Perdikouri, Rathbone, Huband, & Duggan, 2007). However, in a study of 327 male prisoners assessed with the SCID-II, Walters and Knight (2010) reported that antisocial individuals with evidence of prior CD showed more severe adult antisocial features, that is, higher levels of criminal thinking, antisocial attitudes, and behavioral adjustment difficulties. Moreover, CD symptom count appeared to have moderate utility in forecasting institutional misconduct in a study of 353 inmates, of whom 185 had ASPD (Edens, Kelley, Lilienfeld, Skeem, & Douglas, 2015). Since the severity of antisocial features is relevant in clinical decision making, it is of special importance to know whether retrospectively assessing CD symptoms may help determine the severity of ASPD.
Since this question appears to be as yet unresolved, more studies are needed, preferably using large clinical samples and a modern psychometric approach.

Aims of the Study

The aim of this study is to perform a psychometric evaluation of the adult DSM-IV ASPD diagnostic criteria, as assessed by experienced clinicians using the SCID-II (First, 1994), in a large sample of personality-disordered patients. More specifically, we will examine whether the SCID-II ASPD items tap into a common underlying trait, whether they can be used for reliable measurement, and whether they are free of measurement bias across gender. Moreover, we will investigate the diagnostic relevance of CD by comparing latent ASPD severity levels obtained by IRT across four diagnostic groups: (1) patients with ASPD according to DSM-IV (i.e., ASPD with CD), (2) patients with three or more ASPD criteria without CD (late-onset ASPD), (3) patients without ASPD but with evidence of prior CD, and (4) patients without ASPD and without evidence of prior CD.

IRT (Embretson & Reise, 2000) provides a powerful framework and toolbox for psychometric evaluation. IRT encompasses a family of measurement models that focus on explaining the dependencies between item responses within a person and between persons. IRT models are especially suitable for dichotomous or polytomous (e.g., Likert-type scale) item response data, where the items are expected to measure a common latent trait. The reliability of a measurement instrument is usually represented by a single fixed number, such as Cronbach’s alpha; yet this conflicts with the fact that a test cannot be expected to measure each person equally efficiently along the latent trait dimension. In IRT, this problem is solved by using (Fisher) information as an estimate of measurement precision/reliability conditional on the latent trait value. The function showing information across latent trait values is known as the test information function.
Since the goal of the instrument under study is diagnosis, we are interested in having sufficient information at relatively high latent trait values: the focus is on distinguishing patients with moderate levels of antisocial personality from those with high levels (i.e., those fulfilling the criteria). Since there has been some debate as to whether the ASPD criteria may focus on behavior more typical of men, we also checked for gender-related item bias or, in IRT terminology, differential item functioning. DIF can lead to measurement artifacts by masking or even inflating group differences, because the relationship between an item showing DIF and the latent trait is not identical for individuals belonging to different subgroups.

Method

Sample

The original sample consisted of 3,391 patients from the Norwegian Network of Personality Focused Treatment Programs (Karterud et al., 2003), admitted to treatment from 1996 to 2008 and diagnosed according to DSM-IV. Among these patients, 75 had missing criteria sets for the adult ASPD criteria (i.e., the ASPD criteria were not assessed or registered), and two patients had missing criteria sets for childhood CD. Moreover, one patient had a mismatch between ASPD diagnosis and the number of ASPD criteria. All these patients (N = 78) were excluded from the analyses, resulting in a sample of 3,313 individuals, of whom 924 were men (28%) and 2,389 were women (72%). Mean age was 37 (SD = 9.3) and 35 (SD = 9.3) years for men and women, respectively. All units in the network adhered to the same treatment model, consisting of short-term day treatment followed by long-term outpatient group therapy. All patients in the sample were admitted to day treatment, including those with ASPD. Most patients had a PD diagnosis (77%, N = 2,595). Fifty-six percent had one PD diagnosis, 15% had two PD diagnoses, and 6.5% had three or more PD diagnoses. Avoidant PD was the most frequent PD (37%), followed by borderline PD (22%) and PD not otherwise specified (17%). The majority (97%) of patients had one or more symptom diagnoses, mostly an affective disorder (74%) or an anxiety disorder (64%). Other frequent symptom disorders were eating disorder (12%) and substance use disorder (9%). Chi-square analyses revealed that ASPD was significantly associated with schizotypal PD (ϕ = .079, p < .001), paranoid PD (ϕ = .088, p < .001), narcissistic PD (ϕ = .122, p < .001), and borderline PD (ϕ = .171, p < .001). The prevalence of these disorders in the subgroup of patients with ASPD was 7% for schizotypal PD, 29% for paranoid PD, 9% for narcissistic PD, and 76% for borderline PD.

Measures

The SCID-II (First, 1994) is a semistructured clinical interview that covers the 11 DSM-IV personality disorders, including personality disorder not otherwise specified. The SCID-II follows a modular approach, in which PDs are assessed one at a time. The initial question for each SCID-II item closely follows the content of the corresponding DSM-IV criterion. The SCID-II items are accompanied by open-ended prompts that can be used to encourage patients to elaborate freely on their symptoms. At times, open-ended prompts can be followed by closed-ended questions to further clarify a specific PD symptom. In the current study, the focus is on the ASPD subscale, which consists of seven items. Each SCID-II item is rated in one of three response categories: 1 = absent or false; 2 = subthreshold (i.e., the threshold for the criterion is almost, but not quite, met); and 3 = threshold or true. To establish a DSM-IV ASPD diagnosis, the patient must also be (retrospectively) diagnosed with childhood CD. CD was diagnosed when at least three CD criteria were met. The SCID-II does not require that these criteria be confirmed by early caregivers or other sources of information. Interrater reliability studies have shown that adequate interrater reliability can be obtained with the SCID-II (Maffei et al., 1997; Weertman, Arntz, Dreessen, van Velzen, & Vertommen, 2003).

Procedures

All units in this study complied with the diagnostic and data collection procedures required for membership of the Norwegian Network. The SCID-II was administered by experienced clinicians, that is, health care professionals (mental health nurses, psychologists, or medical doctors) working at clinical units specialized in the assessment and treatment of PDs. Clinicians were trained in PD diagnostics through attendance at local courses and Network conferences. Final PD diagnoses were established by way of the longitudinal expert evaluation using all data (LEAD) standard (Spitzer, 1983). Tentative diagnoses were made at the time of admission on the basis of referral letters, self-reported history and complaints, and two structured clinical diagnostic interviews: (1) the Mini-International Neuropsychiatric Interview for Axis I diagnoses (Sheehan et al., 1994) and (2) the SCID-II for PDs (First, 1994). During the 18 weeks of day treatment, therapists could confirm or revise diagnoses based on information gathered in a variety of clinical situations. A final PD diagnosis required that the criteria from the original SCID-II protocol were confirmed by clinical observations. It is assumed that the LEAD procedure resulted in more valid diagnoses (Pedersen, Karterud, Hummelen, & Wilberg, 2013).

Psychometric Analyses

Dimensionality Analyses

To ascertain whether the SCID-II ASPD items form a scale and thus measure one underlying trait, we assessed their dimensionality using two complementary methods: confirmatory Mokken Scale Analysis (MSA), a nonparametric method, and the Empirical Kaiser Criterion (EKC), an eigenvalue-based method. The dimensionality analyses were run for the total sample first, followed by separate analyses by gender. In recent years, MSA has increased in popularity in the fields of psychological and health assessment (e.g., Chou, Lee, Liu, & Hung, 2017; Lenferink et al., 2016; Murray, McKenzie, Murray, & Richelieu, 2014; Stewart, Allison, Baron-Cohen, & Watson, 2015; van den Berg, Paap, & Derks, 2013; Watson et al., 2012). MSA identifies scales that allow individuals to be ordered on an underlying trait using unweighted sum scores. To ascertain which items covary and form a scale, scalability coefficients are calculated at three levels: item pairs ($H_{ij}$), items ($H_i$), and the scale ($H$). The scale coefficient $H$ aggregates the item-pair and item coefficients and reflects the degree to which the scale can be used to reliably order persons on the latent trait using their sum score. A scale is considered acceptable if 0.3 ≤ H < 0.4, good if 0.4 ≤ H < 0.5, and strong if H ≥ 0.5 (Mokken, 1971; Sijtsma & Molenaar, 2002).

Eigenvalue-based methods are among the most popular and common methods for dimensionality assessment. Unfortunately, possibly for historical and/or ease-of-access reasons, many applied researchers still rely on flawed criteria. In particular, the eigenvalue-greater-than-1 rule, also known as the Kaiser criterion (Kaiser, 1960), has repeatedly been shown to have low accuracy (this is not a recent finding; see, e.g., Velicer, Eaton, & Fava, 2000; Zwick & Velicer, 1986). Braeken and van Assen (2017) clarify that the Kaiser criterion fails because it does not account for sampling variation in eigenvalues.
To remedy this shortcoming, they proposed a modification based on the asymptotic sampling distribution of eigenvalues. Instead of comparing the observed sample eigenvalues to a fixed reference value of 1, the EKC establishes the reference eigenvalues that can be expected for a data set of a given size (i.e., persons by items) if no factor structure were present. The number of dimensions to retain then corresponds to the length of the series of first-ranked eigenvalues that are all greater than these null-reference eigenvalues. Graphically, this simply means finding the point where the line formed by the reference eigenvalues crosses the scree plot of observed sample eigenvalues (for an easy-to-use web applet, see https://cemo.shinyapps.io/EKCapp). The EKC is a non-simulation-based relative of parallel analysis (which simulates null-reference eigenvalues), the current gold standard in the field (Garrido, Abad, & Ponsoda, 2013; Timmerman & Lorenzo-Seva, 2011). Simulation studies show that the EKC performs on par with parallel analysis for uncorrelated scales, and even better than parallel analysis for short correlated scales.
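The EKC reference values have a closed form. The sketch below follows our reading of Braeken and van Assen (2017): the first reference eigenvalue is the asymptotic Marchenko-Pastur upper bound (1 + √(p/n))², where p is the number of items and n the number of persons, and each later reference is that bound scaled by the average variance left after removing the preceding observed eigenvalues, with a floor of 1. The function names are illustrative. Given the two leading eigenvalues reported in the Results for men (n = 924, p = 7), it reproduces the reported reference values of about 1.18 and 1.00.

```python
import math


def ekc_reference(leading_eigenvalues, n_items, n_obs):
    """Empirical Kaiser Criterion reference eigenvalues (sketch).

    The j-th reference value only needs the observed eigenvalues before
    position j, so a prefix of the observed eigenvalues is sufficient.
    """
    mp_bound = (1 + math.sqrt(n_items / n_obs)) ** 2  # largest null eigenvalue, asymptotically
    refs = []
    explained = 0.0
    for j, lam in enumerate(leading_eigenvalues, start=1):
        remaining_avg = (n_items - explained) / (n_items - j + 1)
        refs.append(max(remaining_avg * mp_bound, 1.0))
        explained += lam
    return refs


def n_dimensions(eigenvalues, refs):
    """Length of the run of leading eigenvalues that exceed their reference values."""
    count = 0
    for lam, ref in zip(eigenvalues, refs):
        if lam <= ref:
            break
        count += 1
    return count


men_eigs = [2.84, 0.96]  # leading eigenvalues reported for men in the Results
refs = ekc_reference(men_eigs, n_items=7, n_obs=924)
print([round(r, 2) for r in refs])   # reference eigenvalues
print(n_dimensions(men_eigs, refs))  # number of dimensions to retain
```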

IRT Model

The graded response model (GRM; Samejima, 1996) was used to scale and evaluate the seven SCID-II ASPD items. The GRM applies to ordered categorical item scores. Let the variable $Y_{pi}$ represent the score of patient $p$ on item $i$, where the observed response can take the values $j = 1$, 2, or 3. The GRM directly models the cumulative conditional probability of scoring greater than or equal to each of the response options,

$$P(Y_{pi} \geq j \mid \theta_p) = \frac{\exp[a_i(\theta_p - b_{ij})]}{1 + \exp[a_i(\theta_p - b_{ij})]},$$

where $\theta_p$ is the position of patient $p$ on the latent trait scale and $a_i$ and $b_{ij}$ are item parameters describing how item $i$ is linked to the latent trait scale. The discrimination parameter $a_i$ expresses the degree to which item $i$ can differentiate between patients on the latent trait scale (i.e., higher values of $a_i$ indicate that small differences in position on the latent trait can lead to large changes in probability). The threshold parameter $b_{ij}$ indicates the position on the latent trait scale at which a patient would have a 50% probability of being assigned a score greater than or equal to $j$ on item $i$. The regular response probabilities can then simply be derived by taking differences between the cumulative probabilities:

$$P(Y_{pi} = j \mid \theta_p) = P(Y_{pi} \geq j \mid \theta_p) - P(Y_{pi} \geq j+1 \mid \theta_p).$$

Written out in full, this implies the following set of three category response curves:

$$\begin{aligned}
P(Y_{pi} = 1 \mid \theta_p) &= 1 - P(Y_{pi} \geq 2 \mid \theta_p),\\
P(Y_{pi} = 2 \mid \theta_p) &= P(Y_{pi} \geq 2 \mid \theta_p) - P(Y_{pi} \geq 3 \mid \theta_p),\\
P(Y_{pi} = 3 \mid \theta_p) &= P(Y_{pi} \geq 3 \mid \theta_p).
\end{aligned}$$

Note that $P(Y_{pi} \geq 1 \mid \theta_p) = 1$, because everyone will at least get $j = 1$, which is the lowest score that can be assigned to a patient. Hence, similar to the dummy coding principle for a categorical predictor, the number of threshold parameters for an item is always one less than the item’s number of response categories. The item score range stops at 3, so by definition $P(Y_{pi} \geq 4 \mid \theta_p) = 0$.
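A minimal Python sketch of the GRM category probabilities described above, using the Item 2 (deceitfulness) estimates from Table 2 (a = 2.06, b2 = 1.83, b3 = 2.59). It assumes the plain logistic form, without the 1.7 scaling constant used in some parameterizations; the function names are illustrative.

```python
import math


def cumulative_probs(theta, a, thresholds):
    """P(Y >= j | theta) for j = 1..K+1 under the logistic GRM.

    thresholds holds b_2..b_K; P(Y >= 1) = 1 and P(Y >= K+1) = 0 by definition.
    """
    inner = [1.0 / (1.0 + math.exp(-a * (theta - b))) for b in thresholds]
    return [1.0] + inner + [0.0]


def category_probs(theta, a, thresholds):
    """P(Y = j | theta) for j = 1..K, as differences of adjacent cumulative probabilities."""
    cum = cumulative_probs(theta, a, thresholds)
    return [cum[j] - cum[j + 1] for j in range(len(cum) - 1)]


a_2, b_2 = 2.06, [1.83, 2.59]  # Item 2 (deceitfulness), Table 2
for theta in (0.0, 1.83, 2.59, 3.5):
    probs = category_probs(theta, a_2, b_2)
    print(theta, [round(p, 3) for p in probs])
```

At θ = 1.83 the probability of scoring 2 or higher is exactly .50, which is what the threshold b2 means.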

Local Reliability: Test Information Function and Targeting

In IRT, measurement error is conceptualized in terms of information: more information means more precision, that is, less measurement error. The information a test provides about a patient’s position on the scale varies across the latent trait scale and is a direct function of the psychometric properties and scale positions of the items in the test. Given that the squared standard error of measurement equals the reciprocal of the test information, $SE^2(\theta) = 1/I(\theta)$, an estimate of local reliability can be computed as

$$\rho(\theta) = 1 - \frac{SE^2(\theta)}{\sigma^2_\theta} = 1 - \frac{1}{I(\theta)}.$$

The first equality stems from the traditional formulation of reliability as a ratio of variances, true variance divided by total variance, or equivalently, 1 minus error variance divided by total variance. The second equality stems from the reciprocal information-error relation and the fact that our scale metric in a GRM is standardized such that $\sigma^2_\theta = 1$.
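Inverting the local reliability formula gives the information needed for a target reliability, I = 1/(1 − ρ). The short sketch below computes the information values behind the four horizontal reference lines in Figure 2 (ρ = 0.6, 0.7, 0.8, and 0.9), assuming the unit-variance θ metric described above; the function names are illustrative.

```python
def reliability_from_information(info):
    """Local reliability rho(theta) = 1 - 1/I(theta) on a unit-variance theta metric.

    Values below 0 (when I < 1) mean the test is less precise than the prior spread.
    """
    return 1.0 - 1.0 / info


def information_for_reliability(rho):
    """Test information required to reach local reliability rho (0 <= rho < 1)."""
    return 1.0 / (1.0 - rho)


for rho in (0.6, 0.7, 0.8, 0.9):
    print(rho, round(information_for_reliability(rho), 2))
```

Reaching a local reliability of .90 therefore requires four times as much information as reaching .60 (10 vs. 2.5), which is why reliability gains become expensive at the high end.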

Measurement Bias Across Gender: Differential Item Functioning

We used a DIF model comparison approach[1] to screen for gender-related item bias in the seven SCID-II ASPD items. For more detailed information about this procedure, as well as other ways to assess DIF, we refer the reader to Thissen, Steinberg, and Wainer (1993) and Millsap (2011). Two reference models were estimated: a gender-equivalent model and a gender-nonequivalent model. The gender-equivalent model allows for scale-level differences in the mean and standard deviation of the latent trait between male and female patients, while constraining the item parameters to be equal across groups; in contrast, the gender-nonequivalent model allows for differences in both item discriminations and item thresholds for all items between male and female patients, while constraining the means and standard deviations to be equal across groups. If the gender-equivalent model shows better fit than the nonequivalent model, this would imply that there are only overall scale-level differences between males and females; if the opposite is true, it would imply that the scales for males and females are to some extent incomparable and that the ASPD criteria may function differently for males and females. If the gender-nonequivalent model shows better fit, a set of model comparisons is performed to establish which items cause the nonequivalence. This is done by taking the gender-equivalent model as a starting point and relaxing the equivalence constraints one item at a time. Once DIF items have been identified in this manner, Wald tests are used to assess whether the DIF is uniform across the scale (i.e., affects only the thresholds) or also varies across the scale (i.e., also affects the discrimination parameters; nonuniform DIF).
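One common way to evaluate the item-by-item step is a likelihood-ratio test between nested models; the source does not state which fit criterion was used, so this choice is an assumption. Relative to the gender-equivalent model, estimating one three-category GRM item separately per group adds three parameters (one discrimination, two thresholds), giving 3 degrees of freedom. The sketch uses only the standard library, with a closed-form chi-square tail probability for integer degrees of freedom; the function names are illustrative, and the log-likelihoods in the example are hypothetical, not values from this study.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variable with integer df (closed form)."""
    if x <= 0:
        return 1.0
    y = x / 2.0
    if df % 2 == 0:
        # Even df: exp(-y) * sum_{i < df/2} y^i / i!
        term, total = 1.0, 1.0
        for i in range(1, df // 2):
            term *= y / i
            total += term
        return math.exp(-y) * total
    # Odd df: start from df = 1 (erfc) and add y^a e^-y / Gamma(a + 1) for a = 1/2, 3/2, ...
    total = math.erfc(math.sqrt(y))
    a = 0.5
    while a < df / 2.0:
        total += math.exp(a * math.log(y) - y - math.lgamma(a + 1.0))
        a += 1.0
    return total


def lr_test(loglik_restricted, loglik_free, df):
    """Likelihood-ratio statistic and p-value for two nested models."""
    stat = 2.0 * (loglik_free - loglik_restricted)
    return stat, chi2_sf(stat, df)


# Hypothetical log-likelihoods: gender-equivalent model vs. the same model with
# one item's a, b2, and b3 estimated separately for men and women (3 extra parameters).
stat, p = lr_test(loglik_restricted=-8000.0, loglik_free=-7995.0, df=3)
print(f"LR = {stat:.2f}, df = 3, p = {p:.4f}")
```

When all seven items are screened this way, some correction for multiple testing is usually advisable.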

Software

All statistical analyses were coded and performed in the open source software program R version 3.2.3 (R Development Core Team, 2012). The GRM was estimated using a full information maximum likelihood approach in the R package mirt version 1.16 (Chalmers, 2012).

Results

Descriptive Statistics: Sample Prevalence and Gender Distribution of ASPD

Of the total sample of 3,313 patients (72% women), 108 patients scored a “3” on three or more ASPD items (48% women). Fifty-four of these patients (42% women) also fulfilled the criteria for childhood CD and were therefore diagnosed as having ASPD according to the DSM-IV (labeled “ASPD-DSM-IV”). The 54 patients (55% women) who did not fulfill the CD criteria were tentatively labeled “ASPD-late onset.” With three response categories per item and seven items, the total number of possible scoring patterns was 3⁷ = 2,187. Only 415 unique ASPD symptom endorsement patterns occurred (19% of 2,187), which is typical when studying a clinical diagnosis. One pattern occurred far more often than any other: the pattern 1111111 (i.e., absence of all ASPD-related symptoms) occurred 1,845 times. This indicates that the SCID-II ASPD items are not endorsed very often and can be expected to differentiate well between patients with and without ASPD. Furthermore, among the 108 patients scoring “3” on three or more ASPD items, 90 different endorsement patterns occurred, of which 74 were reported by a single patient only. Hence, there is no prototypical ASPD endorsement pattern.
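The pattern count and the diagnostic rule can be expressed in a few lines. The sketch below encodes the rule as described in this article (a “3” on at least three ASPD items, plus childhood CD) and the four diagnostic groups listed in the Aims; the function name, the “CD without ASPD” and “neither” labels, and the example patients are hypothetical.

```python
from collections import Counter

N_ITEMS, N_CATEGORIES = 7, 3
n_possible = N_CATEGORIES ** N_ITEMS  # number of possible response patterns
n_observed = 415                      # unique patterns observed in this sample
print(n_possible, f"{n_observed / n_possible:.0%}")


def aspd_group(item_scores, has_conduct_disorder):
    """Map seven SCID-II ASPD item scores (1-3) plus CD status to a diagnostic group."""
    adult_criteria_met = sum(score == 3 for score in item_scores) >= 3
    if adult_criteria_met and has_conduct_disorder:
        return "ASPD-DSM-IV"
    if adult_criteria_met:
        return "ASPD-late onset"
    if has_conduct_disorder:
        return "CD without ASPD"
    return "neither"


# Hypothetical patients: (item scores for Items 1-7, childhood CD present?)
patients = [
    ([1, 1, 1, 1, 1, 1, 1], False),
    ([3, 1, 3, 2, 3, 1, 1], True),
    ([3, 3, 1, 1, 1, 3, 2], False),
    ([2, 1, 1, 1, 3, 1, 1], True),
]
print(Counter(aspd_group(scores, cd) for scores, cd in patients))
```

Note that a “2” (subthreshold) never counts toward the cutoff, so the DSM-IV rule effectively dichotomizes each item.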

Dimensionality of the SCID-II ASPD Items

The scale-level H values exceeded the threshold of .3 in all MSA analyses (.371 and .336 for women and men, respectively, and .303 for the total sample). For the total sample, all but one item-level $H_i$ value exceeded .3; the remaining item had an $H_i$ value of .295. Taken together, these findings provide support for a weak to acceptable unidimensional scale. Figure 1 shows the scree plot accompanying the EKC results for the total sample. A sharp drop can be observed between the first and second components. Furthermore, the observed eigenvalue λ was higher than the reference value only for the first component (total sample: λ1 = 2.71 > EKC1 = 1.03, λ2 = .88 < EKC2 = 1.00; men: λ1 = 2.84 > EKC1 = 1.18, λ2 = .96 < EKC2 = 1.00; women: λ1 = 2.55 > EKC1 = 1.11, λ2 = .86 < EKC2 = 1.00). The EKC findings show very clear support for a unidimensional solution.
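The scalability coefficients reported above are ratios of observed to maximum possible covariance. As a minimal sketch, the function below computes $H_{ij}$, $H_i$, and $H$ for dichotomous (0/1) items, where the maximum covariance of two items with popularities p_i and p_j is min(p_i, p_j)(1 − max(p_i, p_j)). This is a simplification: the SCID-II items are scored 1 to 3, and the polytomous coefficients used in MSA define the maximum covariance via item steps. The function name and the toy data are hypothetical; in practice, a dedicated implementation such as coefH() in the R package mokken would be used.

```python
from itertools import combinations


def mokken_h(data):
    """Scalability coefficients for dichotomous (0/1) items.

    data: list of respondents, each a list of 0/1 item scores.
    Returns (h_ij, h_i, h): a dict of item-pair coefficients, a list of
    item coefficients, and the scale coefficient.
    Assumes no item is endorsed by everyone or by no one (else division by zero).
    """
    n = len(data)
    k = len(data[0])
    p = [sum(row[i] for row in data) / n for i in range(k)]  # item popularities
    cov, cov_max = {}, {}
    for i, j in combinations(range(k), 2):
        p_both = sum(row[i] * row[j] for row in data) / n
        cov[i, j] = p_both - p[i] * p[j]
        # Largest covariance the marginals allow: no one passes the less popular
        # item while failing the more popular one (no Guttman errors).
        cov_max[i, j] = min(p[i], p[j]) * (1 - max(p[i], p[j]))
    h_ij = {pair: cov[pair] / cov_max[pair] for pair in cov}
    h_i = []
    for i in range(k):
        pairs = [pair for pair in cov if i in pair]
        h_i.append(sum(cov[pr] for pr in pairs) / sum(cov_max[pr] for pr in pairs))
    h = sum(cov.values()) / sum(cov_max.values())
    return h_ij, h_i, h


# Hypothetical toy data: a perfect cumulative (Guttman) pattern, then the same
# data plus one respondent who breaks the cumulative ordering.
guttman = [[0, 0, 0], [1, 0, 0], [1, 1, 0], [1, 1, 1]]
with_error = guttman + [[0, 1, 0]]
print(round(mokken_h(guttman)[2], 3))
print(round(mokken_h(with_error)[2], 3))
```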

Figure 1.

Scree plot of eigenvalues. Empirical Kaiser criterion reference line depicted in gray.

Since both the MSA and EKC results provided support for a unidimensional scale, the IRT analyses were performed using the unidimensional GRM, which showed good model fit (root mean square error of approximation = .04; Tucker-Lewis index = .98; comparative fit index = .99).

Item Parameters and Local Reliability

The item parameters estimated with the GRM are reported in Table 2. As is characteristic of clinical data (Reise & Waller, 2009), both the discrimination parameters and the threshold values are fairly high; in other words, the items discriminate well, but mostly at the upper range of the latent trait scale. This finding is also illustrated by the test information function, which shows that information (and hence local reliability) is highest for latent trait values between 1 and 3 (see Figure 2). This is therefore the zone best targeted by the test, where patients can be differentiated with a high degree of precision. This matches the purpose of a diagnostic instrument well: distinguishing patients with moderate from those with high antisocial personality scores.

Table 2.

Item Parameters Based on the Graded Response Model for the SCID-II ASPD Items.

Item	Item content	a	b2	b3
#1	Failure to conform	2.33	1.22	1.77
#2	Deceitfulness	2.06	1.83	2.59
#3	Impulsivity	1.87	1.35	2.14
#4	Aggressiveness	1.53	1.50	2.62
#5	Reckless disregard	1.60	1.13	1.95
#6	Irresponsibility	1.82	1.65	2.64
#7	Lack of remorse	2.35	1.92	2.62

Note. SCID-II = Structured Clinical Interview for Diagnostic and Statistical Manual of Mental Disorders, 4th Edition, Axis II PD; ASPD = antisocial personality disorder. a = estimated discrimination parameter; b = estimated threshold parameter for item i indicating the position on the latent trait scale for which a patient would have 50% probability of being assigned a score greater than or equal to j. Following the SCID-II manual, responses were coded as 1, 2, or 3. Since the probability of scoring in Category 1 or higher equals 1, only b2 and b3 are reported.
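The test information function can be approximated directly from the Table 2 estimates. The sketch below sums GRM item information, I_i(θ) = Σ_j [P′_ij(θ)]² / P_ij(θ), over the seven items on a grid of θ values, assuming the logistic parameterization without a scaling constant. It is a re-computation from rounded published estimates, not the authors’ code, so its values may differ slightly from Figure 2; the function names are illustrative.

```python
import math

# Table 2 estimates: (a, b2, b3) per item
ITEMS = {
    "Failure to conform": (2.33, 1.22, 1.77),
    "Deceitfulness": (2.06, 1.83, 2.59),
    "Impulsivity": (1.87, 1.35, 2.14),
    "Aggressiveness": (1.53, 1.50, 2.62),
    "Reckless disregard": (1.60, 1.13, 1.95),
    "Irresponsibility": (1.82, 1.65, 2.64),
    "Lack of remorse": (2.35, 1.92, 2.62),
}


def logistic(x):
    return 1.0 / (1.0 + math.exp(-x))


def item_information(theta, a, thresholds):
    """Fisher information of one GRM item: sum over categories of P'(theta)^2 / P(theta)."""
    cum = [1.0] + [logistic(a * (theta - b)) for b in thresholds] + [0.0]
    # Derivative of each cumulative curve; the fixed boundaries 1 and 0 have derivative 0.
    dcum = [0.0] + [a * c * (1.0 - c) for c in cum[1:-1]] + [0.0]
    info = 0.0
    for j in range(len(cum) - 1):
        p = cum[j] - cum[j + 1]
        dp = dcum[j] - dcum[j + 1]
        if p > 0:
            info += dp * dp / p
    return info


def total_information(theta):
    """Test information: sum of the seven item information functions."""
    return sum(item_information(theta, a, (b2, b3)) for a, b2, b3 in ITEMS.values())


grid = [i / 100 for i in range(-200, 401)]  # theta from -2 to 4
infos = [total_information(t) for t in grid]
peak = grid[infos.index(max(infos))]
print(f"peak information {max(infos):.2f} at theta = {peak:.2f}")
print(f"local reliability at peak: {1 - 1 / max(infos):.2f}")
```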

Figure 2.

Test information function based on the graded response model, with estimated latent trait values (θ) on the x-axis and information conditional on θ on the y-axis. Four horizontal lines indicate the information values that correspond to reliability estimates of 0.6, 0.7, 0.8, and 0.9, respectively.

The category response curves, which depict the probability of choosing a particular response category as a function of the latent trait (here: antisocial personality), clearly indicate that the middle category (Category 2) is rarely endorsed; it hardly ever has a higher probability of being chosen than Category 1 or 3. This is the case for all items. Figure 3 contrasts one of the empirical sets of category response curves from our study with a hypothetical ideal set in which all categories contribute information. The corresponding item information curves in Figure 4 illustrate that, hypothetically, polytomous items have the potential to provide information across a wider range of the latent trait (i.e., multiple thresholds imply multiple peaks) than dichotomous items (i.e., only one threshold and hence at most one peak).
Our data illustrate, however, that polytomous items in which one of the categories is hardly ever dominant may add little value over dichotomous items in estimating a patient’s position on the latent trait scale.
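To make the information argument concrete, the following sketch computes Samejima's graded response model category probabilities and item information for a three-category item. The discrimination and threshold values are hypothetical, not the Table 2 estimates. With two well-separated thresholds the information is spread over a wider range of θ. With a single threshold the item behaves like a dichotomous item whose information is a²P(1 − P).

```python
import math


def boundary_probs(theta, a, thresholds):
    """P(X >= k | theta) for each category boundary of a graded response item.

    The list starts with 1.0 (every response is at least the lowest category)
    and ends with 0.0 (no response exceeds the highest category).
    """
    inner = [1.0 / (1.0 + math.exp(-a * (theta - b))) for b in thresholds]
    return [1.0] + inner + [0.0]


def category_probs(theta, a, thresholds):
    """Probability of each response category: differences of adjacent boundaries."""
    p_star = boundary_probs(theta, a, thresholds)
    return [p_star[k] - p_star[k + 1] for k in range(len(p_star) - 1)]


def item_information(theta, a, thresholds):
    """Item information: sum over categories of (dP_k/dtheta)^2 / P_k."""
    p_star = boundary_probs(theta, a, thresholds)
    # Derivative of each boundary curve; it equals 0 at the fixed 1.0/0.0 ends.
    d_star = [a * p * (1.0 - p) for p in p_star]
    info = 0.0
    for k in range(len(p_star) - 1):
        p_k = p_star[k] - p_star[k + 1]
        d_k = d_star[k] - d_star[k + 1]
        if p_k > 0.0:
            info += d_k ** 2 / p_k
    return info


if __name__ == "__main__":
    a = 2.0               # hypothetical discrimination
    spread = [-1.0, 1.0]  # hypothetical, well-separated thresholds b2 and b3
    single = [1.0]        # a single threshold (dichotomous item)
    for theta in (-2.0, -1.0, 0.0, 1.0, 2.0):
        print(f"theta={theta:+.1f}  3-category={item_information(theta, a, spread):.3f}"
              f"  dichotomous={item_information(theta, a, single):.3f}")
```

If the two thresholds move close together, which corresponds to a middle category that is rarely dominant, the three-category curve collapses toward the single-peaked dichotomous one.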

Figure 3.

Category characteristic curves for (1) a hypothetical item in which all categories contribute information (left) and (2) Item 2 (deceitfulness), for which Category 2 hardly contributes any information (right). In the left plot, each category is the dominant one (highest probability of being selected) over some range of latent trait values. In the right plot, Categories 1 and 3 clearly dominate Category 2.

Figure 4.

Item information functions for (1) Item 2 (deceitfulness) using all three categories (solid line), (2) the same item ignoring Category 2 (gray dotted line), and (3) a hypothetical item in which all categories contribute information (dashed black line).

To compare the DSM-IV scoring rule with IRT-based scoring, we calculated the latent trait score distributions for patients scoring below and above the DSM-IV ASPD cutoff (a “3” on at least three items). Figure 5 shows the latent trait distributions for every possible number of 3s, from zero to seven. As expected, the mean latent trait score increases with the number of items on which a “3” is scored. However, the figure also shows that IRT-based scores still vary within most groups. Near the DSM-IV cutoff, the score distributions still overlap considerably. For persons with exactly three 3s in particular, the items on which they score these 3s affect their IRT-based scores. Table 2 shows that the b values for Items 1, 5, and 3 are markedly lower than those for Items 4, 6, and 7. Scoring a “3” on Items 1, 5, and 3 would therefore result in a substantially lower latent trait score than scoring a “3” on Items 4, 6, and 7.
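Pattern-based scoring, in which the particular items rated “3” matter and not only how many, can be illustrated with expected a posteriori (EAP) estimation under a graded response model. The sketch below assumes a standard normal prior and an evenly spaced quadrature grid. It uses seven hypothetical items in which Items 4, 6, and 7 have both higher thresholds and higher discriminations than Items 1, 3, and 5. In this model the estimate depends on both parameters, so the example illustrates pattern-dependent scoring and does not reproduce Table 2. The study may also have used a different estimator.

```python
import math

# Hypothetical (a, b2, b3) parameters for seven 3-category items. These are NOT
# the study's estimates. Items 4, 6, and 7 get higher thresholds and discriminations.
ITEMS = [
    (1.5, 0.8, 1.0),  # Item 1
    (2.0, 1.2, 1.5),  # Item 2
    (1.5, 0.9, 1.1),  # Item 3
    (2.5, 1.8, 2.0),  # Item 4
    (1.5, 1.0, 1.2),  # Item 5
    (2.5, 1.9, 2.2),  # Item 6
    (2.5, 2.0, 2.3),  # Item 7
]


def category_prob(theta, a, b2, b3, x):
    """Graded response model probability of rating x in {1, 2, 3}."""
    p_ge2 = 1.0 / (1.0 + math.exp(-a * (theta - b2)))
    p_ge3 = 1.0 / (1.0 + math.exp(-a * (theta - b3)))
    return {1: 1.0 - p_ge2, 2: p_ge2 - p_ge3, 3: p_ge3}[x]


def eap(ratings, items=ITEMS, n_points=161, lo=-4.0, hi=4.0):
    """EAP estimate of theta with a N(0, 1) prior on an evenly spaced grid."""
    step = (hi - lo) / (n_points - 1)
    num = den = 0.0
    for i in range(n_points):
        theta = lo + i * step
        weight = math.exp(-0.5 * theta * theta)  # unnormalized prior density
        for x, (a, b2, b3) in zip(ratings, items):
            weight *= category_prob(theta, a, b2, b3, x)
        num += theta * weight
        den += weight
    return num / den


def n_threes(ratings):
    """Number of criteria rated 3. The DSM-IV rule used here needs at least 3."""
    return sum(1 for x in ratings if x == 3)


if __name__ == "__main__":
    low_items = [3, 1, 3, 1, 3, 1, 1]   # 3s on Items 1, 3, and 5
    high_items = [1, 1, 1, 3, 1, 3, 3]  # 3s on Items 4, 6, and 7
    for label, r in (("Items 1, 3, 5", low_items), ("Items 4, 6, 7", high_items)):
        print(f"{label}: {n_threes(r)} threes, EAP theta = {eap(r):.2f}")
```

Both patterns meet the three-3s cutoff, yet with these hypothetical parameters the pattern with 3s on Items 4, 6, and 7 receives the higher EAP estimate.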

Figure 5.

Boxplots showing the distribution of latent trait values (θ) for seven subgroups in the sample. Patients were assigned to these subgroups on the basis of the number of criteria on which they scored a “3”. Note that at least three 3s are needed to qualify for a diagnosis of antisocial personality disorder (cutoff). To facilitate interpretation, the boxplots for patients scoring above the cutoff are printed in gray.

Differential Item Functioning Across Gender

The first step in examining whether there was DIF for any of the items was comparing the gender-equivalent model (item parameters constrained to be equal) with the gender-nonequivalent model (unconstrained item parameters, equal means and standard deviations). Table 3 shows all the models that were estimated, and which model comparisons were made. The gender-equivalent model is used as the reference model in most cases and was therefore labeled as Model 0. The gender-nonequivalent model showed a significantly better fit compared with the equivalent model. Further model comparisons indicated that models in which the item parameters of Items 3 (impulsivity) and 5 (reckless disregard) were unconstrained (free to vary over groups) showed a significantly better fit than the equivalent model. This indicates that there was DIF for these items. The model where the item parameters of both these items were free to vary across groups was not significantly different from the nonequivalent model. This indicates that it was sufficient to relax the equivalence constraints on the item parameters for these two items (and constrain the other item parameters to be equal across groups).
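Each comparison in Table 3 is a likelihood-ratio test between nested models. The statistic is twice the difference in log-likelihoods, referred to a chi-square distribution with df equal to the difference in the number of free parameters. The sketch below implements this with a closed-form chi-square tail probability for integer df, so it needs only the standard library. The LL values in Table 3 are rounded to whole numbers, so statistics recomputed from them only approximate the reported χ² values. The example therefore plugs in the reported χ² directly.

```python
import math


def lr_statistic(ll_restricted: float, ll_full: float) -> float:
    """Likelihood-ratio statistic 2 * (LL_full - LL_restricted) for nested models."""
    return 2.0 * (ll_full - ll_restricted)


def chi2_sf(x: float, df: int) -> float:
    """Upper-tail probability P(X > x) of a chi-square variable with integer df.

    Uses closed forms of the regularized upper incomplete gamma Q(df/2, x/2).
    """
    if df < 1:
        raise ValueError("df must be a positive integer")
    if x <= 0.0:
        return 1.0
    y = x / 2.0
    if df % 2 == 0:
        # Even df: Q(m, y) = exp(-y) * sum_{i=0}^{m-1} y^i / i!
        term, total = 1.0, 1.0
        for i in range(1, df // 2):
            term *= y / i
            total += term
        return math.exp(-y) * total
    # Odd df: start from Q(1/2, y) = erfc(sqrt(y)) and add
    # y^(i + 1/2) * exp(-y) / Gamma(i + 3/2) for i = 0 .. (df - 3) / 2.
    total = math.erfc(math.sqrt(y))
    for i in range((df - 1) // 2):
        total += math.exp((i + 0.5) * math.log(y) - y - math.lgamma(i + 1.5))
    return total


if __name__ == "__main__":
    # Reported chi-square values and df from Table 3.
    for label, chi2, df in (
        ("Item #1 vs equivalent", 1.80, 3),
        ("Item #3 vs equivalent", 47.82, 3),
        ("Items #3 and #5 vs nonequivalent", 8.02, 13),
    ):
        print(f"{label}: chi2 = {chi2:.2f}, df = {df}, p = {chi2_sf(chi2, df):.3f}")
    # Recomputing one statistic from the rounded log-likelihoods in Table 3.
    print(lr_statistic(ll_restricted=-9822.0, ll_full=-9818.0))
```

Recomputing from the rounded log-likelihoods gives 8.0 for the Model 9 versus nonequivalent comparison, which is close to the reported 8.02.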

Table 3.

Overview of the Differential Item Functioning Model Comparison Results.

No.	Model	LL	Reference model	df	χ²	p
0	Equivalent		–	–	–	–
1	Nonequivalent	−9,818	Equivalent	19	97.39	<.001
2	DIF: Item #1	−9,866	Equivalent	3	1.80	.615
3	DIF: Item #2	−9,866	Equivalent	3	.98	.807
4	DIF: Item #3	−9,843	Equivalent	3	47.82	<.001
5	DIF: Item #4	−9,864	Equivalent	3	4.88	.181
6	DIF: Item #5	−9,837	Equivalent	3	59.20	<.001
7	DIF: Item #6	−9,866	Equivalent	3	.99	.803
8	DIF: Item #7	−9,866	Equivalent	3	.95	.812
9	DIF: Items #3 and #5	−9,822	Equivalent	6	89.38	<.001
9	DIF: Items #3 and #5	−9,822	Nonequivalent	13	8.02	.843

Note. LL = log-likelihood; DIF = differential item functioning.

The Wald tests provided evidence of uniform DIF only, not of nonuniform DIF. In other words, DIF affected only the thresholds and not the discrimination parameters (Item 3: Δa = .24, p = .189; Item 5: Δa = .13, p = .246). The thresholds for Item 3 were higher for male patients (Δb = .33 and .16; p < .001 and p = .006), whereas the thresholds for Item 5 were lower for male patients (Δb = −.53 and −.55; both p < .001). To make the effect size of these parameter differences easier to interpret, we also expressed them as differences in response probabilities (the difference between the category response curves). On average, female patients had a .07 higher probability than male patients with similar θ scores of scoring in Category 3 on Item 3, with a maximum difference of .17. For Item 5, DIF was associated with an average difference of .14 in favor of male patients in the probability of scoring in Category 3, with a maximum difference of .21. In summary, at a given latent trait level, female patients were more likely to be perceived as impulsive (Item 3), whereas male patients were more likely to be perceived as reckless (Item 5). An IRT model that ignored DIF (item parameters constrained to be equal across groups) yielded a lower mean θ for female than for male patients (Δ = −.56, p < .001). After correction for DIF, the group difference was slightly smaller but still significantly different from zero (Δ = −.52, p < .001). No group difference in variance was found.
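Threshold differences can be translated into response-probability differences as sketched below. Under uniform DIF only the thresholds differ, so the probability of Category 3 is P(X ≥ 3 | θ) = 1/(1 + exp(−a(θ − b3))) evaluated with each group's b3. The sketch assumes that the "average" difference is weighted by a standard normal density over θ. The source does not say how the average was taken, and the parameter values in the example are hypothetical.

```python
import math


def p_category3(theta, a, b3):
    """GRM probability of the highest category: P(X >= 3 | theta)."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b3)))


def dif_probability_gap(a, b3_ref, b3_focal, lo=-4.0, hi=4.0, n_points=401):
    """Average and maximum of P_focal - P_ref for Category 3 over a theta grid.

    The average is weighted by a standard normal density (an assumption).
    A positive gap means the focal group is more likely to score 3.
    """
    step = (hi - lo) / (n_points - 1)
    weighted_sum = weight_total = 0.0
    max_gap = 0.0
    for i in range(n_points):
        theta = lo + i * step
        gap = p_category3(theta, a, b3_focal) - p_category3(theta, a, b3_ref)
        w = math.exp(-0.5 * theta * theta)
        weighted_sum += w * gap
        weight_total += w
        if abs(gap) > abs(max_gap):
            max_gap = gap  # keep the signed gap with the largest magnitude
    return weighted_sum / weight_total, max_gap


if __name__ == "__main__":
    # Hypothetical Category 3 thresholds for one item in two groups.
    avg, mx = dif_probability_gap(a=2.0, b3_ref=2.05, b3_focal=1.5)
    print(f"average gap = {avg:.3f}, maximum gap = {mx:.3f}")
```

For a pure threshold shift Δb with a common a, the largest gap occurs midway between the two thresholds and equals tanh(aΔb/4). This is why the maximum differences exceed the averages.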

Diagnostic Relevance of Conduct Disorder

Finally, we compared latent trait distributions across four diagnostic groups: (1) patients with three or more ASPD criteria and CD (ASPD-CD), (2) patients with three or more ASPD criteria without CD (ASPD-late onset), (3) patients with fewer than three ASPD criteria with CD (CD-only), and (4) patients with fewer than three ASPD criteria and without CD (noASPD-noCD). The results are displayed in Figure 6. With the noASPD-noCD group as the reference group, the 95% confidence intervals for the difference in mean latent trait score were [1.2, 1.6] for CD-only, [2.5, 3.8] for ASPD-late onset, and [2.8, 3.9] for ASPD-CD. A confidence interval containing 0 would indicate that the group does not differ significantly from the reference group in average latent trait score, and none of the three intervals contains 0. Overlapping confidence intervals were taken to indicate that the groups in question do not differ significantly from each other. Here, the only overlapping intervals were those of ASPD-late onset and ASPD-CD. This is also visible in Figure 6. The score distributions of both ASPD groups lie clearly at the high end of the latent trait continuum. They are followed by the CD-only group and finally by the noASPD-noCD group, which is located around the midpoint of the scale. Note that the peak of the test information function (TIF) coincides well with the location of the two ASPD groups on the trait scale.
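The group definitions and the two interval rules above can be written out explicitly. The sketch below uses hypothetical helper names. It assigns a patient to one of the four groups from the number of adult ASPD criteria met and a CD indicator, then applies the stated rules to the reported intervals. Non-overlapping 95% intervals imply a significant difference, but overlap alone is a conservative heuristic for the absence of one.

```python
from itertools import combinations


def diagnostic_group(n_criteria_met: int, has_cd: bool) -> str:
    """Assign one of the four groups used in the text (>= 3 adult criteria = ASPD)."""
    aspd = n_criteria_met >= 3
    if aspd and has_cd:
        return "ASPD-CD"
    if aspd:
        return "ASPD-late onset"
    if has_cd:
        return "CD-only"
    return "noASPD-noCD"


def excludes_zero(ci):
    """True if the interval lies entirely above or below 0."""
    lo, hi = ci
    return lo > 0 or hi < 0


def overlaps(ci_a, ci_b):
    """True if two closed intervals share at least one point."""
    return ci_a[0] <= ci_b[1] and ci_b[0] <= ci_a[1]


# 95% CIs for the mean difference from the noASPD-noCD reference group (from the text).
REPORTED_CIS = {
    "CD-only": (1.2, 1.6),
    "ASPD-late onset": (2.5, 3.8),
    "ASPD-CD": (2.8, 3.9),
}

if __name__ == "__main__":
    for name, ci in REPORTED_CIS.items():
        print(f"{name}: differs from reference = {excludes_zero(ci)}")
    for a, b in combinations(REPORTED_CIS, 2):
        if overlaps(REPORTED_CIS[a], REPORTED_CIS[b]):
            print(f"overlapping intervals: {a} and {b}")
```

Running it flags all three groups as differing from the reference group and reports ASPD-late onset and ASPD-CD as the only overlapping pair, in line with the text.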

Figure 6.

Boxplots showing the distribution of latent trait values (θ) for four diagnostic subgroups in the sample.

Discussion

This study of a large clinical sample examined the psychometric properties of the ASPD criteria as defined by DSM-IV and assessed with the SCID-II (First, 1994). The results indicate that ASPD is a unidimensional construct that can be measured reliably at the upper range of the latent trait scale. There was some DIF across gender, but it was restricted to two items, Items 3 (impulsivity) and 5 (reckless disregard), and it had little impact on the latent ASPD trait level. Patients with three or more ASPD criteria without CD (ASPD-late onset) had levels of the underlying antisocial dimension similar to those of patients with ASPD according to DSM-IV (ASPD-CD).

Our IRT analyses showed that the SCID-II ASPD items had good item discrimination and covered the upper range of the latent trait scale, from 1 SD to 2.5 SDs above the mean. This fits the purpose of the DSM criteria well: differentiating between people with and without the disorder. If the aim were to differentiate along the entire scale (low from average severity, average from high, etc.), new items with lower threshold values would have to be added.

In accordance with previous studies (Bolt et al., 2004; Cooke & Michie, 1997; Jane et al., 2007), the items that showed DIF were more behavior focused. Female patients were more likely to be considered impulsive than male patients with similar ASPD scores. Male patients, in turn, were more likely to be considered reckless than female patients with similar ASPD scores. Importantly, the DIF we found did not have a substantial effect on latent trait scores at the group level. This is in line with the statement by Reise and Waller (2009, p. 38): “ . . . the presence of item-level DIF does not necessarily lead to bias at the level of scale scores.” Nevertheless, we concur with Reise and Waller (2009) and Orlando and Marshall (2002) that it is important to test for and detect DIF rather than to ignore potential DIF-related problems, even if DIF does not always lead to bias at the scale or group level.

In our sample, there was a marked gender imbalance (72% women). From a statistical viewpoint, equal group sizes would be preferable when studying DIF. However, if both groups are sufficiently large, the imbalance has much less impact than it would have in smaller samples. In our study, the smaller group was still quite large (N = 924), so we are confident that we had sufficient power to detect DIF.

As mentioned in the introduction, Jane et al. (2007) found DIF for three of the seven adult ASPD criteria. Of these three items, only one showed DIF in our sample: recklessness. In contrast to Jane and colleagues, we also found DIF for impulsivity, for which they found none. Another difference is that the DIF found by Jane and colleagues all went in the same direction: men were more likely than women with similar trait levels to endorse the DIF items. In our study, however, the two DIF items had opposite directions. DIF on the recklessness item could be explained by how the ASPD items are operationalized in the SCID-II. The recklessness questions focus on driving behavior and unsafe sex, which might be regarded as typically male behaviors. DIF on the impulsivity item might be explained by the fact that our study concerns a clinical sample with a high prevalence of borderline PD, for which impulsivity is also a diagnostic criterion. High comorbidity rates between ASPD and borderline PD are common in clinical samples of patients with severe personality pathology (Bateman et al., 2016). Overall, our results do not support the assertion of Jane et al. (2007) that the current ASPD criteria do not adequately reflect how the construct is expressed in women. In clinical populations, measurement bias across gender may be less prominent, at least when assessed by experienced clinicians using a structured clinical interview.

For patients with high latent antisocial trait values, a CD diagnosis was not associated with a further increase in trait levels. This finding corroborates earlier reports from clinical samples that found no clinically significant differences between antisocial patients with and without CD (Black & Braun, 1998; Perdikouri et al., 2007). However, it is at odds with the study of Walters and Knight (2010), who found that the presence of CD was associated with more severe antisociality. This discrepancy might be explained by methodological differences, since Walters and Knight included a more comprehensive assessment of antisocial features, for example, criminal thinking style and egocentricity. It might also be due to sample differences: a forensic sample of male individuals only versus a clinical sample of predominantly female patients, most of whom had a PD. In our sample, the CD-only patients (patients with childhood CD but without ASPD) had higher levels of the latent ASPD trait than the noASPD-noCD patients. These results suggest that even though the majority of children with CD may not develop “full-blown” ASPD (Robins, 1978), they are still at risk of developing antisocial traits. Moreover, the ASPD-related traits and behavior of what we labeled the ASPD-late onset group may not be adequately recognized or addressed, since these patients did not receive a formal diagnosis despite their high latent scores. It should be kept in mind, however, that the CD criteria were assessed retrospectively. It is uncertain to what degree a retrospective CD diagnosis accurately reflects the presence of CD during childhood.
Furthermore, the CD criteria were assessed in accordance with the SCID-II interview, which requires the presence of at least three CD criteria. In DSM-IV and DSM-5, however, the required number of CD criteria is not explicitly specified.

It is an implicit assumption, supported by empirical research, that personality traits lie along a continuum (Widiger & Simonsen, 2005). Accordingly, in the SCID-II (First, 1994), personality traits are rated in one of three response categories: 1 = not present/do not fulfill; 2 = partly true/subthreshold; and 3 = personality trait present. IRT provides a means of analyzing whether subthreshold scores are helpful in assessing ASPD. Using the graded response model, we found that scoring subthreshold criteria did not yield additional information compared with using Categories 1 and 3 only. The current version of the SCID-II provides no guidelines on how subthreshold values should be scored. The use of subthreshold values may therefore have been confounded by how the clinicians in this study applied the diagnostic rules. For example, clinicians may have assessed the middle category less carefully because clear guidelines are lacking. Alternatively, subthreshold scores may at times have been used intentionally to avoid assigning an ASPD diagnosis. We suggest that future versions of the SCID-II either rate items dichotomously or provide clear rules for the use of a middle category (i.e., such ratings should be taken into account in the diagnostic process). A recent study (Huprich, Paggeot, & Samuel, 2015) used IRT analyses to compare the SCID-II borderline personality disorder scale with the corresponding scale of the Personality Disorder Interview for DSM-IV (PDI-IV; Widiger, Mangine, Corbitt, Ellis, & Thomas, 1995).
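The suggestion to rate items dichotomously can be illustrated by recoding SCID-II ratings. Because no scoring rule exists for the middle category, the sketch below shows two possible recodings, using hypothetical helper names. A subthreshold rating of 2 is treated either as absent (1) or as present (3), and the example shows how that choice can flip the three-criteria cutoff for a borderline profile. Neither recoding is proposed in the source. They only make the consequences of the missing rule visible.

```python
def dichotomize(ratings, middle_as):
    """Recode SCID-II ratings (1, 2, 3) to 0/1; `middle_as` sets where a 2 goes."""
    if middle_as not in (1, 3):
        raise ValueError("middle_as must be 1 or 3")
    coded = []
    for x in ratings:
        if x not in (1, 2, 3):
            raise ValueError(f"invalid SCID-II rating: {x}")
        value = middle_as if x == 2 else x
        coded.append(1 if value == 3 else 0)
    return coded


def meets_count_rule(binary, minimum=3):
    """Cutoff rule used in the text: at least three criteria present."""
    return sum(binary) >= minimum


if __name__ == "__main__":
    profile = [3, 2, 3, 1, 2, 1, 1]  # hypothetical ratings for the seven criteria
    for middle in (1, 3):
        coded = dichotomize(profile, middle_as=middle)
        print(f"2 -> {middle}: {coded}, meets rule = {meets_count_rule(coded)}")
```

For this profile, treating 2 as absent gives two criteria (below the cutoff), whereas treating it as present gives four (above the cutoff).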
In both interviews, items are scored on a 3-point Likert-type scale, but unlike the SCID-II, the PDI-IV provides explicit scoring guidelines for the middle category as well. However, the middle category has a different meaning in the PDI-IV than in the SCID-II. The middle response options of the PDI-IV are verbally most similar to the highest response option of the SCID-II: like the highest SCID-II category, they indicate the presence of a criterion. This was also reflected in the IRT parameters found for the two measures. The SCID-II items allowed for higher precision in the subthreshold range, whereas the PDI-IV items covered a broader range of latent trait values and thus provided more information than the SCID-II about individuals scoring above the diagnostic threshold. These findings are highly interesting and show that the choice of diagnostic interview can influence how disorders are diagnosed. It is important to keep in mind, however, that the middle category of the SCID-II may not be used consistently, which may have influenced the results.

In recent decades, growing recognition of the limitations of the categorical approach, together with an increasing body of empirical evidence supporting the continuum approach, has led to calls for abandoning the categorical model in favor of a continuous one (Hopwood et al., 2011; Tyrer et al., 2011; Widiger & Simonsen, 2005). In the alternative DSM-5 model for PDs (APA, 2013), continuous scores are reported in addition to categorical ones (Skodol, Morey, Bender, & Oldham, 2015). IRT analysis can support this approach. Taking our results as an example, one would know both whether the formal criteria for ASPD were fulfilled and how high or low a patient scored on a continuum of antisocial traits and behavior.
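A minimal sketch of reporting both kinds of information for one patient is shown below. It assumes that θ is on a standard normal scale (mean 0, SD 1) in the reference population, so that θ can be expressed as a percentile via the normal CDF. The function name, output fields, and example values are hypothetical and do not come from the study.

```python
import math


def normal_cdf(z: float) -> float:
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def antisocial_report(theta: float, n_threes: int) -> dict:
    """Categorical decision (>= 3 criteria rated 3) plus a continuous score.

    Assumes theta is scaled to mean 0 and SD 1 in the reference population.
    """
    return {
        "meets_aspd_criteria": n_threes >= 3,
        "theta": theta,
        "percentile": 100.0 * normal_cdf(theta),
    }


if __name__ == "__main__":
    # Hypothetical patient just below the cutoff but with a high latent score.
    print(antisocial_report(theta=1.6, n_threes=2))
```

The example patient would not meet the formal criteria, yet the continuous score places them well above the population mean. This combination is what the dual reporting is meant to make visible.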
In sum, the results of our study suggest that ASPD can be measured with minimal measurement bias across gender in clinical samples, at least when assessed by experienced clinicians using the SCID-II. Overall, the SCID-II ASPD items appear to fit the purpose of the DSM well, that is, differentiating among persons in the upper range of the latent trait continuum (ASPD). If the aim were to differentiate among individuals with less severe antisocial personality features, items with lower threshold values would have to be added. Finally, we found no differences in score distributions between the ASPD-CD and ASPD-late onset groups. In other words, these two groups show similarly high levels of antisocial behavior and antisocial traits, irrespective of a childhood diagnosis of CD.

References (47 in total)

1. A taxometric analysis of the latent structure of psychopathy: evidence for dimensionality.

Authors: Jean-Pierre Guay; John Ruscio; Raymond A Knight; Robert D Hare
Journal: J Abnorm Psychol Date: 2007-11

2. The impact of extended longitudinal observation on the assessment of personality disorders.

Authors: G Pedersen; S Karterud; B Hummelen; T Wilberg
Journal: Personal Ment Health Date: 2013-05-27

3. Interrater reliability and internal consistency of the structured clinical interview for DSM-IV axis II personality disorders (SCID-II), version 2.0.

Authors: C Maffei; A Fossati; I Agostoni; A Barraco; M Bagnato; D Deborah; C Namia; L Novella; M Petrachi
Journal: J Pers Disord Date: 1997

4. The Alternative DSM-5 Model for Personality Disorders: A Clinical Application.

Authors: Andrew E Skodol; Leslie C Morey; Donna S Bender; John M Oldham
Journal: Am J Psychiatry Date: 2015-07

5. Co-occurrence of 12-month mood and anxiety disorders and personality disorders in the US: results from the national epidemiologic survey on alcohol and related conditions.

Authors: Bridget F Grant; Deborah S Hasin; Frederick S Stinson; Deborah A Dawson; S Patricia Chou; W June Ruan; Boji Huang
Journal: J Psychiatr Res Date: 2005-01

6. Gender bias in diagnostic criteria for personality disorders: an item response theory analysis.

Authors: J Serrita Jane; Thomas F Oltmanns; Susan C South; Eric Turkheimer
Journal: J Abnorm Psychol Date: 2007-02

7. DSM-IV conduct disorder criteria as predictors of antisocial personality disorder.

Authors: Heather L Gelhorn; Joseph T Sakai; Rumi Kato Price; Thomas J Crowley
Journal: Compr Psychiatry Date: 2007-08-22

8. Psychopathic, not psychopath: taxometric evidence for the dimensional structure of psychopathy.

Authors: John F Edens; David K Marcus; Scott O Lilienfeld; Norman G Poythress
Journal: J Abnorm Psychol Date: 2006-02

9. Antisocial patients: a comparison of those with and those without childhood conduct disorder.

Authors: D W Black; D Braun
Journal: Ann Clin Psychiatry Date: 1998-06

10. A randomised controlled trial of mentalization-based treatment versus structured clinical management for patients with comorbid borderline personality disorder and antisocial personality disorder.

Authors: Anthony Bateman; Jennifer O'Connell; Nicolas Lorenzini; Tessa Gardner; Peter Fonagy
Journal: BMC Psychiatry Date: 2016-08-30
