Literature DB >> 34424799

The 25-Item Ontario Child Health Study Emotional Behavioural Scales-Brief Version (OCHS-EBS-B): Test-Retest Reliability and Construct Validity When Used as Categorical Measures.

Michael H Boyle1, Laura Duncan1, Li Wang1, Katholiki Georgiades1.   

Abstract

OBJECTIVE: Child and youth mental health problems are often assessed by parent self-completed checklists that produce dimensional scale scores. When converted to binary ratings of disorder, little is known about their psychometric properties in relation to classifications based on lay-administered structured diagnostic interviews. In addition to estimating agreement, our objective is to test for statistical equivalence in the test-retest reliability and construct validity of two instruments used to classify child emotional, behavioural, and attentional disorders: the 25-item, parent completed Ontario Child Health Study Emotional Behavioural Scales-Brief Version (OCHS-EBS-B) and the Mini International Neuropsychiatric Interview for Children and Adolescents-parent version (MINI-KID-P).
METHODS: This study draws on independent samples (n = 452) and uses the confidence interval approach to test for statistical equivalence. Reliability is based on kappa (κ). Construct validity is based on standardized beta coefficients (β) estimated in structural equation models.
RESULTS: The average differences between the MINI-KID-P and OCHS-EBS-B in κ and β were -0.022 and -0.020, respectively. However, in both instances, criteria for statistical equivalence were met in only 5 of 12 comparisons. Based on κ, between-instrument agreement on the classifications of disorder ranged from 0.481 (attentional disorder) to 0.721 (emotional disorder) but was substantially higher (0.731 to 0.895, respectively) when corrected for attenuation due to measurement error.
CONCLUSIONS: Although falling short of equivalence, the results suggest on balance that the reliability and validity of the two instruments for classifying child psychiatric disorder assessed by parents are highly comparable. This conclusion is supported by the high levels of agreement between the instruments after correcting for attenuation due to measurement error.

Keywords:  child psychiatric disorder; measurement; problem checklist; reliability; structural equation modelling; structured diagnostic interview; validity

Year:  2021        PMID: 34424799      PMCID: PMC9014671          DOI: 10.1177/07067437211037125

Source DB:  PubMed          Journal:  Can J Psychiatry        ISSN: 0706-7437            Impact factor:   5.321


Introduction

In the 1980s, concern about the reliability of psychiatric diagnoses based on clinical judgement led to the development of structured diagnostic interviews (SDIs) to classify child and youth (herein child/ren) psychiatric disorders as present or absent. Accompanying the development of SDIs was a debate on the nature of child psychiatric disorder: was it a true class in the taxonomic sense or an underlying continuum?[1,2] No evidence has emerged to indicate that common disorders such as generalized anxiety disorder (GAD) or major depressive disorder (MDD), conduct disorder (CD) or oppositional defiant disorder (ODD), and attention-deficit hyperactivity disorder (ADHD) represent true classes. Angold et al. reported large differences in prevalence for the same individuals across SDIs using identical diagnostic criteria, indicating that classifications of child disorder depend on the way instrument developers convert symptoms into classifications. Despite the challenges involved, classifications of disorder are needed by clinicians for setting treatment priorities, administrators for making resource allocation decisions, and researchers for conducting epidemiological studies. Pickles and Angold have called for instruments that can both measure child psychiatric disorder dimensionally and classify these disorders as present or absent. Self-completed problem checklists have provided dimensional measures of child psychiatric disorder for decades and exhibit many practical advantages: they are brief, simple, and inexpensive to implement; pose little burden to respondents; and collect information on all symptoms.
Although dimensional measures can be converted into classifications of disorder by choosing cut-points or thresholds, they are rarely used in this way because SDIs are considered the “gold standard” for doing so.[5,6] In contrast to checklists, SDIs are expensive to implement and burdensome to respondents, making them difficult to use either in community child mental health agencies faced with limited budgets or in epidemiological studies faced with increasing nonresponse. There are theoretical reasons and empirical evidence for expecting classifications of disorder derived from problem checklists to be as reliable and valid as classifications obtained using SDIs. The overall test-retest reliability of SDIs is modest (κ = 0.58), highly variable across studies, and within reach of scale scores converted to classifications. Furthermore, past empirical studies have been unable to identify validity differences between the two approaches.[11-14] The objective of this paper is to determine whether scale scores from the 25-item, parent-completed Ontario Child Health Study Emotional Behavioural Scales-Brief Version (OCHS-EBS-B), converted to binary classifications, are as reliable and valid as corresponding disorders classified by the Mini International Neuropsychiatric Interview for Children and Adolescents-parent version (MINI-KID-P). In this study, we focus on lay-administered SDIs used in epidemiological studies and clinical research to classify psychiatric disorder as present or absent. This focus excludes the important role that SDIs can play when used by clinicians as aids to the diagnostic process. We hypothesize that (1) between-instrument differences in test-retest reliability and construct validity will be small and nonsignificant (statistically equivalent); and (2) agreement between instruments in the classifications of disorder will be substantial after correcting for attenuation due to measurement error.
The past empirical studies cited above exhibit one or more of the following limitations: relatively small sample sizes, the absence of reliability comparisons, the use of data “eyeballing” to make inferences, and measurement error compressing associations downwards. We have overcome these limitations by combining independent samples to obtain a relatively large number of participants; comparing the test-retest reliability of the two instruments over the same time interval (1 to 2 weeks); using methods to correct for the attenuating effects of measurement error; and implementing formal empirical tests of measurement equivalence.

Methods

The two studies used in this report are described in Table 1: study A and study B.
Table 1.

Study and Sample Characteristics.

Characteristic | Study A | Study B | Total | χ²
Study period | 2011-2013 | 2014-2015 | — | —
Sample size (% response) | 284 (∼17%) | 180 (∼50%) | 464 | —
Sample source: general population (%) | 8 schools (65%) | CCTB | — | —
Sample source: outpatient clinic (%) | 2 clinics (35%) | — | — | —
n for analysis | 277 | 175 | 452 | —
 % Male | 43.0 | 51.4 | 46.2 | 3.1
 % 12–18 | 85.9 | 50.9 | 72.3 | 65.9‡
 % Single parent | 29.2 | 17.1 | 24.6 | 8.5†
 % Receiving social assistance | 13.5 | 5.7 | 10.5 | 6.8†
Disorders: OCHS-EBS-B
 % Emotional | 32.1 | 4.6 | 21.5 | 48.3‡
 % Behavioural | 26.0 | 4.0 | 17.5 | 36.0‡
 % Attentional | 9.0 | 0.6 | 5.8 | 14.1‡
Disorders: MINI-KID-P
 % Emotional | 26.7 | 4.0 | 17.9 | 37.6†
 % Behavioural | 25.6 | 3.4 | 17.0 | 37.4‡
 % Attentional | 9.4 | 5.7 | 8.0 | 1.8
Validity variables: youth classifications of disorder
 % Emotional | 20.9 | 10.2 | 18.4 | 5.1+
 % Behavioural | 10.8 | 8.0 | 10.2 | 0.6
 % Attentional | 4.0 | 3.5 | 3.9 | 0.0
Current use of medications
 % Anxiety/depression | 14.4 | 0.0 | 8.8 | 27.7‡
 % Behaviour | 7.9 | 2.9 | 6.0 | 4.9+
 % ADHD | 12.3 | 1.7 | 8.2 | 15.9‡
Mental health-related service contacts
 % School | 30.3 | 17.1 | 25.2 | 9.86†
 % Social services | 35.0 | 8.6 | 24.8 | 40.24‡
 % Health professionals | 42.6 | 8.0 | 29.2 | 62.10†
Family psycho-social risk
 Maternal depression: mean (SD) a | 18.6 (6.5) | 3.4 (3.3) | 0.0 (1.0) | n/a
 Family dysfunction: mean (SD) a | 20.8 (5.4) | 18.7 (4.8) | 0.0 (1.0) | n/a

ADHD: attention-deficit hyperactivity disorder; CCTB: Canadian Child Tax Benefit File; MINI-KID-P: Mini International Neuropsychiatric Interview for Children and Adolescents-parent version; OCHS-EBS-B: Ontario Child Health Study Emotional Behavioural Scales-Brief checklist; SD: standard deviation.

a Converted to z-scores for analysis. Significance of χ²: + P < 0.05; † P < 0.01; ‡ P < 0.001.


Participants

Of the 464 parent participants in the two studies, 452 had complete assessment data on the MINI-KID-P and OCHS-EBS-B on two occasions (Table 1)—meeting the inclusion criterion for our analyses. Net response in study A was 17% and in study B, 50%.

Ethical Considerations

Study A was approved by Research Ethics Committees at McMaster University, the School Boards and participating service providers. Informed written consent was obtained from participants. Study B procedures, including consent and confidentiality requirements, were approved by the Chief Statistician at Statistics Canada and were conducted according to the Statistics Act. Informed verbal consent was obtained.

Mental Health Disorders

Ontario Child Health Study Emotional Behavioural Scales-Brief Version. The OCHS-EBS-B is a 25-item problem checklist completed by parents/caregivers of children aged 4 to 17 years. It comprises rating scales that measure three types of mental health problems: emotional (8 items), behavioural (10 items), and attentional (7 items). Item response options are 0, 1, 2 (“never or not true,” “sometimes or somewhat true,” and “often or very true”). Raw scores are summed to form scale scores measuring each type of problem. The development and evaluation of the OCHS-EBS-B scales appear in the Appendix. To convert the OCHS-EBS-B scale scores into classifications of disorder, their frequency distributions were examined in the 2014 Ontario Child Health Study to find the score cut-points identifying children in the top 7.8% (emotional: 6+), 5.7% (behavioural: 6+), and 3.4% (attentional: 10+), matching the world-wide prevalence of child major depressive or anxiety disorders, CD or ODD, and ADHD, respectively. These cut-points were applied in each study to convert the OCHS-EBS-B scale scores into classifications of disorder independent of the MINI-KID-P.

Mini International Neuropsychiatric Interview for Children and Adolescents. The MINI-KID-P (parent version) and MINI-KID-Y (youth version) are SDIs that assess DSM-IV-TR disorders in children aged 6 to 17 years with a reference period of 6 months, except for CD, which is one year. Respondent-specific test-retest reliability based on kappa (κ) ranges from 0.67 (MDD) to 0.77 (ADHD) for parents and from 0.46 (ADHD) to 0.64 (MDD) for youth. In our study, MINI-KID-P classifications of disorder were coded as (1) present or (0) absent and grouped as follows: emotional (MDD or GAD), behavioural (CD or ODD) and attentional (ADHD) to match the OCHS-EBS-B problem types.
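The cut-point conversion described above amounts to thresholding each scale score. A minimal sketch (the dictionary and function names are ours, not from the instrument's manual; the cut-points are those reported in the text):

```python
# OCHS-EBS-B cut-points reported in the text; a scale score at or
# above the cut-point classifies the disorder as present (1).
CUTPOINTS = {"emotional": 6, "behavioural": 6, "attentional": 10}

def classify(scale_scores):
    """Convert dimensional scale scores to binary classifications of disorder."""
    return {scale: int(scale_scores[scale] >= cut)
            for scale, cut in CUTPOINTS.items()}
```

For example, `classify({"emotional": 7, "behavioural": 3, "attentional": 10})` returns `{"emotional": 1, "behavioural": 0, "attentional": 1}`.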

Construct Validity Variables

Construct validity is a general approach to validation that works from a set of fundamental principles grounded in the philosophy of science to evaluate the meaning (informational content) of a measured construct by examining its strength of association with other variables. We focus on variables expected to be positively associated with the disorders and test measurement equivalence by comparing the associations of the MINI-KID-P and OCHS-EBS-B latent variable measures of disorder with latent variable measures of youth-reported classifications of disorder and with three related constructs measured as latent variables: current use of prescription medications for child mental health problems; child mental health-related service contacts with schools, service organizations, or health professionals; and family psycho-social risk. Youth Classifications of Disorder. Youth were administered the MINI-KID-Y at times 1 and 2 and classified with disorder as present or absent at each administration using the same groupings created for parents: emotional (MDD or GAD), behavioural (CD or ODD), and attentional (ADHD). Current Use of Prescription Medications for Mental Health Problems. At time 1, a latent variable measure of medication use was based on three binary indicator variables derived from parent interview responses to a stem question “Is [child] currently taking any prescribed medication?” and follow-up “What does [child] take this medication for: (a) hyperactivity, (b) behavioural problems, (c) depression or anxiety?” Mental Health-related Service Contacts.
At time 1, a latent variable measure of mental health-related service contacts was based on three binary indicator variables derived from parent interview responses to the following groups of questions: (1) “In the past 6 months, did [child], or you, see or talk to anyone from school about any emotional or behavioural problems?” (contact with schools); (2) “Has [child] ever been away overnight in: a foster or group home; a detention centre or juvenile centre; a police station or jail?”, “In the past 6 months did [child], or you, see or talk to anyone: from the Children’s Aid Society; from court, probation or other justice services; from some other service for children with emotional or behavioural problems?” (contact with service organizations); (3) “During the past 6 months … did [child], or you, personally see any of the following about emotional, behavioural or learning problems: a paediatrician or family doctor (general practitioner); a psychiatrist, psychologist or social worker?” (contact with health professionals). Family Psycho-social Risk. At time 1, a latent variable measure of family psycho-social risk was based on scale score measures of maternal depressed mood and family dysfunction derived from parent responses on a self-completed questionnaire. In study A, maternal depressed mood is measured by the 12-item adaptation of the Center for Epidemiologic Studies Depression (CES-D) Scale and in study B, by the K6. Used often in empirical studies, the characteristics and psychometric properties of both measures are well described. In studies A and B, internal consistency reliabilities (ICR: alpha) are 0.89 (CES-D) and 0.79 (K6). Family dysfunction is measured in both studies by the 12-item general functioning subscale of the McMaster Family Assessment Device. The items describe family behaviour and relationships in six dimensions: problem-solving, communication, roles, affective responsiveness, affective involvement and behavioural control.
Item responses range from (1) strongly agree to (4) strongly disagree. Positive items are reverse-coded, and all items are summed to form scale scores. ICRs were 0.89 (study A) and 0.83 (study B). The scale correlates predictably with alternative measures of family functioning.
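The scoring rule (reverse-code positively worded items on the 1–4 scale, then sum) can be sketched as follows; the function name and item labels are illustrative only:

```python
def scale_score(responses, positive_items):
    """Sum 1-4 item responses, reverse-coding positively worded items.

    responses: dict mapping item name -> response (1 = strongly agree
    to 4 = strongly disagree). positive_items: set of item names to
    reverse-code (5 - response) so higher totals consistently indicate
    worse functioning.
    """
    return sum((5 - r) if item in positive_items else r
               for item, r in responses.items())
```

For example, `scale_score({"communicates_well": 1, "avoids_discussing": 4}, {"communicates_well"})` scores the reverse-coded item as 5 − 1 = 4 and the other as 4, giving 8.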

Analysis

We use the confidence interval (CI) approach to test our hypotheses of statistical equivalence in the reliability and validity of the MINI-KID-P and OCHS-EBS-B. This test computes between-instrument mean differences in the parameters of interest (κ and β), the standard errors (SEs) of these differences, and their 90% CIs. These 90% CIs are compared against an equivalence region called the smallest effect size of interest (SESOI), which represents between-instrument differences smaller than what is considered meaningful. If the 100(1 − 2α)% CI for the difference lies entirely within the equivalence region, the alternative hypothesis (H1) of equivalence is accepted, as illustrated here: [H0 |← H1 →| H0]. We have set the SESOI for testing between-instrument differences in reliability and validity at ±0.10 and ±0.15, respectively. In our view, reliability differences within 0.10 are too small to be meaningful, while validity differences within 0.15 represent <2.25% of explained variance (0.15² = 0.0225) associated with the disorders and are likewise too small to be meaningful.
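The CI equivalence check reduces to a small calculation. The sketch below (function name ours) reproduces, for example, the behavioural-disorder row of Table 2, where Δκ = −0.011 with SE = 0.050 gives a 90% CI of about −0.094 to 0.071, inside the ±0.10 boundary:

```python
Z_90 = 1.645  # normal quantile for a 90% two-sided CI (alpha = 0.05 per side)

def equivalent(diff, se, sesoi):
    """Return the 90% CI for a between-instrument difference and whether
    it falls entirely within the +/- sesoi equivalence region."""
    lo, hi = diff - Z_90 * se, diff + Z_90 * se
    return lo, hi, (-sesoi < lo and hi < sesoi)
```

`equivalent(-0.011, 0.050, 0.10)` yields a CI of roughly (−0.093, 0.071) and equivalence; `equivalent(0.053, 0.093, 0.10)` (the attentional row) yields roughly (−0.100, 0.206) and no equivalence.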

Test-Retest Reliability

We use κ, a chance-corrected measure of agreement scaled from 0 to 1, to quantify the test-retest reliabilities of the MINI-KID-P and OCHS-EBS-B for classifying the disorders. In estimating κ, we account for dependency attributable to the same respondent completing both instruments by using the weighted least-squares (WLS) approach. To determine if the reliability of the two instruments is statistically equivalent between studies, recruitment samples (general population vs. clinical), child sex, or age, we extend the WLS approach in separate analyses to estimate and compare, one at a time, the sub-group-specific reliabilities for the MINI-KID-P and OCHS-EBS-B (e.g., males vs. females). In these analyses, to increase statistical power by decreasing the SEs of the estimates, the three disorder types are analysed together rather than one at a time. Each of the 452 participants contributes three data points, one for each disorder type, to the reliability estimates. The number of participants remains the same (n = 452) but the number of observations triples (452 × 3 = 1,356). The nesting or clustering of observations within participants (intra-class correlation) is taken into account in the statistical tests.
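For two binary classifications, the chance-corrected agreement underlying these estimates can be computed directly. This is a plain Cohen's κ, without the WLS clustering adjustments applied in the study:

```python
def cohen_kappa(ratings1, ratings2):
    """Cohen's kappa for two parallel lists of 0/1 classifications."""
    n = len(ratings1)
    observed = sum(a == b for a, b in zip(ratings1, ratings2)) / n
    # agreement expected by chance from the two marginal prevalences
    p1, p2 = sum(ratings1) / n, sum(ratings2) / n
    expected = p1 * p2 + (1 - p1) * (1 - p2)
    return (observed - expected) / (1 - expected)
```

For example, `cohen_kappa([1, 1, 0, 0], [1, 0, 0, 0])` gives 0.5: observed agreement is 0.75, chance agreement 0.50.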

Agreement Between Instruments

We use κ to quantify between-instrument agreement in the classification of disorders at times 1 and 2. Based on the instrument test-retest reliabilities observed in our study and the sample correlation method described by Moss, we also show the estimated levels of agreement and their 95% CIs corrected for attenuation due to measurement error.
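Assuming the standard correction for attenuation (dividing observed agreement by the geometric mean of the two instruments' test-retest reliabilities), the corrected values in Table 3 can be reproduced from the observed κ values in Tables 2 and 3:

```python
from math import sqrt

def disattenuated_kappa(observed_kappa, reliability1, reliability2):
    """Correct observed between-instrument agreement for attenuation
    caused by each instrument's own test-retest unreliability."""
    return observed_kappa / sqrt(reliability1 * reliability2)
```

For emotional disorder at time 1, `disattenuated_kappa(0.721, 0.774, 0.838)` gives about 0.895; for attentional disorder, `disattenuated_kappa(0.481, 0.685, 0.632)` gives about 0.731, matching Table 3.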

Construct Validity

We use β coefficients derived from structural equation models (SEMs) to estimate and compare the strength of association of the construct validity variables with the disorders classified by the MINI-KID-P and OCHS-EBS-B. All of the variables in the SEMs are measured as latent variables. A latent variable is unobserved—a prediction based on the pattern of associations among a set of observed indicator variables. The variance associated with a latent variable is free of measurement error in the sense that residual variance associated with the indicator variables, not predictive of the latent variable, is removed. As illustrated in Figure 1A, each disorder (emotional, behavioural and attentional) classified by the MINI-KID-P and OCHS-EBS-B is measured as a latent variable based on its time 1 and 2 classifications (indicator variables). The latent variable measures of disorder based on youth interviews draw on their time 1 and 2 MINI-KID-Y classifications in exactly the same way. As a result, the SEMs involving youth classifications are specific to each disorder. The construct validity variables are also measured as latent variables (Figure 1B). These latent variables draw only on indicator variables assessed at time 1. In the comparison of the MINI-KID-P and OCHS-EBS-B, each disorder is modelled as a function of the same construct validity variable (e.g., current use of prescription medications examined separately for emotional, behavioural and attentional disorders). These three construct validity variables are expected to be associated generally with all of the disorders. Although prescription medications are linked with individual disorders, we created a latent variable measure of prescription medications to reduce the impact of misclassification errors associated with parent responses.
Figure 1.

Illustration of latent variable measures developed for SEMs. (A) Parent assessment of behavioural problems based on MINI-KID-P classifications of CD or ODD at times 1 and 2 and OCHS-EBS-B classifications of behavioural problems at times 1 and 2. Construct validity tests if the 90% CI around β1 – β2 falls within −0.20 and +0.20. (B) The construct validity variables include youth assessments of behavioural problems based on MINI-KID-Y classifications of CD or ODD at times 1 and 2 (separate latent variables are developed for emotional and attentional problems); and three construct validity variables: current use of prescribed medications; service contacts with schools, service organizations or health professionals; and family psycho-social risk. CI: confidence interval; MINI-KID-P: Mini International Neuropsychiatric Interview for Children and Adolescents interview-parent version; OCHS-EBS-B: Ontario Child Health Study Emotional Behavioural Scales-Brief checklist; SEM: structural equation models.

To test for statistical equivalence in the βs, we specify separate SEMs for each disorder. There are three latent variable measures in each SEM: two for the same disorders assessed by each instrument and one for the construct validity variable. We used MPlus 7.4 for the analyses. MPlus offers a generalized measurement component applicable to dichotomous and ordered categorical indicator variables. Although the measurement scale is the same for both instruments (0, 1), the prevalences (variances) are different. As a result, we use standardized variables in the SEMs to ensure that the comparisons of βs are made on a commensurate scale. Adequate model fit was defined as values ≥0.98 for the comparative fit index (CFI, range 0 to 1.0) and ≤0.05 for the root mean squared error of approximation (RMSEA).

Results

The sample characteristics and distribution of variables appear in Table 1. In study A versus B, there are more 12 to 18 year olds, greater levels of socio-economic disadvantage (e.g., single parents) and a higher prevalence of child disorder. Table 2 shows the test-retest reliability outcomes. Differences between the MINI-KID-P and OCHS-EBS-B (κ1 – κ2) in the classification of disorders range from −0.064 (emotional disorder) to +0.053 (attentional disorder) with an average difference of −0.007 (not shown). Only behavioural disorder (−0.011) meets the criteria for statistical equivalence (see footnote a, Table 2). In the subgroup analysis, the most extreme differences are associated with the recruitment sample: −0.072 (general population) and +0.025 (clinic). The average difference in the subgroup analyses was −0.027. The criteria for equivalence are met for study A, males, youth aged 13 to 18 years, and both studies combined.
Table 2.

Test-Retest Reliability (κ) of Parent MINI-KID-P and OCHS-EBS-B for Classifying Individual Child Disorders and Combined Disorders Grouped by Study, Recruitment Sample, Child Sex and Age.

Characteristic | MINI-KID-P κ1 (SE) | OCHS-EBS-B κ2 (SE) | Δ κ1 – κ2 (SE) | Δ 90% CI
Test-retest reliability of the disorders (n = 452)
 Emotional | 0.774 (0.039) | 0.838 (0.031) | –0.064 (0.044) | –0.137 to 0.009
 Behavioural | 0.774 (0.040) | 0.785 (0.040) | –0.011 (0.050) | –0.094 to 0.071 a
 Attentional | 0.685 (0.068) | 0.632 (0.078) | +0.053 (0.093) | –0.100 to 0.206
Test-retest reliability in sub-groups (n = 1,356)
 Study A (n = 831) | 0.771 (0.027) | 0.795 (0.025) | –0.024 (0.034) | –0.080 to 0.032 a
 Study B (n = 525) | 0.533 (0.105) | 0.550 (0.121) | –0.017 (0.154) | –0.271 to 0.237
 General population (n = 1,074) | 0.675 (0.047) | 0.747 (0.044) | –0.072 (0.057) | –0.166 to 0.022
 Clinic (n = 282) | 0.734 (0.041) | 0.709 (0.042) | +0.025 (0.055) | –0.066 to 0.116
 Female (n = 729) | 0.739 (0.038) | 0.798 (0.033) | –0.059 (0.046) | –0.135 to 0.017
 Male (n = 627) | 0.786 (0.034) | 0.788 (0.034) | –0.002 (0.042) | –0.071 to 0.067 a
 Age 4–12 (n = 489) | 0.756 (0.045) | 0.787 (0.043) | –0.031 (0.052) | –0.117 to 0.055
 Age 13–18 (n = 867) | 0.766 (0.031) | 0.796 (0.029) | –0.030 (0.039) | –0.094 to 0.034 a
 Both studies (n = 1,356) | 0.763 (0.025) | 0.793 (0.024) | –0.030 (0.031) | –0.081 to 0.021 a

Note: κ: kappa; a: 90% CI within equivalence boundaries (−0.10 to +0.10); CI: confidence interval; MINI-KID-P: Mini International Neuropsychiatric Interview for Children and Adolescents-parent version; OCHS-EBS-B: Ontario Child Health Study Emotional Behavioural Scales-Brief checklist; SE: standard error.

Table 3 shows agreement between the MINI-KID-P and OCHS-EBS-B on the classifications of disorder at times 1 and 2. Observed κ ranges from 0.481 (attentional at time 1) to 0.721 (emotional at time 1). After correcting for attenuation due to measurement error, the corresponding estimates are substantially higher at 0.731 and 0.895, respectively.
Table 3.

Agreement (κ) Between the MINI-KID-P and Parent OCHS-EBS-B on the Classification of Child Disorders: Observed and Corrected for Attenuation due to Measurement Error (n = 452).

Disorder | κ (SE) | Corrected κ (95% CI)
Time 1
 Emotional | 0.721 (0.041) | 0.895 (0.805 to 0.978)
 Behavioural | 0.675 (0.047) | 0.866 (0.762 to 0.962)
 Attentional | 0.481 (0.068) | 0.731 (0.564 to 0.891)
Time 2
 Emotional | 0.678 (0.043) | 0.842 (0.742 to 0.932)
 Behavioural | 0.664 (0.048) | 0.852 (0.745 to 0.949)
 Attentional | 0.521 (0.082) | 0.792 (0.631 to 0.947)

CI: confidence interval; MINI-KID-P: Mini International Neuropsychiatric Interview for Children and Adolescents-parent version; OCHS-EBS-B: Ontario Child Health Study Emotional Behavioural Scales-Brief checklist; SE: standard error; κ: kappa.

Table 4 shows the βs associated with the MINI-KID-P and OCHS-EBS-B classifications of disorder when modelled as functions of the construct validity variables. Model fit (CFI) is ≥0.99 for all models (not shown). With one exception, the RMSEAs are ≤0.05 (Table 4). The average difference between instruments (β1 – β2) is −0.020 (not shown). The criteria for statistical equivalence are met in 5 of 12 comparisons.
Table 4.

Structural Equation Model Regression Results of Parent Classifications of Disorder (MINI-KID-P and Parent Reported OCHS-EBS-B) on Latent Variable Covariates.

Validity variable | Disorder | MINI-KID-P β1 (SE) | OCHS-EBS-B β2 (SE) | Δ (β1 – β2) (SE) | Δ 90% CI
Youth disorders | Attentional | 0.573 (0.117) | 0.613 (0.120) | –0.040 (0.115) | –0.230 to 0.150
Youth disorders | Behavioural | 0.587 (0.085) | 0.631 (0.083) | –0.044 (0.071) | –0.161 to 0.073 b
Youth disorders | Emotional | 0.560 (0.073) | 0.508 (0.077) | +0.052 (0.057) | –0.042 to 0.146 b
Current use of prescription medications | Attentional | 0.854 (0.047) | 0.898 (0.041) | –0.044 (0.054) | –0.133 to 0.045 b
Current use of prescription medications | Behavioural | 0.669 (0.070) | 0.670 (0.069) | –0.001 (0.063) | –0.105 to 0.103 b
Current use of prescription medications | Emotional a | 0.775 (0.060) | 0.886 (0.049) | –0.111 (0.046) | –0.187 to −0.035 b
Mental health-related service contacts | Attentional | 0.714 (0.069) | 0.835 (0.053) | –0.121 (0.071) | –0.238 to −0.004
Mental health-related service contacts | Behavioural | 0.847 (0.039) | 0.806 (0.041) | +0.041 (0.036) | –0.018 to 0.100 b
Mental health-related service contacts | Emotional | 0.897 (0.032) | 0.900 (0.031) | –0.003 (0.028) | –0.076 to 0.016 b
Family psycho-social risk | Attentional | 0.386 (0.096) | 0.343 (0.101) | +0.043 (0.088) | –0.102 to 0.188 b
Family psycho-social risk | Behavioural | 0.517 (0.072) | 0.607 (0.066) | –0.090 (0.048) | –0.169 to −0.011 b
Family psycho-social risk | Emotional | 0.503 (0.074) | 0.422 (0.074) | +0.081 (0.059) | –0.016 to 0.178 b

a: Root mean square error of approximation = 0.053; b: Δβ 90% CI within equivalence boundaries (−0.20 to +0.20); CI: confidence interval; MINI-KID-P: Mini International Neuropsychiatric Interview for Children and Adolescents-parent version; OCHS-EBS-B: Ontario Child Health Study Emotional Behavioural Scales-Brief checklist; SE: standard error.


Discussion

In our study, average between-instrument differences in test-retest reliability (Δκ) were small and favoured the OCHS-EBS-B over the MINI-KID-P. However, only 5 of 12 comparisons met the criteria for statistical equivalence. Correcting for attenuation due to measurement error revealed that between-instrument agreement in the classifications of disorder based on κ was >0.80 for emotional and behavioural disorders and slightly lower for attentional disorder. This very high level of agreement indicates substantial overlap between instruments in their classifications of disorder, a finding that becomes apparent only when adjustments are made for measurement error. Our analyses of construct validity differences between instruments mirrored the pattern observed for reliability: an average Δβ of −0.020, favouring the OCHS-EBS-B over the MINI-KID-P, with the criteria for statistical equivalence again met in only 5 of 12 comparisons. On balance, these findings suggest that the reliability and validity of the OCHS-EBS-B and MINI-KID-P completed by parents (mostly mothers) are similar if not equivalent for classifying disorders in children. However, there are several limitations and challenges associated with this work.

Study Limitations and Challenges

One, nonresponse overall was particularly high in study A and we lack the information to evaluate the representativeness of this sample. In study B, the prevalence of OCHS-EBS-B disorders should be the same as in the OCHS survey population, but it is lower, suggesting either selective nonresponse or that the reliability sample was drawn from a geographical area at lower risk for disorder. The lower estimates of reliability in study B are the product of low prevalence: κ estimates of reliability are compressed downwards under conditions of low prevalence, particularly <5.0%. Nonresponse could affect estimates overall or exert a differential effect on reliability and validity across instruments; the latter seems unlikely because there is no obvious mechanism for it. Two, we are unaware of any studies in psychiatry that have attempted to demonstrate psychometric equivalence between alternative instruments used to classify psychiatric disorder. Accordingly, there are no precedents for identifying the SESOI. The SESOIs selected in our study are conservative, which has resulted in low statistical power. Doubling the sample size to about 900 would make it possible to show that the 90% CI for an observed between-instrument difference in κ of ±0.04 would fall within an equivalence boundary of ±0.10. To date, there have been no measurement studies of this size in child psychiatry and the prospects for funding such studies are low. Three, in testing for construct validity differences between the two instruments, we selected three variables that were “independent” of the specific questions/items and instrumentation approach of the MINI-KID-P and OCHS-EBS-B. We expected that these variables would exhibit relatively large, nonspecific associations with the three disorders classified by the instruments.
Although the use of the MINI-KID-Y violates the principle of independence (i.e., relies on the same content and similar instrumentation as the MINI-KID-P which should positively bias its association with the MINI-KID-P), we believe that the absence of between-instrument differences in the βs associated with the MINI-KID-Y classification of emotional disorder represents evidence of equivalence. Four, the absence of a strong, evidentiary-based consensus on the variables that might be used to quantify instrument differences in construct validity is a serious challenge. To the best of our knowledge, no studies have compared the construct validity of alternative SDIs. As a result, we are not able to compare our findings with previous work and have a little context for interpreting the effect sizes observed in our study. Five, in this article, we focus exclusively on the measurement objective of classifying child disorders based on the OCHS-EBS-B scale scores converted to binary ratings of disorder and compared with classifications of disorder derived from lay-administered SDIs. In contrast, clinicians may use SDIs as part of the diagnostic process to identify the disorder and its origin and to engage patients and formulate treatment plans. This is an entirely different context beyond the scope of our study.
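The confidence-interval criterion for equivalence used throughout this discussion can be sketched as follows: two instruments are declared statistically equivalent when the 90% CI for the observed between-instrument difference falls entirely within the prespecified equivalence bounds (±SESOI). A minimal illustration, assuming a normal approximation; the standard errors below are hypothetical, not the study's estimates.

```python
from statistics import NormalDist

def equivalent(diff, se, sesoi, level=0.90):
    # CI analogue of two one-sided tests (TOST): equivalence holds when
    # the two-sided (level) CI for the difference lies inside (-sesoi, sesoi).
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)
    lo, hi = diff - z * se, diff + z * se
    return -sesoi < lo and hi < sesoi

# Average observed difference in kappa was -0.022; the SEs are assumed.
# With SE = 0.024 the 90% CI half-width is about 0.04, which fits inside
# the +/-0.10 equivalence boundary; with SE = 0.050 it does not.
print(equivalent(-0.022, 0.024, 0.10))  # True
print(equivalent(-0.022, 0.050, 0.10))  # False: lower CI limit passes -0.10
```

This makes concrete why conservative SESOIs yield low power: the smaller the boundary, the smaller the standard error (i.e., the larger the sample) needed for the CI to fit inside it.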

Conclusions

In the years ahead, we believe that governments will focus on comprehensive approaches to addressing the mental health problems of children. This will require the ability to monitor these problems both in the general population and among those accessing mental health services. A prerequisite for such monitoring will be simple, brief, inexpensive measurement instruments that are psychometrically sound, flexible in their administration, and able to measure mental health problems as dimensional phenomena or, when converted to binary measures, to classify the corresponding disorders as present or absent on a par with SDIs. Importantly, these instruments will need to exhibit near-identical psychometric properties in general population and clinical samples. We believe that the OCHS-EBS-B goes a long way towards satisfying these extensive and detailed requirements. Given the striking differences in cost and burden between SDIs and problem checklists, it is surprising how little research has been directed towards testing the extent to which problem checklists can substitute for SDIs in classifying disorder as present or absent. Additional studies addressing this question are urgently needed to provide clinicians, administrators, and researchers with appropriate evidence for making cost-effective decisions about using problem checklists to monitor child mental health needs in the general population and in clinical settings.