Bruce A Schneider, Meital Avivi-Reich, Mindaugas Mozuraitis.
Abstract
A number of statistical textbooks recommend using an analysis of covariance (ANCOVA) to control for the effects of extraneous factors that might influence the dependent measure of interest. However, it is not generally recognized that serious problems of interpretation can arise when the design contains comparisons of participants sampled from different populations (classification designs). Designs that include a comparison of younger and older adults, or a comparison of musicians and non-musicians, are examples of classification designs. In such cases, estimates of differences among groups can be contaminated by differences in the covariate population means across groups. A second problem of interpretation will arise if the experimenter fails to center the covariate measures (subtracting the mean covariate score from each covariate score) whenever the design contains within-subject factors. Unless the covariate measures on the participants are centered, estimates of within-subject factors are distorted, and significant increases in Type I error rates and/or losses in power can occur when evaluating the effects of within-subject factors. This paper: (1) alerts potential users of ANCOVA to the need to center the covariate measures when the design contains within-subject factors, and (2) indicates how they can avoid biases when one cannot assume that the expected value of the covariate measure is the same for all of the groups in a classification design.
Keywords: ANCOVA; between-subjects design; classification design; mixed design; within-subject design
Year: 2015 PMID: 25954230 PMCID: PMC4404726 DOI: 10.3389/fpsyg.2015.00474
Source DB: PubMed Journal: Front Psychol ISSN: 1664-1078
(A) Expected values of the Mean Squares for an ANCOVA analysis of a two-level, Between-Subjects Experiment for data characterized by Equation (1), when μ.
| Between | 1 | ||
| Covariate | 1 | σ² | |
| Error | 2n–3 | σ² | |
| Between | 1 | ||
| Covariate | 1 | σ² | |
| Error | 2n–3 | σ² | |
| Between | 1 | σ² | |
| Error | 2n–2 | σ² | |
Figure 1. Hypothetical bi-normal distributions of pairs of listening comprehension (dependent variable, y) and reading comprehension (covariate, x) scores. In all four plots the population correlation coefficient between listening and reading comprehension is 0.4 for both native speakers (EL1s) and non-native speakers (EL2s). (A) A hypothetical bi-normal distribution for native speakers (EL1s) with a population mean value for listening comprehension of 50 (SD = 6), and for reading comprehension of 25 (SD = 6). The green line defines the plane for y = 50, the red line the plane for x = 25. (B) The same distribution for the native speakers (EL1s) as in (A) along with the hypothetical distribution of data for non-native speakers (EL2s). Non-native speakers (EL2s) differ from native speakers (EL1s) only insofar as the population mean for their covariate measures is 17 instead of 25. The blue line outlines the plane for x = 17. (C) The mean of the y values for the native speakers (EL1s) here is 51.6, whereas it is 48.4 for the non-native speakers (EL2s). All of the other parameters are the same as in (B). The gray line outlines the plane for y = 48.4, whereas the black line outlines the plane for y = 51.6. (D) The mean value of the covariate measure (reading comprehension) has been set to 21 for both groups. All other parameter values are the same as in (C). The purple line outlines the plane corresponding to x = 21.
(A) Expected values for an ANCOVA of a Within-Subject Experiment with two levels when the covariate measures are centered for the model described in Equation (2). W.
| Within | 1 | 2 | |
| W*C | 1 | 2( | |
| Error | n−2 | σ² | |
| Within | 1 | 2 | |
| Error | n−1 | 2α2 | |
(A) Expected values of the Mean Squares for the within portion of mixed 2 × 2 ANCOVA when μ.
| Within | 1 | 4 | |
| W*C | 1 | 4( | |
| W*B | 1 | ||
| Error | 2n–3 | σ² | |
| Within | 1 | 4 | |
| W*C | 1 | 4( | |
| W*B | 1 | ||
| Error | 2n–3 | σ² | |
| Within | 1 | 4 | |
| W*B | 1 | 4 | |
| Error | 2n–2 | 2α2 | |
Figure 2. Estimated probability density functions for older and younger adults on the Mill Hill Vocabulary test.
Hypothetical number of questions correctly answered under two different levels of background noise (Quiet vs. Noise, within-subject factor) by subjects sampled from two different age groups (Young vs. Old, between-subjects factor).
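| Subject | Age Group | Quiet | Noise | Covariate (Vocabulary) | Covariate (Centered) | Covariate (Centered within each group) |
| --- | --- | --- | --- | --- | --- | --- |
| 1 | Young | 48 | 41 | 17 | −0.15 | 3.3 |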
| 2 | Young | 51 | 39 | 18 | 0.85 | 4.3 |
| 3 | Young | 40 | 40 | 14 | −3.15 | 0.3 |
| 4 | Young | 41 | 39 | 13 | −4.15 | −0.7 |
| 5 | Young | 35 | 34 | 11 | −6.15 | −2.7 |
| 6 | Young | 36 | 32 | 12 | −5.15 | −1.7 |
| 7 | Young | 39 | 41 | 12 | −5.15 | −1.7 |
| 8 | Young | 47 | 44 | 16 | −1.15 | 2.3 |
| 9 | Young | 41 | 37 | 14 | −3.15 | 0.3 |
| 10 | Young | 39 | 41 | 10 | −7.15 | −3.7 |
| 11 | Old | 44 | 39 | 23 | 5.85 | 2.4 |
| 12 | Old | 44 | 45 | 19 | 1.85 | −1.6 |
| 13 | Old | 46 | 46 | 23 | 5.85 | 2.4 |
| 14 | Old | 45 | 40 | 21 | 3.85 | 0.4 |
| 15 | Old | 46 | 43 | 21 | 3.85 | 0.4 |
| 16 | Old | 45 | 48 | 21 | 3.85 | 0.4 |
| 17 | Old | 40 | 46 | 20 | 2.85 | −0.6 |
| 18 | Old | 45 | 43 | 21 | 3.85 | 0.4 |
| 19 | Old | 40 | 42 | 18 | 0.85 | −2.6 |
| 20 | Old | 41 | 43 | 19 | 1.85 | −1.6 |
The covariate measure is vocabulary size. To center the covariate measures across groups, compute the mean value of the covariate for all of the subjects and subtract this value from each of the covariate measures. This is how the column labeled “Covariate (Centered)” was obtained. It is the centered covariate measures that are entered into the analyses (see Figure 3). In the column labeled “Covariate (Centered within each group),” the mean of the covariates in each group is subtracted from the covariate measures in that group. The covariate measures centered within each group are not entered as input to the statistical package. However, they are useful in interpreting the results (see Figure 4).
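The two centering operations described in this note can be sketched in a few lines of Python. This is only an illustration: the function and variable names are hypothetical, and the vocabulary scores are copied from the data table above.

```python
from statistics import mean

# Vocabulary scores copied from the data table (subjects 1-10 Young, 11-20 Old).
vocab = {
    "Young": [17, 18, 14, 13, 11, 12, 12, 16, 14, 10],
    "Old":   [23, 19, 23, 21, 21, 21, 20, 21, 18, 19],
}

def center_across_groups(scores_by_group):
    """Subtract the grand mean (over all subjects) from every score."""
    grand = mean(x for xs in scores_by_group.values() for x in xs)
    return {g: [round(x - grand, 2) for x in xs]
            for g, xs in scores_by_group.items()}

def center_within_groups(scores_by_group):
    """Subtract each group's own mean from the scores in that group."""
    centered = {}
    for g, xs in scores_by_group.items():
        group_mean = mean(xs)
        centered[g] = [round(x - group_mean, 2) for x in xs]
    return centered

# Values entered into the analysis ("Covariate (Centered)").
centered = center_across_groups(vocab)
# Values used only for interpretation ("Covariate (Centered within each group)").
centered_within = center_within_groups(vocab)

print(centered["Young"][:3])
print(centered_within["Old"][:3])
```

Rounding to two decimals is a display choice made here so that the results line up with the table; an analysis would normally keep full precision.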
Figure 3. The SPSS data file used as input to both an ANCOVA and an ANOVA of the data from Table . Quiet and Noise are identified as the two levels of the Within-Subject factor in a repeated measures analysis. Age Group is the Between-Subjects factor in this analysis. In the ANCOVA, the covariate is the Centered Vocabulary scores. The output of these analyses is shown in Table 5.
Composite ANCOVA table for the Table .
| Source | Sum of Squares | df | Mean Square | F | p |
| --- | --- | --- | --- | --- | --- |
| Background*VocabularyCentered (from ANCOVA) | 71.348 | 1 | 71.348 | 17.693 | 0.001 |
| Error term (from ANCOVA) | 68.552 | 17 | 4.032 | ||
| Background (from ANOVA) | 22.500 | 1 | 22.500 | 2.895 | 0.106 |
| Background*AgeGroup (from ANOVA) | 19.600 | 1 | 19.600 | 2.522 | 0.130 |
| Error (from ANOVA) | 139.900 | 18 | 7.772 | ||
| VocabularyCentered (from ANCOVA) | 162.950 | 1 | 162.950 | 15.076 | 0.001 |
| Error (from ANCOVA) | 183.750 | 17 | 10.809 | ||
| AgeGroup (from ANOVA) | 108.900 | 1 | 108.900 | 5.654 | 0.029 |
| Error (from ANOVA) | 346.700 | 18 | 19.261 | ||
The data were submitted first to a repeated measures ANCOVA with Vocabulary as a Covariate. Note that the Vocabulary scores were centered when they were submitted to the ANCOVA. The Within*Covariate and the Main effect of the covariate are evaluated within the ANCOVA. All other effects are taken from an ANOVA on the same data without the covariate.
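The covariate rows of the composite table can be reproduced by hand. The sketch below is not a description of what SPSS does internally: it assumes that each covariate effect is a pooled within-group regression of a per-subject score on the covariate, using the difference score (Quiet minus Noise) for the Background*Vocabulary interaction and the sum score (Quiet plus Noise) for the covariate main effect. All names are hypothetical; the data are copied from the data table.

```python
# (quiet, noise, vocabulary) per subject, copied from the data table.
data = {
    "Young": [(48, 41, 17), (51, 39, 18), (40, 40, 14), (41, 39, 13), (35, 34, 11),
              (36, 32, 12), (39, 41, 12), (47, 44, 16), (41, 37, 14), (39, 41, 10)],
    "Old":   [(44, 39, 23), (44, 45, 19), (46, 46, 23), (45, 40, 21), (46, 43, 21),
              (45, 48, 21), (40, 46, 20), (45, 43, 21), (40, 42, 18), (41, 43, 19)],
}

def pooled_regression_ss(score):
    """Return (SS explained by the covariate, residual SS) for a per-subject score.

    Deviations are taken from each group's mean and pooled over groups, so adding
    a constant to the covariate leaves these sums unchanged. Dividing by 2 puts
    the sums of squares on the scale of a two-level repeated measures analysis.
    """
    sxx = sxy = syy = 0.0
    for rows in data.values():
        xs = [v for _, _, v in rows]
        ys = [score(q, n) for q, n, _ in rows]
        mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
        sxx += sum((x - mx) ** 2 for x in xs)
        sxy += sum((x - mx) * (y - my) for x, y in zip(xs, ys))
        syy += sum((y - my) ** 2 for y in ys)
    ss_cov = sxy ** 2 / sxx / 2
    return ss_cov, syy / 2 - ss_cov

n_subjects = sum(len(rows) for rows in data.values())
df_error = n_subjects - len(data) - 1   # 20 subjects - 2 groups - 1 covariate

results = {}
for label, score in [("Background*VocabularyCentered", lambda q, n: q - n),
                     ("VocabularyCentered", lambda q, n: q + n)]:
    ss_cov, ss_err = pooled_regression_ss(score)
    f_ratio = ss_cov / (ss_err / df_error)
    results[label] = (ss_cov, ss_err, f_ratio)
    print(f"{label}: SS = {ss_cov:.3f}, error SS = {ss_err:.3f}, "
          f"F(1, {df_error}) = {f_ratio:.3f}")
```

Under this assumption, the covariate sum of squares and its error term in each half of the table add up to the corresponding ANOVA error sum of squares.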
Figure 4. Relationships between the number of questions answered correctly and the covariate (centered in each age group) for the data in Table . The top panel plots the number of questions answered correctly, averaged over the within-subject factor, as a function of the covariate measures. The middle and bottom panels plot the data for the quiet and noisy conditions. The estimated scale factors for the different conditions can be obtained from the slopes of the lines in these plots. In this model the estimated within-subject difference (ŵ1 − ŵ2) is the difference between the intercepts of the two straight lines in the lower two panels. Hence, in this example, the Mean Square for the Within-Subject Main Effect is 10 × (42.65 − 41.15)² = 22.5, as computed by the ANOVA.
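The arithmetic in this caption, together with the other within-subject ANOVA rows of the composite table, can be checked from the raw scores. The following is a minimal sketch with hypothetical variable names. It relies on the standard identity that, for a two-level within-subject factor, the sums of squares can be computed from per-subject difference scores divided by 2.

```python
from statistics import mean

# Scores copied from the data table (subjects 1-10 Young, 11-20 Old).
quiet = {"Young": [48, 51, 40, 41, 35, 36, 39, 47, 41, 39],
         "Old":   [44, 44, 46, 45, 46, 45, 40, 45, 40, 41]}
noise = {"Young": [41, 39, 40, 39, 34, 32, 41, 44, 37, 41],
         "Old":   [39, 45, 46, 40, 43, 48, 46, 43, 42, 43]}

all_quiet = [q for qs in quiet.values() for q in qs]
all_noise = [n for ns in noise.values() for n in ns]
n_total = len(all_quiet)

# Within-subject main effect: (N / 2) * (difference between condition means)^2.
mean_quiet, mean_noise = mean(all_quiet), mean(all_noise)
ss_within = n_total / 2 * (mean_quiet - mean_noise) ** 2

# Interaction and error terms from per-subject difference scores (Quiet - Noise).
diffs = {g: [q - n for q, n in zip(quiet[g], noise[g])] for g in quiet}
grand_diff = mean(d for ds in diffs.values() for d in ds)
ss_interaction = sum(len(ds) * (mean(ds) - grand_diff) ** 2
                     for ds in diffs.values()) / 2
ss_error = sum((d - mean(ds)) ** 2 for ds in diffs.values() for d in ds) / 2
df_error = n_total - len(diffs)          # 20 subjects - 2 groups
ms_error = ss_error / df_error

print(f"Background:          SS = {ss_within:.3f}, F = {ss_within / ms_error:.3f}")
print(f"Background*AgeGroup: SS = {ss_interaction:.3f}, "
      f"F = {ss_interaction / ms_error:.3f}")
print(f"Error:               SS = {ss_error:.3f}, df = {df_error}")
```

Because each effect has one degree of freedom, the sums of squares computed here are also the mean squares.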
Recommended procedures to follow when conducting an ANCOVA for three types of designs: (1) All factors are Within-Subject; (2) Experimental designs in which subjects are randomly selected from a uniform population and randomly assigned to different experimental conditions, and (3) Classification designs in which the different levels of Between-Subjects factor consist of samples from different populations (e.g., musicians and non-musicians) where it cannot be assumed the expected value of the covariate is the same across populations.
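| (1) All factors Within-Subject | (2) Experimental designs (random assignment) | (3) Classification designs |
| --- | --- | --- |
| 1. Center the covariate measures | 1. Center the covariate measures | 1. Center the covariate measures |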
| 2. Conduct an ANCOVA | 2. Conduct an ANCOVA | 2. Conduct an ANCOVA |
| 3. Use the ANCOVA to evaluate all effects involving covariates | 3. Use the ANCOVA to evaluate all Between-Subjects effects and any interactions of Between-Subjects and Within-Subject effects, including Within | 3. Use the ANCOVA to evaluate all effects involving a covariate |
| 4. Conduct an ANOVA | 4. Conduct an ANOVA | 4. Conduct an ANOVA |
| 5. Use an ANOVA to evaluate all remaining effects | 5. Use an ANOVA to evaluate all remaining Within-Subject effects | 5. Use the ANOVA to evaluate all remaining effects |
Note that whenever Between-Subjects factors are involved, it is important to first test whether the relationship between the dependent variable and the covariate is the same for all levels of the Between-Subjects factor (e.g., Howell).
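One common way to run such a check is to compare a model with a separate slope for each group against a model with a single common slope. The sketch below is an assumption about how the check might be done, not a procedure taken from this paper. It uses each subject's score averaged over Quiet and Noise as the dependent variable, and all names are hypothetical. It returns only the F statistic and its degrees of freedom; a p-value would require the F distribution from a statistics package.

```python
# (quiet, noise, vocabulary) per subject, copied from the data table.
data = {
    "Young": [(48, 41, 17), (51, 39, 18), (40, 40, 14), (41, 39, 13), (35, 34, 11),
              (36, 32, 12), (39, 41, 12), (47, 44, 16), (41, 37, 14), (39, 41, 10)],
    "Old":   [(44, 39, 23), (44, 45, 19), (46, 46, 23), (45, 40, 21), (46, 43, 21),
              (45, 48, 21), (40, 46, 20), (45, 43, 21), (40, 42, 18), (41, 43, 19)],
}

def slope_homogeneity(groups):
    """Compare separate per-group slopes with one common (pooled) slope.

    groups maps a group name to a list of (covariate, score) pairs.
    Returns (per-group slopes, F statistic, numerator df, denominator df).
    """
    slopes = {}
    sxx_tot = sxy_tot = syy_tot = ss_res_separate = 0.0
    n = 0
    for name, pairs in groups.items():
        xs = [x for x, _ in pairs]
        ys = [y for _, y in pairs]
        mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
        sxx = sum((x - mx) ** 2 for x in xs)
        sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
        syy = sum((y - my) ** 2 for y in ys)
        slopes[name] = sxy / sxx
        ss_res_separate += syy - sxy ** 2 / sxx   # residual with its own slope
        sxx_tot, sxy_tot, syy_tot = sxx_tot + sxx, sxy_tot + sxy, syy_tot + syy
        n += len(pairs)
    ss_res_common = syy_tot - sxy_tot ** 2 / sxx_tot  # residual with one slope
    df1 = len(groups) - 1
    df2 = n - 2 * len(groups)
    f_stat = ((ss_res_common - ss_res_separate) / df1) / (ss_res_separate / df2)
    return slopes, f_stat, df1, df2

# Dependent variable: each subject's mean over the two listening conditions.
pairs_by_group = {g: [(v, (q + n) / 2) for q, n, v in rows]
                  for g, rows in data.items()}
slopes, f_stat, df1, df2 = slope_homogeneity(pairs_by_group)
print(slopes)
print(f"F({df1}, {df2}) = {f_stat:.3f}")
```

A large F relative to the critical value for (df1, df2) would suggest that the slopes differ across groups, in which case a single pooled covariate adjustment may be hard to interpret.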
Although it is not necessary to center the covariate measures before entering the data into a standard statistical package when all factors are Between-Subjects, it is necessary to do so when the experimental design contains Within-Subject factors because these programs do not center the covariate measures when evaluating within-subject effects. To be safe, always center the covariate measures before entering them into a statistical package.