| Literature DB >> 32603857 |
Anderson M Winkler1, Olivier Renaud2, Stephen M Smith3, Thomas E Nichols4.
Abstract
Canonical correlation analysis (CCA) has become a key tool for population neuroimaging, allowing investigation of associations between many imaging and non-imaging measurements. As age, sex and other variables are often a source of variability not of direct interest, previous work has used CCA on residuals from a model that removes these effects, then proceeded directly to permutation inference. We show that a simple permutation test, as typically used to identify significant modes of shared variation on such data adjusted for nuisance variables, produces inflated error rates. The reason is that residualisation introduces dependencies among the observations that violate the exchangeability assumption. Even in the absence of nuisance variables, however, a simple permutation test for CCA also leads to excess error rates for all canonical correlations other than the first. The reason is that a simple permutation scheme does not ignore the variability already explained by previous canonical variables. Here we propose solutions for both problems: in the case of nuisance variables, we show that transforming the residuals to a lower dimensional basis where exchangeability holds results in a valid permutation test; for more general cases, with or without nuisance variables, we propose estimating the canonical correlations in a stepwise manner, removing at each iteration the variance already explained, while dealing with different number of variables in both sides. We also discuss how to address the multiplicity of tests, proposing an admissible test that is not conservative, and provide a complete algorithm for permutation inference for CCA. Published by Elsevier Inc.Entities:
Keywords: Closed testing procedure; Permutation test; canonical Correlation analysis
Year: 2020 PMID: 32603857 PMCID: PMC7573815 DOI: 10.1016/j.neuroimage.2020.117065
Source DB: PubMed Journal: Neuroimage ISSN: 1053-8119 Impact factor: 6.556
Taxonomy of canonical correlation analysis with respect to nuisance variables. In all cases, the aim is to find linear combinations of variables in the left and in the right sets, such that each combination from one set is maximally correlated with a corresponding combination from the other, but uncorrelated with all other combinations from either set.
| Name | Left set | Right set |
|---|---|---|
| Partial | ||
| Part | ||
| Bipartial |
is a residual forming matrix that considers the nuisance variables in , and is computed as , where the symbol + represents a pseudo-inverse. is computed similarly, considering the nuisance variables in . The choice which set is on left or right side is arbitrary.
Proposed permutation method for the various cases of cca, with respect to nuisance variables.
| Name | Left set | Right set |
|---|---|---|
| Partial | ||
| Part | ||
| Bipartial |
is a semi-orthogonal basis for the column space of , such that , , where , and . is a similarly defined matrix for the column space of , . The bipartial cca case generalizes all others: for ‘‘full’’ cca, , and so, ; for partial cca, ; for part cca, and so, . For full and partial, pre-multiplication by can be omitted since , such that results do not change. Once these simplifications are considered, the general bipartial cca case reduces to the other three as shown in the Table. Full and partial have matching number of rows in both sides, such that only one side needs be permuted; part and bipartial, however, have at the time of the permutation a different number of rows in each side, such that both can be permuted separately through the use of suitably sized permutation matrices and ; is size for full cca, and for the three other cases; is size for full and for part cca, and for the two other cases.
Fig. 1A selection matrix is an identity matrix from which some specific rows have been removed. Pre-multiplication by a selection matrix deletes specific rows (those that correspond to columns that are all zero in the selection matrix).
The semi-orthogonal matrix ( and/or , subscripts dropped), discussed in Sections 2.6, 2.7, is not unique. Two principled methods to obtain it are below.
| Method | Matrix |
|---|---|
is the residual-forming matrix ( or , for the respective set of nuisance variables, subscript dropped); since is idempotent, all its eigenvalues (the diagonal elements of ) are equal to 0 or 1. In the Theil method, is a (for ) or (for ) selection matrix; the matrix square root (in the exponent ) is the positive definite solution. In the Huh–Jhun method, after Schur or svd factorisations of are computed, and the R or S columns of that have corresponding zero eigenvalues in the diagonal of are removed, such that computed from the factosisation is reduced from size to or to . At the end of these computations (see Algorithm 2, ‘‘semiortho’’, in the Appendix), for both methods, , , and . Both methods aim at obtaining residuals with a scalar covariance matrix . Theil explictly seeks blus residuals. However, strictly, does not need to be a selection matrix: choose to be (not to be confused with qr decomposition) using computed with the Huh–Jhun approach. Then, following Magnus and Sinha (2005, Theorem 2, p. 42), it can be shown that Huh–Jhun also provides blus residuals.
A valid permutation test for cca needs to consider several computational and statistical aspects, which can be seen as steps in a procedure; each of them may be addressed using different strategies, some of which are studied below. The recommendations for the different options for each case (column “Use”) are based on theoretical grounds shown in the sections as indicated in the table, and verified empirically using synthetic data (Section 4).
| Step | Possible strategies studied | Use | Theory | Scenarios |
|---|---|---|---|---|
| Estimation of the canonical components | ( | Section | ||
| ( | ||||
| Inclusion of the complement of the canonical coefficients | ( | Section | ||
| ( | ||||
| Correction for multiple testing | ( | Section | ||
| ( | ||||
| ( | ||||
| Treatment of nuisance variables | ( | Sections | ||
| ( | ||||
| ( | ||||
| Choice of the test statistic | ( | Sections | ||
| ( |
Can or should be used.
Can but should not be used.
Cannot or should not be used.
Simulation scenarios.
| Scenarios | #( | Distribution | Signals | #Perms. | #Reps. | ||||||
|---|---|---|---|---|---|---|---|---|---|---|---|
| Without nuisance | 100 | 16 | 20 | 0 | 0 | – | normal | – | 2000 | 2000 | |
| 100 | 16 | 20 | 0 | 0 | 10 | normal | – | 2000 | 2000 | ||
| 100 | 16 | 20 | 0 | 0 | – | kurtotic | – | 2000 | 2000 | ||
| 100 | 16 | 20 | 0 | 0 | 10 | kurtotic | – | 2000 | 2000 | ||
| 100 | 16 | 20 | 0 | 0 | – | binary | – | 2000 | 2000 | ||
| 100 | 16 | 20 | 0 | 0 | 10 | binary | – | 2000 | 2000 | ||
| Partial | 100 | 16 | 20 | 15 | – | normal | – | 2000 | 2000 | ||
| 100 | 16 | 20 | 15 | 10 | normal | – | 2000 | 2000 | |||
| 100 | 16 | 20 | 15 | – | kurtotic | – | 2000 | 2000 | |||
| 100 | 16 | 20 | 15 | 10 | kurtotic | – | 2000 | 2000 | |||
| 100 | 16 | 20 | 15 | – | binary | – | 2000 | 2000 | |||
| 100 | 16 | 20 | 15 | 10 | binary | – | 2000 | 2000 | |||
| Bipartial | 100 | 16 | 20 | 15 | 15 | – | normal | – | 2000 | 2000 | |
| 100 | 16 | 20 | 15 | 15 | 10 | normal | – | 2000 | 2000 | ||
| Larger samples | 16 | 20 | 20 | – | normal | – | 1000 | 1000 | |||
| 16 | 20 | 20 | 10 | normal | – | 1000 | 1000 | ||||
| With signal | 100 | 16 | 20 | 0 | 0 | – | normal | sparse | 2000 | 2000 | |
| 100 | 16 | 20 | 0 | 0 | – | normal | dense | 2000 | 2000 |
∗ For scenarios xv and xvi, sample size varied, . In the table, R and S refer to the number of nuisance variables other than the intercept, which is always included (so the number of nuisance variables in left and right sides for all the simulation scenarios was always, respectively and ). For partial cca, the number of nuisance variables on one side is always the same as in the other, i.e., , but that does not have to be for bipartial cca, even though here the same size was used. The case with larger samples was used for investigation of partial cca.
Observed per comparison error rate (%) and 95% confidence intervals for the first 6 canonical correlations in scenario i, assessed using the Wilks’ statistic and three different multiple testing correction methods; the observed familywise error rate (fwer) for each case is also shown. Valid methods should have a fwer close to the nominal 5%, and pcer close to the nominal ; see Sections 4.1, 4.2 for details.
| Null space not included | Null space included | |||
|---|---|---|---|---|
| Single step | Stepwise | Single step | Stepwise | |
| ( | ||||
| 91.35 (90.04–92.50) | 91.35 (90.04–92.50) | 4.70 (3.86–5.72) | 4.70 (3.86–5.72) | |
| 93.50 (92.33–94.50) | 60.40 (58.24–62.52) | 4.60 (3.77–5.61) | 0.25 (0.11–0.58) | |
| 94.70 (93.63–95.60) | 26.70 (24.81–28.68) | 4.60 (3.77–5.61) | 0.00 (0.00–0.19) | |
| 95.55 (94.56–96.37) | 7.25 (6.19–8.47) | 4.85 (3.99–5.88) | 0.00 (0.00–0.19) | |
| 96.10 (95.16–96.86) | 1.45 (1.01–2.07) | 4.40 (3.59–5.39) | 0.00 (0.00–0.19) | |
| 96.75 (95.88–97.44) | 0.25 (0.11–0.58) | 4.30 (3.50–5.28) | 0.00 (0.00–0.19) | |
| 99.90 (99.64–99.97) | 91.35 (90.04–92.50) | 18.30 (16.67–20.05) | 4.70 (3.86–5.72) | |
| ( | ||||
| 91.35 (90.04–92.50) | 91.35 (90.04–92.50) | 4.70 (3.86–5.72) | 4.70 (3.86–5.72) | |
| 90.35 (88.98–91.57) | 60.40 (58.24–62.52) | 3.40 (2.69–4.29) | 0.25 (0.11–0.58) | |
| 89.80 (88.40–91.05) | 26.70 (24.81–28.68) | 2.75 (2.12–3.56) | 0.00 (0.00–0.19) | |
| 89.55 (88.13–90.82) | 7.25 (6.19–8.47) | 2.40 (1.81–3.17) | 0.00 (0.00–0.19) | |
| 89.30 (87.87–90.58) | 1.45 (1.01–2.07) | 1.75 (1.26–2.42) | 0.00 (0.00–0.19) | |
| 88.85 (87.40–90.16) | 0.25 (0.11–0.58) | 1.45 (1.01–2.07) | 0.00 (0.00–0.19) | |
| 91.35 (90.04–92.50) | 91.35 (90.04–92.50) | 4.70 (3.86–5.72) | 4.70 (3.86–5.72) | |
| ( | ||||
| 91.35 (90.04–92.50) | 91.35 (90.04–92.50) | 4.70 (3.86–5.72) | 4.70 (3.86–5.72) | |
| 7.95 (6.84–9.22) | 10.95 (9.66–12.39) | 0.00 (0.00–0.19) | 0.00 (0.00–0.19) | |
| 0.00 (0.00–0.19) | 0.00 (0.00–0.19) | 0.00 (0.00–0.19) | 0.00 (0.00–0.19) | |
| 0.00 (0.00–0.19) | 0.00 (0.00–0.19) | 0.00 (0.00–0.19) | 0.00 (0.00–0.19) | |
| 0.00 (0.00–0.19) | 0.00 (0.00–0.19) | 0.00 (0.00–0.19) | 0.00 (0.00–0.19) | |
| 0.00 (0.00–0.19) | 0.00 (0.00–0.19) | 0.00 (0.00–0.19) | 0.00 (0.00–0.19) | |
| 91.35 (90.04–92.50) | 91.35 (90.04–92.50) | 4.70 (3.86–5.72) | 4.70 (3.86–5.72) | |
Using the Roy’s statistic led to similar results as with Wilks (not shown). Dimensionality reduction with pca led to similar results for the case in which the null space is included (not shown). For the case in which the null space is not included, results are not comparable with the ones above because, after pca, in the simulations, such that there is no null space to be considered as the matrices with canonical coefficients in both sides are then square.
Fig. 2Relationship between canonical correlations (horizontal axes) and associated p-values (vertical axes) for 10 realisations of scenario i, considering two estimation methods (single step and stepwise) and three multiple testing correction methods (uncorrected, corrected using the cumulative maximum, and corrected using the distribution of the maximum statistic). The figure complements Table 6 by showing example realisations that average to the error rates shown in the table for the cases in which the null space is included. For simple, uncorrected p-values, the test is inadmissible; for corrected using the distribution of the maximum statistic, the test is overly conservative; single step does not control the familywise error rate.
Observed per comparison error rate (pcer, %) and 95% confidence intervals for the first 6 canonical correlations in scenarios vii (partial cca) and xiii (bipartial cca), for the three different methods considered for treatment of nuisance variables.
| Simple residuals | Huh–Jhun | Theil | |
|---|---|---|---|
| ( | |||
| 83.85 (82.17–85.40) | 5.10 (4.22–6.15) | 4.85 (3.99–5.88) | |
| 44.15 (41.99–46.34) | 0.30 (0.14–0.65) | 0.35 (0.17–0.72) | |
| 12.75 (11.36–14.28) | 0.05 (0.01–0.28) | 0.00 (0.00–0.19) | |
| 1.75 (1.26–2.42) | 0.00 (0.00–0.19) | 0.00 (0.00–0.19) | |
| 0.20 (0.08–0.51) | 0.00 (0.00–0.19) | 0.00 (0.00–0.19) | |
| 0.00 (0.00–0.19) | 0.00 (0.00–0.19) | 0.00 (0.00–0.19) | |
| ( | |||
| 5.55 (4.63–6.64) | 5.20 (4.31–6.26) | 4.45 (3.63–5.44) | |
| 0.10 (0.03–0.36) | 0.30 (0.14–0.65) | 0.20 (0.08–0.51) | |
| 0.00 (0.00–0.19) | 0.00 (0.00–0.19) | 0.00 (0.00–0.19) | |
| 0.00 (0.00–0.19) | 0.00 (0.00–0.19) | 0.00 (0.00–0.19) | |
| 0.00 (0.00–0.19) | 0.00 (0.00–0.19) | 0.00 (0.00–0.19) | |
| 0.00 (0.00–0.19) | 0.00 (0.00–0.19) | 0.00 (0.00–0.19) | |
Estimation included the null space of the canonical coefficients and a stepwise procedure, assessed using the Wilks’ statistic, and corrected using a closed testing procedure (ctp). The ctp guarantees that the familywise error rate (fwer) matches the pcer for the first canonical correlation (i.e., for ). Using the Roy’s statistic led to similar results as with Wilks’; likewise, dimensionality reduction with pca led to similar results (not shown).
Observed per comparison error rate (pcer, %) and 95% confidence intervals for the first canonical correlation in scenarios i–vi (without nuisance) and vi–xii (partial cca, with Huh–Jhun), considering different distributions for the data.
| Distribution | Without nuisance | Partial | |
|---|---|---|---|
| Normal | 4.70 (3.86–5.72) | 5.15 (4.26–6.21) | |
| Student | 3.95 (3.18–4.90) | 14.70 (13.22–16.32) | |
| 5.45 (4.54–6.53) | 5.40 (4.49–6.48) | ||
| 4.15 (3.36–5.12) | 5.40 (4.49–6.48) | ||
| 4.70 (3.86–5.72) | 5.00 (4.13–6.04) | ||
| 3.85 (3.09–4.79) | 5.10 (4.22–6.15) | ||
| Bernoulli | 5.30 (4.40–6.37) | 5.30 (4.40–6.37) | |
: Degrees of freedom of the Student’s t distribution used to simulate data; q: Parameter of the Bernoulli distribution used to simulate data. Estimation used the null space of the canonical coefficients and a stepwise procedure, assessed using the Wilks’ statistic, and corrected using a closed testing procedure (ctp). The ctp guarantees that the familywise error rate (fwer) matches the pcer for the first canonical correlation (i.e., for ). Using the Roy’s statistic led to similar results as with Wilks’; using Theil led to similar results as Huh–Jhun; likewise, dimensionality reduction with pca led to similar results (not shown).
Observed familywise error rate (fwer, %, that matches the pcer for ) and 95% confidence intervals for scenarios xv and xvi, used to investigate effect of different sample sizes, for the three different methods for dealing with nuisance variables.
| Without | With | |||||
|---|---|---|---|---|---|---|
| Simple residuals | Huh–Jhun | Theil | Simple residuals | Huh–Jhun | Theil | |
| 100 | 96.60 (95.29–97.56) | 4.80 (3.64–6.31) | 5.00 (3.81–6.53) | 59.20 (56.12–62.21) | 5.00 (3.81–6.53) | 5.10 (3.90–6.64) |
| 200 | 42.20 (39.17–45.29) | 5.50 (4.25–7.09) | 5.40 (4.16–6.98) | 22.30 (19.83–24.98) | 5.00 (3.81–6.53) | 4.50 (3.38–5.97) |
| 300 | 25.10 (22.51–27.88) | 5.00 (3.81–6.53) | 5.40 (4.16–6.98) | 12.20 (10.31–14.37) | 5.40 (4.16–6.98) | 5.10 (3.90–6.64) |
| 400 | 18.00 (15.74–20.50) | 4.30 (3.21–5.74) | 5.10 (3.90–6.64) | 11.80 (9.95–13.95) | 4.50 (3.38–5.97) | 5.70 (4.43–7.31) |
| 500 | 11.50 (9.67–13.63) | 6.10 (4.78–7.76) | 4.10 (3.04–5.51) | 9.50 (7.83–11.48) | 6.80 (5.40–8.53) | 5.10 (3.90–6.64) |
| 600 | 10.80 (9.02–12.88) | 5.20 (3.99–6.76) | 5.00 (3.81–6.53) | 9.30 (7.65–11.26) | 4.70 (3.55–6.19) | 4.70 (3.55–6.19) |
| 700 | 11.20 (9.39–13.31) | 4.20 (3.12–5.63) | 4.40 (3.29–5.86) | 7.20 (5.76–8.97) | 5.70 (4.43–7.31) | 5.50 (4.25–7.09) |
| 800 | 10.10 (8.38–12.12) | 5.50 (4.25–7.09) | 4.20 (3.12–5.63) | 7.00 (5.58–8.75) | 4.80 (3.64–6.31) | 5.90 (4.60–7.54) |
| 900 | 8.40 (6.84–10.28) | 5.00 (3.81–6.53) | 4.00 (2.95–5.40) | 7.70 (6.20–9.52) | 5.20 (3.99–6.76) | 5.20 (3.99–6.76) |
| 1000 | 7.70 (6.20–9.52) | 4.30 (3.21–5.74) | 5.90 (4.60–7.54) | 5.00 (3.81–6.53) | 6.20 (4.87–7.87) | 4.70 (3.55–6.19) |
Observed power (%) and 95% confidence intervals for the first canonical correlation in scenarios xvii and xviii, that included a synthetic signal added to either one or half (‘‘sparse’’ or ‘‘dense’’, respectively) of the initial variables, thus captured by only one (sparse case) or multiple (dense case) canonical correlations.
| Signals | Without | With | |||
|---|---|---|---|---|---|
| Wilks ( | Roy ( | Wilks ( | Roy ( | ||
| Sparse | 42.10 (39.95–44.28) | 57.90 (55.72–60.05) | 81.55 (79.79–83.19) | 94.75 (93.68–95.64) | |
| Dense | 83.05 (81.34–84.63) | 37.05 (34.96–39.19) | 95.80 (94.83–96.59) | 70.70 (68.67–72.65) | |
| 42.30 (40.15–44.48) | 4.75 (3.90–5.77) | 72.05 (70.04–73.97) | 24.75 (22.91–26.69) | ||
| 12.75 (11.36–14.28) | 0.25 (0.11–0.58) | 31.95 (29.94–34.03) | 4.30 (3.50–5.28) | ||
| 1.95 (1.43–2.65) | 0.00 (0.00–0.19) | 6.55 (5.55–7.72) | 0.50 (0.27–0.92) | ||
| 0.10 (0.03–0.36) | 0.00 (0.00–0.19) | 1.00 (0.65–1.54) | 0.00 (0.00–0.19) | ||
| 0.05 (0.01–0.28) | 0.00 (0.00–0.19) | 0.05 (0.01–0.28) | 0.00 (0.00–0.19) | ||
Estimation used the null space of the canonical coefficients and a stepwise procedure, assessed using the Wilks’ statistic, and corrected using a closed testing procedure (ctp). The ctp guarantees that the familywise error rate (fwer) matches the pcer for the first canonical correlation (i.e., for ).