Yukai Zhao1,2, Luis Andres Lesmes3,4, Fang Hou5,6, Zhong-Lin Lu7,8,9,10. 1. Center for Neural Science, New York University, New York, NY, USA. 2. zhaoyukai@nyu.edu. 3. Adaptive Sensory Technology Inc., San Diego, CA, USA. 4. luis.lesmes@adaptivesensorytech.com. 5. School of Ophthalmology & Optometry and Eye Hospital, Wenzhou Medical University, Wenzhou, Zhejiang, China. 6. houf@mail.eye.ac.cn. 7. Division of Arts and Sciences, NYU Shanghai, Shanghai, China. 8. Center for Neural Science and Department of Psychology, New York University, New York, NY, USA. 9. NYU-ECNU Institute of Brain and Cognitive Science, Shanghai, China. 10. zhonglin@nyu.edu.
Abstract
Recent development of the quick contrast sensitivity function (qCSF) method has made it possible to obtain accurate, precise, and efficient contrast sensitivity function (CSF) assessment. To improve statistical inference on CSF changes in a within-subject design, we developed a hierarchical Bayesian model (HBM) to compute the joint distribution of CSF parameters and hyperparameters at test, subject, and population levels, utilizing information within- and between-subjects and experimental conditions. We evaluated the performance of the HBM relative to a non-hierarchical Bayesian inference procedure (BIP) on an existing CSF dataset of 112 subjects obtained with the qCSF method in three luminance conditions (Hou, Lesmes, Kim, Gu, Pitt, Myung, & Lu, 2016). We found that the average d′s of the area under log CSF (AULCSF) and CSF parameters between pairs of luminance conditions at the test level from the HBM were 33.5% and 103.3% greater than those from the BIP analysis of AULCSF. The increased d′ resulted in greater statistical differences between experimental conditions across subjects. In addition, simulations showed that the HBM generated accurate and precise CSF parameter estimates. These results have strong implications for the application of the HBM in clinical trials and patient care.
The contrast sensitivity function (CSF), which quantifies the visibility (1/threshold) of narrow-band filtered stimuli over a wide range of spatial frequencies, provides a comprehensive measure of spatial vision (Ginsburg, 1981; Ginsburg, 2003; Hess, 1981). It is closely related to daily visual functions (Ginsburg, 2003), and can better quantify deficits in spatial vision than visual acuity (Jindra & Zemon, 1989; Marmor, 1986). It has long been recognized that the CSF provides important information for monitoring progression of vision change and evaluating treatment efficacy in eye diseases (Bellucci, Scialdone, Buratto, Morselli, Chierego, Criscuolo, Criscuoli, Moretti, & Piers, 2005; Ginsburg, 2006; Loshin & White, 1984; Levi & Li, 2009; Tan & Fong, 2008; Zhou, Huang, Xu, Tao, Qiu, Li, & Lu, 2006).

Despite its clinical promise, precise and efficient CSF assessment has presented a challenge. CSF charts provide a fast but imprecise assessment of contrast sensitivity due to coarse sampling of both spatial frequency and stimulus contrast (Bradley, Hook, & Haeseker, 1991; Buhren, Terzi, Bach, Wesemann, & Kohnen, 2006; Hohberger, Laemmer, Adler, Juenemann, & Horn, 2007; Pesudovs, Hazel, Doran, & Elliott, 2004; van Gaalen, Jansonius, Koopmans, Terwee, & Kooijman, 2009). On the other hand, the long testing time (30-60 minutes) required for measuring the CSF with conventional psychophysical methods has prevented their clinical application (Kelly & Savoie, 1973; Treutwein, 1995). The quick contrast sensitivity function (qCSF) method was developed to address these challenges (Lesmes, Lu, Baek, & Albright, 2010). Based on active learning principles, it estimates the parameters of the CSF in a Bayesian adaptive framework (Kontsevich & Tyler, 1999; Lu & Dosher, 2013; Watson, 2017; Watson & Pelli, 1983).
A recent qCSF implementation with a 10-letter identification task enabled assessment of the CSF with a 0.10 log unit standard deviation in about 20 trials (approximately 2 minutes) and reduced the standard deviation of the estimates by 50% (Hou, Lesmes, Bex, Dorr, & Lu, 2015). Accurate and precise qCSF estimates have been obtained in both normal (Reynaud, Tang, Zhou, & Hess, 2014; Rosén, Lundström, Venkataraman, Winter, & Unsbo, 2014) and clinical populations (Hou, Huang, Lesmes, Feng, Tao, Zhou, & Lu, 2010; Jia, Zhou, Lu, Lesmes, & Huang, 2015; Joltikov, de Castro, Davila, Anand, Khan, Farbman, Jackson, Johnson, & Gardner, 2017; Lesmes, Jackson & Bex, 2013; Lesmes, Wallis, Jackson, & Bex, 2013; Lesmes, Wallis, Lu, Jackson, & Bex, 2012; Lin, Mihailovic, West, Johnson, Friedman, Kong, & Ramulu, 2018; Ou, Lesmes, Christie, Denlar, & Csaky, 2021; Ramulu, Dave, & Friedman, 2015; Rosen, Jayaraj, Bharadwaj, Weeber, Van der Mooren, & Piers, 2015; Stellmann, Young, Pottgen, Dorr, & Heesen, 2015; Thomas, Silverman, Vingopoulos, Kasetty, Yu, Kim, Omari, Joltikov, Choi, Kim, Zacks, & Miller, 2021; Vingopoulos, Wai, Katz, Vavvas, Kim, & Miller, 2021; Wai, Vingopoulos, Garg, Kasetty, Silverman, Katz, Laíns, Miller, Husain, Vavvas, Kim, & Miller, 2021; Yan, Hou, Lu, Hu, & Huang, 2017).

A Bayesian Inference Procedure (BIP) has been developed to make statistical inference on CSF changes in a within-subject design based on CSF metrics extracted from each subject in each experimental condition (Hou et al., 2016; Kuss, Jäkel, & Wichmann, 2005; Prins, 2013; Schütt, Harmeling, Macke, & Wichmann, 2016). Because it scores each test independently with an uninformative prior, without considering potential relationships of CSF parameters across subjects and experimental conditions, the BIP may have overestimated the variance of each test and resulted in reduced statistical power (Borm, Fransen, & Lemmens, 2007; Egbewale, Lewis, & Sim, 2014; Wilcox, 2012).
In addition, a single summary metric, the area under log CSF (AULCSF), is usually used to compare CSFs in different experimental conditions, potentially leaving out information in the multidimensional joint distribution of CSF parameters.

In this study, we developed a hierarchical Bayesian model (HBM) to reduce the variability of estimated CSF parameters for each test and to further improve the ability to detect between-condition CSF changes in a within-subject design. The HBM is a generative model framework that uses Bayes' rule to quantify the joint distribution of test-, subject-, and population-level parameters and hyperparameters (Kruschke, 2015; Lee, 2006; Lee, 2011; Rouder & Lu, 2005; Wilson, Cranmer, & Lu, 2020). It explicitly quantifies the covariance of the hyperparameters and parameters (Daniels & Kass, 1999; Klotzke & Fox, 2019; Thall, Wathen, Bekele, Champlin, Baker, & Benjamin, 2003; Wang, Lin, & Nelson, 2020; Yang, Zhu, Choi, & Cox, 2016). By sharing information within and across levels via conditional dependencies, it reduces the variance of the test-level estimates through (1) decomposition of variabilities from different sources (test, subject, and population) with parameters and hyperparameters (Song, Behmanesh, Moaveni, & Papadimitriou, 2020), and (2) shrinkage of the estimated parameters at the lower levels toward the mean of the higher levels when there are not sufficient data at the lower level (Kruschke, 2015; Rouder & Lu, 2005; Rouder, Sun, Speckman, Lu, & Zhou, 2003).

Although it has been used in many different disciplines, such as astronomy (Thrane & Talbot, 2019), ecology (Reum, Hovel, & Greene, 2015; Wikle, 2003), genetics (Storz & Beaumont, 2002), machine learning (Li & Perona, 2005), cognitive science (Ahn, Krawitz, Kim, Busemeyer, & Brown, 2011; Lee, 2006; Lee & Mumford, 2003; Merkle, Smithson, & Verkuilen, 2011; Molloy, Bahg, Li, Steyvers, Lu, & Turner, 2018; Molloy, Bahg, Lu, & Turner, 2019; Rouder & Lu, 2005; Rouder et al., 2003; Wilson et al., 2020), and visual acuity testing (Zhao, Lesmes, Dorr, & Lu, 2021), the HBM has not been applied to analyze the CSF. Here, we developed a three-level HBM to model an entire CSF dataset in a single-factor (luminance), multi-condition (three luminance conditions), within-subject experimental design. We modeled the data with CSF parameters at the test level and hyperparameters at the individual and population levels, with conditional dependencies across levels. We evaluated the performance of the HBM relative to the BIP using an existing dataset of 112 subjects tested with the qCSF in three luminance conditions (Hou et al., 2016), which was collected to mimic mild, medium, and large CSF changes observed in clinical settings (Bellmann, Unnebrink, Rubin, Miller, & Holz, 2003; Haymes, Roberts, Cruess, Nicolela, LeBlanc, Ramsey, Chauhan, & Artes, 2006; Kalia, Lesmes, Dorr, Gandhi, Chatterjee, Ganesh, Bex, & Sinha, 2014; Kleiner, Enger, Alexander, & Fine, 1988; Midena, Degli Angeli, Blarzino, Valenti, & Segato, 1997; Owsley, Sekuler, & Siemsen, 1983). In addition, a simulation study was conducted to evaluate and compare the accuracy and precision of the estimates from the HBM and BIP. We hypothesized that, relative to the BIP, the HBM would reduce the variability of the estimated CSF parameters from each test, increase the d′s of CSF changes between luminance conditions for each subject, and improve statistical inference across subjects.
Bayesian modeling of the CSF
Overview
In a typical within-subject design CSF experiment with multiple conditions, the trial-by-trial data can be organized as y = (f, c, r), where r, either correct or incorrect, is individual i's response in trial m of test k in experimental condition j, tested with a stimulus of spatial frequency f and contrast c. The BIP consists of four components (Hou et al., 2016): (1) a log-parabola model of the contrast sensitivity function with several parameters, (2) a likelihood function that specifies the probability of making a correct or incorrect response in each stimulus condition, (3) a Bayesian procedure to infer the posterior distribution of the CSF parameters for each subject in each test, and (4) inference based on statistics computed from posterior distributions either at the subject level or aggregated across subjects. In this section, we first provide a brief review of the BIP, and then introduce the HBM.
The Bayesian inference procedure
In the BIP (Figures 1, 2a), the contrast sensitivity S(f, θ) at spatial frequency f is modeled with a log-parabola function with three parameters (Lesmes et al., 2010; Rohaly & Owsley, 1993; Watson & Ahumada Jr, 2005):

log10 S(f, θ) = log10(γmax) − log10(2) × [log2(f / fmax) / (β / 2)]²,   (1)
Figure 1.
The Bayesian inference procedure (BIP) for a single test. (a) A three-dimensional prior distribution of the CSF parameters. (b) Trial-by-trial data. (c) A CSF model with three parameters. (d) Psychometric functions at different spatial frequencies. (e) A three-dimensional posterior distribution of the CSF parameters.
Figure 2.
(a) The Bayesian Inference Procedure (BIP) computes the posterior distribution of CSF parameters for each test independently. (b) A three-level hierarchical Bayesian model (HBM) of CSFs across multiple individuals, conditions and tests. At the population level, μ and Σ are the mean and covariance hyperparameters of the population. At the individual level ρ and ϕ are the mean and covariance hyperparameters of individual i in experimental condition j. At the test level, θ is the CSF parameter of individual i in test k of condition j.
where γmax is the peak sensitivity, fmax is the peak spatial frequency (cycles/degree), and β is the bandwidth (octaves) at half of the peak sensitivity. The probability of making a correct response is described with a psychometric function (Hou et al., 2015):

p(correct | f, c, θ) = g + (1 − g − λ) Φ((log10(c) − log10(1/S(f, θ))) / σ),   (2)
where g is the guessing rate, λ, usually set to 0.04 (Lesmes et al., 2010; Wichmann & Hill, 2001), is the lapse rate, Φ is the standard cumulative Gaussian function, and σ determines the steepness of the psychometric function. The probability of making an incorrect response is:

p(incorrect | f, c, θ) = 1 − p(correct | f, c, θ).   (3)

Equations 2 and 3 define the likelihood function, that is, the probability of making a correct or incorrect response given the stimulus and CSF parameters in a trial. The goal in most experiments is to infer the CSF parameters from the experimental data, that is, estimate the posterior distribution p(θ|Y), the distribution of the CSF parameters θ given the experimental data Y = {y_m}, m = 1, …, M, where M is the total number of trials in a test. This can be accomplished using Bayes' rule:

p(θ | Y) = p0(θ) ∏_{m=1:M} p(r_m | f_m, c_m, θ) / ∫ p0(θ) ∏_{m=1:M} p(r_m | f_m, c_m, θ) dθ,   (4)
where p0(θ) is the prior probability distribution of the CSF parameters for individual i in test k of experimental condition j, which is usually uninformative and the same for all subjects and experimental conditions, and the denominator, the integral across all possible values of θ, is a constant for a given dataset and BIP.
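Equations 1 to 3 can be written compactly in code. The Python sketch below is an illustration, not the authors' Matlab implementation; the threshold = 1/sensitivity convention follows the definition of the CSF given above:

```python
import math

def log_csf(f, gamma_max, f_max, beta):
    """Log-parabola CSF (Equation 1): log10 sensitivity at spatial frequency f
    (cycles/degree). beta is the full bandwidth (octaves) at half of the peak."""
    return math.log10(gamma_max) - math.log10(2.0) * (math.log2(f / f_max) / (beta / 2.0)) ** 2

def p_correct(f, c, gamma_max, f_max, beta, g=0.1, lam=0.04, sigma=0.1485):
    """Probability of a correct response (Equation 2).
    Contrast threshold is 1/sensitivity, so log10(threshold) = -log10 S(f)."""
    z = (math.log10(c) + log_csf(f, gamma_max, f_max, beta)) / sigma
    phi = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))  # standard cumulative Gaussian
    return g + (1.0 - g - lam) * phi
```

Equation 3 follows as 1 − p_correct(...).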
The hierarchical Bayesian model
We developed a three-level HBM to account for the entire dataset, incorporating conditional dependencies across test, individual, and population levels to improve estimates for each test (see Figure 2b). The HBM is based on three properties: (1) CSF parameters at the test level are conditionally dependent on hyperparameters at the individual level, (2) CSF hyperparameters at the individual level are conditionally dependent on those at the population level ("conditional dependency"), and (3) the probability p(r | f, c, θ) of response r is determined only by the CSF parameters θ in that test (Equations 2, 3).

In the HBM, the joint distribution of the CSF hyperparameter η across all J experimental conditions at the population level, p(η), is modeled as a mixture of 3 × J-dimensional Gaussian distributions with mean μ and covariance Σ, which have distributions p(μ) and p(Σ):

p(η) = ∫∫ N(η | μ, Σ) p(μ) p(Σ) dμ dΣ.   (5)

The joint distribution of the CSF hyperparameter τ_i of individual i across all experimental conditions 1:J at the individual level, p(τ_i | η), is modeled as mixtures of three-dimensional Gaussian distributions with means ρ_ij and covariances ϕ_j, which have distributions p(ρ_i | η) and p(ϕ_j):

p(τ_i | η) = ∫∫ [∏_{j=1:J} N(τ_ij | ρ_ij, ϕ_j)] p(ρ_i | η) p(ϕ_1:J) dρ_i dϕ_1:J,   (6)
where p(ρ_i | η) denotes that ρ_i is conditioned on η, and ϕ_j is a 3 × 3 covariance matrix in experimental condition j. Finally, at the test level, the joint distribution of the CSF parameters θ_ijk, p(θ_ijk | τ_ij), is conditioned on τ_ij.

The probability of obtaining the entire dataset is computed by probability multiplication:

p(Y | X) = ∏_{i=1:I} ∏_{j=1:J} ∏_{k=1:K} ∏_{m=1:M} p(r_ijkm | f_ijkm, c_ijkm, θ_ijk),   (7)
where X = (θ_1:I,1:J,1:K, ρ_1:I,1:J, ϕ_1:J, μ, Σ) denotes all the parameters and hyperparameters in the HBM.

We can use Bayes' rule to compute the joint posterior distribution of X (Kruschke, 2015; Lee, 2006; Lee, 2011; Rouder & Lu, 2005; Wilson et al., 2020):

p(X | Y) = p(Y | X) [∏_{i,j,k} p(θ_ijk | ρ_ij, ϕ_j)] [∏_i p(ρ_i | μ, Σ)] [∏_j p0(ϕ_j)] p0(μ) p0(Σ) / ∫ p(Y | X) p(X) dX,   (8)
where the denominator is the integral across all possible values of X and is a constant for a given dataset and HBM; p0(μ), p0(Σ), and p0(ϕ_j) are the prior distributions.
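The generative structure implied by Equations 5 to 8 can be sketched in a few lines. This Python fragment is illustrative only: the study's implementation used JAGS, and the numerical values here are placeholders, not fitted hyperparameters. It draws parameters top-down through the three levels:

```python
import numpy as np

rng = np.random.default_rng(0)
I, J = 4, 3                 # individuals, conditions (the study used I = 112, J = 3)

# Population level: hyperparameters mu (3*J-dimensional mean) and Sigma (covariance)
mu = np.tile([2.0, 0.3, 0.5], J)      # placeholder values, one triple per condition
Sigma = 0.01 * np.eye(3 * J)

# Individual level: rho_i ~ N(mu, Sigma); phi_j is a 3x3 covariance per condition
rho = rng.multivariate_normal(mu, Sigma, size=I)          # shape (I, 3*J)
phi = [0.005 * np.eye(3) for _ in range(J)]

# Test level: theta_ijk ~ N(rho_ij, phi_j), one draw per individual and condition
theta = np.stack([
    np.stack([rng.multivariate_normal(rho[i, 3 * j:3 * j + 3], phi[j]) for j in range(J)])
    for i in range(I)
])  # shape (I, J, 3)
```

In the actual model the flow is inverted: MCMC samples the posterior of all levels jointly, given the trial-by-trial data.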
Methods
Data
The dataset used in this study included 112 college-aged subjects, each tested once (K = 1) in three luminance conditions (low = 2.62 cd/m2, medium = 20.4 cd/m2, and high = 95.4 cd/m2) with the qCSF method (Hou et al., 2016). Each test consisted of 150 trials. Each display presented three test trials: three filtered letters of the same size, randomly sampled with replacement from the 10 Sloan letters (C, D, H, K, N, O, R, S, V, and Z), with the center spatial frequency and contrasts of the letters determined by the qCSF. Subjects were asked to verbally report the identity of the letters on the screen.
Apparatus
All analyses were conducted on a Dell computer with an Intel Xeon W-2145 @ 3.70 GHz CPU (8 cores and 16 threads) and 64 GB installed memory (RAM). The BIP was implemented in Matlab R201Xa (MathWorks Corp., Natick, MA, USA) and the HBM was implemented in JAGS (Plummer, 2003) in R (R Core Team, 2020).
Implementation of the BIP
Because a 10-alternative forced-choice identification task was used in the experiment, we set g to 0.1 and σ to 0.1485 in Equation 2 (Foley & Legge, 1981; Hou et al., 2015; Legge, Kersten, & Burgess, 1987; Lesmes et al., 2010; Lu & Dosher, 1999). Following the qCSF procedure (Hou et al., 2015; Lesmes et al., 2010), we defined a three-dimensional CSF parameter space with 60 log-linearly spaced γmax values between 1.05 and 1050, 40 log-linearly spaced fmax values between 0.1 and 20 cycles/degree, and 27 log-linearly spaced β values between 1 and 9 octaves. The weakly informative prior, p0(θ), identical across all the tests, subjects, and experimental conditions, was defined by a hyperbolic secant function (Lesmes et al., 2010):

p0(θ) ∝ ∏_{a=1:3} sech(θconf,a × (log10 θa − log10 θmode,a)),   (9)
where θa = γmax, fmax, and β for a = 1, 2, and 3, respectively; θconf = (0.5, 0.5, 0.5); and θmode = (100, 1, 3). The posterior distributions of the CSF parameters, p(θ|Y), were computed using Equation 4. Convergence of the BIP solutions was quantified by the half-width of the 68.2% credible interval (HWCI; Clayton & Hills, 1993; Edwards, Lindman, & Savage, 1963), which is equivalent to the standard deviation of the distribution if it is normal. With a sufficient number of trials in the qCSF, the HWCI can reach its asymptotic minimum (Hou et al., 2015; Lesmes et al., 2010).
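The parameter grid and the prior can be constructed as follows. This is a Python sketch (the original implementation was in Matlab), and the separable application of the hyperbolic secant on the log10 grid is an assumption about the exact form:

```python
import numpy as np

# Parameter grids (log-linearly spaced), as specified in the text
gamma_max = np.logspace(np.log10(1.05), np.log10(1050), 60)
f_max     = np.logspace(np.log10(0.1),  np.log10(20),   40)
beta      = np.logspace(np.log10(1),    np.log10(9),    27)

theta_mode = np.array([100.0, 1.0, 3.0])   # prior modes for (gamma_max, f_max, beta)
theta_conf = np.array([0.5, 0.5, 0.5])     # prior confidence

def sech(x):
    return 1.0 / np.cosh(x)

# Separable weakly informative prior over the 3-D grid, normalized to sum to 1
G, F, B = np.meshgrid(gamma_max, f_max, beta, indexing="ij")
prior = (sech(theta_conf[0] * (np.log10(G) - np.log10(theta_mode[0])))
         * sech(theta_conf[1] * (np.log10(F) - np.log10(theta_mode[1])))
         * sech(theta_conf[2] * (np.log10(B) - np.log10(theta_mode[2]))))
prior /= prior.sum()
```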
Implementation of the HBM
In the current implementation of the HBM, the prior of μ, p0(μ), was a nine-dimensional uniform distribution:

p0(μ) = ∏_{a=1:9} U(μa; μ0,min,a, μ0,max,a),   (10)
with μ0,min and μ0,max of the three parameters in the three luminance conditions specified in Table 1.
Table 1.
μ0,min and μ0,max of the uniform prior of μ.

         |       L        |       M        |       H
         | γmax  fmax  β  | γmax  fmax  β  | γmax  fmax  β
μ0,min   | 1.05  0.1   1  | 1.05  0.1   1  | 1.05  0.1   1
μ0,max   | 1050  20    9  | 1050  20    9  | 1050  20    9

H, high; L, low; M, medium.
The weakly informative prior distribution of Σ, p0(Σ), was specified by a 9 × 9 precision matrix Ω = Σ⁻¹ with a Wishart distribution:

Ω ~ Wishart(V, ν),   (11)
where the degrees of freedom ν = 9 and the expected mean, E(Ω) = νV, was based on the inverse of the covariance matrix of the estimated CSF parameters across all the subjects and luminance conditions from the BIP procedure.

The weakly informative prior distribution of ϕ_j, p0(ϕ_j), was specified with a 3 × 3 precision matrix Λ_j = ϕ_j⁻¹ with a Wishart distribution:

Λ_j ~ Wishart(V_j, ν),   (12)
where the degrees of freedom ν = 3, and the expected mean, E(Λ_j) = νV_j, was based on the average covariance matrix ϕ_j computed from the estimated CSF parameters across all the subjects in luminance condition j from the BIP procedure.

JAGS (Plummer, 2003), called from R (R Core Team, 2020), was used to compute representative samples of the posterior distributions of θ (3 parameters/condition × 3 conditions × 112 subjects = 1008 parameters), ρ (9 parameters × 112 subjects = 1008 parameters), ϕ (6 parameters/condition × 3 conditions = 18 parameters), μ (9 parameters), and Σ (45 parameters) in three Markov chain Monte Carlo (MCMC) chains. MCMC is an algorithm for efficiently sampling a joint posterior distribution (Kruschke, 2015). Each chain started at a randomly selected position in the 2088-dimensional parameter space. In each step, one of the 2088 parameters was selected randomly, its one-dimensional conditional posterior distribution was evaluated with the values of the other 2087 parameters fixed at the current position, and a new value of the selected parameter was drawn from that conditional distribution (Equation 8). By reiterating this process, the probability of visiting a location in the random walk approximated the joint posterior distribution of all 2088 parameters in Equation 8. These steps were reiterated until the convergence criterion was reached.

Gelman and Rubin's diagnostic (Gelman & Rubin, 1992), the ratio of between-chain and within-chain variances, was used to quantify convergence between the MCMC chains. The convergence criterion was set at 1.05 for all parameter estimates. After the criterion was met, the program terminated when 1,000,000 total samples had been generated in each chain; 10,000 of the 1,000,000 samples were stored (thinning ratio = 100) to ensure at least 10,000 effective samples of X for subsequent analysis.
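The convergence check can be reproduced for any scalar parameter traced across chains. Below is a textbook sketch of Gelman and Rubin's potential scale reduction factor; JAGS and the R coda package provide their own implementations:

```python
import numpy as np

def gelman_rubin(chains):
    """Potential scale reduction factor (R-hat) for chains of shape (n_chains, n_samples)."""
    chains = np.asarray(chains, dtype=float)
    m, n = chains.shape
    chain_means = chains.mean(axis=1)
    W = chains.var(axis=1, ddof=1).mean()          # mean within-chain variance
    B = n * chain_means.var(ddof=1)                # between-chain variance
    var_hat = (n - 1) / n * W + B / n              # pooled posterior variance estimate
    return float(np.sqrt(var_hat / W))
```

Values near 1 indicate that the chains are sampling the same distribution; the criterion used here was 1.05.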
Statistical analysis
Goodness of fit
The Bayesian predictive information criterion (BPIC; Ando, 2007; Ando, 2011) was used to quantify the goodness of fit to the trial-by-trial data. The BPIC quantifies the likelihood of the data based on the joint posterior distribution of the parameters of the model and penalizes model complexity.
Posterior distributions of the area under log CSF
The posterior distributions of the AULCSF were constructed by computing the AULCSFs from samples of the corresponding posterior distributions of θ from the HBM and BIP.
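Constructing the AULCSF posterior is mechanical: evaluate the log CSF for each posterior draw of θ and integrate over log spatial frequency. A Python sketch follows; the integration range and the clipping of log sensitivity at zero are illustrative assumptions, not taken from the paper:

```python
import numpy as np

def log_csf(log10_f, gamma_max, f_max, beta):
    """Log-parabola log10 sensitivity (Equation 1)."""
    octaves = (log10_f - np.log10(f_max)) / np.log10(2.0)   # distance from peak in octaves
    return np.log10(gamma_max) - np.log10(2.0) * (octaves / (beta / 2.0)) ** 2

def aulcsf(samples, f_lo=1.0, f_hi=32.0, n_grid=200):
    """AULCSF for each posterior sample; rows of `samples` are (gamma_max, f_max, beta).
    Integrates log10 sensitivity (clipped at 0) over log10 spatial frequency."""
    grid = np.linspace(np.log10(f_lo), np.log10(f_hi), n_grid)
    dx = grid[1] - grid[0]
    out = []
    for g, fm, b in samples:
        s = np.clip(log_csf(grid, g, fm, b), 0.0, None)
        out.append(dx * (s[:-1] + s[1:]).sum() / 2.0)       # trapezoid rule
    return np.array(out)
```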
d′: Between-condition discriminability of distributions
Discriminability d′ quantifies the signal (mean separation) to noise (variability) ratio of two probability distributions. We used the difference distribution between conditions to compute d′; each sample in the difference distribution represented the difference between two randomly drawn samples from the corresponding distributions. For a one-dimensional difference distribution, d′ is defined as (Green & Swets, 1966):

d′ = Δ / σ,
where Δ is the mean separation and σ is the standard deviation of the difference distribution. For a multidimensional difference distribution, d′ is defined as (Ashby & Townsend, 1986):

d′ = √(Δᵀ * cov(Δ)⁻¹ * Δ),
where Δ and cov(Δ) are the mean separation and covariance matrix of the difference distribution, cov(Δ)⁻¹ is the inverse of cov(Δ), Δᵀ is the transpose of Δ, and * represents matrix multiplication.
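Both definitions of d′ are straightforward to compute from posterior samples. The sketch below pairs random draws to form the difference distribution, as described in the text; the sample sizes are arbitrary choices:

```python
import numpy as np

def dprime_1d(a, b, n=10000, rng=None):
    """d' between two 1-D sample sets via their difference distribution."""
    rng = rng or np.random.default_rng(0)
    diff = rng.choice(a, size=n) - rng.choice(b, size=n)
    return abs(diff.mean()) / diff.std(ddof=1)

def dprime_multi(a, b, n=10000, rng=None):
    """Multidimensional d' = sqrt(Delta^T cov(Delta)^-1 Delta) from samples.
    a, b: arrays of shape (n_samples, n_dims)."""
    rng = rng or np.random.default_rng(0)
    diff = a[rng.integers(0, len(a), n)] - b[rng.integers(0, len(b), n)]
    delta = diff.mean(axis=0)
    cov = np.cov(diff, rowvar=False)
    return float(np.sqrt(delta @ np.linalg.inv(cov) @ delta))
```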
Statistical tests on CSF parameters and AULCSF
We compared the mean (expected value) and variance of the posterior distributions of θ from the HBM and BIP using Hotelling's T-squared test (Anderson, 2003) in R (Nordhausen, Sirkia, Oja, & Tyler, 2018; R Core Team, 2020). We also compared the correlation coefficients of pairs of CSF parameters from the two methods with paired t-tests.

To quantify the between-condition discriminability across subjects, we compared the means of the posterior distributions of θ and AULCSF between pairs of experimental conditions from each method with Hotelling's T-squared test and paired t-tests, respectively.
Simulation
To compare the accuracy and precision of the BIP and HBM estimates, we conducted a simulation study to investigate the bias and variability of the estimated CSF parameters for each test. The dataset consisted of 336 qCSF tests (112 subjects × 3 conditions). The CSF parameters θ of the simulated tests were a random sample from the posterior distribution of τ_1:I,1:J obtained from the HBM fit to the real data. Each qCSF test consisted of 150 trials, identical to the real experiment (Hou et al., 2016), with the trial-by-trial responses determined by the CSF parameters of the simulated subject (Equations 1 to 3). Both the HBM and BIP were fit to the simulated dataset. The mean of the posterior distribution of θ was used as the best estimate for each test. The bias, root mean square error (RMSE), variance, d′, and t statistics were computed based on the posterior distributions of θ from both methods.
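The bias and RMSE summaries can be computed directly from the true and estimated log10 parameters. Only this scoring step is sketched below; the qCSF's adaptive stimulus selection itself is not reproduced here:

```python
import numpy as np

def score_estimates(theta_true, theta_hat):
    """Per-parameter bias and RMSE of estimated CSF parameters.
    theta_true, theta_hat: arrays of shape (n_tests, 3), in log10 units."""
    err = np.asarray(theta_hat) - np.asarray(theta_true)
    bias = err.mean(axis=0)                    # accuracy: mean signed error
    rmse = np.sqrt((err ** 2).mean(axis=0))    # combines bias and variability
    return bias, rmse
```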
Results
Goodness of fit
The BPICs for the BIP and HBM were 34,886 and 34,225, respectively, indicating that the HBM fit the data better than the BIP. Figure 3 shows the estimated CSFs of one subject from the BIP and HBM in three luminance conditions.
Figure 3.
Estimated CSFs of one subject from the BIP (a, b, c) and HBM (d, e, f) methods in three luminance conditions (a, d) L = 2.62 cd/m2, (b, e) M = 20.4 cd/m2, and (c, f) H = 95.4 cd/m2. Color map indicates log10 probability density.
Many "image-computable" models have used the CSF as the front-end filter on actual images to predict human performance in image processing and object recognition (Chung, Legge, & Tjan, 2002; Malo, Pons, Felipe, & Artigas, 1997; Schütt & Wichmann, 2017; Watson, 2000; Watson & Ahumada Jr, 2005; Watson & Malo, 2002). Although recent studies have suggested that a more comprehensive model may require additional parameters related to non-linearities in the visual system (Chen, Hou, Yan, Zhang, Xi, Zhou, Lu, & Huang, 2014; Hou, Lu, & Huang, 2014), CSF-filtered images nevertheless provide an excellent demonstration of human visual processing. To illustrate the differences between the CSF estimates from the BIP and HBM and their implications for image-computable models (Figure 4), we applied the mean − SD, mean, and mean + SD CSFs from the BIP and HBM in the high luminance condition to filter a letter K (Lu & Dosher, 2013). Although the mean CSFs from the two methods are very similar and generated very similar filtered K's, the CSFs from the BIP exhibited much larger uncertainty.
Figure 4.
Visualization of the CSF estimates from the BIP and HBM. Filtered letter K by the mean - SD (a, d), mean (b, e) and mean + SD (c, f) CSFs from the BIP (a, b, c) and HBM (d, e, f) in the high luminance condition. The original image is shown in (g).
Posterior distributions from the HBM
Figure 5 shows the three-dimensional posterior distributions of hyperparameters η (marginalized), τ for one individual, and θ for one individual in one test from the HBM.
Figure 5.
Three-dimensional posterior distributions of η (marginalized) (a), τ for one individual (b), and θ (c) for one individual in one test in the HBM. The colors represent log10 probability density.
Population level
Table 2 shows the mean and covariance matrix of η. The correlation coefficients were positive and significant between all three pairs of experimental conditions. Table 3 shows the d′s of η between the three pairs of experimental conditions. In the HBM, the posterior distributions of η constrained τ. The large d′s of the posterior distributions of η between different experimental conditions indicated that the posterior distributions of η provided strong constraints on τ.
Table 2.
Mean and covariance of η.

Covariance
           η1 (L)    η2 (L)    η3 (L)    η4 (M)    η5 (M)    η6 (M)    η7 (H)    η8 (H)    η9 (H)    Mean
η1 (L)    6.2E-03   1.1E-03   4.4E-05   4.7E-03   1.9E-03  −4.7E-04   4.2E-03  −2.3E-04   8.4E-04   1.564
η2 (L)    1.1E-03   9.5E-03  −1.9E-03   3.2E-03   4.5E-03  −5.3E-04   2.6E-03   1.8E-03   1.0E-03   0.319
η3 (L)    4.4E-05  −1.9E-03   1.1E-03   3.0E-05  −6.9E-04   5.6E-04  −1.2E-04  −5.4E-04   5.3E-04   0.462
η4 (M)    4.7E-03   3.2E-03   3.0E-05   6.9E-03   1.7E-03  −1.1E-04   5.4E-03  −1.5E-04   1.2E-03   1.729
η5 (M)    1.9E-03   4.5E-03  −6.9E-04   1.7E-03   6.3E-03  −1.7E-03   2.6E-03   1.9E-03   2.9E-04   0.387
η6 (M)   −4.7E-04  −5.3E-04   5.6E-04  −1.1E-04  −1.7E-03   1.2E-03  −6.8E-04  −8.1E-04   9.2E-04   0.478
η7 (H)    4.2E-03   2.6E-03  −1.2E-04   5.4E-03   2.6E-03  −6.8E-04   8.5E-03  −2.7E-03   1.3E-03   1.810
η8 (H)   −2.3E-04   1.8E-03  −5.4E-04  −1.5E-04   1.9E-03  −8.1E-04  −2.7E-03   1.0E-02  −3.7E-03   0.385
η9 (H)    8.4E-04   1.0E-03   5.3E-04   1.2E-03   2.9E-04   9.2E-04   1.3E-03  −3.7E-03   2.9E-03   0.520

H, high; L, low; M, medium.
Table 3.
d′ of η and average d′ of τ between pairs of luminance conditions.

Hyperparameter    M-L     H-L     H-M
η                 8.1    10.2     5.6
τij              11.6    11.6     5.5

H, high; L, low; M, medium.
Individual level
Table 4 shows the average covariance matrices of τ across all 112 individuals in the three experimental conditions. Figure 5(b) illustrates the three-dimensional posterior distributions of τ for one individual in all three luminance conditions. Table 3 shows the average d′s of τ between the three pairs of experimental conditions. In the HBM, the posterior distributions of τ constrained θ. The large d′s of the posterior distributions of τ between different experimental conditions indicated that the posterior distributions of τ provided strong constraints on θ.
Table 4.
Average covariance matrix of τ1:I.

                     L                                M                                H
          τ1:I,11   τ1:I,12   τ1:I,13     τ1:I,21   τ1:I,22   τ1:I,23     τ1:I,31   τ1:I,32   τ1:I,33
τ1:I,j1   2.6E-03  −9.8E-04  −1.0E-06     2.2E-03  −8.0E-04  −3.2E-05     3.4E-03  −1.9E-03   5.5E-05
τ1:I,j2  −9.8E-04   4.6E-03  −1.8E-03    −8.0E-04   3.6E-03  −1.4E-03    −1.9E-03   8.0E-03  −2.9E-03
τ1:I,j3  −1.0E-06  −1.8E-03   8.7E-04    −3.2E-05  −1.4E-03   7.2E-04     5.5E-05  −2.9E-03   1.4E-03

H, high; L, low; M, medium.
Test level
We computed the mean, covariance, and correlation coefficients based on the estimated test-level CSF parameters θ in the three luminance conditions from the HBM and compared them with the results from the BIP.

The means of the posterior distributions of θ from the HBM and BIP were significantly different (T²(9,103) = 5.34, p < 0.001), and the average variance of the estimated CSF parameters from the HBM (mean = 0.00139 log10 units; range = 0.00030 to 0.00739 log10 units) was 65.8% less than that from the BIP (mean = 0.00407 log10 units; range = 0.00035 to 0.11893 log10 units) (T²(9,103) = 109, p < 0.001), consistent with the well-known variance shrinkage effect of the HBM (Kruschke, 2015).

Figures 6 and 7 show histograms of the difference between the expected values of θ, and of the standard deviation (SD, the square root of the variance) of θ, from the BIP and HBM. Whereas most of the differences between the expected values of θ from the two methods were small (mean absolute difference = 0.027 log10 units), there were 13 instances (out of a total of 3 parameters × 3 conditions × 112 subjects = 1008) in which the absolute difference was greater than 0.2 log10 units (range = 0.200 to 0.676 log10 units). The discrepancies were associated with large variances of the BIP estimates in those instances: their average variance of 0.065 log10 units (range = 0.027 to 0.119 log10 units) was 16 times the mean variance (0.00407 log10 units) of θ in the BIP procedure, suggesting that the BIP did not converge well in those cases. On the other hand, the HBM generated more precise estimates in the 13 cases, with on average a 93.7% reduction of variance (mean = 0.00407 log10 units; range = 0.00139 to approximately 0.00681 log10 units) compared to the BIP, by incorporating data from all the subjects and conditions in a single model.
Figure 6.
Histograms of the difference between the expected values of θ from the HBM and BIP. (a) γmax; (b) fmax; and (c) β.
Figure 7.
Histograms of the standard deviation (SD) of θ from the HBM and BIP procedures. (a) γmax; (b) fmax; and (c) β.
Table 5 lists the average correlation coefficients between θ in pairs of luminance conditions across subjects from the HBM and BIP procedures. All correlations were negative, with the strongest between fmax and β. Across all the subjects, 97.4% and 97.9% of the correlation coefficients from the BIP and HBM were statistically significant, respectively. Although paired t-tests showed that the correlation coefficients between γmax and β in the high luminance condition (p = 0.003) and between fmax and β in all three luminance conditions (p < 0.001) from the two procedures were significantly different, the magnitudes of the differences were very small and probably of no practical importance.
Table 5.
Average correlations between θ in pairs of luminance conditions.

Method  Parameter pair   L        M        H
BIP     γmax, fmax      −0.185   −0.134   −0.214
BIP     γmax, β         −0.072   −0.145   −0.078
BIP     fmax, β         −0.920   −0.910   −0.904
HBM     γmax, fmax      −0.197   −0.169   −0.174
HBM     γmax, β         −0.095   −0.143   −0.150
HBM     fmax, β         −0.899   −0.891   −0.887

H, high; L, low; M, medium.
Table 6 shows the average d′s of θ and AULCSF between pairs of luminance conditions across all the subjects. Averaged across the three pairs, the AULCSF d′ from the HBM was 33.5% greater than that from the BIP. Compared to AULCSF, incorporating information from the three-dimensional joint distributions of θ led to an average d′ increase of 66.6% for the BIP and 51.7% for the HBM. Compared to the AULCSF d′ from the BIP, using θ in the HBM increased d′ by 103.3% across the three pairs of luminance conditions.
Table 6.
Average d′ of θ and AULCSF between pairs of luminance conditions.

Method  Metric   M-L    H-L    H-M
BIP     AULCSF    7.5   10.8    3.4
BIP     θ        10.8   16.0    7.0
HBM     AULCSF   10.1   13.5    4.8
HBM     θ        13.4   19.2    8.6

H, high; L, low; M, medium.
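This excerpt does not give the d′ formula; a common choice, consistent with the comparison above, is the standardized mean difference between the two conditions' posterior distributions for a scalar metric such as AULCSF, and the Mahalanobis distance under the pooled covariance for the three-dimensional θ. A hedged sketch with synthetic posterior samples (the parameter values are illustrative only), showing why the joint 3-D distance can exceed any single-parameter d′:

```python
import numpy as np

rng = np.random.default_rng(1)

def dprime_1d(a, b):
    """Standardized mean difference between two 1-D posterior samples."""
    return abs(a.mean() - b.mean()) / np.sqrt((a.var() + b.var()) / 2)

def dprime_nd(a, b):
    """Mahalanobis distance between posterior means, pooled covariance."""
    diff = a.mean(axis=0) - b.mean(axis=0)
    pooled = (np.cov(a.T) + np.cov(b.T)) / 2
    return float(np.sqrt(diff @ np.linalg.solve(pooled, diff)))

# Hypothetical posterior samples of theta = (gamma_max, f_max, beta)
# in two luminance conditions.
low  = rng.multivariate_normal([2.0, 0.5, 0.55], 0.002 * np.eye(3), 4000)
high = rng.multivariate_normal([2.2, 0.6, 0.50], 0.002 * np.eye(3), 4000)

# A single parameter discards the joint structure; the 3-D d' pools
# evidence from all three parameters.
print(dprime_1d(low[:, 0], high[:, 0]))   # single-parameter d'
print(dprime_nd(low, high))               # joint 3-D d' (larger here)
```

The same logic explains the table above: tighter posteriors (smaller pooled covariance) and use of the full joint distribution both inflate the separation between conditions.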
Statistics on θ and AULCSF across individuals
Table 7 shows t(111) of the means of θ and AULCSF among the three pairs of experimental conditions. The HBM generated larger t values than the BIP for both θ and AULCSF in all pairs of experimental conditions. Averaged across the three pairs, t(111) of AULCSF and θ from the HBM were 51.2% and 49.6% greater than those from the BIP.
Table 7.
t(111) between means of θ and AULCSF.

Method  Metric   M-L     H-L     H-M
BIP     AULCSF   41.78   45.35   21.01
BIP     θ        27.05   34.90   17.87
HBM     AULCSF   63.88   62.78   34.11
HBM     θ        40.99   48.35   28.38

H, high; L, low; M, medium.
The HBM accurately and precisely recovered θ in the simulation, with very small bias (0.0028, −0.0091, and 0.0023 log10 units for γmax, fmax, and β, respectively), RMSE (0.0373 log10 units), and average variance (0.00149 log10 units). In comparison, the BIP exhibited lower accuracy and precision (bias = 0.0147, −0.0395, and 0.0118 log10 units for γmax, fmax, and β; RMSE = 0.0673 log10 units; average variance = 0.00428 log10 units).
Discussion
The HBM provides a general framework that can be adapted to different experimental designs. In this paper, we developed a three-level HBM to account for CSF data of 112 subjects in a single-factor (luminance), multi-condition (three luminance conditions), within-subject experimental design. We applied the HBM to quantify the joint distribution of CSF parameters and hyperparameters at the population, individual, and test levels and compared the performance of the model with that of the BIP. The HBM generated more precise estimates for each test than the BIP by incorporating information across subjects and conditions to constrain the estimates. The increased precision led to increased d′s of AULCSF and CSF parameters between experimental conditions at the test level for each subject, and larger statistical differences across subjects. Relative to the BIP, the HBM increased the average d′s of AULCSF and θ between conditions at the test level by 24.5% and 20.5%, and the corresponding t(111) by 51.2% and 49.6%, respectively. Simulations also showed that the HBM generated accurate and precise CSF parameter estimates.

The HBM generated larger d′ and t statistics at the test level because it reduced the variance of θ by 65.8% relative to the BIP (0.00139 vs. 0.00407 log10 units). In addition, the 13 instances in which the absolute difference between the θ estimates from the HBM and BIP was greater than 0.2 log10 units further demonstrated the benefit of incorporating information across tests, subjects, and conditions in the HBM (Kruschke, 2015; Rouder & Lu, 2005; Rouder, Sun, Speckman, Lu, & Zhou, 2003). In those cases, the variances of the BIP estimates were very large (16 times the mean variance), suggesting that the BIP did not converge well. On the other hand, the HBM generated much more precise CSF estimates for each test by incorporating data across subjects and conditions in a single model.
The ability of the HBM to generate more precise estimates from insufficient or poor-quality data can be quite valuable in clinical trials. The HBM can be used to conduct two types of power analyses. First, a replication power analysis computes the power of different sample sizes in replicated experiments with exactly the same experimental design (Kruschke, 2015). Simulated data for new subjects can be generated from the posterior distributions of the hyperparameters of the HBM fit to the existing dataset, just as we did in the simulation. In this case, the original data of 112 subjects should be combined with the simulated data to compute the power for each new sample size. A more interesting application of the HBM is prospective power analysis (Kruschke, 2015). In that case, no data have been collected for the new experiment; simulated data must instead be generated from a generative model constructed from the results of a different experiment. An HBM for the new experimental design is then constructed and fit to the simulated data. Data from existing studies can therefore serve only as prior knowledge and cannot be combined with the simulated data.

A certain sample size is required for the joint posterior distribution of the CSF hyperparameters and parameters in the HBM to become stable. The required size can be determined by evaluating the stability of the estimates at different sample sizes. Relative to the noninformative diffuse prior used in the BIP, Gu, Kim, Hou, Lesmes, Pitt, Lu, and Myung (2016) showed that (1) an informative prior from an HBM fit to as few as five subjects provided significant improvement in qCSF measurements of new subjects in the hierarchical adaptive design optimization (HADO) procedure, (2) priors constructed from larger samples further improved the accuracy and precision of the estimates, and (3) the improvement stabilized when the sample size reached about 30.
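The replication power analysis described above can be sketched as a simulation loop: draw plausible population-level effects from the hyperparameter posterior, simulate data for a candidate sample size, analyze, and count how often the effect is detected. A schematic sketch (all distributions and thresholds hypothetical; a real analysis would refit the full HBM rather than apply a paired t-test):

```python
import numpy as np

rng = np.random.default_rng(2)

def replication_power(n_subjects, n_sims=500, t_crit=2.0):
    """Estimate power to detect a condition difference at sample size n.

    Schematic only: effect sizes are drawn from a hypothetical
    hyperparameter posterior, and detection uses a paired t-test
    (t_crit ~ 2.0 approximates alpha = 0.05, two-sided).
    """
    hits = 0
    for _ in range(n_sims):
        # Draw the population-level effect and between-subject spread
        # from a (hypothetical) hyperparameter posterior.
        effect = rng.normal(0.10, 0.02)       # mean condition difference
        sd_between = abs(rng.normal(0.15, 0.01))
        # Simulate per-subject condition differences and test them.
        diffs = rng.normal(effect, sd_between, size=n_subjects)
        t = diffs.mean() / (diffs.std(ddof=1) / np.sqrt(n_subjects))
        hits += abs(t) > t_crit
    return hits / n_sims

# Power grows with candidate sample size.
print(replication_power(10), replication_power(40))
```

Sweeping `n_subjects` over a grid yields the power curve from which the smallest adequate sample size can be read off.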
Simulation studies are necessary to determine the minimum sample size required for the HBM to converge in a given experiment.

Although the MCMC algorithm automatically selected by JAGS (Plummer, 2003) provided an efficient sampling method, the weakly informative priors on the covariance matrices at both the subject and population levels were very helpful. With these priors, it took about 54 hours for each MCMC chain to generate at least 10,000 effective samples (Kruschke, 2015) for all parameters on a computer with eight cores. Fifty-four hours is practical, given that the actual physical time decreases as more CPUs are used in parallel computation (Kruschke, 2015). In contrast, the HBM with diagonal covariance matrices as priors took 18% longer (63.5 hours) to generate 10,000 effective samples for all parameters and did not converge, as indicated by the larger variances of the estimated CSF parameters at both the subject level (0.00298 log10 units, a 114% increase) and the population level (0.212 log10 units, a 2873% increase). Gelman and Rubin's diagnostic was nevertheless below 1.05 for all parameters; the diagnostic is based on the ratio of between-chain to within-chain variance, not on the magnitudes of the variances. The effects of priors on covariance estimation were consistent with previous studies (Hobert & Casella, 1996; Rouder et al., 2003).

Although the HBM in the current study was developed to account for group differences in a within-subject design, HBM-based approaches can be developed to detect deviations in individual patients belonging to different subpopulations (e.g. healthy versus different stages of an eye disease, or different eye diseases) using different tests (null hypothesis significance testing versus estimation of magnitude/effect size) in both frequentist and Bayesian frameworks (Kruschke & Liddell, 2018).
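The Gelman and Rubin diagnostic discussed above compares between-chain and within-chain variance: values near 1 indicate that the chains are sampling the same distribution, which is why it can look acceptable even when the posterior variances themselves are large. A minimal implementation of the original (non-split) statistic for one scalar parameter, assuming equal-length chains:

```python
import numpy as np

def gelman_rubin(chains):
    """Potential scale reduction factor (R-hat) for one scalar parameter.

    chains: array of shape (m, n) -- m chains, n samples each.
    """
    chains = np.asarray(chains, dtype=float)
    m, n = chains.shape
    # W: mean within-chain variance; B/n: variance of the chain means.
    W = chains.var(axis=1, ddof=1).mean()
    B_over_n = chains.mean(axis=1).var(ddof=1)
    # Pooled posterior-variance estimate, then the scale reduction factor.
    var_plus = (n - 1) / n * W + B_over_n
    return float(np.sqrt(var_plus / W))

rng = np.random.default_rng(3)
mixed = rng.normal(0.0, 1.0, size=(4, 2000))   # well-mixed chains
stuck = mixed + np.arange(4)[:, None]          # chains at different levels
print(gelman_rubin(mixed))   # close to 1
print(gelman_rubin(stuck))   # well above the 1.05 criterion
```

Because R-hat is a ratio, four chains that agree with each other but all have inflated variance still pass the 1.05 criterion, matching the behavior observed with the diagonal-covariance priors.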
The HADO method (Gu, Kim, Hou, Lesmes, Pitt, Lu, & Myung, 2016; Kim, Pitt, Lu, Steyvers, & Myung, 2014) provides a potential framework. HADO uses an informative prior obtained from all previously tested subjects with the same CSF characteristics (e.g. tested in the same luminance condition). The qCSF method with an uninformative diffuse prior required 20 to 30 trials to reach the initial precision of the HADO procedure with the informative prior. Moreover, HADO with a mixture prior that represents a wide range of CSF properties (e.g. across different luminance conditions) still achieved higher precision than qCSF with an uninformative prior, and could mitigate the problem of a mis-specified prior and improve the qCSF method for testing individual subjects. The joint posterior distributions of the hyperparameters at the population and individual levels from the HBM can provide informative priors within the HADO framework for new individuals and for repeated tests of the same individual, respectively. Furthermore, the HBM can be extended to model additional covariance between parameters of different measurements (e.g. CSF and visual acuity) in a joint modeling approach (Palestro, Bahg, Sederberg, Lu, Steyvers, & Turner, 2018; Turner, Forstmann, Wagenmakers, Brown, Sederberg, & Steyvers, 2013) to account for multiple test results across subjects and conditions, potentially further increasing statistical power in detecting changes of functional vision in normal and clinical populations. The HBM framework can thus take advantage of all available information at different levels to enable sensitive detection of CSF changes, and thereby improve patient care and clinical trials with increased statistical power.
Conclusions
In this paper, we developed a three-level HBM to account for CSF data of 112 subjects in a within-subject, single-factor (luminance), multi-condition (three luminance conditions) experimental design. The HBM was used to compute the joint distribution of CSF parameters and hyperparameters at the population, individual, and test levels, fully utilizing information across levels to accurately estimate the CSF in each test. Relative to the BIP, the HBM increased the average d′s of AULCSF and θ between conditions at the test level by 24.5% and 20.5%, and the corresponding t statistics by 51.2% and 49.6%, respectively. Future research will further evaluate the potential value of the HBM for analyzing clinical changes in contrast sensitivity, whether in individual patients or in groups in clinical trials.