Brian J Roach1, Holly K Hamilton1,2, Peter Bachman3, Aysenil Belger4, Ricardo E Carrión5,6,7, Erica Duncan8,9, Jason Johannesen10, Joshua G Kenney10, Gregory Light11,12, Margaret Niznikiewicz13, Jean Addington14, Carrie E Bearden15, Emily M Owens15, Kristin S Cadenhead11, Tyrone D Cannon10,16, Barbara A Cornblatt5,6,7,17, Thomas H McGlashan10, Diana O Perkins4, Larry Seidman13, Ming Tsuang11, Elaine F Walker18, Scott W Woods10, Daniel H Mathalon1,2. 1. Department of Psychiatry, San Francisco Veterans Affairs Healthcare System, San Francisco, California, USA. 2. Department of Psychiatry, University of California, San Francisco, California, USA. 3. Department of Psychiatry, University of Pittsburgh, Pittsburgh, Pennsylvania, USA. 4. Department of Psychiatry, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, USA. 5. Division of Psychiatry Research, The Zucker Hillside Hospital, North Shore-Long Island Jewish Health System, Glen Oaks, New York, USA. 6. Center for Psychiatric Neuroscience, Feinstein Institute for Medical Research, North Shore-Long Island Jewish Health System, Manhasset, New York, USA. 7. Department of Psychiatry, Hofstra North Shore-LIJ School of Medicine, Hempstead, New York, USA. 8. Department of Psychiatry, Atlanta Veterans Affairs Medical Center, Decatur, Georgia, USA. 9. Department of Psychiatry and Behavioral Sciences, Emory University School of Medicine, Atlanta, Georgia, USA. 10. Department of Psychiatry, Yale University, School of Medicine, New Haven, Connecticut, USA. 11. Department of Psychiatry, University of California, San Diego, California, USA. 12. Department of Psychiatry, Veterans Affairs San Diego Healthcare System, San Diego, California, USA. 13. Department of Psychiatry, Harvard Medical School at Beth Israel Deaconess Medical Center and Massachusetts General Hospital, Boston, Massachusetts, USA. 14. Hotchkiss Brain Institute Department of Psychiatry, University of Calgary, Calgary, Alberta, Canada. 15. 
Semel Institute for Neuroscience and Human Behavior and Department of Psychology, University of California, Los Angeles, California, USA. 16. Department of Psychology, Yale University, School of Medicine, New Haven, Connecticut, USA. 17. Department of Molecular Medicine, Hofstra North Shore-LIJ School of Medicine, Hempstead, New York, USA. 18. Department of Psychology, Emory University, Atlanta, Georgia, USA.
Abstract
OBJECTIVES: Mismatch negativity (MMN), an auditory event-related potential sensitive to deviance detection, is smaller in schizophrenia and psychosis risk. In a multisite study, a regression approach to account for effects of site and age (12-35 years) was evaluated alongside the one-year stability of MMN. METHODS: Stability of frequency, duration, and frequency + duration (double) deviant MMN was assessed in 167 healthy subjects, tested on two occasions, separated by 52 weeks, at one of eight sites. Linear regression models predicting MMN with age and site were validated and used to derive standardized MMN z-scores. Variance components estimated for MMN amplitude and latency measures were used to calculate Generalizability (G) coefficients within each site to assess MMN stability. Trait-like aspects of MMN were captured by averaging across occasions and correlated with subject traits. RESULTS: Age and site accounted for less than 7% of MMN variance. G-coefficients calculated at electrode Fz were stable (G = 0.63) across deviants and sites for amplitude measured in a fixed window, but not for latency (G = 0.37). Frequency deviant MMN z-scores averaged across tests negatively correlated with averaged global assessment of functioning. CONCLUSION: MMN amplitude is stable and can be standardized to facilitate longitudinal multisite studies of patients and clinical features.
Mismatch negativity (MMN) is an auditory event‐related potential (ERP) component that is automatically evoked by an infrequently occurring “deviant” auditory stimulus that differs in duration, pitch, or another physical feature from a series of repeated preceding “standard” stimuli. MMN can be measured using either electroencephalography (EEG) or magnetoencephalography. It is believed to reflect sensory echoic memory, as detecting auditory deviance requires online formation and maintenance of a memory trace of immediately preceding standard stimuli. Because of its robust sensitivity to the pathophysiology of schizophrenia (Avissar et al., 2018; Erickson, Ruffle, & Gold, 2016), and its ability to predict conversion to a psychotic disorder in clinical high‐risk (CHR) individuals (Bodatsch et al., 2011; Perez et al., 2014; Shaikh et al., 2012), MMN has great potential as an ERP biomarker in schizophrenia research, leading to its inclusion in the multisite North American Prodrome Longitudinal Study (NAPLS; Addington et al., 2012). Accordingly, the test–retest reliability and stability of the MMN response over repeat test occasions must be evaluated to better understand the generalizability of this ERP component in multisite clinical trials or longitudinal studies of psychosis.
The test–retest reliability of the MMN response has been the focus of several studies because of its potential clinical utility (see Naatanen, 2003, for a review), but in many of these studies reliability was assessed with Pearson (Kathmann, Frodl‐Bauch, & Hegerl, 1999; Kujala, Kallio, Tervaniemi, & Naatanen, 2001; Pekkonen, Rinne, & Naatanen, 1995; Schroger, Giard, & Wolff, 2000; Tervaniemi et al., 1999; Uwer & von Suchodoletz, 2000) or Spearman (Deouell & Bentin, 1998; Schall, Catts, Karayanidis, & Ward, 1999) correlation coefficients. 
Alternatively, a measure that better captures agreement in responses from one test occasion to the next is the intraclass correlation (ICC) coefficient (Shrout & Fleiss, 1979). ICCs have been calculated in some MMN studies (Biagianti et al., 2017; Chen, Chan, & Cheng, 2018; Hall et al., 2006; Lew, Gray, & Poole, 2007; Light et al., 2012; Light & Braff, 2005; McCleery et al., 2019; Recasens & Uhlhaas, 2017). Regardless of which coefficient type was reported, only three studies (Biagianti et al., 2017; Light et al., 2012; Light & Braff, 2005) had sufficiently long intervals between tests (at least 6 months) to be considered relevant to MMN stability.
Given the broad age range (12–35 years) in NAPLS (Addington et al., 2012), and potential age differences between CHR individuals who later transition to psychosis and those who do not, such studies must control for potential confounding effects of normal aging on MMN responses. One approach to adjust for any normal aging effects on MMN is to apply a simple linear regression model to the healthy control (HC) data. The resulting regression equation is used to calculate age‐corrected MMN z‐scores for all subjects. This is done by subtracting the predicted MMN based on a subject's age from his/her observed MMN score, and then dividing the result by the standard error of regression obtained from the HC age‐regression model. Such age‐corrected MMN z‐scores derived from a HC model have no relationship with age in the HC sample, but any pathological age effects in patient or hold‐out samples are preserved. The z‐scores are readily interpretable as linear transformations of MMN raw scores, reflecting the degree of deviation or abnormality, in standard units, from the MMN expected for a person of a given age in the HC sample. 
This approach is not unprecedented, having been implemented previously in MMN (Biagianti et al., 2017; Perez et al., 2014), other ERP (Hamilton, Roach, et al., 2019; Hamilton, Woods, et al., 2019; Mathalon et al., 2018; Mathews et al., 2016; Perez, Woods, et al., 2012; Perez, Ford, et al., 2012), functional (Fryer et al., 2013; Fryer et al., 2016; Fryer et al., 2018) and structural (Heyes et al., 2001; Jernigan, Press, & Hesselink, 1990; Pfefferbaum et al., 1992) magnetic resonance imaging studies.
In any multisite study design comparing patients and controls, the effect of laboratory testing site should be carefully considered (Glover et al., 2012). Despite applying all possible best practices to minimize site‐specific influences on experiments, differences in mean responses between sites (i.e., fixed effects of site) remain possible due to real differences in the random samples of participants studied at each site. Therefore, the site effect should be modeled and, even when site is not statistically significant, should not be disregarded based on this criterion alone. Furthermore, it may be of interest to test secondary/exploratory hypotheses that do not involve the entire study sample. In such cases, simply including site in the model may not be sufficient if the subset of participants is too small to accurately estimate site effects.
In NAPLS, the effect of site can be added as a categorical covariate in the age model. In this new model, a common age effect is estimated across all sites such that site effects on either MMN or age cannot create a spurious relationship between MMN and age. Moreover, the error from this new model can be used to calculate site‐ and age‐adjusted MMN z‐scores, reflecting the difference, in standard units, from the MMN expected for a person of a given age, from a particular site in the HC sample. 
Such z‐scoring is particularly useful for planned comparisons between CHR individuals who convert (CHR‐C) to a psychotic disorder and those who do not (CHR‐NC) within 24 months of initial NAPLS baseline assessments because there is no feasible a priori method to match these subjects on age and/or sample an equal number from each site.
Accordingly, the goals of this study were to (a) describe a set of site and age regression models in HCs that will be used to create standardized, site‐ and age‐adjusted MMN z‐scores for all NAPLS subjects, (b) estimate variance components and associated G‐coefficients representing the single site stability of MMN responses measured with different scoring methods separately at each of the eight NAPLS sites, and (c) compare such G‐coefficients calculated using raw and z‐scored MMN responses. Additional exploratory analyses are presented to demonstrate the Spearman‐Brown prophecy (Brown, 1910; Spearman, 1910) in practice by averaging across MMN measured on separate occasions to capture more trait‐like aspects of the MMN and relate it to subject traits.
METHODS
Participants
Participants were recruited at each of the eight NAPLS2 sites and all provided written, informed consent to participate in this IRB‐approved study. EEG data were collected at baseline assessment from 241 HCs, and 167 (~70%) of these HCs completed at least one follow‐up EEG assessment. Additional demographic characteristics of these 167 subjects are presented in Table 1.
TABLE 1
Demographic Information

| Site | Subjects (number) | Gender (M, F) | Test age (mean ± SD) | Re‐test age (mean ± SD) | Education (mean ± SD) | Days between tests (min) | Days between tests (median) | Days between tests (max) |
|---|---|---|---|---|---|---|---|---|
| UCLA | 21 | 10, 11 | 18.15 ± 3.05 | 19.22 ± 2.98 | 11.38 ± 3.01 | 217 | 380 | 602 |
| Emory | 20 | 13, 7 | 21.82 ± 5.03 | 22.88 ± 5.03 | 13.6 ± 3.69 | 145 | 354 | 795 |
| Harvard | 23 | 11, 12 | 18.9 ± 4.56 | 19.94 ± 4.45 | 11.26 ± 3.15 | 77 | 353 | 686 |
| Hillside | 23 | 15, 8 | 17.1 ± 2.73 | 18.38 ± 2.72 | 10.78 ± 2.61 | 270 | 381 | 812 |
| UNC | 23 | 14, 9 | 20.3 ± 2.52 | 21.3 ± 2.51 | 13.78 ± 2.28 | 272 | 364 | 462 |
| UCSD | 17 | 13, 4 | 20.38 ± 6.64 | 21.24 ± 6.62 | 12.59 ± 4.05 | 147 | 350 | 400 |
| Calgary | 25 | 9, 16 | 21.76 ± 5.89 | 22.79 ± 5.81 | 13.6 ± 4.53 | 255 | 363 | 729 |
| Yale | 15 | 6, 9 | 21.51 ± 6.13 | 22.56 ± 6.08 | 12.8 ± 4.31 | 254 | 370 | 762 |
| TOTAL | 167 | 91, 76 | 19.91 ± 4.89 | 20.97 ± 4.83 | 12.46 ± 3.61 | 77 | 365 | 812 |
All HCs had at least one global assessment of functioning (GAF) as a part of study procedures (Endicott, Spitzer, Fleiss, & Cohen, 1976; Jones, Thornicroft, Coffey, & Dunn, 1995). The current GAF score nearest to the baseline EEG date (median time between GAF and EEG was 22 days, IQR: 7–34.61 days) was saved for correlation analyses. As an additional GAF metric, the mean current GAF score across all assessments (max = 5, one every 6 months) during the 24 month study period was saved. There were 18 HCs (~11%) who only had one GAF assessment, making their baseline GAF and mean GAF scores equivalent.
MMN paradigm
All sites used similar hardware and presentation software (www.neurobs.com) to run the EEG experiment. Auditory stimuli were delivered via ER1‐A Etymotic insert earphones and subjects responded with a Cedrus RB‐830 button box. The auditory sequence consisted of 85% standard tones presented for 50 ms at 633 Hz, 5% duration (DUR) deviants presented for 100 ms at 633 Hz, 5% frequency (FRQ) deviants presented for 50 ms at 1000 Hz, and 5% double‐deviants (DBL) presented for 100 ms at 1000 Hz. A total of 1,794 tones were presented over 3 separate blocks, with each block lasting approximately 5 minutes. Tones were presented with 5 ms rise and fall times and a 500 ms stimulus onset asynchrony. To reduce the effect of attention on MMN, participants were instructed to ignore auditory stimuli and focus on a separate distractor task. A visual oddball paradigm was run simultaneously with the MMN paradigm, and image presentation was jittered to avoid co‐occurring visual oddball and MMN ERPs.
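As a quick consistency check on the timing described above, the following sketch derives the per‐block duration from the total tone count and the stimulus onset asynchrony. It assumes, purely for illustration, that tones were split evenly across the three blocks; the text does not state this.

```python
# Values taken from the paradigm description; the even split across blocks is an assumption.
N_TONES = 1794   # total tones across all blocks
N_BLOCKS = 3
SOA_S = 0.5      # stimulus onset asynchrony, in seconds

tones_per_block = N_TONES // N_BLOCKS
block_minutes = tones_per_block * SOA_S / 60.0

print(f"{tones_per_block} tones per block, {block_minutes:.2f} min per block")
```

Under this assumption each block contains 598 tones and lasts just under 5 minutes, in line with the stated block length.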
EEG data acquisition
EEG was digitized at 1024 Hz using 32‐ or 64‐channel electrode caps (Biosemi, Amsterdam, The Netherlands), and the common 32 channels were used in subsequent steps. Additional electrodes were placed above and below the right eye, on the outer canthus of each eye, and on the mastoids. An offline average mastoid reference was applied to the continuous EEG data prior to all preprocessing.
Preprocessing
Mastoid‐referenced, continuous EEG recordings were high‐pass filtered at 1 Hz prior to segmentation into 1,000 ms epochs (−500 to 500 ms). Blinks and eye movement artifacts were recorded by electrodes placed around the eyes and were subtracted from single trials using regression (Gratton, Coles, & Donchin, 1983). Following baseline correction (−50 to 0 ms), outlier electrodes were interpolated within single trial epochs based on previously established criteria (Nolan, Whelan, & Reilly, 2010). A spherical spline interpolation (Delorme & Makeig, 2004) was applied to any channel that was determined to be a statistical outlier (|z| > 3) on one or more of four parameters: variance to detect additive noise, median gradient to detect high‐frequency activity, amplitude range to detect pop‐offs, and deviation of the mean amplitude from the common average to detect electrical drift. Epochs were rejected if they contained amplitudes exceeding ±100 μV in any of these electrodes: AF3, AF4, F3, Fz, F4, FC1, FC2, FC5, FC6, C3, Cz, C4.
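The final amplitude‐based rejection step can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the function name `reject_epochs`, the array layout, and the simulated data are assumptions, and the extra channel `Pz` is included only to show that channels outside the listed set do not trigger rejection.

```python
import numpy as np

# Electrodes named in the text for the +/-100 microvolt criterion.
REJECT_CHANNELS = ["AF3", "AF4", "F3", "Fz", "F4", "FC1", "FC2",
                   "FC5", "FC6", "C3", "Cz", "C4"]

def reject_epochs(epochs, ch_names, threshold_uv=100.0):
    """Return a boolean mask that is True for epochs to keep.

    epochs: array of shape (n_epochs, n_channels, n_samples), in microvolts.
    """
    idx = [ch_names.index(ch) for ch in REJECT_CHANNELS]
    # Largest absolute amplitude per epoch across the listed channels.
    peak_abs = np.abs(epochs[:, idx, :]).max(axis=(1, 2))
    return peak_abs <= threshold_uv

# Simulated example: six low-amplitude epochs with two injected artifacts.
ch_names = REJECT_CHANNELS + ["Pz"]
rng = np.random.default_rng(0)
epochs = rng.normal(0.0, 5.0, size=(6, len(ch_names), 50))
epochs[2, ch_names.index("Fz"), 10] = 150.0    # exceeds threshold on a listed channel
epochs[4, ch_names.index("Pz"), 10] = -150.0   # exceeds threshold on an unlisted channel

keep = reject_epochs(epochs, ch_names)
```

Only the epoch with the artifact at Fz is dropped; the epoch with the artifact at the unlisted channel is retained.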
ERP averaging and MMN measurement
ERP averages for all stimulus types were determined using a sorted averaging method, which has been shown to reduce noise in the MMN waveform by averaging over the subset of trials that optimizes the estimated signal to noise ratio for each subject (Rahne, von Specht, & Muhler, 2008). In this study, single‐epoch root mean squared (RMS) amplitude values, averaged across the 12 electrodes used for artifact rejection, were calculated for each trial and sorted in ascending order for each stimulus type. Following averaging, ERPs for all stimulus types were low‐pass filtered at 30 Hz, and then standard tone ERPs were subtracted from deviant ERPs to obtain difference waves. MMN peak amplitude was defined as the most negative peak between 90 and 290 ms in the difference wave. MMN mean amplitude ±10 ms around the peak was also quantified as an alternative measurement to peak amplitude. Average amplitude in a fixed window defined based on grand average waveforms (90–170 ms for FRQ and DBL, 150–230 ms for DUR) was quantified as a third approach. Peak latencies were saved for a fourth set of analyses.
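The four MMN measures described above can be sketched on a single difference wave as follows. This is an illustrative simplification: the function name `mmn_measures` is hypothetical, the "peak" is taken as the most negative sample in the search window (without checking that it is a local minimum), and the Gaussian‐shaped waveform is simulated.

```python
import numpy as np

def mmn_measures(diff_wave, times_ms, window_ms, search_ms=(90, 290), half_width_ms=10):
    """Compute peak amplitude, peak latency, mean around peak, and fixed-window mean.

    diff_wave: 1-D deviant-minus-standard difference wave (microvolts).
    times_ms:  1-D array of sample times relative to tone onset (ms).
    window_ms: (start, end) of the fixed window, e.g. (90, 170) for FRQ and DBL.
    """
    search = (times_ms >= search_ms[0]) & (times_ms <= search_ms[1])
    # Most negative sample within the search window.
    peak_idx = np.flatnonzero(search)[np.argmin(diff_wave[search])]
    peak_latency = times_ms[peak_idx]
    around_peak = np.abs(times_ms - peak_latency) <= half_width_ms
    fixed = (times_ms >= window_ms[0]) & (times_ms <= window_ms[1])
    return {
        "peak_amplitude": float(diff_wave[peak_idx]),
        "peak_latency": float(peak_latency),
        "mean_around_peak": float(diff_wave[around_peak].mean()),
        "fixed_window_mean": float(diff_wave[fixed].mean()),
    }

# Simulated difference wave: a negative deflection of -3 microvolts centered at 130 ms.
times_ms = np.arange(-50.0, 500.0, 1.0)
diff_wave = -3.0 * np.exp(-((times_ms - 130.0) ** 2) / (2 * 20.0 ** 2))

result = mmn_measures(diff_wave, times_ms, window_ms=(90, 170))
```

On this simulated waveform the peak amplitude is the most negative of the three amplitude measures, the mean around the peak is slightly smaller in magnitude, and the fixed‐window mean is smaller still because it averages over the flanks of the deflection.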
Common slope linear regression models
When experimental data come from one laboratory site, an ordinary least squares (OLS) regression model can be applied to MMN (or other response variable) data using age as a predictor to obtain a simple linear equation that can be used to predict a subject's MMN response at a given age. Such an OLS model predicting MMN scores by age may have the form:
$$y_i = \beta_0 + \beta_1 a_i + e_i \tag{1}$$
In Equation (1), y is the MMN score, a is the age, and e is the residual (i.e., difference between age‐predicted and actual MMN score) for the ith subject, and β0 and β1 are the estimated intercept and age slope. The model error, or specifically, root‐mean‐square error (RMSE), is calculated as:
$$\mathrm{RMSE} = \sqrt{\frac{\sum_{i=1}^{n} e_i^2}{n - p}} \tag{2}$$
where n is the number of subjects and p is the number of estimated model parameters (p = 2 for the intercept and age slope), so that the RMSE corresponds to the standard error of regression. The RMSE summarizes the model's error across all subjects, and it can be used to calculate an age‐corrected MMN z‐score:
$$z_i = \frac{e_i}{\mathrm{RMSE}} = \frac{y_i - (\beta_0 + \beta_1 a_i)}{\mathrm{RMSE}} \tag{3}$$
In the multisite setting, one must consider laboratory site as a between‐subjects categorical variable. This increases the OLS model design matrix from two (intercept + age) to nine columns in NAPLS. The additional seven columns are indicator variables that capture site membership (i.e., 1 if a subject is from that site, 0 otherwise), and only seven indicator variables are needed to encode eight sites. The corresponding design matrix for all 241 HCs included in the baseline analysis is plotted in Figure 1.
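The nine‐column design matrix and the resulting z‐scores can be sketched as follows. This is a minimal illustration on simulated data, not the authors' code: the function names, the choice of site 0 as the reference site, and all simulated values are assumptions, and the RMSE is computed with residual degrees of freedom in the denominator (i.e., as the standard error of regression).

```python
import numpy as np

def build_design(age, site, n_sites):
    """Columns: intercept, n_sites - 1 site indicators (site 0 is the reference), age."""
    indicators = (site[:, None] == np.arange(1, n_sites)).astype(float)
    return np.column_stack([np.ones(len(age)), indicators, age])

def fit_common_slope(y, age, site, n_sites):
    """Fit the OLS model and return the coefficients and the RMSE."""
    X = build_design(age, site, n_sites)
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    rmse = np.sqrt(resid @ resid / (len(y) - X.shape[1]))
    return beta, rmse

def z_score(y, age, site, n_sites, beta, rmse):
    """Observed minus model-predicted score, expressed in RMSE units."""
    return (y - build_design(age, site, n_sites) @ beta) / rmse

# Simulated healthy-control data (all values are made up for illustration).
rng = np.random.default_rng(1)
n, n_sites = 241, 8
site = np.arange(n) % n_sites
age = rng.uniform(12, 35, size=n)
site_shift = rng.normal(0.0, 0.3, size=n_sites)
y = -3.0 + 0.04 * age + site_shift[site] + rng.normal(0.0, 1.0, size=n)

X = build_design(age, site, n_sites)
beta, rmse = fit_common_slope(y, age, site, n_sites)
z_hc = z_score(y, age, site, n_sites, beta, rmse)

# Applying the HC model to a new, hypothetical subject: age 18, site 3, MMN of -4.2.
z_new = z_score(np.array([-4.2]), np.array([18.0]), np.array([3]), n_sites, beta, rmse)
```

By construction, the z‐scores in the sample used to fit the model have zero mean and no linear relationship with age, while scores from subjects outside that sample (such as `z_new`) retain any deviation from the age‐ and site‐expected value.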
FIGURE 1
A graphical representation of the design matrix used to predict mismatch negativity (MMN) responses is plotted. The first eight columns include an intercept term and seven site indicator variables, where white represents 1 and black 0. These columns capture fixed effects of site, while the ninth column is the age covariate, with grayscale age value representing each of the 241 healthy control participants' ages at baseline MMN assessment. This model is the common slope model where each site may have a different y‐intercept, but a common age relationship estimated using data from all subjects and sites
This same design matrix is applied to all response variables (MMN amplitude, latency, etc.), and the resulting parameter estimates are used to obtain expected responses for a particular subject given subject age and site. The difference between a given subject's actual value and that predicted value, divided by the RMSE [Equation (2)] of the model yields an age‐ and site‐corrected z‐score, which represents that subject's deviation, in standardized units, from the expected value for a subject who is the same age, measured at the same site.
In addition to the standard assumptions of a regression model, this design assumes (a) the age relationship does not differ between sites, and (b) there is not a higher order polynomial (e.g., quadratic) age relationship with the response variable. These assumptions can be formally tested by (a) adding site*age interaction effects to the model or (b) adding a mean‐centered, age‐squared term to the model, respectively, and checking for a statistically significant improvement in model fit with the R² change F‐test. In the age‐squared case, this is equivalent to the test of the relationship between the response variable and the age‐squared term. In the more complicated site*age model, heterogeneous age relationships at the sites would lead to an improved fit. Such F‐tests were conducted for all variables (384 per set: 3 deviant types × 32 electrodes × 4 measures), and both Akaike and Bayesian‐Schwarz Information Criteria (AIC and BIC) were calculated as additional descriptive measures of model fit (Sakamoto & Kitagawa, 1987). False discovery rate (FDR) correction was applied separately to the two sets of F‐tests, and Bonferroni correction was separately applied to families of tests limited to the electrodes (n = 32) for each measure and deviant type (p = 0.05/32 = 0.0015625). Finally, the number of uncorrected (p < .05) significant tests was listed for descriptive purposes.
Follow‐up longitudinal MMN data were also z‐scored using the baseline HC model and t‐ or F‐tests were conducted to assess age and site effects, respectively, as additional measures of model fit. This subset of 167 follow‐up data points could be considered a “hold‐out” data set, and any site or age effects indicate that the z‐scoring procedure suboptimally accounted for linear effects of age and fixed effects of site.
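The nested‐model comparison described above can be sketched with a generic F‐test on the change in residual sum of squares, which is algebraically the same as the R² change F‐test because both models share the same total sum of squares. The sketch below is an assumption‐laden illustration: site indicators are omitted for brevity, the function names are hypothetical, and the data are simulated to be linear in age.

```python
import numpy as np
from scipy import stats

def sse(X, y):
    """Residual sum of squares from an OLS fit of y on X."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return float(resid @ resid)

def r2_change_f_test(X_reduced, X_full, y):
    """F-test for the improvement in fit of a full model over a nested reduced model."""
    n = len(y)
    p_reduced, p_full = X_reduced.shape[1], X_full.shape[1]
    sse_reduced, sse_full = sse(X_reduced, y), sse(X_full, y)
    df_num, df_den = p_full - p_reduced, n - p_full
    f_stat = ((sse_reduced - sse_full) / df_num) / (sse_full / df_den)
    p_value = stats.f.sf(f_stat, df_num, df_den)
    return f_stat, p_value

# Simulated data that are linear in age by construction.
rng = np.random.default_rng(2)
n = 241
age = rng.uniform(12, 35, size=n)
y = -3.0 + 0.04 * age + rng.normal(0.0, 1.0, size=n)

X_linear = np.column_stack([np.ones(n), age])
age_sq = (age - age.mean()) ** 2          # mean-centered, age-squared term
X_quad = np.column_stack([X_linear, age_sq])

f_stat, p_value = r2_change_f_test(X_linear, X_quad, y)
bonferroni_alpha = 0.05 / 32              # electrode-family threshold from the text
```

When a single term is added, as in the age‐squared case, the F statistic equals the square of the t statistic for that term, which is why the two tests are equivalent.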
Variance components and G‐coefficients
The longitudinal HC data were re‐purposed as a single facet (test occasion) G‐study design to estimate variance components. Such a design allows estimation of three variance components for any response using the data from the participants at a particular site. The variance components for Person ($\sigma^2_p$), Occasion ($\sigma^2_o$), and Person x Occasion plus Error ($\sigma^2_{po,e}$) are estimated separately for each NAPLS laboratory site, as Site may represent another source of variance (see Roach et al., 2019). Once variance components are estimated, the G‐coefficient, which provides a measure of generalizability or stability of the measured score in this longitudinal setup, can be calculated as in Equation (4):
$$G = \frac{\sigma^2_p}{\sigma^2_p + \sigma^2_{po,e}/n_o} \tag{4}$$
where $n_o$ is the number of test occasions over which scores are averaged.
The NAPLS2 study design included EEG assessments at baseline, 12 month, and 24 month study time points. MMN scores from each session are treated separately, with particular emphasis on using baseline data to predict conversion to psychosis, meaning the best choice for $n_o$ is 1. Therefore, the G‐coefficient is equal to the intraclass correlation (ICC) defined by Shrout and Fleiss (1979; e.g., ICC(3,1)) when $n_o$ = 1. Variance components were estimated using a restricted maximum likelihood approach in Matlab (Witkovský, 2012). Components were estimated separately and saved for the three deviant types (DBL, FRQ, DUR), 32 electrodes, and four MMN measurements (peak amplitude, mean around peak, mean in fixed window, and peak latency) for both MMN raw scores and z‐scores.
The goal of a G‐study is not to test a specific hypothesis. Thus, there are no p‐values associated with estimated variance components or G‐coefficients. However, existing guidelines for determining clinical significance of ICCs suggest that the reliability coefficient can be qualitatively categorized as follows: ICC < 0.4 is poor, 0.4 ≤ ICC < 0.6 is fair, 0.6 ≤ ICC < 0.75 is good, and 0.75 ≤ ICC < 1 is excellent (Cicchetti & Sparrow, 1981). Therefore, G‐coefficients were categorized using these 4 labels for descriptive purposes, as done previously (Roach et al., 2019).
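The single‐facet design described above can be sketched with a simple estimator. Note that this example swaps in the classical ANOVA (method‐of‐moments) estimator for the restricted maximum likelihood approach used in the study; for balanced data with non‐negative estimates the two typically agree, but that is an assumption of this sketch. The function names, simulated scores, and sample size are all hypothetical.

```python
import numpy as np

def variance_components(scores):
    """ANOVA estimates for a persons x occasions design with one score per cell.

    Returns (person, occasion, person-x-occasion-plus-error) variance components.
    """
    n_p, n_o = scores.shape
    grand = scores.mean()
    person_means = scores.mean(axis=1)
    occasion_means = scores.mean(axis=0)
    ms_p = n_o * np.sum((person_means - grand) ** 2) / (n_p - 1)
    ms_o = n_p * np.sum((occasion_means - grand) ** 2) / (n_o - 1)
    resid = scores - person_means[:, None] - occasion_means[None, :] + grand
    ms_res = np.sum(resid ** 2) / ((n_p - 1) * (n_o - 1))
    var_p = max((ms_p - ms_res) / n_o, 0.0)   # negative estimates are truncated at zero
    var_o = max((ms_o - ms_res) / n_p, 0.0)
    return var_p, var_o, ms_res

def g_coefficient(var_p, var_res, n_occasions=1):
    """Relative G-coefficient for a score averaged over n_occasions."""
    return var_p / (var_p + var_res / n_occasions)

def categorize(g):
    """Qualitative labels following the Cicchetti and Sparrow (1981) cut-offs."""
    if g < 0.4:
        return "poor"
    if g < 0.6:
        return "fair"
    if g < 0.75:
        return "good"
    return "excellent"

# Simulated two-occasion data with a stable person effect plus occasion-specific noise.
rng = np.random.default_rng(3)
n_persons = 200
person = rng.normal(0.0, 1.0, size=(n_persons, 1))
scores = person + rng.normal(0.0, 0.8, size=(n_persons, 2)) + np.array([0.0, 0.1])

var_p, var_o, var_res = variance_components(scores)
g_single = g_coefficient(var_p, var_res, n_occasions=1)
g_mean_of_two = g_coefficient(var_p, var_res, n_occasions=2)
label = categorize(g_single)
```

Because the occasion main effect does not enter the denominator, a constant shift between test and re‐test leaves this coefficient unchanged, which matches its interpretation as a consistency‐type ICC when a single occasion is used.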
Exploratory correlations between MMN variables and trait variables
To capture more trait‐like aspects of MMN, z‐scores from 6 fronto‐central electrodes (F3, Fz, F4, C3, Cz, C4) were averaged across the two test occasions separately for each deviant type and correlated with mean GAF or used to explore gender differences in MMN. As one method to demonstrate the enhanced reliability of averaged MMN z‐scores, baseline MMN z‐scores and baseline GAF scores were also correlated. Similar to the age regression models, site was a categorical covariate, and heterogeneity of MMN‐GAF relationships between sites was ruled out by first including a site*GAF interaction term and using the R² change F‐test to determine improvement in model fit.
Given the exploratory nature of these correlations, parameter estimates, uncorrected p‐values, and FDR‐corrected p‐values within this trait family of tests are reported.
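The expected gain in reliability from averaging across test occasions follows the Spearman‐Brown prophecy formula. The short sketch below applies it to the single‐occasion G‐coefficients quoted in the abstract (0.63 for fixed‐window amplitude and 0.37 for latency); the function name is hypothetical, and the projected values are illustrative calculations rather than results reported by the study.

```python
def spearman_brown(reliability, k):
    """Projected reliability of the mean of k parallel measurements."""
    return k * reliability / (1 + (k - 1) * reliability)

# Single-occasion G-coefficients quoted in the abstract.
g_amplitude_two = spearman_brown(0.63, 2)   # approximately 0.77
g_latency_two = spearman_brown(0.37, 2)     # approximately 0.54

print(f"amplitude: {g_amplitude_two:.3f}, latency: {g_latency_two:.3f}")
```

Under the formula's assumption of parallel measurements, averaging two occasions would move amplitude from the "good" toward the "excellent" range and latency from "poor" to "fair".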
RESULTS
MMN ERP waveforms from electrode Fz are plotted in Figure 2. There is consistency between waveforms at each site and on each test occasion despite the long interval between tests and differences in site demographics. Descriptive statistics for trial numbers contributing to individual subject and test occasion ERPs are included in Table 2.
FIGURE 2
Site‐ and session‐specific grand average mismatch negativity (MMN) deviant minus standard tone difference waveforms are plotted for the Double (Frequency plus Duration) Deviant (Top), Frequency Deviant (Middle), and Duration Deviant (Bottom) from electrode Fz. Grand Average MMN waveforms for each NAPLS laboratory site are plotted separately on the left‐hand side for the first (1) and second (2) test occasion. All 16 of these average waveforms are overlaid for each deviant type on the right‐hand side. Time, in milliseconds (ms) from tone onset is plotted on the x‐axis, and amplitude, in microVolts (μV), is plotted on the y‐axis
TABLE 2
Trial Numbers in ERP averages

| Trial type | Test | Re‐test |
|---|---|---|
| Standard | 1,356.04 ± 108.13 | 1,329.51 ± 157.72 |
| Double deviant | 80.37 ± 7.42 | 78.43 ± 10.64 |
| Frequency deviant | 79.99 ± 7.06 | 79.28 ± 9.88 |
| Duration deviant | 80.26 ± 6.98 | 78.93 ± 9.93 |

Note: Mean ± Standard Deviation.
There were two sets of F‐tests to assess the appropriateness of common slope, site, and age regression models. In one set, a mean‐centered age‐squared term was added to the model to test for quadratic age relationships with MMN scores. Only 7.8% (30/384) of these tests showed statistically significant quadratic age relationships at an uncorrected level (ps < 0.05), none survived electrode‐family Bonferroni‐correction (all ps > 0.0015625), and none were significant after FDR‐correction. Comparisons of AIC and BIC between age and age‐squared models indicated that the age‐squared model was better (i.e., smaller AIC or BIC values) for ~25% (95/384) of the models based on AIC but only 4% (16/384) based on BIC. This indicates that a quadratic age effect does not systematically improve MMN modeling and should be omitted.
In the second set of F‐tests, age*site interaction effects were added to the model to determine if there were site‐specific differences in MMN‐age relationships. Only 4% (16/384) of age*site F‐tests showed evidence of uncorrected effects (ps < 0.05), one survived Bonferroni‐correction (p < 0.0015625) and none were significant after FDR correction. 
This more complicated model was better than the simplified model based on AIC in 5.5% (21/384) of the models and in none of the models based on BIC. This indicates that the age relationship did not systematically differ between the sites.
In the common slope models, 33.3% (128/384) of the tests of age relationships were statistically significant at an uncorrected level (ps < 0.05), 13% (50/384) survived Bonferroni‐correction (ps < 0.0015625), and 18.5% (71/384) survived FDR correction. Across all models, site and age accounted for 6.32% of MMN variance on average (range: 0.2163–14.2434%), indicating that even for the strongest site and age effects, at least 85% of the variance in the MMN raw scores remained in the site‐ and age‐corrected MMN z‐scores. Using these regression models, site‐specific mean MMN amplitude for the window measure from electrode Fz and 95% confidence intervals (CIs) for an 18‐year‐old subject were estimated and plotted in Figure 3.
FIGURE 3
Estimated 95% confidence intervals are plotted for the mean mismatch negativity (MMN) amplitude at 18 years of age. The double (frequency plus duration; DBL, left), frequency (FRQ, middle), and duration (DUR, right) deviants are plotted separately along the x‐axis from electrode Fz, and are separated and color‐coded by NAPLS laboratory site. Estimates were derived from a site and age regression model of MMN amplitude averaged across a fixed window of either 90–170 ms (for DBL and FRQ) or 150–230 ms (DUR)
The estimated means and CIs demonstrate that there is overlap between the FRQ and DUR MMN amplitude across sites, with the FRQ MMN being smallest at Emory. The DBL MMN estimates are slightly larger (i.e., more negative MMN amplitudes) than the other deviants with CIs that are approximately twice as wide as those for the other deviants. The common age effect is plotted on top of site‐ and deviant‐specific scatter plots in Figure 4 for electrode Fz.
FIGURE 4
Scatterplots depict the relationships between mismatch negativity (MMN) amplitude averaged across a fixed window at electrode Fz and participant years of age at testing for double (frequency plus duration; DBL, circles), frequency (FRQ, triangles), and duration (DUR, squares) deviants. Data are plotted separately for each site, and thick lines depict the common age relationship across sites based on regression models. Thin lines depict site‐specific, nonlinear locally weighted predictions of MMN given age (Cleveland, Grosse, & Shyu, 1992), and there is no higher‐order polynomial or other nonlinear pattern of fit that is consistent across sites
Tests of age relationships in the z‐scored longitudinal follow‐up MMN data indicated that the age effect was removed in this hold‐out subsample, with only ~5% (20/384) statistically significant effects at an uncorrected level (p < 0.05), consistent with what is expected by chance. None survived FDR correction. Tests of site effects in these data indicated ~15% (58/384) were significant at an uncorrected level. Only three site tests of MMN latency measures survived FDR correction.
G‐coefficients for each electrode, deviant type, measure, and NAPLS site are included in Table S1 for both raw MMN and z‐scores. As can be seen in Figure 5, the G‐coefficients for Fz are fair or better (G ≥ 0.4) in almost every MMN amplitude measure across NAPLS sites, but the latency G‐coefficients are highly variable and poor in many cases.
FIGURE 5
G‐coefficients for the single‐facet (test occasion) generalizability substudies calculated separately for each NAPLS geographic site for electrode Fz based on either raw (top) or standardized mismatch negativity (MMN) z‐scores (bottom). Measurement approaches are plotted along the x‐axis separately for double‐deviant (DBL, circles), frequency‐deviant (FRQ, triangles), and duration‐deviant (DUR, squares) mismatch negativity. These include peak amplitude (“Peak”: most negative peak between 90 and 290 ms in the MMN difference wave), mean amplitude (“Mean”: ±10 ms around the peak), average amplitude in a fixed window (“Window”: 90–170 ms for FRQ and DBL, 150–230 for DUR), and peak latency (“Latency”). Dashed lines indicate qualitative categorization of G‐coefficients based on preexisting standards (Cicchetti & Sparrow, 1981)
Frequencies of poor, fair, good, and excellent reliability categorization of all G‐coefficients are presented in Table 3, separated by NAPLS site. The table demonstrates that the majority (~60%) of G‐coefficients were fair or better, including many (~27%) with good or excellent generalizability. G‐coefficients based on z‐scores were nearly equivalent to those based on raw scores (average difference in G‐coefficients = 0.0044), consistent with relatively small proportions of MMN raw score variance being accounted for by age and site. Site‐specific relationships between mean MMN amplitude in a fixed window on the first and second test occasions are shown in deviant‐specific scatterplots in Figure 6 for electrode Fz.
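To make the categorization concrete, the following Python sketch shows one way a single‐facet (persons × occasions) G‐coefficient could be estimated from two‐way ANOVA mean squares and then labeled with the cutoffs used in the text (fair ≥ 0.4, good ≥ 0.6, excellent ≥ 0.75). The function names, the choice of a relative (rather than absolute) coefficient, and the toy data are assumptions for illustration only; this is not the authors' analysis code.

```python
import numpy as np

def g_coefficient(scores, n_occasions=1):
    """Relative G-coefficient for a persons x occasions score matrix.

    n_occasions is the number of occasions averaged in the decision
    study (1 = single-session stability). Hypothetical helper.
    """
    scores = np.asarray(scores, dtype=float)
    n_p, n_o = scores.shape
    grand = scores.mean()
    person_means = scores.mean(axis=1)
    occasion_means = scores.mean(axis=0)
    # Mean squares from a two-way ANOVA without replication
    ms_person = n_o * np.sum((person_means - grand) ** 2) / (n_p - 1)
    resid = scores - person_means[:, None] - occasion_means[None, :] + grand
    ms_resid = np.sum(resid ** 2) / ((n_p - 1) * (n_o - 1))
    # Variance components; negative estimates are truncated at zero
    var_person = max((ms_person - ms_resid) / n_o, 0.0)
    var_error = ms_resid
    denom = var_person + var_error / n_occasions
    return var_person / denom if denom > 0 else 0.0

def categorize(g):
    """Qualitative label using the cutoffs quoted in the text."""
    if g >= 0.75:
        return "excellent"
    if g >= 0.60:
        return "good"
    if g >= 0.40:
        return "fair"
    return "poor"

# Toy data: 3 persons x 2 occasions with a constant occasion shift,
# so person rank order is perfectly preserved across occasions.
toy = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
g = g_coefficient(toy)
print(g, categorize(g))
```

Because the coefficient is relative, a constant shift between occasions (as in the toy data) does not lower it; only changes in the relative standing of persons do.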
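TABLE 3
Frequency of G‐coefficients by NAPLS Site
| Site | Poor | Fair | Good | Excellent | Total |
|---|---|---|---|---|---|
| UCLA | 136 | 99 | 117 | 32 | 384 |
| Emory | 168 | 192 | 23 | 1 | 384 |
| Harvard | 182 | 119 | 72 | 11 | 384 |
| Hillside | 197 | 135 | 46 | 6 | 384 |
| UNC | 169 | 114 | 61 | 40 | 384 |
| UCSD | 141 | 106 | 116 | 21 | 384 |
| Calgary | 130 | 103 | 104 | 47 | 384 |
| Yale | 108 | 146 | 77 | 53 | 384 |
| TOTAL | 1,231 | 1,014 | 616 | 211 | 3,072 |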
FIGURE 6
Scatterplots depict the relationships between mismatch negativity (MMN) amplitude averaged across a fixed window at electrode Fz at first (Time 1, x‐axis) and second (Time 2, y‐axis) test occasions for double (frequency plus duration; DBL, circles), frequency (FRQ, triangles), and duration (DUR, squares) deviants. Data are plotted separately for each site, and thick lines depict the site‐specific linear relationship between occasions along with shading to show 95% confidence intervals
Parameter estimates along with test statistics for all trait‐like MMN models are presented in Table 4. There were no significant site*GAF interaction effects for either DBL (F[7,151] = 0.974, p = 0.45) or FRQ (F[7,151] = 0.867, p = 0.5341) MMN, but there was evidence of heterogeneous DUR MMN‐GAF relationships between sites (F[7,151] = 2.1644, p = 0.0401, r² = 0.1185). Scatterplots of the relationships between each deviant type and mean GAF scores are shown separately for each site in Figure 7. The plots show mostly negative relationships (i.e., greater GAF is associated with more negative MMN) for FRQ and DBL MMN, but only negative relationships between DUR MMN and GAF at Emory (t[18] = −2.545, p = 0.0203, r² = 0.2647) and UCSD (t[15] = −2.431, p = 0.0281, r² = 0.2826). Reduced models for DBL and FRQ MMN revealed negative relationships with GAF (DBL: r² = 0.0564, FRQ: r² = 0.0598), controlling for site, but only the FRQ MMN effect survived FDR correction (Table 4). Had trait‐like aspects of MMN and GAF not been emphasized through averaging across assessments, neither the time 1 FRQ MMN (estimate = −0.011, t(158) = −1.5, p = 0.135, r² = 0.02) nor the time 1 DBL MMN (estimate = −0.001, t(158) = −0.177, p = 0.86, r² = 0.0072) z‐score relationships with nearest current GAF score would have reached statistical significance.
TABLE 4
Parameter estimates for trait‐like average MMN exploratory models
| Deviant | Term | Estimate | S.E. | t‐statistic | p‐value | FDR p‐value |
|---|---|---|---|---|---|---|
| DUR | Male vs female | 0.032 | 0.146 | 0.218 | 0.82745 | 1.00000 |
| FRQ | Male vs female | −0.049 | 0.130 | −0.379 | 0.70534 | 1.00000 |
| DBL | Male vs female | 0.097 | 0.132 | 0.730 | 0.46647 | 1.00000 |
| DUR | Mean GAF score | −0.011 | 0.009 | −1.196 | 0.23350 | |
| FRQ | Mean GAF score | −0.021 | 0.008 | −2.679 | 0.00816 | 0.04078 |
| DBL | Mean GAF score | −0.017 | 0.008 | −2.140 | 0.03386 | 0.13542 |
FIGURE 7
Scatterplots depict the relationships between Global Assessment of Functioning (GAF) scores averaged across all study time points (x‐axis) and standardized mismatch negativity (MMN) z‐scores (y‐axis) for double (frequency plus duration; DBL, circles), frequency (FRQ, triangles), and duration (DUR, squares) deviants. MMN z‐scores are based on amplitude averaged across a fixed window and then averaged across two test occasions and electrodes F3, Fz, F4, C3, Cz, and C4. Data are plotted separately for each site, and thick lines depict the site‐specific linear relationship between mean GAF and mean MMN, with shading to show 95% confidence intervals. Most sites and deviants show negative relationships, indicating that better mean GAF scores are associated with larger (i.e., more negative) MMN z‐scores.
In the gender models, there was neither evidence of a site*gender interaction effect for any deviant type (all ps > 0.487), nor evidence of a gender difference between males and females in the reduced models.
DISCUSSION
One goal of this study was to present a site and age modeling strategy that uses regression models to produce standardized site‐ and age‐adjusted MMN z‐scores for all participants and all test occasions in NAPLS2 and, in doing so, to demonstrate the utility of such an approach for large, multisite studies. The main purpose of the generalizability analyses was to quantify variance components and associated G‐coefficients representing the single‐site, single‐session stability of MMN responses measured about 1 year apart. G‐coefficients indicated that, for both raw MMN and age‐ and site‐adjusted z‐scores, the stability of amplitude measures was fair or better and consistent across the 8 laboratory sites, while the stability of latency measures was inconsistent across sites and poor in many cases. This suggests that amplitude measures are optimal for longitudinal studies of MMN.
Several alternatives to the z‐score approach for removing site‐ and age‐related confounds in clinical studies are potentially problematic. One alternative is to ignore site‐ and age‐related variation. A second approach would be to eliminate subjects from certain sites in order to match groups on age at each site. In studies of rare outcome events or patients, which is one of the motivations of a multisite study like NAPLS, eliminating subjects is disadvantageous. A third approach is to conduct ANCOVA with site and age as covariates. The problem with an age factor in ANCOVA is that MMN‐age relationships are derived from a pooled estimate of aging effects from all of the groups being compared, including the CHR group.
On theoretical grounds, it is reasonable to hypothesize that physiological measures like MMN in the psychosis prodrome may have abnormal age trajectories, reflecting abnormal brain maturation, and other pathogenic processes operating during the transition to psychosis or disease‐related progressive brain changes occurring after illness onset (Kiang, Braff, Sprock, & Light, 2009; Light et al., 2015; Todd et al., 2008). Accordingly, we believe ANCOVA models are inappropriate because of their potential to remove disease‐related aging effects along with normal aging effects within the study sample.
Two previous studies reported excellent MMN reliability (Fz ICC > 0.8) using a long duration deviant similar to this study and a window measurement (135–205 ms) from nose‐referenced data (Light et al., 2012; Light & Braff, 2005). These reliability coefficients were based on 10 patients with schizophrenia (Light & Braff, 2005) or on 168 patients with schizophrenia and 58 healthy subjects (Light et al., 2012), tested twice, at least 1 year apart. While the corresponding window measure G‐coefficient, averaged across all 8 NAPLS sites, was smaller in the present study (raw and z‐score MMN at Fz G = 0.625), the subjects in this study were younger, healthy participants who may have experienced more true score change in a 1‐year interval than the older patients with schizophrenia and controls in other studies. A similar age group of 28 young, healthy participants (Biagianti et al., 2017) had good duration deviant peak amplitude reliability based on two MMN sessions approximately 6 months apart (fronto‐central 6‐electrode average ICC = 0.72), which is closer to the reliability averaged across all NAPLS sites in this study (raw and z‐score peak MMN at Fz G = 0.644) and consistent with the idea that more true score change occurs in younger subjects.
It is also worth noting that when averaging all 8 NAPLS sites' separately calculated G‐coefficients in our traveling subjects study, where subjects were tested on two consecutive days at each site, the duration deviant MMN based on the window measure similarly has ~60% of the relative variance attributed to persons and ~40% attributed to error, on average [raw MMN at Fz G = 0.6; Roach et al. (2019)]. These estimates are consistent with other MMN reliability studies of healthy subjects that also reported good reliability using long duration deviants [Fz ICC = 0.66 in Hall et al. (2006)] or frequency deviants [Cz ICC = 0.6 in Lew et al. (2007)].
The Spearman‐Brown prophecy formula indicates that the reliability of a score increases as test length, or the number of items averaged to summarize a subject's score, increases (Brown, 1910; Spearman, 1910). In this study, averaging across the two EEG test occasions reduces the contribution of the error variance component to the calculation of the G‐coefficient in Equation 4. This shifts the average G‐coefficient at Fz for all deviant types from good (G ≥ 0.6) to excellent (G ≥ 0.75). In practice, this averaging emphasized the trait‐like attributes of the MMN scores, allowing relationships between averaged GAF scores and FRQ MMN to emerge. This negative correlation between GAF and MMN has previously been observed in patients with schizophrenia using DUR MMN (Fulham et al., 2014; Jahshan et al., 2012; Koshiyama et al., 2018; Light & Braff, 2005). Future studies exploring the relationship between MMN and functioning should consider averaging across multiple assessments to emphasize trait‐like aspects of MMN and functioning measures while also reducing error variance.
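The effect of averaging two occasions can be checked with a small worked example. The Python sketch below applies the Spearman‐Brown formula to a single‐occasion coefficient and shows the equivalent calculation in variance‐component terms, assuming the G‐coefficient has the usual single‐facet form (person variance divided by person variance plus error variance over the number of occasions). The starting value of 0.6, the 0.6/0.4 variance split, and the function names are illustrative assumptions, not values taken from the study's Equation 4.

```python
def spearman_brown(reliability, k):
    """Predicted reliability of the average of k parallel measurements."""
    return k * reliability / (1.0 + (k - 1) * reliability)

def g_from_components(var_person, var_error, k):
    """Single-facet relative G-coefficient when k occasions are averaged.

    Assumed form: var_person / (var_person + var_error / k).
    """
    return var_person / (var_person + var_error / k)

# A single-occasion coefficient of 0.6 corresponds to a 60/40 split
# of relative variance between persons and error.
single_occasion = 0.6
two_occasions = spearman_brown(single_occasion, 2)
same_result = g_from_components(0.6, 0.4, 2)
print(round(two_occasions, 3), round(same_result, 3))  # 0.75 0.75
```

Under these assumptions, a coefficient at the lower bound of the good range (0.6) reaches the lower bound of the excellent range (0.75) once two occasions are averaged, because the error term is halved while the person term is unchanged.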
There were no gender differences in averaged MMN scores, consistent with some (Qiao et al., 2015; Yang et al., 2016) but not all (Light et al., 2015) prior reports.
There are several limitations in the present stability analyses that should be carefully considered. Because estimates of variance components can be fairly unstable when the number of measurements is small, having only a subset of all the HC subjects studied on only two test occasions at each site is not ideal. It is possible that HCs who returned for a second EEG assessment represent a biased subgroup of subjects who were above average in compliance, leading to inflated G‐coefficients. For example, the Yale site had the lowest number of subjects in its G‐study, the lowest retention rate (41.67%), and the greatest number of excellent G‐coefficients. However, the Calgary site had the most subjects, the best retention rate (92.6%), and the second greatest number of excellent G‐coefficients.
Despite these limitations, MMN amplitude measures appear to have fair or better stability across all NAPLS sites, similar to the within‐site test–retest reliability previously reported in a small (N = 8) traveling subjects study (Roach et al., 2019). Furthermore, site‐ and age‐standardization of MMN measures via linear regression minimally changed the G‐coefficients while removing fixed effects of site and age in the full (N = 241) NAPLS2 HC sample. These MMN z‐scores can be used to test for pathological aging effects in the CHR sample and to test hypotheses in subsamples of subjects that may not be balanced in number and/or age across the 8 sites (e.g., comparing CHR‐C to CHR‐NC). This simple, linear transformation represents a useful approach to multisite EEG studies of rare patient populations or clinical trials.
The consistency of MMN waveforms and G‐coefficients across sites between two test occasions indicates that MMN amplitude measures are generalizable and that, as in other consortium studies (e.g., Light et al., 2015), it is feasible to combine data from multiple, appropriately controlled and calibrated research laboratory sites to study MMN.
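A minimal sketch of the kind of site‐ and age‐standardization discussed above is given below in Python. It assumes a linear model with an intercept, age, and dummy‐coded site terms fitted to healthy control (HC) data, with z‐scores defined as the residual divided by the residual standard deviation. The exact model specification used in the study is not reproduced here; the function names, site labels, and simulated data are hypothetical.

```python
import numpy as np

def design_matrix(age, site, site_levels):
    """Intercept, age, and dummy-coded site columns (first level = reference)."""
    age = np.asarray(age, dtype=float)
    site = np.asarray(site)
    dummies = [(site == s).astype(float) for s in site_levels[1:]]
    return np.column_stack([np.ones_like(age), age] + dummies)

def fit_norms(mmn, age, site):
    """Fit MMN ~ age + site in HC data and keep what is needed to z-score."""
    site_levels = sorted(set(site))
    X = design_matrix(age, site, site_levels)
    y = np.asarray(mmn, dtype=float)
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    dof = len(y) - X.shape[1]
    sd = np.sqrt(np.sum(resid ** 2) / dof)  # residual standard deviation
    return {"beta": beta, "sd": sd, "site_levels": site_levels}

def z_score(norms, mmn, age, site):
    """Express observed MMN as a deviation from the HC-predicted value."""
    X = design_matrix(age, site, norms["site_levels"])
    return (np.asarray(mmn, dtype=float) - X @ norms["beta"]) / norms["sd"]

# Simulated HC sample with an age slope and site offsets (made-up values)
rng = np.random.default_rng(0)
n = 240
hc_site = rng.choice(["A", "B", "C"], size=n)
hc_age = rng.uniform(12, 35, size=n)
offset = {"A": 0.0, "B": -0.5, "C": 0.4}
hc_mmn = (-3.0 + 0.05 * hc_age
          + np.array([offset[s] for s in hc_site])
          + rng.normal(0.0, 1.0, size=n))

norms = fit_norms(hc_mmn, hc_age, hc_site)
hc_z = z_score(norms, hc_mmn, hc_age, hc_site)
print(round(float(abs(np.corrcoef(hc_z, hc_age)[0, 1])), 6))  # age effect removed in HC
```

In this sketch the norms are estimated from HC data only and then applied to any other participant through `z_score`, so age effects that differ in a clinical group are not absorbed into the adjustment, which is the concern raised about pooled ANCOVA estimates.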
CONFLICT OF INTEREST
The authors declare no potential conflict of interest.
FINANCIAL DISCLOSURES
Dr Light reported grants from Boehringer Ingelheim, other support from Astellas, and other support from Heptares outside the submitted work. Dr Bearden reported grants from the NIMH during the conduct of the study. Dr Cornblatt reported grants from NIMH during the conduct of the study. Dr Duncan has received research support for work unrelated to this project from Auspex Pharmaceuticals, Inc. and Teva Pharmaceuticals, Inc. Dr Perkins reported grants from the NIMH during the conduct of the study; personal fees from Sunovion and personal fees from Alkermes outside the submitted work. Dr Seidman reported grants from the NIMH during the conduct of the study. Dr Woods reported grants from the NIMH during the conduct of the study; grants and personal fees from Boehringer Ingelheim, personal fees from New England Research Institute, personal fees from Takeda, grants from Amarex, grants from Teva, grants from One Mind Institute, and grants from Substance Abuse and Mental Health Services Administration outside the submitted work; in addition, Dr Woods had a patent for glycine agonists for prodromal schizophrenia issued and a patent for a method of predicting psychosis risk using blood biomarker analysis pending. Dr Cannon reported grants from NIMH during the conduct of the study. Dr Mathalon reported grants from NIMH during the conduct of the study; consulting fees from Boehringer Ingelheim, consulting fees from Aptinyx, consulting fees from Takeda, consulting fees from Upsher‐Smith, and consulting fees from Alkermes outside the submitted work. No other disclosures were reported.
Data S1. Supporting Information.
Authors: Brian J Roach; Holly K Hamilton; Peter Bachman; Aysenil Belger; Ricardo E Carrión; Erica Duncan; Jason Johannesen; Joshua G Kenney; Gregory Light; Margaret Niznikiewicz; Jean Addington; Carrie E Bearden; Emily M Owens; Kristin S Cadenhead; Tyrone D Cannon; Barbara A Cornblatt; Thomas H McGlashan; Diana O Perkins; Larry Seidman; Ming Tsuang; Elaine F Walker; Scott W Woods; Daniel H Mathalon Journal: Int J Methods Psychiatr Res Date: 2020-03-30 Impact factor: 4.035