Validation of the automated neuropsychological assessment metrics for assessing cognitive impairment in systemic lupus erythematosus.

Kimberley Yuen¹, Dorcas Beaton², Kathleen Bingham³, Patricia Katz⁴, Jiandong Su⁵, Juan Pablo Diaz Martinez⁵, Maria Carmela Tartaglia⁶, Lesley Ruttan⁷, Joan E Wither⁸, Mahta Kakvan⁹, Nicole Anderson¹⁰, Dennisse Bonilla¹⁰, May Y Choi¹⁰, Marvin J Fritzler¹¹, Robin Green¹², Zahi Touma¹³.

Abstract

OBJECTIVE: We previously demonstrated the utility of the Automated Neuropsychological Assessment Metrics (ANAM) for screening cognitive impairment (CI) in patients with systemic lupus erythematosus (SLE) and developed composite indices for interpreting ANAM results. Our objectives here were to provide further support for the ANAM's concurrent criterion validity against the American College of Rheumatology neuropsychological battery (ACR-NB), identify the most discriminatory subtests and scores of the ANAM for predicting CI, and provide a new approach to interpret ANAM results using Classification and Regression Tree (CART) analysis.
METHODS: A total of 300 adult SLE patients completed an adapted ACR-NB and ANAM on the same day. As per objectives, six models were built using combinations of ANAM subtests and scores and submitted to CART analysis. Area under the curve (AUC) was calculated to evaluate the ANAM's criterion validity compared to the adapted ACR-NB; the most discriminatory ANAM subtests and scores in each model were selected, and the performance of the models with the highest AUCs was compared to our previous composite indices; decision trees were generated for models with the highest AUCs.
RESULTS: Two models had excellent AUCs of 86% and 89%. The eight most discriminatory ANAM subtests and scores were identified. Both models demonstrated higher AUCs than our previous composite indices. An adapted decision tree was created to simplify the interpretation of ANAM results.
CONCLUSION: We provide further validity evidence for the ANAM as a valid CI screening tool in SLE. The decision tree improves interpretation of ANAM results, enhancing clinical utility.

Keywords: ANAM; CART analysis; cognitive impairment; lupus; screening; systemic lupus erythematosus

Year: 2021 PMID： 34957878 PMCID： PMC8793300 DOI： 10.1177/09612033211062530

Source DB: PubMed Journal: Lupus ISSN： 0961-2033 Impact factor: 2.911

Key messages

(1) The ANAM can be used as a screening tool for CI in patients with SLE. (2) Eight most discriminatory ANAM subtests and scores were identified, reducing overall testing time. (3) A simple decision tree generated from CART analysis improves the interpretability of ANAM results.

Introduction

Cognitive impairment (CI) is common in patients with systemic lupus erythematosus (SLE), with a pooled prevalence of 38% (95% confidence interval: 33–43%) (1). However, diagnosis of and screening for CI are often delayed. Currently, evidence is lacking on the validity and agreement of screening instruments used for assessing CI in patients with SLE (2). The validated American College of Rheumatology Neuropsychological Battery (ACR-NB) is regarded as the gold standard for assessing CI in SLE (3, 4). It is a 1-hour neuropsychological battery (NB) covering cognitive domains shown to be affected in SLE, including attention and speed of processing, language processing, learning and memory (visuospatial and verbal), executive functioning, and manual motor speed (3). Although shorter than traditional NBs, the ACR-NB remains associated with high costs due to the need for trained personnel for administration and score interpretation. These requirements create significant barriers for patients and clinicians, since such costs are not typically covered by public healthcare systems and the battery imposes a substantial time burden for screening in ambulatory settings. Thus, instruments with less administrative burden are needed to facilitate early screening of CI in SLE. The Automated Neuropsychological Assessment Metrics (ANAM) offers such a possibility, but its validity for cognitive screening in SLE has yet to be fully established. The ANAM (version 4) General Neuropsychological Screening (GNS) battery is self-administered, takes 30–40 min to complete, and has been used for detecting CI in SLE (5–7). The ANAM generates large amounts of data, which allows customization of a cognitive screening battery for specific populations of interest (8). Trained personnel can administer the ANAM under the supervision of a qualified health professional (e.g., clinical psychologist); however, interpretation requires a qualified professional trained in test principles (e.g., neuropsychologist).
We previously demonstrated that the ANAM could accurately screen for CI in SLE compared to the ACR-NB, and derived composite indices for predicting CI (9). To extend our results, we used Classification and Regression Tree (CART) analysis in the current study to predict CI in patients with SLE based on ANAM subtests and scores. CART uses recursive partitioning to build a decision tree (10). CART retains the optimal number of predictors to maximize sensitivity and specificity of the outcome. This innovative and powerful statistical technique identifies the most discriminatory variables in a model and displays data in a decision tree (10–12). Implementing the ANAM as a screening test for CI in SLE with the CART decision tree facilitates earlier, large-scale screening. Our objectives were to (1) examine the ANAM’s criterion validity for detecting CI against an adapted ACR-NB, (2) identify the most discriminatory subtests and scores of the ANAM for predicting CI and compare the performance of our best models against our previous composite indices (9), and (3) provide a new approach for interpreting ANAM results using decision trees. We hypothesized that the ANAM would achieve a sensitivity ≥80% and specificity ≥70%.
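The recursive partitioning step that CART repeats at every node can be sketched as follows. This is a minimal illustration, assuming Gini impurity as the split criterion (the text does not state which criterion was used) and using made-up reaction times; the function names are hypothetical and this is not the authors' R code.

```python
from typing import List, Optional, Tuple


def gini(labels: List[int]) -> float:
    """Gini impurity of a list of binary labels (1 = CI, 0 = non-CI)."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2.0 * p * (1.0 - p)


def best_split(values: List[float], labels: List[int]) -> Optional[Tuple[float, float]]:
    """Return (threshold, weighted impurity) of the best 'value < threshold' split."""
    pairs = sorted(zip(values, labels))
    n = len(pairs)
    best = None
    for i in range(1, n):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # cannot place a threshold between identical values
        threshold = (pairs[i - 1][0] + pairs[i][0]) / 2.0
        left = [lab for _, lab in pairs[:i]]
        right = [lab for _, lab in pairs[i:]]
        impurity = (len(left) * gini(left) + len(right) * gini(right)) / n
        if best is None or impurity < best[1]:
            best = (threshold, impurity)
    return best


# Hypothetical reaction times (arbitrary units) and CI labels, for illustration only.
reaction_times = [0.52, 0.55, 0.60, 0.81, 0.85, 0.90]
ci_labels = [0, 0, 0, 1, 1, 1]
split = best_split(reaction_times, ci_labels)
```

A full tree would apply `best_split` to every candidate predictor, keep the best one, and recurse on the two resulting subsets until a stopping or pruning rule is met.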

Patients and methods

Patients

A cross-sectional analysis of data on 300 consenting adult patients with SLE who attended the University of Toronto Lupus Clinic between January 2016 and October 2019 was conducted. Inclusion criteria were (a) fulfillment of the revised ACR criteria for SLE classification or three criteria and a supportive biopsy (kidney or skin) (13); (b) ages 18–65; and (c) ability to give informed consent. Exclusion criteria were (a) mental or physical disability preventing participation in the study and (b) low fluency in English precluding completion of verbal items of the ACR-NB. All participants provided written, informed consent. This study was approved by the University Health Network Research Ethics Board.

Study Procedures

Patients completed both the adapted ACR-NB and ANAM on the same day, and were classified as either CI (n=157), non-CI (n=54), or indeterminate (n=89) based on the adapted ACR-NB. We used the following criteria: (a) CI: a z-score of ≤−1.5 in 2 or more domains; (b) non-CI: z-scores in all domains ≥−1.5; and (c) indeterminate: a z-score of ≤−1.5 in only one domain (9). The indeterminate group was excluded from the analysis to reduce heterogeneity. The final sample included 211 patients. A domain was defined as impaired if a z-score of ≤−1.5 was reached in at least one test in the following domains: manual motor speed, simple attention and processing speed, visual-spatial construction, and language processing; or a z-score of ≤−1.5 in 2 or more tests in the following domains: learning and memory, and executive functioning (9). We corrected for patients with known joint issues if performance on a motor task (e.g., Finger Tapping) resulted in a z-score <−1.5 (i.e., impairment).
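The classification rule described above can be sketched in a few lines. This is an illustrative reading of the criteria, assuming that a patient's status is determined by the count of impaired domains (two or more: CI; exactly one: indeterminate; none: non-CI). The domain keys and the example patient are hypothetical, and the correction for known joint issues is not modelled.

```python
from typing import Dict, List

CUTOFF = -1.5
# Domains where two or more low test scores are required (per the text);
# every other domain is impaired by a single low test score.
TWO_TEST_DOMAINS = {"learning_and_memory", "executive_functioning"}


def domain_impaired(domain: str, z_scores: List[float]) -> bool:
    """True if enough tests in the domain have a z-score at or below the cutoff."""
    n_low = sum(1 for z in z_scores if z <= CUTOFF)
    required = 2 if domain in TWO_TEST_DOMAINS else 1
    return n_low >= required


def classify_patient(domains: Dict[str, List[float]]) -> str:
    """Map per-domain test z-scores to 'CI', 'indeterminate', or 'non-CI'."""
    n_impaired = sum(domain_impaired(d, zs) for d, zs in domains.items())
    if n_impaired >= 2:
        return "CI"
    if n_impaired == 1:
        return "indeterminate"
    return "non-CI"


# Hypothetical patient: one low motor test, one low memory test (memory needs two).
example = {
    "manual_motor_speed": [-1.8, -0.4],
    "simple_attention_processing_speed": [-0.2, 0.1, -1.0],
    "visual_spatial_construction": [0.3],
    "language_processing": [-0.9],
    "learning_and_memory": [-1.6, -0.5, 0.0, 0.2, -1.1],
    "executive_functioning": [-0.3, -0.7, 0.4, -1.2, 0.0],
}
status = classify_patient(example)
```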

Outcome measures

Adapted ACR-NB

The ACR-NB has been described in detail elsewhere (3, 4). The version used in this study was identical to the original ACR-NB, except for the following: the Hopkins Verbal Learning Test–Revised (HVLT-R) (14) was used instead of the California Verbal Learning Test (CVLT) (15) due to its shorter duration. Our adapted ACR-NB includes 11 cognitive tests representing six cognitive domains (manual motor speed; simple attention and processing speed; visual-spatial construction; language processing; learning and memory [visuospatial and verbal]; and executive functioning [untimed and timed]) (9).

ANAM

The ANAM (version 4) GNS battery consists of 15 subtests. Each ANAM test provides four scores: percentage correct (PCT), mean reaction time (MR), throughput (TP), and coefficient of variation of reaction time (CV). PCT represents accuracy; MR is the mean reaction time (in seconds); TP measures cognitive efficiency and is the number of correct responses per minute (9, 16); and CV, a derived score (standard deviation of reaction time divided by MR), is an index of the patient’s consistency of response speed within a given timed subtest (9). Higher PCT and TP scores, and lower MR and CV scores, indicate better cognitive performance (9). Four subtests (Simple Reaction Time, Tower Puzzle, and Tapping Left and Right hand) do not have a PCT score, as these subtests do not allow incorrect responses. Two subtests (Tower Puzzle and Go/No Go) do not report a TP score, as it cannot be derived. Instead, MeanScore (derived from a combination of accuracy, speed, and problem difficulty) was used in place of TP for Tower Puzzle, and the number of incorrect responses, or false positives (NumIncRsp), was used in place of TP for Go/No Go. Tests and cognitive domains of the ANAM and adapted ACR-NB can be found in Table 1. ANAM performance results of CI and non-CI patients can be seen in Supplementary Table S1.
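To make the four score definitions concrete, the sketch below computes them from trial-level data. It is an approximation under stated assumptions rather than the ANAM software's own scoring: reaction times from all trials are used, the sample standard deviation is used for CV, and throughput is taken as correct responses divided by total response time in minutes. The function name and the example trials are hypothetical.

```python
from statistics import mean, stdev
from typing import Dict, List


def anam_style_scores(reaction_times_s: List[float], correct: List[bool]) -> Dict[str, float]:
    """Compute PCT, MR, CV, and TP from per-trial reaction times (seconds) and accuracy."""
    n_correct = sum(correct)
    mr = mean(reaction_times_s)
    total_minutes = sum(reaction_times_s) / 60.0  # assumption: time base = summed reaction time
    return {
        "PCT": 100.0 * n_correct / len(correct),  # accuracy, in percent
        "MR": mr,                                  # mean reaction time, seconds
        "CV": stdev(reaction_times_s) / mr,        # consistency of response speed
        "TP": n_correct / total_minutes,           # correct responses per minute
    }


# Hypothetical trials, for illustration only.
scores = anam_style_scores([0.5, 0.6, 0.7, 0.6], [True, True, False, True])
```

Higher `PCT` and `TP`, and lower `MR` and `CV`, would indicate better performance, matching the direction stated in the text.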

Table 1.

Cognitive domains and tests of the adapted ACR-NB and ANAM.

NB cognitive domains	NB tests	ANAM cognitive domains	ANAM tests
Manual motor speed	Finger tapping test: Dominant hand and non-dominant hand	Fine motor processing	1. Tapping right hand;
			2. tapping left hand
Simple attention and processing speed	Trails A, Stroop color naming, Stroop word reading	Attention and processing speed	3. Running memory;
			4. procedural reaction time;
			5. two-choice reaction time;
			6. simple reaction time;
			7. simple reaction time repeated
Visual-spatial construction	RCFT copy	Visual-spatial perception	8. Spatial processing
Language processing	COWAT Animals	Language processing	9. Logical relations
Learning and memory	RCFT delayed recall	Learning	10. Code substitution learning
Visuospatial	RCFT delayed recognition	Memory	11. Code substitution delayed
Verbal	HVLT-R delayed recall, HVLT-R recognition, HVLT-R total recall		12. Match to sample
Executive functioning Untimed	Stroop (interference score), WAIS letter-number, Consonant trigrams (used lower value from 18 s or 36 s)	Executive functioning	13. Math processing;
			14. go no go hits;
			15. tower test
Executive functioning Timed	WAIS-III digit symbol, Trail B		

Statistical Analyses

Demographic and clinical characteristics of patients classified as CI and non-CI were summarized. A sample size calculation following the rule of 20 cases per predictor for regression analyses suggested a minimum sample size of 200, which our sample surpassed (n=211). Statistical significance was set at p < 0.05. CART analysis was performed in R (17). Missing data were handled by CART via imputation with surrogate variables. The Holm–Bonferroni method was used to control for multiple comparisons within the same ANAM score family. Raw ANAM scores were used and adjusted for age in each model.
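For readers unfamiliar with the Holm–Bonferroni step-down procedure mentioned above, a minimal sketch follows. The p-values are hypothetical and the function name is illustrative; this is not the authors' R code.

```python
from typing import List


def holm_bonferroni(p_values: List[float], alpha: float = 0.05) -> List[bool]:
    """Return, for each p-value, whether its null hypothesis is rejected."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # indices, smallest p first
    reject = [False] * m
    for rank, idx in enumerate(order):
        # The k-th smallest p-value (rank = k - 1) is compared with alpha / (m - k + 1).
        if p_values[idx] <= alpha / (m - rank):
            reject[idx] = True
        else:
            break  # once one test fails, all larger p-values are retained
    return reject


# Hypothetical p-values for four comparisons within one score family.
decisions = holm_bonferroni([0.01, 0.04, 0.03, 0.005])
```

In this example the two smallest p-values are rejected, while 0.03 fails its threshold of 0.025 and stops the procedure.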

Examine the ANAM’s concurrent criterion validity (objective 1)

Models were defined a priori using the same ANAM scores and subtests as we had previously (9): Model 1 (PCT scores), Model 2 (CV scores), Model 3 (MR scores), Model 4 (TP scores), Model 5 (PCT, CV, and MR scores), and Model 6 (PCT, CV, MR, and TP scores). MeanScore was used in place of TP for Tower Puzzle and number of incorrect responses (NumIncRsp) was used in place of TP for Go/No Go. Age was also included in the models. Each model was submitted to CART analysis. Decision trees were partitioned and pruned using the complexity parameter (cp), a computed value that determines the number of predictors in a tree (18). The cp value with the lowest cross-validation error was selected to produce the optimal number of predictors and the lowest misclassification rate (11). To minimize overfitting, repeated k-fold cross-validation was performed on each model using the one-standard-error rule with R package “caret” (19). We set k to 10, meaning the dataset was randomly split into 10 equal parts, with one part (10%) reserved as the testing dataset and the remaining nine parts (90%) used as the training dataset. Each model was fit on the training set and evaluated on the testing set. An evaluation score was retained, and the model was discarded. This was continued until all 10 parts had been used as the testing dataset. This process was repeated three times (a standard method), and each model’s performance was the combined result of all 30 fits. Each model’s ability to detect CI was analyzed using receiver operator characteristic (ROC) curves to determine the area under the curve (AUC). AUC values were classified as outstanding (1.0–0.91), excellent (0.90–0.81), good (0.80–0.71), fair (0.70–0.61), or poor (≤0.60) (20). R package “ROCR” was used for plotting ROC curves (21) and R package “pROC” was used for calculating 95% confidence intervals for each ROC curve (22).
Contingency tables were used to calculate sensitivity, specificity, accuracy, positive predictive value (PPV), and negative predictive value (NPV).
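The resampling scheme described above (10 folds, repeated three times, giving 30 train/test fits) can be sketched as below. This is a plain, unstratified illustration using only the Python standard library; it does not reproduce the behaviour of the R package used in the study, and the function name and seed are assumptions.

```python
import random
from typing import List, Tuple


def repeated_kfold(n_samples: int, k: int = 10, repeats: int = 3,
                   seed: int = 0) -> List[Tuple[List[int], List[int]]]:
    """Return (train_indices, test_indices) pairs for repeated k-fold cross-validation."""
    rng = random.Random(seed)
    splits = []
    for _ in range(repeats):
        indices = list(range(n_samples))
        rng.shuffle(indices)  # new random partition for every repeat
        folds = [indices[i::k] for i in range(k)]  # k near-equal parts
        for test_fold in folds:
            held_out = set(test_fold)
            train = [i for i in indices if i not in held_out]
            splits.append((train, test_fold))
    return splits


# 211 patients, 10 folds, 3 repeats -> 30 train/test pairs.
splits = repeated_kfold(211)
```

With 211 patients the folds cannot be exactly equal, so each test fold holds 21 or 22 patients. A model would be fit on each `train` list and scored on the matching `test` list, and the 30 scores combined.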

Identify the most discriminatory ANAM subtests and scores and compare to composite indices (objective 2)

Final decision trees were generated following k-fold cross-validation for models that achieved an AUC ≥0.81, sensitivity ≥80%, and specificity ≥70% using R package “rpart.plot” (23) and R package “rattle” (24). The resultant decision trees identified the most discriminatory subtests and scores of the ANAM. The previous composite indices (ANAM-index5 and ANAM-index6) (9) were applied to the current dataset. ANAM subtests were compared between CART and the composite indices.
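The selection rule stated above can be checked directly against the values reported in Table 3. The sketch below is only an illustration of that filter; the dictionary layout and function name are assumptions, while the numbers are copied from Table 3.

```python
# AUC, sensitivity (%), and specificity (%) as reported in Table 3.
TABLE3 = {
    "Model 1 (PCT)": {"auc": 0.79, "sens": 96, "spec": 35},
    "Model 2 (CV)": {"auc": 0.65, "sens": 99, "spec": 31},
    "Model 3 (MR)": {"auc": 0.83, "sens": 97, "spec": 56},
    "Model 4 (TP)": {"auc": 0.73, "sens": 97, "spec": 39},
    "Model 5 (PCT, CV, and MR)": {"auc": 0.86, "sens": 93, "spec": 70},
    "Model 6 (PCT, CV, MR, and TP)": {"auc": 0.89, "sens": 90, "spec": 78},
}


def meets_criteria(m: dict) -> bool:
    """AUC >= 0.81, sensitivity >= 80%, and specificity >= 70%."""
    return m["auc"] >= 0.81 and m["sens"] >= 80 and m["spec"] >= 70


selected = [name for name, m in TABLE3.items() if meets_criteria(m)]
```

Only Models 5 and 6 pass all three thresholds; Model 3 meets the AUC and sensitivity criteria but not the specificity criterion.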

New approach to interpret ANAM results using CART analysis decision trees (objective 3)

Decision trees generated with R package “rpart.plot” (Supplementary Data S1) were used as a reference to redesign the decision trees to enhance clinical interpretability. To further investigate the performance of the decision tree algorithms from the best models, we applied them to the indeterminate group (n=89).

Results

Demographic and clinical characteristics of the SLE cohort are included in Table 2. The prevalence of CI of all 300 participants was 52%.

Table 2.

Demographic and clinical characteristics of cohort included in the analysis (CI and non-CI).

	Cognitive status based on adapted ACR-NB
Variable	Value	Total	CI	Non-CI	p-value
		N=211	N=157	N=54
Sex	Female	188 (89.1%)	138 (87.9%)	50 (92.6%)	0.34
Sex	Male	23 (10.9%)	19 (12.1%)	4 (7.4%)	—
Age (years)	18–29	47 (22.3%)	39 (24.8%)	8 (14.8%)	0.05
	30–39	56 (26.5%)	39 (24.8%)	17 (31.5%)	—
	40–49	47 (22.3%)	30 (19.1%)	17 (31.5%)	—
	50–59	44 (20.9%)	38 (24.2%)	6 (11.1%)	—
	60–69	17 (8.1%)	11 (7.0%)	6 (11.1%)	—
Age at SLE diagnosis (years)	Mean ± SD	26.92 ± 10.82	27.48 ± 11.73	25.26 ± 7.44	0.19
Age at SLE diagnosis (years)	Median (IQR)	25 (18–33)	26 (18–35)	25 (20–29)	0.46
Age at enrollment (years)	Mean ± SD	41.01 ± 12.22	40.81 ± 12.51	41.61 ± 11.41	0.67
Age at enrollment (years)	Median (IQR)	40 (30–52)	40 (30–52)	43 (31–49)	0.60
Disease duration at enrollment (years)	Mean ± SD	14.07 ± 10.19	13.33 ± 9.78	16.24 ± 11.11	0.07
Disease duration at enrollment (years)	Median (IQR)	12 (6–22)	12 (6–20)	15 (7–24)	0.11
Ethnicity	Black	46 (21.8%)	40 (25.5%)	6 (11.1%)	0.04
	Caucasian	104 (49.3%)	69 (43.9%)	35 (64.8%)	—
	Chinese	29 (13.7%)	23 (14.6%)	6 (11.1%)	—
	Other	32 (15.2%)	25 (15.9%)	7 (13.0%)	—

Ethnicities in the “Other” category include Indigenous, Filipino, and other minority groups. p-values resulted from t-tests for continuous variables, chi-square tests for binary variables, and Cochran-Armitage trend tests for categorical variables with more than two levels. CI, cognitive impairment.

The AUC, sensitivity, specificity, PPV, NPV, and accuracy from all models are displayed in Table 3. ANAM accurately identified CI compared to the adapted ACR-NB. The AUC for all models except Model 2 (CV) was >71%, indicating good to excellent values. All models achieved a sensitivity of ≥90%. The best models were Model 5 (PCT, CV, and MR) and Model 6 (PCT, CV, MR, and TP); both models had an AUC >81%, sensitivity ≥90% and specificity ≥70% (Figure 1). Model 5 had an AUC of 86% (95% confidence interval: 0.80–0.92), sensitivity of 93%, and specificity of 70%. Model 6 (PCT, CV, MR, and TP) had an AUC of 89% (95% confidence interval: 0.84–0.94), sensitivity of 90% and specificity of 78%.

Table 3.

AUC, sensitivity, specificity, PPV, NPV, and accuracy results of all CART models.

	Model 1 (PCT)	Model 2 (CV)	Model 3 (MR)	Model 4 (TP)	Model 5 (PCT, CV, and MR)	Model 6 (PCT, CV, MR, and TP)
AUC	0.79 (95% confidence interval: 0.73–0.86)	0.65 (95% confidence interval: 0.58–0.72)	0.83 (95% confidence interval: 0.76–0.90)	0.73 (95% confidence interval: 0.65–0.81)	0.86 (95% confidence interval: 0.80–0.92)	0.89 (95% confidence interval: 0.84–0.94)
Sensitivity	96%	99%	97%	97%	93%	90%
Specificity	35%	31%	56%	39%	70%	78%
PPV	81%	81%	86%	82%	90%	92%
NPV	76%	89%	88%	84%	78%	72%
Accuracy	81%	82%	87%	73%	87%	79%

AUC: area under the curve; PPV: positive predictive value; NPV: negative predictive value.
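As a reminder of how the metrics in Table 3 follow from a 2×2 contingency table, here is a minimal sketch. The cell counts are hypothetical and are not taken from the study; the function name is illustrative.

```python
from typing import Dict


def diagnostic_metrics(tp: int, fn: int, fp: int, tn: int) -> Dict[str, float]:
    """Standard screening metrics from the four cells of a contingency table."""
    return {
        "sensitivity": tp / (tp + fn),  # impaired patients correctly flagged
        "specificity": tn / (tn + fp),  # unimpaired patients correctly cleared
        "ppv": tp / (tp + fp),          # flagged patients who are truly impaired
        "npv": tn / (tn + fn),          # cleared patients who are truly unimpaired
        "accuracy": (tp + tn) / (tp + fn + fp + tn),
    }


# Hypothetical counts, for illustration only.
metrics = diagnostic_metrics(tp=90, fn=10, fp=15, tn=35)
```

Because PPV, NPV, and accuracy depend on the proportion of impaired patients in the sample, they would change if the same test were applied to a population with a different prevalence of CI.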

Figure 1.

Receiver operator characteristic curves of Model 5 (PCT, CV, and MR) and Model 6 (PCT, CV, MR, and TP). AUC, area under the curve; PCT, percentage correct responses; CV, coefficient of variation; MR, mean reaction time; TP, throughput.
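The area under a ROC curve equals the probability that a randomly chosen impaired patient receives a higher predicted score than a randomly chosen unimpaired patient, with ties counting one half. The sketch below computes the AUC that way and labels it with the bands given in the methods. The scores are hypothetical, the function names are illustrative, and applying the bands to the value rounded to two decimals is an assumption.

```python
from typing import List


def auc_from_scores(pos_scores: List[float], neg_scores: List[float]) -> float:
    """AUC as the fraction of (impaired, unimpaired) pairs ranked correctly."""
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half
    return wins / (len(pos_scores) * len(neg_scores))


def auc_band(auc: float) -> str:
    """Label an AUC using the bands stated in the methods section."""
    value = round(auc, 2)
    if value >= 0.91:
        return "outstanding"
    if value >= 0.81:
        return "excellent"
    if value >= 0.71:
        return "good"
    if value >= 0.61:
        return "fair"
    return "poor"


# Hypothetical predicted probabilities of CI, for illustration only.
example_auc = auc_from_scores([0.9, 0.8, 0.7, 0.4], [0.5, 0.3, 0.2])
```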

Identify the most discriminatory ANAM subtests and scores and comparison to composite indices (objective 2)

The most discriminatory subtests and scores from Model 5 (PCT, CV, and MR) were MR Procedural Reaction Time, CV Spatial Processing, MR Tapping Left Hand, CV Running Memory, CV Logical Relations, CV Simple Reaction Time Repeated, MR Code Substitution Learning and MR Spatial Processing. Age was also an important factor. The most discriminatory subtests and scores from Model 6 (PCT, CV, MR, and TP) were the same as above except for TP Code Substitution Learning and CV Two Choice Reaction Time instead of MR Code Substitution Learning and MR Spatial Processing. The AUC from ANAM‐index5 was 75% (95% confidence interval: 0.67–0.83), compared to an AUC of 86% (95% confidence interval: 0.80–0.92) for CART Model 5. Model 5 from CART included seven ANAM subtests, while ANAM‐index5 (9) included four subtests (Table 4). ANAM‐index6 (9) had an AUC of 75% (95% confidence interval: 0.66–0.83) compared to an AUC of 89% (95% confidence interval: 0.84–0.94) for CART Model 6. Eight ANAM subtests were included in both Model 6 from CART and ANAM‐index6 (Table 4).

Table 4.

Comparison of AUCs and ANAM subtests between CART analysis and composite indices for Models 5 and 6.

	Model 5 (PCT, CV, and MR)		Model 6 (PCT, CV, MR, and TP)
	CART analysis	ANAM‐index₅	CART analysis	ANAM‐index₆
AUC	86% (95% confidence interval: 0.80–0.92)	75% (95% confidence interval: 0.67–0.83)	89% (95% confidence interval: 0.84–0.94)	75% (95% confidence interval: 0.66–0.83)

ANAM tests	Procedural reaction time	Code substitution learning*	Procedural reaction time	Code substitution learning*
	Spatial processing*	Code substitution delayed memory	Spatial processing*	Code substitution delayed
	Tapping left hand*	Spatial processing	Tapping left hand*	Spatial processing*
	Running memory	Tapping left hand	Running memory	Tapping left hand*
	Logical relations		Simple reaction time repeated*	Simple reaction time repeated*
	Simple reaction time repeated		Logical relations	Go/no go
	Code substitution learning*		Code substitution learning*	Mean tower puzzle
			Two-choice reaction time*	Two-choice reaction time*

* represents ANAM tests that were found in both the present analysis (CART) and our previous composite index. The formulas are ANAM-index5 = 3.88 − 0.05×PCT/CSD − 8.4×CV/SP + 2.44×MR/CSL + 9.87×MR/TL and ANAM-index6 = 31.85 − 0.06×PCT/CSD − 0.14×PCT/GNG − 9.93×CV/SP − 6.38×CV/TCRT + 9.74×MR/TL − 0.06×TP/CSL − 0.02×TP/SRTR − 0.0008×MS/TPZ (CSD = code substitution delayed, SP = spatial processing, CSL = code substitution learning, TL = tapping left hand, GNG = go/no go, TCRT = two-choice reaction time, SRTR = simple reaction time repeated, TPZ = tower puzzle, MS = MeanScore). AUC values and ANAM subtests from the best CART models and composite indices (9) were compared.
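The two composite indices are linear combinations of ANAM scores, so they can be evaluated with a few lines of code. The sketch below assumes that a term such as PCT/CSD denotes the PCT score of the CSD subtest (not a division) and copies the coefficients from the footnote. The input values are hypothetical, and the text does not give the units or the cut-off used to interpret the index, so none is applied here.

```python
from typing import Dict, Tuple

Key = Tuple[str, str]  # (score type, subtest abbreviation)

INDEX5_INTERCEPT = 3.88
INDEX5_COEFS: Dict[Key, float] = {
    ("PCT", "CSD"): -0.05,
    ("CV", "SP"): -8.4,
    ("MR", "CSL"): 2.44,
    ("MR", "TL"): 9.87,
}

INDEX6_INTERCEPT = 31.85
INDEX6_COEFS: Dict[Key, float] = {
    ("PCT", "CSD"): -0.06,
    ("PCT", "GNG"): -0.14,
    ("CV", "SP"): -9.93,
    ("CV", "TCRT"): -6.38,
    ("MR", "TL"): 9.74,
    ("TP", "CSL"): -0.06,
    ("TP", "SRTR"): -0.02,
    ("MS", "TPZ"): -0.0008,
}


def composite_index(intercept: float, coefs: Dict[Key, float],
                    scores: Dict[Key, float]) -> float:
    """Intercept plus the weighted sum of the scores named in coefs."""
    return intercept + sum(weight * scores[key] for key, weight in coefs.items())


# Hypothetical scores, for illustration only.
example_scores = {("PCT", "CSD"): 100.0, ("CV", "SP"): 0.5,
                  ("MR", "CSL"): 1.0, ("MR", "TL"): 0.2}
index5 = composite_index(INDEX5_INTERCEPT, INDEX5_COEFS, example_scores)
```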

New approach to interpret ANAM results using CART decision trees (objective 3)

Model 5 (PCT, CV, and MR) and Model 6 (PCT, CV, MR, and TP) were the two models that had AUC ≥0.81, sensitivity ≥80%, and specificity values ≥70%. Since Models 5 and 6 both consisted of the same seven subtests and Model 6 included an additional test, we decided to only redesign the decision tree of Model 6 for potential clinical use (Figure 2). Age was also shown to be an important factor for CI in SLE patients and was included in the decision tree. The algorithm for Model 5 predicted 79.8% of the indeterminate group (n=89) to have CI and the algorithm for Model 6 predicted 74.2% of the indeterminate group to have CI.

Figure 2.

Adapted CART analysis decision tree of Model 6 (PCT, CV, MR, and TP) displaying the most discriminatory subtests and scores from the ANAM for detecting CI in patients with SLE. This decision tree based on Model 6 (PCT, CV, MR, and TP) was adapted to reflect a simple flowchart for clinicians to use. This decision tree includes the most discriminatory ANAM subtests (8 subtests) and is organized hierarchically (most discriminatory subtests closer to the top). The 11 terminal nodes at the bottom of the decision tree report the classification of CI or no CI. CI, cognitive impairment; PCT, percentage correct responses; CV, coefficient of variation; MR, mean reaction time; TP, throughput.
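A decision tree of this kind is applied by starting at the top node and following one branch per comparison until a terminal node is reached. The sketch below shows only that traversal mechanism. The tree structure and every threshold in it are placeholders invented for illustration; they are not the nodes or cut-offs of Figure 2 and must not be used for clinical classification.

```python
from typing import Dict, Tuple, Union

# A leaf is a label; an internal node is (feature, threshold, if_below, if_at_or_above).
Node = Union[str, Tuple[str, float, "Node", "Node"]]

# HYPOTHETICAL tree: feature names echo ANAM scores, but the structure and
# thresholds are made up and do not come from the study.
PLACEHOLDER_TREE: Node = (
    "MR_procedural_reaction_time", 0.70,
    ("CV_spatial_processing", 0.30, "no CI", "CI"),
    ("age", 50.0,
     ("TP_code_substitution_learning", 40.0, "CI", "no CI"),
     "CI"),
)


def classify_with_tree(tree: Node, scores: Dict[str, float]) -> str:
    """Walk from the root to a terminal node and return its label."""
    node = tree
    while not isinstance(node, str):
        feature, threshold, below, at_or_above = node
        node = below if scores[feature] < threshold else at_or_above
    return node


# Hypothetical patient scores, for illustration only.
label = classify_with_tree(
    PLACEHOLDER_TREE,
    {"MR_procedural_reaction_time": 0.60, "CV_spatial_processing": 0.20},
)
```

Only the scores along the path actually taken are needed, which is why a tree built on a subset of subtests can shorten testing.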

Discussion

This is the first study using CART analysis to predict CI in SLE using the ANAM benchmarked against the gold standard ACR-NB. Our results extend the literature on the concurrent criterion validity of the ANAM as a screening tool for CI in SLE, and build upon our previous study, which used a composite index derived by logistic regression (9). Our results demonstrate that the ANAM can accurately differentiate between CI and non-CI SLE patients who have been classified using traditional neuropsychological testing. CART identified the most discriminatory subtests and scores of the ANAM for detecting CI in SLE patients, which notably reduces ANAM completion time from 40 to 15–20 min, and parallels results from our previous study (9). Furthermore, the decision tree provides high clinical utility, allowing clinicians to classify patients using a simple, visual algorithm. With no current standard screening tool and high costs associated with comprehensive neuropsychological testing, our findings strengthen the utility of the ANAM as a large-scale screening method. Similar to our previous study, our analyses showed that specific ANAM subtests were associated with CI in patients with SLE. These subtests assess attention and processing speed, visual-spatial perception, fine motor processing, language processing, and learning and memory (Table 1). The only domain not represented by these ANAM tests was executive function, in which 21% of our cohort was found to be impaired based on the adapted ACR-NB. These findings highlight a few considerations. First, impairment on executive function tests from the adapted ACR-NB may be secondary to impairments in related domains such as attention/processing speed. This is suggested as half of the reported discriminatory ANAM tests represent attention and processing speed, which is considered a lower-level function that affects, and can compromise, executive function (25, 26). 
Second, the ANAM has been found to be sensitive to attention, processing speed, and working memory (27–29), and this domain comprises the greatest number of tests. Finally, the structural validity of the full ANAM GNS v4 battery has not yet been studied, leaving an area for future research to explore. Overall, the ANAM is able to measure cognitive efficiency but may be limited in its ability to assess higher-level cognitive functions. However, as a screening tool, the ANAM has been successful in classifying CI and non-CI, and comprehensive neuropsychological testing should be used if further assessment of cognitive function is warranted. In addition to the most discriminatory ANAM subtests, the performance scores (PCT, CV, MR, and TP) were also reported. Many past studies investigating the performance of the ANAM in patients with or without SLE have used only TP as the outcome (30–32). For example, Roebuck-Spencer et al. (30) found a sensitivity of 76%, specificity of 83% and overall correct classification rate of 80%. However, when comparing ANAM subtests using TP to equivalent neuropsychological tests, only moderate associations were found (30–32). Our two best models using a combination of scores had AUCs of 86% and 89%, respectively, compared to the model using only TP (AUC of 73%). These findings further demonstrate the importance of including all scores, echoing our previous study results where the two models with a combination of scores had the highest AUCs (81% and 84%) (9). Brunner et al. also found a combination of scores to be better at assessing CI than TP alone in pediatric patients with SLE, with 100% sensitivity and 86% specificity for detecting moderate/severe CI (33). Furthermore, it is noteworthy that Model 3, which used MR scores only, performed similarly to Models 5 and 6 in terms of AUC (83%), with a higher sensitivity (97%) but lower specificity (56%). 
This highlights that MR scores play an important role in the CI classification process. Model 3 decision tree can be found in Supplementary Figure S1. Age was also found to be an important predictor of CI, appearing in both Model 5 and 6, although it was lower in the decision tree relative to other predictors. We excluded sex and level of formal education from the final analysis because they were not found to be important predictors of CI, added statistical noise, and reduced the ANAM’s performance. This could be explained by the simplicity of most ANAM subtests, as they were designed to be completed by anyone regardless of education level. Furthermore, the ratio of male to female participants in our cohort was about 1:8; thus, the number of males in each node of CART analysis was small and unlikely to affect the results. Previous studies examining the effects of demographic factors on ANAM performance have consistently found sex and education to have little-to-no effect on most ANAM tests (34–36). CART models 5 and 6 had higher AUCs compared to the composite indices from our previous study derived using logistic regression. This may be because CART can handle highly skewed data and missing values and is robust to data irregularities (i.e., outliers and multicollinearity), unlike other multivariate modeling methods (11, 12, 18). However, in comparing results from the current and previous study, we must note that the composite indices developed previously were derived from an older sample (n=211) with a different proportion of patients with CI (45.5%) and without CI (24.6%). The current sample (n=300) had a higher prevalence of CI (52%) and lower prevalence of non-CI patients (18%). Therefore, we cannot make definitive conclusions as to the best method for interpreting ANAM scores (composite indices vs. CART decision tree) from this study, and it is better to view these approaches as complementary, providing converging evidence. 
Future directions include using both methods on the same dataset for direct comparison. The decision tree generated by CART encompasses the most discriminatory ANAM subtests and scores, and is easy to interpret (9). We propose using the decision tree from Model 6 (PCT, CV, MR, and TP) (Figure 2) as it had the highest AUC and provides a more comprehensive evaluation for CI compared to Model 5 (both are identical except for the additional test in Model 6). Upon classification of CI status and clinical judgment, the clinician can then determine whether further neuropsychological testing is warranted for diagnosis. Future directions also include creating a more robust clinical tool, such as an application/calculator that automatically classifies patients based on inputs from ANAM results. This study has several limitations. One is possible selection bias, as the population was drawn from a tertiary care center with possible referral biases. The prevalence of CI in our cohort was relatively high (52%), but within the wide range of CI rates (15–79%) in SLE described in the literature (1, 9, 37, 38). Our final analysis only included patients who were classified as CI or non-CI. We excluded the indeterminate group to reduce heterogeneity in our sample for the purpose of generating initial validity evidence. However, we did apply the algorithms from our best CART models (Models 5 and 6) to the indeterminate group (n=89), where Model 5 predicted 79.8% to have CI and Model 6 predicted 74.2% to have CI. This preliminary result classifies most of the indeterminate group as cognitively impaired based on our screening algorithm. However, further research on larger samples is needed as to how to best handle indeterminate patients in screening tests. As well, the study may be vulnerable to order effects, specifically related to fatigue. The ANAM was completed after the adapted ACR-NB on the same day. 
While the tests from the ANAM and the adapted ACR-NB overlap in procedures, the measures differ in item content, making practice effects less likely; however, familiarity with general procedures may have reduced anxiety on the ANAM. Perhaps of greatest importance is the need for caution when using the ANAM with patients with arthritis, joint stiffness, joint deformities and/or neuropathies (common sequelae of SLE), who may not perform optimally on motor and dexterity tasks due to peripheral rather than central (cognitive or psychomotor) causes. Future research should address the extent of losses in validity of the ANAM with these patients and perhaps propose alternative measures or correction factors. Finally, the current results are generalizable only to the English-speaking population, as the methods have only been evaluated on participants fully fluent in English.

Conclusion

This study extends the validity evidence for the ANAM as a screening tool for CI in patients with SLE. The most discriminatory subtests and scores of the ANAM were identified using CART, reducing the duration of the battery. A decision tree was generated to increase clinical utility and aid interpretation of ANAM results. We recommend use of the ANAM and the current decision tree as a clinical screening tool for CI in adult patients with SLE who are fluent in English and without significant motor impairments.

Supplemental Material: sj-pdf-1-lup-10.1177_09612033211062530, sj-pdf-2-lup-10.1177_09612033211062530, and sj-pdf-3-lup-10.1177_09612033211062530 (Lupus).

26 in total

1. Automated Neuropsychological Assessment Metrics (v4) Traumatic Brain Injury Battery: military normative data.

Authors: Andrea S Vincent; Tresa Roebuck-Spencer; Kirby Gilliland; Robert Schlegel
Journal: Mil Med Date: 2012-03 Impact factor: 1.437

2. Performance of the Automated Neuropsychological Assessment Metrics (ANAM) in detecting cognitive impairment in heart failure patients.

Authors: Susan S Xie; Carly M Goldstein; Emily C Gathright; John Gunstad; Mary A Dolansky; Joseph Redle; Joel W Hughes
Journal: Heart Lung Date: 2015 Sep-Oct Impact factor: 2.210

3. ROCR: visualizing classifier performance in R.

Authors: Tobias Sing; Oliver Sander; Niko Beerenwinkel; Thomas Lengauer
Journal: Bioinformatics Date: 2005-08-11 Impact factor: 6.937

4. Neurocognitive performance and prior injury among U.S. Department of Defense military personnel.

Authors: Susan P Proctor; Kenneth Nieto; Kristin J Heaton; Caitlin C Dillon; Robert E Schlegel; Michael L Russell; Andrea S Vincent
Journal: Mil Med Date: 2015-06 Impact factor: 1.437

5. Validation of the Pediatric Automated Neuropsychological Assessment Metrics in childhood-onset systemic lupus erythematosus.

Authors: Hermine I Brunner; Marisa S Klein-Gitelman; Frank Zelko; Erin C Thomas; Jessica Hummel; Shannen M Nelson; Jennifer Huggins; Megan L Curran; Tresa Roebuck-Spencer; Dean W Beebe; Jun Ying
Journal: Arthritis Care Res (Hoboken) Date: 2013-03 Impact factor: 4.794

Review 6. The American College of Rheumatology nomenclature and case definitions for neuropsychiatric lupus syndromes.

Authors:
Journal: Arthritis Rheum Date: 1999-04

Review 7. Executive functions.

Authors: Adele Diamond
Journal: Annu Rev Psychol Date: 2012-09-27 Impact factor: 24.137

8. What is the prevalence of cognitive impairment in lupus and which instruments are used to measure it? A systematic review and meta-analysis.

Authors: Hanan Al Rayes; Chiara Tani; Andrew Kwan; Sherief Marzouk; Kenneth Colosimo; Jorge Medina-Rosas; Ahmed Mustafa; Jiandong Su; Panos Lambiris; Marta Mosca; Zahi Touma
Journal: Semin Arthritis Rheum Date: 2018-02-21 Impact factor: 5.532

9. The prevalence of neuropsychiatric syndromes in systemic lupus erythematosus.

Authors: H Ainiala; J Loukkola; J Peltola; M Korpela; A Hietaharju
Journal: Neurology Date: 2001-08-14 Impact factor: 9.910

10. pROC: an open-source package for R and S+ to analyze and compare ROC curves.

Authors: Xavier Robin; Natacha Turck; Alexandre Hainard; Natalia Tiberti; Frédérique Lisacek; Jean-Charles Sanchez; Markus Müller
Journal: BMC Bioinformatics Date: 2011-03-17 Impact factor: 3.307