
Abbreviation of the Follow-Up NIH Stroke Scale Using Factor Analysis.

Syed Ali Raza, Michael R Frankel, Srikant Rangaraju.

Abstract

BACKGROUND: The NIH Stroke Scale (NIHSS) is a 15-item measure of stroke-related neurologic deficits that, when measured at 24 h, is highly predictive of long-term functional outcome. We hypothesized that a simplified 24-h scale that incorporates the most predictive components of the NIHSS can retain prognostic accuracy and have improved interrater reliability.
METHODS: In a post hoc analysis of the Interventional Management of Stroke-3 (IMS-3) trial, we performed principal component (PC) analysis to resolve the 24-h NIHSS into PCs. In the PCs that explained the largest proportions of variance, key variables were identified. Using these key variables, the prognostic accuracies (area under the curve [AUC]) for good outcome (3-month modified Rankin Scale [mRS] 0-2) and poor outcome (mRS 5-6) of various abbreviated NIHSS iterations were compared with the total 24-h NIHSS. The results were validated in the NINDS intravenous tissue plasminogen activator (NINDS-TPA) study cohort. Based on previously published data, interrater reliability of the abbreviated 24-h NIHSS (aNIHSS) was compared to the total 24-h NIHSS.
RESULTS: In 545 IMS-3 participants, 2 PCs explained 60.8% of variance in the 24-h NIHSS. The key variables in PC1 included neglect, arm and leg weakness; while PC2 included level-of-consciousness (LOC) questions, LOC commands, and aphasia. A 3-variable aNIHSS (aphasia, neglect, arm weakness) retained excellent prognostic accuracy for good outcome (AUC = 0.90) as compared to the total 24-h NIHSS (AUC = 0.91), and it was more predictive (p < 0.001) than the baseline NIHSS (AUC = 0.73). The prognostic accuracy of the aNIHSS for good outcome was validated in the NINDS-TPA trial cohort (aNIHSS: AUC = 0.89 vs. total 24-h NIHSS: 0.92). An aNIHSS >9 predicted very poor outcomes (mRS 0-2: 0%, mRS 4-6: 98.5%). The estimated interrater reliability of the aNIHSS was higher than that of the total 24-h NIHSS across 6 published datasets (mean weighted kappa 0.80 vs. 0.73, p < 0.001).
CONCLUSIONS: At 24 h following ischemic stroke, aphasia, neglect, and arm weakness are the most prognostically relevant neurologic findings. The aNIHSS appears to have excellent prognostic accuracy with higher reliability and may be clinically useful.


Keywords: Clinical outcome; Clinical stroke rating instruments; Functional recovery; Ischemic stroke; Prognosis


Year: 2017 PMID: 28968607 PMCID: PMC5730111 DOI: 10.1159/000479933

Source DB: PubMed Journal: Cerebrovasc Dis Extra ISSN: 1664-5456

Introduction

The NIH Stroke Scale (NIHSS) is a 15-item scale that is a well-validated and prognostically important measure of stroke-related neurologic deficits in research and clinical care [1, 2]. The NIHSS at 24 h, as compared to the baseline NIHSS, is a much stronger predictor of long-term clinical outcomes in ischemic stroke [3, 4]. Although the NIHSS has moderate interrater reliability and has been validated across various rater types and rating circumstances [5, 6, 7, 8], many items within the NIHSS have low interrater reliability [5, 9, 10]. Therefore, the clinical and research applicability of the 24-h NIHSS as an early surrogate and predictor of long-term stroke outcomes may be limited. Factor analysis has been used to determine the underlying structure of the NIHSS but has not been used to simplify the 24-h NIHSS [2]. We hypothesized that a simplified version of the 24-h NIHSS can retain high prognostic accuracy for functional outcomes in ischemic stroke patients in addition to having improved interrater reliability. We employed principal component analysis (PCA) to identify key prognostically relevant 24-h NIHSS variables in large derivation and validation randomized trial databases to simplify the 24-h NIHSS without loss of predictive accuracy (abbreviated 24-h NIHSS [aNIHSS]). This aNIHSS retained excellent prognostic accuracies for good (modified Rankin Scale [mRS] 0–2) and poor (mRS 5–6) functional outcomes across both study cohorts. Using previously published reliability data for each NIHSS component, we found that the aNIHSS had better interrater reliability as compared to the total 24-h NIHSS.

Subjects and Methods

Study Populations

In post hoc analyses of the Interventional Management of Stroke-3 (IMS-3) (derivation cohort) and the NINDS intravenous tissue plasminogen activator (NINDS-TPA) (validation cohort) randomized trial datasets, participants with prospectively collected baseline NIHSS (at the time of randomization), 24-h NIHSS (that was not affected by sedation), and 3-month mRS scores were included (details regarding these trials have been published previously) [11, 12]. No data imputation was performed.

Statistical Approach

PCA is a factor analysis approach that summarizes a large number of variables in terms of fewer underlying principal components (PCs) [13]. A variable correlating well with the PC is said to be loading onto that PC, and the weights of the loading variables represent strength of correlation with the PC. PCA was performed with the 15-item 24-h NIHSS using Varimax rotation with the Kaiser Normalization method in SPSS (version 23.0). Scree plots showing the proportion of variance in the 24-h NIHSS explained by individual PCs were plotted, and within these PCs, key variables that were most highly loaded were identified. The top 3 highly loaded variables within each PC were considered for developing abbreviated iterations of the 24-h NIHSS. Receiver-operating characteristic curve analyses were performed to determine prognostic accuracies (area under the curve [AUC]) in predicting 3-month good outcome (mRS 0–2), poor outcome (mRS 5–6), and functional independence (Barthel Index ≥95). AUCs of these iterations were compared to the total 24-h NIHSS and the baseline NIHSS and also assessed separately in right and left hemispheric stroke patients. The prognostic accuracy of the abbreviated form was validated in the NINDS-TPA trial [12]. Pair-wise comparisons of AUC were done using the χ2 test. Correlation between observed probabilities for good outcome in IMS-3 and NINDS-TPA was assessed. To determine the interrater reliability of the aNIHSS, we used previously published interrater reliability data for individual components of the 15-item NIHSS derived from 6 studies [5]. The mean weighted kappa statistic was calculated for the total 24-h NIHSS and the aNIHSS using the weighted kappa for each NIHSS item [14]. Paired analysis (paired two-tailed t test) was performed to determine whether the aNIHSS improved interrater reliability as compared to the total 24-h NIHSS. For all statistical comparisons, p < 0.05 was considered statistically significant.
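As an illustration of the receiver-operating characteristic step, the AUC can be read as the probability that a randomly chosen patient with the outcome has a higher score than a randomly chosen patient without it, with ties counted as one half. The following is a minimal sketch of that rank-based definition in Python. The function name `auc` and the score lists are hypothetical and are not trial data, and this is not the SPSS procedure used in the study.

```python
def auc(scores_event, scores_no_event):
    """Rank-based AUC: P(event score > non-event score) + 0.5 * P(tie)."""
    wins = 0.0
    for e in scores_event:
        for n in scores_no_event:
            if e > n:
                wins += 1.0
            elif e == n:
                wins += 0.5
    return wins / (len(scores_event) * len(scores_no_event))


# Hypothetical 24-h scores, invented for illustration only.
poor_outcome = [9, 11, 7, 12, 6]      # patients with the outcome of interest
other_outcome = [1, 3, 0, 6, 4, 2]    # patients without it

auc_poor = auc(poor_outcome, other_outcome)
print(round(auc_poor, 3))
```

Because lower scores indicate a good outcome, the same function could be applied to good outcome by negating the scores or by swapping the two groups.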

Results

Our analysis included 545 IMS-3 participants (56 excluded due to missing data and 42 excluded due to confounding of the 24-h NIHSS by sedation) and 623 NINDS-TPA participants (1 excluded due to missing data). The IMS-3 participants had a median NIHSS of 17 (IQR 13–20) and mean age of 66 (SD 12.4) years. The NINDS-TPA participants had a median NIHSS of 14 (IQR 9–20) and mean age of 67 (SD 11.6) years. In the IMS-3 cohort, a 2-PC solution explained 60.8% of the variance in the 24-h NIHSS (PC1 = 39.6%, PC2 = 21.2%, PC3 = 6.7%, Fig. 1a). Language, level-of-consciousness (LOC) questions, LOC commands, and right arm and leg weakness were highly loaded onto PC1, while neglect and left arm and leg weakness were the 3 variables most highly loaded onto PC2 (Table 1). Among the aNIHSS iterations (Table 2), a 3-variable aNIHSS comprising neglect, language, and arm weakness had excellent prognostic accuracy for good outcome (mRS 0–2) and poor outcome (mRS 5–6) and was comparable to the total 24-h NIHSS in the entire dataset as well as in right and left hemispheric stroke patients, while being superior to the baseline NIHSS (p < 0.001, Fig. 1b). While there were no significant differences between the prognostic accuracies of the 24-h NIHSS and the aNIHSS (mRS 0–2, p = 0.32; mRS 4–6, p = 0.27; mRS 5–6, p = 0.44), the aNIHSS was superior to the baseline NIHSS (Fig. 1b–d; mRS 0–2, p < 0.001; mRS 4–6, p < 0.001; mRS 5–6, p = 0.004; Barthel Index ≥95, 0.90 vs. 0.66, p < 0.001). In the NINDS-TPA trial, the aNIHSS (AUC = 0.89) and the 24-h NIHSS (AUC = 0.92) had excellent prognostic accuracies for good outcome, and the observed rates of good outcome across the aNIHSS scores showed excellent agreement between the two cohorts (R² = 0.98, p < 0.001, Fig. 2).
In a subset of patients in the IMS-3 trial who had no internal carotid artery, M1, or M2 middle cerebral artery occlusion identified on initial CT angiography (n = 63), the aNIHSS retained excellent prognostic accuracy (AUC = 0.89) for good outcome as compared to all IMS-3 patients.

Fig. 1.

Identification of key prognostic components of the 24-h NIHSS. a Scree plot showing the eigenvalues of each principal component and the proportion of variance in the 24-h NIHSS explained by each principal component. b–d Receiver-operating characteristic curves comparing areas under the curve (AUC) of the aNIHSS, the total 24-h NIHSS, and the baseline NIHSS for mRS 0–2, mRS 4–6, and mRS 5–6.

Table 1

Results of principal components (PCs) analysis of the 24-h NIHSS in the IMS-3 trial

NIHSS variable	PC1	PC2
LOC-1A	0.54	0.53
LOC-1B	0.84	0.00
LOC-1C	0.81	0.16
Gaze	0.31	0.67
Visual	0.37	0.65
Facial	0.30	0.62
Ataxia	−0.14	−0.08
Sensory	0.25	0.69
Language	0.91	0.01
Dysarthria	0.59	0.36
Neglect	0.02	0.79
Motor Left Arm (LUE)	−0.19	0.89
Motor Right Arm (RUE)	0.87	0.03
Motor Left Leg (LLE)	−0.07	0.90
Motor Right Leg (RLE)	0.83	0.13

Component loading data in the table represent correlations between NIHSS variables and the PC. Variables from each PC with high loading coefficients are highlighted in bold.

LOC, level of consciousness.
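The selection of the most highly loaded variables in each PC can be sketched directly from the loadings in Table 1. The short Python example below ranks the variables by loading within each component. The helper name `top_loaded` is hypothetical, and ranking by signed loading is an assumption (ranking by absolute value gives the same result for these data).

```python
# Rotated component loadings transcribed from Table 1 as (PC1, PC2).
loadings = {
    "LOC-1A": (0.54, 0.53),
    "LOC-1B": (0.84, 0.00),
    "LOC-1C": (0.81, 0.16),
    "Gaze": (0.31, 0.67),
    "Visual": (0.37, 0.65),
    "Facial": (0.30, 0.62),
    "Ataxia": (-0.14, -0.08),
    "Sensory": (0.25, 0.69),
    "Language": (0.91, 0.01),
    "Dysarthria": (0.59, 0.36),
    "Neglect": (0.02, 0.79),
    "Motor Left Arm (LUE)": (-0.19, 0.89),
    "Motor Right Arm (RUE)": (0.87, 0.03),
    "Motor Left Leg (LLE)": (-0.07, 0.90),
    "Motor Right Leg (RLE)": (0.83, 0.13),
}


def top_loaded(pc_index, k=3):
    """Return the k variable names with the highest loading on one PC (0 = PC1, 1 = PC2)."""
    ranked = sorted(loadings.items(), key=lambda item: item[1][pc_index], reverse=True)
    return [name for name, _ in ranked[:k]]


top_pc1 = top_loaded(0)
top_pc2 = top_loaded(1)
print(top_pc1)
print(top_pc2)
```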

Table 2

Prognostic accuracies of the abbreviated 24-h NIHSS iterations for 3-month good outcome (modified Rankin Scale [mRS] 0–2) in the IMS-3 trial

Iteration	Variables, n	Points	Good outcome (mRS 0–2): all	Good outcome (mRS 0–2): right	Good outcome (mRS 0–2): left	Poor outcome (mRS 5–6): all	Poor outcome (mRS 5–6): right	Poor outcome (mRS 5–6): left
Total 24-h NIHSS	13^a	42	0.92 [0.89–0.94]	0.94 [0.92–0.97]	0.89 [0.86–0.93]	0.86 [0.82–0.90]	0.8 [0.74–0.88]	0.91 [0.87–0.96]
Language (3) + Neglect (2) + Motor Arm (8)	3	13	0.90 [0.87–0.92]	0.92 [0.88–0.95]	0.89 [0.85–0.93]	0.84 [0.79–0.88]	0.76 [0.69–0.84]	0.90 [0.86–0.95]
LOC-1B (2) + Language (3) + Neglect (2) + Motor Arm (8)	4	15	0.89 [0.86–0.92]	0.92 [0.89–0.95]	0.89 [0.85–0.93]	0.84 [0.79–0.88]	0.77 [0.69–0.84]	0.91 [0.86–0.95]
LOC-1B (2) + Language (3) + Neglect (2) + Motor Arm (8) + Total Motor Leg (8)	5	23	0.90 [0.88–0.93]	0.93 [0.89–0.96]	0.90 [0.86–0.93]	0.85 [0.81–0.89]	0.77 [0.70–0.85]	0.91 [0.87–0.96]
LOC-1B (2) + LOC-1C (2) + Language (3) + Neglect (2) + Motor Arm (8)	5	17	0.88 [0.85–0.91]	0.92 [0.89–0.95]	0.89 [0.85–0.93]	0.84 [0.79–0.88]	0.77 [0.70–0.85]	0.91 [0.87–0.96]
LOC-1B (2) + LOC-1C (2) + Language (3) + Neglect (2) + Motor Arm (8) + Motor Leg (8)	6	25	0.90 [0.87–0.93]	0.93 [0.89–0.96]	0.89 [0.85–0.93]	0.85 [0.81–0.89]	0.78 [0.71–0.86]	0.92 [0.87–0.96]

Values are area under the curve [95% CI], unless indicated otherwise. Total possible score for each variable is shown in parentheses. Prognostic powers of each iteration for good outcome in right and left hemispheric stroke patients are shown.

Motor scores for arm weakness and leg weakness were collapsed as follows: total motor arm 0–8, total motor leg 0–8.

LOC, level of consciousness.
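The scoring of the 3-variable aNIHSS described in Table 2 (language 0–3, neglect 0–2, and total motor arm 0–8, for a maximum of 13 points) can be sketched as follows. The function name `anihss_score` and its argument names are hypothetical. The sketch assumes that the total motor arm score is the sum of the left and right arm items, each scored 0–4 as in the standard NIHSS, and it does not handle untestable items.

```python
def anihss_score(language, neglect, left_arm, right_arm):
    """Sum the 3-variable aNIHSS: language (0-3) + neglect (0-2) + total motor arm (0-8)."""
    # Allowed range for each item; arm items are assumed to be 0-4 per side.
    limits = {
        "language": (language, 3),
        "neglect": (neglect, 2),
        "left_arm": (left_arm, 4),
        "right_arm": (right_arm, 4),
    }
    for name, (value, maximum) in limits.items():
        if not isinstance(value, int) or not 0 <= value <= maximum:
            raise ValueError(f"{name} must be an integer from 0 to {maximum}")
    return language + neglect + left_arm + right_arm


# Example: mild aphasia, no neglect, left arm 0, right arm drift scored 2.
example_score = anihss_score(language=1, neglect=0, left_arm=0, right_arm=2)
print(example_score)  # 3
```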

Fig. 2.

Prognostic accuracy of the aNIHSS in the IMS-3 and NINDS-TPA trials. Degree of agreement between observed probability for mRS 0–2 in the IMS-3 and NINDS-TPA cohorts across the aNIHSS scores is shown.

An aNIHSS >9 had a positive predictive value of 90.6% (IMS-3) and 91.2% (NINDS-TPA) and a negative predictive value of 83.4% (IMS-3) and 78.4% (NINDS-TPA) for poor outcome. An aNIHSS <4 had a positive predictive value of 79.2% (IMS-3) and 77.8% (NINDS-TPA) and a negative predictive value of 87.5% (IMS-3) and 85.5% (NINDS-TPA) for good outcome. Observed probabilities for functional outcomes (mRS 0–2, 4–6, and 5–6) in a combined analysis of both cohorts are shown in Figure 3. Based on 6 previously published interrater reliability studies of the NIHSS, we calculated the mean weighted kappa for the aNIHSS as compared to that for the total 24-h NIHSS [5, 8, 9, 15, 16]. Interrater reliability (mean weighted kappa) of the aNIHSS was also higher than that of the total 24-h NIHSS (0.80 vs. 0.73, p < 0.001, Fig. 4, Table 3).
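For readers less familiar with these measures, the positive and negative predictive values of a score cutoff follow from a 2 × 2 table of the dichotomized score against the observed outcome. The Python sketch below shows the arithmetic only. The function name `predictive_values` is hypothetical, and the counts are invented for illustration and are not taken from IMS-3 or NINDS-TPA.

```python
def predictive_values(true_pos, false_pos, false_neg, true_neg):
    """Return (PPV, NPV) from the four cells of a 2 x 2 table."""
    ppv = true_pos / (true_pos + false_pos)   # P(outcome | test positive)
    npv = true_neg / (true_neg + false_neg)   # P(no outcome | test negative)
    return ppv, npv


# Invented counts: "test positive" = score above the cutoff,
# "outcome" = poor outcome at 3 months.
ppv, npv = predictive_values(true_pos=45, false_pos=5, false_neg=20, true_neg=130)
print(f"PPV = {ppv:.1%}, NPV = {npv:.1%}")
```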

Fig. 3.

Probability of functional independence declines with increasing aNIHSS. Observed probabilities (with 95% confidence intervals) of 3-month mRS 0–2, 4–6, and 5–6 outcomes across the aNIHSS score categories (combined analysis of the IMS-3 and NINDS-TPA) are shown. The aNIHSS 12 and 13 groups were collapsed due to smaller numbers.

Fig. 4.

Interrater reliability of the aNIHSS and the 24-h NIHSS. Comparison of mean weighted kappa of the aNIHSS and the total 24-h NIHSS based on weighted kappa statistics for each NIHSS item derived from 6 previously published studies. Paired t test (two-tailed) p value is shown.

Table 3

Comparison of interrater reliability of NIHSS items

	Variables	Video tape assessments		mNIHSS – Prospective	STRokE DPC – Aim 1	TACTIC untrained	Medical record abstraction
		NIHSS Tape 1	NIHSS Tape 2
1a	LOC	0.62	0.42	0.46	1	0.87	0.74
1b	LOC questions	0.68	0.90	0.94	0.93	0.96	0.71
1c	LOC commands	0.00	0.93	0.94	1	1	0.78
2	Gaze	0.02	0.51	0.66	1	0.60	0.74
3	Visual fields	0.94	0.81	0.88	0.93	0.78	0.76
4	Facial palsy	0.38	0.20	0.74	0.22	0.62	0.27
5a	Left arm motor	0.79	0.92	0.97	0.88	0.94	0.91
5b	Right arm motor	0.79	0.94	0.96	0.82	0.97	0.90
6a	Left leg motor	0.80	0.95	0.95	0.74	0.95	0.90
6b	Right leg motor	0.71	0.66	0.98	0.80	0.89	0.89
7	Limb ataxia	0.23	0.56	0.69	0.34	0.65	0.70
8	Sensory	0.94	0.81	0.89	0.80	1	0.63
9	Language	0.39	0.57	0.84	0.73	0.89	0.92
10	Dysarthria	0.72	0.42	0.29	0.61	0.60	0.36
11	Neglect	0.54	0.53	0.89	0.80	0.72	0.59

Total NIHSS	Mean kappa	0.57	0.68	0.81	0.77	0.83	0.72

aNIHSS	Mean kappa	0.63	0.74	0.92	0.80	0.89	0.83

Table adapted from Meyer and Lyden [5]. It comprises 6 prior studies assessing reliability of both NIHSS and a 6-item modified NIHSS [5]. The names of the different trials, different elements or items, kappa statistics, threshold κ-statistic for excellent agreement, and overall agreements are shown. We extrapolated these results to estimate reliability for aNIHSS (aNIHSS items highlighted in italics).

LOC, level of consciousness.
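The mean kappa rows of Table 3 can be approximately recomputed from the item-level values. The Python sketch below averages the weighted kappas over all 15 items and over the aNIHSS items for each of the 6 studies, then averages across studies. It assumes that the aNIHSS items are left arm motor, right arm motor, language, and neglect (the italic highlighting is not preserved here) and that a simple unweighted mean was used. Under these assumptions, individual per-study values may differ from the table entries by about 0.01.

```python
from statistics import mean

# Item-level weighted kappas transcribed from Table 3, one value per study in
# the order: Tape 1, Tape 2, mNIHSS, STRokE DPC, TACTIC, record abstraction.
kappa = {
    "1a LOC": [0.62, 0.42, 0.46, 1, 0.87, 0.74],
    "1b LOC questions": [0.68, 0.90, 0.94, 0.93, 0.96, 0.71],
    "1c LOC commands": [0.00, 0.93, 0.94, 1, 1, 0.78],
    "2 Gaze": [0.02, 0.51, 0.66, 1, 0.60, 0.74],
    "3 Visual fields": [0.94, 0.81, 0.88, 0.93, 0.78, 0.76],
    "4 Facial palsy": [0.38, 0.20, 0.74, 0.22, 0.62, 0.27],
    "5a Left arm motor": [0.79, 0.92, 0.97, 0.88, 0.94, 0.91],
    "5b Right arm motor": [0.79, 0.94, 0.96, 0.82, 0.97, 0.90],
    "6a Left leg motor": [0.80, 0.95, 0.95, 0.74, 0.95, 0.90],
    "6b Right leg motor": [0.71, 0.66, 0.98, 0.80, 0.89, 0.89],
    "7 Limb ataxia": [0.23, 0.56, 0.69, 0.34, 0.65, 0.70],
    "8 Sensory": [0.94, 0.81, 0.89, 0.80, 1, 0.63],
    "9 Language": [0.39, 0.57, 0.84, 0.73, 0.89, 0.92],
    "10 Dysarthria": [0.72, 0.42, 0.29, 0.61, 0.60, 0.36],
    "11 Neglect": [0.54, 0.53, 0.89, 0.80, 0.72, 0.59],
}

# Assumed aNIHSS items (not marked in the extracted table).
anihss_items = ["5a Left arm motor", "5b Right arm motor", "9 Language", "11 Neglect"]

n_studies = 6
total_by_study = [mean(values[i] for values in kappa.values()) for i in range(n_studies)]
anihss_by_study = [mean(kappa[item][i] for item in anihss_items) for i in range(n_studies)]

total_overall = mean(total_by_study)
anihss_overall = mean(anihss_by_study)
print(round(total_overall, 2), round(anihss_overall, 2))
```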

Discussion

The 24-h NIHSS is a strong predictor of long-term outcomes following stroke, and its routine assessment as a surrogate of long-term outcome may be clinically meaningful [3, 4]. The 15-item NIHSS captures several aspects of the neurologic exam, many of which are interrelated and redundant, relatively less prognostically important, and found to have low interrater reliability [5]. NIHSS items with consistently poor interrater reliabilities include facial weakness, ataxia, LOC, dysarthria, and gaze based on data from over 15,000 raters who undertook online NIHSS certification and in different linguistic versions of NIHSS [1, 5, 8, 9, 10, 15, 16]. Eliminating less reliable NIHSS components can reduce variability and error in addition to improving ease and efficiency of NIHSS assessment. Although abbreviations of the NIHSS measured at initial presentation (the baseline NIHSS) have been developed to predict the presence of large-vessel occlusion and long-term outcomes [17, 18, 19], the key prognostic components of the 24-h NIHSS have not been elucidated. This is particularly important because the 24-h NIHSS is a much more robust predictor of long-term functional outcomes in ischemic stroke, and it is possible that the relative prognostic importance of NIHSS components at 24 h after stroke is very different than at initial presentation [3, 4, 20]. In this analysis, we show that the most prognostically important neurologic findings at 24 h following ischemic stroke include LOC (ability to answer 2 questions), aphasia, neglect, and arm weakness. We propose an abbreviated 4-item 24-h NIHSS (the aNIHSS) that retains excellent prognostic value compared to the total 24-h NIHSS, is superior to the baseline NIHSS, and is not affected by stroke laterality or the presence or absence of a large-vessel occlusion. The aNIHSS also had excellent prognostic accuracy for predicting good (mRS 0–2) as well as poor functional outcome (mRS 5–6). 
An aNIHSS >10 portends very poor outcomes with a 0% probability of a good outcome and a >90% probability of severe disability (mRS 4–6), often not associated with good quality of life. This finding mirrors previous results that a high 24-h NIHSS portends very poor long-term prognosis in ischemic stroke [20]. An aNIHSS <5 is associated with a high probability (>70%) of a good outcome. The segregation of aphasia with right-side weakness in one PC and of neglect with left-side weakness in the second PC agrees with the previously published construct validity of the NIHSS in separating dominant from non-dominant hemispheric symptoms [2, 21]. The high loading weights of cortical symptoms (aphasia and neglect) and of upper extremity weakness at 24 h also corroborate the existing literature regarding their prognostic importance in functional recovery of stroke patients [22, 23, 24]. The aNIHSS also seems to have better interrater reliability as compared to the total 24-h NIHSS, which we attribute to the exclusion of NIHSS items with low interrater reliability. This may translate to improvements in efficiency of clinical care and in provider communication, a possibility that needs prospective evaluation. Our observations in a fairly recent study cohort with moderate-to-severe stroke severity (IMS-3) were validated in a cohort from the NINDS-TPA trial (1995) with lower stroke severity, supporting the applicability of our results to the vast majority of ischemic stroke patients. Prospective and blinded ascertainments of NIHSS and mRS within randomized trials are additional strengths of our study. Limitations of this study include the biases of secondary post hoc analyses of clinical trial data and the paucity of posterior circulation strokes in these datasets, which limits the applicability of our results to anterior circulation strokes, especially those with large-vessel occlusions.
In addition, our analysis of interrater reliability was not prospective and was not limited to NIHSS measured only at 24 h following stroke. Therefore, future prospective validations of our findings are warranted. Lastly, it must be emphasized that neurologic scales such as the NIHSS or aNIHSS cannot replace the detailed neurologic examination, but instead provide objective and reliable measures of stroke-related neurologic deficits and can serve as adjuncts during prognostication and clinical care of stroke patients. In summary, we show that aphasia, neglect, and upper extremity weakness are the most prognostically relevant neurologic findings at 24 h following ischemic stroke. The aNIHSS is an abbreviated version of the complete NIHSS which, when measured at 24 h, has excellent prognostic accuracy for functional outcomes in addition to higher reliability. The clinical applicability of the aNIHSS needs to be prospectively evaluated.

Disclosure Statement

The authors declare no conflicts of interest.

24 in total

1. Shortening the NIH Stroke scale for use in the prehospital setting.

Authors: David L Tirschwell; W T Longstreth; Kyra J Becker; Richard E Gammans; LuAnn A Sabounjian; Scott Hamilton; Lewis B Morgenstern
Journal: Stroke Date: 2002-12 Impact factor: 7.914

2. Relationship Between Lesion Topology and Clinical Outcome in Anterior Circulation Large Vessel Occlusions.

Authors: Srikant Rangaraju; Christopher Streib; Amin Aghaebrahim; Ashutosh Jadhav; Michael Frankel; Tudor G Jovin
Journal: Stroke Date: 2015-06-09 Impact factor: 7.914

3. Underlying structure of the National Institutes of Health Stroke Scale: results of a factor analysis. NINDS tPA Stroke Trial Investigators.

Authors: P Lyden; M Lu; C Jackson; J Marler; R Kothari; T Brott; J Zivin
Journal: Stroke Date: 1999-11 Impact factor: 7.914

4. NIH Stroke Scale reliability in ratings from a large sample of clinicians.

Authors: S Andrew Josephson; Nancy K Hills; S Claiborne Johnston
Journal: Cerebrovasc Dis Date: 2006-08-04 Impact factor: 2.762

5. Reliability and validity of estimating the NIH stroke scale score from medical records.

Authors: S E Kasner; J A Chalela; J M Luciano; B L Cucchiara; E C Raps; M L McGarvey; M B Conroy; A R Localio
Journal: Stroke Date: 1999-08 Impact factor: 7.914

6. Predicting prognosis after stroke: a placebo group analysis from the National Institute of Neurological Disorders and Stroke rt-PA Stroke Trial.

Authors: M R Frankel; L B Morgenstern; T Kwiatkowski; M Lu; B C Tilley; J P Broderick; R Libman; S R Levine; T Brott
Journal: Neurology Date: 2000-10-10 Impact factor: 9.910

7. Design and validation of a prehospital stroke scale to predict large arterial occlusion: the rapid arterial occlusion evaluation scale.

Authors: Natalia Pérez de la Ossa; David Carrera; Montse Gorchs; Marisol Querol; Mònica Millán; Meritxell Gomis; Laura Dorado; Elena López-Cancio; María Hernández-Pérez; Vicente Chicharro; Xavier Escalada; Xavier Jiménez; Antoni Dávalos
Journal: Stroke Date: 2013-11-26 Impact factor: 7.914

8. Tissue plasminogen activator for acute ischemic stroke.

Authors:
Journal: N Engl J Med Date: 1995-12-14 Impact factor: 91.245

9. Improved reliability of the NIH Stroke Scale using video training. NINDS TPA Stroke Study Group.

Authors: P Lyden; T Brott; B Tilley; K M Welch; E J Mascha; S Levine; E C Haley; J Grotta; J Marler
Journal: Stroke Date: 1994-11 Impact factor: 7.914

10. Endovascular therapy after intravenous t-PA versus t-PA alone for stroke.

Authors: Joseph P Broderick; Yuko Y Palesch; Andrew M Demchuk; Sharon D Yeatts; Pooja Khatri; Michael D Hill; Edward C Jauch; Tudor G Jovin; Bernard Yan; Frank L Silver; Rüdiger von Kummer; Carlos A Molina; Bart M Demaerschalk; Ronald Budzik; Wayne M Clark; Osama O Zaidat; Tim W Malisch; Mayank Goyal; Wouter J Schonewille; Mikael Mazighi; Stefan T Engelter; Craig Anderson; Judith Spilker; Janice Carrozzella; Karla J Ryckborst; L Scott Janis; Renée H Martin; Lydia D Foster; Thomas A Tomsick
Journal: N Engl J Med Date: 2013-02-07 Impact factor: 91.245