Background Because of the failure of numerous clinical trials, various recommendations have been made to improve the usefulness of preclinical studies. Specifically, the STAIR (Stroke Therapy Academic Industry Roundtable) recommendations highlighted functional outcome as a critical measure. Recent reviews of experimental subarachnoid hemorrhage (SAH) studies have brought to light the numerous neurobehavioral scoring systems used in preclinical SAH research. We designed the current study to gain insight into the utility of these scoring systems and to identify a scoring system that best captures the deficits caused by SAH in mice. Methods and Results Adult male C57BL/6J mice were used. One cohort of mice was randomly allocated to either sham or SAH and underwent functional testing on days 1 to 3 post-SAH using the modified Bederson Score, Katz Score, Garcia Neuroscore, and Parra Neuroscore, as well as 21 individual subtests. A new composite neuroscore was developed using the 8 most diagnostically accurate subtests. To validate the developed composite neuroscore, another cohort of mice was randomly assigned to either the sham or SAH group, and neurobehavior was evaluated on days 1 to 3, 5, and 7 after injury. Receiver operating characteristic curves were used to analyze the diagnostic accuracy of each scoring system and of each subtest. Of the 4 published scoring systems, the Parra Neuroscore was significantly more diagnostically accurate for SAH injury in mice than the modified Bederson and Katz Scores, but not significantly more accurate than the Garcia Neuroscore. However, the newly developed composite neuroscore was statistically more diagnostically accurate than even the Parra Neuroscore. Conclusions The findings of this study support use of the newly developed composite neuroscore for experimental SAH studies in mice.
We examine the utility of, and compare, 4 widely used scoring systems for identifying functional deficits in mice after subarachnoid hemorrhage. We found that each of the existing scoring systems has subtests that are not diagnostically accurate at identifying functional deficits after subarachnoid hemorrhage. We therefore developed a new composite score using the 8 subtests that are most diagnostically accurate for subarachnoid hemorrhage deficits.
What Are the Clinical Implications?
Because functional outcome is a major concern for patients, it is imperative to use diagnostically accurate behavioral tests in preclinical studies. We developed a new composite neuroscore that is significantly more diagnostically accurate for detecting deficits after subarachnoid hemorrhage than 4 widely used scoring systems.
Introduction
Subarachnoid hemorrhage (SAH) is a devastating stroke subtype that is associated with severe morbidity and mortality rates of ≈25%.1 Because numerous clinical trials have failed to improve patient outcome despite reducing the pathological consequences of SAH, improved experimental studies (the basis of most trials) are needed. Several roundtables have been held to develop guidelines aimed at improving preclinical and translational studies. One recommendation from these meetings is the proper selection and use of neurobehavioral testing.2, 3, 4, 5 Although neurobehavioral testing is more standardized for experimental ischemic stroke studies (ie, it is known which tests are sensitive and specific for ischemic injury), preclinical SAH studies typically do not use functional outcome tests that can be compared between studies. The lack of consistent and robust neurobehavioral tests for experimental SAH studies has been highlighted in 2 recent reviews.6, 7 However, despite these reviews and their call to determine the most useful and appropriate functional tests for detecting injury after SAH, this work has not been performed. To this end, the current study was designed to specifically investigate the utility of several functional scoring systems for evaluating deficits in mice subjected to SAH via endovascular perforation. Herein, the behavioral performance of mice after SAH is assessed using 4 scoring systems that have been applied to SAH studies without proper validation. Because the Parra Neuroscore was developed for SAH, we hypothesized that it would be the most appropriate scoring system for detecting functional deficits in mice after SAH. Our second hypothesis was that a new composite neuroscore built from the most diagnostically accurate individual sensorimotor subtests would have greater diagnostic accuracy for injury after SAH in mice than any of the other scoring systems.
Materials and Methods
The experiments were approved by the Animal Welfare Committee at the University of Texas Health Science Center at Houston, were conducted in compliance with the NIH Guidelines for the Use of Animals in Neuroscience Research, and are reported in compliance with the Animal Research: Reporting in Vivo Experiments guidelines. Data are available on reasonable request.
Study Design
Fifty‐five adult male C57BL/6J mice (28–34 g, Jackson Labs) were used in all experiments. Animals were housed in a humidity‐ and temperature‐controlled room with a 12‐hour light‐dark cycle and were given ad libitum access to food and water. Animals were randomized (using an electronically generated allocation) into either the sham or the SAH group, with group sizes determined by sample size calculations. The same surgeon (D.W.M.) performed SAH and sham surgeries, and all animals were treated with the same amount of buprenorphine and saline on the day of surgery. All investigators responsible for functional assessment, measurement of outcomes, and data analysis were blinded to the experimental groups.
The 3‐day study
For the 3‐day study, animals were randomly assigned into either the sham (n=16) or the SAH (n=21) group before any surgical procedures were performed. Sample size estimation was conducted using data from previous publications, assuming 2 groups, α=0.05, and power=0.80 in all cases. The estimated sample sizes were as follows: n=15 to 20 for the modified Bederson Score8, 9; for the Katz Score, n=9 to 20 for day 1 (minimum difference in means=25 and SD=18 [reference 10]; also see Bermueller et al8 and Thal et al8, 9) and n=20 to 22 for day 2 (minimum difference in means=13 and SD=15 [reference 10]; also see Thal et al9); n=6 to 24 for the Garcia Neuroscore (minimum difference in means=6 and SD=3.16 [reference 11]; also see data of others12, 13, 14, 15, 16); and n=3 for the Parra Neuroscore (minimum difference in means=16 and SD=5 [approximated]17). Overall, sample sizes were estimated to be n=3 to 24, with most studies indicating that a sample size of 9 to 20 is needed. Therefore, in this study, we chose to use a sample size of n=16 per group. Because the mortality of mice after endovascular perforation is ≈20% to 25%, we randomly allocated 21 mice into the SAH group (assuming 5 fatalities) and 16 mice into the sham group.
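The per-group estimates above can be roughly reproduced with the normal-approximation formula for a two-sided comparison of 2 means, n = 2(z1−α/2 + z1−β)²(SD/Δ)². The sketch below is illustrative only: it assumes equal group sizes and substitutes a normal approximation for whatever calculator the cited studies used, and `n_per_group` is a hypothetical helper name.

```python
from math import ceil
from statistics import NormalDist


def n_per_group(mean_diff: float, sd: float, alpha: float = 0.05, power: float = 0.80) -> int:
    """Approximate animals per group for a two-sided, two-sample comparison of means.

    Normal approximation (an assumption; t-based calculators return slightly
    larger n when samples are small):
        n = 2 * (z_{1 - alpha/2} + z_{power})**2 * (sd / mean_diff)**2
    """
    if mean_diff <= 0 or sd <= 0:
        raise ValueError("mean_diff and sd must be positive")
    std_normal = NormalDist()  # mean 0, SD 1
    z_alpha = std_normal.inv_cdf(1 - alpha / 2)  # about 1.96 for alpha = 0.05
    z_power = std_normal.inv_cdf(power)  # about 0.84 for power = 0.80
    n = 2 * (z_alpha + z_power) ** 2 * (sd / mean_diff) ** 2
    return ceil(n)  # round up to whole animals


# Minimum differences in means and SDs quoted in the paragraph above.
inputs = [("Katz, day 1", 25, 18), ("Katz, day 2", 13, 15),
          ("Garcia", 6, 3.16), ("Parra", 16, 5)]
for label, diff, sd in inputs:
    print(f"{label}: {n_per_group(diff, sd)} per group")
```

This prints 9 and 21 per group for the two Katz inputs, within the ranges quoted above, and 5 and 2 for the Garcia and Parra inputs, one animal below the reported 6 and 3, which is what one would expect when a normal approximation replaces a t-based calculation.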
The 7‐day study
After completing the 3‐day study, analyzing the neurobehavioral performances, and developing a new composite neuroscore for SAH, we performed a 7‐day study using a new cohort of mice to validate the developed composite neuroscore. For the 7‐day study, animals were randomly assigned into either the sham (n=8) or the SAH (n=10) group before any surgical procedures were performed. Sample size was calculated using the mean difference (10.11) and SD of the population (0.4632) from the data of the developed neuroscore at 24 hours after SAH, with α=0.05 and power=0.80, using the sample size calculation for a longitudinal study,18 with an additional 15% added for nonparametric tests.19 This calculation indicated that 8 mice per group were needed to test for statistical significance. Allowing for 1 to 2 deaths, the SAH group needed 9 to 10 mice.
SAH Model
SAH was induced in mice, as previously described.20 Briefly, mice were anesthetized with isoflurane (induction, 5% isoflurane; maintained, 1.5%–2.5% isoflurane) delivered in oxygen (1 L/min). Buprenorphine was injected SC (0.05–0.1 mg/kg). The surgical site was shaved, and bupivacaine (2 mg/kg) was injected SC near the midline in the surgical site. The animal was placed supine, and the surgical site was sterilized with alternating wipes of betadine and 70% ethanol (3 times). A vertical incision was made along the midline of the neck, and the external carotid and common carotid arteries were isolated using blunt dissection. The external carotid artery was ligated, leaving a stump. Then, the internal carotid artery was isolated. Vessel clips were placed on the common carotid and internal carotid arteries to momentarily stop blood flow. A cut was made in the external carotid artery stump, and a 5‐0 monofilament nylon suture was inserted through the opening of the external carotid artery stump. The vessel clip was removed from the internal carotid artery, and the suture was advanced through the internal carotid artery until resistance was felt (≈8 mm). The suture was advanced 1 mm further to perforate the vessel, inducing SAH. The common carotid artery clip was removed, and the suture was immediately withdrawn. Then, the external carotid artery stump was ligated closed and the neck incision was sutured. Antibiotic was applied to the skin, and the animal was allowed to recover. Afterwards, mice were placed back into their home cages and housed in groups of 1 to 5 mice per cage. Analgesic and saline were given twice a day for 3 days, as necessary, for mice subjected to SAH. Animals allocated into the sham group underwent all the same surgical procedures and were given buprenorphine, bupivacaine, and saline on the day of surgery. No sham animals received buprenorphine or saline on the days after surgery.
Neurobehavioral Performance
All animals were tested for sensorimotor function by 3 independent, blinded scorers (K.M., T.P.K., and D.W.M.). All 3 assessors scored each mouse together (with no interassessor interactions), so that each mouse was tested only once, to minimize fatigue and potential learning or boredom. All tests except beam walking were performed in a single session, in the same order and at the same time of day for each mouse. After all the neuroscore subtests, including Katz and modified Bederson, the mice were given a 3‐ to 5‐minute break before performing the beam walking tasks. Beam walking was performed once, and all beam walking scores (ie, for Katz and the neuroscore) were recorded at the same time. All mice had neurobehavioral performance assessed by the tests below 1 day before SAH surgery to confirm that all animals were statistically indistinguishable (data not shown). For the 3‐day study, mice underwent daily testing. For the 7‐day study, mice had their neurobehavior tested on days 1 to 3, 5, and 7 after SAH. All animals were euthanized after completing the final day of neurobehavior. The following tests were performed.
Coordination and Balance Tests
Spontaneous activity: the mouse is observed for 3 minutes on its ability to reach and explore (rear) all 4 walls of its environment (Table 1) 21. Forepaw outstretching: the mouse is held by its tail and allowed to walk using its forepaws only (ie, keeping the hind limbs suspended in the air). Climbing: the mouse is placed on an inclined plane with equally spaced rungs to observe its ability to climb for 1 minute. Balance: while the mouse is freely moving, the animal is observed, and any balance deficits (eg, swaying and stumbling) are recorded. Lateral turning: the mouse is suspended in air (head down) and should be able to turn toward both sides to reach for its tail. Walking: the mouse is allowed to walk freely and is observed for the ability to turn to the left and right. Beam walking: the mouse is placed on each platform (at both ends of a 1.5‐cm round rod) for 30 seconds. Then, the mouse is placed perpendicular to the center of the rod and given 1 minute to traverse the beam.
Table 1
Coordination and Balance Tests

| Subtest | Score 0 | Score 1 | Score 2 | Score 3 |
|---|---|---|---|---|
| Spontaneous activity (3 min) | No movement | Minimal movement | Touches 1–2 walls | Touches 3–4 walls |
| Forepaw outstretching | No movement | Moves in circles | Moves to one side | Straight/curved path |
| Climbing (1 min) | Weak grip and fall down | Climbs but does not reach top and weak grip | Climbs to top and weak grip or does not climb to top and good grip | Climbs to top and strong grip |
| Balance | Tumbles | Stands, sways | Sways as walking | Changes position easily |
| Lateral turning | No turning | Unequal turning | Bilateral turning, equal but <45° | Bilateral turning, equal and >45° |
| Walking | No turning | Unequal turning | Bilateral turning <45° | Bilateral turning >45° |
| Beam walking (1 min) | Hug or fall <10 s | Stand in one spot or move but fall <25 s | Stays on beam and traverses the beam | Walks onto the platform |

Scoring criteria are slightly modified from McBride et al.21 Score is from 0 (maximum deficits) to 3 (no deficits).
Posture and Strength Tests
Ptosis: both eyes of the mouse are observed for drooping (Table 2) 21. Dyspnea: the mouse is assessed for difficulty breathing. Facial weakness: the mouse is observed for any facial drooping, asymmetry in facial expressions/jaw, and weakened response to cheek touch.
Table 2
Posture and Strength Tests

| Subtest | Score 0 | Score 1 | Score 2 |
|---|---|---|---|
| Ptosis | Severe | Slight | None |
| Dyspnea | Gasping | Slight | None |
| Facial weakness | Severe | Slight | None |

Scoring criteria are slightly modified from McBride et al.21 Score is from 0 (maximum deficits) to 2 (no deficits).
Reflex Tests
Side stroking: a cotton swab is used to stroke each side of the mouse's body to observe a response (head turning, whisker movement, or flight) (Table 3) 21. Vibrissae touch: a cotton swab is moved from the rear to stimulate the vibrissae and observe a response. Visual: a cotton swab is advanced toward each eye and the mouse is observed for any elicited response. Olfactory: a cotton swab is dipped in honey and advanced toward the mouse's nose to look for any sniffing or exploration. Tactile: the wood end of a cotton swab is used to poke the tops of each hind paw. Postural: the mouse is placed in a cage, and the cage is moved rapidly downwards. Sound: the observer claps, and the mouse's response is observed. Righting: the mouse is placed on a grid and allowed to grip. The grid is moved so that the mouse is facing downwards, and the time it takes for the mouse to right itself (ie, turn upwards) is recorded (repeated 4 times).
Table 3
Reflex Tests

| Subtest | Score 0 | Score 1 | Score 2 | Score 3 |
|---|---|---|---|---|
| Side stroking | No response | Unilateral response | Bilateral weak response or strong ipsilateral response | Strong bilateral response |
| Vibrissae touch (stroke whiskers) | No response | Unilateral response | Weak bilateral response | Strong bilateral response |
| Visual (tip toward each eye) | No response | Unilateral response | Weak bilateral response | Strong/rapid bilateral response |
| Olfactory | No sniffing | Brief sniff | Sniff >2 s | ··· |
| Tactile (poke top of paws) | No response | Delayed withdrawal | Immediate withdrawal | ··· |
| Postural reflex (sudden drop) | Absent | ··· | ··· | Present |
| Sound reflex | Absent | Delayed response | Rapid response | ··· |
| Righting reflex (4 trials, 15 s) | Absent | Rights <15 s but >10 s | Rights <10 s but >5 s | Rights <5 s |

Scoring criteria are slightly modified from McBride et al.21 Score is from 0 (maximum deficits) to 3 (no deficits).
Limb Use Tests
Limb extension: the mouse is held in the air by its tail and observed for limb extension (Table 4) 21. Forelimb use: while the mouse is freely moving, the use of the forelimbs is observed for any deficit (inability to use or stiffness). Hind limb use: while the mouse is freely moving, the use of the hind limbs is observed for any deficit.
Table 4
Limb Use Tests

| Subtest | Score 0 | Score 1 | Score 2 | Score 3 |
|---|---|---|---|---|
| Limb extension | No movement | Minimal movement | Abnormal forelimb walk | Contralateral forelimbs and hind limbs completely extended |
| Forelimb use | Severe bilateral deficits | Severe unilateral deficits | Slight deficits (unilateral or bilateral) | None |
| Hind limb use | Severe bilateral deficits | Severe unilateral deficits | Slight deficits (unilateral or bilateral) | None |

Scoring criteria are slightly modified from McBride et al.21 Score is from 0 (maximum deficits) to 3 (no deficits).
In addition to the above individual tests, which make up the various composite neuroscores, we also assessed the animals’ performances using the modified Bederson Score, Katz Score, Garcia Neuroscore, and Parra Neuroscore. The modified Bederson Score tests forelimb extension and mobility (Table 5).8, 9 The Katz Score combines 13 subtests assessing general deficits, reflexes, and sensorimotor and coordination deficits (Table 6).9, 22 The total Katz Score ranges from 0 (no deficits) to 100 (maximum deficits). The Garcia Neuroscore is composed of 6 subtests (described above): spontaneous activity, limb extension, forepaw outstretching, climbing, side stroking, and vibrissae touch (Table 7).23 The total Garcia Neuroscore ranges from 0 (maximum deficits) to 18 (no deficits). The Parra Neuroscore is made up of 9 subtests: spontaneous activity, limb extension, climbing, beam walking, visual, olfactory, tactile, side stroking, and vibrissae touch (Table 8).17 The total Parra Neuroscore ranges from 5 (maximum deficits) to 27 (no deficits).
Table 5
Modified Bederson Score

| Score | Description |
|---|---|
| 5 | Mouse held by tail had normal forelimb extension |
| 4 | Mouse with consistent flexion of forelimb on either side and adduction and internal rotation of shoulder |
| 3 | Mice were allowed to grip paper with forepaws, and gently pushed forward with pressure against the forepaw shoulders; reduced resistance on paretic side was graded 3 |
| 2 | Forelimb walking: animals circled |
| 1 | Spontaneous circling when allowed to walk normally on floor |
| 0 | No spontaneous motion |

Score is from 0 (maximum deficits) to 5 (no deficits).8, 9
Table 7
Garcia Neuroscore

| Subtest | Score 0 | Score 1 | Score 2 | Score 3 |
|---|---|---|---|---|
| Side stroking | | | Bilateral weak response or strong ipsilateral response | Strong bilateral response |
| Vibrissae touch (stroke whiskers) | No response | Unilateral response | Weak bilateral response | Strong bilateral response |
| Limb extension | Both contralateral; limb completely flexed | One contralateral; limb extended and other flexed | Mid flexion of either contralateral limb | Contralateral forelimbs and hind limbs completely extended |
| Forepaw outstretching | No movement | Moves in circles | Moves to one side | Straight/curved path |
| Climbing (1 min) | Weak grip and fall down | Climbs but does not reach top and weak grip | Climbs to top and weak grip or does not climb to top and good grip | Climbs to top and strong grip |

The scoring criteria are slightly modified from the original scoring criteria proposed by Garcia et al.23 Score range is from 0 (maximum deficits) to 18 (no deficits).
Table 8
Parra Neuroscore

| Subtest | Score 0 | Score 1 | Score 2 | Score 3 |
|---|---|---|---|---|
| Spontaneous activity (3 min) | No movement | Minimal movement | Touches 1–2 walls | Touches 3–4 walls |
| Side stroking | ··· | No response | Unilateral response | Bilateral response |
| Vibrissae touch (stroke whiskers) | ··· | No response | Unilateral response | Bilateral response |
| Visual (tip toward each eye) | ··· | No response | Unilateral response | Bilateral response |
| Olfactory | ··· | No sniffing | Brief sniff | Sniff >2 s |
| Tactile (poke top of paws) | ··· | No withdrawal | Delayed withdrawal | Immediate withdrawal |
| Limb extension | No movement | Minimal movement | Abnormal forelimb walk | Contralateral forelimbs and hind limbs completely extended |
| Climbing (1 min) | Falls down | Holds <4 s | Holds on but no movement | Climbs |
| Beam walking | Falls <2 s | Falls >2 s | Holds but no movement | Walks |

The scoring criteria are slightly modified from the original scoring criteria proposed by Han et al.17 Score range is from 5 (maximum deficits) to 27 (no deficits). The side stroking and beam walking subtests in the Parra Neuroscore were originally called proprioception and balance, respectively.
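Because each of these scoring systems is the sum of its subtest scores, the reported total ranges follow directly from the per-subtest ranges. The sketch below checks that arithmetic. The Parra ranges are transcribed from Table 8 (5 subtests scored 1 to 3 and 4 scored 0 to 3); for the Garcia Neuroscore it assumes each of the 6 subtests is scored 0 to 3, consistent with its 0 to 18 total. The dictionary and function names are hypothetical.

```python
# Per-subtest (min, max) scores transcribed from Table 8 (Parra Neuroscore).
PARRA_RANGES = {
    "spontaneous activity": (0, 3),
    "side stroking": (1, 3),
    "vibrissae touch": (1, 3),
    "visual": (1, 3),
    "olfactory": (1, 3),
    "tactile": (1, 3),
    "limb extension": (0, 3),
    "climbing": (0, 3),
    "beam walking": (0, 3),
}

# Garcia Neuroscore: 6 subtests, each assumed here to be scored 0-3.
GARCIA_RANGES = {name: (0, 3) for name in (
    "spontaneous activity", "limb extension", "forepaw outstretching",
    "climbing", "side stroking", "vibrissae touch",
)}


def total_range(ranges: dict[str, tuple[int, int]]) -> tuple[int, int]:
    """Return the (worst, best) possible total of a summed composite score."""
    worst = sum(low for low, _ in ranges.values())
    best = sum(high for _, high in ranges.values())
    return worst, best


print("Parra:", total_range(PARRA_RANGES))    # (5, 27)
print("Garcia:", total_range(GARCIA_RANGES))  # (0, 18)
```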
All tests in this study were 2 sided. All data are presented as individual data points with the median shown. Normality and homoscedasticity were tested for, but neither assumption was met for any of the data sets. Single time point data (ie, behavioral data on day 1 after SAH) were analyzed with the Mann‐Whitney U test. The longitudinal data (ie, 3‐ and 7‐day studies) were analyzed with a Scheirer‐Ray‐Hare test (2‐way ANOVA on ranks) and a mixed‐effect model. For the factor(s) found to be significant by these tests, multiple comparisons were made: the injury factor was analyzed using the Mann‐Whitney U test, and the time factor was analyzed using the Friedman test (1‐way repeated‐measures ANOVA on ranks), followed by a Bonferroni post hoc correction for multiple comparisons. GraphPad Prism 6 (La Jolla, CA), SigmaPlot 11.0 (Systat Software, Germany), MedCalc Statistical Software v18.5 (Ostend, Belgium), and the Real Statistics Resource Pack software (Release 4.3) were all used for analyzing and graphing data.
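For the single time point comparisons, the Mann‐Whitney U test and the Bonferroni adjustment can be sketched with the Python standard library. This is an illustration under stated assumptions, not the software used in the study: it uses the large-sample normal approximation with a tie correction and no continuity correction, whereas packages such as GraphPad Prism may report exact P values for small samples. The Scheirer‐Ray‐Hare and Friedman tests are not reproduced, and all function names are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def average_ranks(values):
    """Rank values 1..N, giving tied values the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda idx: values[idx])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend `end` across a run of tied values.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1  # 0-based positions -> 1-based ranks
        for pos in range(start, end + 1):
            ranks[order[pos]] = mean_rank
        start = end + 1
    return ranks


def mann_whitney(x, y):
    """Two-sided Mann-Whitney U test (normal approximation, tie-corrected,
    no continuity correction). Returns (U for group x, P value)."""
    n1, n2 = len(x), len(y)
    if n1 == 0 or n2 == 0:
        raise ValueError("both groups need at least one observation")
    pooled = list(x) + list(y)
    ranks = average_ranks(pooled)
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    n = n1 + n2
    counts = {}
    for value in pooled:
        counts[value] = counts.get(value, 0) + 1
    # Tie correction: sum of (t^3 - t) over each group of tied values.
    tie_term = sum(t ** 3 - t for t in counts.values()) / (n * (n - 1))
    var_u = n1 * n2 / 12 * ((n + 1) - tie_term)
    if var_u <= 0:
        return u1, 1.0  # every observation tied: no evidence of a difference
    z = (u1 - n1 * n2 / 2) / sqrt(var_u)
    return u1, 2 * (1 - NormalDist().cdf(abs(z)))


def bonferroni(p_values):
    """Multiply each P value by the number of comparisons, capping at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]
```

For example, `mann_whitney([1, 2, 3, 4, 5], [6, 7, 8, 9, 10])` returns U=0 with P≈0.009 under this approximation; the exact test for the same data gives P≈0.008, which illustrates why exact methods are preferred at small sample sizes.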
Receiver operating characteristic curve analysis
Receiver operating characteristic (ROC) curves were created (GraphPad Prism 6) for each individual subtest to determine how accurately each subtest assigns a mouse to its true group (ie, how sensitive and specific the subtest is for labeling a sham mouse as a sham [rather than incorrectly identifying it as an SAH mouse] and vice versa).24 In addition, to assess the diagnostic accuracy of the various neurobehavioral scoring systems tested herein, ROC curves were generated for the modified Bederson and Katz Scores and the Garcia and Parra Neuroscores. To test for statistical significance between the ROC curves of the scoring systems, the method of Hanley and McNeil was used to calculate the z score25:
$$z = \frac{A_1 - A_2}{\sqrt{SE_1^2 + SE_2^2 - 2r\,SE_1\,SE_2}}$$
where A1 and A2 are the areas under the curves for scoring systems 1 and 2, respectively; SE1 and SE2 are the standard errors of the ROC areas for scoring systems 1 and 2, respectively; and r is the estimated correlation between the 2 scoring systems. To determine r, 2 intermediate correlation coefficients were first calculated: the correlation between scoring system 1 and scoring system 2 for the sham group and the same correlation for the SAH group. GraphPad Prism was used to compute these 2 Spearman correlation coefficients. Then, using the table provided by Hanley and McNeil, the correlation between the 2 scoring systems (r) was estimated.25 Finally, after the z score between the 2 scoring systems was calculated, the P value was determined from a z‐score table.
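The ROC comparison described above can be sketched in 3 steps. The AUC is computed as the Mann‐Whitney probability that a randomly chosen animal from one group scores higher than one from the other (ties count one half); its standard error uses the Hanley and McNeil approximation; and the z test takes r as an input, because in this study r was read from the published Hanley and McNeil table rather than computed. This is a sketch, not the GraphPad Prism implementation, and the function names and toy inputs are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def auc(higher, lower):
    """AUC = P(score from `higher` group > score from `lower` group) + 0.5 * P(tie).

    For scales on which healthy animals score higher (eg, Garcia, Parra), pass
    the sham scores as `higher`; for a deficit scale such as Katz, swap groups.
    """
    wins = 0.0
    for a in higher:
        for b in lower:
            if a > b:
                wins += 1.0
            elif a == b:
                wins += 0.5
    return wins / (len(higher) * len(lower))


def auc_se(a, n_higher, n_lower):
    """Hanley-McNeil standard error of an AUC value `a`."""
    q1 = a / (2 - a)
    q2 = 2 * a * a / (1 + a)
    var = (a * (1 - a)
           + (n_higher - 1) * (q1 - a * a)
           + (n_lower - 1) * (q2 - a * a)) / (n_higher * n_lower)
    return sqrt(var)


def compare_aucs(a1, se1, a2, se2, r):
    """z test for 2 correlated AUCs measured on the same animals.

    Returns (z, two-sided P value).
    """
    denom_sq = se1 ** 2 + se2 ** 2 - 2 * r * se1 * se2
    if denom_sq <= 0:
        raise ValueError("standard errors and r give a non-positive variance")
    z = (a1 - a2) / sqrt(denom_sq)
    return z, 2 * (1 - NormalDist().cdf(abs(z)))
```

The pairwise loop in `auc` is quadratic in group size, which is negligible for groups of roughly 15 to 20 animals; the result equals U/(n1·n2) from a Mann‐Whitney test.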
Sample size estimation to determine the utility of each neuroscoring system
Sample size analysis for each of the neurobehavioral scoring systems used herein was performed using the means and SDs from the 24‐hour functional performance data. First, the difference between the sham and SAH group means was determined. The SD of the population was calculated as follows:
$$SD_{pop} = \sqrt{\frac{(n_{sham} - 1)\,SD_{sham}^2 + (n_{SAH} - 1)\,SD_{SAH}^2}{n_{sham} + n_{SAH} - 2}}$$
where nsham and nSAH are the number of sham and SAH animals, respectively; and SDsham and SDSAH are the SDs of the sham and SAH groups, respectively. For each scoring system (ie, modified Bederson, Katz, Garcia, and Parra), the mean difference and SD of the population were used in SigmaPlot's t‐test sample size calculator with a desired power of 0.8 and an α of 0.05 to estimate the sample size required to test for statistical significance between sham and SAH groups. This process was repeated using SigmaPlot's ANOVA sample size calculator for group sizes of 3 to 5. Finally, because the sample sizes estimated by SigmaPlot assume a parametric test, each sample size calculated by SigmaPlot was increased by 15% to estimate the sample size required for a nonparametric test (ie, ANOVA on ranks or Mann‐Whitney).19
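The pooled SD and the 15% nonparametric adjustment can be sketched as below. This assumes the standard pooled‐SD formula and uses a two-sided normal approximation as a stand-in for SigmaPlot's t‐based calculator, so its estimates may be 1 or 2 animals lower than SigmaPlot's. The function names are hypothetical, and the inputs in the test are illustrative rather than study data.

```python
from math import ceil, sqrt
from statistics import NormalDist


def pooled_sd(n_sham: int, sd_sham: float, n_sah: int, sd_sah: float) -> float:
    """Pooled SD: each group's variance weighted by its degrees of freedom."""
    return sqrt(((n_sham - 1) * sd_sham ** 2 + (n_sah - 1) * sd_sah ** 2)
                / (n_sham + n_sah - 2))


def nonparametric_n(mean_diff: float, sd: float, alpha: float = 0.05,
                    power: float = 0.80, inflation_pct: int = 15) -> int:
    """Animals per group for a rank-based test (eg, Mann-Whitney).

    1. Parametric n from the normal approximation (an assumed stand-in for a
       t-test calculator), rounded up to a whole animal.
    2. Increase by `inflation_pct` percent and round up again, using integer
       arithmetic to avoid floating-point error at exact multiples.
    """
    std_normal = NormalDist()
    z_sum = std_normal.inv_cdf(1 - alpha / 2) + std_normal.inv_cdf(power)
    n_param = ceil(2 * (z_sum * sd / mean_diff) ** 2)
    return -(-n_param * (100 + inflation_pct) // 100)  # ceiling division
```

Rounding up at both steps keeps the estimate conservative, which suits a study in which post-SAH mortality already shrinks the final group sizes.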
Interoperator variability
Interoperator variation was calculated for each of the scoring systems (modified Bederson Score, Katz Score, Garcia Neuroscore, and Parra Neuroscore) using a weighted κ statistic (MedCalc).
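As an illustration of the interoperator statistic, a weighted κ for 2 raters can be computed as below. MedCalc offers both linear and quadratic weights and the text does not state which was used, so linear weights are an assumption here; with 3 scorers, κ would be computed for each pair of raters. The function name is hypothetical.

```python
def weighted_kappa(rater1, rater2, min_score, max_score):
    """Linearly weighted Cohen's kappa for 2 raters scoring the same animals.

    Weight for a pair of scores (i, j): 1 - |i - j| / (k - 1), where k is the
    number of possible scores, so near-misses count as partial agreement.
    """
    if len(rater1) != len(rater2) or not rater1:
        raise ValueError("both raters must score the same non-empty set of animals")
    cats = range(min_score, max_score + 1)
    k = len(cats)
    if k < 2:
        raise ValueError("the scale needs at least 2 possible scores")
    if any(s not in cats for s in list(rater1) + list(rater2)):
        raise ValueError("all scores must lie within [min_score, max_score]")
    n = len(rater1)

    def weight(i, j):
        return 1.0 - abs(i - j) / (k - 1)

    # Observed weighted agreement across animals.
    p_obs = sum(weight(a, b) for a, b in zip(rater1, rater2)) / n
    # Agreement expected by chance from each rater's score frequencies.
    freq1 = {c: list(rater1).count(c) / n for c in cats}
    freq2 = {c: list(rater2).count(c) / n for c in cats}
    p_exp = sum(weight(i, j) * freq1[i] * freq2[j] for i in cats for j in cats)
    if p_exp == 1.0:
        return float("nan")  # kappa is undefined when chance agreement is 1
    return (p_obs - p_exp) / (1 - p_exp)
```

For example, `weighted_kappa([0, 1, 2], [0, 2, 2], 0, 2)` gives about 0.667: the single 1-point disagreement earns half credit rather than none.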
Developing a new composite neuroscore for SAH
Two methods were used to develop a new composite neuroscore. First, after computing the ROC curves for each subtest, we used an area under the curve threshold of 0.80 to select the “best” subtests for a new neuroscore. Second, we used a variable selection procedure based on Lasso regression to identify the most important subtests (based on a frequency >150; Figure S1). The “best/most important” subtests identified by these methods were combined into a new neuroscore (Table 9). This composite neuroscore was then subjected to the same rigorous statistical analysis as each of the other scoring systems, using the data from the 3‐day study (ie, ROC curve analysis, sample size determination, and interoperator variability). The neuroscore was then validated in a 7‐day study.
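The first selection rule, keeping subtests whose area under the curve reaches 0.80, can be sketched as follows. It illustrates only the AUC threshold (treated here as inclusive, an assumption); the Lasso frequency criterion is not reproduced. The helper names and the toy scores in the test are hypothetical, not study data.

```python
def auc(sham, sah):
    """P(sham score > SAH score) + 0.5 * P(tie); higher subtest scores mean fewer deficits."""
    wins = sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in sham for b in sah)
    return wins / (len(sham) * len(sah))


def select_subtests(scores, threshold=0.80):
    """Return subtests whose AUC meets the threshold, most accurate first.

    `scores` maps subtest name -> (sham scores, SAH scores).
    """
    aucs = {name: auc(sham, sah) for name, (sham, sah) in scores.items()}
    chosen = [name for name, value in aucs.items() if value >= threshold]
    return sorted(chosen, key=lambda name: aucs[name], reverse=True)
```

A subtest on which every sham mouse outscores every SAH mouse has an AUC of 1.0 and is kept; one on which all mice tie has an AUC of 0.5 and is dropped.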
Table 9
New Composite Neuroscore for Evaluating Functional Deficits After SAH in Mice

| Subtest | Score 0 | Score 1 | Score 2 | Score 3 |
|---|---|---|---|---|
| Spontaneous activity (3 min) | No movement | Minimal movement | Touches 1–2 walls | Touches 3–4 walls while standing on hind limbs |
| Climbing (1 min) | Weak grip and falls down | Climbs but does not reach top and weak grip | Climbs to top and weak grip or does not climb to top and good grip | Climbs to top and strong grip |
| Balance | Tumbles | Stands, sways | Sways as walking | Changes position easily |
| Side stroking | No response | Unilateral response | Bilateral weak response or strong ipsilateral response | Strong bilateral response |
| Vibrissae touch (stroke whiskers) | No response | Unilateral response | Weak bilateral response | Strong bilateral response |
| Visual (tip toward each eye) | No response | Unilateral response | Weak bilateral response | Strong/rapid bilateral response |
| Forelimb use | Severe bilateral deficits | Severe unilateral deficits | Slight deficits (unilateral or bilateral) | None |
| Hind limb use | Severe bilateral deficits | Severe unilateral deficits | Slight deficits (unilateral or bilateral) | None |

Score range is from 0 (maximum deficits) to 24 (no deficits). SAH indicates subarachnoid hemorrhage.
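Scoring an animal on the new composite neuroscore amounts to summing the 8 subtests listed in Table 9, each scored 0 to 3. A minimal sketch, assuming scores are entered as integers keyed by hypothetical subtest names, with validation that every subtest is present and in range:

```python
COMPOSITE_SUBTESTS = (
    "spontaneous activity", "climbing", "balance", "side stroking",
    "vibrissae touch", "visual", "forelimb use", "hind limb use",
)


def composite_neuroscore(scores):
    """Sum the 8 Table 9 subtests (0 = maximum deficits, 24 = no deficits)."""
    missing = set(COMPOSITE_SUBTESTS) - set(scores)
    if missing:
        raise ValueError(f"missing subtests: {sorted(missing)}")
    total = 0
    for name in COMPOSITE_SUBTESTS:
        value = scores[name]
        if not isinstance(value, int) or not 0 <= value <= 3:
            raise ValueError(f"{name} must be an integer from 0 to 3, got {value!r}")
        total += value
    return total
```

Rejecting missing or out-of-range entries, rather than silently treating them as 0, prevents a data-entry gap from being recorded as a maximal deficit.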
Results
Mortality rates were 0% (0/16) for sham and 26% (8/31) for SAH. For the 3‐day study, 1 mouse died within the first 24 hours (before neurobehavior could be completed), 3 mice died before day 2 neurobehavior, and 2 mice died before day 3 neurobehavior. Animals not surviving until euthanasia were excluded from the longitudinal analysis (16 sham and 15 SAH animals were included in analysis). All animals surviving >24 hours were included in the analysis of neurobehavioral performance on day 1 (16 sham and 20 SAH animals were included in analysis). For the 7‐day study, one mouse in the SAH group died within 1 hour after SAH and another mouse died on day 2, so these data are not included in the longitudinal analysis (8 sham and 8 SAH animals were included in analysis). Data S1 contains all statistical reports (ie, exact P values, provided in Tables S1 through S8) as well as additional experimental results.
Modified Bederson Score
For the 3‐day longitudinal study assessed using the modified Bederson Score, mice subjected to SAH performed significantly worse than sham mice on days 1 and 2 after injury (Figure 1A, Table 10). On day 3 after SAH, no significant difference between the sham and SAH animals was observed.
Figure 1
Neurobehavioral deficits on days 1 to 3 after subarachnoid hemorrhage (SAH). A, Modified Bederson Score. B, Katz Score. C, Garcia Neuroscore. D, Parra Neuroscore. Sham, n=16; SAH, n=15 to 20. Analyzed with Scheirer‐Ray‐Hare tests with Bonferroni post hoc tests. *P<0.05 between sham and SAH at the indicated time point.
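Table 10
Means and SDs for the 3‐Day Study (Figure 1)

| Scoring System | Day | Group | Mean | SD |
|---|---|---|---|---|
| Modified Bederson Score | 1 | Sham | 4.923 | 0.2774 |
| Modified Bederson Score | 1 | SAH | 3.350 | 1.568 |
| Modified Bederson Score | 2 | Sham | 4.923 | 0.2774 |
| Modified Bederson Score | 2 | SAH | 3.667 | 2.093 |
| Modified Bederson Score | 3 | Sham | 5.000 | 0.000 |
| Modified Bederson Score | 3 | SAH | 4.000 | 1.604 |
| Katz Score | 1 | Sham | 5.231 | 8.604 |
| Katz Score | 1 | SAH | 28.27 | 23.51 |
| Katz Score | 2 | Sham | 1.000 | 1.915 |
| Katz Score | 2 | SAH | 24.67 | 24.39 |
| Katz Score | 3 | Sham | 1.000 | 1.915 |
| Katz Score | 3 | SAH | 16.92 | 1.115 |
| Garcia Neuroscore | 1 | Sham | 16.92 | 1.124 |
| Garcia Neuroscore | 1 | SAH | 11.15 | 5.008 |
| Garcia Neuroscore | 2 | Sham | 16.69 | 1.377 |
| Garcia Neuroscore | 2 | SAH | 11.73 | 5.849 |
| Garcia Neuroscore | 3 | Sham | 16.85 | 1.625 |
| Garcia Neuroscore | 3 | SAH | 12.40 | 5.422 |
| Parra Neuroscore | 1 | Sham | 25.56 | 1.315 |
| Parra Neuroscore | 1 | SAH | 19.60 | 5.795 |
| Parra Neuroscore | 2 | Sham | 25.44 | 1.365 |
| Parra Neuroscore | 2 | SAH | 19.13 | 6.589 |
| Parra Neuroscore | 3 | Sham | 26.19 | 0.9106 |
| Parra Neuroscore | 3 | SAH | 19.93 | 6.829 |

SAH indicates subarachnoid hemorrhage.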
Katz Score
Using the Katz Score for the 3‐day longitudinal study, SAH animals performed significantly worse than sham animals on all 3 days after SAH (Figure 1B, Table 10).
Garcia Neuroscore
Behavioral performance on the Garcia Neuroscore in the 3‐day longitudinal study differed significantly between the sham and SAH groups on all 3 days (Figure 1C, Table 10).
Parra Neuroscore
In the 3‐day longitudinal study, the Parra Neuroscore statistically distinguished between sham and SAH mice on all 3 days after SAH (Figure 1D, Table 10).
Sensitivity Analysis of Behavioral Scoring Systems
For the mice that underwent neurobehavioral testing on day 1 after SAH, the data for each scoring system were examined to determine diagnostic accuracy via ROC curve analysis (Table 11). Of note is the area under the curve, which ranges from 0.5 (no better than chance) to 1.0 (perfect classification); a greater area under the curve indicates that the test is better at placing a subject into the correct group. The modified Bederson and Katz Scores had areas under the curves of 0.7313 and 0.8781, respectively. The areas under the curves for the Garcia and Parra Neuroscores were 0.8875 and 0.9266, respectively.
Table 11
ROC Curve Analysis for the Neurobehavioral Scoring Systems
Scoring System | AUC (SE) | P Value
Modified Bederson Score | 0.7313 (0.08392) | 0.01852
Katz Score | 0.8781 (0.05698) | 0.0001
Garcia Neuroscore | 0.8875 (0.05306) | <0.0001
Parra Neuroscore | 0.9266 (0.04232) | <0.0001
Our composite neuroscore | 0.9953 (0.00702) | <0.0001
Day 1 data from the 3‐day study were used for analysis. AUC indicates area under the curve; ROC, receiver operating characteristic.
Analysis of the Sensitivity of the Individual Subtests
For all animals surviving to behavioral testing on day 1, the individual subtests were analyzed for statistical significance between the sham and SAH mice (Figure 2). For the balance and coordination tests, all subtests identified statistically significant deficits in SAH mice compared with sham mice (P<0.01). For the posture and strength tests, all 3 subtests detected significant deficits: the ptosis and facial weakness subtests had P=0.0047 and P=0.0036, respectively, and the dyspnea subtest had P=0.0417. For the reflex tests, the side stroking, vibrissae touch, visual, olfactory, and righting subtests distinguished between sham and SAH animals (P<0.01), whereas no significant deficits were observed in the tactile (P=0.0529), postural (P=0.1930), and sound (P=0.1965) subtests. All 3 limb use subtests identified significant differences between sham and SAH mice (P<0.002).
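The per‐subtest sham versus SAH comparison can be sketched in pure Python as a two‐sided Mann‐Whitney U test. This is an illustration under stated assumptions: it uses the large‐sample normal approximation with a correction for tied ranks (ties are frequent with ordinal subtest scores) and no continuity correction, so its P values may differ from the exact or continuity‐corrected values that statistical packages report. The function name `mann_whitney` is hypothetical.

```python
import math


def mann_whitney(x, y):
    """Two-sided Mann-Whitney U test (normal approximation, tie-corrected).

    Returns (U for sample x, two-sided P value). No continuity correction.
    """
    n1, n2 = len(x), len(y)
    if n1 == 0 or n2 == 0:
        raise ValueError("both samples must be non-empty")
    # Pool the scores, remembering which sample (0 = x, 1 = y) each came from.
    pooled = sorted((value, group) for group, sample in enumerate((x, y)) for value in sample)
    ranks = [0.0] * len(pooled)
    tie_term = 0.0  # sum of t**3 - t over each run of tied values
    i = 0
    while i < len(pooled):
        j = i
        while j + 1 < len(pooled) and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        midrank = (i + j) / 2 + 1  # average of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[k] = midrank
        t = j - i + 1
        tie_term += t ** 3 - t
        i = j + 1
    rank_sum_x = sum(r for r, (_, group) in zip(ranks, pooled) if group == 0)
    u_x = rank_sum_x - n1 * (n1 + 1) / 2
    n = n1 + n2
    mean_u = n1 * n2 / 2
    var_u = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_u <= 0:
        return u_x, 1.0  # every score tied: no evidence of a difference
    z = (u_x - mean_u) / math.sqrt(var_u)
    return u_x, math.erfc(abs(z) / math.sqrt(2))
```

For example, `mann_whitney([1, 2, 3], [4, 5, 6])` returns U=0 with a two‐sided P of about 0.05; with samples this small an exact test would normally be preferred.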
Figure 2
Functional deficits assessed by the individual subtests on day 1 after subarachnoid hemorrhage (SAH). A, Coordination and balance tests. B, Posture and strength tests. C, Reflex tests. D, Forelimb tests. Sham, n=16; SAH, n=20. Analyzed with Mann‐Whitney tests. *P<0.05 vs sham for the indicated subtest.
Using ROC curves, the ability of each subtest to correctly classify each animal as either sham or injured (ie, SAH) was computed (Table 12). The subtests with an area under the curve <0.70 were forepaw outstretching, dyspnea, tactile reflex, postural reflex, and sound reflex. Those with an area under the curve between 0.70 and 0.75 were lateral turning and walking. The subtests with areas under the curve between 0.75 and 0.80 were beam walking, ptosis, facial weakness, olfactory reflex, righting reflex, and limb extension. Finally, the subtests with an area under the curve >0.80 were spontaneous activity, climbing, balance, side stroking, vibrissae touch, visual reflex, forelimb use, and hind limb use.
Table 12
ROC Curve Analysis for the Individual Subtests
Subtest | AUC (SE) | P Value
Spontaneous activity | 0.8277 (0.06900) | 0.0006
Forepaw outstretching | 0.6905 (0.08569) | 0.0460
Climbing | 0.8333 (0.06814) | 0.0005
Balance | 0.8782 (0.05749) | <0.0001
Lateral turning | 0.7206 (0.08475) | 0.0223
Walking | 0.7451 (0.08021) | 0.0102
Beam walking | 0.7985 (0.07483) | 0.0020
Ptosis | 0.7647 (0.07947) | 0.0056
Dyspnea | 0.6401 (0.09032) | 0.1422
Facial weakness | 0.7563 (0.07961) | 0.0073
Side stroking | 0.8333 (0.06814) | 0.0005
Vibrissae touch | 0.8249 (0.07056) | 0.0007
Visual reflex | 0.8347 (0.07037) | 0.0005
Olfactory reflex | 0.7661 (0.07766) | 0.0053
Tactile reflex | 0.6190 (0.09082) | 0.2122
Postural reflex | 0.5714 (0.09321) | 0.4541
Sound reflex | 0.5882 (0.09273) | 0.3551
Righting reflex | 0.7899 (0.07373) | 0.0024
Limb extension | 0.7605 (0.07868) | 0.0063
Forelimb use | 0.8543 (0.06273) | 0.0002
Hind limb use | 0.8585 (0.06184) | 0.0002
The day 1 data from the 3‐day study were used for analysis. Boldfacing indicates P<0.05 between the sham and SAH groups on day 1 post‐SAH. AUC indicates area under the curve; ROC, receiver operating characteristic.
Development of a New Composite Neuroscore
After ROC curve analysis of the individual subtests, we found that the Parra Neuroscore includes one subtest that is insensitive to SAH injury (the tactile reflex test). This led us to develop a new composite neuroscore specifically designed to detect the functional deficits observed after SAH in mice. On the basis of the ROC curve analysis of the individual subtests, we set a threshold area under the curve of 0.80; all subtests with an area under the curve >0.80 were combined to make the new composite neuroscore (Table 9).
Using the new composite neuroscore, we first analyzed the behavioral data from the 3‐day study (Figure 3, Table 13). On days 1 to 3 after SAH, significant differences were observed between the sham and SAH mice. ROC curve analysis of the new composite neuroscore gave an area under the curve of 0.9953 (Table 11).
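The selection rule described above (keep every subtest with a day 1 area under the curve >0.80) can be sketched as follows, using the AUC values from Table 12. The point ranges for each subtest are defined in Table 9, which is not reproduced here, so the `composite_score` helper simply sums whatever per‐subtest scores it is given; the function names and the placeholder scores in the usage example are hypothetical.

```python
# Day 1 AUCs for the 21 individual subtests, copied from Table 12.
SUBTEST_AUC = {
    "spontaneous activity": 0.8277,
    "forepaw outstretching": 0.6905,
    "climbing": 0.8333,
    "balance": 0.8782,
    "lateral turning": 0.7206,
    "walking": 0.7451,
    "beam walking": 0.7985,
    "ptosis": 0.7647,
    "dyspnea": 0.6401,
    "facial weakness": 0.7563,
    "side stroking": 0.8333,
    "vibrissae touch": 0.8249,
    "visual reflex": 0.8347,
    "olfactory reflex": 0.7661,
    "tactile reflex": 0.6190,
    "postural reflex": 0.5714,
    "sound reflex": 0.5882,
    "righting reflex": 0.7899,
    "limb extension": 0.7605,
    "forelimb use": 0.8543,
    "hind limb use": 0.8585,
}

AUC_THRESHOLD = 0.80


def select_subtests(auc_by_subtest, threshold=AUC_THRESHOLD):
    """Keep subtests whose AUC exceeds the threshold, most accurate first."""
    kept = [name for name, auc in auc_by_subtest.items() if auc > threshold]
    return sorted(kept, key=lambda name: auc_by_subtest[name], reverse=True)


def composite_score(mouse_scores, subtests):
    """Sum one mouse's (hypothetical) subtest scores over the selected subtests."""
    return sum(mouse_scores[name] for name in subtests)


selected = select_subtests(SUBTEST_AUC)
print(len(selected), selected)
```

Applied to the Table 12 values, the rule keeps exactly 8 subtests (balance, hind limb use, forelimb use, visual reflex, climbing, side stroking, spontaneous activity, and vibrissae touch) and drops beam walking, whose AUC of 0.7985 falls just below the threshold.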
Figure 3
Assessing functional deficits on days 1 to 3 after subarachnoid hemorrhage (SAH) using the developed composite neuroscore. Sham, n=16; SAH, n=15 to 20. Analyzed with Scheirer‐Ray‐Hare test with Bonferroni post hoc test. *P<0.05 between sham and SAH at the indicated time point.
Table 13
Means and SDs for the 3‐Day Study Using the Developed Composite Neuroscore (Figure 3)
Study Day (in 3‐Day Study) | Group | Mean | SD
Day 1 | Sham | 22.81 | 1.223
Day 1 | SAH | 12.70 | 6.105
Day 2 | Sham | 22.63 | 2.125
Day 2 | SAH | 13.29 | 8.099
Day 3 | Sham | 23.31 | 1.302
Day 3 | SAH | 14.73 | 8.581
SAH indicates subarachnoid hemorrhage.
Finally, as an internal validation of our composite neuroscore, we performed a 7‐day study on a separate cohort of mice (sham, n=8; SAH, n=10) (Table 14). Neither the modified Bederson nor the Katz Score detected any significant difference in the neurobehavioral performance of sham versus SAH mice (Figure 4A and 4B). The Garcia Neuroscore detected significant differences between the 2 groups on days 2 and 3 after SAH, but not on days 1, 5, and 7 (Figure 4C). With the Parra Neuroscore, SAH mice had significantly more deficits than sham animals on days 1 and 2 after SAH, but not on any other day (Figure 4D). With the developed composite neuroscore, significant differences between the sham and SAH mice were observed on days 1 to 3 and 5 after SAH; deficits were not significantly different on day 7 (Figure 4E). The trends of individual mice are plotted in Figures S2 through S6.
Table 14
Means and SDs for the 7‐Day Study (Figure 4)
Study Day | Group | Garcia Neuroscore Mean | Garcia Neuroscore SD | Parra Neuroscore Mean | Parra Neuroscore SD | Our Composite Neuroscore Mean | Our Composite Neuroscore SD
Day 1 | Sham | 17.50 | 0.7559 | 26.38 | 0.9161 | 23.88 | 0.3536
Day 1 | SAH | 13.33 | 4.062 | 21.78 | 3.734 | 16.33 | 4.500
Day 2 | Sham | 17.63 | 0.5175 | 26.38 | 0.9161 | 23.50 | 0.7559
Day 2 | SAH | 13.63 | 3.462 | 22.25 | 4.234 | 17.13 | 6.578
Day 3 | Sham | 17.75 | 0.4629 | 26.63 | 0.7440 | 23.63 | 0.7440
Day 3 | SAH | 13.88 | 3.137 | 22.63 | 3.503 | 18.63 | 4.749
Day 5 | Sham | 17.75 | 0.4629 | 26.63 | 0.7440 | 23.88 | 0.3536
Day 5 | SAH | 14.88 | 2.997 | 24.00 | 2.726 | 19.63 | 4.274
Day 7 | Sham | 17.88 | 0.3536 | 26.38 | 1.408 | 23.88 | 0.3536
Day 7 | SAH | 15.50 | 3.024 | 24.25 | 3.327 | 20.75 | 3.991
SAH indicates subarachnoid hemorrhage.
Figure 4
Neurobehavioral deficits on days 1 to 7 after subarachnoid hemorrhage (SAH) in mice. A, Modified Bederson Score. B, Katz Score. C, Garcia Neuroscore. D, Parra Neuroscore. E, Developed composite neuroscore. Sham, n=8; SAH, n=8 to 9. Analyzed with Scheirer‐Ray‐Hare test with Bonferroni post hoc test. *P<0.05 between sham and SAH at the indicated time point.
ROC Curve Analysis of the Scoring Systems
To determine which neuroscore is the most useful for SAH studies in mice, we analyzed the differences between the ROC curves for the Garcia Neuroscore, Parra Neuroscore, and our neuroscore (Table 15, Table S6). The Parra Neuroscore is not significantly more diagnostically accurate for SAH deficits than the Garcia Neuroscore (P=0.1062). However, the developed composite neuroscore is significantly more diagnostically accurate for SAH injury than both the Garcia Neuroscore (P=0.0121) and the Parra Neuroscore (P=0.0241).
Table 15
Comparison of the ROC Curves for the Garcia Neuroscore, Parra Neuroscore, and the Developed Composite Neuroscore
Scoring System | Absolute Difference in AUC | SE1; SE2 | r | z Score | P Value
Parra Neuroscore vs Garcia Neuroscore | 0.0391 | 0.05306; 0.04232 | 0.807 | 1.248 | 0.1062
Our composite neuroscore vs Garcia Neuroscore | 0.1078 | 0.05306; 0.00702 | 0.78 | 2.256 | 0.0121*
Our composite neuroscore vs Parra Neuroscore | 0.0687 | 0.04232; 0.00702 | 0.84 | 1.877 | 0.0241*
SE1 and SE2 are the SEs of the AUCs (from ROC curve analysis) for the first and second scoring system being analyzed, respectively. The r value is the estimated correlation between the 2 scoring systems (obtained from the table provided by Hanley and McNeil25). AUC indicates area under the curve; ROC, receiver operating characteristic.
*Significant difference between the 2 scoring systems.
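The z scores in Table 15 follow the Hanley and McNeil approach for comparing 2 ROC areas measured on the same animals, in which the SE of the difference is reduced by the correlation r between the 2 scoring systems. A minimal sketch is below; the function name is hypothetical, and reporting the P value as one‐tailed is an assumption of this sketch. Plugging in the AUCs from Table 11 together with the SEs and r values from Table 15 reproduces the tabulated z scores to within rounding.

```python
import math


def compare_correlated_aucs(auc1, se1, auc2, se2, r):
    """Compare 2 ROC AUCs estimated on the same mice (Hanley-McNeil style z test).

    Returns the z score and a one-tailed P value from the standard normal
    distribution (one-tailed reporting is an assumption of this sketch).
    """
    # SE of the difference, reduced by the correlation between the 2 scores.
    se_diff = math.sqrt(se1 ** 2 + se2 ** 2 - 2 * r * se1 * se2)
    z = abs(auc1 - auc2) / se_diff
    p_one_tailed = 0.5 * math.erfc(z / math.sqrt(2))
    return z, p_one_tailed


# Parra (AUC 0.9266, SE 0.04232) vs Garcia (AUC 0.8875, SE 0.05306), r = 0.807.
z, p = compare_correlated_aucs(0.9266, 0.04232, 0.8875, 0.05306, 0.807)
print(f"z = {z:.3f}, one-tailed P = {p:.4f}")
```

Because the 2 scoring systems are applied to the same mice, ignoring r (setting it to 0) would inflate the SE of the difference and understate how distinguishable the AUCs are.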
Sample Size Estimations for Each of the Scoring Systems
Although a functional test may be diagnostically accurate for the injury studied, adequate sample sizes are critical to every study; poor estimation of the sample size required to test a hypothesis reduces the value of a study (ie, the study may be underpowered). Thus, we performed sample size estimations for each of the scoring systems used in this study on the basis of the functional performance of mice after SAH. Sample sizes were estimated for the comparisons that most studies make: 2‐group comparisons (Mann‐Whitney test) and 3‐, 4‐, and 5‐group comparisons (Kruskal‐Wallis test). The sample sizes required by the scoring systems ranked as follows: modified Bederson Score>Katz Score>Garcia Neuroscore>Parra Neuroscore>our composite neuroscore (Table 16). The sample sizes needed for the Parra Neuroscore are moderate, at n=8 to 12 per group, whereas those needed to test for significance using our developed composite neuroscore are only n=6 to 8 per group.
Table 16
Sample Size Estimations for a Single Time Point Study
Scoring System | 2 Groups (Mann‐Whitney) | 3 Groups (Kruskal‐Wallis) | 4 Groups (Kruskal‐Wallis) | 5 Groups (Kruskal‐Wallis)
Modified Bederson Score | 17 | 21 | 24 | 25
Katz Score | 13 | 15 | 17 | 19
Garcia Neuroscore | 9 | 12 | 13 | 14
Parra Neuroscore | 8 | 9 | 10 | 12
Our composite neuroscore | 6 | 7 | 7 | 8
Sample sizes (ie, number of mice required in each group) for each neurobehavioral scoring system were estimated by SigmaPlot using the mean differences, SDs of the populations, a desired power of 0.8, and an α of 0.05. Because the sample sizes estimated by SigmaPlot reflect the assumptions of a 1‐way ANOVA, each sample size was increased by 15% to estimate the sample sizes required to test for statistical significance using a nonparametric test (ie, 1‐way ANOVA on ranks or Mann‐Whitney).19
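SigmaPlot's internal sample‐size routine is not reproduced here. As a rough, hypothetical sketch of the same idea for the 2‐group case only, the normal‐approximation formula below estimates the number of mice per group needed to detect a given mean difference with a common SD at a 2‐sided α of 0.05 and a power of 0.8, then applies the 15% inflation described in the footnote above for a rank‐based test. Because it assumes equal SDs in both groups and a normal approximation, it will not exactly match the values in Table 16 and does not cover the Kruskal‐Wallis multigroup estimates; the function names are hypothetical.

```python
import math
from statistics import NormalDist


def n_per_group_two_sample(mean_diff, sd, alpha=0.05, power=0.8):
    """Normal-approximation n per group for a 2-sided, 2-group comparison of means."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    return 2 * ((z_alpha + z_beta) * sd / mean_diff) ** 2


def nonparametric_n(mean_diff, sd, alpha=0.05, power=0.8, inflation=0.15):
    """Return (parametric n, n inflated by 15% for a rank-based test), both rounded up.

    The inflation is applied to the unrounded estimate (an assumption of this sketch).
    """
    n = n_per_group_two_sample(mean_diff, sd, alpha, power)
    return math.ceil(n), math.ceil(n * (1 + inflation))


print(nonparametric_n(mean_diff=1.0, sd=1.0))
```

For a standardized effect of 1 SD, the sketch gives 16 mice per group for a parametric comparison and 19 after the 15% inflation; larger mean differences relative to the SD, as seen with the composite neuroscore, shrink the required n quadratically.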
Finally, we estimated the sample size required to test for statistical significance in a longitudinal study (2 days) for 2 groups (sham and SAH) with the method of Liu and Liang,18 using the mean differences, SDs of the populations, a desired power of 0.8, and an α of 0.05. Adjusting for the use of a nonparametric 2‐way ANOVA test,19 the required sample sizes are as follows: modified Bederson Score, n=32; Katz Score, n=23; Garcia Neuroscore, n=16; Parra Neuroscore, n=13; and our composite neuroscore, n=8.
Discussion
Herein, we developed a new composite neuroscore for measuring the functional performance of mice after SAH. Although various neurobehavioral testing schemes exist, none has been specifically designed for, or validated for diagnostic accuracy in, detecting deficits after SAH. To our knowledge, this is the first SAH study to analyze the diagnostic accuracy of the modified Bederson Score, Katz Score, Garcia Neuroscore, and Parra Neuroscore. Furthermore, this is also the first study to investigate the utility of the various subtests for SAH injury in mice. The shortcomings of the preexisting scoring systems for SAH mice led to the development of a new composite score (Table 9), which was specifically designed using individual subtests that are diagnostically accurate for SAH. Finally, this is also the first study to compare sample size estimations across the various scoring systems.
Within this study, we found that the modified Bederson Score is not an adequate scoring system for SAH in mice. Although both the Katz Score and the Garcia Neuroscore are reasonably diagnostically accurate for SAH injury (ie, have areas under the curves of 0.80–0.90), the Parra Neuroscore is a slightly better choice because of its greater area under the curve. However, the developed composite neuroscore has an even greater ROC area under the curve and is significantly more diagnostically accurate for SAH injury than the Parra Neuroscore. The interoperator variation for the developed composite score is similar to that for the Garcia and Parra Neuroscores (κ>0.92 for all tester pairs for these 3 scores on day 1; Table 17).
Table 17
Interoperator Variation
Scoring System | Tester 1 vs Tester 2 | Tester 1 vs Tester 3 | Tester 2 vs Tester 3
Modified Bederson Score | 0.9364 (0.05116) | 0.9134 (0.05257) | 0.9357 (0.06001)
Katz Score | 0.8818 (0.05856) | 0.8175 (0.05329) | 0.8823 (0.05856)
Garcia Neuroscore | 0.9597 (0.01389) | 0.9226 (0.03770) | 0.9291 (0.02940)
Parra Neuroscore | 0.9748 (0.00927) | 0.9451 (0.02807) | 0.9532 (0.02153)
Our composite neuroscore | 0.9604 (0.01473) | 0.9258 (0.03026) | 0.9408 (0.02853)
Data are given as κ (SE). Data from day 1 of the 3‐day study are used.
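Table 17 reports agreement between testers as κ values. The text does not state whether a weighted or unweighted κ was used; as an illustrative assumption only, the sketch below computes the unweighted Cohen's κ for 2 testers scoring the same mice, that is, the observed agreement corrected for the agreement expected by chance from each tester's marginal score frequencies. For ordinal composite scores, a weighted κ is often preferred. The function name is hypothetical.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Unweighted Cohen's kappa for 2 testers scoring the same mice."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("testers must score the same, non-empty set of mice")
    n = len(rater_a)
    # Fraction of mice given identical scores by both testers.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Agreement expected by chance from each tester's score frequencies.
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / n ** 2
    if expected == 1:
        return 1.0  # both testers used one identical score for every mouse
    return (observed - expected) / (1 - expected)


print(cohens_kappa([1, 1, 0, 0], [1, 0, 0, 0]))
```

In this toy example, tester 1 scores 4 mice as 1, 1, 0, 0 and tester 2 as 1, 0, 0, 0: observed agreement is 0.75, chance agreement is 0.5, and κ is therefore 0.5.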
Although the modified Bederson Score has never been used to assess injury after SAH in mice, it has been used in several rat models of SAH. Even in rat SAH studies, however, the modified Bederson Score does not seem to be a suitable scoring system for identifying behavioral impairment. The study by Bederson et al did not detect any functional deficits in rats on days 1 and 2 after SAH.26 Similarly, the study by Bermueller et al observed no difference in the function of untreated and treated SAH rats for days 1 to 7 after SAH, despite positive findings in intracranial pressure reduction and neuronal survival for the treatment tested.8 Finally, Thal et al did not observe any differences in the modified Bederson Score of untreated and treated rats on days 1 to 7 after SAH.9 Furthermore, Thal et al9 found that several other behavioral tests were able to identify significant differences between untreated and treated animals after SAH, suggesting that the modified Bederson Score may not be applicable to SAH injury. The only positive study for use of the modified Bederson Score for SAH in rats is that of Hockel et al, which reported significant differences between untreated and treated SAH rats (n=10 per group) 1 and 2 days after injury, but no difference on days 3 to 7.27 The Bederson Score was originally developed by Bederson et al for detecting injury after ischemic stroke; thus, the modified Bederson Score may be more suitable for revealing deficits from either unilateral injury or large ischemic damage rather than SAH injury.28
In our 3‐day study, the modified Bederson Score was able to detect significant differences between sham and SAH mice on all 3 days, but no differences were found on days 1 to 3 in the 7‐day study.
Although this seems contradictory, it is attributable to the sample sizes of the 2 studies: the 3‐day study was powered (ie, n=16–21 per group) to detect differences between sham and SAH mice for all the scoring systems, whereas the 7‐day study was powered to detect differences only for the Garcia Neuroscore, Parra Neuroscore, and the developed neuroscore (ie, n=8–10 per group). Our power analysis of the modified Bederson Score data indicated that >20 mice in each group were needed to reach statistical significance, which is much more than the 8 to 10 mice allocated per group in the 7‐day study (Table 16).
The Katz Score has been used in 5 rat SAH studies with mixed findings. In a study by Zausinger et al, the Katz Score was able to detect significant differences between treated and untreated rats only on the first day after SAH, although the study was conducted for 7 days.29 In a follow‐up study, Bermueller et al8 used the Katz Score to test for significant functional differences in treated and untreated SAH rats (n=15/group) on days 1, 3, and 7 after SAH. Although the treatment reduced intracranial pressure and promoted neuron survival, no difference was found between the behavior of treated and untreated SAH rats using the Katz Score.8 Whereas these first 2 studies only compared treated and untreated rats, the study by Scholler et al tested for differences between sham and SAH rats at 6, 24, 48, and 72 hours after ictus.10 This study observed significant differences between sham and SAH rats at 6 and 24 hours after SAH, but not on days 2 or 3. Finally, Thal et al observed functional deficits using the Katz Score on day 1 after SAH (SAH untreated versus SAH treated rats), but not on days 2 to 7.9, 30
The studies by Zausinger et al29 and Thal et al9 (2008) tested for statistical significance between 3 groups, with 20 rats allocated per group, which is above our estimated sample size for 3 groups of n=15 per group.
The study by Bermueller et al8 allocated 15 rats to each of 4 groups, which is slightly lower than our estimated sample size (n=17/group). In the 2‐group studies by Scholler et al10 and Thal et al30 (2009), 9 and 7 rats, respectively, were assigned to each group. Although these sample sizes are smaller than our estimated sample size for 2 groups, their significant findings may be attributable to no rat in the sham group having any deficits (ie, mean=0, SD=0). The findings of the current study suggest that if the correct number of animals is allocated to each group, the Katz Score is a viable option for examining functional deficits after SAH in rodents.
Similar to the modified Bederson Score, the Katz Score identified significant differences between sham and SAH animals on all 3 days in the 3‐day study but failed to find differences on days 1 to 3 in the 7‐day study. Again, this discrepancy is attributable to the way the 2 studies were powered (see Table 16 for power analysis).
The Garcia Neuroscore has been used far more often than the other scoring systems to assess functional deficits in rodents after SAH. The Garcia Neuroscore (and its modifications, which add 1 more subtest) has been reported to distinguish between sham and SAH mice in several studies.31, 32, 33 Sozen et al observed differences between sham and SAH mice at 1 and 3 days after ictus using n=17 to 20 mice per group.33 Similarly, Fujimoto et al reported functional deficits between uninjured and injured mice on days 1 and 2 after SAH (n=16–24 mice per group).31 These 2 studies used sample sizes much larger than required to test for statistical significance. Liu et al reported that mice subjected to SAH (via endovascular perforation) performed significantly worse on the Garcia Neuroscore than sham mice.32 This study allocated 10 mice to the sham group and 15 animals to the SAH group (of which 11 survived) to test for statistical significance in the Garcia Neuroscore.
The authors ran their Kruskal‐Wallis test on 4 experimental groups. Their sample size (n=8–12 per group) is close to the sample size estimated for a 4‐group ANOVA on ranks (ie, Kruskal‐Wallis test) based on the data obtained in the current study (required sample size to test for significance was n=13 [Table 16]).
Most applications of the Garcia Neuroscore have been in rats after SAH. Overall, SAH induced in rats by endovascular perforation leads to statistically significant differences in the functional performance of untreated SAH rats and sham rats on days 1 and 3 after injury.11, 12, 13, 15, 34, 35 Sugawara et al observed that moderate and severe SAH in rats leads to significant functional deficits 1 day after ictus (endovascular perforation model).16 In addition, the authors reported on the interindividual variation of the Garcia Neuroscore for SAH rats and observed good correlation between the 2 scorers for identifying the presence or absence of deficits.
In general, the original 6 sensorimotor subtests proposed by Garcia et al23 have been used for assessing functional deficits after SAH in rats. Despite the lack of sensitivity of the forepaw outstretching subtest, most investigators have observed statistically significant deficits after SAH compared with sham rats.11, 12, 13, 15, 34, 35 Other researchers have used a modified Garcia Neuroscore combining 7 subtests,36, 37 which sometimes includes the beam walking subtest.38, 39, 40
The Garcia Neuroscore (and its modified version) has been successful for identifying functional deficits after SAH in rats, and the current study (as well as others31, 32, 33) indicates that this neuroscoring system may also be useful for assessing deficits after SAH in mice.
However, the findings from our study and those of Liu et al indicate that a larger sample size is required to test for statistical significance in the mouse SAH model31, 32, 33 than in the rat model.36, 37, 38, 40 One reason for this difference between endovascular perforation in rats and mice is that rats are prone to larger hemorrhages, as well as slower hemorrhage clearance. Specifically, large subarachnoid hemorrhages can be observed and graded in rats after SAH,16 but in mice this is more difficult because the hemorrhages are smaller.17, 41, 42, 43 In addition, the hemorrhage in rats can be observed and graded 3 days after SAH14, 44 and may last as long as 7 days after injury,37 whereas in the mouse, the hemorrhage is typically cleared by 2 to 4 days after SAH.42, 43
Furthermore, the inclusion of the beam balance score in the modified Garcia Neuroscore may give it added diagnostic accuracy. In the current study, we did not include the beam balance test in the Garcia Neuroscore. However, the sensitivity of the beam balance test for assessing functional deficits after SAH in mice (area under the curve of 0.799) suggests that modifying the Garcia Neuroscore to include the beam walking test may increase its diagnostic accuracy.
The study by Liu et al used the endovascular perforation mouse model and observed significant functional deficits in the beam balance score, as well as in the Garcia Neuroscore, further suggesting that incorporating the beam balance score into the Garcia Neuroscore might add sensitivity.32 However, because the Garcia Neuroscore is significantly less diagnostically accurate for SAH injury than the developed neuroscore (owing to its inclusion of the insensitive forepaw outstretching subtest), it is unlikely that including the beam walking test in the Garcia Neuroscore will make it better than the composite neuroscore developed in this study.
To date, the Parra Neuroscore has been used in 5 studies.17, 45, 46, 47, 48 Although it is a composite score, it combines several tests that are distinct from those included in the Garcia Neuroscore. The forepaw outstretching test (part of the Garcia Neuroscore), which we found was not sensitive to SAH injury, is not included in the Parra Neuroscore. Rather, the Parra Neuroscore makes use of the visual reflex test, which is highly diagnostically accurate for SAH injury, and the beam walking and olfactory tests, which are moderately diagnostically accurate for SAH deficits. However, the Parra Neuroscore also includes the insensitive tactile reflex subtest, which likely diminishes its diagnostic accuracy. In the original study, the Parra Neuroscore was capable of detecting behavioral differences between sham and SAH mice 3 days after SAH.17 The number of animals per group was n=7 to 8, which agrees with our sample size estimation of n=8 per group for 2‐group analysis. In 3 more recent articles, Vellimana et al, Han et al, and Wu et al found that the Parra Neuroscore was able to distinguish between sham and injured animals (as well as to evaluate treatment effects) using n=11 to 16,47 n=13 to 20,17 and n=15 to 2848 mice, respectively.
These 3 studies had sample sizes well above those needed to test for statistical significance. One negative study using the Parra Neuroscore was that of Tait et al.46 The authors were unable to detect any significant differences between sham and SAH mice at either 6 or 24 hours after injury.46 However, this is likely because the study was underpowered; only 3 mice were included in the sham group and 10 to 12 mice were included in the injured groups.
Other Neuroscoring Systems
There also exist several other composite neuroscores that combine subtests of sensory‐motor function. Those specifically used in mice after SAH include the neuroscores of Laskowitz's group,49, 50, 51, 52 Plesnila's group,41, 53 McGirt et al,54, 55 and Neulen et al.56 Two other scoring systems used for murine SAH are the 3‐point scale used by Tamargo's group57, 58 and the SHIRPA (SmithKline Beecham Pharmaceuticals; Harwell, MRC Mouse Genome Centre and Mammalian Genetics Unit; Imperial College School of Medicine at St Mary's; Royal London Hospital, St Bartholomew's and the Royal London School of Medicine; Phenotype Assessment) score.59, 60
The Laskowitz Neuroscore combines the spontaneous activity, limb extension, climbing, balance and coordination, side stroking, vibrissae touch, visual, and tactile subtests.49, 50, 51, 52 Of these subtests, the findings of the current study indicate that all except the tactile test are sensitive and specific for SAH. Thus, the diagnostic accuracy of the Laskowitz Neuroscore may be lower than that of the composite neuroscore developed herein because of its inclusion of the tactile subtest.
Another neuroscoring system is that of Plesnila's group, which has been used for the mouse model of SAH via endovascular perforation.41, 53 This neuroscore combines tests of reflexes (grasping, righting, and falling), coordination (head orientation and circling), and general behaviors (spontaneous activity, fur appearance, nibbling, and flight) into a score that ranges from 0 (best performance) to 33 (worst performance). This group has demonstrated that their neuroscore is sensitive enough to detect significant deficits in SAH mice compared with sham mice for up to 7 days with a sample size of 6 to 9 mice per group (analyzed using Kruskal‐Wallis ANOVA on ranks41 or the Friedman test53).
The positive findings by Plesnila's group using this scoring method,41, 53 as well as its unambiguous scoring criteria, warrant investigation of this neuroscore in future studies.
McGirt et al54,55 used 2 different composite neuroscoring systems to test for functional deficits after SAH in mice: a 9‐ to 39‐point scoring system55 and a 5‐ to 27‐point scoring method.54 Both scoring schemes identified behavioral deficits after SAH in mice. Although the specific scoring criteria for each subtest are not clear, the authors used sample sizes of n=14 to 17 per group to test for statistical significance.54, 55 The 5‐ to 27‐point scoring method uses the same subtests as the Parra Neuroscore. However, because the exact scoring criteria for the 2 methods are not reported, it is difficult for others to use either method.
The 3‐point scale used by Tamargo's group has been used in mice subjected to SAH via autologous blood injected into the cisterna magna.57, 58 This scoring method assesses 3 behaviors (namely, posture, grooming, and ambulation). Each behavior is scored either 0 (deficits) or 1 (no deficits), with a maximum score of 3 (uninjured) and a minimum score of 0 (severely injured). The test was performed by 2 independent observers, and the scores were averaged. It has yet to be applied to the other SAH mouse models or evaluated at time points >1 day.
One more interesting scoring system is the SHIRPA score.59, 60 To date, the SHIRPA score has been used in a mouse SAH study (specifically, the autologous blood injection model in the prechiasmatic space).
Boettinger et al observed that mice with moderate to severe SAH (induced using 100–120 μL of blood) presented with significant functional deficits for body position (4 subtests) and motor behavior (10 subtests) and minor deficits in spontaneous activity (single test) and gait (single test) on days 1 and 2 after SAH.59 The SHIRPA score was not used in the current study, and because the exact details of the individual subtest scores are not clear, it is difficult to assess its utility for determining behavioral deficits after SAH in mice.
Finally, although not developed for SAH, the neuroscore by Feldman et al measures mobility, reflexes, behavior, and function61 and has been highlighted as a potentially useful scoring system for SAH in rodents because of its inclusion of motor and sensory tests, as well as beam walking and mobility.6 A modification of this scoring system has been successfully applied to rats subjected to SAH; Yatsushige et al observed modest differences in the functional deficits of untreated versus treated rats after SAH, but no comparison was made between the functional behavior of SAH rats and sham rats.62 Additional studies are needed to determine the utility of the Feldman Neuroscore for detecting functional deficits after SAH.
Toward a Standardized Neuroscore for SAH Studies in Mice
To our knowledge, this is the first study to investigate the utility and diagnostic accuracy of various neurobehavioral scoring systems for assessing deficits after SAH in rodents. The current study used mice, but the findings likely apply to rat studies (although specific sample size estimations may differ). Based on the combined findings of the literature and this study, we cannot recommend the modified Bederson Score or the Katz Score for rodent SAH studies, because large sample sizes are required to test for statistical significance. As for the Garcia Neuroscore, although the sample size needed is modest, the ROC curve analysis suggests that the Parra Neuroscore is a better choice. However, because the Parra Neuroscore incorporates the insensitive tactile reflex subtest, we argue that it is still not the optimal neuroscoring system; we therefore developed a new composite neuroscore using only highly diagnostically accurate subtests. The developed composite neuroscore is significantly more diagnostically accurate than both the Garcia Neuroscore (P=0.0121) and the Parra Neuroscore (P=0.0241). The findings of the current study suggest that the most appropriate scoring system for testing for functional deficits after SAH in mice is the new composite neuroscore developed herein. The new neuroscore was demonstrated to have interoperator variation similar to that of the Garcia and Parra Neuroscores, and because several of its subtests overlap with those of the Garcia and Parra Neuroscores, it can be quickly adopted by other preclinical SAH groups without new costs or training.
Limitations and Future Studies
The current study has several limitations that should be addressed in future studies to fully understand the application of composite neuroscoring systems to experimental SAH studies. First, this study did not investigate all of the available composite scoring systems; future studies should compare other scoring methods with the developed neuroscore. Second, the recommendations from the roundtable committees argue for testing various species, different strains, both sexes, different ages, and even comorbidities. We chose young male C57BL/6 mice specifically because they are the most widely used. However, these results may not apply to other mouse strains or even to female C57BL/6 mice. We are continuing this study in C57BL/6 mice using aged males (10 and 18 months old), females (4, 10, and 18 months old), and a blood injection model of SAH, and we plan to publish these findings in future articles. With respect to other SAH models, the 2 other primary models of SAH in mice are autologous blood injection into the prechiasmatic space and into the cisterna magna. The diagnostic accuracy of the various scoring methods should be investigated in these models in future studies. In addition, the diagnostic accuracy of these composite neuroscores may differ between mice and rats. Thus, although the developed neuroscore is the most appropriate for testing mice subjected to SAH via endovascular perforation, this may not hold true for other SAH models or for endovascular perforation in rats. Third, numerous other functional tests exist for evaluating functional behavior in mice: rotarod, Morris water maze, T‐maze, corner turn test, and forelimb placement test, just to name a few.
The reviews by Jeon et al6 and Turan et al7 provide excellent summaries of current findings for these and other neurobehavioral tests. Many of these other tests require unique equipment or setups to test specific functions, whereas composite neuroscores test several general behaviors and are typically more economical and feasible to perform.
Conclusion
Herein, a composite neuroscore was developed for evaluating behavioral deficits in mice subjected to SAH. The new composite neuroscore was compared with several other functional scoring systems and had greater diagnostic accuracy for SAH injury in mice. The results of this study suggest that the new composite neuroscore is more appropriate for mouse studies of SAH than the modified Bederson Score, the Katz Score, the Garcia Neuroscore, or the Parra Neuroscore.
Sources of Funding
This work was supported by a seed grant from The Vivian L. Smith Department of Neurosurgery at the University of Texas Health Science Center at Houston (McBride).
Disclosures
None.
Supplementary Material
Data S1. Additional data and statistical analyses that were not directly needed in the main paper but support our findings, including specific information on the inter‐operator values.
Figure S1. Variable selection through lasso regression to identify the most important sub‐tests for the composite neuroscore. All sub‐tests were subjected to this process, but only the 10 most frequently selected are shown.
Figure S2. Performance on the Bederson Score for mice over 7 days post‐SAH. Sham n=8, SAH n=8 to 9. A, Individual data points are plotted with the median shown as the line plot. B, Same data as (A), but plotted with lines for each individual animal to show a tendency towards recovery. Analyzed with Scheirer‐Ray‐Hare test with Bonferroni post‐hoc test. No significant difference was observed between the Sham and SAH mice at any time point.
Figure S3. Functional performance on the Katz Score for mice over 7 days post‐SAH. Sham n=8, SAH n=8 to 9. A, Individual data points are plotted with the median shown as the line plot. B, Same data as (A), but plotted with lines for each individual animal to show a tendency towards recovery. Analyzed with Scheirer‐Ray‐Hare test with Bonferroni post‐hoc test. No significant difference was observed between the Sham and SAH mice at any time point.
Figure S4. Neurobehavioral performance on the Garcia Neuroscore for mice over 7 days post‐SAH. Individual mouse performance (reproduced from Figure 4) is connected by a line. Sham n=8, SAH n=8 to 9. Analyzed with Scheirer‐Ray‐Hare test with Bonferroni post‐hoc test. *P<0.05 between Sham and SAH at the indicated time point.
Figure S5. Functional performance assessed in mice over 7 days post‐SAH using the Parra Neuroscore. Individual mouse performance (reproduced from Figure 4) is connected by a line. Sham n=8, SAH n=8 to 9. Analyzed with Scheirer‐Ray‐Hare test with Bonferroni post‐hoc test. *P<0.05 between Sham and SAH at the indicated time point.
Figure S6. Neurobehavioral performance assessed in mice over 7 days post‐SAH using the developed Composite Neuroscore. Individual mouse performance (reproduced from Figure 4) is connected by a line. Sham n=8, SAH n=8 to 9. Analyzed with Scheirer‐Ray‐Hare test with Bonferroni post‐hoc test. *P<0.05 between Sham and SAH at the indicated time point.
Table S1. Statistical Report for Analysis of the Longitudinal Data in Figure 1 Using Scheirer‐Ray‐Hare Tests and a Mixed Effect Model
Table S2. Multiple Comparisons Statistical Report for Analysis of the Longitudinal Data in Figure 1
Table S3. Test Statistics and P‐Value for the Analysis of the Data in Figure 2
Table S4. Statistical Report for Analysis of the Longitudinal Data in Figure 3 Using Scheirer‐Ray‐Hare Test and a Mixed Effect Model (3‐Day Study)
Table S5. Multiple Comparisons Statistical Report for Analysis of the Longitudinal Data in Figure 3
Table S6. Comparison of the ROC Curves for the Garcia Neuroscore, Parra Neuroscore, and the Developed Composite Neuroscore
Table S7. Statistical Report for Analysis of the Longitudinal Data in Figure 4 Using Scheirer‐Ray‐Hare Tests and a Mixed Effect Model
Table S8. Multiple Comparisons Statistical Report for Analysis of the Longitudinal Data in Figure 4
Table S9. Inter‐Operator Variation
Table S10. Inter‐Operator Variation
Table S11. Raw Data for the Sham Mice From the 3‐Day Study
Table S12. Raw Data for the SAH Mice From the 3‐Day Study
Table S13. Raw Data for the Sham Mice From the 7‐Day Study for Bederson Score and Katz Score
Table S14. Raw Data for the Sham Mice From the 7‐Day Study for the Garcia Neuroscore, Parra Neuroscore, and Our Composite Neuroscore
Table S15. Raw Data for the SAH Mice From the 7‐Day Study for Bederson Score and Katz Score
Table S16. Raw Data for the SAH Mice From the 7‐Day Study for the Garcia Neuroscore, Parra Neuroscore, and Our Composite Neuroscore