
Effect of population heterogenization on the reproducibility of mouse behavior: a multi-laboratory study.

S Helene Richter, Joseph P Garner, Benjamin Zipser, Lars Lewejohann, Norbert Sachser, Chadi Touma, Britta Schindler, Sabine Chourbaji, Christiane Brandwein, Peter Gass, Niek van Stipdonk, Johanneke van der Harst, Berry Spruijt, Vootele Võikar, David P Wolfer, Hanno Würbel.

Abstract

In animal experiments, animals, husbandry and test procedures are traditionally standardized to maximize test sensitivity and minimize animal use, assuming that this will also guarantee reproducibility. However, by reducing within-experiment variation, standardization may limit inference to the specific experimental conditions. Indeed, we have recently shown in mice that standardization may generate spurious results in behavioral tests, accounting for poor reproducibility, and that this can be avoided by population heterogenization through systematic variation of experimental conditions. Here, we examined whether a simple form of heterogenization effectively improves reproducibility of test results in a multi-laboratory situation. Each of six laboratories independently ordered 64 female mice of two inbred strains (C57BL/6NCrl, DBA/2NCrl) and examined them for strain differences in five commonly used behavioral tests under two different experimental designs. In the standardized design, experimental conditions were standardized as much as possible in each laboratory, while they were systematically varied with respect to the animals' test age and cage enrichment in the heterogenized design. Although heterogenization tended to improve reproducibility by increasing within-experiment variation relative to between-experiment variation, the effect was too weak to account for the large variation between laboratories. However, our findings confirm the potential of systematic heterogenization for improving reproducibility of animal experiments and highlight the need for effective and practicable heterogenization strategies.

Year:  2011        PMID: 21305027      PMCID: PMC3031565          DOI: 10.1371/journal.pone.0016461

Source DB:  PubMed          Journal:  PLoS One        ISSN: 1932-6203            Impact factor:   3.240


Introduction

Experimental results that cannot be reproduced are scientifically worthless and a nuisance if published in the literature where they may create uncertainty and hinder scientific progress. Poor reproducibility and lack of external validity are an issue throughout laboratory research from mass spectrometry proteomic profiling [1] and microarray analysis [2]–[5] to the social and behavioral sciences [6], [7]. In animal experiments, however, where the lives of animals are highly valuable, poor reproducibility is also an ethical issue. Thus, animal care and use regulations require scientists not to unnecessarily duplicate previous experiments [8]–[10]. This explicitly assumes that animal results are reproducible by different laboratories, and that duplication therefore represents unnecessary animal use. However, a review of the scientific literature casts serious doubt on this assumption, indicating that poor reproducibility may be rather widespread [11]–[22].

In animal experiments, animals, housing and experimental conditions are traditionally standardized to render the animals' responses to experimental treatments more homogeneous, thereby reducing within-experiment variation and increasing test sensitivity [23], [24]. Because higher test sensitivity allows a reduction of sample size, standardization is also promoted for ethical reasons as a means of reducing animal use [25], [26]. Moreover, standardization across experiments is assumed to reduce between-experiment variation, thereby improving reproducibility among laboratories [24], [27].

However, by reducing within-experiment variation, standardization may limit inference to the specific experimental conditions [28], [29]. Given that most biological traits exhibit environmental plasticity [30], different experimental conditions may produce different experimental outcomes. Because laboratories inherently vary in many experimental features (e.g. experimenter, room architecture), conditions are generally more homogeneous within than between laboratories. Therefore, standardization inevitably induces disparity between results from different laboratories. In contrast, controlled variation of experimental conditions may render the animals within experiments more heterogeneous, thereby improving the external validity and hence the reproducibility of experimental results [28], [29], [31].

Indeed, we have recently shown in mice that standardization may increase the incidence of spurious results in behavioral tests, accounting for poor reproducibility between replicate experiments, while systematic variation of experimental conditions (heterogenization) attenuated spurious results, thereby improving reproducibility [32]. However, our findings were challenged because they were based on retrospective analysis, and because heterogenization may be logistically unfeasible [33]. We therefore tested standardization against a simple form of heterogenization for reproducibility across four independent replicate experiments. Systematic variation of only two factors was sufficient to mimic the range of differences between the replicate experiments, resulting in almost perfect reproducibility [34].

In a real multi-laboratory situation, however, between-experiment variation might be considerably greater. Recent multi-laboratory studies revealed large effects of the laboratory as well as strong interactions between genotype and the laboratory environment [13], [17], [35], [22]. To investigate whether simple forms of heterogenization within laboratories render populations of mice sufficiently heterogeneous to guarantee robust results across laboratories, we designed a multi-laboratory study involving six laboratories, and compared the effect of standardization against heterogenization on the reproducibility of behavioral differences between two common inbred strains of mice.

Although heterogenization significantly increased within-experiment variation relative to between-experiment variation, the effect was too weak to account for the large variation between laboratories and improve reproducibility substantially. Thus, further research is needed to establish effective and practicable heterogenization strategies.

Methods

Experimental design

Each of six laboratories used 64 female mice of two inbred strains (C57BL/6NCrl, DBA/2NCrl, n = 32 each) and examined them for strain differences in five commonly used behavioral tests (barrier test, vertical pole test, elevated zero maze, open field test, novel object test). To test heterogenization against standardization, each laboratory successively conducted the same experiment twice, using two different experimental designs with half of the mice allocated to each design. In the standardized design, experimental conditions were standardized, while they were systematically varied in the heterogenized design. For heterogenization, we selected two experimental factors (test age, cage enrichment) that typically vary between experiments in different laboratories, and chose three factor levels A, B and C for each factor (age: A = 12 weeks old, B = 8 weeks old, C = 16 weeks old; cage enrichment: A = nesting material, B = shelter, C = climbing structures). Within each laboratory, the two factors were standardized to factor level A in the standardized design and systematically varied across B and C using a 2×2 factorial design in the heterogenized design (Fig. 1). Because both age and enrichment have been demonstrated to affect and interact with a wide variety of potential outcome measures [36]–[42], heterogenization across these two factors was expected to create a range of different phenotypes within experiments, thereby increasing the external validity and thus the reproducibility of the results across the six laboratories.
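The allocation described above can be written out as a short sketch. It is only an illustration of the stated factor levels and group sizes: the function and constant names are hypothetical, and the assignment of a particular block number to a particular factor combination is an assumption, not something the text specifies.

```python
from itertools import product

STRAINS = ("C57BL/6NCrl", "DBA/2NCrl")
TEST_AGE_WEEKS = {"A": 12, "B": 8, "C": 16}
ENRICHMENT = {"A": "nesting material", "B": "shelter", "C": "climbing structures"}
CAGES_PER_STRAIN = 4   # per design and laboratory
MICE_PER_CAGE = 4


def build_design(design):
    """Return one record per cage for one laboratory and one design."""
    if design == "standardized":
        # Both factors are fixed at level A in every cage.
        cells = [("A", "A")] * CAGES_PER_STRAIN
    elif design == "heterogenized":
        # 2x2 factorial over levels B and C; one factor combination per block.
        # Which block number receives which combination is an assumption.
        cells = list(product("BC", repeat=2))
    else:
        raise ValueError(f"unknown design: {design!r}")

    cages = []
    for block, (age_level, enrichment_level) in enumerate(cells, start=1):
        for strain in STRAINS:
            cages.append({
                "design": design,
                "block": block,
                "strain": strain,
                "test_age_weeks": TEST_AGE_WEEKS[age_level],
                "enrichment": ENRICHMENT[enrichment_level],
                "n_mice": MICE_PER_CAGE,
            })
    return cages


def mice_per_laboratory():
    """Total number of mice one laboratory needs for both designs."""
    return sum(cage["n_mice"]
               for design in ("standardized", "heterogenized")
               for cage in build_design(design))
```

Each design yields eight cages per laboratory (four per strain), so the sketch reproduces the 16 mice per strain and design, and the 64 mice per laboratory, given in the text.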
Figure 1

Experimental design.

Each of six laboratories used 64 female mice of two inbred strains (C57BL/6NCrl, DBA/2NCrl) ordered in two consecutive batches (n = 16 per batch and strain), with each batch being allocated to one experimental design. Upon arrival of a batch, the 16 mice per strain were randomly assigned to four cages in groups of four. To test heterogenization against standardization, we selected two experimental factors (test age, cage enrichment) and chose three factor levels A, B and C for each factor. Within each laboratory, the two factors were either standardized to factor level A (standardized design, uniform grey) or systematically varied across B and C using a 2×2 factorial design (heterogenized design, varying grey). According to the 2×2 factorial design of the heterogenized condition, study populations were divided into four blocks that were also characterized by similar microenvironmental differences due to cage position within the rack.

Besides the two experimental factors that were standardized or heterogenized depending on the experimental design, the following factors were controlled and standardized in both experimental designs and all six laboratories: order of tests, test protocols, animal supplier, and housing protocols (number of animals/cage, housing period prior to testing, position of cages within the rack, interval of cage changes). All other variables varied between laboratories depending on laboratory standards. These included: details of the housing conditions (e.g. local tap water, food type, local bedding material, cage size), physical arrangement of housing and testing rooms (e.g. local room architecture, humidity, lighting, temperature), test apparatuses, tracking software (e.g. ANYmaze or EthoVision), experimenter, handling method (e.g. with/without gloves), identification method (e.g. ear punctures, fur markings), arrival and test dates, and test time (see Tables 1 and 2).
Table 1

Laboratory-specific housing conditions and animal care routines (STAN = standardized design, HET = heterogenized design; HR = housing room; rev. = reversed light cycle; dates as dd/mm/yy).

| | Giessen | Muenster | Zürich | Mannheim | Munich | Utrecht |
|---|---|---|---|---|---|---|
| Arrival and test dates | | | | | | |
| Arrival (STAN) | Tue, 04/11/08 | Tue, 09/06/09 | Wed, 02/09/09 | Wed, 06/05/09 | Thu, 06/08/09 | Wed, 03/06/09 |
| Arrival (HET) | Tue, 11/11/08 | Tue, 26/05/09 | Wed, 26/08/09 | Wed, 20/05/09 | Thu, 13/08/09 | Wed, 27/05/09 |
| Tests (STAN) | Mon, 24/11/08 | Mon, 29/06/09 | Mon, 28/09/09 | Mon, 01/06/09 | Mon, 31/08/09 | Mon, 29/06/09 |
| Tests (HET) | Mon, 01/12/08 | Mon, 15/06/09 | Mon, 21/09/09 | Mon, 15/06/09 | Mon, 07/09/09 | Mon, 22/06/09 |
| Housing conditions | | | | | | |
| Food type | Altromin 1324 | Altromin 1324 | Kliba Nafag 3430 | Ssniff R/M-H | Altromin 1324 | CRM (E) Expanded |
| Bedding | GRADE 6, Hellmann | Allspan, Höveler | Lignocel S3-4 | Rehofix MK-2000 | LTE E-001, ABEDD | Woodchips, ABEDD |
| Cage size | Type III | Type III | Type III | Type III | Type III | Type II elongated |
| Physical arrangement of the housing room (HR) | | | | | | |
| Humidity | 35±5% | 60±5% | 50±5% | 50±5% | 60±5% | 67±10% |
| Temperature | 21±1°C | 20±1°C | 21±1°C | 20±0.2°C | 21±1°C | 21±1°C |
| Lighting | 8–20 white light, 20–8 lights off | 8–20 white light, 20–8 lights off | 20–8 white light, 8–20 lights off (rev.) | 19–7 white light, 7–19 lights off | 8–20 white light, 20–8 lights off | 19–7 white light, 7–19 red light (rev.) |
| Animal care | | | | | | |
| Who? | experimenter | experimenter | experimenters (2) | experimenters (2) | experimenters (2) | animal keeper |
| Cage cleaning | 1/week, Friday | 1/week, Tuesday | 1/week, Wednesday | 1/week, Wednesday | 1/week, Friday | 1/week, Monday |
| Handling | gloves, tail | gloves, tail | without gloves, tail | gloves, tail | gloves, tail | without gloves, tail |
| Disturbance | none | other mice in HR | other mice in HR | other mice in HR | other mice in HR | radio background |
| (for first 10 days only) | | | | | | |
| Identification | fur markings | fur markings | tail markings, black marker | tail markings, black marker | ear punctures | ear punctures, tail markings, black marker |

Table 2

Laboratory-specific testing procedures and apparatuses.

| | Giessen | Muenster | Zürich | Mannheim | Munich | Utrecht |
|---|---|---|---|---|---|---|
| Behavioral testing | | | | | | |
| Test room | separate room | separate room | separate room | separate room | same as housing room | separate room |
| Distance | about 15 m | about 30 m | about 10 m | about 1 m | same room | about 10 m |
| Lighting | all tests: white light, 60 lx | white light, OF: 120 lx, EZM: 220 lx | all tests: white light, 20 lx | red light, EZM: white light, 25 lx | white light, all tests: 60 lx | red light |
| Test time | start: 9 a.m., inactive phase | start: 10 a.m., inactive phase | start: 9–10 a.m., active phase | start: 10 a.m., active phase | start: 9 a.m., inactive phase | start: 9 a.m., active phase |
| Software | EthoVision 3.1 | ANYmaze | EthoVision 3.0 | EthoVision XT | ANYmaze | EthoVision XT |
| Cleaning | 30% Isopropanol | 30% EtOH | water | 70% EtOH | 80% EtOH | water, cleaner |
| Experimenter | PhD student | PhD student | postdoc, biotechnician | postdoc (EZM, OFT/NOT), biotechnician (BT, VPT), 2 trainees (assistance) | postdoc, student assistant | biotechnician |
| Apparatuses | | | | | | |
| BT | Macrolon Type III, barrier: 3 cm high, 0.6 cm wide, dark grey plastic | Macrolon Type III, barrier: 3 cm high, 0.5 cm wide, transparent plastic | Macrolon Type III, barrier: 3 cm high, 0.5 cm wide, dark grey plastic | Macrolon Type III, barrier: 1 cm high, 0.6 cm wide, transparent plastic | Macrolon Type III, barrier: 3 cm high, 0.5 cm wide, dark grey plastic | Macrolon Type III, barrier: 3 cm high, 0.6 cm wide, dark grey plastic |
| VPT | wooden pole, Ø 2 cm, length: 45 cm | wooden pole, Ø 2 cm, length: 45 cm | wooden pole, Ø 2 cm, length: 45 cm | wooden pole, Ø 2 cm, length: 45 cm | wooden pole, Ø 2 cm, length: 45 cm | wooden pole, Ø 2 cm, length: 45 cm |
| EZM | light grey plastic, elevated 40 cm, Ø 46 cm, 5.5 cm width | grey plastic, covered with a white plastic runway, elevated 40 cm, Ø 46 cm, 5.5 cm width | light grey plastic, elevated 40 cm, Ø 46 cm, 5.5 cm width | grey plastic, covered with black cardboard paper, elevated 50 cm, Ø 46 cm, 6 cm width | light grey plastic, elevated 40 cm, Ø 46 cm, 5.5 cm width | light grey plastic, elevated 40 cm, Ø 46 cm, 5.5 cm width |
| OFT+NOT | 4 adjacent dark grey plastic arenas (50 cm × 50 cm) | 4 adjacent grey arenas with white ground plates (40 cm × 40 cm) | 4 adjacent white plastic arenas (50 cm × 50 cm) | 4 adjacent white plastic arenas (50 cm × 50 cm) | 4 adjacent dark grey plastic arenas (50 cm × 50 cm) | rat phenotyper, transparent plastic (45 cm × 45 cm) |

Laboratories

The study was conducted in the following six laboratories: (1) Animal Welfare and Ethology, University of Giessen (H. Würbel), (2) Behavioural Biology, University of Muenster (N. Sachser), (3) Psychoneuroendocrinology, Max Planck Institute of Psychiatry, Munich (C. Touma), (4) Animal Models in Psychiatry, Central Institute of Mental Health, Mannheim (P. Gass), (5) Delta Phenomics B.V. in Utrecht (B. Spruijt) and (6) Institute of Anatomy, University of Zürich (D. Wolfer). Each lab provided space in a conventional colony room for animal housing and a test room for behavioral testing. Animal care was provided by each lab's animal care staff together with the designated experimenter of each laboratory who also implemented cage enrichments and conducted behavioral testing throughout the two test weeks. Experimenters were a PhD student in Giessen, a PhD student in Muenster, a postdoctoral research fellow and a student assistant in the Munich lab, a technician and a postdoctoral research fellow in Mannheim, a biotechnician in the Utrecht lab, and a postdoctoral research fellow and a biotechnician in Zürich. All experimenters were adept in working with mice and conducting behavioral tests. Experimenters were not blinded to strain, age, housing conditions and experimental design. However, because our outcome measure was reproducibility across laboratories and each laboratory had its own experimenter, the experimenters' knowledge about the animals and their expectations about the outcome of the study could not bias our outcome measure.

Between-laboratory standardized conditions and procedures

Animals and housing conditions

The 384 female mice (C57BL/6NCrl, DBA/2NCrl, n = 192 each) were obtained from Charles River Laboratories (Sulzfeld, Germany) aged nine weeks for the standardized condition, and aged five and thirteen weeks for the heterogenized condition (Fig. 2). Each lab independently ordered 32 females per strain that were supplied consecutively in two batches (n = 16/strain), one for the standardized design and one for the heterogenized design. The order of supply was balanced across laboratories with three laboratories (Giessen, Mannheim, Munich) starting with the standardized design and three laboratories (Muenster, Zürich, Utrecht) starting with the heterogenized design. Upon arrival, the mice were randomly assigned to same-strain groups of four and housed in conventional polycarbonate cages with sawdust, standard mouse diet and tap water ad libitum. Depending on the experimental design, cages contained additional equipment: Cages of the standardized design (A) additionally contained two soft tissue papers (Tork, SCA Hygiene Products GmbH, Wiesbaden, Germany), while half of the cages of the heterogenized design (B) contained a mouse house (MouseHouse, Tecniplast, Italy) and one tissue paper and the other half (C) a climbing structure (18 cm long, 10 cm high) [43], a wooden ladder (3 rungs, each 5 cm long, 14 cm high; Trixie Heimtierbedarf, Tarp, Germany) and one tissue paper. Cages were cleaned once per week, except for the test week to minimize disruption due to cage cleaning before testing. Mice were housed under these conditions for three weeks before the onset of the test phase (Fig. 2). Temperature and relative humidity were stable within laboratories, but differed between them (see Table 1). Similarly, all mice were held under a constant 12 h light-dark cycle, but the time schedules differed between laboratories (see Table 1).
Figure 2

Experimental procedure followed by each laboratory.

The 64 mice per laboratory, aged nine weeks for the standardized design, and five and thirteen weeks for the heterogenized design, were supplied in two independent batches (n = 16/strain). Upon arrival, the mice were group-housed in conventional polycarbonate cages for three weeks. Cages of the standardized design (red) contained two pieces of tissue paper (nesting material), while half of the cages of the heterogenized design (blue) contained a mouse house and the other half a climbing structure and a wooden ladder. Subsequent to the three-week housing phase, mice were subjected to a battery of five behavioral tests. The whole experimental procedure lasted five weeks, including a three-week housing phase, a one-week test phase and one week shift between the behavioral tests of the standardized and the heterogenized design. The order was balanced across the six laboratories with three laboratories starting with the standardized and three laboratories with the heterogenized design.

Depending on the position in the rack, cages may differ in local environmental conditions (e.g. temperature, humidity, lighting, and disturbance) due to variation in proximity to ventilation, lights and human traffic. To avoid position bias, we controlled for cage position in the experimental design [44]. Thus, the eight cages of one design were stacked in two horizontal lines of four cages in one rack, with cages of DBA/2NCrl and C57BL/6NCrl mice balanced for horizontal and vertical position in the rack, and each vertical pair of cages of C57BL/6NCrl and DBA/2NCrl mice was treated as a block, assuming greater microenvironmental similarity within blocks than between blocks.
All procedures complied with the regulations covering animal experimentation within the EU (European Communities Council Directive 86/609/EEC) and in the countries in which the experiments were conducted (Germany: Deutsches Tierschutzgesetz; The Netherlands: Dutch Animal Welfare Act; Switzerland: Schweizerisches Tierschutzgesetz). They were conducted in accordance with the institutions' animal care and use guidelines and, where necessary, approved by the national and local authorities. The German labs (Giessen, Munich, Münster and Mannheim) did not need formal approval of the study by governmental authorities, because the study did not involve any harmful procedures. In the Utrecht lab, the study was approved by the Dutch Ethical Commission (Lely-DEC) under license number DPh-09-04, and the lab's permission to conduct animal experiments was granted by their general license number 24900 provided by the Dutch Government. In the Zürich lab, the study was approved by the Swiss Federal Veterinary Office under license number 204/2008. Moreover, all efforts were made to minimize the number of animals used and the severity of procedures applied in this study.

Behavioral testing

Mice were subjected to five behavioral tests that are commonly performed in drug-screening or behavioral phenotyping studies. They were conducted in the same order in all six laboratories: day 1: barrier test (BT), vertical pole test (VPT), day 2: elevated zero maze (EZM), day 3: open field test (OFT) and day 4: novel object test (NOT). To monitor health status, mice were weighed prior to and after testing (Fig. 2). Testing order of cages was balanced across strain and rack position, and the mice of one cage were tested either simultaneously (OFT, NOT) or successively (BT, VPT, EZM). Apparatuses were cleaned with water or alcohol solution between trials.

Barrier test

To test the exploratory drive, mice were individually placed into an unfamiliar, empty type III Macrolon cage, divided in two halves by a plastic hurdle (see Table 2). At the beginning of each trial, a mouse was placed into one of the compartments according to a pseudo-random schedule. The test was finished when the mouse either crossed the barrier (all four paws on the other side of the barrier) or a maximum time of 300 s elapsed without the mouse climbing over the barrier. The latency to cross the barrier was used as measure of exploratory behavior.

Vertical pole test

The vertical pole test is a measure of motor coordination and balance that requires minimal equipment [45]. A wooden pole, approximately 2 cm in diameter and 40 cm long, was wrapped with cloth tape for improved traction. The mouse was placed on the centre of the pole that was held in a horizontal position by hand. The pole was then gradually lifted to a vertical position. The test was finished when the mouse either fell off the pole or held fast to it for 180 s. The latency to fall off the pole was used as dependent variable.

Elevated zero maze

On an EZM, the exploratory drive of mice is competing with their natural avoidance of heights and open spaces [45]. The EZM is a modification of the elevated plus maze that was first introduced and pharmacologically validated in rats [46], [47]. The advantage of the EZM is that it lacks the ambiguous central square of the traditional plus maze. The apparatus consisted of a circular platform, elevated 40–50 cm above the floor, divided into two open and two closed sectors enclosed by walls of about 20 cm height. At the beginning of each trial, the mouse was placed in one of the two closed sectors and behavior was recorded for 300 s. By using specialized tracking software, the total path moved on the maze as well as the path moved within, the time spent in, and the number of entries into the open and closed sectors, were automatically recorded. Moreover, head dips, stretched postures and rearing behavior were manually calculated according to the following definitions: Head dip (HD): The animal dips its head over the side of the maze while its body remains on the maze. Protected HD: Head dips are considered protected when the animal dips its head over the side of the maze while its body remains in a closed segment. Stretched posture: Elongation of the body while maintaining the hind paws fixed, followed by retraction. Rearing: Standing upright on the hind limbs with or without touching a wall surface.

Open field test

The open field test is the most widely used behavioral test since it was developed by Hall [48], [49]. It has been validated pharmacologically as a test of anxiety [50], but is also used to measure exploratory and locomotor drive in laboratory rodents [45]. The apparatus consisted of an open box, 40 cm×40 cm minimum size, virtually divided into various zones (corners, 5 cm wall zone, centre). Mice were placed into the centre of the empty open field arena and videotracked for 10 min. The time spent in, the distance travelled within, and the number of entries into each zone, were calculated. In addition, the total distance moved during the 10 min session was analyzed and the number of fecal boli dropped was counted at the end of each trial.

Novel object test

In combination with an open field test, the novel object test serves to discriminate between approach and avoidance tendencies towards novel stimuli [51]. Twenty-four hours after the open field test, the animals were re-exposed for 10 min to the same arena with a novel object (black pine cone, autoclaved, about 7 cm high and 6 cm in diameter mounted on a metal plate; Miroflor, Greiz, Germany) placed upright in the centre of the arena. In addition to the zones defined for the open field test, two zones surrounding the object (15 cm, 25 cm diameter) were defined as exploration zones. The zone defined by the object itself was excluded from the exploration zones to avoid confounding “sitting on the object” with “object exploration”. Again, time spent in, distance travelled within, and the number of entries into the various zones were calculated. Object exploration time and frequency were assessed using the time spent in, and the frequency of entering the exploration zones. Moreover, the total distance moved during the 10 min session was analyzed and the number of fecal boli dropped was counted at the end of each trial.

Statistical analysis

The aim of the present study was to compare a standardized design with a heterogenized design to examine whether they differ with respect to the reproducibility of strain differences across laboratories, and sample size was determined by the minimum number of animals needed for this purpose. For each factor combination of the heterogenized design we used only one cage per strain (the absolute minimum), with 4 mice in each cage (in total n = 16 mice per strain, experimental design, and laboratory). We considered 4 mice per cage the absolute minimum to allow us to compare within-cage variance with between-cage variance as an important control measure to assess whether heterogenization had worked. Moreover, we considered 6 labs sufficient to obtain a reasonable estimate of the effect of heterogenization on reproducibility of behavioral strain differences. Except for Utrecht, where one animal had died during the test phase, data recordings were complete. However, for 21 out of the 384 animals we had to exclude single values from the final analysis. Reasons for exclusion were (i) mice jumping out of the apparatus (especially in the BT), (ii) mice performing stereotypic circling in the open-field arena, and (iii) problems with video tracking. These missing values were replaced by series means. All data were analyzed using General Linear Models (GLM). To meet the assumptions of parametric analysis, residuals were graphically examined for homoscedasticity and outliers, and, when necessary, the raw data were transformed using square-root, logarithmic or angular transformations (for a list of transformations see Table 3). For the analysis we selected 29 behavioral measures from the five behavioral tests, including common measures of activity, anxiety and exploratory drive, 20 of which were automatically recorded using specialized software (Table 3).
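Replacement of excluded values by the series mean can be sketched as follows. This is an illustration only: the function name is hypothetical, missing values are represented as `None`, and we assume here that a "series" is the full set of observed values of one behavioral measure; the text does not state whether means were taken over all animals or within subgroups.

```python
from statistics import fmean


def replace_missing_with_series_mean(values):
    """Replace None entries by the mean of the observed values of the series.

    Assumption: 'series' is the complete list of values for one measure.
    """
    observed = [v for v in values if v is not None]
    if not observed:
        raise ValueError("series contains no observed values")
    series_mean = fmean(observed)
    return [series_mean if v is None else v for v in values]
```

Note that mean replacement leaves the mean of the measure unchanged but slightly reduces its variance, which is of little consequence when, as here, only single values of 21 out of 384 animals are affected.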
Table 3

Complete list of behavioral measures used for the analysis and the transformations applied to meet the assumptions of parametric analysis (NT = no transformation, log = log10(y+1)-transformed, sqrt = square-root-transformed, angular = arcsin(square-root(y))-transformed).

| Behavioral test | Behavioral measure | Transformation |
|---|---|---|
| BT | latency to climb over the barrier [s] | log |
| VPT | latency to fall off the pole [s] | log |
| EZM | total path moved [cm] | sqrt |
| | path moved in closed sectors [cm] | NT |
| | time spent in closed segments [s] | angular |
| | number of open segment entries | sqrt |
| | total head dips | sqrt |
| | protected head dips | NT |
| | bolus count | sqrt |
| | total stretched postures | sqrt |
| | rearing frequency | sqrt |
| OFT | bolus count | NT |
| | total path moved [cm] | sqrt |
| | path moved within centre [cm] | sqrt |
| | path moved within corners [cm] | NT |
| | corner time [s] | angular |
| | centre time [s] | angular |
| | entries centre | NT |
| | entries corners | NT |
| NOT | bolus count | NT |
| | total path moved [cm] | sqrt |
| | path moved within wall zone [cm] | sqrt |
| | path moved within exploration zone 1 [cm] | sqrt |
| | path moved within exploration zone 2 [cm] | sqrt |
| | exploration frequency 1 (zone Ø 15 cm) | sqrt |
| | exploration frequency 2 (zone Ø 25 cm) | sqrt |
| | exploration time 1 (zone Ø 15 cm) [s] | sqrt |
| | exploration time 2 (zone Ø 25 cm) [s] | sqrt |
| | wall time [s] | angular |

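The three transformations named in the caption of Table 3 can be written as a minimal sketch. The function names are hypothetical. The arcsine transformation is only defined for values between 0 and 1, so the sketch assumes that times in seconds were first expressed as a proportion of the trial duration (for example 300 s for the EZM); this conversion step is an assumption and is not described in the text.

```python
import math


def log_transform(y):
    """log10(y + 1), as defined in the caption of Table 3."""
    return math.log10(y + 1)


def sqrt_transform(y):
    """Square-root transformation."""
    return math.sqrt(y)


def proportion_of_trial(seconds, trial_seconds):
    """Express a time as a proportion of the trial (assumed preprocessing)."""
    return seconds / trial_seconds


def angular_transform(p):
    """arcsin(sqrt(p)) for a proportion p in [0, 1]; result in radians."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("angular transformation requires a proportion in [0, 1]")
    return math.asin(math.sqrt(p))
```

For instance, a mouse spending 150 s of a 300 s trial in the closed segments would enter the angular transformation as `angular_transform(proportion_of_trial(150, 300))`.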
In a first step, we determined mean strain differences ( = mean C57BL/6NCrl mice - mean DBA/2NCrl mice) for all 29 behavioral measures to compare variation among the six laboratories for the standardized and the heterogenized design. Next, we analyzed the results of each laboratory separately (laboratory-specific analysis) as if each experiment had been conducted independently and assessed the main effect of ‘strain’ on each of the 29 behavioral measures using a GLM split by experimental design. Based on the 2×2 factorial design of the heterogenized condition, and to account for microenvironmental differences due to cage position in the rack, each experiment was divided into the four blocks of cage pairs, and ‘block’ included as a blocking factor in the GLM: y = strain + block. Including ‘block’ as a blocking factor in the GLM allowed us to control for between-block variation, thereby reducing variance in the data and increasing test sensitivity [52], [53]. To explore the difference between the two experimental designs in the variation among laboratories (lab) further, we analyzed the two experimental designs separately using the GLM: y = strain +lab + strain× lab. We then compared the resulting F-ratios of the ‘strain-by-lab’ interaction term between the two experimental designs using a second GLM blocked by behavioral measure: y = experimental design+behavioral measure. The rationale for using F-ratios for this comparison was twofold, namely (i) that F-ratios are scale invariant, so the different scales of the different test measures became unimportant, and (ii) that F-ratios in a GLM have a discrete null hypothesis (F = 1) which we could test against. The latter is because F-ratios reflect ‘variance components’, so we can think of the true variance of any factor as being = variance due to that factor+residual variance. 
Thus, in a GLM, if the variance due to the factor under test is 0, then the F-ratio will ideally be 1 (because F-ratio = (variance due to the factor + residual variance)/residual variance). Therefore, if the average F-ratio of the ‘strain-by-laboratory’ interaction term were equal to 1, this would mean that strain differences did not vary between laboratories, which would essentially be the same as perfect reproducibility. To test this statistically, we used a post-hoc t-test of the null hypothesis that F equals 1. In the GLM used to determine variation among the six laboratories (y = strain + lab + strain × lab), the residual variance accounts for all the within-laboratory variance (except variance due to ‘strain’). However, the residual variance of this model reflects various aspects of within-laboratory variance, including variation due to the heterogenization factors, cage position in the rack, and individual differences. To determine how exactly heterogenization influenced between-experiment variation, we therefore calculated an additional GLM that included ‘block’ and the interaction between ‘strain’ and ‘block’ as factors: y = strain + lab + block(lab) + strain × lab + strain × block(lab). Including ‘block’ and the ‘strain-by-block’ interaction (with ‘block’ nested within ‘lab’) in the GLM allowed us to calculate an additional F-ratio by dividing the mean squares (MS) of the ‘strain-by-lab’ interaction by the MS of the ‘strain-by-block’ interaction (F = MS(strain × lab)/MS(strain × block(lab))). This F-ratio reflects the partitioning of the strain-by-block variance among all 24 blocks in the six laboratories into variance among blocks of different laboratories (i.e. between-laboratory variation), and variance among blocks within the same laboratory (i.e. within-laboratory variation). It therefore represents an ideal measure to determine how heterogenization affected within-experiment variation relative to between-experiment variation [34].
Our prediction was that this ratio would be smaller for the heterogenized design, and ideally equal to 1. If it were equal to 1 or lower, this would mean that heterogenization generated as much variance between the four blocks within a laboratory as exists between laboratories, or even more. All statistical tests were conducted using the software package SPSS/PASW (version 17.0 for Windows).
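The logic of testing this ratio against 1 can be written out with expected mean squares. The following is only a sketch under stated assumptions: a balanced design, ‘strain’ treated as a fixed factor, ‘lab’ and ‘block’ treated as random factors, n animals per strain-by-block cell and b blocks per laboratory. The variance-component symbols are introduced here for illustration and do not appear in the original analysis.

```latex
\begin{align}
E\!\left[MS_{strain \times lab}\right]
  &= \sigma^2_{e} + n\,\sigma^2_{strain \times block(lab)} + n\,b\,\sigma^2_{strain \times lab} \\
E\!\left[MS_{strain \times block(lab)}\right]
  &= \sigma^2_{e} + n\,\sigma^2_{strain \times block(lab)} \\
\frac{E\!\left[MS_{strain \times lab}\right]}{E\!\left[MS_{strain \times block(lab)}\right]}
  &= 1 + \frac{n\,b\,\sigma^2_{strain \times lab}}
              {\sigma^2_{e} + n\,\sigma^2_{strain \times block(lab)}}
\end{align}
```

Under these assumptions the ratio approaches 1 when the strain-by-lab component is zero, and it shrinks towards 1 as the strain-by-block component grows, which is the direction in which heterogenization is predicted to act.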

Results

Effects of strain and laboratory

Regardless of the experimental design, significant and, in some cases, large main effects of ‘laboratory’ and ‘strain’ were found for nearly all variables (Table 4). As expected, the absolute values measured were quite variable among laboratories (see Fig. 3 and Fig. 4). Such additive effects of the laboratory, however, occurred in all five behavioral tests, including measures of activity (e.g. total path moved in the open field), exploration (e.g. novel object exploration time and frequency), and anxiety (e.g. time spent in and frequency of entering the centre in the open field; Table 4). For example, mice tested in Muenster were, on average, less active than those tested in other labs (measured by ‘total path moved in the open field test’ or ‘total path moved in the novel object test’). In particular, the number of stretched postures on the elevated zero maze varied most remarkably among the six laboratories (see Fig. 3).
Table 4

F-ratios of all behavioral measures for the main effects of ‘strain’ and ‘laboratory’ based on the GLM (split by experimental design): y = strain + laboratory + strain × laboratory; *** p ≤ 0.001, ** p ≤ 0.01, * p ≤ 0.05, NS p > 0.05.

| Behavioral measure | Strain (standardized) | Strain (heterogenized) | Laboratory (standardized) | Laboratory (heterogenized) | Strain × Laboratory (standardized) | Strain × Laboratory (heterogenized) |
|---|---|---|---|---|---|---|
| latency to climb over barrier [s], BT | 22.575*** | 14.080*** | 3.988** | 3.022* | 2.526* | 2.803* |
| latency to fall off [s], VPT | 113.205*** | 38.735*** | 2.704** | 1.918 NS | 1.630 NS | .509 NS |
| total path moved [cm], EZM | 6.811** | 10.683*** | 30.542*** | 15.480*** | 11.197*** | 14.467*** |
| path moved in closed sectors [cm], EZM | 0.584 NS | 0.931 NS | 33.348*** | 17.354*** | 6.364*** | 13.981*** |
| time in closed segments [s], EZM | 84.642*** | 35.154*** | 12.580*** | 7.451*** | 5.236*** | 2.027 NS |
| open segment entries, EZM | 56.711*** | 31.446*** | 11.333*** | 0.861 NS | 10.182*** | 2.867* |
| total head dips, EZM | 106.193*** | 61.963*** | 2.982* | 6.613*** | 3.353** | 3.137** |
| protected head dips, EZM | 42.265*** | 38.389*** | 1.801 NS | 8.978*** | 2.166 NS | 3.101* |
| bolus count, EZM | 105.702*** | 76.452*** | 1.600 NS | 9.820*** | 2.839* | 3.874** |
| total stretched postures, EZM | 0.131 NS | 0.050 NS | 122.767*** | 151.486*** | 5.744*** | 3.376** |
| number of “rearing”, EZM | 8.715** | 4.870* | 18.002*** | 3.692** | 1.469 NS | 4.434*** |
| bolus count, OFT | 25.920*** | 36.667*** | 0.920 NS | 6.600*** | 2.033 NS | 4.136** |
| path centre [cm], OFT | 48.366*** | 10.774*** | 37.967*** | 46.303*** | 6.160*** | 7.372*** |
| path corner zone [cm], OFT | 108.431*** | 74.080*** | 119.952*** | 134.583*** | 3.028* | 1.977 NS |
| total path moved [cm], OFT | 2.255 NS | 1.177 NS | 39.026*** | 41.679*** | 5.441*** | 4.942*** |
| corner time [s], OFT | 88.109*** | 13.775*** | 34.907*** | 43.196*** | 3.863** | 7.921*** |
| centre time [s], OFT | 92.615*** | 49.561*** | 36.893*** | 43.855*** | 3.450** | 3.161** |
| entries centre, OFT | 10.492*** | 8.338** | 36.579*** | 47.698*** | 10.515*** | 10.290*** |
| entries corner zone, OFT | 24.171*** | 19.427*** | 30.691*** | 26.147*** | 11.699*** | 4.669*** |
| bolus count, NOT | 48.835*** | 41.776*** | 6.836*** | 7.038*** | 1.016 NS | 2.666* |
| total path moved [cm], NOT | 15.087*** | 14.049*** | 44.393*** | 20.109*** | 4.068** | 3.243** |
| path wall zone [cm], NOT | 17.781*** | 30.000*** | 42.941*** | 29.831*** | 3.026* | 1.544 NS |
| path exploration zone 1 [cm], NOT | 78.443*** | 52.096*** | 9.794*** | 13.029*** | 8.301*** | 3.960** |
| path exploration zone 2 [cm], NOT | 34.177*** | 8.186** | 11.888*** | 8.660*** | 17.232*** | 4.899*** |
| exploration frequency 1, NOT | 100.276*** | 49.733*** | 2.755* | 2.868* | 14.347*** | 6.363*** |
| exploration frequency 2, NOT | 50.252*** | 19.235*** | 6.223*** | 2.956* | 14.259*** | 5.762*** |
| exploration time 1 [s], NOT | 88.788*** | 75.483*** | 10.916*** | 11.121*** | 3.415** | 1.630 NS |
| exploration time 2 [s], NOT | 29.488*** | 3.355 NS | 5.747*** | 8.094*** | 16.765*** | 6.832*** |
| wall time [s], NOT | 0.678 NS | 0.064 NS | 12.360*** | 22.463*** | 2.322* | 2.477* |
Figure 3

Number of stretched postures on the elevated zero maze shown by C57BL/6NCrl and DBA/2NCrl mice.

Data are presented as means (+ s.e.m., square-root-transformed, n = 16/strain and laboratory). The example illustrates large effects of the laboratory in the standardized (A) and heterogenized (B) design. Moreover, the direction of strain difference differed between Giessen and Munich in the standardized design.

Figure 4

Object exploration time in the novel object test shown by C57BL/6NCrl and DBA/2NCrl mice.

Data are presented as means (+ s.e.m., square-root-transformed, n = 16/strain and laboratory). The example illustrates large effects of strain and laboratory in the standardized (A) and heterogenized (B) design. Moreover, the direction of strain difference differed between Giessen and Zürich in the heterogenized design.

Furthermore, comprehensive analysis of all data revealed strong differences between C57BL/6NCrl and DBA/2NCrl mice in all five behavioral tests (Table 4). Depending on the specific laboratory, however, the direction of strain differences varied for some behavioral measures. For example, on the elevated zero maze DBA/2NCrl mice showed more stretched postures than C57BL/6NCrl mice in Giessen (standardized design: F1,27 = 16.050, p<0.001; heterogenized design: F1,27 = 16.077, p<0.001), but fewer in Munich (standardized design: F1,27 = 12.949, p<0.001), while they did not differ in Muenster, Zürich, and Utrecht (Fig. 3). In the novel object test, DBA/2NCrl mice explored the novel object much longer than C57BL/6NCrl mice in Giessen (standardized design: F1,27 = 101.067, p<0.001; heterogenized design: F1,27 = 24.892, p<0.001), while they explored it for a shorter time in Zürich (heterogenized design: F1,27 = 5.760, p<0.05) (Fig. 4). For most behavioral measures, however, the laboratory environment was critical in determining the size rather than the direction of strain effects.
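The laboratory-specific tests quoted here have 1 and 27 degrees of freedom, which is what the blocked model y = strain + block yields for 32 animals (2 strains, 4 blocks): 32 - 1 - 1 - 3 = 27. The sketch below shows how such an F-ratio can be computed by hand. It assumes a balanced design with the same number of animals in every strain-by-block cell. The function name, the record layout and the example values are hypothetical, and the original analysis was run in SPSS/PASW, not with this code.

```python
from collections import defaultdict


def strain_f_blocked(records):
    """F-ratio for 'strain' in the additive model y = strain + block.

    records: iterable of (strain, block, value) tuples from ONE laboratory
    and ONE experimental design. Assumes a balanced design.
    Returns (F, df_strain, df_residual).
    """
    records = list(records)
    n_total = len(records)
    grand_mean = sum(value for _, _, value in records) / n_total

    by_strain = defaultdict(list)
    by_block = defaultdict(list)
    for strain, block, value in records:
        by_strain[strain].append(value)
        by_block[block].append(value)

    def between_ss(groups):
        # Sum of squares of group means around the grand mean.
        return sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2
                   for g in groups.values())

    ss_total = sum((value - grand_mean) ** 2 for _, _, value in records)
    ss_strain = between_ss(by_strain)
    ss_block = between_ss(by_block)
    # Everything not explained by strain or block is treated as residual.
    ss_resid = ss_total - ss_strain - ss_block

    df_strain = len(by_strain) - 1
    df_block = len(by_block) - 1
    df_resid = n_total - 1 - df_strain - df_block
    f_ratio = (ss_strain / df_strain) / (ss_resid / df_resid)
    return f_ratio, df_strain, df_resid


# Hypothetical toy data: 2 strains x 2 blocks x 2 animals per cell.
example = [
    ("C57BL/6NCrl", 1, 5.0), ("C57BL/6NCrl", 1, 7.0),
    ("C57BL/6NCrl", 2, 6.0), ("C57BL/6NCrl", 2, 8.0),
    ("DBA/2NCrl", 1, 1.0), ("DBA/2NCrl", 1, 3.0),
    ("DBA/2NCrl", 2, 2.0), ("DBA/2NCrl", 2, 4.0),
]
print(strain_f_blocked(example))
```

Because ‘block’ absorbs part of the total variation, the residual mean square is smaller than it would be in a strain-only model, which is the gain in test sensitivity that blocking is meant to provide.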

Standardization versus heterogenization

Between-experiment variation

To explore the effect of the experimental design on the reproducibility of behavioral strain differences, the effect of ‘strain’ on each of the 29 behavioral measures was assessed separately for each laboratory. Although the average effect of ‘strain’ varied considerably among the six laboratories in the heterogenized design, the standardized design produced even more variable outcomes (Fig. 5). Moreover, the average F-ratios of the ‘strain’ effect were considerably larger in the standardized design (Fig. 5).
Figure 5

Variation of strain main effects across the six laboratories in both designs.

For each laboratory and experimental design, the main effect of ‘strain’ was separately calculated and displayed in terms of the mean F-ratio (+ s.e.m., square-root-transformed) across all 29 behavioral measures. Although the strain effect varied considerably among laboratories in the heterogenized design, the standardized design produced even more variable outcomes. Moreover, average F-ratios for ‘strain’ were considerably higher in the standardized design, indicating that treatment effects may be systematically overestimated by standardization.

To confirm these findings statistically, we used the GLM y = strain + lab + strain × lab (see Statistical Analysis) to determine the F-ratios of the ‘strain-by-lab’ interaction term for each of the 29 behavioral measures, which were then compared between the two experimental designs. Indeed, these F-ratios were significantly smaller in the heterogenized design (F1,28 = 4.222, p = 0.049), indicating improved reproducibility of strain differences among laboratories in the heterogenized design (Fig. 6). However, including ‘block’ in the GLM (y = strain + lab + block(lab) + strain × lab + strain × block(lab)) weakened this effect to a non-significant trend (F1,28 = 3.405, p = 0.076), indicating that part of the effect was due to cage position, independent of the heterogenization factors. Moreover, in both designs the average F-ratio was significantly different from 1 (t-test of the null hypothesis that F = 1: standardized design: t28 = 7.660, p<0.001; heterogenized design: t28 = 8.214, p<0.001), demonstrating that strain effects varied substantially among laboratories in both designs (Fig. 6). Further graphical examination of the mean strain differences across the six laboratories confirmed this, although strain differences were somewhat more consistent in the heterogenized design (Fig. 7).
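The two tests used in this paragraph can be sketched in a few lines. With two designs and one F-ratio per behavioral measure and design, the F for ‘design’ in the model y = experimental design + behavioral measure equals the square of a paired t statistic, so 29 measures give the 1 and 28 degrees of freedom reported here. The function names and the example values are hypothetical. Whether the test against 1 was run on raw or square-root-transformed F-ratios is not stated in the text, so the transform below is an assumption (on the square-root scale the null value is still 1).

```python
import math
import statistics


def one_sample_t(values, mu):
    """t statistic and degrees of freedom for H0: mean(values) == mu."""
    n = len(values)
    standard_error = statistics.stdev(values) / math.sqrt(n)
    return (statistics.mean(values) - mu) / standard_error, n - 1


def design_f_blocked_by_measure(f_standardized, f_heterogenized):
    """F for 'design' in y = design + measure, one value per measure and design.

    With only two designs this is the squared paired t statistic,
    with degrees of freedom (1, n_measures - 1).
    """
    diffs = [s - h for s, h in zip(f_standardized, f_heterogenized)]
    t, df = one_sample_t(diffs, 0.0)
    return t * t, (1, df)


# Hypothetical interaction F-ratios for four measures (not study data).
f_std = [6.2, 3.0, 11.2, 5.4]
f_het = [4.9, 2.0, 9.8, 3.1]

# Assumed square-root transform before the parametric tests.
sqrt_std = [math.sqrt(f) for f in f_std]
sqrt_het = [math.sqrt(f) for f in f_het]

print(design_f_blocked_by_measure(sqrt_std, sqrt_het))  # design comparison
print(one_sample_t(sqrt_std, 1.0))                      # test against F = 1
```

Blocking by behavioral measure removes the large differences in typical F-ratio size between measures, so only the within-measure difference between the two designs enters the test.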
Figure 6

Variation between laboratories in the standardized and in the heterogenized design.

The variation in strain differences is displayed as mean F-ratios (+ s.e.m.) of the ‘strain-by-laboratory’ interaction term calculated for 29 behavioral measures. F-ratios were determined separately for the two experimental designs, square-root-transformed to meet the assumptions of parametric analysis, and then compared using a GLM blocked by ‘behavioral measure’. F-ratios of the ‘strain-by-laboratory’ interaction terms were significantly lower in the heterogenized design (F1,28 = 4.222, p = 0.049), indicating lower between-experiment variation.

Figure 7

Variation of mean strain differences in the standardized and heterogenized design across the six laboratories.

Four examples of selected behavioral measures from four of the five behavioral tests are displayed: (A) Latency to fall off the pole in the vertical pole test, (B) number of open segment entries on the elevated zero maze, (C) number of corner entries in the open field test and (D) path travelled within the exploration zone in the novel object test. Strain differences varied considerably between laboratories in both designs, but were somewhat more consistent in the heterogenized design. Each laboratory tested 16 mice per strain for each experimental design.

Within-experiment variation

To assess whether improved reproducibility in the heterogenized design was caused by heterogenization shifting variation from between-experiment variation to within-experiment variation, within-experiment variances were averaged across the six laboratories and compared between the two designs for each of the 29 behavioral measures. The average within-experiment variance was larger in 23 out of 29 measures in DBA/2NCrl mice and in 18 out of 29 measures in C57BL/6NCrl mice, suggesting that heterogenization systematically shifted variance from between-experiment to within-experiment variation. To confirm this statistically, we used the GLM y = strain + lab + block(lab) + strain× lab + strain × block(lab) and calculated the F-ratio of the ‘strain-by-lab’ interaction term divided by the ‘strain-by-block’ interaction term (see Statistical Analysis). These F-ratios were significantly smaller in the heterogenized design (F1,28 = 4.678, p = 0.039, Fig. 8), demonstrating that heterogenization did indeed increase within-experiment variation (variance among blocks of the same laboratory) relative to between-experiment variation (variance among blocks of different laboratories).
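For two strains and a balanced design, the ratio MS(strain × lab)/MS(strain × block(lab)) can be computed directly from the block-level strain differences, because the common factor (animals per cell divided by 2) cancels between numerator and denominator. The ratio then equals a one-way ANOVA F of the block-level differences grouped by laboratory. The sketch below relies on that simplification. The function name, the input layout and the toy values are hypothetical and are not taken from the study.

```python
def between_vs_within_f(diffs_by_lab):
    """MS(strain x lab) / MS(strain x block(lab)) for two strains.

    diffs_by_lab: dict mapping each laboratory to a list of block-level
    strain differences (mean of strain 1 minus mean of strain 2 per block).
    Assumes the same number of blocks per laboratory and the same number
    of animals per strain-by-block cell.
    Returns (F, (df_between, df_within)).
    """
    labs = list(diffs_by_lab.values())
    n_labs = len(labs)
    n_blocks = len(labs[0])
    if any(len(d) != n_blocks for d in labs):
        raise ValueError("balanced design required: equal blocks per lab")

    lab_means = [sum(d) / n_blocks for d in labs]
    grand_mean = sum(lab_means) / n_labs

    # Variation of strain differences between laboratories.
    df_between = n_labs - 1
    ms_between = (n_blocks * sum((m - grand_mean) ** 2 for m in lab_means)
                  / df_between)

    # Variation of strain differences among blocks within a laboratory.
    df_within = n_labs * (n_blocks - 1)
    ms_within = sum((x - m) ** 2
                    for d, m in zip(labs, lab_means)
                    for x in d) / df_within

    return ms_between / ms_within, (df_between, df_within)


# Hypothetical toy data: two laboratories with four blocks each.
toy = {"lab A": [1.0, 2.0, 3.0, 4.0], "lab B": [2.0, 3.0, 4.0, 5.0]}
print(between_vs_within_f(toy))
```

With six laboratories and four blocks each, as in this study, the degrees of freedom of this ratio would be 5 and 18. A value near 1 means the strain difference varies as much among blocks of one laboratory as it does between laboratories.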
Figure 8

Between-experiment variation versus within-experiment variation.

To assess the relative weight of between-laboratory variation versus within-laboratory variation, an F-ratio was calculated that reflects the partitioning of the ‘strain-by-block’ variance between all 24 blocks of one experimental design into variance due to variation between laboratories and variance due to variation within laboratories. For this, the mean squares of the ‘strain-by-laboratory’ interaction term were divided by the mean squares of the ‘strain-by-block’ interaction term. Data are displayed as mean F-ratios (+ s.e.m.; square-root-transformed) across all 29 behavioral measures for both conditions. F-ratios were significantly smaller in the heterogenized design (F1,28 = 4.678, p = 0.039), demonstrating that heterogenization increased within-experiment variation relative to between-experiment variation.

Discussion

Strain effects

C57BL/6 and DBA/2 mice, two of the most widely used inbred strains of laboratory mice, are known to differ markedly in many behavioral tasks [54]–[59]. Therefore, it was not surprising to find significant and often large strain differences in almost all behavioral measures assessed in the present study. In line with previous studies, C57BL/6NCrl mice generally showed less anxiety-related behavior than DBA/2NCrl mice [55], [56]. General levels of locomotor activity as measured by the total path moved in the tests, however, did not differ much between the two strains, although C57BL/6 mice are considered to be more active than DBA/2 mice [55], [56].

The impact of the laboratory environment

We also found considerable differences in the absolute values measured in different laboratories, confirming previous findings [13], [22], [35]. In particular, the frequency of stretched postures on the elevated zero maze differed markedly among laboratories. Such large differences in the absolute values may be typical for manually recorded measures and reflect experimenter-dependent variability, highlighting the importance of inter-observer reliability training [60], [61] and the value of automated data recording [62], [63]. However, such additive differences among laboratories do not normally threaten the validity of strain differences. For example, Munich and Muenster used ANYmaze (Stoelting Co.) for video-tracking of open field and elevated zero maze performance, while the other four laboratories used two different versions of EthoVision (Noldus Information Technology). Differences in software functioning may indeed explain some variation in the absolute values measured, but should not affect the size and direction of strain differences. Despite the marked phenotypic differences between these two strains, however, we also found variation in the direction of strain differences among laboratories in some measures, indicating that the same test conducted in different laboratories may lead to fundamentally different conclusions. Such dramatic strain-by-laboratory interactions may arise when different strains respond differently to the specific environmental or testing conditions of the different laboratories. When using strains that are phenotypically less distinct, as is often the case when transgenic strains are compared with wild-type strains [64], this may actually be the norm rather than an exception given that many phenotypic states are highly dependent on environmental conditions [11], [19], [30], [57], [65]. In the present study, some aspects of the housing and testing conditions were equated between laboratories (e.g. supplier, testing order, position of cages within the rack), while others remained laboratory-specific (e.g. local room architecture, tracking software, experimenter, time of testing, handling and identification method). However, because both additive and non-additive laboratory effects may arise from any or all of these laboratory-specific aspects, any further explanation of these effects in terms of single factors is impossible.

Reproducibility of the results

Poor reproducibility is typically caused by interactions of genotype with the specific laboratory conditions. To avoid this, scientists are generally advised to strengthen efforts of standardization both within and between laboratories [23], [27], [66], [67]. However, attempts to avoid poor reproducibility by more rigorous standardization are misleading. If fully effective, standardization within laboratories would decrease variation within study populations to zero [28], and therefore, each experiment would turn into a single-case study with zero information gain, producing statistically significant, but irrelevant results that lack generality under even slightly different conditions [28], [29]. Indeed, the average F-ratios of the ‘strain’ effect were considerably larger in the standardized design, indicating that standardization may systematically overestimate strain main effects. The obvious reason for this is that interactions between strain and the laboratory-specific conditions are mistaken for strain main effects [32], [34]. Instead of rigorous standardization, we proposed systematic variation of experimental conditions to render populations of experimental animals more heterogeneous, thereby improving the external validity of results across the unavoidable variation among laboratories [32], [34]. The findings reported here are somewhat ambiguous with respect to the efficacy of heterogenization in improving reproducibility. Thus, although heterogenization did have an effect in the predicted direction, this effect was rather weak, and both heterogenization and standardization resulted in relatively poor reproducibility. The reason for this might be that either heterogenization did not work with our selection of behavioral measures or that the type of heterogenization employed here was not effective enough. 
Both the reproducibility of behavioral measures and the effect of heterogenization on their reproducibility may vary depending on the exact selection of measures. However, heterogenization should have the weakest effect on those measures that are least sensitive to environmental conditions. Such measures should also be highly reproducible under both standardized and heterogenized conditions. The present analysis was based on a selection of 29 behavioral measures from five tests that are widely used in behavioral phenotyping or drug screening studies. The fact that nearly all of these measures varied considerably among laboratories suggests that they were highly sensitive to environmental conditions. Therefore, our selection of measures is unlikely to account for the relatively weak improvement of reproducibility by heterogenization. Instead, our findings suggest that the study populations generated within laboratories by the form of heterogenization employed here did not adequately represent the range of variation between the six laboratories. The reason for this may be that age and cage enrichment were poor heterogenization factors, or that the specific levels of these factors were not different enough to induce sufficient variation in behavioral phenotypes.

Efficacy of heterogenization

Our choice of heterogenization factors was based on practical considerations and on studies demonstrating that both age and enrichment affect, and interact with, a variety of potential outcome measures [36]–[42], [68], [69]. It is possible that more distinct levels of these factors would have produced stronger effects. Moreover, the pretest housing period was limited to three weeks for logistic reasons. Perhaps a longer exposure of the mice to the laboratory-specific conditions would have strengthened the effects of the heterogenization factors. On the other hand, more extreme variation of age and enrichment, or a longer housing period would have rendered heterogenization less practicable. This raises the question whether other factors might be more effective in heterogenization. Many of the measures obtained from behavioral tests are highly sensitive to test conditions. Paylor [33] suggested running experiments in several batches tested on different days. While this may be an effective strategy since test conditions are likely to vary from day to day, it is not a well controlled strategy, and efficacy may vary greatly both within and between laboratories. Alternatively, specific factors of the test conditions may be used for systematic heterogenization similar to age and enrichment in the present study. For example, test time, background noise, and illumination level have all been shown to affect test responses [70]–[73]. It is possible that heterogenization through factors of the test conditions would be more effective because their effects on the animals' test responses are more immediate. Taken together, the fact that the results varied greatly between laboratories in both designs confirms the need for effective heterogenization strategies to guarantee reproducible test results. Therefore, further research is needed to identify and validate factors that exert sufficiently strong effects on behavioral phenotypes. 
Because poor reproducibility occurs throughout animal experimentation, this research should aim at heterogenization strategies that are either applicable to a wide range of different studies or are specifically tailored to specific types of studies.

Conclusions

Despite strong effects of the laboratory on nearly all behavioral measures in both designs, the findings of this study confirm our earlier findings [32], [34], and indicate that systematic heterogenization may also improve reproducibility in a real multi-laboratory situation. By systematically increasing within-experiment variation relative to between-experiment variation, heterogenization tended to improve reproducibility compared to standardization. However, the ratio of between-experiment to within-experiment variation was far greater than 1 in both designs, indicating that between-laboratory variation was substantially greater than within-laboratory variation. This underscores the need for more powerful heterogenization strategies to guarantee reproducibility of results across the large variation among different laboratories.