The statistical power of studies for the assessment of side effects of toxicants on honeybees conducted according to current guidelines is often limited. A new test design and modified field methods have therefore been developed to decrease uncertainty and variability and to be able to detect small effects. The new test design comprises a monitoring phase (before the tunnel phase) for the selection of honeybee colonies and modified methods, which include assessments of colony strength, an evaluation of the cell content of all cells of hives using photos and digital analysis, and the use of video recordings for the assessment of foraging activity and forager mortality. With the proposed new study design and the modified field methods variability between hives was considerably reduced, which resulted in a marked reduction of the minimum detectable difference (MDD). This makes it possible to address the Specific Protection Goals defined by the European Food Safety Authority and to gain unprecedented insight into the development of hives and driving factors.
Currently, honeybee field and semi-field trials conducted to assess side effects of toxicants, such as pesticides, are based on visual assessments of colony strength and evaluations of brood success for a relatively small number of brood cells. The test guidance documents for the conduct of such studies [1, 2] propose that colony strength is estimated visually from the comb area covered by bees and, regarding brood development, require the evaluation of at least 100 cells containing eggs. EFSA [3] proposes to consider at least 200 eggs. Given a typical brood nest size of about 3000 to 6000 cells [4], this is still a rather small proportion. The assessment of colony strength based on visual estimation (e.g. using the Liebefelder method [5]) seems to provide a rather useful average accuracy. However, it has not been evaluated which statistical test-power can actually be reached by visual estimations. Notably, test-power depends on the variability of a measurement rather than on its average. Other endpoints measured in honeybee trials are flight activity and mortality (e.g. from counts in dead bee traps). The measurement of these parameters is also typically only approximate, and they often represent a small snapshot of what happens over a day. For example, flight activity is measured at three locations in 1 m squares for a few seconds [1]. The area covered by dead bee traps or linen sheets for counting dead bees is also limited. Considering that honeybees may fly throughout the day and that in a semi-field study a tunnel should measure at least 40 m², these measurements provide only a very small snapshot of the actual flight activity, and the proportion of foragers that die cannot be estimated (since it is unknown how many bees foraged and how many bees died outside of traps or linen sheets).
The standard methods used in OECD (2007) [1] and OEPP/EPPO (2010) [2] clearly have their justification and are very useful for routinely evaluating potential effects of chemical substances. But to reach a better understanding of the robustness of an evaluation and of why colonies develop the way they do, it would be helpful to obtain more data and more accurate data. For example, it would be helpful to understand exactly which stores and brood are available in a hive and how these affect the colony’s development, or which proportion of all foragers dies after the application of a test substance. New field methodology is also needed when a higher statistical power is desired. In fact, one key issue of recent evaluations of pesticides by authorities was the limited test-power, which makes it impossible to detect small effects [6]. Recently, EFSA (2013) [3] proposed that field or semi-field studies should be able to detect an effect size of 7% regarding colony strength (a value of 7% has previously been proposed to represent a negligible effect size regarding forager mortality [7] and has then been adopted as a value representing a negligible effect level for colony strength [3]).
The reasons why a high test-power is hard to reach in practice include the uncertainty of current methods (e.g. visual estimation) and the analysis of a limited fraction of a hive (e.g. number of cells). The high variability introduced by current test designs (e.g. selection of hives) is a further reason for insufficient test-power. To overcome the high variability among honeybee colonies, Delaplane et al. [8] described methods to obtain equalized honeybee colonies. This included the so-called "classical objective mode", which is a synthesis of methods presented by Harbo [9, 10, 11, 12] and Delaplane and Harbo [13]. In this method empty hives are pre-stocked with brood, empty combs, syrup feeders and a caged queen. Then worker bees are added. This mode was adapted for investigations of Varroa destructor, e.g. in [14, 15, 16, 17], and of colony growth [18]. The second mode was called the "Shook swarm objective mode" by Delaplane et al. [8]. In this method workers of the same origin are put into new and empty hives without brood. Later the original queens are added.
In the present study the aim was also to reduce variability. However, this was not done by manipulation of colonies, but by (i) a selection of colonies after a monitoring phase (before the tunnel phase) lasting approximately four weeks, (ii) the assessment of entire hives (including all cells of a hive, instead of tracking the development in a selected number of brood cells) and (iii) applying different methods for measuring colony strength and mortality, in order to see which method is the most sensitive. The key principle is the reduction of variability by selecting hives from a larger set of hives (see i). In the following we refer to this study design as the ‘low uncertainty and variability test’ (LUV test). This methodology was tested in a semi-field study conducted under Good Laboratory Practice (GLP). Results from our semi-field study demonstrate that the variability of both field and semi-field trials can be decreased significantly with the proposed new test design and methodology.
Material and methods
Test design
The study design was a modification of OECD (2007) [1]. The study was conducted under Good Laboratory Practice (GLP) in a large oilseed rape field near Heidelberg, Germany. Since the aim of the semi-field study was not to assess the toxicity of a test substance but to test new field methodology, only a control group (treated with tap water) and a reference group (treated with 400 g/ha dimethoate) were used. The study was divided into three phases: 1. A monitoring phase of about four weeks before the tunnel phase (during this time colonies were kept at the bee keeping facility), during which colonies were assessed regarding colony size, brood development, food stores and mortality in dead bee traps. 2. A ten-day tunnel phase, during which colonies were placed in tunnels (100 m²) on an oilseed rape field in full bloom. Application was conducted two days after the colonies had been placed in the tunnels. 3. A post-tunnel phase of about four weeks, during which monitoring continued (during this time colonies were again kept at the bee keeping facility). Drone brood was removed when capped. Throughout the whole study the following endpoints were measured (see below for details on the methods): colony strength, content of all cells of all hives (incl. brood, nectar, pollen), and dead bees and larvae in traps. Visual assessment of foraging activity was only conducted in the tunnel phase, and measurements of forager activity and mortality by videography were only conducted in the tunnel and post-tunnel phases.
Step 1: Four week monitoring phase
The study was started with 16 colonies with sister queens (Carniolan honeybees) obtained from a commercial beekeeper. During the monitoring phase (before the tunnel phase) five colonies were excluded because two colonies did not contain related sister queens, one hive had no queen at all, one colony had slightly elevated Varroa destructor infestation and bees of one hive were very aggressive. The infestation of the hives with Varroa destructor was assessed via counting natural mite drops. All hives were compared regarding their Varroa counts. Then the hive with the highest Varroa infection was eliminated as this infection rate was above the acceptable Varroa infection level. Hence, at the end of the monitoring phase eleven colonies were considered for the selection for the tunnel phase.
Selection of colonies at the end of the monitoring phase
From these eleven colonies eight colonies were selected and randomly assigned to the control and the reference group (four colonies per group). The selection had the aim of achieving colonies of similar strength during the tunnel phase. This same rationale is used in many laboratory toxicity trials, where very ‘similar’ animals, e.g. of similar age and strain are selected in order to decrease variability (and increase test-power). This was achieved based on the development of colony size throughout the pre-tunnel monitoring phase (as measured by weight), the number of capped brood cells per hive and similar mortality (measured in dead bee traps).
Step 2: A ten-day tunnel phase
After selection, colonies were placed in tunnels, where they remained for ten days. Toxicant application (reference substance) was conducted two days after the start of the tunnel phase. This step (tunnel phase) was identical to the procedure described in OECD 75.
Step 3: Post-tunnel phase
After the tunnel phase colonies were relocated to their original location and monitored for about four additional weeks. This phase made it possible to assess potential recovery of colonies.
Estimation of colony size
The limited accuracy of visual estimation of colony strength following the Liebefelder method results in some uncertainty which decreases test-power [19, 20]. Since this variability alone can be a reason for not reaching a high test-power, colony size was measured based on two alternative methods in addition to visual estimation: 1. Weighing of hives with and without bees. 2. Photography of bees on frames. Colony size was estimated with all three methods during all phases of the study.
Weighing: Hives were closed in the evening after flight activity. The next morning the closed hives were weighed and afterwards entrances were opened. Subsequently, all parts of the hives were weighed without bees after gently brushing the bees off. A weight of 100 mg per adult bee was assumed to calculate the number of bees. This number represents all adult bees including all foragers.
Adult bee photography: All frames of all hives were photographed to count the number of adult bees. Photography was conducted simultaneously for all hives to exclude any bias due to changes of weather, which might affect foraging activity. The number of bees was counted automatically in photos using the software HoneybeeComplete 6.0 (WSC Scientific GmbH). This number represents all adult bees without active foragers. The accuracy of automated counting had previously been validated (accuracy was ~80–90%, automatically counted vs. real no. of bees). Correction factors reflecting this accuracy were applied.
Finally, for comparison, visual estimations of colony size were also conducted following the Liebefelder method [5].
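The arithmetic behind the weight-based and photo-based estimates can be sketched in a few lines of Python. This is only an illustration: the function names and example values are hypothetical, and dividing the automated count by the validated accuracy is an assumption about how the correction factors were applied, since the text does not give the exact procedure. Only the 100 mg per adult bee is taken from the methods above.

```python
BEE_MASS_MG = 100  # mass per adult bee, as stated in the methods


def bees_from_weight(hive_with_bees_g, hive_without_bees_g, bee_mass_mg=BEE_MASS_MG):
    """Estimate the number of adult bees from the weight difference of a hive."""
    net_g = hive_with_bees_g - hive_without_bees_g
    if net_g < 0:
        raise ValueError("hive with bees cannot be lighter than hive without bees")
    # Convert grams to milligrams, then divide by the mass of one bee.
    return round(net_g * 1000 / bee_mass_mg)


def corrected_photo_count(raw_count, accuracy):
    """Correct an automated photo count (assumption: accuracy = counted / real)."""
    if not 0 < accuracy:
        raise ValueError("accuracy must be positive")
    return raw_count / accuracy


# Hypothetical example: a hive weighing 32.5 kg with bees and 31.0 kg without.
n_weight = bees_from_weight(32500.0, 31000.0)
# Hypothetical example: 8500 bees counted in photos at 85% counting accuracy.
n_photo = corrected_photo_count(8500, 0.85)
```

Note that the two estimates are not expected to agree exactly, since the weight-based number includes foragers while the photo-based number excludes bees that are out foraging.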
Estimation of brood success and food stores
Since the evaluation of a limited number of brood cells as proposed in OECD (2007) [1] or EFSA (2013) [3] results in an uncertainty that increases the measured variability of brood termination rates between hives [21], all frames of the hives were photographed with a 36 MP camera and the content of all cells (usually more than 3000 brood cells, more than 100 000 cells per hive) was evaluated. With these photos the development of brood and the amount of nectar and pollen stores were assessed. Cells and cell content were recognized with the software HoneybeeComplete 6.0, and the content of cells was manually verified and corrected when necessary. Brood development was also evaluated with this software. Brood photography was conducted following the time intervals proposed in OECD (2007) [1]. In addition, brood photography was conducted at weekly intervals during monitoring and after the end of photography according to OECD (2007) [1].
Flight activity and forager mortality
Measurements of flight activity were planned in weekly intervals during and after the tunnel phase using video recordings of the entrances of all hives over the entire activity phase (from dawn to dusk; see S1 File). Actual dates varied by 1–2 days depending on weather conditions. Recordings were processed with the software VideoCounter 1.1 (WSC Scientific GmbH) to count the number of bees exiting and entering the hives. The software was previously validated regarding the accuracy (which was 108.7% for bees entering the hive and 82.7% for bees leaving the hive) of counting and correction factors were applied to obtain corrected counts. These counts reflect foraging activity and from these counts forager mortality can be estimated by subtracting the daily number of entering bees from the number of leaving bees. In addition, flight activity was also assessed in three 1 m squares per colony as proposed in OECD (2007) [1], but longer observation periods of 30 seconds were used.
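The calculation of forager mortality from video counts can be illustrated with a short Python sketch. The accuracy values are those reported above; everything else is an assumption made for illustration: the function name, the example counts, and in particular the interpretation of accuracy as automated count divided by true count, so that a raw count is corrected by dividing by the accuracy.

```python
ACCURACY_ENTERING = 1.087  # 108.7% for bees entering the hive (from the text)
ACCURACY_LEAVING = 0.827   # 82.7% for bees leaving the hive (from the text)


def daily_forager_mortality(raw_leaving, raw_entering):
    """Return (lost foragers, fraction of leaving foragers lost) for one day.

    Assumption: corrected count = raw automated count / accuracy.
    """
    leaving = raw_leaving / ACCURACY_LEAVING
    entering = raw_entering / ACCURACY_ENTERING
    lost = leaving - entering  # bees that left but did not return
    fraction = lost / leaving if leaving > 0 else 0.0
    return lost, fraction


# Hypothetical raw counts for one hive and one day.
lost, fraction = daily_forager_mortality(raw_leaving=8270, raw_entering=10327)
```

Because the corrected number of leaving bees appears in the denominator, this approach yields mortality relative to all foragers, which counts in traps or on sheets cannot provide.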
Dead bee counting according to OECD 75
Dead bees were counted using dead bee traps (type underbasket) during all phases of the study. During the tunnel phase dead bees were additionally counted on 80 cm wide sheets placed in the centre, front and end of tunnels.
Weather data
Weather data, including temperature and precipitation, were obtained from the weather station nearest to the study field (3.1 km distance to the study field; non-GLP).
Statistical analysis
Limit of detection
To evaluate which effect size could be detected the minimum detectable difference (MDD) was calculated according to Brock et al. [22]. MDD is based on the assumption of normal data distribution, which may not be expected with regard to colony size. However, MDD was still used as it is a measure that is already established in the risk assessment of pesticides (e.g. [23]) and that is frequently requested by authorities. MDD calculations were conducted for the number of adults obtained from weight measurements, from photography and by visual estimations (Liebefelder method).
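For orientation, the MDD calculation can be sketched as follows. This is a minimal sketch, not the authors' script (the analysis was run in R): it assumes the common form MDD = t(1-α, df) · sqrt(s²·(1/n1 + 1/n2)) with pooled variance s² and a one-sided test, expressed as a percentage of the control mean. The Student t quantile is obtained numerically so that only the Python standard library is needed, and the example data are hypothetical.

```python
import math
from statistics import mean, variance


def t_pdf(x, df):
    """Density of Student's t distribution."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return c * (1 + x * x / df) ** (-(df + 1) / 2)


def t_cdf(x, df, steps=2000):
    """CDF via Simpson's rule on [0, |x|] plus symmetry around zero."""
    a = abs(x)
    h = a / steps
    s = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        s += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = s * h / 3
    return 0.5 + area if x >= 0 else 0.5 - area


def t_quantile(p, df):
    """Quantile by bisection; valid for 0.5 <= p < 1."""
    lo, hi = 0.0, 50.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def mdd_percent(control, treatment, alpha=0.05):
    """Minimum detectable difference as % of the control mean (one-sided)."""
    n1, n2 = len(control), len(treatment)
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * variance(control) + (n2 - 1) * variance(treatment)) / df
    t_crit = t_quantile(1 - alpha, df)
    mdd = t_crit * math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return 100 * mdd / mean(control)


# Hypothetical colony sizes (adult bees, arbitrary units), four hives per group.
example = mdd_percent([100, 110, 90, 100], [95, 105, 100, 100])
```

The formula makes the two levers explicit: MDD shrinks when the between-hive variance is reduced (the aim of the colony selection) or when the number of replicates per group is increased.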
Hypothesis testing
Differences in colony strength between the control and the reference were assessed statistically using a t-test (as there were only two test groups; significance level α = 0.05). Before conducting this test, data were checked for normality using the Shapiro-Wilk test, and homogeneity of variances was tested using Bartlett’s test. The statistical analysis was conducted in R [24].
Analysis of factors determining brood termination
Brood termination can be described as a function of constitutive in-hive variables such as the number of pollen cells. To evaluate the relevance of these variables, generalized linear models (GLMs) were generated in R [24, 25] and compared on the basis of AICc (AIC corrected for small sample sizes). Since the dependent variable BTRegg is a proportion, a beta distribution was assumed with a logit link function. BTRegg of the control group (four hives) measured at nine points in time was considered. Besides the single variables, their quadratic terms and two-way interactions were also considered as linear components in the models (e.g. $y = a_1 x_1 + a_2 x_2 + a_3 x_3 + a_4 x_4 + a_5$ with $x_3 = z_1^2$ and $x_4 = z_2 z_3$). Furthermore, highly correlated variables (r > 0.7) were excluded in advance to avoid multiple incorporation of the same effect and to enable a proper regression analysis.
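The model comparison step can be illustrated with a small Python sketch. It assumes the standard small-sample correction AICc = AIC + 2k(k+1)/(n − k − 1), where k is the number of estimated parameters and n the number of observations (here 4 hives × 9 time points = 36); whether the authors' R packages count k in exactly this way is not stated. The function names are hypothetical, and the AICc values in the example are those reported in Table 2.

```python
def aicc(log_likelihood, k, n):
    """AIC with small-sample correction (assumed standard formula)."""
    if n - k - 1 <= 0:
        raise ValueError("too many parameters for this sample size")
    aic = 2 * k - 2 * log_likelihood
    return aic + (2 * k * (k + 1)) / (n - k - 1)


def delta_aicc(aicc_by_model):
    """Difference of each model's AICc to the lowest AICc among all models."""
    best = min(aicc_by_model.values())
    return {name: round(value - best, 2) for name, value in aicc_by_model.items()}


# AICc values as reported in Table 2.
table2 = {
    "CB + OB": -36.30,
    "CB + OB + OB^2": -38.84,
    "CB + OB + PC + CB*PC": -41.58,
}
deltas = delta_aicc(table2)
```

The correction term grows quickly as k approaches n, which is why adding further parameters to a model fitted on 36 observations is penalised more strongly than under plain AIC.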
Results
Pre-monitoring
The pre-monitoring of the study started with 16 colonies obtained from a commercial beekeeper and labelled as colonies with sister queens from 2016. Over a period of 32 days they were assessed approximately weekly (depending on weather conditions) to finally select eight very similar colonies (e.g. regarding colony size, brood, Varroa destructor counts). During this pre-monitoring period two colonies were excluded as the queens were evidently not sister queens (wrong colour marks). Furthermore, one was excluded due to slightly elevated Varroa destructor infestation, one contained very aggressive bees and in one colony no queen was found. After exclusion of these colonies eleven colonies remained for selection of colonies for the tunnel phase. Of these eight were selected for the tunnel phase, which were expected to be most equal during the tunnel phase with respect to colony strength based on the number of adult bees and capped brood cells and mortality. By the selection of four hives per test group after the pre-monitoring period MDD and mean CV (coefficient of variation) values of the control and reference were reduced from 29.4% (MDD) and 19.4% (CV) to 10.8% and 7.2% for estimations by weight (Fig 1, Table 1), from 19.5% (MDD) and 12.4% (CV) to 13.7% and 8.9% for estimations by adult photography and from 20.3% (MDD) and 12.8% (CV) to 13.7% and 9.0% for visual estimations. This demonstrates that colony size estimations based on colony weight or photography would be able to detect small effects on population sizes of colonies of almost 10%, even though only four replicates (tunnels) were used per test group. For detailed information of means and standard deviations underlying the power analysis see S1-S3 Tables in S1 File. We also compared MDD of colony strength by weight on the day of selection of colonies under the assumption of randomly sampling four colonies for each test group by Monte Carlo analysis (with replacement). 
When hives were chosen randomly, MDD was on average 21.6%, i.e. the selection procedure reduced MDD by a factor of about two (10.8% with the selection procedure vs. 21.6% with randomly chosen hives).
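The comparison between selected and randomly chosen hives can be sketched as a simple Monte Carlo simulation. This is a hedged illustration only: the colony sizes are invented, the MDD formula is the assumed one-sided form with pooled variance, the t critical value for df = 6 is hard-coded for four hives per group, and drawing with replacement via `random.choices` is one reading of the procedure described above (details are in S1 File).

```python
import math
import random
from statistics import mean, variance

GROUP_SIZE = 4      # hives per test group, as in the study
T_CRIT_DF6 = 1.943  # one-sided 5% critical value of Student's t, df = 6


def mdd_percent(control, reference):
    """MDD as % of the control mean (assumed one-sided form, pooled variance)."""
    n1, n2 = len(control), len(reference)
    pooled_var = ((n1 - 1) * variance(control) + (n2 - 1) * variance(reference)) / (n1 + n2 - 2)
    return 100 * T_CRIT_DF6 * math.sqrt(pooled_var * (1 / n1 + 1 / n2)) / mean(control)


def mean_random_mdd(colony_sizes, draws=5000, seed=1):
    """Average MDD when both groups are drawn at random with replacement."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(draws):
        control = rng.choices(colony_sizes, k=GROUP_SIZE)
        reference = rng.choices(colony_sizes, k=GROUP_SIZE)
        total += mdd_percent(control, reference)
    return total / draws


# Hypothetical adult bee numbers for the eleven candidate colonies.
candidates = [9000, 9800, 10000, 10100, 10200, 10300, 10400, 10500, 11500, 12500, 14000]
# Hypothetical selection of the eight most similar colonies, split into two groups.
selected_control = [9800, 10100, 10300, 10500]
selected_reference = [10000, 10200, 10400, 10500]

mdd_selected = mdd_percent(selected_control, selected_reference)
mdd_random = mean_random_mdd(candidates)
```

With data of this kind the MDD for the selected groups is much smaller than the average MDD for random groups, which mirrors the direction of the effect reported above, although the magnitudes here carry no meaning.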
Fig 1
Minimum detectable difference (MDD) [%] and coefficient of variation (CV) [%] for the number of adults obtained from weight measurements.
Table 1
Minimum detectable difference and p-values of t-tests of the effect of toxicant exposure on colony strength estimated by weight, adult bee photography and visual estimation between the control and the reference.
Phases (left to right): pre-tunnel phase, tunnel phase, post-tunnel phase.
| Measure | Method | 10.04 | 13.04 | 24.04 | 28.04 | 02.05 | 07.05 | 13.05 | 19.05 | 25.05 | 01.06 |
|---|---|---|---|---|---|---|---|---|---|---|---|
| MDD (%) | Weight | 29.4 | 23.2 | 10.8 | 12.9 | 14.6 | 10.9 | 17.9 | 28.3 | 21.0 | 22.3 |
| MDD (%) | Photography | 19.5 | 25.2 | 13.7 | 14.7 | 14.3 | 18.1 | 20.7 | 18.6 | 18.4 | 20.0 |
| MDD (%) | Visual estimation | 20.3 | 25.2 | 13.7 | 18.7 | 18.3 | 24.6 | 19.5 | 17.7 | 27.1 | 19.7 |
| p-value (t-test) | Weight | n/a | n/a | n/a | 1.000 | 0.032* | 0.019* | 0.171 | 0.458 | 0.029* | 0.724 |
| p-value (t-test) | Photography | n/a | n/a | n/a | 0.997 | 0.016* | 0.100 | 0.397 | 0.376 | 0.130 | 0.128 |
| p-value (t-test) | Visual estimation | n/a | n/a | n/a | 0.998 | 0.179 | 0.217 | 0.472 | 0.214 | 0.196 | 0.407 |
* Significant difference between control and reference hives at p<0.05.
Data did not deviate significantly from normality and variances were homogeneous. n/a = not applicable.
Tunnel phase
After the selection of the colonies, four hives each were randomly assigned to the control group and to the group in which a toxicant was applied (reference group), and they were placed into the tunnels. After application (at the beginning of the tunnel phase), colony size in the reference dropped by about 21% (Fig 2). This decrease of the population size of the reference hives was clearly detectable by weight measurements and adult photography (Table 1). When colony size was evaluated from visual estimations, the effect was not statistically significant. Notably, MDD for colony size obtained from weighing and photography remained low throughout the tunnel phase.
Fig 2
Colony strength [% initial] obtained from the estimations of the numbers of adult bees measured by weight.
Error bars reflect the standard deviation between colonies (data had been normalised to the mean of the control group).
Application of the reference substance also resulted in increased forager mortality (Fig 3) estimated from automated video counts and in a high number of dead bees in dead bee traps and on sheets (S2 Fig, S4-S6 Tables in S1 File). Flight activity obtained from automated counts of bees leaving the hive was reduced by 34% after the application during the tunnel phase, whereas the flight activity obtained by observations of 1 m squares in the crop was reduced by 91% after the application during the tunnel phase. This difference could be due to bees deciding not to forage in the sprayed crop after an initial assessment of the environment.
Fig 3
Daily forager mortality in the tunnel before and after the application.
Error bars indicate one standard deviation.
Post-tunnel phase
During the post-tunnel phase both control and reference colonies showed a similar weather related development. One month after application effects on colony size had disappeared (Fig 2, Table 1).
Brood development, food stores and influence of weather
Photos of all frames of all hives were taken during all phases of the study. Therefore, brood development could be evaluated continuously over the study period. The brood development of both control and reference hives (amount of eggs, old larvae, young larvae and pupae) showed a similar trend during the whole study. Weather had a pronounced effect on the brood development in both control and reference hives, which was stronger than the effect of the treatment with the reference substance. Low temperatures (<12°C) coincided with a marked reduction of nectar and pollen stores (Fig 4, shown for control hives).
Fig 4
Relation of the amount of pollen cells of all control hives to temperature during the study.
Brood termination mainly occurred during the egg stage, i.e. very early during the development (up to 83% of the total termination occurred at the egg stage). This early termination of eggs (BTRegg) was highest when the ratio of pollen cells to open brood cells dropped below 1.4 (Fig 5 left).
Fig 5
Left: Dependence of BTRegg on the ratio of pollen cells on same or opposite frame to open brood cells. Right: Model fit of a best fit GLM for BTRegg considering impact of capped and open brood cells and pollen cells (model no. 3 in Table 2).
Table 2
Generalized linear models (GLMs) describing the impact of factors on brood termination rate BTRegg.
GLM for identification of BTRegg drivers
| Model | N param. | Explanatory variables | AICc | ΔAICc |
|---|---|---|---|---|
| 1 | 2 | CB + OB | -36.30 | +5.28 |
| 2 | 3 | CB + OB + OB² | -38.84 | +2.74 |
| 3 | 4 | CB + OB + PC + CB*PC | -41.58 | 0 |
| 4 | 5 | Same as model 3 | -41.58 | 0 |
A list of evaluated parameters is available in the supplementary material. The explanatory variables are CB = Capped Brood, OB = Open Brood, PC = Pollen Cells. ΔAICc refers to the lowest AICc among all models. Since the best model with five parameters equals the best model with four parameters, no further models with five parameters are listed. Exceeding four model parameters would lead to overparameterization.
To identify more systematically the parameters that most strongly affect BTRegg, generalized linear models (GLMs) were used. A complete list of parameters is provided in the supplementary material. Prior to the model analysis the data were checked for pseudo-replicates due to repeated measurements in the same hives. However, the termination rate refers to the eggs rather than to the hives, and the set of eggs changed completely from one measurement to the next. Furthermore, there was no significant correlation between BTRegg and the hive index or the time of measurement (S3 Fig in S1 File). Hence, the BTRegg measurements can be considered independent.
The visual analysis of the hive frames indicated that brood termination depended on the spatial distance between pollen cells and brood cells (Fig 6). Statistical analysis confirmed the visual analysis and identified the number of pollen cells on the same or opposite frame and the distance from the hive center as the most important drivers of BTRegg inside the hive (S3 Fig in S1 File).
Fig 6
3D representation of the bottom body of a colony hive (hive 17–5).
During a period of cold weather nectar and pollen stores were gradually depleted. It can also be seen that brood termination (shown in yellow) was sometimes high, when pollen stores on the same or opposite side of a frame were low (see e.g. 13th and 24th April). Even though pollen was available elsewhere in the hive, it seems that this did not prevent termination.
When evaluating the hive content without taking account of this spatial information (i.e. ignoring whether e.g. pollen is available near brood cells or far away), the best models describing BTRegg included the parameters capped brood, open brood and pollen cells. When analyzing the hive content taking spatial information into account, a high BTRegg coincided with a low number of pollen cells on the same or opposite frame, a high number of capped brood cells on the same or opposite frame (possibly indicating a high recent consumption of pollen by previously open brood that is now capped), a low ratio of pollen cells to open brood cells and a low number of open brood cells (S3 Fig in S1 File). Hence, the spatial distribution of pollen seems to play an important role for brood success.
Discussion and conclusions
Currently used honeybee field and semi-field trials conducted to assess side effects of toxicants have been criticised for having a relatively low statistical power, resulting from the inherent variability of honeybee colonies but also from field methodology [3, 6]. In the past, this has been addressed by equalizing honeybee colonies before the start of a test. For example, Delaplane et al. [8] presented two different variations of a method to reduce variation among honeybee colonies regarding adult bees, brood, mites and food at the beginning of experiments. Both modes described by Delaplane et al. [8] involve significant manipulation of colonies (and a mix of bees from different queens may result when brood combs are combined). In the study design considered here, instead of equalizing hives we started with a larger number of unmanipulated hives and tracked their development over about a month in order to see how they performed over time. We then selected colonies for the test that were not only of similar size but also developed similarly throughout the pre-monitoring phase (taking account of mortality and capped brood) and were hence expected to perform similarly during the tunnel phase. We used sister queens and, since no new colonies were set up using combs from different hives, all bees in our trial can be considered offspring of their own queen, which reduces genetic variation (for a comparison of the present study design with a conventional OECD 75 test, see also S10 Table in S1 File).
Apart from reducing variability by the selection of colonies after pre-monitoring, another aim of this study was to test different methods with regard to their variability and uncertainty, in order to identify those methods that have the potential to decrease variability in field and semi-field trials. This had been preceded by an assessment of the sources of uncertainty and variability [19, 26].
This included the measurement of colony size by weight, adult bee photography and the assessment of brood development and food stores with in-hive photography. The determination of colony size by weight has previously been described by Delaplane et al. [8]. In contrast to Delaplane et al. [8], however, weighing of the colonies was done not only at the end of the experiment but during the whole course of our study. Evaluation of bee brood via photography has been described by various authors (e.g. [26, 27, 28, 29, 30]). However, to our knowledge the present study is the first one using digital analysis of photos of all combs of all hives during the whole study period (including not only brood cells, but also any other type of cell), instead of selecting a subset of only 100 or 200 brood cells [1, 3]. Photography has previously also been used to count adults by Cutler et al. [31]. A new method applied in this study was the assessment of flight activity and forager mortality with the help of automated video counts instead of using 1 m observation squares, dead bee traps and linen sheets, which cover forager activity and forager mortality only partially.
The presented study design together with the tested measurement methods resulted in a considerable reduction of the MDDs of colony strength (except for visual estimation, for which MDD values were generally highest). In particular, the assessment of colony strength obtained from weight measurements made it possible to detect effects of about 10%, despite the low number of only four replicates per test group (MDD values before the application and during the tunnel phase after the application). Therefore, by using this method it was possible to detect rather small differences between hives. If the number of control and reference hives had been doubled for estimations by weight (i.e. eight control hives and eight reference hives), the MDD value would have been 6.4% (just before the tunnel phase).
To illustrate the sample sizes required to reach specific MDD values during the exposure phase, we also calculated power curves (Fig 7) using Monte Carlo randomization (for details, see S1 File). This was done for the first day in the tunnel (28.4.), i.e. at a time when a pesticide would be applied.
Fig 7
Power curves for the first measurement day in the tunnel (28.4.) showing MDD% values and increasing number of hives per test group assuming either random selection of hives (conventional trial) or the selection of hives as conducted in the present LUV trial (for details see S1 File).
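The relation between the number of hives per test group and the MDD can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the analysis from S1 File: it assumes the MDD definition commonly used in ecotoxicology (critical t value of a one-sided two-sample t-test multiplied by the standard error of the difference, expressed as a percentage of the control mean), a significance level of 0.05, and a hypothetical pool of hive weights. The selection rule used here (taking the hives closest to the pool median) is likewise a stand-in for the selection procedure of the LUV trial, and all function names are invented for the example.

```python
import math
import random
import statistics
from functools import lru_cache


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)


def t_cdf(x, df, steps=400):
    """CDF for x >= 0, integrating the density on [0, x] with Simpson's rule."""
    h = x / steps
    total = t_pdf(0.0, df) + t_pdf(x, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 + total * h / 3


@lru_cache(maxsize=None)
def t_quantile(p, df):
    """Quantile for p > 0.5, found by bisection on the CDF."""
    lo, hi = 0.0, 50.0
    for _ in range(50):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def mdd_percent(control, treatment, alpha=0.05):
    """MDD as % of the control mean (assumed one-sided pooled-variance definition)."""
    n1, n2 = len(control), len(treatment)
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * statistics.variance(control)
                  + (n2 - 1) * statistics.variance(treatment)) / df
    mdd = t_quantile(1 - alpha, df) * math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return 100 * mdd / statistics.mean(control)


def random_mdd(pool, n, draws=1000, seed=1):
    """Mean MDD% when 2n hives are drawn at random and split into two groups."""
    rng = random.Random(seed)
    values = []
    for _ in range(draws):
        picked = rng.sample(pool, 2 * n)
        values.append(mdd_percent(picked[:n], picked[n:]))
    return statistics.mean(values)


def selected_mdd(pool, n):
    """MDD% when the 2n hives closest to the pool median are used (hypothetical rule)."""
    centre = statistics.median(pool)
    closest = sorted(sorted(pool, key=lambda w: abs(w - centre))[:2 * n])
    # Alternate hives between the groups so that both cover a similar weight range.
    return mdd_percent(closest[0::2], closest[1::2])


# Hypothetical pool of 20 monitored hives (adult bee mass in kg); not data from the study.
POOL = [1.2, 1.5, 1.6, 1.8, 1.9, 2.0, 2.1, 2.1, 2.2, 2.3,
        2.3, 2.4, 2.5, 2.6, 2.7, 2.9, 3.0, 3.2, 3.5, 3.9]

for n in range(2, 9):
    print(f"n={n}: random {random_mdd(POOL, n):5.1f}%  selected {selected_mdd(POOL, n):5.1f}%")
```

With real data, `POOL` would be replaced by the measured colony strengths on the day of interest. The printed table then gives one point per group size for a "random selection" curve and a "selected hives" curve, analogous in spirit to the two curves in Fig 7.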
Notably, while the variability was considerably reduced by the selection of hives after the pre-monitoring phase, there was a trend for variability to increase gradually again towards the end of the study. This may be due to the variability in the use of foraging sites [32, 33] and (possibly as a consequence) foraging plants [34], which naturally introduces variability between colonies over time. Since the use of MDD is a relatively new concept in honeybee risk assessment, only few studies are available in which MDDs were provided. In a field study by Rolke et al. [35], eight equalized colonies with sister queens were used per study site and colony strength was estimated visually. MDD ranged between 15.2% and 21.4%. This is considerably higher than the MDD we observed after selection of colonies, despite the smaller sample size used here (four colonies instead of eight in Rolke et al. [35]).
Using a larger number of hives and adding a monitoring phase to finally select a subset of colonies for the actual test does, however, increase the duration and workload of such a study. In particular, the use of photography to count adult bees required a much larger number of field staff, since adult bee photography was done simultaneously for all hives to reduce bias from changing weather conditions. However, the results have shown that colony size estimation by weight may be a more cost-effective alternative method with a similar level of measurement uncertainty.
Both with the methods for assessing forager activity and forager mortality proposed in OECD (2007) [1] and with automated counting of foragers in videos, a clear effect of the reference substance on the number of dead bees counted and on forager activity was observed. 
However, it is important to understand that the different methods for assessing mortality provide different information. The assessment of mortality according to OECD (2007) [1] and OEPP/EPPO (2010) [2] includes counting dead bees in dead bee traps in front of the hive and on plastic sheets placed in the tunnel. Dead bees found in dead bee traps are mostly bees that died within the hive and were carried outside by conspecifics, while dead bees collected on plastic sheets also include a considerable fraction of foragers that died outside (in particular in a tunnel study). With regard to the Specific Protection Goals (SPGs) for forager mortality (e.g. a twofold increase in forager mortality over a period of three days is considered negligible [3]), it should be noted that dead bee traps do not provide a measure of forager mortality. Dead bees from plastic sheets probably partly reflect forager mortality. With video recordings, however, foraging activity can be estimated based on the numbers of bees entering and exiting the hives. Hence, this method more directly generates the data required to address the SPGs regarding forager mortality defined by EFSA [3]. Similarly, forager activity assessments as proposed in OECD (2007) [1] do not cover the whole forager activity of a colony but only a fraction limited in time and space. The new method of using automated video counts of bees entering and leaving the hives can be used to assess the whole forager activity of a colony for the whole foraging time from dawn to dusk.
To our knowledge this study is the first to assess entire colonies, i.e. all cells of a hive were evaluated regarding the development of brood and of food stores throughout the study. This offered a unique insight into the development of hives and driving factors. Each hive included more than 3000 brood cells, while in current brood trials only 100 to 200 cells containing eggs or larvae are assessed [1, 2]. 
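How much a brood termination rate (BTR) estimated from a subset of cells can scatter around the value over all cells may be illustrated with a small resampling sketch. It is not the procedure from S1 File: the hive is hypothetical (3000 egg cells with an assumed true BTR of 30%), the function name is invented, and the spread is reported as the central 95% range of the subsample estimates in percentage points.

```python
import random


def simulate_btr_spread(n_cells=3000, true_btr=0.30, sample_size=100,
                        repeats=2000, seed=1):
    """Return the central 95% range (in %) of BTR estimates from random subsamples.

    All defaults are hypothetical illustration values, not data from the study.
    """
    rng = random.Random(seed)
    n_terminated = round(n_cells * true_btr)
    # 1 = brood in this cell was terminated, 0 = brood developed successfully.
    cells = [1] * n_terminated + [0] * (n_cells - n_terminated)
    estimates = sorted(
        100 * sum(rng.sample(cells, sample_size)) / sample_size
        for _ in range(repeats)
    )
    lower = estimates[int(0.025 * repeats)]
    upper = estimates[int(0.975 * repeats) - 1]
    return lower, upper


for size in (100, 300):
    low, high = simulate_btr_spread(sample_size=size)
    print(f"{size} cells: 95% of BTR estimates between {low:.1f}% and {high:.1f}%")
```

The range narrows as more cells are sampled and collapses to a single value when all cells are evaluated, which is the situation of the complete photographic assessment.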
To assess the impact of selecting a subset of brood cells on the results of a brood trial, either 100 or 300 cells with eggs were randomly chosen from one reference hive (hive 17–5) and BTR was calculated. This was repeated many times (see S1 File). BTR varied by about ±20% compared to the true BTR over all cells when choosing 100 cells with eggs for evaluation and by ±10% when choosing 300 cells with eggs. Hence, the selection of a small number of brood cells results in considerable uncertainty in the measured BTR.
Apart from removing uncertainty about the BTR in a given hive, the complete evaluation of all cells of the hives also made it possible to gain insight into the factors that affect brood success and colony development. Low temperatures of less than 12°C, which prevent foraging [36], coincided with a marked reduction of pollen stores and an increase in brood termination. Due to the importance of pollen as larval food, the reduction of pollen stores results in an increase in open brood removal [37, 38]. 3D images of hives also indicated that brood termination depended on the location of pollen within the hive. In particular, pollen stores near brood cells determined brood success. This may be relevant for beekeeping practice and help to avoid colony losses. Maintaining the frame location in the hive or intentionally relocating frames with pollen stores could be used as a measure to increase brood success. The detailed information on brood success and the understanding of the factors increasing BTR may help to explain why BTR is sometimes very high in semi-field studies. In future, this knowledge can help to decrease BTR in control hives, making these tests more reliable. In the past, a number of meta-analyses have been conducted to understand why BTR is sometimes high in semi-field studies [39, 40, 41] and less often in field studies [42] (but see also Candolfi et al. [43]). 
In these analyses more than 80 semi-field trials were analysed considering the factors season, weather, colony strength, tunnel size, and larval and pupal mortality. Overall, BTR varied widely and it was not clear which factors correlate with BTR. While Pistorius et al. [39] found some influence of season (lower BTR in spring than in summer) and crop area (tunnel size), the subsequent analyses (which were partly based on the same trials) did not identify factors that clearly affect BTR. In the present study, very detailed data on BTR were available: all brood cells of the hives were assessed (more than 3000 cells per hive), which reduced the variability of BTR due to sampling uncertainty, and brood photography was conducted continuously over a period of more than two months (one month pre-monitoring, tunnel phase and one month post-tunnel phase). As a result, BTRegg could be calculated for nine time points and for each frame side of each hive. The results indicate that BTRegg is determined by the ratio of pollen cells to open brood cells. Furthermore, a clear spatial relation was found: mainly pollen stores on the same or opposite side of a frame were relevant for brood success, i.e. the pollen cells that nurse bees can easily access when feeding larvae. The influence of weather was clearly visible in the course of the study. Low temperature resulted in a rapid depletion of both pollen and nectar stores.
In conclusion, the presented LUV test made it possible to considerably increase test power (as reflected by the MDD, which was reduced from values >20% to 10.8% after selection of colonies). This may now make it possible to empirically determine Specific Protection Goals (e.g. the seven percent effect size regarding colony strength), which were recently proposed by EFSA [3, 7] using expert judgement, and also to test the toxicity of chemicals with much higher certainty. 
Furthermore, new insights could be gained regarding the impact of weather, and other biotic or abiotic factors can be studied in greater detail. However, as the results summarized above are based on only one study, further testing may be required to verify the findings of this study.
10 Oct 2019
PONE-D-19-24362
Reduction of variability for the assessment of side effects of toxicants on honeybees and understanding drivers for colony development
PLOS ONE
Dear Dr Wang,
Thank you for submitting your manuscript to PLOS ONE. After careful consideration, we feel that it has merit but does not fully meet PLOS ONE’s publication criteria as it currently stands. Therefore, we invite you to submit a revised version of the manuscript that addresses the points raised during the review process.
Please address all the comments raised by both reviewers and, in particular, the comments on discussing causality and statistical power.
We would appreciate receiving your revised manuscript by Nov 24 2019 11:59PM. When you are ready to submit your revision, log on to https://www.editorialmanager.com/pone/ and select the 'Submissions Needing Revision' folder to locate your manuscript file.
If you would like to make changes to your financial disclosure, please include your updated statement in your cover letter.
To enhance the reproducibility of your results, we recommend that if applicable you deposit your laboratory protocols in protocols.io, where a protocol can be assigned its own identifier (DOI) such that it can be cited independently in the future. For instructions see: http://journals.plos.org/plosone/s/submission-guidelines#loc-laboratory-protocols
Please include the following items when submitting your revised manuscript:
A rebuttal letter that responds to each point raised by the academic editor and reviewer(s). 
This letter should be uploaded as separate file and labeled 'Response to Reviewers'.
A marked-up copy of your manuscript that highlights changes made to the original version. This file should be uploaded as separate file and labeled 'Revised Manuscript with Track Changes'.
An unmarked version of your revised paper without tracked changes. This file should be uploaded as separate file and labeled 'Manuscript'.
Please note while forming your response, if your article is accepted, you may have the opportunity to make the peer review history publicly available. The record will include editor decision letters (with reviews) and your responses to reviewer comments. If eligible, we will contact you to opt in or out.
We look forward to receiving your revised manuscript.
Kind regards,
James C. Nieh, Ph.D.
Academic Editor
PLOS ONE
Journal Requirements:
1. When submitting your revision, we need you to address these additional requirements.
Please ensure that your manuscript meets PLOS ONE's style requirements, including those for file naming. The PLOS ONE style templates can be found at http://www.journals.plos.org/plosone/s/file?id=wjVg/PLOSOne_formatting_sample_main_body.pdf and http://www.journals.plos.org/plosone/s/file?id=ba62/PLOSOne_formatting_sample_title_authors_affiliations.pdf
2. We note that one or more of the authors are employed by a commercial company: WSC Scientific GmbH.
a) Please provide an amended Funding Statement declaring this commercial affiliation, as well as a statement regarding the Role of Funders in your study. If the funding organization did not play a role in the study design, data collection and analysis, decision to publish, or preparation of the manuscript and only provided financial support in the form of authors' salaries and/or research materials, please review your statements relating to the author contributions, and ensure you have specifically and accurately indicated the role(s) that these authors had in your study. You can update author roles in the Author Contributions section of the online submission form.
Please also include the following statement within your amended Funding Statement.
“The funder provided support in the form of salaries for authors [insert relevant initials], but did not have any additional role in the study design, data collection and analysis, decision to publish, or preparation of the manuscript. The specific roles of these authors are articulated in the ‘author contributions’ section.”
If your commercial affiliation did play a role in your study, please state and explain this role within your updated Funding Statement.
b) Please also provide an updated Competing Interests Statement declaring this commercial affiliation along with any other relevant declarations relating to employment, consultancy, patents, products in development, or marketed products, etc.
Within your Competing Interests Statement, please confirm that this commercial affiliation does not alter your adherence to all PLOS ONE policies on sharing data and materials by including the following statement: "This does not alter our adherence to PLOS ONE policies on sharing data and materials.” (as detailed online in our guide for authors http://journals.plos.org/plosone/s/competing-interests). If this adherence statement is not accurate and there are restrictions on sharing of data and/or materials, please state these. Please note that we cannot proceed with consideration of your article until this information has been declared.
Please include both an updated Funding Statement and Competing Interests Statement in your cover letter. We will change the online submission form on your behalf.
3. We note that you have included the phrase “data not shown” in your manuscript. Unfortunately, this does not meet our data sharing requirements. PLOS does not permit references to inaccessible data. 
We require that authors provide all relevant data within the paper, Supporting Information files, or in an acceptable, public repository. Please add a citation to support this phrase or upload the data that corresponds with these findings to a stable repository (such as Figshare or Dryad) and provide and URLs, DOIs, or accession numbers that may be used to access these data. Or, if the data are not a core part of the research being presented in your study, we ask that you remove the phrase that refers to these data.
Comments to the Author
1. Is the manuscript technically sound, and do the data support the conclusions?
The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented.
Reviewer #1: No
Reviewer #2: Yes
**********
2. Has the statistical analysis been performed appropriately and rigorously?
Reviewer #1: No
Reviewer #2: Yes
**********
3. Have the authors made all data underlying the findings in their manuscript fully available?
The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data—e.g. participant privacy or use of data from a third party—those must be specified.
Reviewer #1: Yes
Reviewer #2: Yes
**********
4. Is the manuscript presented in an intelligible fashion and written in standard English?
PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here.
Reviewer #1: Yes
Reviewer #2: Yes
**********
5. Review Comments to the Author
Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters)
Reviewer #1: This study reports the results of a semi-field trial where caged colonies of honeybees (n = 4) are exposed to either exposure to an insecticide or control conditions. The authors use the data from the experiment to evaluate the power of the experimental design to detect treatment effects. The authors claim that the novel protocols used in their experiment enable fairly small treatment effects to be detected, which might be able to meet the levels of resolution stipulated in recent European regulations.
The experiments and data analysis appear to be technically sound and the paper is fairly well-written.
The main implication – that adjusted protocols will enable semi-field trials to detect pesticidal effects at the levels required by EU regulations – will be very important and interesting to a wide audience, including regulators, industry and environmentalists.
There are, however, some fairly major shortcomings that should be addressed if the wok is to reach its full impact, which I describe as follows.
1. Setting the baseline
It would be very useful to set the context – what is the MDD of previous semi-field trials? In the discussion, some evaluation of the level of improvement should be given.
2. Power analysis
One of the conventional components of a power analysis is an effort-power curve, which is the relationship between the MDD and sample size across a continuous range. Here it would be useful to see the curves from n = 2 to n = 20. In discussion, it would be useful to compare the power curves of the old protocols with the authors’ new protocols. Curves should be presented for all response variables.
3. Interpretation of causality
It is not correct to attribute the reduction in MDD over time to the selection of the hives because it might have decreased in any case. Justifying this assertion would require comparing the MDD when hives were picked from a pool of all hives versus the MDD when hives were picked from a reduced pool. Either do this or do not make the assertion.
4. Discussion
The current discussion is off-target. The main headline should be the improvement in MDD relative to past practice, which variables are most reliable, etc. The large opening section evaluating the various Delaplane methods does not warrant the space.
5. Data provision
There should be a table of means and SDs in the MS itself so that others can verify the power analyses.
Minor edits by line number
81-84: link to general theory by using some standard terms such as sampling, response variable.
90: decreased, surely?
100: clarify locations of phases 1-3
109: dissected = eliminated
386: ‘very small’ is subjective – currently the reference point is ‘almost negligible’ at 7%
Page 29: Figs 1 and 2 are indentical in my copy
Reviewer #2: The manuscript by Wang et al proposed and tested a new test design for risk assessment (RA) of pesticides in the field and semi-field, addressing major concerns.
The manuscript seems scientifically sound, mostly interesting for a relatively narrow community as the topic is very specific but with important broader outcome and global relevance as it proposes changes to the internationally used RA system. 
The MS could be more concise and clear, in terms of methodology and results/discussion. I think the MS is valuable and worth publishing in PlosONE after referee comments will be addressed. I highlight some major concerns that I believe, when addressed, could make the manuscript clearer, more impactful and robust.
*please address the CONS (negative aspects) related to your proposal, not only the pros. This is essential to estimate its feasibility and show a honest approach. i.e. increased costs in terms of expenses, time, organization, staff, etc.
*the proposed and current protocol should be reported in a table for clarity and ease of comparison. this tab can for ex. list the endpoints measured and how they are measured (time, etc) in your and the current system. (eg line 45-46, 33, 52,
*improve clarity of the methods/proposal: use multiple subchapters for each step i.e. in line 107 and after.
*"approximately" is often used (L214, 162, and many more). A more clear and exact description of your protocol is needed as this is required by guidances and gives solidity to your proposal. Add frequency of assessment times for all steps (ie L 174-176)
*SPG: how about the subelthal effects? Nonetheless you test forager activity etc, all other sublethal effects pesticides can cause are not addressed very much in your paper. This is another major concerns related to bee health, RA, SPGs. I'd clarify this and eventually state that your work goal does not include this specifically.
*clarify if you did field, semif, or both (ie. L90). certain basic details of your work should be easier to figure out. For ex did you test both scenarios with pre selection of colonies and not (L 212+)? If not you cannot compare the change in vairability .
*colony exclusion procedure (L 219) needs to be described and showed explicity in terms of methods and results. an explicit method for excludsion decision needs to be used and described (ie. decision threshold for each endpoint ie. varroa, etc?). 
This is a crucial point that RA needs to clarify explicitly and objectively. please refer to previous guidelines if available on the topic.
*the authors should address, briefly and concisely and explicitly in the discussion, how it was demonstrated, providing the key compartive quantitative values, that the LUV test improved the current standards (i.e. increase power); see lines 462-463. Please also explicitly report how you re results demonstrate that your assessment is more robust.
*major problems of field studies are not addressed: ie. the absence of real control colonies (i.e. pesticide-free) in the field (i.e. Campbell et al 2016 and Henry et al 2015 showed that control colonies were contaminated by the target pesticide too, and Tosi et al 2018 showed that the majority of the colonies in the environment are exposed to individual and multiple pesticides, even banned ones). I would at least mention this aspect and other key concerns related to field studies for RA.
LINE BY LINE COMMENTS:
abstract: spell out MDD
line 80. this was addressed before too. move above?
88 specify field or semifield?
412-415 video counts: you reported it causes higher variability. Please address this in this section and evetual other CONS.
416-418. I imagine there would be others, I'd double check.
discussion: text is very long, i think it should be shorter and more concise.
463-464: authors should be more careful when stating this, as you report results from 1 study, testing limited colonies and over a limited time frame (1 year). standard procedures for proposing new methods is ring testing them, ie. perfomed in multiple countries over multiple years. Thus, this statement seems not supported by your results.
figures: fig. 3: cannot see error bar in black bars.
references
Campbell, J. W., Cabrera, A. R., Stanley-Stahr, C. & Ellis, J. D. An evaluation of the honey bee (Hymenoptera: Apidae) safety profile of a new systemic insecticide, flupyradifurone, under field conditions in Florida. J. Econ. Entomol. 96, 875–878 (2016).
Henry, M. et al. Reconciling laboratory and field assessments of neonicotinoid toxicity to honeybees. Proc. R. Soc. B Biol. Sci. 282, 20152110 (2015).
Tosi, S., Costa, C., Vesco, U., Quaglia, G. & Guido, G. A 3-year survey of Italian honey bee-collected pollen reveals widespread contamination by agricultural pesticides. Sci. Total Environ. 615, 208–218 (2018).
**********
6. PLOS authors have the option to publish the peer review history of their article (what does this mean?). If published, this will include your full peer review and any attached files.
If you choose “no”, your identity will remain anonymous but your review may still be made public.
Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.
Reviewer #1: No
Reviewer #2: No
[NOTE: If reviewer comments were submitted as an attachment file, they will be attached to this email and accessible via the submission site. Please log into your account, locate the manuscript record, and check for the action link "View Attachments". If this link does not appear, there are no attachment files to be viewed.]
While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool, https://pacev2.apexcovantage.com/. PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Registration is free. Then, login and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email us at figures@plos.org. Please note that Supporting Information files do not need this step.
12 Nov 2019
Responses are included in file "Response to Reviewers.docx". Here is the same as text:
Journal Requirements:
1. When submitting your revision, we need you to address these additional requirements.
Reply: We have taken these formatting requirements into account.
2. We note that one or more of the authors are employed by a commercial company: WSC Scientific GmbH.
a) Please provide an amended Funding Statement declaring this commercial affiliation, as well as a statement regarding the Role of Funders in your study. If the funding organization did not play a role in the study design, data collection and analysis, decision to publish, or preparation of the manuscript and only provided financial support in the form of authors' salaries and/or research materials, please review your statements relating to the author contributions, and ensure you have specifically and accurately indicated the role(s) that these authors had in your study. You can update author roles in the Author Contributions section of the online submission form.
Please also include the following statement within your amended Funding Statement.
“The funder provided support in the form of salaries for authors [insert relevant initials], but did not have any additional role in the study design, data collection and analysis, decision to publish, or preparation of the manuscript. The specific roles of these authors are articulated in the ‘author contributions’ section.”
If your commercial affiliation did play a role in your study, please state and explain this role within your updated Funding Statement.
Reply: We amended the financial disclosure statement as follows:
This research was funded entirely by the commercial company WSC Scientific GmbH. The funder provided support in the form of salaries for authors [MW, TB, CD] and materials, but did not have any additional role in the study design, data collection and analysis, decision to publish, or preparation of the manuscript. 
The specific roles of these authors are articulated in the ‘author contributions’ section.
b) Please also provide an updated Competing Interests Statement declaring this commercial affiliation along with any other relevant declarations relating to employment, consultancy, patents, products in development, or marketed products, etc.
Within your Competing Interests Statement, please confirm that this commercial affiliation does not alter your adherence to all PLOS ONE policies on sharing data and materials by including the following statement: "This does not alter our adherence to PLOS ONE policies on sharing data and materials.” (as detailed online in our guide for authors http://journals.plos.org/plosone/s/competing-interests). If this adherence statement is not accurate and there are restrictions on sharing of data and/or materials, please state these. Please note that we cannot proceed with consideration of your article until this information has been declared.
Please include both an updated Funding Statement and Competing Interests Statement in your cover letter. We will change the online submission form on your behalf.
Reply: We added the following Competing Interest Statement (we don’t expect a commercial benefit from this study, as industry is not interested in refined methodology (which may improve the detection of adverse effects of pesticides), however, academia or governmental authorities may be interested in an improvement of the method, which would, however, not result in any commercial benefit of the funding company):
[MW, TB, CD] are employed at WSC Scientific GmbH, which, amongst others, develops scientific software including software for the evaluation of honeybee trials. The purpose of this study was, however, not to promote such software, which contributes only marginally to the turnover of the company, but to develop better study design for the risk assessment. This does not alter our adherence to PLOS ONE policies on sharing data and materials.
3. We note that you have included the phrase “data not shown” in your manuscript. Unfortunately, this does not meet our data sharing requirements. PLOS does not permit references to inaccessible data. We require that authors provide all relevant data within the paper, Supporting Information files, or in an acceptable, public repository. Please add a citation to support this phrase or upload the data that corresponds with these findings to a stable repository (such as Figshare or Dryad) and provide and URLs, DOIs, or accession numbers that may be used to access these data. Or, if the data are not a core part of the research being presented in your study, we ask that you remove the phrase that refers to these data.
Reply: We have added these data in the Supplemental Material now and replaced “data not shown” with ”S1 Fig. 2, Tables 4-6”.
Comments to the Author
5. Review Comments to the Author
Reviewer #1: This study reports the results of a semi-field trial where caged colonies of honeybees (n = 4) are exposed to either exposure to an insecticide or control conditions. The authors use the data from the experiment to evaluate the power of the experimental design to detect treatment effects. The authors claim that the novel protocols used in their experiment enable fairly small treatment effects to be detected, which might be able to meet the levels of resolution stipulated in recent European regulations.
The experiments and data analysis appear to be technically sound and the paper is fairly well-written.
The main implication – that adjusted protocols will enable semi-field trials to detect pesticidal effects at the levels required by EU regulations – will be very important and interesting to a wide audience, including regulators, industry and environmentalists.
There are, however, some fairly major shortcomings that should be addressed if the wok is to reach its full impact, which I describe as follows.
1. Setting the baseline
It would be very useful to set the context – what is the MDD of previous semi-field trials? In the discussion, some evaluation of the level of improvement should be given.
Reply: We added a section in which we discuss a paper by Rolke et al. (2016), in which MDD was calculated (the use of MDD is a relatively new concept in pesticide risk assessment). Using twice as many colonies MDD ranged between 15.2–21.4% compared to about 10% in our study with only four colonies.
2. Power analysis
One of the conventional components of a power analysis is an effort-power curve, which is the relationship between the MDD and sample size across a continuous range. Here it would be useful to see the curves from n = 2 to n = 20. In discussion, it would be useful to compare the power curves of the old protocols with the authors’ new protocols. Curves should be presented for all response variables.
Reply: Such a power curve would indeed be helpful. However, since honeybee trials according to OECD 75 are done with three to six colonies per test group at most, such data is not available (such studies are very cost intensive even with these sample sizes). We are not aware of any (published or unpublished) study with more than six colonies per test group.
3. Interpretation of causality
It is not correct to attribute the reduction in MDD over time to the selection of the hives because it might have decreased in any case. Justifying this assertion would require comparing the MDD when hives were picked from a pool of all hives versus the MDD when hives were picked from a reduced pool. Either do this or do not make the assertion.
Reply: We thank the reviewer for highlighting this. 
We added the information on which MDD would have been reached without the selection procedure at the end of the section “Pre-monitoring” in the results:
“We also compared MDD of colony strength by weight on the day of selection of colonies under the assumption of randomly sampling four colonies for each test group by Monte Carlo analysis. When randomly choosing hives, MDD was on average 21.6%, i.e. the selection procedure achieved a reduction of MDD by a factor of about two (10.8% by the selection procedure vs. 21.6% by randomly choosing hives).”
4. Discussion
The current discussion is off-target. The main headline should be the improvement in MDD relative to past practice, which variables are most reliable, etc. The large opening section evaluating the various Delaplane methods does not warrant the space.
Reply: We considerably shortened the section in which the work by Delaplane was discussed.
5. Data provision
There should be a table of means and SDs in the MS itself so that others can verify the power analyses.
Reply: We added this information to the Supplementary Information S1 (Tables 1-3).
Minor edits by line number
81-84: link to general theory by using some standard terms such as sampling, response variable.
Reply: We don’t understand this comment.
90: decreased, surely?
Reply: Thank you, yes it needs to be “decreased”.
100: clarify locations of phases 1-3
Reply: Thank you for highlighting this, it is indeed clearer if we mention the locations more clearly for phases 1 and 3 (for the tunnel phase this was already done). We have now added the location in parentheses for phases 1 and 3.
109: dissected = eliminated
Reply: We replaced ‘dissected’ with ‘eliminated’.
386: ‘very small’ is subjective – currently the reference point is ‘almost negligible’ at 7%
Reply: In the previous sentence we stated that effects of 10% were detectable with N=4, hence we feel it is fine to name these ‘very small’ (as the absolute value was provided).
However, to make clear that we do not yet reach 7% with N=4, we changed the wording to ‘rather small’.
Page 29: Figs 1 and 2 are identical in my copy
Reply: We have now corrected this.
Reviewer #2: The manuscript by Wang et al. proposed and tested a new test design for risk assessment (RA) of pesticides in the field and semi-field, addressing major concerns.
The manuscript seems scientifically sound, mostly interesting for a relatively narrow community as the topic is very specific but with important broader outcome and global relevance as it proposes changes to the internationally used RA system. The MS could be more concise and clear, in terms of methodology and results/discussion. I think the MS is valuable and worth publishing in PLOS ONE after the referee comments have been addressed. I highlight some major concerns that I believe, when addressed, could make the manuscript clearer, more impactful and robust.
* Please address the CONS (negative aspects) related to your proposal, not only the pros. This is essential to estimate its feasibility and show an honest approach, i.e. increased costs in terms of expenses, time, organization, staff, etc.
Reply: We added a paragraph highlighting the CONS of the applied methodology in the discussion:
“Using a larger number of hives and adding a monitoring phase to finally select a subset of colonies for the actual test does, however, increase the duration and workload for such a study by about a month. In particular, the use of photography to count adult bees required a much larger number of field staff, since adult bee photography was done simultaneously for all hives to reduce bias by changing weather conditions. However, results have shown that colony size estimation by weight may be a more cost-effective alternative method with a similar level of measurement bias compared to photography.”
* The proposed and current protocol should be reported in a table for clarity and ease of comparison. This table can, for example,
list the endpoints measured and how they are measured (time, etc.) in your and the current system (e.g. lines 45-46, 33, 52).
Reply: Such a table is already included in the supplemental information (Section “4. Comparison of study design in comparison to a conventional OECD 75 trial”), where the differences are compared for all endpoints measured. If the editor prefers, we could move this table from the supplementary information to the manuscript.
* Improve clarity of the methods/proposal: use multiple subchapters for each step, i.e. in line 107 and after.
Reply: We thank the reviewer for this excellent proposal. We have added a section before this line to first clarify which endpoints have been measured in which phase:
“Throughout the whole study the following endpoints were measured (see below for details on the methods): colony strength, content of all cells of all hives (incl. brood, nectar, pollen), dead bees and larvae in traps. Foraging activity by visual assessment was only conducted in the tunnel phase and forager activity and mortality measurements by videography were only conducted in the tunnel and post-tunnel phase.”
Furthermore, we added a header “Step 1: Four week monitoring phase” and added a section for each of the following phases: “Step 2: A ten-day tunnel phase” and “Step 3: Post-tunnel phase”. We believe this does indeed help very much to clarify how the study was conducted.
* "approximately" is often used (L214, 162, and many more). A clearer and more exact description of your protocol is needed, as this is required by guidances and gives solidity to your proposal. Add frequency of assessment times for all steps (i.e. L 174-176).
Reply: We rephrased this section as follows (bold letters indicate changed text):
Line 162: “Measurements of flight activity were planned in weekly intervals during and after the tunnel phase using video recordings of the entrances of all hives over the entire activity phase (from dawn to dusk; see supplementary material).
Actual dates varied by one to two days depending on weather conditions.”
Line 214: “Over a period of 32 days they were assessed approximately weekly (depending on weather conditions) to finally select eight very similar colonies […]”
* SPG: how about the sublethal effects? Although you test forager activity etc., the other sublethal effects that pesticides can cause are not addressed very much in your paper. This is another major concern related to bee health, RA, SPGs. I'd clarify this and eventually state that your work goal does not include this specifically.
Reply: We believe it is sufficiently clear that we do not address sublethal effects when discussing the specific protection goal (SPG) defined by EFSA, because whenever ‘SPG’ is mentioned the context is provided:
Line 404 in original manuscript: “With regard to the Specific Protection Goals (SPGs) for forager mortality […]”. Here it is clear that this only focuses on forager mortality.
Line 409: “Hence, this method more directly generates the data required to address the SPGs regarding forager mortality defined by EFSA (2013)”. Same as above.
If the editor prefers, we would of course be happy to add an additional sentence mentioning that sublethal effects are not considered.
* Clarify if you did field, semi-field, or both (i.e. L90). Certain basic details of your work should be easier to figure out. For example, did you test both scenarios, with and without pre-selection of colonies (L 212+)? If not, you cannot compare the change in variability.
Reply: Since the selection process would be the same for both field and semi-field trials, this sentence applies to both types of studies. We did, however, conduct only a semi-field study after the selection. But note that variability was already reduced before colonies were in the tunnels. In the sentence before it was already mentioned that we only did a semi-field study.
However, to avoid any doubt we modified the sentence as follows:
“Results from our semi-field study demonstrate that variability of both field and semi-field trials can be decreased significantly with the proposed new test design and methodology.”
* The colony exclusion procedure (L 219) needs to be described and shown explicitly in terms of methods and results. An explicit method for the exclusion decision needs to be used and described (i.e. decision threshold for each endpoint, e.g. varroa, etc.?). This is a crucial point that RA needs to clarify explicitly and objectively. Please refer to previous guidelines if available on the topic.
Reply: We have added a more detailed description of the selection process in the methods section under the subheading “Selection of colonies at the end of the monitoring phase”. This section now reads as follows:
“From these eleven colonies eight colonies were selected and randomly assigned to the control and the reference group (four colonies per group). The selection had the aim of achieving colonies of similar strength during the tunnel phase. This same rationale is used in many laboratory toxicity trials, where very ‘similar’ animals, e.g. of similar age and strain are selected in order to decrease variability (and increase test-power). This was achieved based on the development of colony size throughout the pre-tunnel monitoring phase (as measured by weight), the number of capped brood cells per hive and similar mortality (measured in dead bee traps). Specifically, the selection was done in two steps: 1. Exclusion of colonies due to other reasons than colony size (e.g. high varroa count, untypically high mortality in dead bee traps). 2. Comparison of colony size during the monitoring phase, i.e.
the most similar eight colonies in terms of colony strength were selected and assigned randomly to the two test groups.”
* The authors should address, briefly, concisely and explicitly in the discussion, how it was demonstrated, providing the key comparative quantitative values, that the LUV test improved the current standards (i.e. increased power); see lines 462-463. Please also explicitly report how your results demonstrate that your assessment is more robust.
Reply: We modified the sentence:
“Concluding, with the presented LUV test, it was possible to considerably increase test-power.”
To:
“Concluding, with the presented LUV test, it was possible to considerably increase test-power (as reflected by MDD, which was reduced from values >20% to 10.8% after selection of colonies).”
* Major problems of field studies are not addressed, i.e. the absence of real control colonies (i.e. pesticide-free) in the field (i.e. Campbell et al. 2016 and Henry et al. 2015 showed that control colonies were contaminated by the target pesticide too, and Tosi et al. 2018 showed that the majority of the colonies in the environment are exposed to individual and multiple pesticides, even banned ones). I would at least mention this aspect and other key concerns related to field studies for RA.
Reply: While the issues mentioned by the reviewer are certainly an important point to consider in the risk assessment scheme in general, we feel that they are a bit out of the scope of this article. We could add a sentence such as:
“As for all field or semi-field trials, there are still more issues to resolve for future testing, such as the use of truly unexposed control colonies (Henry et al., 2015; Campbell et al., 2016; Tosi et al., 2018). However, since colonies usually have access to agricultural land or gardens when they are established before a field or semi-field test, it may be difficult to achieve this.”
However, we feel it would be a bit lost in the discussion.
If the editor prefers, we’d of course be happy to add it at the end of the discussion.
LINE BY LINE COMMENTS:
abstract: spell out MDD
Reply: We did as suggested.
line 80. this was addressed before too. move above?
Reply: The methods above, described e.g. by Delaplane and Harbo (1987), differ from the ones we used. Delaplane and Harbo (1987) suggested to ‘create’ new colonies in order to achieve similar colonies, while we do not alter the colonies, but select the most similar ones. Therefore, we feel it is better to leave these parts of the text separate.
88 specify field or semifield?
Reply: We added “semi-field” to be more precise.
412-415 video counts: you reported it causes higher variability. Please address this in this section and eventual other CONS.
Reply: This is probably a misunderstanding. We wrote that “the forager activity assessments as proposed in OECD (2007) [1] do not cover the whole forager activity of a colony but only a fraction limited by time and space.” I.e. we consider the assessment using videos to be much more precise than the very short, visual assessments of only a few 1 m² squares proposed by OECD.
416-418. I imagine there would be others, I'd double check.
Reply: We did not find any papers on studies in which entire colonies (all cells) were evaluated. There were only papers on methods to count capped cells (e.g. Colin et al., 2018).
Colin T, Bruce J, Meikle WG, Barron AB 2018. The development of honey bee colonies assessed using a new semi-automated brood counting method: CombCount. PLOS ONE: https://dx.plos.org/10.1371/journal.pone.0205816
discussion: text is very long, I think it should be shorter and more concise.
Reply: We tried to shorten the discussion, specifically paragraph 1.
463-464: authors should be more careful when stating this, as you report results from 1 study, testing limited colonies and over a limited time frame (1 year). The standard procedure for proposing new methods is ring testing them, i.e.
performed in multiple countries over multiple years. Thus, this statement seems not supported by your results.
Reply: We agree. We rephrased the paragraph and also added a sentence highlighting that so far this has been demonstrated only in one study (words in bold letters have been added):
“Concluding, with the presented LUV test, it was possible to considerably increase test-power (as reflected by MDD, which was reduced from values >20% to 10.8% after selection of colonies). This may not only make it possible to empirically determine Specific Protection Goals (e.g. the seven percent effect size regarding colony strength) recently proposed by EFSA [3, 7] using expert judgement, but also to test the toxicity of chemicals with a much higher certainty. Furthermore, new insights could be gained, since the impact of weather and other biotic or abiotic factors can be studied in greater detail. However, as the results summarized above are based only on one study, further testing may be required to verify the findings of this study.”
figures: fig. 3: cannot see error bar in black bars.
Reply: We modified the figures with error bars so that negative bars are now also visible.
References
Campbell, J. W., Cabrera, A. R., Stanley-Stahr, C. & Ellis, J. D. An evaluation of the honey bee (Hymenoptera: Apidae) safety profile of a new systemic insecticide, flupyradifurone, under field conditions in Florida. J. Econ. Entomol. 96, 875–878 (2016).
Henry, M. et al. Reconciling laboratory and field assessments of neonicotinoid toxicity to honeybees. Proc. R. Soc. B Biol. Sci. 282, 20152110 (2015).
Tosi, S., Costa, C., Vesco, U., Quaglia, G. & Guido, G. A 3-year survey of Italian honey bee-collected pollen reveals widespread contamination by agricultural pesticides. Sci. Total Environ.
615, 208–218 (2018).
Submitted filename: Response to Reviewers.docx
6 Dec 2019
PONE-D-19-24362R1
Reduction of variability for the assessment of side effects of toxicants on honeybees and understanding drivers for colony development
PLOS ONE
Dear Dr Wang,
Thank you for submitting your manuscript to PLOS ONE. After careful consideration, we feel that it has merit but does not fully meet PLOS ONE’s publication criteria as it currently stands. Therefore, we invite you to submit a revised version of the manuscript that addresses the points raised during the review process.
Please provide in your revision the power curve requested by the reviewer.
We would appreciate receiving your revised manuscript by Jan 20 2020 11:59PM. When you are ready to submit your revision, log on to https://www.editorialmanager.com/pone/ and select the 'Submissions Needing Revision' folder to locate your manuscript file.
If you would like to make changes to your financial disclosure, please include your updated statement in your cover letter.
To enhance the reproducibility of your results, we recommend that if applicable you deposit your laboratory protocols in protocols.io, where a protocol can be assigned its own identifier (DOI) such that it can be cited independently in the future. For instructions see: http://journals.plos.org/plosone/s/submission-guidelines#loc-laboratory-protocols
Please include the following items when submitting your revised manuscript:
A rebuttal letter that responds to each point raised by the academic editor and reviewer(s). This letter should be uploaded as separate file and labeled 'Response to Reviewers'.
A marked-up copy of your manuscript that highlights changes made to the original version. This file should be uploaded as separate file and labeled 'Revised Manuscript with Track Changes'.
An unmarked version of your revised paper without tracked changes.
This file should be uploaded as separate file and labeled 'Manuscript'.
Please note while forming your response, if your article is accepted, you may have the opportunity to make the peer review history publicly available. The record will include editor decision letters (with reviews) and your responses to reviewer comments. If eligible, we will contact you to opt in or out.
We look forward to receiving your revised manuscript.
Kind regards,
James C. Nieh, Ph.D.
Academic Editor
PLOS ONE
Reviewers' comments:
Reviewer's Responses to Questions
Comments to the Author
1. If the authors have adequately addressed your comments raised in a previous round of review and you feel that this manuscript is now acceptable for publication, you may indicate that here to bypass the “Comments to the Author” section, enter your conflict of interest statement in the “Confidential to Editor” section, and submit your "Accept" recommendation.
Reviewer #1: (No Response)
**********
2. Is the manuscript technically sound, and do the data support the conclusions?
The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented.
Reviewer #1: Yes
**********
3. Has the statistical analysis been performed appropriately and rigorously?
Reviewer #1: Yes
**********
4. Have the authors made all data underlying the findings in their manuscript fully available?
The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository.
For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data (e.g. participant privacy or use of data from a third party), those must be specified.
Reviewer #1: Yes
**********
5. Is the manuscript presented in an intelligible fashion and written in standard English?
PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here.
Reviewer #1: Yes
**********
6. Review Comments to the Author
Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters)
Reviewer #1: Your revisions are largely satisfactory and I think it will be a valuable contribution to this under-investigated and important topic. I have one request that was not addressed and I point out some minor edits.
1. Power curve - as previously requested
Since you are able to use Monte Carlo randomization, it should be straightforward to estimate power curves by sampling from parametric distributions with the same means and variances as the observed data. Or use the method that you used to estimate the effect of doubling the sample size (lines 410-413).
Obviously, it would be nice to generate these empirically, but you can go a long way using your present observation.
The really valuable contribution here is to estimate the number of hives needed to detect 7% effects currently specified by the Specific Protection Goals and to indicate the likely number of hives needed to detect any given effect size.
Minor edits by line number:
32: at driving?
57: forager
68: certainly not 'impossible' - 'logistically demanding', perhaps
77: comma after colonies
143: 'toxicant application' - even 'reference substance' is too specialist jargon for this journal
209: based on
258: define the pool that the selection was made from (the 11 colonies with or without replacement - with, surely?)
266: t-tests of the effect of toxicant exposure on ..
275 - see 143
286 - as 143
**********
7. PLOS authors have the option to publish the peer review history of their article. If published, this will include your full peer review and any attached files.
If you choose “no”, your identity will remain anonymous but your review may still be made public.
Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.
Reviewer #1: No
[NOTE: If reviewer comments were submitted as an attachment file, they will be attached to this email and accessible via the submission site. Please log into your account, locate the manuscript record, and check for the action link "View Attachments". If this link does not appear, there are no attachment files to be viewed.]
While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool, https://pacev2.apexcovantage.com/. PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Registration is free.
Then, log in and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email us at figures@plos.org. Please note that Supporting Information files do not need this step.
8 Jan 2020
Response to reviewers
Reviewer #1: Your revisions are largely satisfactory and I think it will be a valuable contribution to this under-investigated and important topic. I have one request that was not addressed and I point out some minor edits.
1. Power curve - as previously requested
Since you are able to use Monte Carlo randomization, it should be straightforward to estimate power curves by sampling from parametric distributions with the same means and variances as the observed data. Or use the method that you used to estimate the effect of doubling the sample size (lines 410-413). Obviously, it would be nice to generate these empirically, but you can go a long way using your present observation.
Reply: Thank you for this suggestion; we thought you had meant a power analysis with actual data. Using Monte Carlo instead is a good idea. We added power curves as suggested as Figure 7 to the main text in the location proposed by the reviewer.
Details of the methods used are shown in the supplementary materials (section 6).
The really valuable contribution here is to estimate the number of hives needed to detect 7% effects currently specified by the Specific Protection Goals and to indicate the likely number of hives needed to detect any given effect size.
Reply: We agree, this is now shown in Figure 7.
Minor edits by line number:
32: at driving?
Reply: Thank you, this is a typo; it must be “and driving”.
57: forager
Reply: We changed this to “foragers”.
68: certainly not 'impossible' - 'logistically demanding', perhaps
Reply: As we refer to evaluations by EFSA of data that were generated in the past, we think that it is correct to state that it is impossible to detect small effects using these data. The reviewer probably means that in principle one could detect small effects with more effort. We would agree, but this is not meant here.
77: comma after colonies
Reply: Thank you, we added a comma.
143: 'toxicant application' - even 'reference substance' is too specialist jargon for this journal
Reply: We rephrased the sentence as follows: “Toxicant application (reference substance) was conducted two days after the start of the tunnel phase.”
209: based on
Reply: Thank you, we added “on”.
258: define the pool that the selection was made from (the 11 colonies with or without replacement - with, surely?)
Reply: We added “with replacement” in brackets.
266: t-tests of the effect of toxicant exposure on ..
Reply: We added “the effect of toxicant exposure on” as suggested.
275 - see 143
Reply: We have rephrased the sentence as follows (but we feel it is less clear now; the term ‘reference group’ is a term that all ecotoxicologists will be familiar with): “After the selection of the colonies, four hives were randomly assigned to each of the control group and the group in which a toxicant was applied (reference group), and they were placed into the tunnels.”
286 - as 143
Reply: We have replaced “control” with “control
group”.
Submitted filename: Response to Reviewers.docx
4 Feb 2020
Reduction of variability for the assessment of side effects of toxicants on honeybees and understanding drivers for colony development
PONE-D-19-24362R2
Dear Dr. Wang,
We are pleased to inform you that your manuscript has been judged scientifically suitable for publication and will be formally accepted for publication once it complies with all outstanding technical requirements.
Within one week, you will receive an e-mail containing information on the amendments required prior to publication. When all required modifications have been addressed, you will receive a formal acceptance letter and your manuscript will proceed to our production department and be scheduled for publication.
Shortly after the formal acceptance letter is sent, an invoice for payment will follow. To ensure an efficient production and billing process, please log into Editorial Manager at https://www.editorialmanager.com/pone/, click the "Update My Information" link at the top of the page, and update your user information. If you have any billing related questions, please contact our Author Billing department directly at authorbilling@plos.org.
If your institution or institutions have a press office, please notify them about your upcoming paper to enable them to help maximize its impact. If they will be preparing press materials for this manuscript, you must inform our press team as soon as possible and no later than 48 hours after receiving the formal acceptance. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information, please contact onepress@plos.org.
With kind regards,
Nicolas Desneux
Academic Editor
PLOS ONE
Additional Editor Comments (optional):
Reviewers' comments:
Reviewer's Responses to Questions
Comments to the Author
1.
If the authors have adequately addressed your comments raised in a previous round of review and you feel that this manuscript is now acceptable for publication, you may indicate that here to bypass the “Comments to the Author” section, enter your conflict of interest statement in the “Confidential to Editor” section, and submit your "Accept" recommendation.
Reviewer #1: All comments have been addressed
**********
2. Is the manuscript technically sound, and do the data support the conclusions?
The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented.
Reviewer #1: Yes
**********
3. Has the statistical analysis been performed appropriately and rigorously?
Reviewer #1: Yes
**********
4. Have the authors made all data underlying the findings in their manuscript fully available?
The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data (e.g. participant privacy or use of data from a third party), those must be specified.
Reviewer #1: (No Response)
**********
5. Is the manuscript presented in an intelligible fashion and written in standard English?
PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous.
Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here.
Reviewer #1: Yes
**********
6. Review Comments to the Author
Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters)
Reviewer #1: This is nicely completed. All of my concerns/corrections have been addressed and the power curve for MDD will be useful for practitioners and regulators.
**********
7. PLOS authors have the option to publish the peer review history of their article. If published, this will include your full peer review and any attached files.
If you choose “no”, your identity will remain anonymous but your review may still be made public.
Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.
Reviewer #1: No
5 Feb 2020
PONE-D-19-24362R2
Reduction of variability for the assessment of side effects of toxicants on honeybees and understanding drivers for colony development
Dear Dr. Wang:
I am pleased to inform you that your manuscript has been deemed suitable for publication in PLOS ONE. Congratulations! Your manuscript is now with our production department.
If your institution or institutions have a press office, please notify them about your upcoming paper at this point, to enable them to help maximize its impact. If they will be preparing press materials for this manuscript, please inform our press team within the next 48 hours. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication.
For more information please contact onepress@plos.org.
For any other questions or concerns, please email plosone@plos.org.
Thank you for submitting your work to PLOS ONE.
With kind regards,
PLOS ONE Editorial Office Staff
on behalf of
Dr. Nicolas Desneux
Academic Editor
PLOS ONE
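The power curve requested by Reviewer #1 (MDD as a function of the number of hives per group, from n = 2 to n = 20, obtained by sampling from parametric distributions) can be illustrated with a minimal sketch. It is not the authors' method, which is described in section 6 of their supplementary materials and is not reproduced in this review history. The sketch assumes that MDD is the one-sided critical t value (alpha = 0.05) multiplied by the standard error of the difference between two group means, expressed as a percentage of the control mean, and that colony strength is normally distributed. The mean of 100 and SD of 8 are hypothetical placeholders, not values from the study, and all function names are illustrative.

```python
import math
import random
import statistics
from functools import lru_cache


def t_cdf(x, df, steps=1000):
    """CDF of Student's t for x >= 0, via Simpson's rule on [0, x]."""
    log_norm = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))

    def pdf(v):
        return math.exp(log_norm - (df + 1) / 2 * math.log1p(v * v / df))

    h = x / steps
    total = pdf(0.0) + pdf(x)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * pdf(i * h)
    return 0.5 + total * h / 3


@lru_cache(maxsize=None)
def t_quantile(p, df):
    """Quantile of Student's t for 0.5 < p < 1 and df >= 2, by bisection."""
    lo, hi = 0.0, 50.0
    for _ in range(40):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def mdd_percent(control, treatment, alpha=0.05):
    """MDD of a one-sided two-sample t-test, in percent of the control mean.

    Assumption: pooled variance and a one-sided test at level alpha.
    """
    n1, n2 = len(control), len(treatment)
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * statistics.variance(control)
                  + (n2 - 1) * statistics.variance(treatment)) / df
    std_err = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return 100 * t_quantile(1 - alpha, df) * std_err / statistics.mean(control)


def power_curve(mean, sd, sizes, draws=500, seed=1):
    """Average MDD (%) per group size, from simulated normal samples."""
    rng = random.Random(seed)
    curve = []
    for n in sizes:
        values = []
        for _ in range(draws):
            control = [rng.gauss(mean, sd) for _ in range(n)]
            treatment = [rng.gauss(mean, sd) for _ in range(n)]
            values.append(mdd_percent(control, treatment))
        curve.append((n, statistics.mean(values)))
    return curve


if __name__ == "__main__":
    # Hypothetical mean and SD of colony strength, for illustration only.
    for n, mdd in power_curve(100.0, 8.0, range(2, 21)):
        print(f"n = {n:2d}  mean MDD = {mdd:.1f}%")
```

Reading down the printed curve gives the smallest group size at which the average MDD falls below a chosen target, such as the 7% effect size mentioned in the review. With real data, the placeholder mean and SD would be replaced by the observed values for each response variable.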