Teemu D Laajala (1,2,3,4), Mikael Jumppanen (3,5), Riikka Huhtaniemi (3,4,6,7), Vidal Fey (3,6,8), Amanpreet Kaur (5,9,10), Matias Knuuttila (3,4,6), Eija Aho (7), Riikka Oksala (4,7), Jukka Westermarck (5,9), Sari Mäkelä (3,11), Matti Poutanen (3,6,12), Tero Aittokallio (1,2,3).
Abstract
Recent reports have called into question the reproducibility, validity and translatability of preclinical animal studies, owing to limitations in their experimental design and statistical analysis. To address this, we implemented a matching-based modelling approach for optimal intervention group allocation, randomization and power calculations, which takes full account of the complex animal characteristics at baseline, prior to the interventions. In prostate cancer xenograft studies, the method effectively normalized the confounding baseline variability and resulted in animal allocations that were supported by RNA-seq profiling of the individual tumours. The matching information increased the statistical power to detect true treatment effects at smaller sample sizes in two castration-resistant prostate cancer models, thereby saving both animal lives and research costs. The novel modelling approach and its open-source and web-based software implementations enable researchers to conduct adequately powered and fully blinded preclinical intervention studies, with the aim of accelerating the discovery of new therapeutic interventions.
Year: 2016 PMID: 27480578 PMCID: PMC4969752 DOI: 10.1038/srep30723
Source DB: PubMed Journal: Sci Rep ISSN: 2045-2322 Impact factor: 4.379
Figure 1. Benefits of the modelling framework over the course of the study period.
The animal baseline matching improves the statistical analysis and design of preclinical animal studies in terms of power calculations, balanced allocations, and intervention blinding (pre-intervention period), as well as through the use of matching information in the statistical testing of the intervention effects (post-intervention period).
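One simple way to realize the intervention blinding mentioned above is to give the staff who handle the animals and record outcomes only coded arm labels, while a separate person keeps the key. The sketch below illustrates that idea only; it is not the authors' web tool, and the function names and the 3-character code format are hypothetical choices.

```python
import random
import string


def make_blinding_key(treatments, rng):
    """Map each treatment arm to a distinct random 3-character code."""
    alphabet = string.ascii_uppercase + string.digits
    codes = set()
    while len(codes) < len(treatments):
        codes.add("".join(rng.choices(alphabet, k=3)))
    codes = sorted(codes)  # fix an order so the seeded shuffle is reproducible
    rng.shuffle(codes)
    return dict(zip(treatments, codes))


def blind(allocation, key):
    """Replace real treatment names by their codes for animal handlers."""
    return {animal: key[arm] for animal, arm in allocation.items()}


def unblind(blinded, key):
    """Recover the real arms once the outcome data have been locked."""
    reverse = {code: arm for arm, code in key.items()}
    return {animal: reverse[code] for animal, code in blinded.items()}
```

Keeping `key` out of the hands of the people measuring tumour burden is what makes the allocation blinded; `unblind` is applied only at the analysis stage.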
Figure 2. Optimal matching of animals in the case of orthotopic VCaP mouse xenografts.
The original task was to randomly assign 75 animals into five balanced intervention groups (one control and four treatment groups of 15 animals each); here we focus on only two of the treatments (ARN-509 and MDV3100), using a sub-sample of the complete data matrix (see Supporting Fig. S3). (a) Bivariate observations sampled from the VCaP study, illustrating the two selected baseline variables (body weight and PSA). (b) The 15 × 15 distance matrix computed from the baseline variables was used as input to the matching procedure, which solves for the optimal animal matching matrix. (c) The optimal submatches from the branch and bound algorithm, which guarantees a globally optimal solution (see Supporting Fig. S7). (d) The optimally matched animals were randomized into the intervention groups via blinded treatment label assignments (coloured points). The baseline matching information was used in the statistical testing of the treatment effects, mainly through paired comparisons between the treated and control animals (solid lines). Alternatively, the model also allows direct comparisons between the two treatments (dotted lines).
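The matching and randomization steps in panels (b) to (d) can be sketched as follows. This is a hypothetical, small-scale illustration rather than the authors' software. It makes three assumptions: distances are Euclidean on standardized baseline variables (the metric is not stated here); submatches with one animal per arm are found by an exact branch and bound search, which is only practical for a handful of animals; and treatment labels are then shuffled within each submatch. All function names are illustrative.

```python
import itertools
import math
import random
import statistics


def standardize(rows):
    """Scale each baseline variable to mean 0 and SD 1 so that, e.g., body
    weight and PSA contribute on a comparable scale."""
    cols = list(zip(*rows))
    means = [statistics.fmean(c) for c in cols]
    sds = [statistics.stdev(c) or 1.0 for c in cols]  # guard constant columns
    return [[(x - m) / s for x, m, s in zip(r, means, sds)] for r in rows]


def distance_matrix(rows):
    """Pairwise Euclidean distances between animals on standardized baselines."""
    z = standardize(rows)
    return [[math.dist(a, b) for b in z] for a in z]


def optimal_submatches(dist, size):
    """Partition animals into submatches of `size` animals minimising the total
    within-submatch distance, by exact branch and bound search."""
    n = len(dist)
    if n % size:
        raise ValueError("number of animals must be a multiple of the submatch size")
    best_cost, best_groups = math.inf, None

    def cost(group):
        return sum(dist[i][j] for i, j in itertools.combinations(group, 2))

    def search(remaining, groups, acc):
        nonlocal best_cost, best_groups
        if acc >= best_cost:
            return  # bound: distances are non-negative, so no improvement possible
        if not remaining:
            best_cost, best_groups = acc, groups
            return
        first = remaining[0]  # branch on the lowest free index (no duplicate partitions)
        for rest in itertools.combinations(remaining[1:], size - 1):
            group = (first, *rest)
            left = [i for i in remaining if i not in group]
            search(left, groups + [group], acc + cost(group))

    search(list(range(n)), [], 0.0)
    return best_groups, best_cost


def randomize_within_submatches(groups, labels, rng):
    """Give each animal in a submatch a different treatment label, in random order."""
    allocation = {}
    for group in groups:
        shuffled = list(labels)
        rng.shuffle(shuffled)
        allocation.update(zip(group, shuffled))
    return allocation
```

With a control arm plus ARN-509 and MDV3100, submatches would contain three animals each, so 15 animals would be split into five triplets; for much larger designs a dedicated optimization solver is the more practical route.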
Figure 3. Statistical testing of the treatment effects using pairwise matched inference.
(a) The matched inference uses the baseline matching information when testing the intervention effects, by pairing the observed responses according to the optimal submatches at equal time points. (b) An example of submatch-based pairing in the MDV3100 vs vehicle comparison; in the original study (ref. 23), this trajectory was previously shown as a single estimate value. Complex response differences are better captured when additional baseline information is incorporated into the statistical inference. The paired differences from the longitudinal observations (left panel) are combined into a single treatment curve for the pairwise matched mixed-effects modelling (right panel). (c) Comparison of the matched and unmatched statistical inference approaches in the MDV3100 vs vehicle comparison. Although both approaches lead to broadly similar conclusions about the possible intervention effects, the matched approach improves the sensitivity of detection (right panel). Different aspects of the mixed-effects modelling are visualized: the observed data (top panel), the full model fit combining both the random and fixed effects (middle panel), and the population inference depicting only the fixed effects along with their interpretation (bottom panel). In the matched inference, the population of paired differences in the intervention effects is tested against the null hypothesis of no paired differences (the y = 0 line). The statistical inference results for the intervention effects are summarized in Table 1, and the full model fits for the four treatment cases are shown in Supplementary Figs S5 and S6.
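The pairing step in panels (a) and (b) can be sketched in a few lines. The example below is a simplified, hypothetical stand-in for the pairwise matched mixed-effects model: instead of fitting random effects, it uses a two-stage (summary-statistics) analysis, fitting one least-squares slope per treated-minus-control difference curve and then testing the mean slope against zero. The function names, the data layout and the assumption that difference curves start from zero at baseline are all illustrative; a p-value would come from a t distribution with the returned degrees of freedom (e.g. `scipy.stats.t.sf`).

```python
import statistics


def paired_difference_curves(trajectories, pairs):
    """trajectories: {animal_id: [response at each shared time point]}.
    pairs: [(treated_id, control_id)] taken from the optimal submatches.
    Returns one treated-minus-control difference curve per pair."""
    return [
        [yt - yc for yt, yc in zip(trajectories[t], trajectories[c])]
        for t, c in pairs
    ]


def two_stage_slope_test(curves, times):
    """Stage 1: least-squares slope of each difference curve through the
    origin (assumes no difference at the baseline time point t = 0).
    Stage 2: one-sample t statistic of the slopes against H0: mean slope = 0."""
    sxx = sum(t * t for t in times)
    slopes = [sum(t * d for t, d in zip(times, curve)) / sxx for curve in curves]
    n = len(slopes)
    mean_slope = statistics.mean(slopes)
    t_stat = mean_slope / (statistics.stdev(slopes) / n ** 0.5)
    return mean_slope, t_stat, n - 1
```

When the treated animals grow more slowly than their matched controls, the mean slope is negative and the t statistic is large in magnitude; because each difference is taken within a submatch, the between-animal baseline variability largely cancels.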
Table 1. Mixed-effects model fits for the fixed effects (population inference) and random effects (individual effects and the random error term).
| Model | Fixed effect β (p-value) | Fixed effect β (p-value) | Fixed effect β (p-value) | Random effect γ (SD) | Random effect γ (SD) | Random error ε (SD) |
|---|---|---|---|---|---|---|
| ARN-509 vs Control, unmatched | 14.311 (<0.001)*** | 10.062 (<0.001)*** | | 8.234 | 5.163 | 5.749 |
| ARN-509 vs Control, matched | 0 (−) | 0 (−) | | 7.053 | 8.894 | 8.399 |
| MDV3100 vs Control, unmatched | 13.536 (<0.001)*** | 10.188 (<0.001)*** | | 7.635 | 6.259 | 6.395 |
| MDV3100 vs Control, matched | 0 (−) | 0 (−) | | 7.013 | 7.401 | 11.247 |
| ORX vs Intact, unmatched | 14.548 (<0.001)*** | 1.336 (<0.001)*** | | 14.578 | 0.997 | 8.518 |
| ORX vs Intact, matched | 0 (−) | 0 (−) | | 4.251 | 2.157 | 9.522 |
| ORX+Tx vs ORX, unmatched | 9.998 (<0.001)*** | 0.122 (0.0675) N.S. | | 10.476 | 0.167 | 9.977 |
| ORX+Tx vs ORX, matched | 0 (−) | 0 (−) | | 2.381 | 0.155 | 4.618 |
Model estimates and their significance levels are presented separately for each intervention comparison, for both the conventional unmatched model and the matching-based pairwise model.
The model term that explicitly tests for an intervention effect is highlighted in bold. N.S., not significant; *p < 0.05; **p < 0.01; ***p < 0.001.
Figure 4. Model-based power calculations for sufficient sample size estimation.
Statistical power (the probability that a true treatment effect is detected) as a function of the sample size (animals per treatment arm). Power was estimated by bootstrap resampling, either without the matching information (unmatched) or using the information from the optimal pairs of matched samples (matched). The estimated sample sizes (N) are those at which power reaches the conventional threshold of 0.8. (a) ARN-509 and MDV3100 intervention effects in the VCaP mouse xenografts. (b) ORX and ORX+Tx intervention effects in the orchiectomized (ORX) VCaP mouse xenografts.
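The bootstrap power calculation can be outlined as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: each animal is summarized by a single endpoint response rather than a longitudinal mixed-effects fit, the matched design uses a paired t-test and the unmatched design a pooled two-sample t-test (two-sided α = 0.05), and `pairs` is a hypothetical list of (treated, control) endpoint values from optimal submatches. Resampling whole pairs preserves the matching; resampling the two arms independently discards it. The critical values are the standard two-sided 5% Student t quantiles for up to 30 degrees of freedom, with the normal value 1.96 used beyond that as an approximation.

```python
import random
import statistics

# Two-sided 5% critical values of Student's t for df = 1..30.
T_CRIT = [12.706, 4.303, 3.182, 2.776, 2.571, 2.447, 2.365, 2.306, 2.262,
          2.228, 2.201, 2.179, 2.160, 2.145, 2.131, 2.120, 2.110, 2.101,
          2.093, 2.086, 2.080, 2.074, 2.069, 2.064, 2.060, 2.056, 2.052,
          2.048, 2.045, 2.042]


def t_crit(df):
    return T_CRIT[df - 1] if df <= len(T_CRIT) else 1.96  # normal approximation


def paired_significant(diffs):
    n = len(diffs)
    sd = statistics.stdev(diffs)
    if sd == 0:  # degenerate resample: all differences identical
        return statistics.mean(diffs) != 0
    t = statistics.mean(diffs) / (sd / n ** 0.5)
    return abs(t) > t_crit(n - 1)


def unpaired_significant(treated, control):
    n1, n2 = len(treated), len(control)
    pooled = ((n1 - 1) * statistics.variance(treated)
              + (n2 - 1) * statistics.variance(control)) / (n1 + n2 - 2)
    diff = statistics.mean(treated) - statistics.mean(control)
    if pooled == 0:
        return diff != 0
    t = diff / (pooled * (1 / n1 + 1 / n2)) ** 0.5
    return abs(t) > t_crit(n1 + n2 - 2)


def bootstrap_power(pairs, n, matched, reps=2000, seed=1):
    """Fraction of bootstrap studies with n animals per arm that detect the effect."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        if matched:
            sample = rng.choices(pairs, k=n)  # resample whole submatched pairs
            hits += paired_significant([t - c for t, c in sample])
        else:
            treated = rng.choices([t for t, _ in pairs], k=n)  # arms resampled independently
            control = rng.choices([c for _, c in pairs], k=n)
            hits += unpaired_significant(treated, control)
    return hits / reps


def required_n(pairs, matched, target=0.8, n_max=30, **kw):
    """Smallest n per arm whose estimated power reaches `target` (None if > n_max)."""
    for n in range(2, n_max + 1):
        if bootstrap_power(pairs, n, matched, **kw) >= target:
            return n
    return None
```

When baseline differences between animals are large relative to the treatment effect, the matched estimate typically reaches 0.8 power at a much smaller n, which mirrors the qualitative pattern described for Figure 4.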
Experimental design issues in exploratory and confirmatory preclinical studies.
| Design issue | Exploratory study | Confirmatory study | Aims and benefits |
|---|---|---|---|
| Study objective (focus on sensitivity/precision or specificity/generalizability) | Preclinical screening and pathophysiological hypothesis testing ( | Estimating effect size and ensuring clinical translation ( | Sensitivity allows effective search for intervention candidates, while specificity emphasizes translational aspects. Notably, mere statistical significance in preclinical testing does not yet guarantee clinical relevance |
| Example animal models | Traditional cost-efficient models, e.g. subcutaneous xenografts | Translational models, e.g. orthotopic xenografts, PDX, GEMM | Seeking a balance between cost-efficiency and translatability |
| Number of intervention groups (Parameter | High number of candidate intervention groups (Prefer | Carefully selected interventions to be validated (Prefer | High |
| Number of animals in each intervention arm (Parameter | Focus on testing multiple candidate intervention groups at sufficient sample size (medium | High confidence required for true positive effects as well as for effect size estimate (high | Well-characterized animals and sufficient |
| Number of covariates | Many possible confounding covariates, with a suspected effect on the primary response (flexible | Ideally only a few selected confounding covariates, which affect the representative intervention outcome (low | Matched animals in separate treatment arms allow more accurate inference in terms of both sensitivity and specificity |
| Estimation of sample size for the study and effect sizes for the interventions | Often difficult due to lack of pilot studies for the candidate interventions | Key ingredient in ensuring sufficient statistical power | Sufficient statistical power to identify true intervention effects and reject false effects. Accurate effect size estimation assists in evaluating clinical significance |
| Maximization of the consistency in handling of the individual animals and/or tumours | Relevant in all study aims | Relevant in all study aims | Prevent undesired stratification and false detections due to potential batch-effects |
| Taking into account potential dependence structures (e.g. tumours within the same animal) | Highly dependent on the number of | Highly relevant, e.g. cage effects have been implicated in the high attrition rates of preclinical findings | Prevents over-estimation of the effective sample size due to so-called pseudo-replication |
Exploratory and confirmatory study aims adapted from Kimmelman et al. (ref. 29).
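The pseudo-replication issue in the last table row can be quantified with the standard Kish design effect. This technique is swapped in here for illustration rather than taken from the source. With m tumours per animal and an intraclass correlation ρ between tumours of the same animal, N tumours carry roughly the information of N / (1 + (m − 1)ρ) independent observations. The sketch below, with hypothetical function names, also estimates ρ by one-way ANOVA for balanced data (the same number of tumours, at least two, in every animal). Note that the ANOVA estimate can be negative in small samples.

```python
import statistics


def effective_sample_size(n_animals, tumours_per_animal, icc):
    """Kish design effect: clustered tumours count as fewer independent units."""
    n_total = n_animals * tumours_per_animal
    design_effect = 1 + (tumours_per_animal - 1) * icc
    return n_total / design_effect


def anova_icc(clusters):
    """One-way ANOVA estimate of the intraclass correlation for balanced data.
    clusters: one list of tumour measurements per animal, all of equal length."""
    k = len(clusters[0])  # tumours per animal
    g = len(clusters)     # number of animals
    grand = statistics.fmean(x for c in clusters for x in c)
    means = [statistics.fmean(c) for c in clusters]
    msb = k * sum((m - grand) ** 2 for m in means) / (g - 1)
    msw = sum((x - m) ** 2 for c, m in zip(clusters, means) for x in c) / (g * (k - 1))
    return (msb - msw) / (msb + (k - 1) * msw)
```

For example, 10 animals with two tumours each and ρ = 0.6 correspond to an effective sample size of 12.5 rather than 20; treating all 20 tumours as independent would overstate the precision of the study.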