Nicholas Gd Masca, Elizabeth Ma Hensor, Victoria R Cornelius, Francesca M Buffa, Helen M Marriott, James M Eales, Michael P Messenger, Amy E Anderson, Chris Boot, Catey Bunce, Robert D Goldin, Jessica Harris, Rod F Hinchliffe, Hiba Junaid, Shaun Kingston, Carmen Martin-Ruiz, Christopher P Nelson, Janet Peacock, Paul T Seed, Bethany Shinkins, Karl J Staples, Jamie Toombs, Adam Ka Wright, M Dawn Teare.
Abstract
Lack of reproducibility is an ongoing problem in some areas of the biomedical sciences. Poor experimental design and a failure to engage with experienced statisticians at key stages in the design and analysis of experiments are two factors that contribute to this problem. The RIPOSTE (Reducing IrreProducibility in labOratory STudiEs) framework has been developed to support early and regular discussions between scientists and statisticians in order to improve the design, conduct and analysis of laboratory studies and, therefore, to reduce irreproducibility. This framework is intended for use during the early stages of a research project, when specific questions or hypotheses are proposed. The essential points within the framework are explained and illustrated using three examples (a medical equipment test, a macrophage study and a gene expression study). Sound study design minimises the possibility of bias being introduced into experiments and leads to higher quality research with more reproducible results.
Keywords: experimental design; human biology; interdisciplinary research; medicine; none; reproducibility; science forum; statistical design
Year: 2015 PMID: 25951517 PMCID: PMC4461852 DOI: 10.7554/eLife.05519
Source DB: PubMed Journal: Elife ISSN: 2050-084X Impact factor: 8.140
RIPOSTE discussion framework
DOI: http://dx.doi.org/10.7554/eLife.05519.002
| Item | Prompt/Consideration | Details (relevance of question will depend on study type) |
|---|---|---|
| Aims and objectives | Define the key aims of the study | What does the study ultimately aim to show? |
| | | What are the primary and any secondary objectives? |
| Outcomes, interventions and predictors of interest | Identify the variables and quantities/qualities of interest that will be measured (these may be different for each hypothesis) | What is the primary outcome/response variable? |
| | | Are there any secondary outcomes you also wish to measure and/or assess? |
| | | What are the key interventions/groups/predictors you will be testing? |
| Research questions/Hypotheses | List the research question(s) that will be addressed and/or any hypotheses that you would like to test | The research question(s) should be defined in such a way that they |
| | | - relate directly to the study objectives |
| | | - relate to a specific outcome (or set of outcomes) and specific comparisons/predictors |
| | | Each hypothesis should |
| | | - be clearly testable |
| | | - indicate what signifies a positive result (for example, what is the minimum effect you would deem important?) |
| Logistical considerations | Ethical approval | Will ethical approval be required for the study? |
| | | - Will statistical support be required for the ethics application? |
| | Statistical support | What level of ongoing statistical support is available for this study? |
| | Data collection and management | How will the data be recorded and stored? Will this require construction of a database? |
| | | What steps will be taken to validate the data entered against what was collected? |
| | | Who will be responsible for data entry and validation? |
| | | Will any additional information (‘metadata’) be recorded to indicate data quality? |
| Materials and techniques | Laboratory equipment and methods | What specialist equipment and/or techniques will be used? |
| | | Are there any aspects of these that may impact or limit the design of the study? |
| | Configuration and standardisation of materials and methods | Is there an accepted or validated way to measure the outcomes for this specific study, or will preliminary work be required to determine this? |
| | | What are the possible sources of variation or systematic bias between samples/batches/observers/laboratories/centres? |
| | | Are any aspects susceptible to systematic variation and/or bias? What steps will be taken to minimise measurement bias and variation with consideration to: |
| | | - Technical factors, such as sample collection, processing, storage and analysis? |
| | | - Biological factors, which may include the effects of comorbidities, diet, medications, stress, biological rhythms etc, on the measurement variable? |
| | | Possible steps to consider in addressing these sources of variation might be the use of existing standards for sample processing or analysis (e.g., BRISQ, ISO, ASTM or CLSI), equipment calibration and maintenance, user training, randomisation of interventions. |
| | Software | What software (if any) will be used during data processing/collection/storage? |
| | | What software will be used during data analysis? Will specialist software be required? |
| | | Does the software conform to any quality assurance standards, if applicable? |
| | | Is the software up-to-date? |
| | What constraints/limits are there to the available resources? | What constraints are there (for example, due to cost and/or time)? |
| | | - Are there any limits in terms of the available equipment (e.g., number of plates/chips) or materials (e.g., binding agents/gels)? |
| | | - What would be the maximum number of samples that could be used/processed given the available resources and time? |
| Design | Units of measurement | What are the sampling units in the study (e.g., blood samples from individuals)? |
| | | Will the units be organised according to any structure (e.g., onto plates, chips, and/or into batches) or clustered/correlated in any way (e.g., samples from different centres or within families, or matched or paired samples/measurements)? |
| | | Will any repeated or replicate samples be taken? For example, any measurements over time; any biological replicates; any technical replicates. |
| | | Are there any inclusion/exclusion criteria? |
| | Randomisation | Will any interventions or conditions be allocated at random to the units? |
| | | - If so, how? (e.g., method of random allocation and process of generating random numbers) |
| | | - If not, why not? |
| | | Are there any other possible confounders (e.g., batches or plates) to which the units may need to be randomly allocated? |
| | Blinding (masking) | Will blinding be used? If not, why not? |
| | | Who will be blinded and how? |
| | | How will allocation be concealed and how will masking be maintained? |
| | | Under what circumstances will the data be unblinded? |
| | Groups, treatments, and other predictors of interest | What are the primary groups or treatments of interest? |
| | | What is your control or comparison group? |
| | | Are there multiple independent variables to assess simultaneously (for example, treatment and time)? If so, will a factorial design be used (involving testing all levels of each variable with all levels of each other)? |
| | | Are there any interactions of interest (which may, for example, lead to factorial designs)? |
| | Use of analytical controls | What analytical controls will be used? For example, qualitative (positive/negative) and/or quantitative quality controls (QCs); comparative/normalisation controls. |
| | | How will the controls be used/for what purpose? |
| | Other potential biases, confounders and sources of variability | Will you take any steps to minimise any background noise/variation? |
| | | Will you measure and take into account any potential confounding variables? For example, the age and sex of any participants; batch/plate/chip effects; etc. |
| | Sample size considerations | Sample size will depend on the primary objective of the study: whether the aim is to test hypotheses, estimate a quantity with specified precision, or assess feasibility. |
| | | Hypothesis testing: |
| | | - Is there a single pre-specified primary hypothesis? Is a correction for multiple testing required? |
| | | - What signifies a positive result (e.g., the minimum effect size, margin of agreement)? |
| | | - What existing data are available to base the sample size calculation on? (e.g., SD of outcome) |
| | | - What power and overall level of significance will be used? Will one or two tailed tests be used? |
| | | Feasibility, pilot and proof of concept: |
| | | - Understanding sources of variation (e.g., standard deviation of the outcome) |
| | | ▪ The sample size needs to be large enough to give accurate estimates of any components of variation |
| | | - Estimating with precision (e.g., proportion of samples that pass QC) |
| | | ▪ What is the acceptable precision (e.g., width of confidence interval) required? |
| | | - Preliminary proof of effect (e.g., superiority of a new cell extraction technique) |
| | | ▪ What probability needs to be set to observe the correct ordering of your outcomes? |
| | | ▪ What level of significance would provide enough evidence to progress to fully powered study? |
| Data assessment and preparation | QC criteria | What pre-specified criteria will be used to assess data from quantitative analytical QCs? |
| | | What pre-specified criteria will be used to assure the reproducibility of results? |
| | | - Will any thresholds be set to screen or benchmark data quality (e.g., setting a maximum coefficient of variation that would be deemed acceptable)? |
| | Data verification | Have you allowed time for data validation and correction to be completed prior to analysis? |
| | Data normalisation/correction | Will the data be normalised or transformed in any way? If so, how? |
| | Outliers | What methods and criteria will be used to identify any outlying data? |
| Statistical methods | Describe the different analyses to be performed | Which models or tests will be used (e.g., t-tests; ANOVA; mixed effects models etc)? |
| | | - Do these methods appropriately handle any repeated or correlated measurements? |
| | | What assumptions do the statistical methods rely upon? How will these be assessed? Do the data require any transformation? |
| | | Which comparisons will be made? For example, will all pairs of treatments be compared, or will each treatment just be compared to a control? |
| | | What covariates will be adjusted for? |
| | | If applicable, what model terms will be fitted, for example, which main effects and interactions, which fixed and/or random effects? |
| | | Will sensitivity analyses be performed to assess the validity of the findings? |
| | Missing data | What might be the reasons for missing data? |
| | | How will missing data be handled, for example, will missing data points be excluded or imputed? |
| | Multiple testing | Will a correction for multiple testing be required? If so, how many tests will be accounted for? |
| | | Which adjustment for multiplicity will be used, for example, Tukey, Bonferroni, false-discovery rate? |
| | Interim analysis | Will interim analyses be performed (before the full number of samples dictated by the sample size calculation has been collected)? If so, for what purpose (e.g., to update the required sample size)? |
| | | Have any necessary adjustments to the sample size been made to account for the interim analysis? |
| | Replication and/or validation | Is there an intention to replicate the results (e.g., in an independent set of samples)? |
| | | Is there an intention to validate the results (e.g., using a different technique or method of analysis)? |
| Guidelines/standards | Identify relevant reporting standards | What are the most appropriate reporting guidelines or standards that apply to the study design (e.g., BRISQ, MIFlowCyt and see |
This framework is intended to support discussion within the research team as a whole, including the statistician.
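The sample size prompts in the framework (minimum effect size, SD of the outcome, power, significance level, one or two tailed tests, correction for multiple testing) can be turned into a first rough number before the discussion with a statistician. The sketch below is illustrative only and is not taken from the paper: it assumes a comparison of two independent group means, equal group sizes, a known common SD and the normal approximation, and the function names `n_per_group` and `bonferroni_alpha` are hypothetical.

```python
import math
from statistics import NormalDist


def n_per_group(min_effect, sd, alpha=0.05, power=0.80, two_sided=True):
    """Approximate sample size per group for comparing two means.

    Assumptions (not from the source): two independent groups of equal
    size, a common known SD, and the normal approximation
    n = 2 * ((z_alpha + z_power) * sd / min_effect) ** 2.
    """
    if min_effect <= 0 or sd <= 0:
        raise ValueError("min_effect and sd must be positive")
    if not (0 < alpha < 1 and 0 < power < 1):
        raise ValueError("alpha and power must lie strictly between 0 and 1")
    std_normal = NormalDist()
    tail = alpha / 2 if two_sided else alpha
    z_alpha = std_normal.inv_cdf(1 - tail)   # critical value of the test
    z_power = std_normal.inv_cdf(power)      # quantile matching the power
    n = 2 * ((z_alpha + z_power) * sd / min_effect) ** 2
    return math.ceil(n)                      # round up to whole units


def bonferroni_alpha(alpha, n_tests):
    """Per-test significance level under a Bonferroni correction."""
    if n_tests < 1:
        raise ValueError("n_tests must be at least 1")
    return alpha / n_tests


if __name__ == "__main__":
    # Hypothetical inputs: minimum important difference of half an SD.
    print(n_per_group(min_effect=0.5, sd=1.0))
    # Same comparison, but as one of five tests with Bonferroni correction.
    print(n_per_group(0.5, 1.0, alpha=bonferroni_alpha(0.05, 5)))
```

Under these assumptions a half-SD difference at 80% power and a two-sided 5% significance level needs 63 units per group. Clustered, paired or repeated measurements, and small samples where a t-distribution matters, call for a different calculation.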
Box 1 Figure 1. Equipment set-up for elastomer pump experiment.
DOI: http://dx.doi.org/10.7554/eLife.05519.005
Two potential study designs in which either (a) four pumps and four catheters of the same type are tested simultaneously or (b) pump and catheter types are balanced during each 48 hr period of data collection, assuming only four pump-catheter units can be used concurrently and each is tested for 48 hr, three times in succession.
DOI: http://dx.doi.org/10.7554/eLife.05519.004
| Arrangement | Duration | Bench 1 | Bench 2 | Bench 3 | Bench 4 |
|---|---|---|---|---|---|
| Suboptimal design with potential for confounding | |||||
| 1 | 48 hrs × 3 | ||||
| 2 | 48 hrs × 3 | ||||
| 3 | 48 hrs × 3 | p | p | p | p |
| 4 | 48 hrs × 3 | p c | p c | p c | p c |
| Optimal, balanced design | |||||
| 1 | 48 hrs × 3 | p | p c | ||
| 2 | 48 hrs × 3 | p | |||
| 3 | 48 hrs × 3 | p c | p | ||
| 4 | 48 hrs × 3 | p | p c | ||
There are four benches of equipment being tested, each with one of each type of pump and one of each type of catheter (P = existing pump; p = new pump, C = existing catheter; c = new catheter).
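One way to generate a balanced arrangement of this kind is a Latin square, in which every pump-catheter combination appears once in each arrangement (time period) and once on each bench. The sketch below is an assumption-laden illustration, not the allocation used in the paper: the combination labels follow the legend above, while `balanced_design` and `is_balanced` are hypothetical helper names.

```python
import random

# Pump-catheter combinations, using the legend's notation
# (P/p = existing/new pump, C/c = existing/new catheter).
COMBOS = ["P C", "P c", "p C", "p c"]


def balanced_design(combos, seed=None):
    """Return a randomised Latin square: rows are arrangements, columns benches.

    Starts from a cyclic Latin square and randomly permutes the labels,
    rows and columns; each of these operations preserves the property
    that every combination occurs once per row and once per column.
    """
    k = len(combos)
    rng = random.Random(seed)      # seeded so the allocation can be reproduced
    labels = list(combos)
    rng.shuffle(labels)
    rows = list(range(k))
    cols = list(range(k))
    rng.shuffle(rows)
    rng.shuffle(cols)
    return [[labels[(r + c) % k] for c in cols] for r in rows]


def is_balanced(design, combos):
    """Check that each combination occurs once per arrangement and per bench."""
    target = sorted(combos)
    rows_ok = all(sorted(row) == target for row in design)
    cols_ok = all(sorted(col) == target for col in zip(*design))
    return rows_ok and cols_ok


if __name__ == "__main__":
    design = balanced_design(COMBOS, seed=2015)
    for number, row in enumerate(design, start=1):
        print(number, row)
    print("balanced:", is_balanced(design, COMBOS))

    # Design (a): one combination on all benches per arrangement,
    # so combination is confounded with time period.
    confounded = [[combo] * len(COMBOS) for combo in COMBOS]
    print("balanced:", is_balanced(confounded, COMBOS))
```

Recording the seed alongside the allocation documents how the random numbers were generated, which is one of the randomisation prompts in the framework.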
Box 2 Figure 2. Infection and treatment of donor macrophages.
DOI: http://dx.doi.org/10.7554/eLife.05519.008
Box 2 Figure 1. Production of differentiated macrophages from donor samples.
DOI: http://dx.doi.org/10.7554/eLife.05519.007
Commonly encountered examples of analytical controls
DOI: http://dx.doi.org/10.7554/eLife.05519.011
| Control type | Purpose |
|---|---|
| QCs | Qualitative QCs typically indicate whether specific aspects of the experimental and/or analytical procedure work in the intended ways, and are often included in the same analytical run used to collect study data. For example, a negative control may be a sample or unit that is known to be negative for the outcome and, hence, should yield a negative measurement in the assay. In contrast, a positive control would be expected to yield a positive result. |
| | Quantitative QCs are used to monitor the performance of a quantitative measurement system and ensure that it is performing within acceptable limits. Typically quantitative QC samples are run at two or more concentrations across the range of the assay and interpreted using graphical and statistical techniques, such as Levey-Jennings plots and Westgard rules. QC materials are generally not used for calibration in the same process in which they are used as controls. |
| | In instances where any QC checks fail, certain aspects of the experimental procedure may have to be altered in order to remedy the problem or one or more units associated with the violation may have to be reprocessed until satisfactory checks are achieved. |
| Comparative/normalisation controls | These can be alternative physical or biochemical parameters measured alongside the analyte of interest usually within the same sample, for the purposes of normalisation and/or correction. For example, in RT-PCR housekeeping genes are usually amplified as well as targets of interest, with the final output expressed as a ratio between the target and the housekeeping gene. |
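As an illustration of how quantitative QC results might be screened, the sketch below applies two of the commonly used Westgard rules to a series of control measurements: 1_3s (a single value more than 3 SD from the mean) and 2_2s (two consecutive values more than 2 SD from the mean on the same side). It is a minimal sketch rather than a complete QC procedure; it assumes the control mean and SD have already been established from earlier in-control runs, and `westgard_flags` is a hypothetical function name.

```python
def westgard_flags(values, mean, sd):
    """Flag QC measurements that violate the 1_3s or 2_2s Westgard rules.

    values : control measurements in run order
    mean, sd : established control mean and SD (assumed known in advance)
    Returns one list of violated rule names per measurement.
    """
    if sd <= 0:
        raise ValueError("sd must be positive")
    z_scores = [(v - mean) / sd for v in values]
    flags = []
    for i, z in enumerate(z_scores):
        rules = []
        # 1_3s: one measurement beyond mean +/- 3 SD.
        if abs(z) > 3:
            rules.append("1_3s")
        # 2_2s: this and the previous measurement both beyond
        # 2 SD on the same side of the mean.
        if i > 0:
            prev = z_scores[i - 1]
            if (z > 2 and prev > 2) or (z < -2 and prev < -2):
                rules.append("2_2s")
        flags.append(rules)
    return flags


if __name__ == "__main__":
    # Hypothetical control data with an established mean of 100 and SD of 2.
    qc_values = [100.0, 104.5, 105.0, 93.0, 100.0]
    for value, rules in zip(qc_values, westgard_flags(qc_values, 100.0, 2.0)):
        print(value, rules if rules else "ok")
```

The same z-scores plotted against run order, with lines at the mean and at 1, 2 and 3 SD either side, give a Levey-Jennings plot.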