Abstract
The "reproducibility crisis" in science affects microbiology as much as any other area of inquiry, and microbiologists have long struggled to make their research reproducible. We need to respect that ensuring that our methods and results are sufficiently transparent is difficult. This difficulty is compounded in interdisciplinary fields such as microbiome research. There are many reasons why a researcher is unable to reproduce a previous result, and even if a result is reproducible, it may not be correct. Furthermore, failures to reproduce previous results have much to teach us about the scientific process and microbial life itself. This Perspective delineates a framework for identifying and overcoming threats to reproducibility, replicability, robustness, and generalizability of microbiome research. Instead of seeing signs of a crisis in others' work, we need to appreciate the technical and social difficulties that limit reproducibility in the work of others as well as our own.
Keywords: American Academy of Microbiology; microbiome; reproducibility; research ethics; scientific method
Year: 2018 PMID: 29871915 PMCID: PMC5989067 DOI: 10.1128/mBio.00525-18
Source DB: PubMed Journal: mBio Impact factor: 7.867
Simple grid-based system for defining concepts that can be used to describe the validity of a result
| Methods | Same experimental system | Different experimental system |
|---|---|---|
| Same methods | Reproducibility | Replicability |
| Different methods | Robustness | Generalizability |
This is a generalization of the approach used by Whitaker (9), who used it to describe computational analyses.
An aspirational rubric for evaluating the practices that host-associated microbiome researchers might use to increase the reproducibility and replicability of their work
| Practice | Good | Better | Best |
|---|---|---|---|
| Handling of confounding variables | Prior to generating data, did we identify a list of possible confounding variables—biological and technical—that may obscure the interpretation of our results? | Do we indicate the level of randomization and experimental blocking that we performed to minimize the effect of the confounding variables? | Does the interpretation of our results limit itself to only those variables that are not obviously confounded? |
| Sex/gender as confounding variables | Do we indicate the sex/gender of research animals/participants? | Do we provide a justification for the lack of even representation? | Is there equitable representation of sexes/genders? Do we account for them as a variable? |
| Experimental design considerations | Do we have an active collaboration with a statistician who helps with experimental design and analysis? | Do we indicate the number of hypothesis tests that we performed and have we corrected any P values for multiple comparisons? | For our primary research questions, have we run a power analysis to determine the necessary sample size? |
| Data analysis plan | Before starting an analysis, have we articulated a set of primary and secondary research questions? | Has someone else reviewed our data analysis plan prior to analyzing the data? | Have we registered our data analysis plan with a third party before starting the project? |
| Provenance of reagents | Is there a table of reagents such as cell lines, strains, and primer sequences that were used? | Where possible, have we obtained reagents from certified entities like the American Type Culture Collection (ATCC)? | Is there a statement indicating how we know the provenance and purity of each cell line and strain? |
| Controlling for initial microbiota | Are mice obtained from a breeding facility that allows us to track their pedigree? | Where possible, are mice from different treatment groups cohoused to control for differences in initial microbiota? | Are comparisons between mice with different genotypes made using mice that are the result of matings between animals that are heterozygous for that genotype? |
| Clarity of software descriptions | Are all methods, databases, and software tools cited? Do we follow the relevant licensing requirements of each tool? | Do we indicate dates and version numbers of websites that were used to obtain data, code, and other third-party resources? | Are detailed methods registered on a website like protocols.io or GitHub? |
| DNA contamination | Did we quantify the background DNA concentration in our reagents? Did we sequence an extraction control? | Are we taking steps to minimize reagent contamination? | What steps do we take to confirm a sequencing result that may be clouded by contaminating DNA? |
| Availability of data products | Are all of the raw data publicly available? | Are intermediate and final data files publicly available? | Are tools like Amazon Machine Images (AMIs) used to make a snapshot of our working directory? |
| Availability of metadata | Are all of the metadata necessary to repeat any analyses that we performed publicly available? | Have we adhered to standards in releasing the minimum amount of metadata about our samples? | Did we go beyond the minimum to incorporate other pieces of metadata that will inform future studies? |
| Data analysis organization | Are all data, code, results, and documentation housed within a monophyletic folder structure on our computer? | Is this project contained within a single directory on our computer, and does it separate our raw and processed data, code, documentation, and results? | Is this folder structure under version control? Is the project’s repository publicly available? Are there assurances that this repository will remain accessible? |
| Availability of data analysis tools | Are free and open tools used in preference to proprietary commercial tools? | Is the computer code required to run analyses available through a service like GitHub? | Are Amazon Machine Images or Docker containers used to allow recreation of our work environment? |
| Documentation of data analysis workflow | Is our code well documented? Do we use a self-commenting coding practice? | Does each of our scripts have a header indicating the inputs, outputs, and dependencies? Is it documented how files relate to each other? | Are automated workflow tools like GNU Make and Common Workflow Language used to convert raw data into final tables, figures, and summary statistics? |
| Use of random number generator | Do we know whether any of the steps in our data analysis workflow depend on the use of a random number generator? | For analyses that utilize a random number generator, have we noted the underlying random seed? | Have we repeated our analysis with multiple seeds to show that the results are insensitive to the choice of the seed? |
| Defensive data analysis | Is our data analysis pipeline flexible enough to add new data? | Does our code include tests to confirm that it does what we think it does? | Did we make use of automated tests and continuous integration tools to ensure internal reproducibility? |
| Ensuring short- and long-term reproducibility | Did we release the underlying code and new data at the time of submitting a paper with their DOIs and accession numbers? | Did we include a reproducibility statement or declaration at the end of the manuscript? Are ORCID identifiers provided for all authors? | What mechanisms are in place to ensure that our analysis remains accessible and reproducible in 5 years? |
| Open science to foster reproducibility | Have we released any embargoes on our code repository and raw data prior to submitting the manuscript? | Did we post a preprint version of our manuscript prior to submission? | Have we published under a Creative Commons license? Is a permissive reuse license posted with our code? |
| Transparency of data analysis | Is it clear where one would go to find the data and processing steps behind any of our figures? | Are electronic notebooks publicly accessible, and do they accompany the manuscript? | Were literate programming tools used to generate summary statistics, tables, and figures? |
Although many of the questions can be thought of as having a yes-or-no answer, a better approach would be to see the questions as being open ended with the real question being “What can we do to improve the status of our project on this point?” With this in mind, a researcher is unlikely to have a project that satisfies the “Best” column for each line of the table. Researchers are encouraged to adapt and modify the categories to suit their own needs.
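As one illustration of the "Use of random number generator" row, the following minimal Python sketch records an explicit seed for a random subsampling step and then repeats the step under several seeds to see how much the estimate moves. It is an assumption-laden example rather than a method from the Perspective: the function names (`subsample_mean`, `seed_sensitivity`), the seed list, and the read counts are all hypothetical and chosen only for illustration.

```python
import random
import statistics


def subsample_mean(values, n, seed):
    """Mean of a random subsample of size n, drawn with an explicit seed."""
    # A local generator keeps the seed visible and avoids hidden global state.
    rng = random.Random(seed)
    return statistics.mean(rng.sample(values, n))


def seed_sensitivity(values, n, seeds):
    """Repeat the subsampling under several seeds and report the spread."""
    estimates = {seed: subsample_mean(values, n, seed) for seed in seeds}
    spread = max(estimates.values()) - min(estimates.values())
    return estimates, spread


# Hypothetical data: made-up per-sample read counts, for illustration only.
read_counts = [120, 98, 143, 110, 87, 132, 105, 99, 127, 115]
SEEDS = [1, 2, 3, 4, 5]  # noted alongside the results so the run can be repeated

estimates, spread = seed_sensitivity(read_counts, n=5, seeds=SEEDS)
for seed, estimate in estimates.items():
    print(f"seed={seed} mean={estimate:.1f}")
print(f"spread across seeds: {spread:.1f}")
```

Noting the seed corresponds to the "Better" column and the multi-seed comparison to the "Best" column. What counts as an acceptably small spread depends on the analysis and is left to the researcher.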