Improving rigor and reproducibility in nonhuman primate research.

Eliza Bliss-Moreau^1,2, Rama R Amara^3, Elizabeth A Buffalo^4,5, Ricki J Colman^6,7, Monica E Embers^8, John H Morrison^1,9, Ellen E Quillen^10, Jonah B Sacha^11,12, Charles T Roberts^11.

Abstract

Nonhuman primates (NHPs) are a critical component of translational/preclinical biomedical research due to the strong similarities between NHP and human physiology and disease pathology. In some cases, NHPs represent the most appropriate, or even the only, animal model for complex metabolic, neurological, and infectious diseases. The increased demand for and limited availability of these valuable research subjects requires that rigor and reproducibility be a prime consideration to ensure the maximal utility of this scarce resource. Here, we discuss a number of approaches that collectively can contribute to enhanced rigor and reproducibility in NHP research.

Keywords: biomedical research; data sharing; nonhuman primates; preregistration; quality assurance

Year: 2021 PMID: 34541703 PMCID: PMC8629848 DOI: 10.1002/ajp.23331

Source DB: PubMed Journal: Am J Primatol ISSN: 0275-2565 Impact factor: 3.014

INTRODUCTION

Science is in the midst of a major paradigm shift. Multiple scientific disciplines are increasingly facing, and addressing, growing concerns about the meaning and impact of their findings. The extent to which scientific research is rigorous (robust and unbiased) and reproducible (able to be repeated biologically in the lab, analytically with the original data, systemically under different conditions, or conceptually at the level of the biological phenomenon) has an impact on almost every facet of modern life. Physical, biological, and social scientists have all begun conversations about how to improve research by creating systems and incentives to promote robustness and transparency in response to the unique challenges faced by each domain. So great is the concern about replication issues in biomedical science that funding agencies around the world have stepped in to create change via focused efforts that outline guidelines for rigor and reproducibility and/or require their investigators to develop and adhere to procedures to ensure rigor and reproducibility in their research (e.g., the US National Institutes of Health [NIH], https://grants.nih.gov/policy/reproducibility/index.htm; for UK funders see https://acmedsci.ac.uk/policy/policy-projects/reproducibility-and-reliability-of-biomedical-research). The major stakeholders and their respective roles in this effort are illustrated in Figure 1. This issue has recently been addressed by the National Academies of Sciences, Engineering, and Medicine in a comprehensive report (National Academies of Sciences & Medicine, 2019), which defines reproducibility as a computational matter and replicability as an experimental consistency matter.

Figure 1

Stakeholders and their roles in supporting rigor and reproducibility in nonhuman primate research

Researchers using nonhuman primates (NHPs) face particular challenges in addressing rigor and reproducibility that are of less concern in rodent or in vitro studies. For example, significant concerns have been raised about rigor in psychology research with humans, leading to a massive shift in the norms for sample sizes (Sassenberg & Ditrich, 2019). It is now accepted that sample sizes must be much larger than they once were, and decisions about sample sizes must be made a priori based on the stated study outcomes. Increasing sample size may not be possible for scientists working with NHPs for ethical, logistical, or fiscal reasons. Research carried out with animals, including NHPs, is subject to the “3Rs”: Replacing animals in research when possible, Reducing sample sizes, and Refining techniques (Tannenbaum & Bennett, 2015). These ethical constraints must be balanced with rigor and reproducibility considerations, as underpowered studies waste resources and animals. It should be noted that decisions about sample sizes involve choosing an appropriate sample size that produces reliable data, rather than reducing numbers per se. Achieving the balance between ethical considerations (in particular, reducing the number of animals used in research) and carrying out rigorous research that is appropriately powered, has appropriate controls, and is reproducible is particularly challenging in work that involves NHPs, one of the most appropriate animal models for many human disease‐related processes (Capitanio & Emborg, 2008; Estes et al., 2018; Phillips et al., 2014). Our goal in this review is to consider the challenges and opportunities to increase rigor and reproducibility in NHP research. 
To that end, we first briefly discuss NHPs as a model for human health and disease and then discuss themes in rigor and reproducibility that are consistent across the spectrum of scientific disciplines that work with them. We next highlight specific lessons learned from a number of disciplines that use the NHP model at the National Primate Research Centers (NPRCs), with the goal of expanding these best practices to the wider biomedical research community. Rigor and reproducibility in NHP research have been addressed in a recent NIH workshop (https://osp.od.nih.gov/wp-content/uploads/NHP_Workshop_Report_FINAL_20200218.pdf), and some of these considerations with respect to NHP vaccine studies, in particular, have been recently discussed (Prescott et al., 2021).
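As an illustration of the a priori sample size decisions discussed above, the following minimal Python sketch estimates the number of animals per group needed for a two‐group comparison of means. It relies on the standard normal approximation, n ≈ 2(z_{1−α/2} + z_{1−β})² / d², which slightly underestimates the exact t‐test requirement when groups are small. The function name and the example effect sizes are hypothetical and are not drawn from any study cited here.

```python
import math
from statistics import NormalDist


def animals_per_group(effect_size_d, alpha=0.05, power=0.80):
    """Approximate n per group for a two-sided, two-group comparison of means.

    Uses the normal approximation. effect_size_d is the standardized mean
    difference (Cohen's d) that the study is designed to detect.
    """
    if effect_size_d <= 0:
        raise ValueError("effect_size_d must be positive")
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value, two-sided test
    z_power = NormalDist().inv_cdf(power)          # quantile for the target power
    n = 2 * (z_alpha + z_power) ** 2 / effect_size_d ** 2
    return math.ceil(n)                            # round up to whole animals


if __name__ == "__main__":
    # Hypothetical effect sizes, chosen only to show the scaling.
    for d in (0.5, 1.0, 1.5):
        print(f"d = {d}: about {animals_per_group(d)} animals per group")
```

Because the required n scales with 1/d², halving the detectable effect size roughly quadruples the group size under these assumptions, which is why small NHP cohorts can typically support confirmatory tests of large effects only.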

NHPS AS BIOMEDICAL RESEARCH MODELS

NHPs represent a small proportion of laboratory animals used in studies of human health and disease. According to the most recent (2018) USDA annual report of animal usage, less than 9% of the 792,000 USDA‐covered research animals (i.e., subject to the Animal Welfare Act) used in research in the US were NHPs. It is important to note that laboratory mice (Mus sp.) and rats (Rattus sp.), fish, and many other laboratory species are not USDA‐covered research animals; thus, the proportion of NHPs in the laboratory relative to all laboratory animals is much lower than 9%. In spite of comprising such a small proportion of animal research subjects, NHPs garner significant attention from both regulatory bodies and the public. Their contribution to biomedical science is particularly important based on the extensive similarity in NHP and human physiology and behavior. A full discussion of the specific features of NHPs that make them good models for human health is beyond the scope of this review; a number of recent reviews discuss the similarities between NHPs and humans, highlighting the importance of NHP work for a wide variety of health and disease topics, including cardiac health, genetics (including CRISPR and other genetic engineering approaches), infectious disease (e.g., treatments and vaccine development), immunology, diseases of aging (e.g., cognitive decline, obesity, metabolic disease), respiratory diseases (e.g., asthma), and the neurobiology of psychological disorders (Buffalo et al., 2019; Capitanio & Emborg, 2008; Estes et al., 2018; Miller et al., 2017; Phillips et al., 2014). In spite of the significant strengths of the NHP model, NHP research faces a number of notable constraints. Most NHPs are long‐lived, have long periods of early development during which they are dependent on parental care, and exhibit robust cognitive, affective, and social repertoires (Phillips et al., 2014). 
These features make them good models for humans, but also engender the need for additional protection not typically afforded to many other widely employed animal models (e.g., rodents, zebrafish, Drosophila). Sample sizes—typically small—are limited by both ethical and practical constraints, in that their care requires significant and unique infrastructure and resources, including enriched laboratory environments and highly trained personnel. NHPs used in health‐related research develop more quickly than humans, with specific developmental speeds varying by genus and species. For example, macaques develop approximately 3–4 times faster than humans (Machado, 2013; Suomi, 1999), allowing for lifespan studies within an investigator's career. This allows powerful longitudinal studies that can span long periods of time (up to decades) and thus requires significant foresight in the study and long‐term care planning. It also means that a single animal may participate in a number of different studies across its lifespan, each of which has the potential to impact the others, thus creating variability in outcomes. Simultaneously, inherent variability in biology and behavior is the hallmark of the NHP model, insofar as NHP models are outbred, rather than inbred like many rodent models. This results in intrinsic “heterogenization,” which is increasingly considered a strength in experimental models (Richter, 2017; Voelkl et al., 2020). This strength of the outbred NHP model in terms of translatability to human health and disease requires proper statistical modeling that embraces variability by modeling it appropriately and with potentially larger sample sizes than typically used. Additionally, as science progresses, so too does our understanding of what features of individuals and their environments may create or sustain individual variation. 
For example, the norm in research with macaques for decades was nursery rearing (rearing by a human experimenter or with same‐aged peers) and housing individuals singly in rooms that included other animals (DiVincenti & Wyatt, 2011; Schapiro et al., 2000). A large body of work now demonstrates that these conditions result in dysregulated biological and behavioral development, which compromises the wellbeing of subjects and also may make them less ideal models for human health and disease (Capitanio et al., 2006; Gottlieb et al., 2013). Although there remain certain instances in which nursery rearing may be desirable (i.e., studies of developmental disorders, infant infectious disease, and generation of specific‐pathogen‐free animals), this should only be done with compelling justification. At the very least, reporting, and, when possible, controlling for this variation in housing, is important for understanding health and wellbeing outcomes. For example, AJP now requires reporting of the exact social conditions of housing even though such details are recommended but not required by the Animal Research: Reporting of In Vivo Experiments (ARRIVE) guidelines. Similarly, the impact of sex and age on immune system function and response to infection is becoming clearer (Giefing‐Kröll et al., 2015; Haberthur et al., 2010; Ingersoll, 2017). However, these and other relevant variables are not always considered when designing studies or documented upon publication. Thus, guidelines for rigor and reproducibility in NHP research must account both for concerns NHPs share with other model organisms (such as sex as a biological variable) and for other unique features of the NHP model (such as inclusion in multiple studies over the lifespan and a relatively outbred genome). 
Furthermore, given that NHPs are the subjects in a wide variety of studies in disciplines as varied as infectious disease, metabolic disease, neuroscience, and behavior, formulating guidelines that cover multiple disciplines is of the utmost importance. One potentially rich testing ground for establishing multidisciplinary guidelines for rigor and reproducibility for NHP research is the NPRC consortium, which is composed of the seven NIH‐supported NPRCs, and which is, by design, multidisciplinary. Although NHPs are research subjects in a wide variety of laboratories and centers around the globe, the NPRC consortium represents a fairly unique system and environment for carrying out NHP work (nprc.org; NPRCresearch.org). The NPRCs are directly funded by the NIH, originally via congressional legislation in the 1960s that established the “regional primate research centers” (subsequently renamed from Regional to National), and are coordinated as a network via administrative and scientific processes, allowing for complementary resource development, policy, and integrative science. There are currently seven NPRCs (Yerkes in Georgia; Tulane in Louisiana; Southwest in Texas; and California, Oregon, Washington, and Wisconsin in their respective states). Specific NHP resources and scientific focus vary across centers by design. All NPRCs are united by a mission to improve human health while providing environments that promote NHP well‐being and facilitate large‐scale, collaborative, interdisciplinary research. NPRC principal investigators are often affiliated with NPRC‐associated academic institutions and broadly support both basic and translational research via “affiliate scientist” programs that allow external investigators to carry out NHP work at the centers. In some cases, significant breeding populations at the centers provide NHP resources to laboratories outside the NPRC system as well. 
In addition to providing unique resources and rich intellectual environments for those carrying out this work, NPRCs are able to advance science and ultimately empirically guide policy around the care and use of NHPs in research, because their focus is almost exclusively NHPs.

TRANSPARENCY AS A PRIMARY WAY TO IMPROVE RIGOR AND REPRODUCIBILITY

For the purposes of this review, when we discuss research characterized by rigor, we are referring to research that is: (a) well‐planned with regard to the core questions such that bias is minimized; (b) well‐executed with regard to the experimental design; and (c) appropriately analyzed and interpreted without bias. In this way, rigor ensures the robustness of research. When we discuss reproducible research, we are referring to the ability of scientific work to be repeated, yielding the same or biologically consistent outcomes. Efforts to improve rigor and reproducibility are often, if not always, grounded in the assumption that rigorous research is more likely to be reproducible (Munafò et al., 2017). It is important to note that the extent to which research is rigorous and reproducible says little about the extent to which it is translatable (from animal model to human) (Munafo & Davey Smith, 2018). For example, mouse models of sepsis largely converge on a small set of mechanisms that generate this significant health issue, yet research carried out in murine models does not consistently translate to humans (Seok et al., 2013; Stortz et al., 2017). This distinction between research that is evaluated to be rigorous and that can be reproduced and research that translates to humans is similar to the distinction between precision (hitting the same target over and over again) and accuracy (hitting the right target). Improving research generally, and NHP research, in particular, requires addressing both precision (rigor and reproducibility) and accuracy (translatability). The latter is outside the realm of discussion here, with the exception of noting the power of the NHP model as discussed previously. 
Although the specific approach to improving rigor and reproducibility may ultimately be tailored to individual scientific disciplines (because norms vary with regard to what it means to be robust), a number of themes are relevant to all scientific disciplines and are also relevant to work conducted with NHPs. Ultimately, guidelines for improving rigor and reproducibility, such as those set forth by the NIH and scientific societies, focus on the goal of increasing the robustness of the science and improving the ability to evaluate its strength. Mechanisms that increase transparency and provide pathways to enhance the rigor and reproducibility of research processes include methods, design, and analytical approaches that can be evaluated separately from research outcomes (Collins, 2014; Landis et al., 2012).

PROVIDING ENOUGH DETAIL FOR PROPER EVALUATION

A consistent theme that arises in addressing reproducibility issues has been an inability to replicate published findings due to insufficient detail provided in the original published reports. Sufficient detail is needed not only to support reproducibility but also to allow the scientific community to determine whether the research was conducted with rigor and whether or not the findings are valid. Determining the robustness of research is particularly important in cases where experiments generate null results. The null results of robust experiments may be interpreted as the absence of an effect, whereas the null results of experiments that are not robust may emerge, not because the effect does not exist, but rather because the experiment failed to elucidate the effect. Although sufficiently detailed reporting may be the norm taught in our laboratories, the reality is that many experimental and analytic details do not make it into print. This may reflect journal reporting standards, biases in peer review, or failure to recognize that certain types of details (e.g., animal experimental rearing and experimental histories) may be critically important, both for shaping the experimental outcomes of a given study and generalizing its findings. At first blush, the solution to the transparency issue may be to simply report more detail in methods and data analyses sections and to share data. This raises important secondary questions: What additional details need to be reported, by what mechanisms should they be reported, and by what processes should data be shared? At least three established efforts that improve transparency can be applied to NHP research: establishing normative protocols for carrying out work, including tracking and reporting relevant details of study design; preregistration; and open science practices that include establishing resources for and carrying out data sharing. We address these three areas below.

Establishing normative protocols for experimentation and reporting

One of the historical approaches to addressing issues related to rigor has come from vested parties (e.g., societies and scientific journals) that have developed guidelines for assay or protocol standardization and reporting, sometimes under the definition of resource authentication. Although not specific to NHP research, these guidelines are present in disciplines that use NHPs in research and that are represented at the NPRCs. For example, guidelines on verifying and authenticating antibodies were published both in Endocrinology (Gore, 2013) and the Journal of Comparative Neurology (Saper, 2005). The Endocrinology guidelines stipulate that scientists must provide verification that a given antibody binds to a specific target antigen in both experimental and control cases, and they provide a number of examples of ways this can be done and what must be reported in publications. The Journal of Comparative Neurology guidelines stipulate four pieces of information that must be considered and addressed: identifying information for the antibody; information about the preparation of the antibody; information about how the specificity of the antibody was determined; and controls that are present for immunostaining. In a similar vein, the Journal of Clinical Endocrinology and Metabolism at the same time instituted a requirement for determination of steroid hormone levels using liquid chromatography‐tandem mass spectrometry (Handelsman & Wartofsky, 2013). Scientific societies and journals can also influence what details get reported in publications by establishing reporting standards for scientific procedures that are common to their communities and then requiring that reporting as a condition of publication or presentation. 
These reporting standards increasingly take the shape of presubmission checklists that provide specific details about what must be reported and require authors to indicate that they have reported those details or provide specific information about why they are not reporting them. Individual journals such as Circulation Research now require the use of checklists to address methodological transparency, to ensure that adequate information about subjects (including animals) is shared, and mandate data sharing (Bolli, 2017). Journals from Cell Press, the family of Nature journals, and BioMed have developed and adopted reporting checklists specific to their own journals, focused largely on reporting methodological details (Marcus, 2016). Cross‐journal efforts also demonstrate the potential for checklists that have a greater normative reach. For example, with the goal of establishing minimum reporting standards in life sciences, the Materials Design Analysis and Reporting (MDAR) Checklist was tested by 13 journals of various scope (http://blogs.nature.com/ofschemesandmemes/2019/10/21/journals-test-the-materials-design-analysis-reporting-mdar-checklist). The MDAR (https://osf.io/bj3mu/) asks authors to report information on antibodies (source, reagents), cell materials, experimental animals (including species, sex, strain, origin), plants and microbes, human participants, step by step study and laboratory protocols, study design (sample size, randomization, blinding, inclusion/exclusion criteria), in‐laboratory replication, ethics, attrition, statistics, data availability, and code availability. Piloted with 289 manuscripts across the journals, 80% of authors and editors reported that the checklist was useful, and revisions of the MDAR are being undertaken based on the pilot study. 
These observations are echoed in empirical studies that demonstrate that publications whose editorial process included checklists, compared to manuscripts at the same journals which did not require checklists, report more methodological details, including those typically deemed necessary to evaluate the robustness of research (Han et al., 2017; NDQIP Collaborative Group, 2019). Particularly germane to NHP research are the ARRIVE guidelines and checklist developed by the United Kingdom's National Centre for the Replacement, Refinement & Reduction of Animals in Research (NC3Rs), and translated from English into nine languages (https://www.nc3rs.org.uk/arrive-guidelines) (Kilkenny et al., 2010). Originally published in 2010, the ARRIVE guidelines were developed in response to a survey that identified serious omissions in reporting of studies that were publicly funded in the United Kingdom and United States; they were developed specifically to fill a gap in other checklists that do not require adequate information specifically related to carrying out live animal research (Kilkenny et al., 2009). Like the checklists described above, the ARRIVE checklist is designed to be used when manuscripts are being prepared for submission, although the guidelines cover each section of a standard manuscript. It is important to note that a randomized controlled trial of the ARRIVE guidelines suggested minimal, if any, improvement in reporting (Hair et al., 2019), demonstrating that changing the norms of reporting in animal science may be particularly difficult (Enserink, 2017). Guided by two randomized controlled trials (one in collaboration with PLoS One) and information provided by users and journal editors, the ARRIVE guidelines were recently updated (https://www.nc3rs.org.uk/revision-arrive-guidelines) (Percie du Sert et al., 2020). 
The update includes elaboration of what each criterion means and explains the rationale for including it, revision of some of the items, and organization of the original set of items into two sets that vary in priority. Criteria are divided into “essential” and “recommended,” and the recent NIH report on enhancing rigor in animal research recommends compliance with the ARRIVE 2.0 guidelines (https://acd.od.nih.gov/documents/presentations/06112021_RR-AR%20Report.pdf). One additional change that might make the ARRIVE guidelines better suited to NHP work would be explicit consideration of the long lifespan of NHPs. A given animal may be a participant in multiple studies over its lifetime, with the data distributed across multiple laboratories and publications; reporting this may be important for understanding variability in experimental outcomes. The reporting guidelines and checklists discussed above represent efforts to clarify experimental details at the time of publication, an important step towards transparency. Although the guidelines improve our ability to evaluate published science, they do not necessarily improve the integrity of the science that is carried out if they are not used at the time of experimental design and implementation. There has been some movement on the idea that checklists should be used when designing studies, and at least one example exists in the domain of animal research. The Planning Research and Experimental Procedures on Animals: Recommendations for Excellence (PREPARE) guidelines, which have their origins at the Norwegian School of Veterinary Science, propose 15 categories that should be considered when designing a study. These categories of information cover the entire research process, from the initial literature search at the conception of the study to necropsy (Smith et al., 2018). 
Adopting such guidelines consistently at the experiment preparation phase represents one potentially valuable step forward in ensuring that work that is carried out will be robust.
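The checklist logic described above, including the split into “essential” and “recommended” items, can be sketched as a small validation routine. This is a minimal illustration only: the item names below are hypothetical and are not the official ARRIVE 2.0, MDAR, or PREPARE item lists, although the recommended items reflect the NHP‐specific details discussed in this review (social housing, rearing history, and enrollment in prior studies).

```python
# Hypothetical item names for illustration only; these are NOT the official
# ARRIVE 2.0, MDAR, or PREPARE item lists.
ESSENTIAL_ITEMS = (
    "species", "sex", "age", "sample_size",
    "randomization", "blinding", "inclusion_exclusion_criteria",
)
RECOMMENDED_ITEMS = ("social_housing", "rearing_history", "prior_study_enrollment")


def _is_reported(value):
    """An item counts as reported if it is present and not blank."""
    return value is not None and str(value).strip() != ""


def _missing(details, items):
    """Return the items from `items` that are absent or blank in `details`."""
    return [item for item in items if not _is_reported(details.get(item))]


def check_reporting(details):
    """Summarize which checklist items a draft methods section still lacks."""
    essential_missing = _missing(details, ESSENTIAL_ITEMS)
    return {
        "essential_missing": essential_missing,
        "recommended_missing": _missing(details, RECOMMENDED_ITEMS),
        "ready_to_submit": not essential_missing,
    }


if __name__ == "__main__":
    # Invented example record; values are free-text descriptions.
    draft = {
        "species": "Macaca mulatta",
        "sex": "8 female, 8 male",
        "age": "4-6 years",
        "sample_size": 16,
        "randomization": "block randomized by sex",
        "blinding": "",  # left blank by the authors
        "inclusion_exclusion_criteria": "defined before data collection",
        "social_housing": "pair housed",
    }
    print(check_reporting(draft))
```

Running the same check when a study is designed, rather than only at submission, corresponds to the PREPARE‐style use of checklists at the planning stage.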

Preregistration

One process that has been offered as a solution to improve transparency and to ensure rigor and reproducibility is preregistration (Nosek & Lakens, 2014; Nosek et al., 2018). The central premise of preregistration is that aspects of experimental design and analysis are documented before data collection and/or data analysis occurs. In some cases, these documents are peer‐reviewed before the work is carried out (e.g., Part 1 of Registered Reports) to inform how the work is done (Nosek & Lakens, 2014). This allows for the methods and analysis protocol to be evaluated separately from the scientific outcomes. If the protocol is deemed robust, then the results can be published in participating journals regardless of whether there are statistically significant effects or null results. For some journals, these documents are reviewed as part of a standard peer‐review process. Preregistration typically occurs using standardized templates that request particular information about experimental design, samples, sample sizes and their calculation, and intended analyses; the completed templates are housed on servers that date and time stamp their submission, even if they are not made immediately public. The major champion of preregistration has been the Center for Open Science (https://osf.io/), which offers templates for preregistration (although none specific to animal research at the time of this writing). The forms of preregistration vary in terms of the amount of information that is disclosed and when it is disclosed in the publication process, so the process can be adapted to meet the needs of a wide variety of stakeholders. Preregistration requires scientific planning as well as articulation of that plan in a way that can be evaluated and published. Preregistration templates encourage documentation of which analyses are done in the context of discovery (exploratory) vs those that are designed to test a specific hypothesis (confirmatory). 
This documentation of what scientists intend to do serves to prevent HARKing (hypothesizing after results are known; Kerr, 1998) and p‐hacking (carrying out statistical analyses until a significant result is found; Simmons et al., 2011), two processes that have contributed to the reproducibility crisis. Preregistration does not, however, necessarily solve the problem of carrying out poorly theorized, modeled, or informed work (Smaldino, 2019).
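To see why carrying out analyses until one reaches significance inflates false positives, consider a study in which no true effect exists and k analyses are tried. If the analyses are independent (a simplifying assumption; real analyses of one dataset are usually correlated), the chance that at least one reaches p < α is 1 − (1 − α)^k. The sketch below computes this quantity and checks it by simulation; the function names are hypothetical.

```python
import random


def chance_of_false_positive(n_analyses, alpha=0.05):
    """Probability that at least one of n independent null tests has p < alpha."""
    return 1.0 - (1.0 - alpha) ** n_analyses


def simulate_null_studies(n_analyses, alpha=0.05, n_studies=20000, seed=1):
    """Monte Carlo check of the formula above.

    Under a true null hypothesis, the p-value of a test with a continuous
    test statistic is uniform on [0, 1], so p-values are drawn directly.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_studies):
        p_values = [rng.random() for _ in range(n_analyses)]
        if min(p_values) < alpha:  # only the analysis that "worked" is reported
            hits += 1
    return hits / n_studies


if __name__ == "__main__":
    for k in (1, 5, 20):
        print(k, round(chance_of_false_positive(k), 3),
              round(simulate_null_studies(k), 3))
```

With α = 0.05, the analytic false positive rate rises from 5% for a single planned analysis to about 23% for 5 analyses and about 64% for 20, which is the inflation that a preregistered distinction between confirmatory and exploratory analyses is meant to expose.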

Data sharing

Data sharing can enhance rigor and reproducibility in a number of ways, including allowing broad scientific communities access to data to carry out independent analyses to confirm published effects and/or build upon established datasets. Models for data sharing exist in multiple scientific domains (e.g., neuroscience: https://www.nwb.org/, and immunology https://www.immuneprofiling.org/hipc/page/show), but are arguably most well‐established in genomics. In fact, replication failures were one of the reasons that data‐sharing norms were proactively changed in genetics as a mandate from the NIH. The “replication crisis” has been raging in genetic association studies for almost two decades (Hirschhorn & Altshuler, 2002). In 2007, NCI and NHGRI hosted a working group on replication in association studies and developed a list of study details, data issues, methodological disclosures, and deposition requirements that should be considered by authors, reviewers, and journals when evaluating published association studies. Additionally, they set standards for the replication of associations (NCI‐NHGRI Working Group, 2007), but replication remains difficult in NHP work due to the challenges of accessing sufficient animals. In 2008, the NIH began mandating the deposition of genetic data in publicly accessible databases (NOT‐OD‐07‐088; https://grants.nih.gov/grants/guide/notice-files/not-od-14-124.html), a requirement enforced by most major journals. The final version of the Genomic Data Sharing Policy was published in 2014 (79 FR 51345; https://www.federalregister.gov/documents/2014/08/28/2014-20385/final-nih-genomic-data-sharing-policy). One of the most popular databases for deposition of basic genotype data, NCBI's dbSNP, closed to nonhuman data in 2017 and, as a result, the European Variation Archive hosted by the European Bioinformatics Institute is now the primary repository of genomic variation for many nonhuman species. 
However, for rhesus macaques, the macaque genotype and phenotype resource (mGAP; https://mgap.ohsu.edu/) provides searchable access to richly annotated variant data identified using genome‐wide sequencing approaches (Bimber et al., 2019). With variant data spanning rhesus macaque populations at each of the NPRCs, the mGAP data resource enables the informed selection of animals based on genetic information, improving the reliability and reproducibility of findings across research disciplines. Gene expression data are frequently archived in NCBI's Gene Expression Omnibus database, which includes more than 18,000 NHP samples. Protein data from any species can be deposited to the UniProt database funded by NIH and the European Molecular Biology Laboratory. The deposition of raw data in these repositories is, by itself, insufficient to allow for the true replication of findings. This is because, unlike DNA sequence data, which are largely the same whether collected from blood or brain in infants or elderly animals, epigenetic (including methylation, acetylation, microRNAs, long noncoding RNAs, and other data types), gene expression, proteomic, and metabolomic data have the additional challenge of being highly sensitive to both tissue type and environmental conditions, including time of day, fasting conditions, etc. NHPs, because of their large body size and phylogenetic similarity to humans, are uniquely suited for the collection of tissue biopsies longitudinally or at necropsy. An excellent example of an NHP tissue bank is the Monkey Alcohol Tissue Research Resource (https://gleek.ecs.baylor.edu/; Daunais et al., 2014), which provides investigators with tissue samples and related phenotypic measures from NHPs subjected to a standard alcohol consumption protocol. 
However, in other cases, existing metadata from biorepositories may be insufficient to document differences in collection and storage methods or animal level conditions that are not directly related to experimental parameters but may have a substantial influence on omic data generation. Beyond individual investigator data sharing, there are ongoing community efforts to improve the utility of NHP omic data, including through the improvement of NHP genome annotations. Accurate and complete reference genomes are essential for any genetic or sequence‐derived research. Although the human genome is extremely well‐annotated and projects like the 1000 Genomes Project have captured more than 88 million variants (Auton et al., 2015), reference genomes for rhesus and cynomolgus macaques, baboons, vervet monkeys, and marmosets remain relatively poorly annotated, with little understanding of the genetic diversity within these species, as the number of individuals sequenced is in the tens rather than the thousands (Harding, 2017). Significant progress has recently been made with the rhesus macaque genome, with a new build based on >800 animals (Warren et al., 2020). Without an understanding of the variation present in the various species, however, it is difficult to evaluate the effect of individual genetic variants on phenotypes of interest. This recent rhesus genome analysis begins to address the issue of genome diversity (Warren et al., 2020). Furthermore, poorly annotated genomes hinder efforts to include animals of diverse genetic backgrounds in vaccines and other studies to maximize translational potential. Efforts are underway to sequence the genomes of additional animals, with RNA and protein analyses to follow. Due to the small number of NHPs in most studies, the vast majority are underpowered to accurately determine the magnitude of genetic effects on traits of interest. 
When leveraged correctly, the pedigree structure can help improve statistical power, but both small sample size and small effect size reduce power, such that not only is it more difficult to detect associations, but those associations that are nominally significant have a higher likelihood of being spurious. The issues of false positives and failure to replicate are by no means specific to genetic studies (see Button et al., 2013, for an excellent review in neuroscience), but they have frequently been highlighted in this domain. Sharing genetics data across investigators and centers is a critical first step to ensuring that enough cases are available for analysis and comparisons. Although genomics has established a standard for the deposition of data into major databases requiring harmonization and long‐term storage capacities, depositing data into major databases is only one model for data sharing. For example, so‐called clearing houses for sharing data of particular types are becoming more widespread. Building on established models for sharing human neuroimaging data, the PRIMatE Data Exchange (PRIME‐DE; http://www.fcon_1000.projects.nitrc.org/indi/indiPRIME.html; Milham et al., 2018) provides access to independently collected neuroimaging datasets and information about data quality via the International Neuroimaging Data‐sharing Initiative. Furthermore, individual investigators, including those who work with NHPs, are increasingly sharing data associated with specific papers or analyses via databases like Dryad (https://datadryad.org/stash) or project archives on the Open Science Framework (osf.io) or GitHub (github.com). These investigator choices align with the growing consensus among publishers that all data should be shared, with a preference for citable datasets assigned a Digital Object Identifier (Cousijn et al., 2018). 
LabKey, a laboratory information management system tool, has been used as the foundation for the development of specific colony health databases used by some NPRCs. It also allowed scientists at the Wisconsin NPRC to easily share data as they were being generated during the early stages of their Zika virus research, and this approach is being replicated for severe acute respiratory syndrome coronavirus 2 (SARS‐CoV‐2) research (https://openresearch.labkey.com/project/home/begin.view). While the norms are slowly changing, ultimately, policies that originate with funding agencies or journals that require data sharing will have a higher likelihood of ensuring that data are made public.

ADDRESSING CHALLENGES IN RIGOR AND REPRODUCIBILITY SPECIFIC TO NHP RESEARCH

One of the major challenges to rigor and reproducibility efforts in NHP research is that the constraints of the 3Rs have been historically misinterpreted in ways that result in studies being carried out with small sample sizes and rarely, if ever, replicated. As a result, the robustness of studies may be particularly impacted by features relating to statistical power, associated with both the number of animals tested and the number of trials each animal completes (i.e., both within‐ and between‐individual power), the design of experimental conditions (e.g., including appropriate control conditions), the inclusion of representative samples (e.g., with regard to age or sex), and the appropriate statistical methods to evaluate those sources of variance. Additionally, because NHPs are genetically variable, small‐sample NHP studies benefit from data analysis strategies that evaluate or control for sources of variance and that articulate the difference between exploratory and confirmatory analyses as defined above, although such methods and clarity around analyses are not necessarily normative. NHP research has, like other domains of science, historically overemphasized using a criterion of p < 0.05 as the determining factor for what findings are meaningful or important, with less emphasis on evaluation of raw data, effect sizes, and probability estimates. Another factor that likely contributes to small sample sizes is the level of expertise or training in statistical methods of investigators. Although this is alleviated to some extent by the increasingly common use of biostatisticians as contributors and consultants on research projects, a greater emphasis on training of students and new investigators can also contribute to rigor and reproducibility in NHP research.
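
The dependence of robustness on statistical power can be made concrete with a short calculation. The following Python sketch is illustrative only. It uses a normal approximation to the power of a two-sided, two-group comparison, together with the positive predictive value (PPV) expression discussed by Button et al. (2013), PPV = (power × R) / (power × R + α), where R is the pre-study odds that a tested effect is real. The function names, the effect size, and the value of R are assumptions chosen for illustration and are not taken from any NHP dataset.

```python
from statistics import NormalDist


def approx_power(effect_size_d, n_per_group, alpha=0.05):
    """Approximate power of a two-sided, two-group comparison.

    Uses a normal approximation (not an exact t-test calculation), so
    values for very small groups are rough and somewhat optimistic.
    """
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha / 2)
    # Expected z-statistic for a standardized mean difference d.
    shift = effect_size_d * (n_per_group / 2) ** 0.5
    return std_normal.cdf(shift - z_crit) + std_normal.cdf(-shift - z_crit)


def positive_predictive_value(power, prior_odds, alpha=0.05):
    """Probability that a nominally significant finding is a true effect."""
    return power * prior_odds / (power * prior_odds + alpha)


if __name__ == "__main__":
    # Hypothetical scenario: medium effect (d = 0.5), pre-study odds of 1:4.
    for n in (4, 8, 16, 64):
        power = approx_power(0.5, n)
        ppv = positive_predictive_value(power, prior_odds=0.25)
        print(f"n per group = {n:3d}  power = {power:.2f}  PPV = {ppv:.2f}")
```

Under these assumptions, both power and PPV fall as group size shrinks: with few animals, true effects are harder to detect, and nominally significant results are less likely to reflect them.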

REPLICATION EFFORTS CAN BE COLLABORATIVE

The importance of replication of individual studies has been emphasized in many domains of science, particularly behavioral science, and large scientific projects are underway to replicate core findings of a given field. Such efforts are typically considered unfeasible with NHPs because of the practical and ethical constraints associated with replicating NHP studies. Ongoing replication efforts can take the form of both individual laboratories attempting to replicate other laboratories' studies or large group efforts in which a given study (or studies) is replicated across “Many Labs” (Ebersole et al., 2016; Klein et al., 2014, 2018). Recent examples are the Center for Open Science's project to replicate murine cancer research findings (https://cos.io/rpcb/ and https://elifesciences.org/collections/9b1e83d1/reproducibility-project-cancer-biology) and the International Brain Laboratory's virtual laboratory (https://www.internationalbrainlab.com/#home). One major issue that these efforts bring to light is the generalizability of particular findings; i.e., if laboratory A reveals a significant difference between two conditions and laboratory B does not find that difference, what are the features of laboratory A's sample that do not generalize to laboratory B's sample? Factors such as age, race, ethnicity, language, and health status may all play important roles in shaping effects, even if they are not reported as mediators or moderators. Carrying out work across many laboratories in which such factors vary is one approach to increasing research robustness. The “Many Labs” concept has been explicitly imported into behavioral NHP research (Many Primates et al., 2019), but with a specific emphasis on combining data across a number of laboratories and species to answer a core question, either to increase sample size or to test hypotheses about the generalizability of effects across phylogeny. 
Recent efforts at NPRCs demonstrate the effectiveness of combining data to unearth effects that would not have been discernible in small studies. For example, following the 2015–2016 Zika virus epidemic in South and Central America, scientists at the NPRCs mobilized to study the virus and its impact on developing fetuses. Research teams across the NPRCs noted fetal mortality following Zika virus infection, but it seemed to vary based on when the fetuses were infected and the small sample sizes at each center precluded drawing conclusions. When those data were pooled (to a total of N = 50) across Centers, however, the pattern became clear and statistically significant; i.e., that fetuses infected during the first trimester had a significantly greater chance of dying than those infected later in gestation (Dudley et al., 2018). Currently, a similar effort is underway for studies of SARS‐CoV‐2 infection, with regular data sharing and evaluation of findings. Efforts like these that combine samples across laboratories and centers are an important step forward because they allow sample sizes to be increased and inherently improve the generalizability of the science because it is being carried out at multiple sites with innate variation across sites with regard to animal care, standard operating procedures (SOPs), etc. These efforts capitalize on an existing NHP model, where small pilot studies are used to develop and ensure the potential effects of proposed interventions or the suitability of specific experimental protocols before larger studies are undertaken. There may be an important role in the context of discovery for pilot studies with small sample sizes (Bacchetti et al., 2011), although such studies may lead to an overestimation of necessary sample sizes which itself has potential ethical implications (Gaskill & Garner, 2020). 
Despite these multi‐site efforts, attempts to replicate most NHP studies are rare because of practical and ethical constraints in terms of access to animals as already mentioned. Even when sample sizes are large, publishing replication studies can be a major challenge as a result of publication norms across NHP science domains (i.e., nonsignificant effects are often not published), lack of familiarity with processes for publishing replications, and strict editorial guidelines at some of the major journals that publish NHP work. Given that it will likely take time and effort to change these norms, ensuring the robustness of individual NHP studies is crucial, and creating mechanisms by which NHP scientists can carry out pilot work and then build upon others' science via data sharing is critical.
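
The logic of pooling small samples across centers, as in the Zika example, can be illustrated with a minimal Python sketch. It does not reproduce the analysis of Dudley et al. (2018). The per-site counts are entirely hypothetical, the function names are assumptions, and a two-sided Fisher exact test is used here simply as one familiar test for small 2 × 2 tables.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x events in the first row.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_observed = prob(a)
    low, high = max(0, col1 - row2), min(row1, col1)
    # Sum the probabilities of all tables no more likely than the observed one.
    return sum(
        prob(x) for x in range(low, high + 1)
        if prob(x) <= p_observed * (1 + 1e-9)
    )


def pool_tables(tables):
    """Sum 2x2 tables cell by cell across sites."""
    return tuple(sum(cells) for cells in zip(*tables))


# HYPOTHETICAL counts per site, NOT data from any study:
# (early infection: died, survived; later infection: died, survived)
SITE_TABLES = {
    "site_A": (2, 2, 0, 4),
    "site_B": (3, 2, 1, 4),
    "site_C": (2, 1, 1, 5),
}

if __name__ == "__main__":
    for site, table in SITE_TABLES.items():
        print(site, round(fisher_exact_two_sided(*table), 3))
    pooled = pool_tables(SITE_TABLES.values())
    print("pooled", pooled, round(fisher_exact_two_sided(*pooled), 3))
```

With these invented counts, no single site reaches p < 0.05, whereas the pooled table does. Simple summation ignores differences between sites, so a stratified approach (e.g., a Cochran-Mantel-Haenszel test or a mixed model with site as a factor) may be preferable when site-level variation is of interest.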

CAPITALIZING UPON ESTABLISHED EFFORTS AND RESOURCES

Significant resources have been invested in generating, implementing, and evaluating the efficacy of checklists, preregistration, and data sharing. NHP scientists need not reinvent the wheel, but certainly need to embrace using the wheels that exist, either through mandated changes at the time of publication (via journals) or at the time of project planning as required or promoted by granting agencies, institutions, or the NPRCs. An overview of the contributions of various parties to improve rigor and reproducibility, both for NHP research and for biomedical research in general, is shown in Figure 1. Existing authentication or standardization guidelines and guidelines for validating resources should be identified from the journals and societies that have generated them, centralized, and then integrated into the core workflows at the NPRCs and in other NHP laboratories. In this view, the NPRC system represents a fertile ground for developing and testing such workflows that can then be generalized to NHP research more broadly. General checklists like the MDAR and animal research‐specific checklists like PREPARE and ARRIVE are applicable to NHP research and could easily be implemented if journals or institutions began to require them for every NHP study. Existing models of preregistration (particularly those that encourage transparency in methods, clear designation of whether the research is being conducted in the context of discovery [exploratory analyses] or hypothesis testing [confirmatory analyses], and clarity around constraints on the generalizability of the studies) can be imported into NHP research simply by generating a series of NHP‐specific preregistration templates and then encouraging their use. 
This would require partnerships with journals to develop “registered reports” formats (where preregistrations are evaluated for robustness separately from the outcomes of the studies), to acknowledge when studies have been preregistered (e.g., with the “badge” system; https://cos.io/our-services/open-science-badges/; as is done at AJP), or to include null results when preregistered experimental design and analysis has been deemed to be robust. Scientists should be encouraged to share their data by depositing them into established databases and repositories, and NPRCs should catalog and index shared data that were generated with their NHP resources. Indeed, the NIH has recently issued a final policy for data management and sharing that specifies the requirements for data generated from NIH funding, and which will go into effect in 2023 (https://grants.nih.gov/grants/guide/notice-files/NOT-OD-21-013.html).

GENERATING NEW EFFORTS AND RESOURCES

Ultimately, the existing efforts to improve rigor and reproducibility may not account for all of the constraints and benefits of working with NHPs, and because of their collective science and inherent interdisciplinary nature, the NPRCs are in a unique position to lead the way to create new efforts to improve NHP science around the globe. Here we propose two interrelated efforts that are already partially in place, or for which there is established infrastructure at some of the centers, but for which center‐wide coordination should be undertaken. Both efforts leverage the resources inherent in the NPRC system but could be easily modified and implemented at other institutions; similarly, putting these efforts into place at NPRCs could also influence how NHP research is carried out at other institutions (e.g., having animal‐level metadata that travels with individual animals when they leave the NPRC where they were reared and are transferred to a laboratory at a non‐NPRC site).

Use of pilot grant programs to provide an opportunity for discovery and collaboration

The NPRCs have established pilot grant programs that provide funds for investigators to carry out studies that have the promise of securing federal funding in the future (https://nprcresearch.org/primate/pilot-programs.php). These funds are often used to bring researchers outside of the NPRCs into NHP research. Such awards can also be used to allow established investigators to carry out novel experiments, develop novel resources for improving science (protocols for standardizing assays, generation of standard datasets that can be shared), or carry out cross‐center projects where small samples at individual centers are pooled to address larger questions and evaluate the generalizability of effects across contexts. Such work across centers would be facilitated by the generation of animal‐level metadata (e.g., experimental history) and center‐based or center‐general mechanisms for data sharing. Inherent in this mechanism is the assumption that the use of a minimal number of animals will garner sufficient data to determine if future investment is warranted. Indeed, numerous successes in animal model development (Burwitz et al., 2017a, 2017b; Lopez et al., 2014), vaccine testing (Coban et al., 2004; Datta et al., 2017; Petersen et al., 2014), and drug treatment efficacy have arisen from NHP pilot studies. Examples of pilot studies that successfully utilized limited numbers of macaques include a study of five macaques that demonstrated the persistence of the Lyme disease pathogen following a recommended treatment protocol (Embers et al., 2012a) and subsequently devised a strategy for diagnostic test development (Embers et al., 2012b), and a study of four macaques that demonstrated expression of the human NTCP receptor on hepatocytes is sufficient to support full HBV infection of macaques (Burwitz et al., 2017a). 
Although large sample sizes can be essential for detailed quantitative measures of effect and inter‐animal variation, small pilot studies illustrate that experiments that result in a binary black‐or‐white answer, measure qualitative differences or trends, or uncover new phenomena, can all be achieved with small sample sizes. Furthermore, some scientific questions can be answered with small sample studies that do not necessitate carrying out subsequent larger studies. Adaptive trial designs (Bauer et al., 2016), in which the data from the initial stages of a study can inform potential adjustments to the overall study design, are increasingly employed in clinical research but are also an option for NHP studies. Such flexible designs can increase the efficiency of cohorts with a limited number of subjects, which is relevant to NHP studies using a limited resource and can also be enhanced by the incorporation of Bayesian analysis techniques (Chevret, 2012) in addition to standard statistical approaches. Although these approaches are yet to be fully implemented in human studies (Pallmann et al., 2018), their introduction into NHP studies deserves serious consideration. In light of the role of NHPs as the penultimate preclinical model in the drug development pathway, the application of aspects of first‐in‐human study designs (Shen et al., 2019) is also pertinent as, like most NHP studies, these designs usually employ a relatively small number of subjects. In fact, an earlier analysis of the effect of sample size on the outcome of a series of first‐in‐human dose‐escalation studies (Buöen et al., 2003) demonstrated that a sample size of 6–10 subjects was the necessary range to obtain useful data, with fewer than 6 being insufficient, but increasing the sample size to greater than 10 providing little additional power.
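
As a minimal illustration of how Bayesian analysis can be applied to a small cohort with a binary outcome, the following Python sketch updates a Beta prior with binomial data and reports the posterior probability that the true response rate exceeds a threshold. The uniform prior, the threshold of 0.5, the function names, and the interim counts are all assumptions made for illustration. This is not a complete adaptive design, which would also require prespecified decision rules.

```python
from math import exp, lgamma, log


def beta_pdf(x, a, b):
    """Density of a Beta(a, b) distribution at 0 < x < 1."""
    log_norm = lgamma(a + b) - lgamma(a) - lgamma(b)
    return exp(log_norm + (a - 1) * log(x) + (b - 1) * log(1 - x))


def prob_rate_exceeds(threshold, responders, n, prior_a=1.0, prior_b=1.0,
                      steps=20000):
    """Posterior probability that the true response rate exceeds threshold.

    A Beta(prior_a, prior_b) prior plus binomial data gives a
    Beta(prior_a + responders, prior_b + n - responders) posterior,
    integrated here with a simple midpoint rule.
    """
    a = prior_a + responders
    b = prior_b + (n - responders)
    width = (1.0 - threshold) / steps
    total = sum(
        beta_pdf(threshold + (i + 0.5) * width, a, b) for i in range(steps)
    )
    return total * width


if __name__ == "__main__":
    # HYPOTHETICAL interim looks: (animals enrolled so far, responders so far).
    for n, responders in [(2, 2), (4, 3), (6, 5)]:
        p = prob_rate_exceeds(0.5, responders, n)
        print(f"n = {n}, responders = {responders}: P(rate > 0.5) = {p:.3f}")
```

Because the posterior is updated as each animal's outcome becomes available, a calculation of this kind could inform an interim decision about whether to continue enrollment, provided that the decision rule is specified before the study begins.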

Investment in established data sharing and establishment of mechanisms for sharing colony‐wide data

Although broader aspects of data sharing are discussed above, there are certain aspects of data sharing that are specific to NHP colonies. Given that NHPs are a precious resource, it is critical that NHP scientists be willing (or mandated) and easily able to share experimental data and associated metadata without adding significant time or administrative burden. This should be supported across centers so that NPRC investigators are encouraged and have a mechanism to share and pool data. Such efforts can capitalize on established resources where they exist (e.g., genetics, neuroimaging, and neurophysiology databases) but may also require the development of new resources. One of the strengths of the NPRC system is the huge volume of data generated on animals in their colonies, some of whom may only be enrolled in investigators' studies for brief periods of time. This wealth of data about their rearing, health history, and tissue collection and preservation upon their deaths is a valuable national resource, and making it available to scientists around the globe could help speed scientific discovery and ultimately improve human and NHP health. Furthermore, capitalizing on existing data (whether to determine appropriate sample sizes for new studies, refine methods, or even answer key medical questions without having to involve new animals in experiments) could certainly aid in our goal to address the 3Rs. An excellent example of this approach is the recently announced open‐source resource for NHP optogenetics (Tremblay et al., 2020). The NPRCs, as well as most large NHP laboratories, maintain extensive animal records, including health, genetics, experimental history, and origins (rearing conditions, the breeding facility where they were born, etc.) but this information may not be harmonized, easily searchable, or linked permanently to the individual animal. 
Stakeholders from various disciplines of NHP research should work together to determine what features may be relevant to their science and the science of others. For example, behavioral scientists increasingly recognize that early rearing conditions can cause permanent variance in an individual's behavior and biology, and present social conditions and changes in social conditions in adulthood exert similar although potentially not as long‐lasting impacts. Interdisciplinary perspectives are required to determine what the content of the metadata should be. Ultimately, these data should be harmonized, searchable, and provided in a flexible format to investigators when they are designing studies, selecting animals, accessing banked biological samples, or purchasing animals from other facilities. Reporting these metadata in the form of supplementary data during publication would also allow scholars to draw conclusions about the generalizability of studies across sites and species. As illustrated by the successful sharing of data related to Zika across NPRCs, LabKey is one potential vehicle for this harmonized metadata, but the consistent reporting of the data across sites and studies is more important than the software selected.
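
One way to picture harmonized, searchable animal-level metadata is as a small structured record that can be serialized and transferred with the animal. The Python sketch below is purely hypothetical. The field names are assumptions loosely based on the categories mentioned above (origins, rearing, health, and experimental history), the example animals are fictional, and the sketch does not represent the schema of LabKey or of any NPRC database.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List


@dataclass
class AnimalRecord:
    """Hypothetical animal-level metadata record; field names are assumptions."""
    animal_id: str
    species: str
    sex: str
    birth_year: int
    breeding_facility: str
    rearing_condition: str
    current_housing: str
    experimental_history: List[str] = field(default_factory=list)
    health_events: List[str] = field(default_factory=list)


def to_json(record):
    """Serialize a record so it can travel with the animal between sites."""
    return json.dumps(asdict(record), sort_keys=True)


def from_json(text):
    """Rebuild a record from its JSON form."""
    return AnimalRecord(**json.loads(text))


def select_animals(records, rearing_condition=None, exclude_procedures=()):
    """Return records matching a rearing condition and lacking given procedures."""
    selected = []
    for record in records:
        if rearing_condition and record.rearing_condition != rearing_condition:
            continue
        if any(p in record.experimental_history for p in exclude_procedures):
            continue
        selected.append(record)
    return selected


# Entirely fictional example records.
EXAMPLE_RECORDS = [
    AnimalRecord("A001", "Macaca mulatta", "F", 2012, "facility_X",
                 "mother-reared", "pair-housed", ["MRI"], []),
    AnimalRecord("A002", "Macaca mulatta", "M", 2014, "facility_Y",
                 "nursery-reared", "pair-housed", [], ["dental procedure"]),
    AnimalRecord("A003", "Macaca mulatta", "M", 2013, "facility_X",
                 "mother-reared", "single-housed", ["vaccine trial"], []),
]

if __name__ == "__main__":
    naive = select_animals(EXAMPLE_RECORDS, "mother-reared", ["vaccine trial"])
    print([record.animal_id for record in naive])
    print(to_json(EXAMPLE_RECORDS[0]))
```

Whatever the eventual schema, the point made in the text holds for the sketch as well: consistent field definitions across sites matter more than the particular software or file format used.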

QUALITY ASSURANCE (QA) PROGRAMS

Variability is inherent in every step of an experimental procedure. However, for those processes that are routinely performed, reduction in variability should be a priority, as this can add validity both to the data acquired and its interpretation and conclusions. Although QA regarding laboratory techniques is often carried out at the level of individual labs (e.g., validating antibodies before their use; Gore, 2013; Saper, 2005), and institutional QA programs exist in many contexts (e.g., pharma and biotechnology companies), NPRC and university‐level QA programs are less common. Every NPRC has SOPs and QA programs for practices involving animals. However, the laboratory techniques related to samples derived from the animals are not routinely standardized. One example of an assay that is fraught with reproducibility issues is the ELISpot. This assay is open to subjective quantification and, even with automated systems, evaluation of plates is operator‐dependent (Cox et al., 2006; Janetzki et al., 2004). As such, QA programs have been instituted to mitigate variability in these and other assays, exemplified by the Duke University External Quality Assurance Program Oversight Laboratory. The Tulane NPRC (TNPRC) has similarly created a Quality program to assure reproducibility in standard assays. The TNPRC is the Coordinating Center for the COVID‐19 response, including the development of the NHP model and testing of vaccines and therapeutics. Initially, steps in the workflow that contribute to potential variability were identified. The implementation included the development of best practices, standard protocols, and oversight by a QA Specialist reporting to the Director of Quality Assurance, administered by the office of the Vice President of Research. 
Best practices include intellectual honesty and communication of errors, operator training, experiment documentation, safe and organized long‐term storage of data, open and efficient dialogue between core laboratories and investigators, and assay supervision by dedicated core facility managers. The standardization of protocols involves testing varying SOPs for intra‐assay, inter‐assay, and inter‐operator variability to ensure reproducible and accurate results and performing rigorous Quality Control (QC) and QA checks on instruments. The use of biological reference controls, both to test reproducibility and to ensure confidence in longitudinal data, and the testing of all new reagent lots provided by manufacturers are integral to the QC program as well. The program was initiated with flow cytometry and further applied throughout other core services, such as real‐time PCR/RT‐PCR and Luminex®‐based assays. A similar program exists at the Texas Biomedical Research Institute and is utilized by the Southwest NPRC. Plans are underway to implement QA/QC processes in lab protocols across the NPRCs. Standardization and sharing of these QA/QC procedures across labs will facilitate the robustness of science as well as the ability to conduct multi‐site studies and replicate findings.
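
Intra‐assay and inter‐assay variability are commonly summarized as coefficients of variation (CV). The Python sketch below shows one simple way such values could be computed for a reference control measured in replicate across several runs. The readouts are hypothetical, the function names are assumptions, and the sketch is not a description of the procedures used in any of the QA programs named above.

```python
from statistics import mean, stdev


def cv_percent(values):
    """Coefficient of variation (sample SD / mean) as a percentage."""
    return 100.0 * stdev(values) / mean(values)


def intra_assay_cv(runs):
    """Average CV of replicate wells within each run."""
    return mean(cv_percent(replicates) for replicates in runs)


def inter_assay_cv(runs):
    """CV of the run means for the same reference sample across runs."""
    return cv_percent([mean(replicates) for replicates in runs])


# HYPOTHETICAL readouts of one reference control, three replicates per run.
REFERENCE_RUNS = [
    [98.0, 102.0, 100.0],
    [110.0, 108.0, 112.0],
    [95.0, 97.0, 93.0],
]

if __name__ == "__main__":
    print(f"intra-assay CV: {intra_assay_cv(REFERENCE_RUNS):.1f}%")
    print(f"inter-assay CV: {inter_assay_cv(REFERENCE_RUNS):.1f}%")
```

In this invented example, replicates agree closely within a run but the run means drift, which is the kind of pattern that repeated measurement of a biological reference control is intended to reveal.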

CONCLUSIONS

Enhancing rigor and reproducibility in the biomedical sciences is truly a collective effort, as outlined in Figure 1, and its collective nature is made possible by both community standards and norms and the individual efforts of each investigator. Greater recognition of the extent to which much published science has not been carried out in a rigorous way, and has thus slowed progress in basic and translational/health science, has led to a science‐wide evaluation of how best we can, as a community and as individuals, change the norms in both how we carry out and how we report our science to improve its robustness and other scholars' ability to evaluate it. NHP research is an interesting test case in which to deploy rigor and reproducibility efforts because it is inherently constrained, both ethically and practically, in ways that other types of science are not and it cross‐cuts disciplinary boundaries that themselves have their own norms. As a result, whatever efforts are deployed must be well‐tailored to the NHP model and the constraints of carrying out NHP science (e.g., simply increasing the sample size of NHP studies is not an option like it is in some fields), while simultaneously being broad enough to be efficacious across disciplines. Despite these constraints and challenges, the NIH's significant investment in NHP research via the NPRCs creates a unique testing ground for rigor and reproducibility efforts, before they are deployed more generally in NHP science. To that end, our goal in this Perspective is to provide broad consideration and specific examples of how the principles of scientific rigor apply to NHP research, setting the stage for coordinated efforts, initially across NPRCs and then across NHP labs more broadly, to fundamentally improve NHP research.

CONFLICT OF INTERESTS

The authors declare that there are no conflicts of interest.

69 in total

1. Russell and Burch's 3Rs then and now: the need for clarity in definition and purpose.

Authors: Jerrold Tannenbaum; B Taylor Bennett
Journal: J Am Assoc Lab Anim Sci Date: 2015-03 Impact factor: 1.232

Review 2. Power failure: why small sample size undermines the reliability of neuroscience.

Authors: Katherine S Button; John P A Ioannidis; Claire Mokrysz; Brian A Nosek; Jonathan Flint; Emma S J Robinson; Marcus R Munafò
Journal: Nat Rev Neurosci Date: 2013-04-10 Impact factor: 34.870

3. Better methods can't make up for mediocre theory.

Authors: Paul Smaldino
Journal: Nature Date: 2019-11 Impact factor: 49.962

Review 4. Immune senescence in aged nonhuman primates.

Authors: Kristen Haberthur; Flora Engelman; Alex Barron; Ilhem Messaoudi
Journal: Exp Gerontol Date: 2010-06-15 Impact factor: 4.032

5. Induction of Plasmodium falciparum transmission-blocking antibodies in nonhuman primates by a combination of DNA and protein immunizations.

Authors: Cevayir Coban; Mario T Philipp; Jeanette E Purcell; David B Keister; Mobolaji Okulate; Dale S Martin; Nirbhay Kumar
Journal: Infect Immun Date: 2004-01 Impact factor: 3.441

6. Real-time monitoring of disease progression in rhesus macaques infected with Borrelia turicatae by tick bite.

Authors: Job E Lopez; Heather Vinet-Oliphant; Hannah K Wilder; Christopher P Brooks; Britton J Grasperge; Timothy W Morgan; Kerstan J Stuckey; Monica E Embers
Journal: J Infect Dis Date: 2014-05-30 Impact factor: 5.226

7. Risk factors for stereotypic behavior and self-biting in rhesus macaques (Macaca mulatta): animal's history, current environment, and personality.

Authors: Daniel H Gottlieb; John P Capitanio; Brenda McCowan
Journal: Am J Primatol Date: 2013-05-02 Impact factor: 2.371

8. Persistence of Borrelia burgdorferi in rhesus macaques following antibiotic treatment of disseminated infection.

Authors: Monica E Embers; Stephen W Barthold; Juan T Borda; Lisa Bowers; Lara Doyle; Emir Hodzic; Mary B Jacobs; Nicole R Hasenkampf; Dale S Martin; Sukanya Narasimhan; Kathrine M Phillippi-Falkenstein; Jeanette E Purcell; Marion S Ratterree; Mario T Philipp
Journal: PLoS One Date: 2012-01-11 Impact factor: 3.240

9. A manifesto for reproducible science.

Authors: Marcus R Munafò; Brian A Nosek; Dorothy V M Bishop; Katherine S Button; Christopher D Chambers; Nathalie Percie du Sert; Uri Simonsohn; Eric-Jan Wagenmakers; Jennifer J Ware; John P A Ioannidis
Journal: Nat Hum Behav Date: 2017-01-10

Review 10. Nonhuman Primate Models of Respiratory Disease: Past, Present, and Future.

Authors: Lisa A Miller; Christopher M Royer; Kent E Pinkerton; Edward S Schelegle
Journal: ILAR J Date: 2017-12-01

4 in total

1. Amygdala or hippocampus damage only minimally impacts affective responding to threat.

Authors: Joey A Charbonneau; Jeffrey L Bennett; Eliza Bliss-Moreau
Journal: Behav Neurosci Date: 2021-10-07 Impact factor: 1.912

2. An assessment of ambient noise and other environmental variables in a nonhuman primate housing facility.

Authors: Alexander R McLeod; Jane A Burton; Chase A Mackey; Ramnarayan Ramachandran
Journal: Lab Anim (NY) Date: 2022-07-27 Impact factor: 9.667

3. Improving transparency-A call to include social housing information in biomedical research articles involving nonhuman primates.

Authors: Ori Pomerantz; Kate C Baker; Rita U Bellanca; Mollie A Bloomsmith; Kristine Coleman; Eric K Hutchinson; Peter J Pierre; James L Weed
Journal: Am J Primatol Date: 2022-04-01 Impact factor: 3.014

Review 4. Improving rigor and reproducibility in nonhuman primate research.

Authors: Eliza Bliss-Moreau; Rama R Amara; Elizabeth A Buffalo; Ricki J Colman; Monica E Embers; John H Morrison; Ellen E Quillen; Jonah B Sacha; Charles T Roberts
Journal: Am J Primatol Date: 2021-09-20 Impact factor: 3.014
