
Adaptation of the Systematic Review Framework to the Assessment of Toxicological Test Methods: Challenges and Lessons Learned with the Zebrafish Embryotoxicity Test.

Martin L Stephens¹, Sevcan Gül Akgün-Ölmez², Sebastian Hoffmann¹,³, Rob de Vries¹,⁴, Burkhard Flick⁵, Thomas Hartung⁶,⁷, Manoj Lalu⁸, Alexandra Maertens⁶, Hilda Witters⁹, Robert Wright¹⁰, Katya Tsaioun¹.

Abstract

Systematic review methodology is a means of addressing specific questions through structured, consistent, and transparent examinations of the relevant scientific evidence. This methodology has been used to advantage in clinical medicine, and is being adapted for use in other disciplines. While some applications to toxicology have been explored, especially for hazard identification, the present preparatory study is, to our knowledge, the first attempt to adapt it to the assessment of toxicological test methods. As our test case, we chose the zebrafish embryotoxicity test (ZET) for developmental toxicity and its mammalian counterpart, the standard mammalian prenatal development toxicity study, focusing the review on how well the ZET predicts the presence or absence of chemical-induced pre-natal developmental toxicity observed in mammalian studies. An interdisciplinary team prepared a systematic review protocol and adjusted it throughout this piloting phase, where needed. The final protocol was registered and will guide the main study (systematic review), which will execute the protocol to comprehensively answer the review question. The goal of this preparatory study was to translate systematic review methodology to the assessment of toxicological test method performance. Consequently, it focused on the methodological issues encountered, whereas the main study will report substantive findings. These relate to numerous systematic review steps, but primarily to searching and selecting the evidence. Applying the lessons learned to these challenges can improve not only our main study, but may also be helpful to others seeking to use systematic review methodology to compare toxicological test methods. We conclude with a series of recommendations that, if adopted, would help improve the quality of the published literature, and make conducting systematic reviews of toxicological studies faster and easier over time. Published by Oxford University Press 2019.


Keywords: malformations; prenatal developmental toxicity; systematic review; test method comparison; zebrafish embryotoxicity test

Year: 2018 PMID: 31192353 PMCID: PMC6736188 DOI: 10.1093/toxsci/kfz128

Source DB: PubMed Journal: Toxicol Sci ISSN: 1096-0929 Impact factor: 4.849

Toxicology is undergoing a paradigm shift, in which new test methods are being developed that may contribute to replacing existing methods that have been used routinely for decades (Andersen and Krewski, 2009). Over time, researchers exploring a new method may standardize the protocol and accumulate data on large numbers of chemicals. An important question to ask is how well the method serves its intended purpose. For potential replacement tests, this question typically is addressed by comparing the results of the new test with those of the standard test used routinely for the toxicity of interest, ie, the reference test method (Hoffmann ). When there is a sufficient body of published literature, questions about test method performance could potentially be addressed by assessing the available evidence through a literature review (Balls ; Corvi ). Such retrospective assessments could be conducted in lieu of, or as a justification for, prospective validation trials. Whether concerned with test method performance or other issues, reviews of the toxicological literature are typically carried out in the form of expert narratives. Typically, narrative reviews have no clear objective, articulated search strategy, predefined inclusion and exclusion criteria, protocol, or study quality and risk-of-bias assessment. Such reviews have the benefit of being relatively economical in manpower and resources in the short term, but their methodology tends to be fairly ad hoc and nontransparent, and, therefore, the reviews are potentially difficult to appraise, interpret, and reproduce (de Vries ). Expert narrative reviews have their place in the continuum of types of published literature summaries, for example, by generating hypotheses or presenting speculative mechanistic insight that could stimulate creativity and new ideas. 
However, where human health or environmental regulatory decisions are concerned, such as the selection of clinical trial design, relevant outcome or biomarker, or selection of a test method to determine toxicity of a new chemical to be used in the environment, a more systematic approach is warranted. Here we explore systematic review methodology as a means of evaluating the performance of a given test method. In contrast to narrative reviews, systematic reviews consist of a formal series of steps that starts with the formulation of a specific question and culminates in the synthesis of the relevant data from the included papers. Systematic reviews originated in clinical medicine, have been standardized for application to clinical and health care over several decades by Cochrane (Higgins and Green, 2011), and are now applied in many areas. Within toxicology, systematic review methodology has begun to be applied to the hazard identification and risk assessment of chemicals (Birnbaum ; EFSA, 2010; Johnson ; Rooney ), but it has not yet been applied to the assessment of toxicological test methods (Stephens ). Translating this methodology to other use contexts inevitably involves some adaptation, while retaining the general approach and adhering to core principles such as transparency, comprehensiveness, and objectivity (Hoffmann ). Here we present the specific adaptations we found necessary in the present context of toxicological test methods. For our test case of applying systematic review methods to test method performance, we chose the zebrafish embryotoxicity test (ZET) and its ability to predict the outcomes of the standard mammalian test for prenatal developmental toxicity. In the ZET, freshly fertilized zebrafish eggs are exposed to various concentrations of a test substance added to their aqueous environment, usually for up to 120 h post fertilization. 
During this period, embryos are examined for various toxic effects that relate to mortality, general embryotoxicity (such as hatching rate and body shape), and specific embryotoxicity (Beekhuijzen ; He ; Selderslaghs ). Although the principles of the ZET are largely agreed upon, protocol details may vary considerably among studies, which results in a lack of harmonization and standardization (Hamm ). The standard mammalian test has been standardized as the Organisation for Economic Co-operation and Development’s Test Guideline 414 (TG 414), which was first published in 1981 and revised in 2001 and in 2018 (OECD, 2018), and in similar national guidelines. In brief, pregnant mammals, most often rats or rabbits, are administered test articles and toxic effects on fetuses are observed at the end of the gestation period. The developmental effects observed can be grouped as growth, external, skeletal, or soft tissue. In addition, variations, commonly defined as effects diverging beyond the usual range of structural constitution, which may not adversely affect survival or health, are discriminated from malformations, commonly defined as permanent structural changes, which may adversely affect survival, development, or function (Chahoud ; Solecki ). A further complication is added when maternal toxicity has been observed, which may have caused or contributed to the effects observed in the fetuses. This mammalian prenatal developmental toxicity test has drawbacks including low throughput, long duration, considerable expense, and large numbers of animals used (Sipes ). These challenges are being addressed by exploring alternative approaches, either alone or in combination (Augustine-Rauch ; Ball ; Kroese ; Panzica-Kelly ). Given the present novel application of systematic review methodology to the setting of toxicological test method assessment, we first explored methodological issues in this preparatory study. 
The available chapters of Cochrane’s systematic review methodology for the assessment of diagnostic test accuracy (DTA) were used as a starting point (Cochrane, 2018). We focus here on the challenges encountered and lessons learned from this methodological translation. The insights will be applied to the conduct of the systematic review to comprehensively answer the review question of how predictive the ZET is of mammalian results, which will be reported elsewhere. Acknowledging that systematic review methodology works best on narrowly defined questions, we focused our comparisons on effects in developing zebrafish (lethality, general, and specific embryotoxicity) and on mammalian prenatal developmental toxicity described in external, soft tissue, and skeletal fetal examinations. Moreover, we translated data on the nature and severity of different findings into qualitative outcomes of the presence or absence of prenatal developmental toxicity. Given these considerations, we designed our systematic review to answer the following question: How well does the presence or absence of treatment-related findings in the ZET predict the presence or absence of prenatal developmental toxicity in rat and rabbit studies (OECD TG 414 and equivalents)?

MATERIALS AND METHODS

Review team formation and protocol preparation

This study was initiated and coordinated by the Evidence-based Toxicology Collaboration (EBTC) (http://www.ebtox.org/; last accessed June 9, 2019), which is based at the Johns Hopkins Bloomberg School of Public Health. The EBTC is an international multi-stakeholder organization that seeks to facilitate the application of evidence-based approaches—including systematic review—to toxicology. The EBTC staff invited individuals with relevant expertise from an existing EBTC working group to join the review team, which undertook the present systematic review. Members were recruited who could provide the necessary and diverse expertise on mammalian reproductive toxicology, zebrafish developmental toxicity, systematic review methodology, and information science. Care was taken to recruit individuals from relevant sectors, including academia, government, industry, and nongovernmental organizations, which would help to ensure that diverse perspectives were represented. The members of the review team served as individual scientists, not as representatives of their organizations or sectors. Their initial charge was to prepare a protocol describing how the various steps in the review would be carried out.

Search strategy

The goal of the literature search strategy was to find the full set of chemicals (and their associated studies) that had been tested in both the ZET and the mammalian prenatal developmental toxicity test. The review team was familiar enough with the literature on prenatal developmental toxicity testing to realize that there was only limited published evidence directly comparing the results from the ZET and guideline studies. Consequently, rather than synthesizing pre-existing test comparisons in the literature, we identified primary studies of chemicals tested in the ZET or the mammalian assays. Specifically, the review team developed a 2-stage strategy. We first searched for ZET studies; after screening and the application of our inclusion and exclusion criteria, the resulting included studies yielded the identities of the chemicals that were tested in zebrafish embryos. In the second stage, we searched the mammalian literature for prenatal developmental toxicity studies on the same set of chemicals identified in the first stage. We designed the strategies for both stages to achieve a balance of precision and comprehensiveness in the results. Search elements included controlled vocabulary terms (ie, MeSH [Medical Subject Headings] and Emtree terms), as well as keywords applied to relevant search fields (title, abstract, descriptors, etc.). We searched PubMed, Embase (Embase.com), BIOSIS Previews (Clarivate Analytics), and TOXLINE (National Library of Medicine) from the earliest available dates to the dates of the searches (see below). No language limits were applied in the search. The zebrafish search consisted of a zebrafish concept, a developmental stage concept, and a toxicity concept. We ran this search in all 4 databases on April 24, 2014. The mammalian search consisted of a rat and rabbit concept, an embryo and maternal concept, and a chemicals concept consisting of the compounds identified through the zebrafish search. 
We ran this search in all 4 databases on March 7, 2016. The results of the zebrafish and mammalian searches were entered into EndNote for the identification and removal of duplicates. The complete search strategies are provided as a Supplementary Material.
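The concept-based structure of the 2-stage search can be sketched as follows. This is a minimal illustration only: the terms are placeholders chosen for the example, the helper names `or_block`, `build_query`, and `stage2_query` are hypothetical, and the actual search strings are those given in the Supplementary Material. Synonyms within a concept are joined with OR, and the concepts are joined with AND.

```python
def or_block(terms):
    """Join the synonyms of one concept with OR, quoting multi-word phrases."""
    quoted = [f'"{t}"' if " " in t else t for t in terms]
    return "(" + " OR ".join(quoted) + ")"


def build_query(concepts):
    """Combine several concepts with AND; each concept is a list of synonyms."""
    return " AND ".join(or_block(terms) for terms in concepts)


# Stage 1: zebrafish concept AND developmental stage concept AND toxicity concept.
# The terms below are illustrative placeholders, not the published search strings.
stage1 = build_query([
    ["zebrafish", "Danio rerio"],
    ["embryo", "larva"],
    ["toxicity", "teratogen"],
])


def stage2_query(chemicals):
    """Stage 2: species AND embryo/maternal AND the chemicals found in stage 1."""
    return build_query([
        ["rat", "rabbit"],
        ["embryo", "fetus", "maternal"],
        list(chemicals),
    ])


print(stage1)
print(stage2_query(["thalidomide", "gambogic acid"]))
```

The point of the sketch is the dependency between the stages: the chemicals concept of the mammalian search cannot be written until the zebrafish studies have been screened and their chemicals extracted.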

Screening of zebrafish studies for inclusion and exclusion

Inclusion and exclusion criteria relating to technical aspects of the zebrafish studies were difficult to formulate as there has been limited standardization of the ZET. We used the following inclusion criteria: The study reported original data. The study was conducted on wild-type zebrafish (Danio rerio) embryos (strain reported). Zebrafish embryos were exposed to an individual chemical with clear identification (eg, chemical name). At least 3 chemical concentrations were tested in addition to a negative/vehicle control group. Exposure began no later than 6 h post fertilization (hpf). The study was performed for a duration of 48–120 hpf. The reported outcomes included mortality, general toxicity (ie, outcomes related to hatching, cell viability, body shape [general], edema, the cardiovascular system [heartbeat and blood flow], and the yolk sac), and specific embryotoxicity (outcomes related to body shape [specific], fins, skin, the cardiovascular system [specific, eg, alteration to blood vessels], the central nervous system, sensory organs, head, the digestive system, and trunk). The study included at least 10 eggs per concentration. The study was reported in English. For the purposes of this preparatory study, we randomly selected a subset of 50 ZET studies, based on the assumption that this number would yield at least some eligible zebrafish studies. This was considered sufficient to optimize the search strategy and the selection process, as well as to develop extraction tables for the subsequent application in the main study, which was expected to include thousands of zebrafish studies. The 50 randomly selected studies were subjected to title and abstract screening (level 1 screening). The studies that met the prespecified inclusion criteria and studies for which an inclusion/exclusion decision could not be made from the information in the title and abstract alone were carried forward to full text screening (level 2 screening). 
At both levels, all studies were screened by 2 reviewers independently. Conflicts were resolved between the screeners or, if they could not reach agreement by themselves, by a third reviewer.
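The zebrafish inclusion criteria listed above could be encoded as a simple checklist, which also yields the documented reasons for exclusion. The following is a sketch under stated assumptions: the record type `ZetStudy` and its field names are hypothetical, and the duration criterion is interpreted here as an exposure ending between 48 and 120 hpf, which is one possible reading of the text.

```python
from dataclasses import dataclass


@dataclass
class ZetStudy:
    """Hypothetical record of what a screener extracts from one ZET study."""
    original_data: bool
    wild_type_strain_reported: bool
    single_identified_chemical: bool
    n_concentrations: int          # test concentrations, excluding controls
    has_negative_control: bool
    exposure_start_hpf: float
    exposure_end_hpf: float
    eggs_per_concentration: int
    required_outcomes_reported: bool
    in_english: bool


def exclusion_reasons(s: ZetStudy) -> list[str]:
    """Return every failed criterion; an empty list means the study is eligible."""
    checks = [
        (s.original_data, "no original data"),
        (s.wild_type_strain_reported, "not wild-type zebrafish or strain not reported"),
        (s.single_identified_chemical, "no clearly identified individual chemical"),
        (s.n_concentrations >= 3, "fewer than 3 concentrations"),
        (s.has_negative_control, "no negative/vehicle control"),
        (s.exposure_start_hpf <= 6, "exposure started after 6 hpf"),
        # Assumed reading of "duration of 48-120 hpf": the end of exposure lies in this window.
        (48 <= s.exposure_end_hpf <= 120, "duration outside 48-120 hpf"),
        (s.eggs_per_concentration >= 10, "fewer than 10 eggs per concentration"),
        (s.required_outcomes_reported, "required outcomes not reported"),
        (s.in_english, "not reported in English"),
    ]
    return [reason for passed, reason in checks if not passed]
```

For example, a study with only 2 concentrations whose exposure started at 8 hpf would be returned with two reasons, which is the kind of information a PRISMA flow diagram reports per screening level.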

Extraction of data from included zebrafish studies

The chemicals tested in the included ZET studies were identified and extracted into a Microsoft Excel table. For each chemical, the information specified in the review protocol was extracted into the same table. This included bibliographic details (first author and year of publication), study design characteristics (eg, the type of included controls, species, and strain), intervention characteristics (eg, the chemical name, concentrations tested, and start and duration of exposure), and outcomes (related to mortality and morphological alterations). The developmental effects to be assessed are not yet harmonized in the ZET community (Hamm ). We, therefore, decided to include and extract several commonly reported outcomes, such as mortality, hatching rate and delay, body shape, edema, and other alterations; these were considered sufficient for this adaptation of systematic review methodology.

Screening of the mammalian studies for inclusion and exclusion

The literature search for mammalian studies was based on the chemicals that were extracted from the included zebrafish studies. The search strings are provided in the Supplementary Material. We screened the titles and abstracts (level 1 screening) of the resulting mammalian studies. The studies that met the inclusion criteria and studies for which an inclusion/exclusion decision could not be made from the information in the title and abstract alone were carried forward to full text screening (level 2 screening). At both levels, all studies were screened by 2 reviewers independently. Conflicts were resolved between the screeners or, if they could not reach agreement by themselves, by a third reviewer. Reasons for exclusion were documented. Both levels of screening were carried out using cloud-based SWIFT-Active Screener software (https://www.sciome.com/swift-activescreener/; last accessed June 11, 2019). We used the following inclusion criteria in screening the mammalian studies: The study reported original data. The study was conducted on wild-type rats or rabbits (strain reported). Rats/rabbits were exposed to an individual chemical from the included zebrafish studies. At least 3 doses were administered orally in addition to a negative/vehicle control group. At least 4 pregnant females were treated and reported per group. The developing fetuses were examined for death, structural malformations and variations (external, visceral, and skeletal), and altered growth, as defined and classified by others (Chahoud ; Solecki ; Wise ). The study was reported in English.
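The dual independent screening with escalation to a third reviewer can be expressed as a small decision rule. This is an illustrative sketch, not the logic of the SWIFT-Active Screener software; the function names and the three decision labels are assumptions made for the example.

```python
def final_decision(reviewer_a, reviewer_b, consensus=None, third_reviewer=None):
    """Resolve two independent screening decisions.

    Decisions are "include", "exclude", or "unclear". If the two reviewers
    disagree, their discussed consensus is used when they reached one;
    otherwise the third reviewer's decision applies.
    """
    if reviewer_a == reviewer_b:
        return reviewer_a
    if consensus is not None:
        return consensus
    if third_reviewer is not None:
        return third_reviewer
    raise ValueError("conflict not resolved: consensus or third reviewer needed")


def carry_forward(decision):
    """At level 1, included and undecidable studies both go to full text screening."""
    return decision in {"include", "unclear"}


# Reviewers disagree, cannot settle it themselves, and the third reviewer decides.
decision = final_decision("include", "exclude", third_reviewer="exclude")
print(decision, carry_forward(decision))
```

Keeping "unclear" as an explicit outcome mirrors the rule that studies for which no decision can be made from the title and abstract alone are carried forward rather than dropped.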

Extraction of data from included mammalian studies

For each study-chemical combination that met the inclusion criteria, data were extracted from the full texts into a Microsoft Excel extraction table, according to the protocol. In addition, information on maternal toxicity was collected in order to allow for a discussion of primary embryotoxic effect versus secondary effects potentially caused by maternal toxicity, if warranted.

Risk of bias

At the time this study was conducted, no risk of bias guidance or tool that focused on toxicological animal studies was available. Consequently, we considered tools for animal studies in general (Krauth ) and for preclinical animal studies, ie, SYRCLE’s risk of bias tool (Hooijmans et al., 2014). We aimed for a set of risk of bias (and methodological and reporting) criteria that would apply equally to both the ZET and the mammalian prenatal developmental toxicity tests, notwithstanding that the ZET is clearly not a classical animal test and that some of its design features are characteristic of in vitro or ecotoxicological studies, eg, the use of microtiter plates or the “immersed” exposure in an aqueous environment. We selected a set of 11 criteria (Table 2). Eight were drawn from the SYRCLE risk of bias tool (Hooijmans et al., 2014) and 3 (reporting of randomization, blinding, and sample size calculation) were the most commonly identified criteria in the Krauth systematic review of risk of bias and methodological quality instruments for animal studies. Two reviewers applied the tool to each study independently, with the assessment options “yes,” “no,” and “unknown” for the risk-of-bias criteria, and the options “yes” and “no” for the reporting criteria. Any disagreements were resolved by discussion between the 2 reviewers, or by a third reviewer when needed.

Table 2.

Assessment of Risk-of-Bias and Reporting Quality of Included Mammalian and Zebrafish Studies

Study Type	Study ID	Risk-of-Bias Criteria (Hooijmans et al., 2014 )								Reporting criteria
Study Type	Study ID	Were the Groups Similar at Baseline or Adjusted for Confounders?	Was the Allocation Sequence Adequately Generated and Applied?	Was the Allocation Adequately Concealed?	Were the Animals Randomly Housed During the Experiment?	Were the Caregivers/Investigators During the Course of the Experiment Adequately Blinded?	Were Animals Selected at Random During Outcome Assessment?	Was the Outcome Assessment Adequately Blinded?	Were Incomplete Outcome Data Adequately Addressed?	Is It Mentioned That the Experiment Was Randomized?	Is It Mentioned That the Experiment Was Blinded?	Is a Power/Sample Size Calculation Shown?
Mammalian	Obbink and Dalderup (1963)
	Staples and Holtkamp (1963)									^a
	Dwornik and Moore (1965)
	Fratta et al. (1965)
	Schumacher et al. (1968)
	Lehmann and Niggeschulze (1971)
	McBride (1974)
	Flohé et al. (1981)
	Matsubara et al. (1983)
	Sterz et al. (1987)
	Zhao et al. (2010)
	Kawamura et al. (2014)
Zebrafish	Gao et al. (2014)

Green, yes; red, no; yellow, unknown.

^a Randomization is mentioned for rats, but not for rabbits.

As information pertinent to these 11 criteria is rarely reported (Avey ; Drucker, 2016; Kilkenny ; Leung ), we approached the assessment in a manner we considered most efficient. When several chemicals and/or species were tested in a given study, it is recommended to assess the risk of bias for each species-chemical combination, as aspects pertinent to the assessment, such as the outcomes reported or the attrition rate, may differ among such combinations. However, we assumed that reporting would be consistent for each species-treatment combination in a study and assessed each study as a whole, instead of evaluating each species-treatment combination per study.

Data evaluation

Prenatal developmental toxicity hazard, ie, the potential of a chemical to cause an adverse effect that is relevant for hazard assessment, was considered to be a binary outcome. Consequently, we tailored evaluation procedures for the ZET and the mammalian tests according to whether the chemical was negative (ie, not embryotoxic in the ZET or absence of adverse findings in fetal examination of the mammalian studies, respectively) or positive (ie, embryotoxic in the ZET or presence of adverse findings in the fetal examinations of the mammalian studies, respectively) in a given study. Data analysis and the presentation of preliminary findings on test method performance were not the focus of this preparatory study. We refer interested readers to the protocol (Tsaioun ), which specifies the evaluation in detail.

RESULTS

Protocol Preparation and Amendments

The review team produced a working draft of the review protocol. Numerous amendments proved necessary, given that this preparatory study was pioneering the application of systematic review methodology to the new context of toxicological test method comparison. All amendments were tracked and incorporated into the final protocol, which is being executed in the main study, and was registered in PROSPERO, an international prospective register of systematic reviews (CRD42018096120) (Tsaioun ).

Search Results

The results of our literature search for ZET studies are summarized in Figure 1, which follows the PRISMA format (Moher ); 11 741 studies were retrieved from our search. This number was reduced to 5074 after using EndNote functionality to remove duplicates and, for the purpose of this preparatory study, documents clearly out-of-scope (eg, papers indexed as non-English and documents without original data, such as research proposals and meeting abstracts). As planned, 50 of these studies were randomly selected to explore the applicability of the adapted methodology. After screening the titles and abstracts of these studies against our inclusion/exclusion criteria (level 1 screening), 8 papers remained included. After retrieving full texts of these papers and applying the same criteria (level 2 screening), 1 paper was left. Papers were excluded at each screening level for a variety of reasons, eg, reporting no prenatal developmental toxicity outcomes or presenting no original data (Figure 1).

Figure 1.

Preferred reporting items for a systematic review and meta-analysis (PRISMA) flow diagram for the zebrafish studies retrieved from the literature search (hpf: hours post fertilization).

The one included paper, by Gao et al. (2014), reported on a ZET study that assessed 7 chemicals for developmental toxicity effects, including structural malformations. These 7 chemicals were Auranofin (CAS-no. 34031-32-8), Curcumin (CAS-no. 458-37-7), Gambogic acid (CAS-no. 2752-65-0), Mycophenolic acid (CAS-no. 24280-93-1), Taxol (CAS-no. 33069-62-4), Thalidomide (CAS-no. 50-35-1), and Triptolide (CAS-no. 38748-32-2). These compounds were then used as the chemical concept in the subsequent systematic search of the mammalian literature (see Supplementary Material). This resulted in 1442 papers being retrieved, after removing duplicates; 263 papers remained included after level 1 (title and abstract) screening, and 12 of these papers met our inclusion/exclusion criteria after the level 2 (full text) review. The reasons for exclusion are reported in Figure 2. These 12 papers were included in the final analysis.

Figure 2.

Preferred reporting items for a systematic review and meta-analysis (PRISMA) flow diagram for the mammalian studies retrieved from the literature search.

Test Results

For the 7 chemicals assessed in the included zebrafish study, acute toxicity and cardiovascular toxicity, as well as developmental toxicity, were evaluated by Gao et al. (2014). Mammalian prenatal developmental toxicity data were found on 2 of these chemicals, namely, gambogic acid and thalidomide. These 2 chemicals caused treatment-related findings in the zebrafish, ie, missing pectoral fins for both chemicals and reduced pigmentation for gambogic acid (Gao et al., 2014). Of the 12 included mammalian studies, 11 assessed thalidomide, and 1 gambogic acid. In these studies, thalidomide was tested in both rats (2 studies) and rabbits (9 studies), whereas gambogic acid was tested only in rats. Treatment-related adverse findings of prenatal developmental toxicity were described for thalidomide in both rats and rabbits, and for gambogic acid in rats, for which no rabbit data were available (Table 1). Among the reported fetal malformations for thalidomide in rabbits were increased cleft palate, hydrocephalus, microphthalmia, dysmelia, malrotated limbs, and spina bifida. In rats, thalidomide was found to have resulted in prenatal developmental toxicity, such as abnormalities of the vertebral centrum and the fifth sternal ossification centrum. For gambogic acid, the prenatal developmental toxicity in the rat was manifested in an increase of fetal skeletal alterations. The variations reported for this chemical were rudimentary cervical ribs and delayed skull and sternebral ossifications, as well as retarded ossifications of vertebra (Table 1).

Table 1.

Summary of Included Mammalian Studies

Study ID	Species	Strain	Chemical	Effect(s)	Overall Assessment of Prenatal Developmental Toxicity
Zhao et al. (2010)	Rat	Sprague Dawley	Gambogic acid	Decrease in fetal weight in the presence of maternal toxicity. No fetal malformations. Increase in fetal variations: rudimentary cervical ribs and retarded ossification in skull, sternebra, and vertebra	Positive
Obbink and Dalderup (1963)	Rat	Wistar albino	Thalidomide	No maternal toxicity. No effect on fetal weight. Decreased litter size based on increased number of resorptions and stillborn. No fetal malformations. Increased number of abnormal fifth sternal ossification centrum	Positive
Staples and Holtkamp (1963)	Rabbit	Dutch-belted	Thalidomide	Tail malformations; malrotated (clubbed) limbs	Positive
Dwornik and Moore (1965)	Rat	Holtzman albino	Thalidomide	Increased number of abnormal vertebral centra and vertebrae, increased incidence of absent fifth sternebra and miscellaneous abnormalities like poor ossification of some or all bones of the pelvis	Positive
Fratta et al. (1965)	Rabbit	New Zealand	Thalidomide	Dysmelia	Positive
Schumacher et al. (1968)	Rabbit	New Zealand	Thalidomide	Increased number of limb abnormalities and rib abnormalities (no detailed effect description, but assumed to be malformations)	Positive
Lehmann and Niggeschulze (1971)	Rabbit	Himalayan rabbits “Biberach”	Thalidomide	Dose-dependent incidence of malformations, increased cleft palate	Positive
McBride (1974)	Rabbit	New Zealand white	Thalidomide	Fetuses with multiple external malformations (at high doses; no malformations in control)	Positive
Flohé et al. (1981)	Rabbit	New Zealand white	Thalidomide	Increased number of malformed fetuses (no more details, but reference to another paper)	Positive
Matsubara et al. (1983)	Rabbit	Japanese white; JW-NIBS rabbits	Thalidomide	Increased hydrocephalus; microphthalmia	Positive
Sterz et al. (1987)	Rabbit	Himalayan rabbits	Thalidomide	Dysmelia	Positive
Kawamura et al. (2014)	Rabbit	Kbl: JW rabbits	Thalidomide	Malrotated paws; ectrodactyly, brachydactyly	Positive

Given that our search of the mammalian literature yielded studies with exposures to only 2 out of 7 of these chemicals, ie, thalidomide and gambogic acid, we can compare the mammalian prenatal developmental toxicity results with the ZET results only for these, which showed treatment-related findings in both species.
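Because both tests are reduced to a binary call per chemical, the comparison can be laid out as a 2 × 2 table of the kind used in diagnostic test accuracy reviews. The sketch below is an illustration and not the analysis specified in the protocol; the function names are hypothetical, and the mammalian result is treated as the reference. The only inputs are the two paired calls reported above.

```python
from collections import Counter


def two_by_two(zet, mammal):
    """Cross-tabulate binary calls for chemicals that have a result in both tests.

    `zet` and `mammal` map chemical name -> True (positive) / False (negative);
    the mammalian call is treated as the reference.
    """
    counts = Counter()
    for chem in zet.keys() & mammal.keys():
        agreement = "T" if zet[chem] == mammal[chem] else "F"
        call = "P" if zet[chem] else "N"
        counts[agreement + call] += 1   # yields TP, FP, FN, or TN
    return counts


def ratio(numerator, denominator):
    """Return None instead of dividing by zero when a cell group is empty."""
    return numerator / denominator if denominator else None


def accuracy_measures(c):
    return {
        "sensitivity": ratio(c["TP"], c["TP"] + c["FN"]),
        "specificity": ratio(c["TN"], c["TN"] + c["FP"]),
    }


# The two chemicals with data in both tests; both were positive in both.
zet_calls = {"thalidomide": True, "gambogic acid": True}
mammal_calls = {"thalidomide": True, "gambogic acid": True}

table = two_by_two(zet_calls, mammal_calls)
print(dict(table), accuracy_measures(table))
```

With only two concordant positives and no reference negatives, specificity cannot be computed at all and the sensitivity value carries no weight, which is consistent with the decision not to present performance findings in this preparatory study.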

Risk of Bias

The risk of bias and the reporting quality of the included zebrafish (N = 1) and mammalian (N = 12) papers were assessed (Table 2). For the 13 studies, the vast majority of the risk-of-bias criteria were rated as “unknown” (indicated as yellow in Table 2), as the necessary information was not reported. The same holds true for the 3 reporting criteria: with very few exceptions, none of the relevant information was reported (indicated as red in Table 2). There was no substantive difference observed between the risk of bias in the mammalian studies and the zebrafish study. Although the number of studies was small, it is worth noting that there was no obvious trend in the more recent mammalian literature toward more detailed reporting, as the older studies were equally likely (or not) to contain the information.

Challenges and Lessons Learned

In addition to the preparation of the protocol that adapts systematic review methodology to toxicological test method assessment, we consider the identification of the challenges encountered and the lessons learned as the primary result of this preparatory study. Here we provide an annotated listing of these challenges and lessons learned, organized under the headings of the typical steps of a systematic review.

Formulating the question

As outcomes in zebrafish and mammals cannot be compared directly due to differences in anatomy and embryogenesis, we chose embryotoxicity in zebrafish and prenatal developmental toxicity in mammals as nonspecific outcomes that subsume various effects. This broadness in the review question was necessary to render outcomes of the test methods comparable, but it also presented a challenge in several of the subsequent review steps.

Searching the evidence

In the absence of studies that tested the same chemicals in parallel in both tests, a novel 2-stage strategy was devised, first to identify studies that tested chemicals in the ZET and, second, to identify studies that tested the same chemicals in mammalian prenatal developmental toxicity studies. Fine-tuning of the search strategy was made difficult by the fact that MeSH terms in MEDLINE/PubMed are oriented towards clinical medicine, and do not capture the relevant fields for toxicology (see, eg, https://www.nlm.nih.gov/mesh/meshhome.html/; last accessed June 11, 2019).

Selecting the evidence

Because the screened literature lacked structured abstracts, it was often challenging to locate the information pertinent to the inclusion and exclusion criteria, which considerably reduced the efficiency of identifying relevant studies. Substantial diversity in reported ZET outcomes, likely a consequence of the lack of protocol harmonization in the field, further complicated the identification of relevant studies.

Extracting data

Data extraction was made difficult by the lack of a commonly accepted ontology for adverse outcomes in zebrafish studies and, in general, by a failure to use a controlled vocabulary for reporting study information.
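To illustrate why a controlled vocabulary eases extraction, the sketch below maps free-text outcome descriptions to canonical labels and flags anything unmapped for manual review. It is a hypothetical example only: the synonym table, the canonical labels, and the function names are assumptions made for illustration and do not come from the protocol or from any agreed zebrafish ontology.

```python
# Hypothetical synonym table (illustration only, not an agreed ontology):
# each reported phrase maps to one canonical outcome label.
SYNONYMS = {
    "pericardial edema": "pericardial edema",
    "pericardial oedema": "pericardial edema",
    "heart sac edema": "pericardial edema",
    "tail malformation": "tail malformation",
    "bent tail": "tail malformation",
    "tail curvature": "tail malformation",
}


def normalize(term):
    """Return the canonical label for a reported term, or None if unknown."""
    key = " ".join(term.lower().split())  # lowercase and collapse whitespace
    return SYNONYMS.get(key)


def extract(reported_terms):
    """Split reported terms into canonical labels and terms needing manual review."""
    mapped, unmapped = set(), []
    for term in reported_terms:
        canonical = normalize(term)
        if canonical is None:
            unmapped.append(term)
        else:
            mapped.add(canonical)
    return sorted(mapped), unmapped


labels, needs_review = extract(["Pericardial  Oedema", "bent tail", "yolk sac anomaly"])
print(labels)        # canonical labels found
print(needs_review)  # terms a reviewer must resolve by hand
```

Without a shared vocabulary, every review team has to build and maintain a table like this by hand, which is the extraction burden described above.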

Analyzing data

Several challenges, some of which have been mentioned above, compelled us to adopt a simplified approach to data analysis, focusing on the presence or absence of any kind of structural alterations. We made every effort to keep these considerations as transparent and rational as possible, and they are explained in the protocol (Tsaioun ).
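Once each chemical has a presence/absence call in both tests, the comparison reduces to a 2 × 2 table. The sketch below shows one way such a table could be tallied in Python. It rests on assumptions: the article does not specify which predictive performance parameters will be calculated, so sensitivity, specificity, and accuracy are used here as common choices, the mammalian result is treated as the reference, and the example calls are invented and do not represent study data.

```python
def two_by_two(pairs):
    """Tally (zet_positive, mammalian_positive) calls, one pair per chemical.

    The mammalian result is treated as the reference (an assumption of this sketch).
    Returns (tp, fp, fn, tn).
    """
    tp = fp = fn = tn = 0
    for zet, mammalian in pairs:
        if zet and mammalian:
            tp += 1
        elif zet and not mammalian:
            fp += 1
        elif not zet and mammalian:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def performance(tp, fp, fn, tn):
    """Compute common predictive performance parameters from a 2 x 2 table."""
    def ratio(numerator, denominator):
        # None signals an undefined ratio (empty row or column).
        return numerator / denominator if denominator else None

    return {
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
        "accuracy": ratio(tp + tn, tp + fp + fn + tn),
    }


# Invented calls for 8 hypothetical chemicals: (ZET positive?, mammalian positive?)
calls = [
    (True, True), (True, True), (True, True), (False, True),
    (True, False), (False, False), (False, False), (False, False),
]
table = two_by_two(calls)
print(table)
print(performance(*table))
```

A full analysis would also have to decide how to collapse repeat studies of the same chemical into a single call before tallying, which this sketch does not address.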

Reporting

We observed that the studies in our sample were not reported in sufficient detail to fully assess risk of bias and methodological quality (Table 2), which suggests that this may be the case throughout this literature. This would be consistent with the poor quality of reporting of experimental animal studies observed both by other reviews related to toxicology (Koustas ; NTP, 2016) and in the literature of preclinical animal studies (Freedman ; Sena ; Tihanyi ).

DISCUSSION

Clinical medicine has pioneered systematic review methodology (Chandler and Hopewell, 2013). This methodology is being adopted and adapted to other scientific disciplines such as environmental health science (Rooney , 2016; Woodruff and Sutton, 2014), preclinical animal studies (Hirst ; Yauw ), and other fields (Okoli and Schabram, 2010) because of its structured framework, transparency, comprehensiveness, reproducibility, and objectivity. This preparatory study’s goal was the translation of systematic review methodology to a new use context: the assessment of test method performance in toxicology. Consequently, we focused on the methodological issues encountered, whereas the main study will report substantive findings. The methodological translation proved feasible but raised a number of issues and presented numerous challenges.

Before discussing these, we briefly comment on the relevance of this preparatory study to the review question, which concerns the ability of the ZET to predict prenatal developmental toxicity in the mammalian guideline studies. First, the results of the literature search indicate that there is a substantial literature on the testing of chemicals in both sets of tests (Figs. 1 and 2). Extrapolating the inclusion ratio of 1 in 50 studies from this preparatory study to the main study, with its several thousand ZET studies, suggests that approximately 100 studies may be included in the main study. As these will have tested several hundred chemicals, many of which will also have eligible mammalian prenatal developmental toxicity studies, it can be expected that the main study will be based on a substantial amount of evidence.

A second noteworthy finding was the poor reporting, resulting in an unknown risk of bias in the included individual studies (Table 2). Projected to the main study, this may limit the level of confidence in the review’s conclusions.
Since the time this study was conducted, several critical appraisal tools focused on toxicology/environmental health have been published (Beronius ; Rooney ; Woodruff and Sutton, 2014). To the extent that these tools assess risk of bias, they basically address the same bias domains as were assessed here and would have led to largely the same conclusions regarding risk of bias in the studies included here. As a consequence, the risk-of-bias assessment will be reconsidered in relation to other factors that potentially influence the confidence in results of individual studies, such as the physical-chemical properties of chemicals, eg, low water solubility, that may lead to reduced exposure in the ZET, or the interpretation of adverse effects in the mammalian studies in the presence of maternal toxicity.

Third, with the nonspecific outcomes of embryotoxicity in the ZET and prenatal developmental toxicity in mammals, the results of the ZET and the mammalian prenatal developmental toxicity tests were comparable. However, the small number of chemicals, which resulted from our methodological focus, did not, as expected, allow us to draw any sound conclusions regarding the performance of the ZET. This preparatory study, in particular the developed protocol, has provided the methodology to be used for the main study, which will evaluate all evidence obtained from the literature search. This is expected to result in a sufficiently large number of chemicals to compare the test methods and calculate predictive performance parameters.

As this study was the pioneering effort to adapt the systematic review framework to the context of test method assessment, we made a few decisions to facilitate, and learn from, this transition. First, we focused our comparisons on embryotoxicity in vitro and prenatal developmental toxicity in vivo only, setting aside potential comparisons of other adverse developmental effects, such as developmental neurotoxic outcomes.
In the future, the test methods could be more fully compared in a comprehensive systematic review with subgroup analyses, eg, focusing on specific outcomes or groups of outcomes. Second, within our chosen domain, we translated data on the nature and severity of embryotoxicity in the ZET and prenatal developmental toxicity in mammals into qualitative outcomes of the presence or absence of treatment-related alterations. And finally, we limited the mammalian prenatal developmental toxicity studies to those involving rats or rabbits, given that these species have been more commonly used than other species.

These decisions resulted in the following main study review question: How well does the presence or absence of treatment-related findings in the ZET predict the presence or absence of prenatal development toxicity in rat and rabbit studies (OECD TG 414 and equivalents)? In systematic review terminology, this is fundamentally a PECO question design (eg, Morgan ; Woodruff and Sutton, 2014); that is, the question addresses the Populations (exposed zebrafish embryos and rat and rabbit fetuses), Exposure (to individual chemicals), Comparison (comparator test: rat/rabbit prenatal developmental toxicity test), and Outcome (embryotoxicity). Simply put, the review compares the embryotoxicity hazards of chemicals in the ZET with the prenatal developmental toxicity observed in rats and rabbits.

The outcomes included in this question were broad. They covered many morphological alterations of the embryos in the ZET as well as many morphological alterations in external, soft tissue, and skeletal fetal examinations specific for mammalian prenatal development, which, owing to species differences, are not directly comparable. In contrast, questions should be narrow to make them amenable to systematic review (Hoffmann ).
Therefore, the comparison of test methods that provide information on a broad range of outcomes that all inform the same hazard is a fundamental challenge that causes problems in the subsequent systematic review steps, eg, for the study selection (What is the minimum set of outcomes that a study must report to be eligible?) and for the data analysis (How to summarize repeat studies of the same chemical?). These aspects should be thoroughly considered when embarking on a systematic review to compare toxicological test methods.

We faced some challenges when translating our PECO question into database search strategies, notably when generating toxicology-related search terms from PubMed’s controlled vocabulary MeSH. Although we identified a number of relevant MeSH terms for our search (eg, “Toxicity Tests”[Mesh]), we found a general lack of robustness in MeSH’s coverage of toxicology. We addressed this challenge by carefully including relevant keywords in our PubMed search and by running our search in multiple databases. Because of their unique features, these other databases added results not found by PubMed. For instance, Embase has its own controlled vocabulary that covers toxicology more robustly, and TOXLINE has an explicit focus on the toxicological literature.

Our starting point in this preparatory study was the methodology for a standard systematic review as used in clinical medicine, as suggested by Hartung (2010). Led by Cochrane, clinical medicine has pioneered systematic review methodology in the context of assessing the effectiveness of “interventions” such as new drugs or surgical techniques (Chandler and Hopewell, 2013). Cochrane is currently translating systematic review methodology to the assessment of diagnostic test accuracy (DTA), a context that has some parallels to assessing test methods in toxicology (Hoffmann and Hartung, 2005, 2006). Cochrane has been publishing chapters of its Handbook for DTA Review online as they become available (Cochrane, 2018).
Although intended for a different context, this emerging Handbook was helpful to the present study in a number of areas, especially in protocol preparation and terminology. However, the parallels between this clinical situation (assessing DTA) and the toxicological situation (assessing test performance) are limited in practice. For example, in the clinical situation, the reviewed studies are themselves direct comparisons of the diagnostic tests under review, thus rendering the review essentially a compilation of pre-existing comparisons, as reflected, for example, in the preferred reporting items for a systematic review and meta-analysis (PRISMA) for studies of diagnostic test accuracy (McInnes ). In the toxicology context, however, the studies relevant for comparison of 2 toxicological test methods are typically studies of individual chemicals in one or the other test, but not studies comparing both. In addition, this effort should be put in the larger context of applying systematic review methodology to the evaluation of toxicological studies for environmental health questions and in chemical risk assessment (Rooney ; Whaley ; Woodruff and Sutton, 2014). This application commonly takes the form of reviewing the evidence that associates chemical exposure with a specific health effect (see eg, Cano-Sancho ; Koustas ; NTP, 2016). Although the number of such systematic reviews is increasing, the adaptation of the systematic review methodology for this purpose still faces challenges, such as rating the confidence in the body of evidence or integrating the evidence from human, animal, and nonanimal studies (Morgan ). Although the PECO questions of the 2 systematic review applications (exposure effects vs test method comparisons) are substantially different, close connections of these 2 applications are evident in other review steps. 
Literature sources will be similar and some search concepts, eg, for the exposure (chemicals and their synonyms) and for outcomes, are required for both applications. In addition, eligibility criteria related to study design are likely to be similar, as both applications are usually based on studies that fulfill at least basic design requirements, as are approaches to critical appraisal of studies. In contrast, rating the confidence in the body of evidence will differ in some regards, as aspects such as consistency, precision, and effect size are not directly applicable to test method comparisons. In addition and as in the clinical field, data analysis approaches will not have any great similarity.

We conclude with several recommendations that stem from the challenges and lessons identified above.

MeSH search terms: The terminology and hierarchy of MeSH search terms in PubMed should be expanded to provide more utility to toxicology.

Structured abstracts: Toxicology journals should consider requiring structured abstracts that call for critical types of information to be present, labeled, and listed in a certain sequence, as has been called for in clinical studies (Mulrow ). An additional advantage of structured abstracts is that they are more amenable to automated approaches, such as machine-reading.

Completeness of reporting: Methodological details and study results should be reported in sufficient detail to permit readers to assess how confident to be in the results and conclusions, as well as how to replicate a given study (Avey ; Drucker, 2016; Kilkenny ; Leung ). Numerous guidelines on reporting quality are available (see eg, Samuel ). Also, improved and comprehensive reporting of studies in toxicological databases would be helpful and could ultimately qualify them as eligible evidence sources for systematic review purposes.

Risk of bias: As already called for by Rooney , studies providing empirical evidence on the impact of individual biases on toxicological evidence should be conducted. Once a bias has been demonstrated to be influential in the toxicological literature, concrete steps could be taken to minimize such potential biases in experimental studies, not only by the researchers, but also by organizations conducting, commissioning, and funding the studies, and by regulatory agencies. In addition, the importance of the risk-of-bias assessment needs to be assessed in relation to how other factors, such as external validity, potentially influence the conclusions.

Ontologies/controlled vocabularies: An ontology and/or controlled vocabulary can expedite the systematic review process (by streamlining data compilation), and ultimately, can make machine-learning and data-mining approaches possible (Hardy ,b). An ontology and/or controlled vocabulary should be developed (or aligned to an existing one) as early in the development of a new test method as is appropriate, which is especially important for test methods with many potential outcomes and those that involve organisms or tissues more distantly related to those used routinely.

If followed, these recommendations would not only facilitate the conduct of systematic reviews, but also promote the broader goal of producing reliable science to inform regulatory decisions.

In conclusion, we argue that systematic reviews of the assessment of toxicological test methods are feasible, although challenging. A practical prerequisite for a definitive systematic review in this context, as in others, is that sufficient relevant evidence is publicly available, which might not be the case for every new toxicological test method. Some systematic reviews have value in documenting the limited extent of available evidence, and thus flagging a data gap.
However, it is difficult to assess whether the toxicological community would be well served by a full-blown systematic review of test method comparisons that simply flagged a data gap; other approaches are better suited, eg, evidence maps (Miake-Lye ). Moreover, there should be data of sufficient detail and quality in the routinely used species (or human outcomes), retrievable by a systematic literature search, in order to provide a basis for comparison of the new tests. For regulatory evaluations in the future, one could envision test developers submitting detailed information about the mechanistic basis of a new test (Hartung ), along with a dataset exhibiting low risk of bias. It could then be compared to the outcomes of the standard guideline test and human outcomes, both obtained through systematic literature searches.

It is acknowledged that the comparison of 2 test methods will be increasingly replaced by comparing combinations of test methods against a single or composite reference standard. Such combinations of various test methods and other information, for example in testing strategies and integrated approaches to testing and assessment, will be essential, for example, in implementing Toxicity Testing in the 21st Century (NRC, 2007). The challenges in designing and assessing such strategic approaches have been identified, but the discussion of solutions continues (Jaworska and Hoffmann, 2010; Piersma ; Rovida ). The complexity goes far beyond the direct comparison of 2 test methods for the same purpose as planned in our review, which has successfully been used for so-called one-to-one replacements of an in vivo test method by a nonanimal test method (see eg, Spielmann ). However, assuming that a performance assessment will also be required for testing strategies, a systematic review approach could be equally applied. Therefore, our methodological adaptation to one-to-one comparisons will also be of value for more complex situations.
For example, test methods addressing the same mechanistic event could be compared systematically or a combination of test methods could be compared to reference results (Kleinstreuer ). Although it can be anticipated that a substantial effort is required to assess toxicological test methods, either individually or in combination, the advent of artificial intelligence (AI) and machine learning (ML) bears the promise of increasing efficiency, ultimately enabling updates of existing systematic reviews in real time. This will make this application much more pragmatic, as compared to prospective studies, eg, when formally validating test methods according to international requirements (Hartung ; OECD, 2005). In addition, various methodological challenges remain that call for the adaptation of existing methodology, if not for new approaches. The necessary methodological solutions should adhere to the fundamental evidence-based principles of transparency, objectivity, and consistency, and should be agreed on by all interested stakeholders. As these solutions are developed, systematic review may become a standard tool for the retrospective evaluation of toxicological test methods.

SUPPLEMENTARY DATA

Supplementary data are available at Toxicological Sciences online.

DECLARATION OF CONFLICTING INTERESTS

The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

71 in total

1. Classification terms in developmental toxicology: need for harmonisation. Report of the Second Workshop on the Terminology in Developmental Toxicology Berlin, 27-28 August 1998.

Authors: I Chahoud; J Buschmann; R Clark; A Druga; H Falke; A Faqi; E Hansen; B Heinrich-Hirsch; J Hellwig; W Lingk; M Parkinson; F J Paumgartten; R Pfeil; T Platzek; A R Scialli; J Seed; R Stahlmann; B Ulbrich; X Wu; M Yasuda; M Younes; R Solecki
Journal: Reprod Toxicol Date: 1999 Jan-Feb Impact factor: 3.143

2. EFFECTS OF PARENTAL THALIDOMIDE TREATMENT ON GESTATION AND FETAL DEVELOPMENT.

Authors: R E STAPLES; D E HOLTKAMP
Journal: Exp Mol Pathol Suppl Date: 1963-12

3. EFFECTS OF THALIDOMIDE IN THE RAT FOETUS.

Authors: H J OBBINK; L M DALDERUP
Journal: Experientia Date: 1963-12-15

4. TERATOGENIC EFFECTS OF THALIDOMIDE IN RABBITS, RATS, HAMSTERS, AND MICE.

Authors: I D FRATTA; E B SIGG; K MAIORANA
Journal: Toxicol Appl Pharmacol Date: 1965-03 Impact factor: 4.219

5. SKELETAL MALFORMATIONS IN THE HOLTZMAN RAT EMBRYO FOLLOWING THE ADMINISTRATION OF THALIDOMIDE.

Authors: J J DWORNIK; K L MOORE
Journal: J Embryol Exp Morphol Date: 1965-04

6. A modular approach to the ECVAM principles on test validity.

Authors: Thomas Hartung; Susanne Bremer; Silvia Casati; Sandra Coecke; Raffaella Corvi; Salvador Fortaner; Laura Gribaldo; Marlies Halder; Sebastian Hoffmann; Annett Janusch Roi; Pilar Prieto; Enrico Sabbioni; Laurie Scott; Andrew Worth; Valérie Zuang
Journal: Altern Lab Anim Date: 2004-11 Impact factor: 1.303

7. Diagnosis: toxic!--trying to apply approaches of clinical diagnostics and prevalence in toxicology considerations.

Authors: Sebastian Hoffmann; Thomas Hartung
Journal: Toxicol Sci Date: 2005-02-02 Impact factor: 4.849

8. The principles of weight of evidence validation of test methods and testing strategies. The report and recommendations of ECVAM workshop 58.

Authors: Michael Balls; Patric Amcoff; Susanne Bremer; Silvia Casati; Sandra Coecke; Richard Clothier; Robert Combes; Raffaella Corvi; Rodger Curren; Chantra Eskes; Julia Fentem; Laura Gribaldo; Marlies Halder; Thomas Hartung; Sebastian Hoffmann; Leonard Schectman; Laurie Scott; Horst Spielmann; William Stokes; Raymond Tice; Drew Wagner; Valérie Zuang
Journal: Altern Lab Anim Date: 2006-12 Impact factor: 1.303

9. Toward an evidence-based toxicology.

Authors: S Hoffmann; T Hartung
Journal: Hum Exp Toxicol Date: 2006-09 Impact factor: 2.903

10. Harmonization of rat fetal external and visceral terminology and classification. Report of the Fourth Workshop on the Terminology in Developmental Toxicology, Berlin, 18-20 April 2002.

Authors: Roland Solecki; Brigitte Bergmann; Heinrich Bürgin; Jochen Buschmann; Ruth Clark; Alice Druga; E A J Van Duijnhoven; Martine Duverger; James Edwards; Hannelore Freudenberger; Pierre Guittin; Palmira Hakaite; Barbara Heinrich-Hirsch; Jürgen Hellwig; Thomas Hofmann; Ulrich Hübel; Samia Khalil; Ana maria Klaus; Sabine Kudicke; Wolfgang Lingk; Tim Meredith; Mary Moxon; Simone Müller; Martin Paul; Francisco Paumgartten; Elke Röhrdanz; Rudolf Pfeil; Martina Rauch-Ernst; Jennifer Seed; Francois Spezia; Carolyn Vickers; Brigitte Woelffel; Ibrahim Chahoud
Journal: Reprod Toxicol Date: 2003 Sep-Oct Impact factor: 3.143
