
Weight of Evidence for Hazard Identification: A Critical Review of the Literature.

Pierre Martin¹,², Claire Bladier³, Bette Meek⁴, Olivier Bruyere⁵, Eve Feinblatt³, Mathilde Touvier⁶, Laurence Watier⁷, David Makowski⁸.

Abstract

BACKGROUND: Transparency when documenting and assessing weight of evidence (WOE) has been an area of increasing focus for national and international health agencies.
OBJECTIVE: The objective of this work was to conduct a critical review of WOE analysis methods as a basis for developing a practical framework for considering and assessing WOE in hazard identification in areas of application at the French Agency for Food, Environmental and Occupational Health and Safety (ANSES).
METHODS: Based on a review of the literature and directed requests to 63 international and national agencies, 116 relevant articles and guidance documents were selected. The WOE approaches were assessed based on three aspects: the extent of their prescriptive nature, their purpose-specific relevance, and their ease of implementation.
RESULTS: Twenty-four approaches meeting the specified criteria were identified in the selected documents. Most approaches satisfied one or two of the assessed considerations, but not all three. The approaches were grouped within a practical framework comprising the following four stages: (1) planning the assessment, including scoping, formulating the question, and developing the assessment method; (2) establishing lines of evidence (LOEs), including identifying and selecting studies, assessing their quality, and integrating them with studies of similar type; (3) integrating the LOEs to evaluate WOE; and (4) presenting conclusions.
DISCUSSION: Based on the review, considerations for selecting methods for a wide range of applications are proposed, and priority areas for further development are identified. https://doi.org/10.1289/EHP3067.



Substances:
Hazardous Substances

Year: 2018 PMID: 30024384 PMCID: PMC6108859 DOI: 10.1289/EHP3067

Source DB: PubMed Journal: Environ Health Perspect ISSN: 0091-6765 Impact factor: 9.031

Introduction

Risk assessment is usually characterized by four components: hazard identification, hazard characterization (including dose–response analysis), exposure assessment, and risk characterization. Identifying relevant hazards for subsequent dose–response analysis and risk characterization requires the assimilation and assessment of a wide range of data types (NRC 2014; OECD 2014; U.S. EPA 2014). Organizations have drawn differing conclusions about the hazard potential of specific substances, and these differences have highlighted the need for greater consistency in how such data are analyzed. One example is the differing conclusions of the European Food Safety Authority (EFSA), the U.S. Environmental Protection Agency (U.S. EPA 2017), and the International Agency for Research on Cancer (IARC) regarding the carcinogenicity of glyphosate (EFSA 2017). Another is the differing conclusions of EFSA, ANSES, and the U.S. National Toxicology Program regarding the reproductive and developmental hazards of bisphenol A (ANSES 2015; U.S. NTP 2008). These differences have led national and international agencies to focus increasingly on the robustness and transparency of expert-informed assessments (Hardy et al. 2015; OHAT 2015). The aim is to increase the understanding and confidence of the relevant scientific community, stakeholders, and the public. The term "weight of evidence" (WOE) appears frequently in the scientific literature. However, it is often poorly and inconsistently defined, with limited documentation of the supporting expert-informed process and methodology (NRC 2014; Weed 2005). WOE has long been referenced in a range of disciplines. In medicine, for example, it was introduced principally as a clinical decision-support tool for prioritizing knowledge from medical research on the basis of a critical review of the literature (Sackett et al. 1996).
WOE assessment has also been widely referenced and applied in environmental health (Mandrioli and Silbergeld 2016; Krimsky 2005). In various disciplines, approaches to WOE assessment have evolved beyond a review of the literature. They now include expert-informed reviews and the transparent, systematic integration of different types of information (e.g., meta-analysis). The reviews of Linkov et al. (2009) and Rhomberg et al. (2013) described approaches ranging from the largely qualitative (e.g., Guyatt et al. 2011a) to fully quantitative techniques (e.g., Gosling et al. 2013). The approaches vary in inclusiveness and organization. Some refer to establishing lines of evidence (i.e., groupings of evidence of similar type used to assess a hypothesis) and to integrating evidence of different types (e.g., toxicological, epidemiological, and mechanistic data). Others address only the integration of different types of evidence, without reference to prerequisite stages such as identifying and selecting relevant evidence and establishing "lines of evidence" (LOEs). Here, we therefore propose a framework to support the selection of WOE methodologies according to the objectives and focus of an assessment. The specific objective of this work was to propose harmonized approaches to assessing and communicating WOE for the French Agency for Food, Environmental and Occupational Health and Safety (ANSES). These approaches cover environmental, occupational, and food safety, as well as plant and animal health. The review was limited to documented approaches to WOE assessment (interpreted here as the structured synthesis of evidence). It did not address the selection of experts or conflicts of interest. Its scope and basis are broader than those of earlier reviews such as Rhomberg et al. (2013), which was confined to chemical hazards to human health.
The review addresses approaches relevant to the wide range of applications within the purview of ANSES, including, for example, microbiological quality. It combines an extensive literature search of PubMed and Scopus with a focused consultation of 63 public health and environmental agencies worldwide. It also characterizes the identified approaches according to the component stages of the proposed practical framework for WOE analysis, which is relevant to this broader range of assessments. Each approach is also rated on three criteria: prescriptive nature, relevance, and feasibility. These ratings screen its potential for application within ANSES and possibly within other food and environmental safety agencies.

Methods

Both peer-reviewed journal articles and guidance developed by health and environmental agencies were considered in the review. To restrict the search to WOE assessment in risk analysis, the query combined two sets of terms with the AND operator: the first set related to WOE and the second to risk analysis (Figure 1). Two databases (Scopus and PubMed) were queried on 16 March 2015, searching the title, abstract, and keyword fields. Papers were excluded if they were published before 2010 or after March 2015, or if they were written in a language other than English or French. Case studies, editorials, and papers without identifiable content related to WOE approaches were also excluded.
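The query logic and the date and language exclusions described above can be sketched as follows. Each term set is joined internally with OR, and the two sets are combined with AND. The keyword lists are hypothetical placeholders, because the actual terms appear only in Figure 1. The sketch also does not reproduce the field syntax of real Scopus or PubMed queries.

```python
from dataclasses import dataclass
from datetime import date

# Hypothetical placeholder terms: the actual keywords appear only in Figure 1.
WOE_TERMS = ["weight of evidence", "strength of evidence"]
RISK_TERMS = ["risk assessment", "hazard identification"]


def or_group(terms):
    """Join one set of terms with OR, quoting each phrase."""
    return "(" + " OR ".join(f'"{t}"' for t in terms) + ")"


def build_query(woe_terms, risk_terms):
    """Combine the WOE set and the risk-analysis set with AND."""
    return f"{or_group(woe_terms)} AND {or_group(risk_terms)}"


@dataclass
class Record:
    title: str
    published: date
    language: str  # ISO 639-1 code, e.g. "en" or "fr"


def meets_date_and_language(rec, start=date(2010, 1, 1),
                            end=date(2015, 3, 31), languages=("en", "fr")):
    """Keep records published from 2010 through March 2015 in English or French.
    Excluding case studies and editorials is a manual judgment, not modeled here."""
    return start <= rec.published <= end and rec.language in languages


print(build_query(WOE_TERMS, RISK_TERMS))
```

Treating "after March 2015" as "after 31 March 2015" is an assumption of this sketch.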

Figure 1.

Keywords combined to produce the Scopus and PubMed search query.

Sixty-three national and international agencies or organizations performing risk assessment were also consulted to identify relevant guidance (Table S1). Additional documents identified from the reference lists of the selected articles and relevant reports were also reviewed. Titles and abstracts were screened by at least two people. For each selected article, individual authors extracted the relevant information and described the approach within their area of expertise. All the authors of this manuscript then reviewed these descriptions collectively. Critical aspects included the domain and scope of the study, the definition of terms (e.g., WOE), and the approach and methodology for WOE assessment. The latter covered the nature and number of considerations taken into account for the stage or stages of assessment addressed by the approach (Figure S1). For application in assessment planning at ANSES, the authors also considered these descriptions collectively to characterize and rank, in relative terms, the following aspects (Table 1):

Table 1

Ranking considerations for the prescriptive nature, relevance, and feasibility of WOE approaches for ANSES evaluations.

Consideration	Rank	Description
Prescriptive nature	1	No explicit rules
	2	Some methodological elements for assessment and weighting defined but insufficiently detailed for non-expert users
	3	Implementation rules are well defined for most aspects of the WOE assessment
	4	Implementation rules are defined in sufficient detail to permit application by non-expert users
Relevance	1	The specificity of the methodology restricts its use to specialized aspects or applications of WOE assessment for which it was developed
	2	The methodology can be applied for a limited range of aspects or applications in hazard assessment within ANSES
	3	The methodology is applicable to most aspects or applications of a broader range of assessments of hazard within ANSES
	4	The methodology is sufficiently generic to be applicable to most aspects of a broad range of assessments of hazard within ANSES
Feasibility	1	Implementation of the method is resource intensive (complexity high) and requires considerable specialized expertise and/or material resources
	2	Implementation of the method impacts moderately on resources (moderate complexity), requiring some specific training
	3	Implementation of the method impacts minimally on resources and does not require specialized training, expertise and/or material resources
	4	Implementation of the method is not anticipated to impact significantly on timeframe and resources for assessment

Extent of prescriptive nature, which contributes to transparency and reproducibility. This consideration addressed how far the factors used to assess the quality, and subsequent weighting, of studies and bodies of evidence are prescribed. It often reflects the extent of expert-informed experience in developing and applying the approach (i.e., approaches based on extensive application experience are often more prescriptive). Relative ranking was based on how precisely each approach delineated and defined the considerations for implementation. Rankings ranged from "no explicit rules provided" (1) to "implementation rules defined in significant detail, facilitating use by non-experts" (4).

Relevance, i.e., the extent to which an approach could be broadly applied within the types of assessments conducted at ANSES. For example, was the approach specific to specialized components or aspects of WOE consideration (e.g., mechanistic data)? Or was it broadly applicable to all aspects of the assessments of hazard commonly conducted within ANSES? Rankings ranged from "the specificity of the methodology restricts its use to a relatively narrow application for which it was developed" (1) to "the methodology is sufficiently generic to be broadly applicable to most aspects of a broad range of assessments of hazard within ANSES" (4).

Ease of implementation (feasibility), in terms of time and material and human resources. This includes the need for specific and often advanced methodological skills (modeling, statistics, etc.). Relative ranking was based on the complexity of the approach and on the experience, skills, time, and material resources its application requires. Scores ranged from "resource intensive, requiring considerable resources and expertise" (1) to "limited requirement for specialized expertise, material resources and/or time" (4).
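The three ordinal scales of Table 1 can be held in a small lookup, so that any recorded rank is checked against the scale and paired with its meaning. The sketch below assumes short paraphrases of the Table 1 wording. It deliberately keeps the three ranks separate rather than summing them, because they are ordinal judgments on different considerations.

```python
# Short paraphrases of Table 1; rank 1 = least favourable, 4 = most favourable.
RUBRIC = {
    "prescriptive nature": {
        1: "No explicit rules",
        2: "Some elements defined, insufficient for non-expert users",
        3: "Rules well defined for most aspects",
        4: "Rules detailed enough for non-expert users",
    },
    "relevance": {
        1: "Restricted to the specialized use it was developed for",
        2: "Limited range of hazard-assessment applications",
        3: "Most aspects of a broader range of assessments",
        4: "Generic enough for most aspects of a broad range",
    },
    "feasibility": {
        1: "Resource intensive, considerable expertise needed",
        2: "Moderate complexity, some specific training",
        3: "Minimal resource impact, no specialized training",
        4: "No significant impact on timeframe or resources",
    },
}


def describe(scores):
    """Validate a {consideration: rank} mapping and attach the rubric text."""
    if set(scores) != set(RUBRIC):
        raise ValueError(f"expected exactly these considerations: {sorted(RUBRIC)}")
    result = {}
    for consideration, rank in scores.items():
        if rank not in RUBRIC[consideration]:
            raise ValueError(f"{consideration}: rank must be 1-4, got {rank}")
        result[consideration] = (rank, RUBRIC[consideration][rank])
    return result


# Example: the ranks given to GRADE for assessment planning in Table 4.
for name, (rank, text) in describe(
        {"prescriptive nature": 4, "relevance": 3, "feasibility": 3}).items():
    print(f"{name}: {rank} ({text})")
```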

Results

The study selection process is described in Figure 2, following PRISMA (Moher et al. 2009). In all, 643 articles were identified in the Scopus and PubMed searches, 25 relevant reports were obtained from agencies, and 67 documents were found by screening the associated reference lists. This corresponded to 663 documents after the removal of duplicates. We reviewed the titles and abstracts of these 663 documents and excluded 538 that did not refer to WOE in the title or abstract or did not describe a WOE approach. We then reviewed the remaining 125 full-text articles for eligibility and excluded 9 because the reported data were not relevant to the objective of this paper. The remaining 116 documents formed the principal basis for the current review and analysis. Twenty-four relevant approaches were identified in them (Table 2). A previous review of WOE frameworks, Rhomberg et al. (2013), was also identified among the selected documents.
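The counts reported for the selection process are internally consistent. The number of duplicates removed is not stated, but it follows by subtraction. The quick arithmetic check below uses only the figures given in the text.

```python
# Counts reported in the text (Figure 2).
identified = {"Scopus/PubMed": 643, "agency reports": 25, "reference lists": 67}
after_dedup = 663
excluded_on_title_abstract = 538
excluded_on_full_text = 9

retrieved = sum(identified.values())                   # total retrieved
duplicates = retrieved - after_dedup                   # derived, not reported
full_text = after_dedup - excluded_on_title_abstract   # assessed in full text
included = full_text - excluded_on_full_text           # basis of the review

print(retrieved, duplicates, full_text, included)  # 735 72 125 116
```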

Figure 2.

Flow diagram of the study selection process using Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA, Moher et al. 2009).

Table 2

WOE Approaches identified in the literature.

Name	Description	Category	SR included	Form of evaluation^a	PF stages (step of stage)	Reference
AMSTAR	Assessment of syntheses of observational and clinical studies through the scoring (1–4) of 11 aspects	Method	Yes	Scoring	2 (2)	Kung et al. (2010); Pieper et al. (2015); Shea et al. (2007a, 2007b, 2009)
Bayesian inference	Statistical analysis combining expert knowledge (described by a prior probability distribution) with data to estimate a quantity of interest and analyze uncertainty	Method	No	Quantitative	3;4	BioBayes Group (2015); Gosling et al. (2013); Guha et al. (2013); Schleier et al. (2015); Spiegelhalter et al. (2004); Williams et al. (2011)
Bradford Hill	Qualitative consideration of causality (9 aspects) in epidemiological studies	Method	No	Qualitative	2 (3);3	ANSES (2012); Bergman et al. (2015); Guzelian et al. (2005); Hill (1965); Rothman and Greenland (2005); Vinken (2013)
Decision tree	Tool based on a tree-like graph describing options for various decision points	Method	No	Qualitative or quantitative	3	ANSES (2013a, 2013b); FAO/WHO (2001); Khosrovyan et al. (2015); Metcalfe (2005)
Epid-Tox	Grid based on a five-step process to evaluate the quality of epidemiological and toxicological studies, and their intersection, to establish causal inference	Method	No	Qualitative	2 (2);2 (3);3;4	Adami et al. (2011); ECETOC (2009)
FDA	Qualitative evaluation of individual studies in humans and of the total scientific evidence based on study type, quantity of evidence, relevance to the target population, replication of study results and overall consistency	Method	No	Qualitative	2 (1);2 (2);2 (3)	FDA (2009)
GRADE	Assessment of methodological flaws within the component studies, the consistency of results across different studies, the generalizability of research results to the wider patient base, and the effectiveness of treatments	Method	Yes	Scoring	1;2 (2);2 (3);4	Akl et al. (2007); Andrews et al. (2013a, 2013b); Balshem et al. (2011); Berkman et al. (2012); Guyatt et al. (2011a, 2011b, 2011c, 2011d, 2011e, 2011f, 2011g, 2011h, 2011i); HAS (2013); Kho and Brouwers (2012); WHO (2012)
Hope and Clarkson	Weighting and integration of information relating cause and effect to estimate the probability of an adverse outcome for an ecological assessment endpoint	Framework	No	Scoring	1;2 (2);2 (3);3;4	Hope and Clarkson (2014)
Hypothesis-based	Fully expert-dependent assessment for various hypotheses for hazard identification of chemical substances	Method	No	Qualitative	3;4	Bailey et al. (2016); Rhomberg (2015)
IARC	Assessment of the quality of individual studies based on “principles of good practice” without reporting templates. Four categories for classification of combined evidence on toxicology and epidemiology and three for mode of action. Expert dependent	Method	No	Qualitative	2 (2);2 (3);3;4	IARC (2006)
ILSI	Set of qualitative criteria to assess evidence on allergens proposed by the International Life Sciences Institute (ILSI) Europe Food Allergy Task Force	Method	No	Rank ordering	2 (2);2 (3)	Van Bilsen et al. (2011)
INCa	Criteria to assess evidence on nutritional factors and their associated cancer risk	Method	No	Qualitative	2 (3);3;4	INCa (2015)
Klimisch	Scoring of the quality of individual toxicological studies based on limited indicators for reliability, relevance, and adequacy of data	Method	No	Scoring	2 (2)	ECHA (2011); Klimisch et al. (1997); Money et al. (2013); Schneider et al. (2009)
Meta-analysis	Statistical analysis of data collected in separate but similar studies, leading to the estimation of the magnitude of an effect and associated confidence interval	Method	Yes	Quantitative	2 (3)	Chalmers et al. (2002); EFSA (2014); Goodman et al. (2010); Marvier et al. (2007, 2011); Moher et al. (2015); Murad et al. (2014)
Modified Bradford Hill	Comparative analysis for alternative mode of action hypotheses based on rank ordering of a subset of Bradford Hill considerations, taking into account epidemiological, toxicological and mechanistic data	Method	No	Rank ordering	3;4	Boobis et al. (2006, 2008); Meek (2008); Meek et al. (2003, 2014a, 2014b); OECD (2014)
Multi-criteria analysis	Expert-based quantitative judgment of quality of studies and their integration, including sensitivity and uncertainty analysis	Method	No	Quantitative or scoring	2 (2);2 (3);3;4	Hristozov et al. (2014a, 2014b); Linkov et al. (2009, 2011); U.S. EPA (1997, 2003)
Navigation Guide	Synthesis of results for the reproductive and developmental hazards of chemical agents in the research context through 4 steps focused principally on systematic review	Method	Yes	Scoring	2 (1);2 (3);3	Viswanathan et al. (2012); Woodruff and Sutton (2011, 2014)
NRC	Principal focus on systematic review	Framework	Yes	Scoring	1; 2 (1);2 (2);3; 4	NRC (2014)
OHAT	Detailed documentation of components for all stages	Framework	Yes	Scoring	1; 2 (1); 2 (2); 2 (3);3; 4	Howard et al. (2014); OHAT (2015); Rooney et al. (2014); U.S. NTP (2015)
SCENIHR	Considerations to address individual studies in 3 categories for quality and relevance and 3 categories for coherence between studies of similar type with weighting of lines of evidence by utility/coherence	Framework	No	Scoring	2 (1); 2 (2); 2 (3);3; 4	SCENIHR (2012)
SR-Cochrane	Handbook for Systematic Reviews of Interventions. Planning: PICO for question formulation, detailed specification of search strategy, documentation of bias in study selection and presentation of results, their applicability, quality (in 4 categories) and outcome (EPICOT)	Method	Yes	Qualitative	1;2 (1); 2 (2);4	Bilotta et al. (2014); Higgins and Green (2011); Mandrioli and Silbergeld (2016); O'Connor et al. (2011); Schünemann et al. (2011)
SR-EFSA	Detailed planning, process and documentation of systematic review, including PICO, PECO, PIT and PO and selection of studies (modification of SR-Cochrane)	Method	Yes	Qualitative	1;2 (1);4	EFSA (2010)
WCRF/AICR	Classification regarding nutrition and cancer risk relationships. Evaluation of individual studies (epidemiological and mechanistic) based on good practice, meta-analysis of epidemiological studies identified through systematic review, and consideration of mechanistic data in relation to the biological plausibility of human data. Classification of WOE for each nutritional factor in 5 classes	Method	Yes	Qualitative	2 (2); 2 (3);3;4	WCRF/AICR (2014)
Weighted Bradford Hill	Estimation of the probability of causality in epidemiological studies through expert assessment of the extent of supporting data for each of 9 weighted Bradford Hill considerations	Method	No	Quantitative	2 (3);3	Swaen and van Amelsvoort (2009)

Note: AMSTAR, Assessing the Methodological Quality of Systematic Reviews; EFSA, European Food Safety Authority; FDA, U.S. Food and Drug Administration; GRADE, Grading of Recommendations Assessment, Development and Evaluation; IARC, International Agency for Research on Cancer; ILSI, International Life Sciences Institute; INCa, Institut National du Cancer/French National Cancer Institute; NRC, U.S. National Research Council; OHAT, Office of Health Assessment and Translation; PF, Practical Framework; SCENIHR, Scientific Committee on Emerging and Newly Identified Health Risks; SR, Systematic Review; WCRF/AICR, World Cancer Research Fund and American Institute for Cancer Research; WHO, World Health Organization.

Semiquantitative refers to approaches that include scoring and rank ordering of various components, without quantitation.
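Two of the quantitative methods in Table 2 can be made concrete with a short sketch. Meta-analysis pools separate but similar studies into an effect size with a confidence interval. Bayesian inference combines an expert prior with data. The sketch assumes the simplest variants. The first is inverse-variance fixed-effect pooling, which assumes a common true effect; a random-effects model would be needed under heterogeneity. The second is a conjugate normal-normal update. The study estimates (log relative risks) and the prior below are hypothetical.

```python
import math


def fixed_effect(estimates, std_errors):
    """Inverse-variance fixed-effect pooling of study estimates
    (e.g., log relative risks). Returns (pooled, se, approximate 95% CI)."""
    weights = [1.0 / se**2 for se in std_errors]
    total = sum(weights)
    pooled = sum(w * y for w, y in zip(weights, estimates)) / total
    se = math.sqrt(1.0 / total)
    return pooled, se, (pooled - 1.96 * se, pooled + 1.96 * se)


def normal_update(prior_mean, prior_sd, data_mean, data_se):
    """Conjugate normal-normal update: an expert prior on the effect is
    combined with a data-based estimate. Returns (posterior mean, sd)."""
    prior_prec = 1.0 / prior_sd**2
    data_prec = 1.0 / data_se**2
    post_prec = prior_prec + data_prec
    post_mean = (prior_prec * prior_mean + data_prec * data_mean) / post_prec
    return post_mean, math.sqrt(1.0 / post_prec)


# Hypothetical studies: pooled about 0.167 (SE 1/15); posterior mean 0.15
pooled, se, ci = fixed_effect([0.2, 0.1, 0.3], [0.1, 0.1, 0.2])
post_mean, post_sd = normal_update(0.0, 0.2, pooled, se)
print(round(pooled, 3), [round(x, 3) for x in ci], round(post_mean, 3))
```

The sceptical prior centred on no effect pulls the pooled estimate slightly toward zero. The size of that pull is exactly what a WOE assessment would need to document.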

Each of the methods and frameworks cited in Table 2 has been applied in one or more fields (Table 3). A wide range of approaches has been adopted in environmental health, food safety and nutrition, and medical applications. The most commonly adopted approach, based on the number of examples of application in different domains, is the IARC classification. It is followed by the Bradford Hill considerations for assessing causality in epidemiological studies, modified Bradford Hill considerations in mode of action analyses, expert rule-based decision trees, and the systematic reviews proposed by EFSA (SR-EFSA). As expected, the methodologies proposed by the Cochrane Collaboration (SR-Cochrane) have been adopted only within the medical community. In contrast, the Grading of Recommendations, Assessment, Development, and Evaluations (GRADE) guidelines have also been applied in the nutrition and environmental health fields.

Table 3

Areas of application for the 24 WOE approaches screened by the literature review.

Approaches^a	Safety at work	Food microbiology	Food chemistry	Nutrition	Animal feed and health	Environmental health	Crop protection products, biocides and fertilizers	Medical	Ecology-environment
AMSTAR								X
Bayesian inference						X	X
Bradford Hill	X			X		X		X
Decision tree		X		X		X			X
Epid-Tox			X			X		X
FDA			X	X
GRADE				X		X		X
Hope and Clarkson									X
Hypothesis based						X
IARC	X		X	X		X	X	X
ILSI				X
INCa				X				X
Klimisch			X			X	X
Meta-analysis				X		X		X
Modified Bradford Hill	X		X	X		X
Multi-criteria analysis	X					X			X
Navigation Guide						X
NRC			X			X
OHAT						X
SCENIHR						X
SR-Cochrane								X
SR-EFSA		X	X	X	X
WCRF/AICR				X				X
Weighted Bradford Hill						X

Descriptions of approaches and associated references are included in Table 2.

Note: AMSTAR, Assessing the Methodological Quality of Systematic Reviews; EFSA, European Food Safety Authority; FDA, U.S. Food and Drug Administration; GRADE, Grading of Recommendations Assessment, Development and Evaluation; IARC, International Agency for Research on Cancer; ILSI, International Life Sciences Institute; INCa, Institut National du Cancer/French National Cancer Institute; NRC, U.S. National Research Council; OHAT, Office of Health Assessment and Translation; PF, Practical Framework; SCENIHR, Scientific Committee on Emerging and Newly Identified Health Risks; SR, Systematic Review; WCRF/AICR, World Cancer Research Fund and American Institute for Cancer Research.
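Table 3 marks the domains in which each approach has been applied, not the number of application examples. Counting marked domains is therefore only a rough proxy for the ranking given in the text. The sketch below transcribes seven rows of Table 3. The column assignments are read from the tab-separated layout and should be treated as an assumption. It then orders approaches by the number of domains marked.

```python
# Partial transcription of Table 3: domains marked "X" for each approach.
TABLE3 = {
    "IARC": {"Safety at work", "Food chemistry", "Nutrition", "Environmental health",
             "Crop protection products, biocides and fertilizers", "Medical"},
    "Bradford Hill": {"Safety at work", "Nutrition", "Environmental health", "Medical"},
    "Modified Bradford Hill": {"Safety at work", "Food chemistry", "Nutrition",
                               "Environmental health"},
    "Decision tree": {"Food microbiology", "Nutrition", "Environmental health",
                      "Ecology-environment"},
    "SR-EFSA": {"Food microbiology", "Food chemistry", "Nutrition",
                "Animal feed and health"},
    "GRADE": {"Nutrition", "Environmental health", "Medical"},
    "SR-Cochrane": {"Medical"},
}


def rank_by_breadth(table):
    """Order approaches by number of marked domains (ties broken by name)."""
    return sorted(table.items(), key=lambda item: (-len(item[1]), item[0]))


for name, domains in rank_by_breadth(TABLE3):
    print(f"{len(domains)}  {name}")
```

On these rows, IARC spans six domains and four approaches tie at four. This is in line with the text, which also notes that SR-Cochrane is confined to the medical field.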

Frameworks for Assessing the Body of Evidence

Of the 24 WOE approaches identified in Table 2, five addressed more than one stage of systematic data compilation, assessment, and integration. These are Hope and Clarkson (2014), the U.S. National Research Council (NRC 2014), the Office of Health Assessment and Translation (OHAT 2015), the Scientific Committee on Emerging and Newly Identified Health Risks (SCENIHR 2012), and Rhomberg et al. (2013). These five approaches are referred to here as frameworks, and the other 19 as methods. LOE and WOE are defined in the practical framework proposed here based on the definitions identified in the literature review (Table S2). An LOE is a set of relevant items of information of similar type grouped to assess a hypothesis. WOE is the structured synthesis of lines of evidence, possibly of varying quality, to determine the degree of support for hypotheses. The term "strength of evidence" appears in some of the selected documents, but different authors define it differently in contexts related to WOE. Some treat it as a constitutive element of WOE (Suter and Cormier 2011; Linkov et al. 2009), and others as a distinct entity (EFSA 2010). Consequently, no definition is proposed here. The five identified frameworks differ in the number of stages and level of detail (Figure 3). Three frameworks, namely those of NRC, Hope and Clarkson, and OHAT, address planning, scoping, problem formulation, and protocol development. Rhomberg et al. (2013) define causal questions and identify criteria for study selection. All five frameworks distinguish additional steps in establishing LOEs: the identification and selection of studies, an evaluation of their quality (based on specific criteria), and an assessment of LOEs. All five also address weighting and/or integrating one or more LOEs to assess WOE, followed by the drawing of conclusions.
To support conclusions, SCENIHR adds an expression of uncertainty, and Hope and Clarkson estimate ecological risk based on WOE.
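The definitions above map naturally onto a small data model. Each study addresses a hypothesis and has an evidence type and an appraised quality. An LOE groups studies of similar type bearing on the same hypothesis. The field names and quality labels below are hypothetical. The synthesis of LOEs into WOE is method specific (Table 2) and is deliberately not implemented. The sketch only shows the grouping and the quality profile that any synthesis would start from.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Study:
    study_id: str
    hypothesis: str      # e.g., "substance X is a reproductive toxicant"
    evidence_type: str   # e.g., "epidemiological", "toxicological", "mechanistic"
    quality: str         # outcome of the quality appraisal (hypothetical labels)


def build_lines_of_evidence(studies):
    """Group studies of similar type addressing the same hypothesis into LOEs."""
    loes = defaultdict(list)
    for study in studies:
        loes[(study.hypothesis, study.evidence_type)].append(study)
    return dict(loes)


def quality_profile(loes):
    """Count appraised quality levels within each LOE (LOEs may vary in quality)."""
    return {key: Counter(s.quality for s in studies) for key, studies in loes.items()}


studies = [
    Study("s1", "H1", "epidemiological", "high"),
    Study("s2", "H1", "epidemiological", "low"),
    Study("s3", "H1", "toxicological", "high"),
]
print(quality_profile(build_lines_of_evidence(studies)))
```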

Figure 3.

Stages addressed in the five WOE frameworks identified in the literature.

Based on this analysis, and in view of the broad scope of ANSES expert-informed evaluations, a practical framework including four main stages is proposed here (Figure 4). The four stages are as follows: planning the assessment, establishing LOEs, integrating LOEs, and expressing WOE conclusions. For each stage of this framework, the identified methods were considered according to the three aspects introduced above, namely, the extent of their prescriptive nature, relevance, and feasibility.

Figure 4.

Practical framework for weight of evidence assessment.

The aim of formally documenting assessment planning (stage 1) is to increase transparency regarding the focus and methodology selected for the assessment. This first stage has three operational steps:
(1) scoping, i.e., determining the appropriate focus based on the objectives and a preliminary consideration of available data;
(2) formulating the question(s) to be assessed; and
(3) developing the protocol for WOE assessment.
Establishing LOEs (stage 2) also has three operational steps:
(1) identifying and selecting studies;
(2) assessing the quality of the studies; and
(3) analyzing sets of studies of similar type (epidemiological, toxicological, etc.) to establish LOEs.
Stage 3 addresses the integration of data from available LOEs to establish WOE. Its purpose is to determine the degree of support for hypotheses or to estimate quantities of interest. The objective of stage 4 (i.e., the formal expression of conclusions) is an explicit presentation of WOE in a form that best supports decision-making.
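The four stages and their operational steps can be encoded as an ordered checklist. An assessment team could then see which parts of the framework remain undocumented. This is a minimal sketch: the labels follow the text, but representing documentation status as a set of (stage, step) pairs is an illustrative assumption.

```python
# The four stages of the practical framework and their operational steps.
FRAMEWORK = {
    1: ("Planning the assessment",
        ["Scoping", "Formulating the question(s)", "Developing the protocol"]),
    2: ("Establishing LOEs",
        ["Identifying and selecting studies", "Assessing study quality",
         "Analyzing studies of similar type"]),
    3: ("Integrating LOEs to evaluate WOE", []),
    4: ("Presenting conclusions", []),
}


def checklist(framework):
    """Flatten the framework into ordered (stage, step, label) items.
    Stages without operational sub-steps are listed once, with step None."""
    items = []
    for stage in sorted(framework):
        name, steps = framework[stage]
        if not steps:
            items.append((stage, None, name))
        for i, step in enumerate(steps, start=1):
            items.append((stage, i, f"{name}: {step}"))
    return items


def outstanding(framework, documented):
    """Return checklist items whose (stage, step) key is not yet documented."""
    return [item for item in checklist(framework) if item[:2] not in documented]


for stage, step, label in outstanding(FRAMEWORK, {(1, 1), (1, 2), (1, 3)}):
    print(stage, step, label)
```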

Stages of WOE Addressed in the Identified Methods

Each method/framework identified in the literature addresses one or more key stages and steps of the WOE practical framework presented in Figure 4. Most address steps 2 (assessing the quality of the studies) and 3 (analyzing studies of similar type to establish LOEs) of stage 2, stage 3 (integration to establish WOE), and stage 4 (formal expression of conclusions). Few of them consider stage 1 (assessment planning) and step 1 of stage 2 (systematic identification and selection of studies) (see column on stages/steps addressed in Table 2).
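The "PF stages (step of stage)" column of Table 2 uses a compact notation. For example, "2 (2);2 (3);3;4" means steps 2 and 3 of stage 2, plus stages 3 and 4 as a whole. A small parser makes such statements countable across approaches. The sketch transcribes only a few cells from Table 2 and tolerates optional spaces around entries.

```python
import re
from collections import Counter

# One entry: a stage number, optionally followed by a step number in parentheses.
ENTRY = re.compile(r"^\s*(\d+)\s*(?:\((\d+)\))?\s*$")


def parse_pf(cell):
    """Parse a Table 2 'PF stages' cell into (stage, step) pairs;
    step is None when a whole stage is addressed."""
    pairs = []
    for part in cell.split(";"):
        match = ENTRY.match(part)
        if match is None:
            raise ValueError(f"unrecognized entry: {part!r}")
        stage, step = match.groups()
        pairs.append((int(stage), int(step) if step else None))
    return pairs


# A few cells transcribed from Table 2.
CELLS = {
    "AMSTAR": "2 (2)",
    "GRADE": "1;2 (2);2 (3);4",
    "Klimisch": "2 (2)",
    "OHAT": "1; 2 (1); 2 (2); 2 (3);3; 4",
    "SR-EFSA": "1;2 (1);4",
}

tally = Counter(pair for cell in CELLS.values() for pair in parse_pf(cell))
for (stage, step), n in sorted(tally.items(), key=lambda kv: (kv[0][0], kv[0][1] or 0)):
    print(f"stage {stage}" + (f", step {step}" if step else "") + f": {n}")
```

Even on this small subset, the step for quality appraisal (stage 2, step 2) is the most frequently covered. This is consistent with the pattern described above.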

Stage 1. Assessment planning.

Stage 1 is addressed in six approaches: Hope and Clarkson (2014), NRC (2014), OHAT (2015), GRADE (Guyatt et al. 2011a), SR-Cochrane (Higgins and Green 2011), and SR-EFSA (EFSA 2010). Hope and Clarkson, NRC, and OHAT differentiate between the three operational steps as shown in Figure 4.

Step 1. Scoping.

Hope and Clarkson describe the objective of scoping as defining environmental management objectives with stakeholders in neutral, precise, and measurable terms. NRC outlines these objectives as understanding the needs of clients in evaluating chemical products or processes. For OHAT, the aims are identifying participants, evaluating the impact of conducting an evaluation, and identifying the ongoing and related components of the assessment to be developed. For GRADE, scoping is not considered, as this method is devoted to the examination of alternative clinical management strategies or interventions. For problem formulation in environmental risk (including exposure and hazard), Hope and Clarkson suggest developing a conceptual model for each question and sub-question. They describe a conceptual model as a diagram that illustrates the succession of risk hypotheses. The diagram is based on predicted relationships among sources, stressors, exposures, and assessment endpoint responses.
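A conceptual model of this kind can be treated as a directed acyclic graph running from sources through stressors and exposures to assessment endpoints. Each complete source-to-endpoint path is then one risk hypothesis. The sketch below uses a hypothetical model; the node names are illustrative and not taken from Hope and Clarkson. It enumerates the paths by depth-first search and assumes the graph has no cycles.

```python
# Hypothetical conceptual model: edges point from source to stressor,
# stressor to exposure route, and exposure route to assessment endpoint.
MODEL = {
    "effluent discharge": ["metal X in water"],
    "metal X in water": ["gill uptake (fish)", "sediment contact (invertebrates)"],
    "gill uptake (fish)": ["fish reproduction"],
    "sediment contact (invertebrates)": ["benthic community structure"],
}


def risk_hypotheses(model, source):
    """Enumerate every path from a source to a terminal endpoint (assumes no cycles).
    Each path is one risk hypothesis to be examined with lines of evidence."""
    paths = []

    def walk(node, path):
        children = model.get(node, [])
        if not children:          # terminal node: an assessment endpoint
            paths.append(path)
            return
        for child in children:
            walk(child, path + [child])

    walk(source, [source])
    return paths


for path in risk_hypotheses(MODEL, "effluent discharge"):
    print(" -> ".join(path))
```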

Step 2. Formulating the question(s).

NRC formulates the problem based on a matrix outlining the testing strategy, i.e., the nature of the effects to be investigated in specified testing protocols (e.g., in vivo or in vitro). OHAT and GRADE adopt the PECO reporting template (population, exposure, comparator, and outcome). PECO is derived from the PICO elements (patient/problem/population, intervention, comparator, and outcome), which promote well-formulated clinical questions in evidence-based medicine. In addition, NRC and OHAT propose working with a systematic review (SR) specialist but do not specifically outline the requirements for systematic review. The SR-Cochrane method uses PICO reporting templates, whereas OHAT has developed PECOTS by adding the elements of time (T) and the setting of interest (S). SR-EFSA recommends additional templates for assessing the accuracy of a test result and for quantifying a scenario of interest (prevalence, for instance). It has also developed a method for completing the templates based on the literature.
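A structured question template can be expressed as a record type, so that unspecified elements are easy to detect before the protocol is written. The sketch below models PECO and its PECOTS extension with time and setting. The field names and the example question are hypothetical illustrations, not OHAT's own template.

```python
from dataclasses import dataclass, fields
from typing import Optional


@dataclass
class PECO:
    population: str
    exposure: str
    comparator: str
    outcome: str


@dataclass
class PECOTS(PECO):
    timing: Optional[str] = None    # T: time frame of interest
    setting: Optional[str] = None   # S: setting of interest

    def missing(self):
        """Names of template elements left unspecified."""
        return [f.name for f in fields(self) if not getattr(self, f.name)]


question = PECOTS(
    population="pregnant women",
    exposure="bisphenol A (urinary concentration)",
    comparator="lowest exposure quartile",
    outcome="birth weight",
)
print(question.missing())  # ['timing', 'setting']
```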

Step 3. Developing the assessment protocol.

For protocol development in assessment planning, Hope and Clarkson list assessment methods for establishing LOEs, and NRC lists key elements of a systematic review. OHAT has developed a reporting template for a detailed analysis. The template includes scoping elements, the PECO template, and a description of all methods of analysis, from evidence identification to the development and presentation of conclusions. OHAT also recommends the use of text mining, e.g., SWIFT (Howard et al. 2014), to characterize the extent and nature of available data. SR-EFSA recommends specifying three elements in advance: the criteria for study inclusion or exclusion, the methodology adopted for each step of stage 2, and the process for conducting the review (i.e., the composition of the multidisciplinary team, the timetable, and allocated resources). This reduces the risk of bias, thereby limiting later criticism, and increases repeatability. GRADE distinguishes the importance of outcomes in three steps. These are specifying all potential patient-important outcomes, distinguishing between critical and important-but-not-critical outcomes, and judging the balance between the desirable and undesirable effects of an intervention. SR-EFSA and OHAT propose a reporting template for one or more of the substeps. They are therefore considered the most prescriptive approaches and, as such, promote transparency in assessment planning (Table 4). Overall, for assessment planning, GRADE, NRC, OHAT, SR-Cochrane, and SR-EFSA are equally prescriptive. All five are more prescriptive than Hope and Clarkson (Table 4), who do not propose a reporting template. NRC, OHAT, GRADE, SR-Cochrane, and SR-EFSA are considered broadly applicable or relevant, whereas the application of Hope and Clarkson is limited to the estimation of ecological risk.
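GRADE guidance rates the importance of each outcome on a 1-to-9 scale. Ratings of 7 to 9 are treated as critical, 4 to 6 as important but not critical, and 1 to 3 as of limited importance. A minimal sketch of the second GRADE step follows, using hypothetical outcomes and ratings. The third step, weighing desirable against undesirable effects, is a judgment and is not modeled.

```python
def classify_outcome(rating):
    """Map a 1-9 importance rating to a GRADE-style outcome category."""
    if not 1 <= rating <= 9:
        raise ValueError(f"rating must be between 1 and 9, got {rating}")
    if rating >= 7:
        return "critical"
    if rating >= 4:
        return "important but not critical"
    return "of limited importance"


def split_outcomes(ratings):
    """Group outcomes by category, preserving input order within each group."""
    groups = {}
    for outcome, rating in ratings.items():
        groups.setdefault(classify_outcome(rating), []).append(outcome)
    return groups


# Hypothetical panel ratings for illustration only.
print(split_outcomes({"mortality": 9, "hospitalization": 7,
                      "headache": 4, "biomarker change": 2}))
```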

Table 4

Ranking of the methods for planning the assessment.

Approach	Prescriptive nature^a	Relevance^a	Feasibility^a
GRADE	4	3	3
Hope and Clarkson	2	2	3
NRC	4	3	3
OHAT	4	3	3
SR-Cochrane	4	3	3
SR-EFSA	4	3	3

Note: GRADE, Grading of Recommendations Assessment, Development and Evaluation; EFSA, European Food Safety Authority; NRC, U.S. National Research Council; OHAT, Office of Health Assessment and Translation; SR, Systematic Review.

The rankings were assigned to the methods by the authors collectively and reflect relative consideration of each of the three aspects defined and outlined in the Methods and Table 1: the extent of prescriptive nature contributing to transparency and reproducibility, relevance to be broadly applied within ANSES, and ease of implementation in terms of time and material/human resources (feasibility). Each aspect is ranked from 1 (i.e., the least) to 4 (i.e., the most).


Stage 2. Establishing LOEs.

Nineteen methods address the collection and consideration of data to establish LOEs.

Step 1. Identification and selection of studies.

Five methods/frameworks consider Step 1: the Navigation Guide (Woodruff and Sutton 2014), OHAT (2015), SR-Cochrane (Higgins and Green 2011), SR-EFSA (EFSA 2010), and Institut National du Cancer/French National Cancer Institute (INCa 2015). They do so primarily through systematic literature review, the objective of which is to limit bias in the assembly, critical appraisal, and synthesis of all relevant studies. The principles adopted by SR-Cochrane, SR-EFSA, IARC, and OHAT are the use of at least two databases, the selection of studies by two independent reviewers, and the specification of the study selection criteria and data extraction format prior to the review. These approaches are considered prescriptive and relevant, but the requirement for considerable human resources makes them less feasible (Table 5). INCa is considered prescriptive and feasible; however, its relevance is limited because it considers only meta-analyses in establishing LOEs.
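Selection of studies by two independent reviewers is often accompanied by a measure of their agreement. The sketch below uses Cohen's kappa, a common agreement statistic that the reviewed methods are not stated here to prescribe, and lists the records on which the reviewers disagree so that they can be resolved by discussion or by a third reviewer. The function name and the include/exclude encoding (`True`/`False`) are assumptions.

```python
from collections import Counter


def screening_agreement(reviewer_a: dict[str, bool],
                        reviewer_b: dict[str, bool]) -> tuple[float, list[str]]:
    """Cohen's kappa for include/exclude decisions, plus the records needing resolution."""
    shared = sorted(set(reviewer_a) & set(reviewer_b))
    if not shared:
        raise ValueError("no records were screened by both reviewers")
    n = len(shared)
    # Observed agreement: share of records with identical decisions.
    p_observed = sum(reviewer_a[r] == reviewer_b[r] for r in shared) / n
    # Chance agreement: product of each reviewer's marginal rates, summed over decisions.
    counts_a = Counter(reviewer_a[r] for r in shared)
    counts_b = Counter(reviewer_b[r] for r in shared)
    p_expected = sum(counts_a[d] * counts_b[d] for d in (True, False)) / n**2
    if p_expected == 1.0:
        kappa = 1.0  # both reviewers made the same single decision throughout
    else:
        kappa = (p_observed - p_expected) / (1.0 - p_expected)
    disagreements = [r for r in shared if reviewer_a[r] != reviewer_b[r]]
    return kappa, disagreements
```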

Table 5

Ranking of the methods for establishing lines of evidence.

Approach	Identifying and selecting studies^a			Assessing the quality of the studies^a			Analyzing a set of studies of similar type^a
Approach	PN	REL	FEA	PN	REL	FEA	PN	REL	FEA
AMSTAR	NA	NA	NA	4	3	4	NA	NA	NA
Bradford Hill	NA	NA	NA	NA	NA	NA	2	4	4
Epid-Tox	NA	NA	NA	2	4	4	2	4	3
FDA	NA	NA	NA	3	4	4	2	3	3
GRADE	NA	NA	NA	4	3	3	2	3	4
Hope and Clarkson	NA	NA	NA	2	3	3	2	3	3
IARC	NA	NA	NA	2	4	4	2	3	4
ILSI	NA	NA	NA	2	3	3	3	2	3
INCa	3	2	4	3	2	4	NA	NA	NA
Klimisch	NA	NA	NA	2	3	4	NA	NA	NA
Meta-analysis	NA	NA	NA	NA	NA	NA	4	4	1
Modified Bradford Hill	NA	NA	NA	NA	NA	NA	3	3	3
Multi-criteria analysis	NA	NA	NA	2	4	3	2	4	3
Navigation Guide	1	3	2	1	3	4	1	3	3
OHAT	3	3	2	3	3	4	2	3	3
SR-Cochrane	3	3	2	2	4	4	NA	NA	NA
SR-EFSA	3	3	2	NA	NA	NA	NA	NA	NA
SCENIHR	NA	NA	NA	2	3	4	1	3	4
WCRF/AICR	NA	NA	NA	2	4	4	4	4	2
Weighted Bradford Hill	NA	NA	NA	NA	NA	NA	3	3	3

Note: AMSTAR, Assessing the Methodological Quality of Systematic Reviews; EFSA, European Food Safety Authority; FDA, U.S. Food and Drug Administration; FEA, Feasibility; GRADE, Grading of Recommendations Assessment, Development and Evaluation; IARC, International Agency for Research on Cancer; ILSI, International Life Sciences Institute; INCa, Institut National du Cancer/French National Cancer Institute; NA, Not applicable because the corresponding step was not addressed by the approach; NRC, U.S. National Research Council; OHAT, Office of Health Assessment and Translation; PF, Practical Framework; PN, Prescriptive nature; REL, Relevance; SCENIHR, Scientific Committee on Emerging and Newly Identified Health Risks; SR, Systematic Review; WCRF/AICR, World Cancer Research Fund and American Institute for Cancer Research.

The rankings were assigned to the methods by the authors collectively and reflect relative consideration of each of the three aspects defined and outlined in the Methods and Table 1: the extent of prescriptive nature contributing to transparency and reproducibility, relevance to be broadly applied within ANSES, and ease of implementation in terms of time and material/human resources (feasibility). Each aspect is ranked from 1 (i.e., the least) to 4 (i.e., the most).

Step 2. Assessing the quality of the studies.

The quality of relevant studies considered in establishing LOEs is usually assessed according to the degree of transparency in documenting the methodology, analysis, and results, and the degree to which potential methodological bias, such as information and selection bias, is considered. Alternatively, or in addition, quality is assessed by the extent and nature of the scientific data (e.g., whether supporting data are direct or indirect). Two types of assessment methods are presented in the literature: those with and those without quantitative scoring. IARC (2006), WCRF/AICR (2014), SR-Cochrane (Higgins and Green 2011), and FDA (2009) are based on a qualitative evaluation of studies, i.e., without scoring. The evaluation criteria relate to good research practices in each area (epidemiology, toxicology, etc.). Epid-Tox (Adami et al. 2011) adopts criteria proposed by the U.S. EPA (2001) for evaluating toxicological studies and those of the European Centre for Ecotoxicology and Toxicology of Chemicals (ECETOC 2009) for assessing epidemiological studies, classifying studies in three categories: “reliable without restriction” (minimum limitations), “reliable with restrictions” (moderate limitations), and “not reliable” (limitations sufficient for exclusion from the WOE assessment). The extent of prescription of the qualitative methods is generally low, because they are expert-dependent and vary in how transparently the considerations underlying the resulting judgments are documented. Their simplicity makes them feasible, and they are broadly applicable or relevant (Table 5). Multicriteria decision analysis (Linkov et al. 2011), Hope and Clarkson, GRADE, OHAT, International Life Sciences Institute (ILSI) (Van Bilsen et al. 2011), and Klimisch (Klimisch et al. 1997) attribute scores to individual studies, taking into account their quality. The tools proposed by GRADE and OHAT enable the classification of study quality on a qualitative scale based on a set of questions. 
For instance, the OHAT risk-of-bias tool is composed of eleven questions related to good research practices for various types of studies. The answer to each question is expressed in terms of risk of bias (low, probably low, probably high, or high). Multicriteria decision analysis and Hope and Clarkson score individual studies on a quantitative scale according to specific criteria. For example, Hope and Clarkson address five criteria: study quality, use of standard methods to design the study, site specificity, spatial representativeness, and temporal representativeness. Each criterion is scored in binary fashion for each study (a value of 1 indicating that the criterion is met) for each of five LOEs, each addressing a specific aspect: endpoint/attribute association, exposure/response function, sensitivity to stressor, specificity to stressor, and quantification of response. The weighting for each LOE is then calculated by combining the criteria scores. None of these methods prescribes a quantitative threshold value for exclusion. Criteria specified by the Klimisch, Hope and Clarkson, and multicriteria decision analysis methods are less specific (i.e., less prescriptive) than those of GRADE and OHAT. Each of the methods considered in this section (multicriteria decision analysis, Hope and Clarkson, GRADE, OHAT, ILSI, and Klimisch) is considered relevant and feasible (Table 5). Assessing the Methodological Quality of Systematic Reviews (AMSTAR) and its revised version, R-AMSTAR, are the only methods considered here that address the quality of a synthesis of studies. The methodology is relatively prescriptive, delineating a questionnaire of eleven scored items, which contributes to the transparency and reproducibility of reviews. Although it is relevant for assessing syntheses of both clinical trials and observational studies, the method addresses only one component of one stage of the developed practical framework. 
The method is also considered feasible, requiring limited time to develop the score.
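The Hope and Clarkson scoring step described above can be sketched as follows. The five criterion names and five LOE names come from the text; the rule used to combine criterion scores into an LOE weight (the fraction of criteria met) is a hypothetical stand-in, because the text states only that the scores are combined. Consistent with the text, no exclusion threshold is applied.

```python
CRITERIA = (
    "study_quality",
    "standard_methods",
    "site_specificity",
    "spatial_representativeness",
    "temporal_representativeness",
)
LINES_OF_EVIDENCE = (
    "endpoint_attribute_association",
    "exposure_response_function",
    "sensitivity_to_stressor",
    "specificity_to_stressor",
    "quantification_of_response",
)


def loe_weight(criterion_scores: dict[str, int]) -> float:
    """Combine binary criterion scores (1 = criterion met) into a weight in [0, 1].

    Hypothetical rule: the weight is the fraction of the five criteria that are met.
    """
    missing = set(CRITERIA) - set(criterion_scores)
    if missing:
        raise ValueError(f"missing criteria: {sorted(missing)}")
    if any(criterion_scores[c] not in (0, 1) for c in CRITERIA):
        raise ValueError("criterion scores must be 0 or 1")
    return sum(criterion_scores[c] for c in CRITERIA) / len(CRITERIA)


def weigh_study(study: dict[str, dict[str, int]]) -> dict[str, float]:
    """Weight for each LOE a study contributes to; no exclusion threshold is applied."""
    unknown = set(study) - set(LINES_OF_EVIDENCE)
    if unknown:
        raise ValueError(f"unknown lines of evidence: {sorted(unknown)}")
    return {loe: loe_weight(scores) for loe, scores in study.items()}
```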

Step 3. Analyzing a set of studies of similar type.

Fourteen methods/frameworks include considerations for analyzing studies of similar type to establish LOEs. Meta-analysis and all methods based on meta-analyses of epidemiological studies are considered prescriptive, as specified elements of the considered studies must be sufficiently similar to enable their statistical analysis (Chalmers et al. 2002). These include WCRF/AICR, which systematically performs meta-analyses in all its nutrition–cancer evaluations; IARC, which commissions specific meta-analyses for selected topics, such as asbestos and ovarian cancer; and INCa, which performs systematic reviews of published meta-analyses on nutrition and cancer risk. Prescribed methods are transparent and reproducible and enable a quantitative synthesis of studies of similar type to estimate quantities of interest and to test hypotheses (Table 5). However, these methods are considered less feasible, as implementation is time-consuming and may require specialized computational and/or statistical skills. Multicriteria decision analysis requires selected experts to define specific considerations and their relative weighting. Although relevant to a broad range of applications, it is neither prescriptive nor reproducible, because its outcome is highly sensitive to the judgment of the participating experts, whose selection criteria are often not specified or well described. The other methods/frameworks considered here, namely, Bradford Hill, IARC (for some other topics), Epid-Tox, FDA, GRADE, Hope and Clarkson, OHAT, and SCENIHR, are based on qualitative or semiquantitative approaches to establishing LOEs. Most of these methods/frameworks assign a level of evidence, confidence, utility, or consistency according to predefined scales. Bradford Hill considerations (or modifications thereof) continue to be applied when assessing the causality of associations in epidemiological studies (e.g., IARC, Epid-Tox). 
These approaches are thus considered relevant and feasible, except GRADE, which is restricted principally to randomized controlled trials and meta-analysis. These approaches are not very prescriptive (or transparent), with results being highly sensitive to expert input; as such, they have a low degree of reproducibility (Table 5).
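To make the meta-analytic synthesis mentioned above concrete, the sketch below pools study estimates with the standard inverse-variance fixed-effect model and reports Cochran's Q and the I² heterogeneity statistic. The fixed-effect model, the 1.96 multiplier for a 95% confidence interval, and the assumption that estimates are on a log scale (e.g., log relative risks) are illustrative choices; none of the reviewed frameworks is stated here to prescribe this particular model.

```python
import math


def fixed_effect_meta(estimates: list[float], std_errors: list[float]) -> dict[str, float]:
    """Inverse-variance fixed-effect pooling with Cochran's Q and I^2."""
    if not estimates or len(estimates) != len(std_errors):
        raise ValueError("need one standard error per estimate")
    weights = [1.0 / se**2 for se in std_errors]
    total_weight = sum(weights)
    pooled = sum(w * y for w, y in zip(weights, estimates)) / total_weight
    pooled_se = math.sqrt(1.0 / total_weight)
    # Cochran's Q: weighted squared deviations of the studies from the pooled estimate.
    q = sum(w * (y - pooled) ** 2 for w, y in zip(weights, estimates))
    df = len(estimates) - 1
    # I^2: share of total variability attributed to between-study heterogeneity.
    i_squared = max(0.0, (q - df) / q) if q > 0 else 0.0
    return {
        "pooled": pooled,
        "se": pooled_se,
        "ci_low": pooled - 1.96 * pooled_se,
        "ci_high": pooled + 1.96 * pooled_se,
        "q": q,
        "i_squared": i_squared,
    }
```

When I² is large, a random-effects model (e.g., DerSimonian and Laird) would usually be preferred; pooled log relative risks can be back-transformed with `math.exp`.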

Stage 3. Integrating LOEs.

Nineteen methods/frameworks address the integration of LOEs to establish WOE. One category of methods used to integrate LOEs relies on statistical techniques. Bayesian inference (BioBayes Group 2015) is highly relevant to combining experimental data and expert opinion, but it is rather complex to implement. Although it is prescriptive and relevant, this method is therefore less feasible, owing to the complexity both of eliciting expert knowledge and of the statistical methodology needed to combine experimental data with expert opinion (Table 6).
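A minimal sketch of the Bayesian idea, under strong simplifying assumptions: an expert-elicited prior for an effect (e.g., a log relative risk) is expressed as a normal distribution and combined with a normally distributed experimental estimate through the conjugate normal-normal update. The function names and the normal forms are assumptions chosen for brevity; real applications typically require formal elicitation and more general models fitted numerically.

```python
import math


def normal_update(prior_mean: float, prior_sd: float,
                  data_mean: float, data_se: float) -> tuple[float, float]:
    """Conjugate normal-normal update: precision-weighted average of prior and data."""
    if prior_sd <= 0 or data_se <= 0:
        raise ValueError("standard deviations must be positive")
    prior_precision = 1.0 / prior_sd**2
    data_precision = 1.0 / data_se**2
    post_var = 1.0 / (prior_precision + data_precision)
    post_mean = post_var * (prior_precision * prior_mean + data_precision * data_mean)
    return post_mean, math.sqrt(post_var)


def prob_effect_positive(mean: float, sd: float) -> float:
    """Probability that a normally distributed effect exceeds zero (normal CDF via erf)."""
    return 0.5 * (1.0 + math.erf(mean / (sd * math.sqrt(2.0))))
```

The posterior mean lies between the expert prior and the experimental estimate, pulled toward whichever is more precise; this is the sense in which expert opinion and data are combined.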

Table 6

Ranking of the methods for integrating lines of evidence.

Approach	Prescriptive nature^a	Relevance^a	Feasibility^a
Bayesian inference	3	4	2
Bradford Hill	2	4	4
Decision tree	1	3	3
Epid-Tox	2	4	3
Hope and Clarkson	3	3	3
Hypothesis based	2	3	3
IARC	3	3	4
INCa	3	3	4
Multi-criteria analysis	2	4	3
Modified Bradford Hill	3	3	3
Navigation Guide	1	3	3
OHAT	3	3	4
SCENIHR	2	3	4
WCRF/AICR	3	3	4
Weighted Bradford Hill	3	4	4

Note: IARC, International Agency for Research on Cancer; INCa, Institut National du Cancer/French National Cancer Institute; OHAT, Office of Health Assessment and Translation; SCENIHR, Scientific Committee on Emerging and Newly Identified Health Risks; WCRF/AICR, World Cancer Research Fund and American Institute for Cancer Research.

The rankings were assigned to the methods by the authors collectively and reflect relative consideration of each of the three aspects defined and outlined in the Methods and Table 1: the extent of prescriptive nature contributing to transparency and reproducibility, relevance to be broadly applied within ANSES, and ease of implementation in terms of time and material/human resources (feasibility). Each aspect is ranked from 1 (i.e., the least) to 4 (i.e., the most).

A second category of approaches for integrating LOEs includes semiquantitative methods, i.e., modified Bradford Hill (Meek et al. 2014b) and Hope and Clarkson (2014), and qualitative approaches, i.e., IARC (2006), WCRF/AICR (2014), OHAT (2015), hypothesis-based (Rhomberg 2015), Epid-Tox (Adami et al. 2011), INCa (2015), and SCENIHR (2012). These methods are relevant and feasible, and their extent of prescription varies with the nature of the expert-informed experience upon which they draw (i.e., where there is greater experience, the considerations to be taken into account in integration are often more precisely delineated, drawing on a larger number of documented examples). For example, assessing causality in epidemiological studies based on Bradford Hill considerations is commonly quite subjective, which limits the reproducibility of the evaluation (i.e., the results vary considerably depending on the experts involved). 
In the modified Bradford Hill approach, as a basis for increasing the consistency and reproducibility of mode of action analyses, selected considerations have been modified for the specific application and precisely defined and rank ordered (i.e., weighted) by their relative importance, taking into account acquired experience. Examples of the types of datasets (integrating epidemiological, toxicological and mechanistic data) associated with higher or lower confidence are also provided. Of the qualitative methods, OHAT is the most prescriptive, drawing upon a number of previously documented approaches in clinical medicine. The quality of individual studies (Step 2 of Stage 2) is evaluated based on responses to up to 15 questions (depending on study type) to assess the risk of bias. In Stage 3, preliminary confidence scores developed on this basis are either downgraded through the assessment of 5 properties of the body of evidence (risk of bias, unexplained inconsistency, indirectness, imprecision, and publication bias) or upgraded based on the consideration of another 4 properties (large magnitude of effect, dose response, residual confounding, and cross-species/population/study consistency). A comparison of OHAT and IARC, through a feasibility study of their application in an ANSES assessment of airborne particulates, indicated that the more prescriptive nature of OHAT led to greater ease of application, consistency, and reproducibility (Table 6). A third category includes the decision tree method and multicriteria decision analysis. Multicriteria decision analysis is relevant when combining any type of data (qualitative or quantitative). However, considerable expert knowledge is required for its implementation to identify criteria and their associated weights (i.e., limited feasibility), and the results are highly expert-dependent. 
For the decision tree method, classification rules are expert-derived and based on acquired experience, taking into consideration diverse types of information, such as experimental studies, observations, and model outputs. Although feasible and relevant, decision trees are less prescriptive because there are no associated evaluation rules (Table 6).
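The OHAT-style adjustment of confidence described above can be sketched as a walk along an ordinal scale. The four confidence levels (very low to high) are those used in OHAT confidence ratings, and the factor names follow the text; the rule that each flagged factor moves confidence by exactly one level, clamped to the ends of the scale, is a simplifying assumption, since in practice the size of each adjustment is a documented expert judgment.

```python
LEVELS = ["very low", "low", "moderate", "high"]

DOWNGRADE_FACTORS = {
    "risk of bias",
    "unexplained inconsistency",
    "indirectness",
    "imprecision",
    "publication bias",
}
UPGRADE_FACTORS = {
    "large magnitude of effect",
    "dose response",
    "residual confounding",
    "cross-species/population/study consistency",
}


def rate_confidence(initial: str, downgrades: set[str], upgrades: set[str]) -> str:
    """Adjust an initial confidence level by one step per flagged factor (assumed rule)."""
    unknown = (downgrades - DOWNGRADE_FACTORS) | (upgrades - UPGRADE_FACTORS)
    if unknown:
        raise ValueError(f"unrecognized factors: {sorted(unknown)}")
    if initial not in LEVELS:
        raise ValueError(f"initial confidence must be one of {LEVELS}")
    index = LEVELS.index(initial) - len(downgrades) + len(upgrades)
    # Clamp to the ends of the scale: confidence cannot go below "very low" or above "high".
    return LEVELS[max(0, min(index, len(LEVELS) - 1))]
```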

Stage 4. Expressing WOE conclusions.

Thirteen of the reviewed methods/frameworks address the expression of WOE conclusions. Most use four classes, with an additional class to indicate that the available data preclude evaluation. Examples of classification in methods/frameworks are presented in Table 7.

Table 7

Example of weight of evidence classifications.

Method/framework	Reference	Number of Classes	Class title
Bayesian Inference	Schleier et al. (2015)	NA	NA
Epid-Tox	Adami et al. (2011)	4	Likely, Uncertain, Uncertain but plausible, Unlikely (Used to qualify the causal relationship between the environmental factor and the disease condition)
GRADE	Andrews et al. (2013b)	4	Strong Against, Weak Against, Weak For, Strong For
Hope and Clarkson	Hope and Clarkson (2014)	5	Weak, Not indicated, Not indicated, Not indicated, Strong
IARC	IARC (2006)	5	Group 1: The agent is carcinogenic to humans Group 2A: The agent is probably carcinogenic to humans Group 2B: The agent is possibly carcinogenic to humans Group 3: The agent is not classifiable as to its carcinogenicity to humans Group 4: The agent is probably not carcinogenic to humans
Modified Bradford Hill	Meek et al. (2014a); OECD (2014)	3	Weak, Moderate, Strong
Multi-criteria analysis	Linkov et al. (2011)	6	Do nothing, Institutional control, Clay capping, Mechanical dredging, Hydraulic dredging, Hot spot dredging
NRC	NRC (2014)	5	Carcinogenic to humans, Likely to be carcinogenic to humans, Suggestive evidence of carcinogenic potential, Inadequate information to assess carcinogenic potential, Not likely to be carcinogenic to humans
OHAT	OHAT (2015)	5	(1) Known to be a hazard to humans, (2) Presumed to be a hazard to humans, (3) Suspected to be a hazard to humans, (4) Not classifiable as a hazard to humans, or (5) Not identified as a hazard to humans
SR-Cochrane	Higgins and Green (2011)	4	Very low, Low, Moderate, Strong
SR-EFSA	EFSA (2010)	NA	NA
SCENIHR	SCENIHR (2012)	5	Weighting not possible, Uncertain, Weak, Moderate, Strong
WCRF/AICR	WCRF/AICR (2014)	5	Convincing/Probable/Limited - suggestive/Limited - no conclusion/Substantial effect on risk unlikely

Note: EFSA, European Food Safety Authority; GRADE, Grading of Recommendations Assessment, Development and Evaluation; NA, Not applicable because the corresponding step was not addressed by the approach; NRC, U.S. National Research Council; OHAT, Office of Health Assessment and Translation; SCENIHR, Scientific Committee on Emerging and Newly Identified Health Risks; SR, Systematic Review; WCRF/AICR, World Cancer Research Fund and American Institute for Cancer Research.

Conclusions are illustrated or presented in different formats. OHAT presents intermediate results in the form of graphs. NRC conducts an uncertainty analysis on WOE. SCENIHR expresses uncertainty in the WOE analysis in five classes (i.e., certain, probable, confident, possible, and uncertain). For multicriteria decision analysis, Hristozov et al. (2014a, 2014b) conducted a quantitative uncertainty analysis on data and expert judgments, whereas Linkov et al. (2011) conducted a sensitivity analysis on weightings and some input data. Although these methods do not necessarily increase consistency, because they depend mostly on varying expert input and their decision rules are often only loosely prescribed, they promote transparency in communicating the basis for the conclusion. With regard to communication of the outcome, SR-EFSA and SR-Cochrane specify the topics to be addressed in the discussion and conclusions. 
SR-Cochrane relies on EPICOT (i.e., the PICO structure supplemented with the current state of the evidence and the date of the recommendation) to identify research needs and priorities, whereas GRADE structures the conclusions according to PICO; both structures are addressed initially in problem formulation. PICO and EPICOT thus support cohesive consideration of communication from the outset and throughout the assessment. All the examples of classifications reviewed here (Table 7) are considered relevant and feasible for expressing conclusions of WOE analysis, with varying degrees of prescription (Table 8).
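The sensitivity analysis on weightings mentioned above can be illustrated for a weighted-sum multicriteria model. In this sketch the alternatives, criteria, scores, and weights are invented placeholders, the weighted sum is only one of several possible aggregation rules, and the perturbation (doubling one criterion's weight at a time) is an arbitrary choice; the sketch does not reproduce the analysis of Linkov et al. (2011).

```python
def weighted_scores(scores: dict[str, dict[str, float]],
                    weights: dict[str, float]) -> dict[str, float]:
    """Normalized weighted sum of criterion scores for each alternative."""
    total = sum(weights.values())
    return {alt: sum(weights[c] * crit[c] for c in weights) / total
            for alt, crit in scores.items()}


def top_alternative(scores: dict[str, dict[str, float]],
                    weights: dict[str, float]) -> str:
    """Alternative with the highest weighted score."""
    ranked = weighted_scores(scores, weights)
    return max(ranked, key=ranked.get)


def weight_sensitivity(scores: dict[str, dict[str, float]],
                       weights: dict[str, float],
                       factor: float = 2.0) -> dict[str, bool]:
    """For each criterion, scale its weight by `factor`; report whether the top alternative changes."""
    baseline = top_alternative(scores, weights)
    changed = {}
    for criterion in weights:
        perturbed = dict(weights)
        perturbed[criterion] *= factor
        changed[criterion] = top_alternative(scores, perturbed) != baseline
    return changed
```

A criterion flagged `True` is one whose weighting alone can overturn the conclusion, which is the kind of dependence on expert input that such an analysis makes transparent.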

Table 8

Ranking of the methods for expressing weight of evidence conclusions.

Approach	Prescriptive nature^a	Relevance^a	Feasibility^a
Bayesian inference	3	4	2
Epid-Tox	3	4	4
GRADE	3	4	4
Hope and Clarkson	3	3	3
IARC	3	4	4
Modified Bradford Hill	3	2	3
Multi-criteria analysis	3	3	3
NRC	3	4	3
OHAT	3	4	4
SR-Cochrane	4	4	3
SR-EFSA	4	4	3
SCENIHR	3	4	3
WCRF/AICR	4	3	4

Note: EFSA, European Food Safety Authority; GRADE, Grading of Recommendations Assessment, Development and Evaluation; IARC, International Agency for Research on Cancer; NRC, U.S. National Research Council; OHAT, Office of Health Assessment and Translation; SCENIHR, Scientific Committee on Emerging and Newly Identified Health Risks; SR, Systematic Review.

The rankings were assigned to the methods by the authors collectively and reflect relative consideration of each of the three aspects defined and outlined in the Methods and Table 1: the extent of prescriptive nature contributing to transparency and reproducibility, relevance to be broadly applied within ANSES, and ease of implementation in terms of time and material/human resources (feasibility). Each aspect is ranked from 1 (i.e., the least) to 4 (i.e., the most).

Based on the ratings presented in Tables 4, 5, 6, and 8, several of the methods performed well for all three criteria considered, namely OHAT, modified Bradford Hill, AMSTAR, and WCRF/AICR. Systematic reviews (i.e., SR-Cochrane, SR-EFSA) and meta-analysis methods are considered prescriptive and relevant but less feasible.
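As an illustration of how the ratings in Tables 4, 5, 6, and 8 can be screened side by side, the sketch below transcribes them for three methods and reports the mean and minimum of each aspect across the stages a method addresses (stages rated NA are omitted). The authors did not combine ratings numerically; averaging ordinal ranks is a simplification chosen here for illustration only, and the function name `summarize_ratings` is hypothetical.

```python
from statistics import mean

ASPECTS = ("prescriptive nature", "relevance", "feasibility")

# (prescriptive nature, relevance, feasibility) transcribed from the tables.
RATINGS = {
    # Table 4; Table 5 (identifying, quality, similar type); Table 6; Table 8
    "OHAT": [(4, 3, 3), (3, 3, 2), (3, 3, 4), (2, 3, 3), (3, 3, 4), (3, 4, 4)],
    # Table 4; Table 5 (identifying, quality); Table 8
    "SR-Cochrane": [(4, 3, 3), (3, 3, 2), (2, 4, 4), (4, 4, 3)],
    # Table 5 (similar type)
    "Meta-analysis": [(4, 4, 1)],
}


def summarize_ratings(method: str) -> dict[str, dict[str, float]]:
    """Mean (rounded to 2 decimals) and minimum rating per aspect for one method."""
    rows = RATINGS[method]
    return {
        aspect: {"mean": round(mean(r[i] for r in rows), 2),
                 "min": min(r[i] for r in rows)}
        for i, aspect in enumerate(ASPECTS)
    }
```

Reporting the minimum alongside the mean keeps a single low-feasibility stage (for example, study identification in systematic reviews) visible rather than averaged away.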

Discussion

The results of the review described here have illustrated that a wide range of methods is applied when assessing WOE in hazard identification, most notably and broadly in the environmental field (i.e., to assess effects on human and ecological health in the general environment). Developing and documenting the three assessment planning steps (i.e., scoping, question formulation, and the assessment protocol) has contributed to the efficient and cohesive consideration of priority areas and their transparent communication in assessing WOE. Assessment protocols are designed in light of appropriate and available resources, taking into account urgency, potential public and environmental health impacts, available data, societal issues, and the level of acceptable uncertainty. The relative resourcing of the various stages of WOE assessment (i.e., how complex the approach is at each stage) should also be considered, commensurate with their likely impact on the outcome (e.g., the more direct impact of a systematic assessment of integration based on prescriptive approaches at stage 3 versus a systematic review of the literature at stage 1). Depending on the issues addressed, the existing reporting templates reviewed here (PICO, PECO, etc.) have contributed to, but have not fully delineated, the nature of the required documentation. In addition, although conceptual models normally address risk resulting from identified pathways of exposure, a similar graphical representation of the envisaged steps in assessing WOE in hazard identification may facilitate assessment planning, as part of formal planning in the iterative definition of areas of focus and critical questions and subquestions (U.S. EPA 1998, 2014). 
The development of reporting templates for delineating protocols in assessment planning would facilitate greater transparency, and potentially consistency, in the selection of specific approaches to weight of evidence assessment, depending on available resources. Aspects to be documented in assessment protocols include the type of literature review (namely, a formal systematic review or an in-depth review taking into account, for example, the considerations proposed by EFSA 2010). Protocols for assessing the quantity and quality of available evidence, including sources and potential confidentiality, and for integrating conflicting results should be specified, as well as the resources needed to carry out the review. The assessment protocol should also specify criteria for the inclusion and exclusion of relevant data, based on consideration of quality and weighting, for integrating studies of a similar type. The protocol for establishing and integrating LOEs should also be specified, along with the estimated resources for conducting the assessment. Developing and completing prerequisite reporting templates for the assessment protocol would improve the transparency of the rationale for selecting methods in WOE assessment, such as partial or full/directed systematic reviews, meta-analyses, and Bayesian inference. Our analysis of the frameworks and methods (namely, of the extent of their prescriptive nature, relevance, and feasibility) found that preferred methods are often the least feasible (i.e., the most complex, requiring, for example, specialized expertise), given limited resources (e.g., lack of expertise or time). This finding underscores the need for transparent, easily adaptable, and broadly applicable communicative methods that draw on collective expertise. 
Given limited resources, the application of the more complex approaches for which feasibility was judged low in the current study (e.g., Bayesian analysis and meta-analysis) is expected to remain limited, based on careful consideration of the abovementioned factors, including the importance of the question at hand, urgency, and available resources. However, implementation of these complex approaches can lead to greater efficiency in public health protection through a more systematic allocation of resources than is currently made. The availability of a documented assessment protocol addressing delineated considerations in reporting templates should also enhance common understanding of (sometimes limited) objectives and facilitate early input to modify the selection of appropriate methods and the allocation of associated resources. The results of this review have also indicated that the principles of the limited range of methods identified as relevant to what are potentially the most influential stages of WOE assessment, namely the later steps of stage 2 (integration within an LOE) and stage 3 (integration of LOEs), are similar and relate essentially to expert-informed weighting of components. These methods range from qualitative to semiquantitative to fully quantitative. Expert-informed experience is derived from a formal analysis of previous examples in defining relevant considerations and their relative weighting. This analysis is distinct from the expert judgment of an individual or group, for which relevant criteria and weightings are often not well specified. Bradford Hill considerations have figured prominently in this integration, but their application has varied in its prescriptive nature and field of application (e.g., epidemiological studies, mode of action, or integration of epidemiological and toxicological data based on the consideration of mode of action). 
The extent to which these methods have been prescribed, taking into account previous expert-informed experience, contributes most to their consistency and reproducibility. The approaches identified have been based on qualitative, semiquantitative, or quantitative techniques to establish LOEs and WOE, consistent with the WOE classification system proposed by Linkov et al. (2009). Although quantitative methods are more rigorous (i.e., prescriptive), their implementation (stages 2 and 3) requires specific knowledge of elicitation and statistical methodology. In contrast, purely qualitative methods for establishing LOEs, such as Bradford Hill considerations in assessing causality in epidemiological studies, require fewer resources to implement. However, their transparency is limited, often leading different groups to different conclusions for reasons that are unclear. Semiquantitative, more prescriptive methods, such as OHAT and modified Bradford Hill, therefore offer a valuable intermediate option that conserves resources while increasing the transparency and consistency of assessments. The delineation of conclusions in defined classifications also contributes to transparency. The wording of these classifications requires careful consideration to avoid, as far as possible, the misinterpretation that higher classifications imply greater hazard; rather, they indicate a greater preponderance of evidence. Brief, plain-language descriptions of the nature and extent of evidence, and graphical illustrations, may be preferred over less clear descriptors such as “probably,” “possibly,” and “potentially.” The results of this review have also indicated that methods have been broadly applied in some application fields, such as environmental health or human food and nutrition (cf. Table 3). 
However, the apparent lack of application of certain methods in some fields may reflect specific assessment needs or the restricted date range of the literature review. For example, R-AMSTAR is the only method identified here that enables assessment of the quality of study syntheses; all the other identified methods establish LOEs exclusively on the basis of the quality of individual studies. Although R-AMSTAR has mostly been adopted in the medical sector, the broad relevance of its rather generic content means that it could be applied in a range of disciplines. In other applications and disciplines (e.g., Plant Health), WOE analysis is not referenced. This finding relates in part to differences in terminology and requirements between application fields (e.g., the decision-support scheme for pest risk analysis developed by the European and Mediterranean Plant Protection Organization addresses WOE, although it does not mention it explicitly). Consequently, the current work relates primarily to considering the principles of existing methods and assessing their potential utility in the broad range of application areas within the purview of ANSES. As a basis for selecting methods during assessment planning, we considered several characteristics of each method: the extent of prescription, relevance, and ease of implementation (feasibility). We additionally used a series of case studies of selected ongoing or completed assessments across a range of applications at ANSES (ANSES 2017) to further evaluate the value of these characteristics for screening methods when planning and conducting assessments. The current study had two limitations: we restricted our consideration of WOE analysis to hazard identification alone, and we did not explicitly consider the interrelationships between WOE and uncertainty analysis.
We plan to develop and integrate these aspects in future research, as ANSES further considers the working group’s recommendations. In addition, the scores developed for the prescriptive nature, relevance, and feasibility of the various methods are meaningful only in a relative context and are limited to generalized considerations for assessment. They mostly reflect the extent and documentation of expert judgment and the ease of application across a broad range of applications. Applying any of the methods to a specific assessment necessarily depends on case-specific objectives and conditions, as set out in problem formulation. The results of the current analysis indicate that transparency is critical to increasing confidence in the short term and, in the longer term, potentially to increasing the consistency of WOE analysis, within the defined constraints of assessment planning. Outstanding areas identified as relevant to considering study quality when establishing LOEs, and to their weighted integration, include delineating criteria for additional factors, such as the selection of experts.

Conclusions

Documenting assessment planning, taking into account the factors outlined for each of the approaches reviewed here (namely, extent of prescription, relevance to the question at hand, and ease of implementation), considerably increases the transparency of the rationale for the justified adoption of different approaches on the basis of factors such as urgency and the extent of available resources. Such documentation should increase common understanding of the constraints that provide a legitimate basis for variation in the approaches taken when considering WOE. Developing and applying reporting templates for assessment planning that outline the specific aspects to be addressed will likely increase common understanding of the nature of the transparency required in selecting assessment methods. The current review also highlighted the value of acquired experience in contributing to expert-informed prescription of the relevant factors to be considered in reporting templates, as a basis for increasing the transparency and defensibility of WOE analysis. This aspect is particularly important for establishing and integrating LOEs. All WOE assessments include elements of expert-informed judgment. However, transparency about the nature and basis of those judgments in individual assessments (often attributed to “expert judgment”) is often lacking. Developing prescriptive reporting templates based on collective expert experience increases common understanding of the important elements to consider and their relative weighting. This, in turn, contributes to more consistent evaluations, but necessarily requires contributing experts to be much more explicit about the factors they take into consideration. For example, OHAT provides a relatively prescriptive and transparent approach to assessment planning, review, and evaluation, which facilitates adoption and is likely to increase common understanding of the relevant elements to consider.
The generic utility of this approach has been illustrated in assessments by the National Toxicology Program (U.S. NTP 2015) and in a range of case studies conducted by various organizations (e.g., EFSA and ANSES). However, it does not yet robustly address mechanistic data. Its further development may well be informed by experience from more mechanistically driven approaches, such as the application of modified Bradford Hill considerations to weigh patterns of epidemiological, toxicological, and mechanistic data in mode of action analyses. IARC classifications, by contrast, result from a much less prescriptive approach applied by a convened group of experts and, as such, reflect less documented and more variable expert judgment, as does multicriteria decision analysis. Explicit criteria for selecting experts, and process considerations for weighting their input, seem essential to ensure greater transparency in these more judgment-dependent methodologies. However, reporting templates that draw much more broadly on previous collective experience, defining the specific aspects taken into consideration and their relative weighting, are preferable. Prescriptive generic approaches that provide an encompassing framework, such as OHAT, and that draw broadly on an analysis of experience acquired in application and less on consensus expert opinion, are likely to offer the greatest transparency and consistency in WOE analysis. Specific issues identified when planning an assessment require combining these more generic frameworks with specialized approaches (e.g., those used to consider the extent of mechanistic support for competing hypotheses in mechanistically motivated integration of LOEs).
Selecting expert-informed prescriptive approaches (versus consensus based on expert judgment) is likely to provide the greatest transparency and, potentially, the greatest consistency of evaluations within identified constraints.

References

1. GRADE guidelines: 11. Making an overall rating of confidence in effect estimates for a single outcome and for all outcomes.

Authors: Gordon Guyatt; Andrew D Oxman; Shahnaz Sultan; Jan Brozek; Paul Glasziou; Pablo Alonso-Coello; David Atkins; Regina Kunz; Victor Montori; Roman Jaeschke; David Rind; Philipp Dahm; Elie A Akl; Joerg Meerpohl; Gunn Vist; Elise Berliner; Susan Norris; Yngve Falck-Ytter; Holger J Schünemann
Journal: J Clin Epidemiol. Date: 2012-04-27

2. GRADE guidelines: 2. Framing the question and deciding on important outcomes.

Authors: Gordon H Guyatt; Andrew D Oxman; Regina Kunz; David Atkins; Jan Brozek; Gunn Vist; Philip Alderson; Paul Glasziou; Yngve Falck-Ytter; Holger J Schünemann
Journal: J Clin Epidemiol. Date: 2010-12-30

3. The adverse outcome pathway concept: a pragmatic tool in toxicology.

Authors: Mathieu Vinken
Journal: Toxicology. Date: 2013-08-23

4. A survey of frameworks for best practices in weight-of-evidence analyses.

Authors: Lorenz R Rhomberg; Julie E Goodman; Lisa A Bailey; Robyn L Prueitt; Nancy B Beck; Christopher Bevan; Michael Honeycutt; Norbert E Kaminski; Greg Paoli; Lynn H Pottenger; Roberta W Scherer; Kimberly C Wise; Richard A Becker
Journal: Crit Rev Toxicol. Date: 2013-10

5. Weight-of-evidence evaluation in environmental assessment: review of qualitative and quantitative approaches.

Authors: Igor Linkov; Drew Loney; Susan Cormier; F Kyle Satterstrom; Todd Bridges
Journal: Sci Total Environ. Date: 2009-07-19

6. GRADE guidelines: 5. Rating the quality of evidence--publication bias.

Authors: Gordon H Guyatt; Andrew D Oxman; Victor Montori; Gunn Vist; Regina Kunz; Jan Brozek; Pablo Alonso-Coello; Ben Djulbegovic; David Atkins; Yngve Falck-Ytter; John W Williams; Joerg Meerpohl; Susan L Norris; Elie A Akl; Holger J Schünemann
Journal: J Clin Epidemiol. Date: 2011-07-30

7. From Systematic Reviews to Clinical Recommendations for Evidence-Based Health Care: Validation of Revised Assessment of Multiple Systematic Reviews (R-AMSTAR) for Grading of Clinical Relevance.

Authors: Jason Kung; Francesco Chiappelli; Olivia O Cajulis; Raisa Avezova; George Kossan; Laura Chew; Carl A Maida
Journal: Open Dent J. Date: 2010-07-16

8. A weight of evidence approach for hazard screening of engineered nanomaterials.

Authors: Danail R Hristozov; Alex Zabeo; Christy Foran; Panagiotis Isigonis; Andrea Critto; Antonio Marcomini; Igor Linkov
Journal: Nanotoxicology. Date: 2012-12-14

9. A meta-analysis of effects of Bt cotton and maize on nontarget invertebrates.

Authors: Michelle Marvier; Chanel McCreedy; James Regetz; Peter Kareiva
Journal: Science. Date: 2007-06-08

10. IPCS framework for analyzing the relevance of a noncancer mode of action for humans.

Authors: Alan R Boobis; John E Doe; Barbara Heinrich-Hirsch; M E Bette Meek; Sharon Munn; Mathuros Ruchirawat; Josef Schlatter; Jennifer Seed; Carolyn Vickers
Journal: Crit Rev Toxicol. Date: 2008
