Fundamental Concepts for Semiquantitative Tissue Scoring in Translational Research.

Abstract

Failure to reproduce results from some scientific studies has raised awareness of the critical need for reproducibility in translational studies. Macroscopic and microscopic examination is a common approach to determine changes in tissues, but text descriptions and visual images have limitations for group comparisons. Semiquantitative scoring is a way of transforming qualitative tissue data into numerical data that allow more robust group comparisons. Semiquantitative scoring has broad uses in preclinical and clinical studies for evaluation of tissue lesions. Reproducibility can be improved by constraining bias through appropriate experimental design, randomization of tissues, effective use of multidisciplinary collaborations, and valid masking procedures. Scoring can be applied to tissue lesions (eg, size, distribution, characteristics) and also to tissues through evaluation of staining distribution and intensity. Semiquantitative scores should be validated to demonstrate relevance to biological data and to demonstrate observer reproducibility. Statistical analysis should make use of appropriate tests to give robust confidence in the results and interpretations. Following key principles of semiquantitative scoring will not only enhance descriptive tissue evaluation but also improve quality, reproducibility, and rigor of tissue studies.

Keywords: bias; clinical; grading; pathology; preclinical; reproducibility; semiquantitative scoring; tissue scores

Year: 2018 PMID: 30715381 PMCID: PMC6927897 DOI: 10.1093/ilar/ily025

Source DB: PubMed Journal: ILAR J ISSN: 1084-2020

Introduction and Uses

Tissue evaluation is a common research tool used in basic science,[1-4] toxicological,[5-10] and clinical studies.[11-14] Scoring of tissue changes or lesions can aid in assessing model phenotypes, disease pathogenesis, toxicities, and efficacy of therapies.[2,5,12-15] Morphological examination of tissues produces text descriptions and visual images that can be valuable to define initial group-specific differences; however, these observations are qualitative in nature and have limitations for rigorous group comparisons. In general, quantitative and semiquantitative approaches can be applied to tissues to produce scores that enhance the rigor of data. “Quantitative” scores are derived from measuring tissue parameters, often by manual techniques or by specialized software that analyzes digital images,[3,16,17] and yield a numeric value on a continuous scale (eg, 0.3, 1.25, 4.5, etc.). In contrast, “semiquantitative” scores are assigned by an observer based on predefined morphologic criteria,[3] and these whole number scores are, by definition, less precise than quantitative scores because they approximate relative changes. Semiquantitative scoring can be applied to macroscopic and microscopic tissue changes, allowing generation of robust data that are amenable to statistical analysis and evaluation of experimental groups.

The goals of this paper are to introduce investigators to key ideas in reproducible semiquantitative scoring of tissues and guide them in finding additional resources for more detailed discussions and examples. For the remainder of this paper, “scores” and “scoring” will refer, unless otherwise specified, to semiquantitative methods.

Integration of semiquantitative scoring in translational research can be useful in several situations.[3,4,18,19] First, semiquantitative scoring data are relatively inexpensive, because no software or computational tools are necessarily needed.
Second, it can be a quick screening method to produce pilot data for grant applications or guide future research studies. Third, semiquantitative data can enhance the rigor of descriptive text. While annotated images and descriptive text may show apparent differences between groups, semiquantitative scores can provide a comprehensive overview of tissue changes for group comparisons. Lastly, semiquantitative data can be used to guide, corroborate, and validate observations or data obtained from other assays. Semiquantitative scoring can be used to acquire data in several scientific areas, and fundamentally the core concepts are similar.[3-8] In the preclinical area, which utilizes models (eg, animal, tissue/cell cultures, etc.) of human diseases/conditions, semiquantitative scoring is regularly used to compare experimental groups.[1,2,11,20] In the clinical area, semiquantitative scoring of human tissues (eg, cancers, tissue/cell cultures, etc.) is often used to help define disease diagnosis, pathogenesis, biomarkers, and clinical prognosis.[12-14] Semiquantitative scoring is also a key component of nonclinical toxicology studies,[5,10] which are performed to support regulatory agency submissions and thus have an inherently different purpose than preclinical investigative studies. Here, the goal is to evaluate the safety of the material being tested (ie, hazard identification and risk assessment) rather than to assess potential treatment efficacy. To support future clinical trials, all toxicity studies must be performed according to guidance documents from various regulatory agencies, such as the Food and Drug Administration. Additionally, the usage of consistent diagnostic terminology for each organ system in rodents and large animals is strongly recommended.[9,21] Collaboration with experienced toxicologists and toxicological pathologists is highly encouraged before investigators plan these types of studies to ensure the current regulatory guidelines are followed. 
Unless specified, the remainder of the paper will focus on foundational concepts for semiquantitative scoring emphasizing nontoxicologic translational studies (Table 1).

Table 1

Fundamental Concepts for Semiquantitative Scoring of Tissues

Principles	Resources/Examples
Bias control	Experimental design[4,27,32]
	Randomization[31,32]
	Expertise[23,32–34,50]
	Masking[3–5,51]
Methods	Lesions (size, shape, number, etc.)[13,20,41,44]
	Stains (incidence; intensity)[3,12,13,20]
	Scoring methods[4,43,45]
Evaluation	Biological validation[3,4,14,48]
	Validation of repeatability[52–54]
	Group comparisons[32,43,55–58]

Bias Control

Statistician George Box once stated, “All models are wrong, but some are useful.”[22] To apply this quote in the context of translational research, modeling in itself (eg, animal models) is never fully identical to the condition being modeled (eg, human disease). Due to several factors (genetic diversity, comorbidities, etc.), even small cohorts of humans do not fully “model” the human condition. This is, in part, why large and multiple clinical trials are often required to test for efficacy and adverse effects of new therapeutics in humans. In research, studies that model the human condition should be constructed to be as useful and reproducible as possible; one way to do this is to guard against factors that are known to cause bias. In science, bias is a term applied to areas of subjectivity (from overt to subconscious) that can skew data and contribute to lack of scientific reproducibility, an unfortunate reality that has been increasingly recognized.[23-25] There are several ways to constrain bias when scoring tissues, and by using these precepts investigators can acquire more objective data.

Experimental Design

A critical step for reproducible science is to establish a strong foundation in sound experimental design.[4,23,26-29] Constraining bias early, at the experimental design stage, avoids downstream “junk in, junk out” problems and issues of “regret” that can lead to adverse and unexpected influences in the quality and analyses of tissues.[4,30] Considerations to address during the experimental planning stage include selection of the appropriate model (eg, species or strain), consideration of the appropriate controls (eg, matching with respect to age, sex, or litter), and calculation of the sufficient sample size needed for statistical significance. It can be helpful to revisit proper techniques for tissue collection as well as the different options available for fixation and storage because tissue handling variables can influence staining quality.[3,4,27,30] Staining techniques can also vary in consistency as a function of stain choice and by staining protocol. For example, the planning phase for a hypothetical experiment involving viral-induced inflammation in the lungs of a mouse should address whether there is sufficient tissue for multiple tests (eg, bronchoalveolar lavage, paraffin, and OCT embedded tissues, PCR, microarray, protein quantification, and viral culture). Novice investigators might make several invalid assumptions (eg, homogenous virus distribution in lungs, bronchoalveolar lavage collection does not affect other analyses, murine lung size will allow for ample tissue sampling, etc.) that can lead to incomplete and/or skewed data.[4] Early consultation with all key collaborators (especially pathologists) at the time of experimental design will ensure all needs are accounted for (eg, appropriate amount and type of tissue allocations) to prevent oversights.

Randomization

Randomization (“heterogenization”) is an important tool to prevent the introduction of treatment bias that arises from overly homogenized groups; this situation has been variably termed litter effect, cage effect, or batch effect.[30-32] The introduction of such bias can sometimes happen in innocuous ways. For example, tissue harvest from a large cohort of animals will likely produce a wide range of times from onset of the experimental day until necropsy. If the animals in one treatment group were necropsied early, before starting on the other group, tissue parameters such as liver glycogen stores (especially in fasted animals) could be affected and create artifactual group-specific bias. Randomization of all the groups (animals and their tissues) can mitigate bias introduced by the experimental procedures. Other examples of variables that could render a study nonrandomized include differential housing of subjects (single vs group) or subject/sample processing order. Any variable that is not randomized across treatment groups has the potential to confound the data.
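
The necropsy-order example above can be sketched in a few lines of Python. This is a minimal illustration, not a prescribed procedure: the function name, the cohort layout, and the animal IDs are all hypothetical, and a fixed seed is used only so that the order can be documented and reproduced.

```python
import random

def randomized_necropsy_order(animals_by_group, seed=None):
    """Return a shuffled list of (animal_id, group) pairs.

    animals_by_group: dict mapping a group label to a list of animal IDs.
    seed: optional integer so the order can be reproduced and recorded.
    """
    rng = random.Random(seed)
    # Pool every animal from every group before shuffling, so that
    # time of necropsy is not tied to treatment group.
    order = [(animal, group)
             for group, animals in animals_by_group.items()
             for animal in animals]
    rng.shuffle(order)
    return order

# Hypothetical cohort: two treatment groups of four mice each.
cohort = {
    "control": ["C1", "C2", "C3", "C4"],
    "treated": ["T1", "T2", "T3", "T4"],
}
order = randomized_necropsy_order(cohort, seed=2018)
for position, (animal, group) in enumerate(order, start=1):
    print(position, animal, group)
```

The same pooling-then-shuffling idea could be applied to other procedural variables named above, such as sample processing order.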

Expertise

Bias may also be introduced into translational research in studies conducted without the support of expertise-specific collaborators to help plan, execute, and appropriately interpret the study.[33,34] Specifically, statistical and pathological analyses are common components in translational studies, but trained statisticians and board-certified pathologists are often omitted from these multidisciplinary teams, leading to data interpretations that are more prone to errors.[22,23,35] For tissue scoring, a designated “observer” must thoroughly examine samples and ascribe scores. Various biomedical personnel (including principal investigators, postdocs, and even students) have been assigned the role of observer to score tissues. This approach, which lacks the expertise of a board-certified pathologist trained in tissue interpretation, has been labeled as do-it-yourself pathology, a practice that has been associated with numerous publications with erroneous interpretations.[4,30,36-39] While observations made by biomedical personnel may be biologically accurate in some cases, it is important to note that tissue examination by nonpathologists (even those who are “scientific experts” for a particular disease) is not recommended. Nonpathologist observers are more prone to making Type I errors (ie, “false positives” often from inadequate consideration of other morphologically similar tissue changes) and Type II errors (ie, “false negatives” often from not recognizing unexpected tissue changes). Inclusion of experienced and board-certified pathologists, who are specially trained to examine and interpret tissue changes as part of the multidisciplinary team, can greatly enhance the quality of tissue evaluation and scoring.

Masking

Semiquantitative scoring depends on the judgment of an “observer,” exposing the evaluation to some level of bias. Masking (also known as blinding) is a method to keep the observer from knowing the treatment groups when assigning tissue scores. Experts at every level (even pathologists!) are at risk of having their judgment subliminally influenced by information cues from the study. Masking significantly reduces this possibility. There are several methods to mask observers to the experimental groups, each with advantages and disadvantages that have been previously reviewed.[3,4,40] Briefly, comprehensive masking prevents the observer from knowing any details about the study design, treatments, or grouping of samples at initial examination. This approach may seem unbiased and even useful upon first glance, but in reality can easily lead to false negatives and skewed interpretations. An alternative approach to comprehensive masking is group masking. Here, the study design, treatments, and goals are all transparent to the observer; however, the samples are each assigned into de-identified groups, so that the observer does not know which group had specific treatments. A final example is that of postexamination masking. In this approach, full transparency and access are allowed to all study-related information and slides. This is an important step, especially in new or poorly characterized models, to avoid missing subtle or unexpected treatment-related changes. Once the decision is made to score the tissues, the slides are masked to the observer and scores assigned. Masking should be a standard component that is defined in the methodology of all studies that use semiquantitative scoring. For each of these approaches, the observer should evaluate the scores and tissues after scoring in a nonmasked fashion to give confidence in the scoring system and interpretation of the results.
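
As a rough sketch of how slide identities might be hidden during scoring and restored afterward, the Python below assigns each slide a neutral code and keeps the key separate from the observer. The function names, the code format (M001, M002, ...), and the slide IDs are assumptions made for illustration only.

```python
import random

def build_masking_key(slide_ids, seed=None):
    """Map neutral codes (M001, M002, ...) to real slide IDs in random order."""
    rng = random.Random(seed)
    shuffled = list(slide_ids)
    rng.shuffle(shuffled)
    return {f"M{i:03d}": slide for i, slide in enumerate(shuffled, start=1)}

def unmask(masked_scores, key, slide_to_group):
    """Rejoin masked scores with slide IDs and groups once scoring is complete."""
    return [
        {"slide": key[code], "group": slide_to_group[key[code]], "score": score}
        for code, score in sorted(masked_scores.items())
    ]

# Hypothetical study: slide IDs that would otherwise reveal the treatment group.
slide_to_group = {
    "ctrl-01": "control", "ctrl-02": "control",
    "virus-01": "infected", "virus-02": "infected",
}
# The key would be held by someone other than the observer.
key = build_masking_key(slide_to_group, seed=7)

# The observer sees only the codes and records one score per code.
masked_scores = {code: 0 for code in key}  # placeholder scores for illustration
results = unmask(masked_scores, key, slide_to_group)
```

Unmasking after scoring also supports the nonmasked review of scores and tissues recommended above.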

Methods

One of the major benefits of semiquantitative scoring is the transformation of descriptive (qualitative) observations into numerical data so as to allow statistical group comparisons and enrich data quality. A widely accepted premise for tissue scoring is the exhibition of at least three characteristics: it should be definable, reproducible, and produce meaningful results.[5] In translational studies, scoring is typically performed on tissues to detect treatment group differences. There are 2 major types of tissue changes that are targeted when scoring tissues: lesions and stains (or other labeling techniques). Some studies have used a merged scoring (ie, an average or sum of scores) approach in which multiple parameters are combined to form one final “composite” score, but if this approach is used it should have biological relevance.[3,12,41]

Lesions

A tissue lesion can be defined as an observed morphologic change that differs from control or normal tissue architecture. Lesions can be scored in many ways, such as size, shape, distribution, presence/absence, etc., depending on the expected disease-specific findings or tissue observations. Considerations for selecting the appropriate scoring parameter include a thorough examination of all tissues that catalogs the lesions seen; identification of lesion parameters (size, shape, etc.) that appear to have chronological or group-specific differences; and biological relevance to the pathophysiology of the model.

Stains

Another common approach is to score histochemically or immunohistochemically stained tissues or cells.[3,42] Here, the observer can assess either the distribution (eg, percent of stained cells) or intensity (eg, weak to robust) of the labeled cells.[12] Similar to considerations described for “lesions,” selection of a scoring parameter may be dependent on the staining presentation as well as the biology of the model. For example, a virus infection of the lung might warrant evaluation of the distribution of staining, whereas a TP53 marker might require staining intensity as a gauge of activation in benign vs malignant tumors.

Scoring Methods

Several methods of semiquantitative scoring have been discussed in recent reviews, and readers are encouraged to use these for more specific details.[3,4,6,41,43-45] While several types of semiquantitative scoring tests are available, ordinal scoring is by far the most common in translational research and will be further discussed here. Ordinal systems produce hierarchical or progressive numerical scores (also known as “grades” or “tiers”) that are reflective of the extent and/or severity of change. A mock example of this is an ordinal scoring method composed of whole numbers from 0 to 4 representing distribution of tissue necrosis in which 0 is normal, 1 is <25% necrosis, 2 is 25% to 50% necrosis, 3 is 51% to 75% necrosis, and 4 is >75% necrosis. Ordinal scoring systems should follow several key principles for enhanced reproducibility. First, the range of levels is recommended to be about 4 to 5; fewer than this decreases sensitivity to detect group differences and more than this reduces repeatability.[3,5,6,43] Second, each progressive level should have well-defined descriptors (such as the percentage of tissue affected, as in the example above). Descriptors that are vague and subjective, such as 0 is normal, 1 is mild, 2 is moderate, and 3 is severe, should be avoided or include additional information to clearly discern each level. Score descriptors in an ordinal system can be defined by multiple lesion parameters (eg, inflammation, proliferation, necrosis), but in these situations reproducibility can sometimes be limited. Therefore, separating each lesion parameter into its own ordinal scoring system is often preferred. Third, ordinal scores are inherently discontinuous data that are not normally distributed (bell-shaped) and require nonparametric statistical analyses. Data that are normally distributed should be analyzed with parametric analysis (eg, paired or unpaired t tests). Many statistics software packages include tests for normality for determining whether a given statistical test will be valid for the dataset. It is not appropriate to use parametric analysis to analyze data derived from ordinal scoring systems.[3,4,46]
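
The mock 0 to 4 necrosis scale can be written down as explicit thresholds, which is one way to check that the descriptors leave no gaps or overlaps. The sketch below is illustrative only: in practice the observer estimates the extent of necrosis by eye, and the function name is hypothetical. The mock scale does not say how values between 50% and 51% are graded; the code assumes they fall into score 3.

```python
def necrosis_score(percent_necrosis):
    """Map an estimated percentage of necrotic tissue to the mock 0-4 scale."""
    if not 0 <= percent_necrosis <= 100:
        raise ValueError("percent_necrosis must be between 0 and 100")
    if percent_necrosis == 0:
        return 0  # normal
    if percent_necrosis < 25:
        return 1  # <25% necrosis
    if percent_necrosis <= 50:
        return 2  # 25% to 50% necrosis
    if percent_necrosis <= 75:
        return 3  # 51% to 75% necrosis (assumption: also values just above 50%)
    return 4      # >75% necrosis

# Boundary values are where vague descriptors usually cause disagreement.
for estimate in (0, 10, 25, 50, 51, 75, 76):
    print(estimate, necrosis_score(estimate))
```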

Evaluation

Biological Validation

For semiquantitative scoring to have purpose and relevance, it should have validation with biologically relevant data. In this evaluation, semiquantitative scores are tested for a correlation with biologically relevant data in the model.[4,47,48] If a significantly positive or negative correlation exists, then this confirms that the scoring system is relevant to the model. Conversely, if no correlation exists, then one has to question the use and utility of the scoring system for the model.
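
The text does not prescribe a particular correlation measure. As one possible sketch, a rank-based coefficient such as Spearman's rho suits ordinal scores because it uses only their order. The implementation below relies on the standard library alone; the scores and the "viral titer" values are invented for illustration, and assessing whether a correlation is statistically significant would require a further test that is not shown here.

```python
def average_ranks(values):
    """Rank values from 1 upward, giving tied values the mean of their ranks."""
    ranks = []
    for v in values:
        less = sum(1 for other in values if other < v)
        equal = sum(1 for other in values if other == v)
        ranks.append(less + (equal + 1) / 2)
    return ranks

def spearman_rho(x, y):
    """Spearman's rank correlation: Pearson correlation of the average ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    rx, ry = average_ranks(x), average_ranks(y)
    mean_x, mean_y = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    var_x = sum((a - mean_x) ** 2 for a in rx)
    var_y = sum((b - mean_y) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when one variable is constant")
    return cov / (var_x * var_y) ** 0.5

# Hypothetical data: ordinal lesion scores and a measured biological readout.
lesion_scores = [0, 1, 1, 2, 3, 4]
viral_titer = [1.0, 2.5, 2.0, 3.1, 4.2, 5.0]
rho = spearman_rho(lesion_scores, viral_titer)
print(round(rho, 3))
```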

Validation of Repeatability

Another form of validation is that of repeatability by the observer, both intra-observer (same person scoring the data) and inter-observer (different people scoring the data).[3,4,49] Validation of repeatability gives confidence in the scoring system descriptors as it relates to the model and also gives confidence in its repeatable use by other laboratories.
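
The text does not name a specific agreement statistic. One commonly used option for two observers scoring on an ordinal scale is a weighted Cohen's kappa, which penalizes large disagreements more than near misses. The sketch below uses linear weights and hypothetical scores; both the choice of statistic and the function name are assumptions made for illustration.

```python
def linear_weighted_kappa(scores_a, scores_b, n_levels):
    """Linear-weighted Cohen's kappa for two observers.

    Scores must be integers from 0 to n_levels - 1.
    Returns 1.0 for perfect agreement and about 0 for chance-level agreement.
    """
    n = len(scores_a)
    if n == 0 or n != len(scores_b):
        raise ValueError("both observers must score the same non-empty set of samples")
    # Observed proportion of samples in each (observer A, observer B) cell.
    observed = [[0.0] * n_levels for _ in range(n_levels)]
    for a, b in zip(scores_a, scores_b):
        observed[a][b] += 1 / n
    row = [sum(observed[i]) for i in range(n_levels)]
    col = [sum(observed[i][j] for i in range(n_levels)) for j in range(n_levels)]
    disagreement_observed = 0.0
    disagreement_expected = 0.0
    for i in range(n_levels):
        for j in range(n_levels):
            weight = abs(i - j) / (n_levels - 1)  # 0 on the diagonal, 1 at the extremes
            disagreement_observed += weight * observed[i][j]
            disagreement_expected += weight * row[i] * col[j]
    if disagreement_expected == 0:
        raise ValueError("kappa is undefined when both observers use a single level")
    return 1 - disagreement_observed / disagreement_expected

# Hypothetical scores from two observers on the same six slides (0-4 scale).
observer_1 = [0, 1, 2, 3, 4, 2]
observer_2 = [0, 1, 2, 4, 4, 1]
kappa = linear_weighted_kappa(observer_1, observer_2, n_levels=5)
print(round(kappa, 3))
```

The same function can be used for intra-observer repeatability by passing two scoring sessions from the same person.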

Group Comparisons

Once the semiquantitative tissue scores are collected, appropriate statistical tests can be applied; these have been reviewed.[4,5,8,45,46] As mentioned above, appropriate expertise such as a statistician collaborator would be advantageous to guide proper statistical analyses of the data. Awareness of the type of data produced by semiquantitative scoring is very important because it guides the type of statistical tests used to give the most compelling interpretations of the study.[46] As alluded to above, ordinal scoring is not parametric in nature, and thus selection of nonparametric tests should be considered.
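
As an illustration of a nonparametric comparison of two groups, the sketch below computes a Mann-Whitney U statistic with a two-sided p value from the normal approximation (with tie correction, without continuity correction). The choice of test, the function names, and the data are assumptions for illustration. The normal approximation is rough for small samples, so a vetted statistics package and a statistician's guidance would be preferable for real data.

```python
from statistics import NormalDist

def average_ranks(values):
    """Rank values from 1 upward, giving tied values the mean of their ranks."""
    ranks = []
    for v in values:
        less = sum(1 for other in values if other < v)
        equal = sum(1 for other in values if other == v)
        ranks.append(less + (equal + 1) / 2)
    return ranks

def mann_whitney_u(group_a, group_b):
    """Return (U, two-sided p) using the tie-corrected normal approximation."""
    n1, n2 = len(group_a), len(group_b)
    if n1 == 0 or n2 == 0:
        raise ValueError("both groups must contain at least one score")
    combined = list(group_a) + list(group_b)
    ranks = average_ranks(combined)
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    u2 = n1 * n2 - u1
    # Tie correction: ordinal scores almost always contain many ties.
    n = n1 + n2
    counts = {}
    for v in combined:
        counts[v] = counts.get(v, 0) + 1
    tie_term = sum(t ** 3 - t for t in counts.values())
    variance = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if variance == 0:
        raise ValueError("test is undefined when all scores are identical")
    z = (u1 - n1 * n2 / 2) / variance ** 0.5
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return min(u1, u2), p_value

# Hypothetical ordinal scores (0-4 scale) for two experimental groups.
control = [0, 0, 1, 1, 0, 1]
treated = [2, 3, 3, 4, 2, 3]
u_stat, p_value = mann_whitney_u(control, treated)
print(u_stat, round(p_value, 4))
```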

Summary

Semiquantitative scoring is a simple and relatively inexpensive approach to enhance descriptive/qualitative tissue data. Understanding common applications of semiquantitative scoring and the key concepts for repeatability will enhance scientific studies in translational research.

58 in total

1. Observer accuracy in estimating proportions in images: implications for the semiquantitative assessment of staining reactions and a proposal for a new system.

Authors: S S Cross
Journal: J Clin Pathol Date: 2001-05 Impact factor: 3.411

2. Drug development: Raise standards for preclinical cancer research.

Authors: C Glenn Begley; Lee M Ellis
Journal: Nature Date: 2012-03-28 Impact factor: 49.962

Review 3. Bias in research studies.

Authors: Gregory T Sica
Journal: Radiology Date: 2006-03 Impact factor: 11.105

4. Unbiased histological examinations in toxicological experiments (or, the informed leading the blinded examination).

Authors: Tom Holland; Christopher Holland
Journal: Toxicol Pathol Date: 2011-05-02 Impact factor: 1.902

5. International Harmonization of Nomenclature and Diagnostic Criteria (INHAND): Progress to Date and Future Plans.

Authors: C M Keenan; J Baker; A Bradley; D G Goodman; T Harada; R Herbert; W Kaufmann; R Kellner; B Mahler; E Meseck; T Nolte; S Rittinghausen; J Vahle; K Yoshizawa
Journal: Toxicol Pathol Date: 2014-12-21 Impact factor: 1.902

6. Animal models: Software for study design falls short.

Authors: David K Meyerholz; Alessandra Piersigilli
Journal: Nature Date: 2016-04-14 Impact factor: 49.962

Review 7. Domestic animal models for biomedical research.

Authors: A Bähr; E Wolf
Journal: Reprod Domest Anim Date: 2012-08 Impact factor: 2.005

8. Experimental lupus is aggravated in mouse strains with impaired induction of neutrophil extracellular traps.

Authors: Deborah Kienhöfer; Jonas Hahn; Julia Stoof; Janka Zsófia Csepregi; Christiane Reinwald; Vilma Urbonaviciute; Caroline Johnsson; Christian Maueröder; Malgorzata J Podolska; Mona H Biermann; Moritz Leppkes; Thomas Harrer; Malin Hultqvist; Peter Olofsson; Luis E Munoz; Attila Mocsai; Martin Herrmann; Georg Schett; Rikard Holmdahl; Markus H Hoffmann
Journal: JCI Insight Date: 2017-05-18

9. Successful Integration of the Histology Core Laboratory in Translational Research.

Authors: Katherine N Gibson-Corley; Christine Hochstedler; Mary Sturm; Janis Rogers; Alicia K Olivier; David K Meyerholz
Journal: J Histotechnol Date: 2012-04-01 Impact factor: 0.714

Review 10. Principles for valid histopathologic scoring in research.

Authors: K N Gibson-Corley; A K Olivier; D K Meyerholz
Journal: Vet Pathol Date: 2013-04-04 Impact factor: 2.221
