
Validation studies of outcome measures in pemphigus.

Sarah Hanna^1,2, Minhee Kim^1,2, Dedee F Murrell^1,2.

Abstract

Pemphigus is a group of rare and potentially fatal autoimmune blistering diseases that are associated with auto-antibodies that target intercellular adhesion molecules. Incidence of pemphigus varies among populations, with the lowest incidence in Switzerland and Finland at 0.6-0.76 per million per year and the highest in Jewish communities at 16.1-32 per million per year. Pemphigus is associated with devastating morbidity and, despite advancements in our understanding of the disease and a widening array of therapeutic options, no cure exists. The delay in the development of a cure may in part be attributed to the absence of a standardized and completely validated severity outcome measure to allow for high-quality multicenter controlled studies. Such a tool is necessary to define the best practice in clinical studies, allow for accurate comparisons between study results, justify drug use within the clinical setting, and reduce the cost burden that is associated with the use of ineffective therapies. Utilizing outcome measures that are not validated provides an opportunity to construct outcome measures with the intent to favor particular treatments and thus produce false conclusions. According to the COnsensus-based Standards for the selection of health Measurement INstruments (COSMIN) group, validation of these measurement instruments requires investigating their responsiveness, reliability, and validity. More than 116 outcome measures exist to assess pemphigus severity, of which the Pemphigus Disease Area Index (PDAI), Autoimmune Bullous Skin Disorder Intensity Score (ABSIS), and Pemphigus Vulgaris Activity Score (PVAS) are the most comprehensively corroborated measures. With regard to validity and reliability, PDAI was unsurpassed by ABSIS and PVAS. Data indicate that ABSIS is more reliable than PVAS, but PVAS seems to have greater validity, although the results are not consistent. PDAI, ABSIS, and PVAS have not yet had their responsiveness analyzed, which should be the next step to completely validate the outcome measures and conclusively determine which measure is superior.


Keywords: autoimmune blistering diseases; autoimmune bullous skin disorder intensity score; outcome measures; pemphigus; pemphigus disease area index; validation

Year: 2016 PMID: 28492025 PMCID: PMC5419045 DOI: 10.1016/j.ijwd.2016.10.003

Source DB: PubMed Journal: Int J Womens Dermatol ISSN: 2352-6475

Introduction

This literature review will discuss the key features of pemphigus, illustrate the significance of validated scoring systems, outline previous responsiveness studies for other dermatological scoring systems, and discuss existing outcome measures for pemphigus. The purpose is to understand which scores are better for use in studies and clinical practice and what research remains to be conducted in this area.

Background

Pemphigus is a group of autoimmune vesiculobullous diseases that are associated with auto-antibodies that target intercellular adhesion molecules. The majority of these auto-antibodies are immunoglobulin (Ig) G that target the ectodomain of desmosomal cadherins and in doing so cause loss of keratinocyte-to-keratinocyte adhesion (i.e., acantholysis). Acantholysis leads to blister formation in the epidermis, and patients may develop cutaneous flaccid bullae, erosions, or pustules and/or mucosal erosions. The mechanism whereby IgG auto-antibodies induce keratinocyte detachment is still widely debated. The two principal theories are steric hindrance, which implies direct interference with desmosomal adhesion, and triggering of intracellular signalling, which causes loss of keratinocyte adhesion (Evangelista et al., 2015). There is mounting evidence to support both theories and it is likely that both are significant to the pathogenesis of pemphigus. There are multiple pemphigus subtypes that possess characteristic clinical, histological, and immunologic features, seemingly due to distinctive desmosomal protein targets. These subtypes include pemphigus vulgaris (PV), pemphigus foliaceus (PF), endemic pemphigus foliaceus or fogo selvagem (FS), paraneoplastic pemphigus (PNP), and IgA pemphigus (Evangelista et al., 2015). A diagnosis of pemphigus relies on clinical features, findings from lesional and perilesional biopsy (histopathology and direct immunofluorescence, respectively), and serology (indirect immunofluorescence and enzyme-linked immunosorbent assay [ELISA]; Table 1).

Table 1

Summary of clinical, histologic, and immunologic findings in pemphigus

Subtype	Epidemiology	Clinical Features	Histopathology	Direct immunofluorescence	Indirect immunofluorescence	ELISA	Variants
PV	• Most common (in most populations) • Mostly middle-aged: 50-60 years old, men and women affected equally	• At presentation: mucosal erosions (oropharyngeal and/or genital) • Mucosal lesions ➔ pain when chewing and swallowing ➔ poor alimentation, weight loss, and malnutrition • Flaccid blisters on normal-looking or erythematous skin; palms and soles usually spared • Pruritus often absent • Nikolsky sign can be provoked	Suprabasilar split with acantholysis	Intercellular IgG deposition	Intercellular IgG deposition; preferred substrate: monkey esophagus	• Dsg 3 auto-antibodies • Dsg 1 and Dsg 3 auto-antibodies	• Pemphigus vegetans • Pemphigus herpetiformis
PF	• Second most common form • PV incidence > PF incidence except in vicinities with endemic form • Mostly middle-aged: 50-60 years old, men and women affected equally • Endemic form mainly in children and young adults	• At presentation: superficial blisters + erosions on trunk and extremities • Common for blisters to rupture before presentation, thus examination reveals superficial crusting and erosions or erythematous patches • Usually seborrheic distribution • Nikolsky sign can be provoked • No mucosal involvement, thus systemic symptoms absent	Subcorneal split with acantholysis	Intercellular IgG deposition	Intercellular IgG deposition; preferred substrate: normal human skin or monkey esophagus	Dsg 1 auto-antibodies	• FS (believed to have environmental trigger) • Pemphigus erythematosus • Pemphigus herpetiformis • Pemphigus vegetans
PNP	• Any age though mostly adults	• In setting of malignancy • Extensive mucositis • Polymorphic cutaneous lesions (e.g., blisters, erosions, lichenoid lesions), which may resemble other autoimmune blistering diseases • Bronchiolitis obliterans	Intraepidermal clefting with acantholysis; dense lichenoid infiltrate + interface dermatitis + necrotic keratinocytes	Intercellular and/or basement membrane zone deposition of C3 and/or IgG	Intercellular IgG deposition; preferred substrate: rat bladder	• Dsg 1 and Dsg 3 auto-antibodies • Auto-antibodies to plakin proteins (e.g., envoplakin and periplakin)
IgA pemphigus: subcorneal pustular dermatosis	Any age	• Grouped vesicles or pustules • Erythematous plaques with crusts • Annular, circinate, or herpetiform morphology • Trunk and proximal extremities most commonly involved • Mucosa usually spared	Subcorneal clefting and pustules + nominal acantholysis; mixed dermal infiltrate	Intercellular IgA deposition	Negative in 50%; intercellular IgA deposition; preferred substrate: monkey esophagus	• Desmocollin 1 auto-antibodies • Target antigens likely non-desmosomal	• IgA/IgG subtype (demonstrates intercellular deposition of both IgG and IgA); atypical clinical and histologic manifestations due to heterogeneity of auto-antigens (desmocollins, Dsg 1 and Dsg 3); associated with internal malignancies (lung cancer)
IgA pemphigus: intraepidermal neutrophilic dermatosis	Any age	• Grouped vesicles or pustules • Erythematous plaques with crusts • Annular, circinate, or herpetiform morphology • Trunk and proximal extremities most commonly involved • Mucosa usually spared	Intraepidermal pustules + nominal acantholysis; mixed dermal infiltrate	Intercellular IgA deposition	Intercellular IgA deposition; preferred substrate: monkey esophagus	• Desmocollin 1 auto-antibodies • Dsg 1 and Dsg 3 auto-antibodies	• IgA/IgG subtype
Dsg = desmoglein; ELISA = enzyme-linked immunosorbent assay; FS = fogo selvagem; IgA = immunoglobulin A; IgG = immunoglobulin G; PV = pemphigus vulgaris; PF = pemphigus foliaceus; PNP = paraneoplastic pemphigus. Sources: Evangelista et al., 2015; Hertl et al., 2006; Hertl and Sitaru, 2015; Mihai and Sitaru, 2007; Oiso et al., 2002; Santoro et al., 2013; Tsuruta et al., 2011; Yeh et al., 2003.

The incidence of pemphigus varies markedly between populations. This variability is due to the different genetic backgrounds and trigger factors that are associated with particular geographical locations. Most epidemiological studies concur that persons with Jewish ancestry are the most at risk to develop PV. However, the quality of these studies is limited by their retrospective design and inability to ensure inclusion of all patients (Schmidt et al., 2015). Mortality associated with PV and PF dropped dramatically from 75% to 30% with the introduction of corticosteroid treatment in the 1950s. Subsequently, adjuvant use of immunosuppressant drugs in the 1980s decreased mortality rates further to approximately 5% of the study populations. Most recently, studies in Taiwan and the United Kingdom have demonstrated that a patient’s risk of death is 2-3 times greater than that of a healthy control, primarily because of infections, particularly pneumonia and septicemia (Huang et al., 2012, Langan et al., 2008, Schmidt et al., 2015).

Importance of scoring systems

Measurement in medicine is impeded by the absence of a consensus on the best instruments to characterize disease severity. This results in non-comparable study outcomes, conceivably false conclusions, and non-evidence-based practice. Scoring systems in dermatology are particularly challenging given the shortage of radiographic and laboratory findings that are known to correlate with disease severity (Gaines and Werth, 2008). Thus, generic instruments such as the Physician Global Assessment (PGA) are often utilized. The advantage of generic instruments is their versatility, but their poor reproducibility, reliance on physician experience with the condition, and inability to capture the severity of illnesses that are localized to small areas (e.g., acne) are significant disadvantages (Albrecht and Werth, 2007). Disease-specific scoring systems provide superior accuracy and sensitivity compared with generic scoring systems, as has been shown with the Psoriasis Area and Severity Index (PASI) and Scoring Atopic Dermatitis (SCORAD) (Schram et al., 2012, Weisman et al., 2003). For pemphigus, there is a definite shortage of multicenter controlled studies that is widely attributed to the difficulty in objectively comparing therapeutic outcomes. A systematic literature review counted more than 116 different outcome measures for pemphigus severity that were used in 96 articles published during the preceding 25 years (Martin and Murrell, 2006). A standardized and validated scoring system that is used universally is required to address this issue and to: (1) quantify disease severity and progression for interventions in clinical studies (Gaines and Werth, 2008) and allow multidisciplinary discussion of cases (Loh et al., 2014); (2) justify drug use in clinical settings (Gaines and Werth, 2008); and (3) reduce financial costs by identifying and ceasing ineffective treatments (Gaines and Werth, 2008).

Importance of validation

Validation studies illustrate the responsiveness, reliability, and practicality of a tool with regard to its intended measure (Streiner et al., 2008). The use of unvalidated tools creates an opportunity to reach incorrect study conclusions by utilizing scoring systems that are constructed specifically in favor of particular treatments, as shown in a systematic review by Marshall et al. (2000) of schizophrenia scoring systems. The review found that studies that used unpublished scales were more likely to support a treatment over control. To address the issue of unvalidated measurement tools, an international Delphi study was held from 2006 to 2007 with 43 health status measurement experts, collectively known as the COnsensus-based Standards for the selection of health Measurement INstruments (COSMIN) group. The COSMIN group worked to establish a consensus on the measurement properties to include when validating Health-Related Patient-Reported Outcomes (HR-PROs) and their definitions, as well as standards and design requirements for the evaluation of the defined measurement properties. The study identified three quality domains: reliability, validity, and responsiveness. Each quality domain contains one or more measurement properties (Fig. 1; Mokkink et al., 2006, Mokkink et al., 2010a, Mokkink et al., 2010b). While the study was designed specifically for HR-PROs such as quality of life measures, the key principles, quality domains, and definitions are applicable to the validation of disease severity tools. However, some of the measurement properties described in the COSMIN study, for example cross-cultural validity, are not applicable to disease severity tools and thus have been excluded from our discussion.

Fig. 1

Domains to decipher the quality of a disease severity outcome measure (Mokkink et al., 2010a).

Quality domains

Reliability

Reliability ensures that the instrument is free from measurement error and provides insight into the inherent noise or variability of the score. The reliability quality domain contains three measurement properties: internal consistency (degree of association between the items), reliability (determined via consistency in score values between observers [inter-rater] and by a single observer with sufficient delay between two scorings [intra-rater]), and measurement error (changes in score that are not reflective of true changes in the intended construct [e.g., Standard Error of Measurement or SEM]).
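To make the reliability properties concrete, the following is a minimal sketch of how inter-rater agreement and SEM might be computed. It assumes a two-way random-effects, single-rater intra-class correlation coefficient (commonly written ICC(2,1)) and the common approximation SEM = SD × sqrt(1 − ICC); neither choice is taken from the studies reviewed here. The function names and the ratings matrix are illustrative only.

```python
from math import sqrt
from statistics import mean, stdev


def icc_2_1(ratings):
    """Two-way random-effects, single-rater ICC (an assumed ICC form).

    `ratings` is a list of rows, one row per patient, one column per rater.
    """
    n = len(ratings)       # number of patients
    k = len(ratings[0])    # number of raters
    grand = mean(x for row in ratings for x in row)
    row_means = [mean(row) for row in ratings]
    col_means = [mean(row[j] for row in ratings) for j in range(k)]

    # Mean squares from a two-way ANOVA without replication.
    ms_rows = k * sum((m - grand) ** 2 for m in row_means) / (n - 1)
    ms_cols = n * sum((m - grand) ** 2 for m in col_means) / (k - 1)
    ss_error = sum(
        (ratings[i][j] - row_means[i] - col_means[j] + grand) ** 2
        for i in range(n)
        for j in range(k)
    )
    ms_error = ss_error / ((n - 1) * (k - 1))

    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )


def standard_error_of_measurement(scores, icc):
    """SEM approximated as SD * sqrt(1 - ICC) (an assumed formula)."""
    return stdev(scores) * sqrt(1 - icc)


if __name__ == "__main__":
    # Illustrative data: 6 patients scored by 4 raters.
    example = [
        [9, 2, 5, 8],
        [6, 1, 3, 2],
        [8, 4, 6, 8],
        [7, 1, 2, 6],
        [10, 5, 6, 9],
        [6, 2, 4, 7],
    ]
    icc = icc_2_1(example)
    all_scores = [x for row in example for x in row]
    print(round(icc, 2), round(standard_error_of_measurement(all_scores, icc), 2))
```

A low ICC with a large SEM would suggest that much of the observed variation in scores reflects rater disagreement rather than differences between patients.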

Validity

The validity quality domain also consists of three measurement properties. The first, content validity, determines whether the content of the instrument reflects the intended construct and includes the subjective measure of face validity. For pemphigus, a key question that concerns content validity is whether added weighting for site increases the validity or if lesion count alone is sufficient, as site distribution plays a role in severity. The second measurement property is construct validity, which aims to establish that the instrument results represent the intended construct by assessing internal relationships and relationships to other instruments for the same construct. It can also identify discrepancies between pertinent groups. Finally, criterion validity measures accuracy by determining how closely the outcome measure reflects a gold standard to ensure minimal systematic and random bias.

Responsiveness

The domain of responsiveness has only one eponymous measurement property, which describes the capability of the instrument to perceive change over time. It aims to assess how well the outcome measure detects true changes in the disease state rather than measurement error. It is otherwise referred to as sensitivity to change or discriminant validity and has significant ramifications for conclusions that are drawn with regard to the efficacy of therapies in clinical studies.

Criteria in addition to those described by the COSMIN group are feasibility and cutoffs. Feasibility refers to the time taken to complete the scoring and the resources and/or costs needed to implement the instrument, which may have considerable implications for the outcome measure’s practicality. Disease severity cutoffs allow for a differentiation between mild, moderate, and severe disease status and may have important implications in clinical practice when identifying appropriate therapies and within clinical studies when drawing meaningful comparisons. Table 2 illustrates the degree of validation of commonly-used dermatological scoring systems by key measurement properties.

Table 2

Previous validation studies on commonly-used dermatological scoring systems

ABSIS = Autoimmune Bullous Skin Disorder Intensity Score; BPDAI = Bullous Pemphigoid Disease Area Index; CDASI = Cutaneous Dermatomyositis Disease Area and Severity Index; CLASI = Cutaneous Lupus Erythematosus Disease Area and Severity Index; EBDASI = Epidermolysis Bullosa Disease Activity and Scarring Index; EASI = Eczema Area and Severity Index; MCID = minimal clinically-important difference; PASI = Psoriasis Area and Severity Index; PDAI = Pemphigus Disease Area Index; SCORAD = Scoring Atopic Dermatitis.

1 Rahbar et al., 2014; 2 Murrell et al., 2008; 3 Boulard et al., 2016; 4 Shimizu et al., 2014; 5 Pfutze et al., 2007; 6 Gourraud et al., 2012; 7 Puzenat et al., 2010; 8 Tofte et al., 1998; 9 Hanifin et al., 2001; 10 Breuer et al., 2004; 11 Schram et al., 2012; 12 Sartorius et al., 2010; 13 Loh et al., 2014; 14 Jain et al., 2016; 15 Wijayanti et al., 2014, Wijayanti et al., 2016; 16 Murrell et al., 2012; 17 Patsatsi et al., 2012; 18 Lévy-Sitbon et al., 2014; 19 Bonilla-Martinez et al., 2008; 20 Albrecht et al., 2005; 21 Klein et al., 2011; 22 Goreshi et al., 2012; 23 Anyanwu et al., 2015; 24 Stalder et al., 1993; 25 Schram et al., 2012; 26 Angelova-Fischer et al., 2005; 27 Langley and Ellis, 2004; 28 Zhao et al., 2015.

Previous responsiveness studies for dermatological scoring systems

Responsiveness essentially concerns the validity of the change score and whether its direction and magnitude correlate with the expected results. To determine responsiveness, a longitudinal study design is required with at least two measurements administered and a validated external reference to illustrate whether the patient’s condition is improving, remaining stable, or deteriorating. This is necessary in order to conclude whether the patient was truly stable, the instrument is not responsive, or the results are due to poor comparator instrument quality. The COSMIN study divided the design requirements into cases with a gold standard (items 1-7 and 15-18) and cases without a gold standard (items 1-14; Fig. 2). In gold standard cases with dichotomous results, the preferred method to assess responsiveness is to relate change scores to the gold standard using a receiver operating characteristic (ROC) curve, or to use sensitivity and specificity if the study instrument scores are also dichotomous (Mokkink et al., 2010a).

Fig. 2

Responsiveness checklist (with permission; open access; Mokkink et al., 2010a).

The COSMIN group also emphasizes the need for specific hypotheses to be formulated beforehand and stated in the method section when a gold standard is not available. The hypothesis should predict direction (positive or negative) and magnitude (absolute or relative). A detailed hypothesis reduces bias by preventing retrospective analysis and post hoc alternative explanations for weak correlations in cases where the correct conclusion is that the instrument is not responsive. Table 3 outlines previous responsiveness studies on commonly-used dermatological scoring systems and the methods employed to establish responsiveness.
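One way to picture a pre-specified hypothesis is as an expected sign and a minimum magnitude for the correlation between change scores on the study instrument and on a comparator. The sketch below is illustrative only: it assumes Spearman's rho as the correlation measure, and the function names and thresholds are hypothetical rather than part of the COSMIN checklist.

```python
from statistics import mean


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared_rank = (start + end) / 2 + 1  # mean of 1-based ranks in the tie group
        for position in range(start, end + 1):
            ranks[order[position]] = shared_rank
        start = end + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    covariance = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return covariance / (var_x * var_y) ** 0.5


def spearman_rho(x, y):
    """Spearman's rho computed as the Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def hypothesis_supported(rho, expected_sign, minimum_magnitude):
    """Check a correlation against a hypothesis fixed before data analysis.

    expected_sign is +1 or -1; minimum_magnitude is the smallest |rho|
    that would count as support (a hypothetical, study-specific choice).
    """
    return rho * expected_sign > 0 and abs(rho) >= minimum_magnitude
```

Fixing `expected_sign` and `minimum_magnitude` in the protocol, before any change scores are examined, is what prevents a weak correlation from being explained away after the fact.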

Table 3

Previous responsiveness studies in dermatology

Instrument	Authors	Year	Sample Size	Method for Responsiveness
EBDASI	Jain et al.	2016	36	Utilized distribution- and anchor-based methods. Distribution-based: mean change of scores, standardized response mean, and standardized effect size utilized to illustrate the magnitude of change in activity and damage scores. Anchor-based: Pearson’s correlation coefficient utilized to determine the degree of correlation between change in EBDASI score and Likert scale of change.
BPDAI	Wijayanti et al.	2016	32	Physician subjective assessment: improved, stable, deteriorated. Paired t test with BPDAI to note statistical significance. To be responsive ➔ statistically significant between improved and deteriorated, not statistically significant when stable.
BPDAI	Patsatsi et al.	2012	39	Correlated BPDAI to BP180 titers at baseline, 3-month, and 6-month intervals using Spearman’s rho correlation.
CLASI	Bonilla-Martinez et al.	2008	8	Utilized correlations, linear regressions, and Wilcoxon rank sum and 1-sided signed rank exact tests. The differences between baseline score and day 56 (change scores) were recorded for CLASI activity and damage, and each was correlated via Pearson correlation coefficient with the change score of (1) physician’s assessment of patient’s global skin health, (2) patient’s self-assessment of global skin health, (3) pain, and (4) itch.
CDASI	Goreshi et al.	2012	35	Included the two consecutive visits with the greatest variance in PGA-activity for analysis. Responsiveness was measured via Standardized Response Mean (SRM); SRM = ratio of the mean differences (i.e., CDASI score before and after clinical change was noted) to the standard deviation of the differences.
EASI	Breuer et al.	2004	Pimecrolimus (n = 129); control (n = 66)	Following treatment with 1% pimecrolimus, EASI, IGA, and SCORAD dropped significantly compared to the vehicle/control group, as depicted by t test. Between-group comparisons established via Cochran-Mantel-Haenszel test. Close correlation between pairs of EASI, IGA, and SCORAD depicted via Pearson test.
EASI, SCORAD	Schram et al.	2012	143	Mean scores of EASI and SCORAD were correlated to mean scores of IGA and PGA within each treatment group per time point. Then ROC was utilized.

BPDAI = Bullous Pemphigoid Disease Area Index; CDASI = Cutaneous Dermatomyositis Disease Area and Severity Index; CLASI = Cutaneous Lupus Erythematosus Disease Area and Severity Index; EBDASI = Epidermolysis Bullosa Disease Activity and Scarring Index; EASI = Eczema Area and Severity Index; IGA = investigator global assessment; PGA = physician global assessment; ROC = receiver operating characteristic; SCORAD = Scoring Atopic Dermatitis.
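The distribution-based statistics named in Table 3 can be sketched in a few lines. The standardized response mean follows the definition given in the table (mean of the paired differences divided by their standard deviation). The standardized effect size is assumed here to be the mean change divided by the standard deviation of the baseline scores, which is a common convention but is not spelled out in the table. Function names and example scores are hypothetical.

```python
from statistics import mean, stdev


def paired_differences(baseline, follow_up):
    """Follow-up minus baseline score for each patient."""
    if len(baseline) != len(follow_up):
        raise ValueError("baseline and follow_up must have the same length")
    return [after - before for before, after in zip(baseline, follow_up)]


def standardized_response_mean(baseline, follow_up):
    """SRM: mean of the differences divided by the SD of the differences."""
    diffs = paired_differences(baseline, follow_up)
    return mean(diffs) / stdev(diffs)


def standardized_effect_size(baseline, follow_up):
    """Effect size: mean change divided by the SD of baseline scores (assumed definition)."""
    diffs = paired_differences(baseline, follow_up)
    return mean(diffs) / stdev(baseline)


if __name__ == "__main__":
    # Hypothetical severity scores for four patients before and after treatment.
    before = [10, 20, 30, 40]
    after = [8, 15, 26, 31]
    print(standardized_response_mean(before, after))
    print(standardized_effect_size(before, after))
```

Because the SRM is scaled by the variability of the change itself, it tends to be larger in magnitude than the effect size when patients change by similar amounts despite widely spread baseline scores.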

Minimal clinically-important difference

It is important to distinguish minimal clinically-important difference (MCID) from responsiveness. MCID is the smallest change in an instrument score that correlates to a meaningful clinical difference. It is an inappropriate measure of responsiveness because it is simply about the interpretation of a change score as opposed to validity (Mokkink et al., 2010a). MCID usually revolves around patient perception although variations can include MCID obtained through a clinical report, change in clinical parameter, and effect size. For disease severity outcome measures, it is standard for MCID to be derived from some form of physician global assessment or a related tool. There are up to nine methods to identify MCID, some that anchor solely on external criteria while others utilize internal values. The results can vary enormously based on the method used (Cook, 2008). Table 4 illustrates previous studies and methods utilized to establish MCID.

Table 4

Previous MCID studies in dermatology

Instrument	Authors	Year	Sample Size	Method for MCID
EBDASI	Jain et al.	2016	36	Pearson correlation coefficient > 0.3 between Likert scale and EBDASI, thus sufficient to determine MCID. • MCID derived via linear regression analysis, setting a responder score of 3 on Likert scale • MCID also calculated via ROC curves with responder score of 3 on Likert scale • To account for baseline severity, MCID analysis was also conducted utilizing ROC on percentage change in activity scores (change in activity score divided by baseline activity score)
BPDAI	Wijayanti et al.	2016	32	Average signed change in BPDAI of responders (determined by physician subjective assessment: improved, deteriorated, stable). Confirmed via ROC at/around this cutoff value.
CLASI	Bonilla-Martinez et al.	2008	8	Clinical cut points that represent minimal clinically meaningful change (responders) were determined: • PGA-VAS: change of 2 points • Pain: change of 2 points • Itch: change of 2 points • Patient’s global skin health rating: change of 3 points. For each measure, Wilcoxon rank sum tests were used to compare CLASI change in responders vs. non-responders.
CDASI	Anyanwu et al.	2015	128	Utilized PGA-VAS with a clinical cut point of 2 for responders and less than 2 for non-responders. Used ROC curve to determine the change score that correlated with responders.
EASI, SCORAD	Schram et al.	2012	143 (data from three randomized controlled studies on atopic eczema treatments)	Responders were defined as an improvement or decline greater than or equal to 1 in PGA and IGA. ROC utilized. > 0.7 = fair, > 0.8 = good, > 0.9 = excellent responsiveness.

BPDAI = Bullous Pemphigoid Disease Area Index; CDASI = Cutaneous Dermatomyositis Disease Area and Severity Index; CLASI = Cutaneous Lupus Erythematosus Disease Area and Severity Index; EBDASI = Epidermolysis Bullosa Disease Activity and Scarring Index; EASI = Eczema Area and Severity Index; IGA = investigator global assessment; MCID = minimal clinically-important difference; PGA = physician global assessment; ROC = receiver operating characteristic; SCORAD = Scoring Atopic Dermatitis; VAS = visual analogue scale.
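Several of the studies in Table 4 use an ROC analysis to find the change score that best separates responders from non-responders. A minimal sketch of that idea follows. It assumes that change is expressed as improvement (larger values mean more improvement), that responder status comes from an external anchor such as a physician global assessment, and that the cut-off is chosen by maximizing the Youden index (sensitivity + specificity − 1). The selection rule and all names are assumptions for illustration, not the procedure of any cited study.

```python
def roc_points(change_scores, is_responder):
    """Return (threshold, sensitivity, specificity) for every candidate cut-off.

    A patient is predicted to be a responder when change >= threshold.
    """
    positives = sum(1 for r in is_responder if r)
    negatives = len(is_responder) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("need at least one responder and one non-responder")

    points = []
    for threshold in sorted(set(change_scores)):
        true_pos = sum(
            1 for c, r in zip(change_scores, is_responder) if r and c >= threshold
        )
        true_neg = sum(
            1 for c, r in zip(change_scores, is_responder) if not r and c < threshold
        )
        points.append((threshold, true_pos / positives, true_neg / negatives))
    return points


def youden_cutoff(change_scores, is_responder):
    """Candidate MCID: the threshold with the largest Youden index."""
    best = max(
        roc_points(change_scores, is_responder),
        key=lambda point: point[1] + point[2] - 1,
    )
    return best[0]


if __name__ == "__main__":
    # Hypothetical improvements in score and anchor-based responder status.
    changes = [1, 2, 3, 8, 9, 10]
    responders = [False, False, False, True, True, True]
    print(youden_cutoff(changes, responders))
```

As the surrounding text notes, different derivation methods can give very different MCID values, so a cut-off found this way is best reported alongside the anchor and the selection rule that produced it.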

Scoring systems for pemphigus

Pemphigus Disease Area Index

PDAI was published by the International Pemphigus Definitions Group (IDPG) in 2008 (Fig. 3). The IDPG held five consensus meetings between 2006 and 2008 to establish consensus definitions and develop a scoring system for pemphigus that was molded from the Cutaneous Lupus Erythematosus Disease Area and Severity Index. The IDPG panel consisted of experts on autoimmune blistering diseases, led by Victoria Werth and Dedee Murrell (Murrell et al., 2008).

Fig. 3

Pemphigus Disease Area Index (with permission; license number 3921300270111; Rosenbach et al., 2009).

PDAI scores can range from 0 to 263, comprising 250 points for disease activity (120 for skin, 10 for scalp, and 120 for mucosa) and 13 points for damage. For activity, the size and number of lesions in each area play a role in the calculation of points assigned. The damage score reflects post-inflammatory hyperpigmentation. A considerable advantage of this scoring system is its sensitivity to small lesion numbers, which increases inter-rater reliability (Zhao and Murrell, 2015). Furthermore, PDAI does not take body surface area (BSA) or lesion type into account, which are both arduous to evaluate, cannot capture mild amounts of disease activity, and can potentially exacerbate small variations between raters.

Autoimmune Bullous Skin Disorder Intensity Score

ABSIS is a generic autoimmune blistering disease (AIBD) outcome measure produced by the German Blistering Group (Pfutze et al., 2007). The scores can vary from 0 to 206, of which 150 points represent skin involvement, 11 points oral involvement, and 45 points subjective discomfort. ABSIS uses the rule of nines and rule of palms to establish BSA, and BSA and lesion type are weighting factors.

Pemphigus Vulgaris Activity Score

PVAS was created by Chams-Davatchi et al. (2013) and produces scores between 0 and 18 with 11 points for cutaneous and 7 points for mucosal involvement. Lesion type, lesion number, and distribution all contribute to the score (Fig. 4). Compared with PDAI, PVAS places less emphasis on the head and greater emphasis on the limbs. PVAS also takes into account Nikolsky’s sign and thus is more susceptible to variability based on the expertise of the rater.

Fig. 4

Pemphigus Vulgaris Activity Score (with permission; open access; Chams-Davatchi et al., 2013)
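Because the three instruments have very different ranges, raw totals are not directly comparable. The sketch below tabulates the component maxima described for PDAI, ABSIS, and PVAS and checks that they add up to the stated totals. The PDAI mucosal maximum is taken as 120, the value at which the activity components sum to the stated 250 points. Expressing a score as a fraction of the instrument maximum is purely an illustrative device, not a method used in the studies reviewed; all names are hypothetical.

```python
# Component maxima for each instrument (PDAI mucosal maximum assumed to be 120).
INSTRUMENT_COMPONENT_MAXIMA = {
    "PDAI": {
        "skin activity": 120,
        "scalp activity": 10,
        "mucosal activity": 120,
        "damage": 13,
    },
    "ABSIS": {
        "skin involvement": 150,
        "oral involvement": 11,
        "subjective discomfort": 45,
    },
    "PVAS": {
        "cutaneous": 11,
        "mucosal": 7,
    },
}


def maximum_total(instrument):
    """Sum of the component maxima for the named instrument."""
    return sum(INSTRUMENT_COMPONENT_MAXIMA[instrument].values())


def fraction_of_maximum(instrument, total_score):
    """Express a total score as a fraction of the instrument's maximum."""
    maximum = maximum_total(instrument)
    if not 0 <= total_score <= maximum:
        raise ValueError(f"{instrument} score must be between 0 and {maximum}")
    return total_score / maximum


if __name__ == "__main__":
    for name in INSTRUMENT_COMPONENT_MAXIMA:
        print(name, maximum_total(name))
```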

Harman’s scoring system

Harman’s scoring system was created by Harman et al. (2001) in the United Kingdom, and scores are based on the number of skin and oral erosions (Fig. 5). Harman et al. related these scores to anti-desmoglein (Dsg) 1 and anti-Dsg3 ELISA and noted that there was a correlation between severity and Dsg antibody levels. The use of this scoring system is limited by the lack of validation studies and the poor sensitivity to BSA involvement and anatomical distribution, as scores are awarded irrespective of site and size.

Fig. 5

Harman’s severity scoring system (Harman et al., 2001).

We could not identify any studies to illustrate the reliability, validity, responsiveness, feasibility, or severity cutoffs for the Harman grading system.

Validation studies on pemphigus scoring systems

Rosenbach et al. conducted a study at the University of Pennsylvania to demonstrate the inter- and intra-rater reliability and convergent validity of PDAI and ABSIS. The study was conducted with the assistance of ten dermatologists who specialize in AIBD to score 15 patients with pemphigus using the PDAI, ABSIS, and PGA scoring systems. The study was limited because the majority of patients had stable disease. Nonetheless, the study demonstrated that PDAI has strong intra- and inter-reliability with an intra-class correlation coefficient (ICC) of 0.98 (95% confidence interval [CI]: 0.96-1.0) and 0.76 (95% CI: 0.61-0.91), respectively. ABSIS intra-and inter-rater reliability had an ICC of 0.80 (95% CI: 0.65-0.96) and 0.77 (95% CI: 0.63-0.91), respectively. Thus, inter-rater reliability for PDAI and ABSIS was almost indistinguishable. However, it is important to emphasize that despite of this, capturing low disease activity (both Dsg and ELISA negative) was different with PDAI. The difference in inter-rater reliability between PDAI and ABSIS became more apparent when only the objective skin activity scores were compared: ICC of 0.86 (95% CI: 0.76-0.95) for PDAI versus 0.39 (95% CI: 0.17-0.60) for ABSIS. This study also illustrated good convergent validity between PDAI and PGA with a Spearman’s rho correlation of 0.6 (95% CI: 0.49-0.71) compared with the poorer convergent validity of ABSIS of 0.43 (95% CI: 0.30-0.55; Rosenbach et al., 2009). Independent of this study, Chams-Davatchi et al. (2013) asked five experts to score 50 patients with PV to illustrate that PVAS has a superior convergent validity of 0.75 with PGA. Rahbar et al. (2014) conducted a study independent of the IDPG and German blistering group and produced unbiased results of ABSIS and PDAI in comparison with their PVAS scoring system. The study had a sizeable sample size of 100 patients with active lesions. 
The study produced higher values for inter-rater reliabilities, which may be due to the increased sample size in the study and perhaps demonstrated a learning curve by dermatologists in the application of scoring systems over the preceding 5 years. The results showed an ICC of 0.98 (95% CI: 0.97-0.98), 0.97 (95% CI: 0.96-0.98), and 0.93 (95% CI: 0.9-0.95) for PDAI, ABSIS, and PVAS inter-rater reliability, respectively, and thus illustrated that PDAI and ABSIS are the most reproducible with almost identical ICC rates. However, this study also considered ICC rates by range and the lower range (anti-Dsg1 and anti-Dsg3 negative; n = 10) was only statistically significant for PDAI with an ICC of 0.96 (95% CI: 0.93-0.98). This illustrates that PDAI is more reliable for low disease activity than PVAS and ABSIS. Convergent validity against anti-Dsg1 titers was the highest for PDAI, producing a Spearman's rho correlation of 0.67 (p < 0.001), 0.33 (p = 0.002), and 0.52 (p < 0.01) for PDAI, ABSIS, and PVAS, respectively. Convergent validity as determined by anti-Dsg3 titers was poor for all three instruments with an ICC of 0.35 (p = 0.001), 0.33 (p = 0.002), and 0.35 (p = 0.001) for PDAI, ABSIS, and PVAS, respectively (Rahbar et al., 2014). Two studies have investigated severity cutoffs for PDAI to divide patients based on disease severity into mild, moderate, and severe categories. The first study was conducted in Japan and utilized the physician’s subjective impression of the disease state (mild, moderate, severe) and correlated this with the PDAI score to establish cutoffs using the Youden Index. The values obtained were identified as mild (0-8), moderate (9-24), or severe (≥ 25; Shimizu et al., 2014). In contrast, an independent French study by Boulard et al. (2016) calculated severity cutoffs by identifying scores that correlated with the 25th and 75th percentiles of the scores, which resulted in significantly higher cutoffs. 
The results obtained were identified as mild (0-14), moderate (15-44), or severe (≥ 45). In their discussion, the researchers attribute this discrepancy to sample selection, stating that Shimizu et al. recruited newly diagnosed and previously treated patients with few severe cases, whereas they enrolled newly diagnosed, untreated patients, who therefore had greater severity (Boulard et al., 2016). It is also possible that there are population-based variabilities or that the differing methods make one study more vulnerable to skew. No reviews or studies exist that outline the best method to calculate severity cutoffs, and it seems that the method can have significant implications for the results, particularly if the population is concentrated in a particular subgroup. Boulard et al. also obtained cutoffs of 17 and 53 to distinguish the three groups for ABSIS using the same method. The mean time to complete PDAI, ABSIS, and PVAS is 2.9 minutes (standard deviation [SD] 1.3 min), 1.9 minutes (SD 1.1 min), and 1.1 minutes (SD 0.7 min), respectively, which makes PVAS the fastest instrument to complete (Rahbar et al., 2014). To date, there are no responsiveness studies on any of the pemphigus scoring systems (Table 5).
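
The two cutoff strategies discussed above can be contrasted with a short sketch. It is illustrative only: the scores and physician grades are made-up values rather than data from Shimizu et al. or Boulard et al., the function names `youden_cutoff` and `percentile_cutoffs` are hypothetical, and the procedures actually used in either study may differ from these simplified versions (quartiles of the pooled scores are assumed here for the percentile approach).

```python
from statistics import quantiles


def youden_cutoff(scores, is_positive):
    """Return the threshold t that maximizes Youden's J.

    J = sensitivity + specificity - 1, where a patient is classified
    as positive when score >= t. Ties keep the lowest threshold.
    """
    pos = [s for s, p in zip(scores, is_positive) if p]
    neg = [s for s, p in zip(scores, is_positive) if not p]
    best_t, best_j = None, -1.0
    for t in sorted(set(scores)):
        sensitivity = sum(s >= t for s in pos) / len(pos)
        specificity = sum(s < t for s in neg) / len(neg)
        j = sensitivity + specificity - 1
        if j > best_j:
            best_t, best_j = t, j
    return best_t


def percentile_cutoffs(scores):
    """Return the 25th and 75th percentiles of the scores."""
    q1, _, q3 = quantiles(scores, n=4, method="inclusive")
    return q1, q3


# Hypothetical severity scores with a physician's global grade for each patient.
scores = [2, 4, 5, 7, 11, 9, 12, 15, 20, 24, 26, 30, 41, 55]
grades = ["mild"] * 5 + ["moderate"] * 5 + ["severe"] * 4

# Youden-based cutoffs: one threshold per boundary between adjacent grades.
mild_moderate = youden_cutoff(scores, [g != "mild" for g in grades])
moderate_severe = youden_cutoff(scores, [g == "severe" for g in grades])

# Percentile-based cutoffs ignore the physician grades entirely.
q1, q3 = percentile_cutoffs(scores)

print("Youden cutoffs:", mild_moderate, moderate_severe)
print("Percentile cutoffs:", q1, q3)
```

On this hypothetical sample the Youden-based thresholds are 12 and 26, whereas the quartile-based thresholds are 7.5 and 25.5. The percentile approach depends only on how scores are distributed in the recruited sample, so a cohort concentrated in one severity range shifts its cutoffs, while the Youden approach depends on how well the scores separate the physician-assigned grades.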

Table 5

Validation studies to date on commonly-used pemphigus measurement instruments

ABSIS = Autoimmune Bullous Skin Disorder Intensity Score; CI = confidence interval; Dsg = desmoglein; ICC = intra-class correlation coefficient; PDAI = Pemphigus Disease Area Index; PGA = physician global assessment; PVAS = Pemphigus Vulgaris Activity Score; SD = standard deviation.

1Rosenbach et al., 2009; 2Rahbar et al., 2014; 3Murrell et al., 2008; 4Pfutze et al., 2007; 5Chams-Davatchi et al., 2013; 6Boulard et al., 2016; 7Harman et al., 2001; 8Shimizu et al., 2014.

Conclusion

Despite the significant morbidity associated with pemphigus, there is a shortage of multicenter control studies to facilitate evidence-based practice because of the rarity of the disease and the inability to objectively compare therapeutic outcomes. PDAI and ABSIS are promising scoring systems that have proven to be valid and reliable; however, to complete their validation, responsiveness must be assessed. Determining the minimal clinically important difference (MCID) and reaffirming the cutoff values would also provide significant information that may improve the utility of these instruments in clinical practice.

46 in total

Review 1. Outcome measures for autoimmune blistering diseases.

Authors: Cathy Y Zhao; Dédée F Murrell
Journal: J Dermatol Date: 2015-01 Impact factor: 4.005

2. IgA pemphigus.

Authors: Daisuke Tsuruta; Norito Ishii; Takahiro Hamada; Bungo Ohyama; Shunpei Fukuda; Hiroshi Koga; Kazuko Imamura; Hiromi Kobayashi; Tadashi Karashima; Takekuni Nakama; Teruki Dainichi; Takashi Hashimoto
Journal: Clin Dermatol Date: 2011 Jul-Aug Impact factor: 3.541

3. Incidence, mortality, and causes of death of patients with pemphigus in Taiwan: a nationwide population-based study.

Authors: Yu-Huei Huang; Chang-Fu Kuo; Yi-Hua Chen; Ya-Wen Yang
Journal: J Invest Dermatol Date: 2011-08-18 Impact factor: 8.551

4. Influence of pimecrolimus cream 1% on different morphological signs of eczema in infants with atopic dermatitis.

Authors: Kristine Breuer; Matthias Braeutigam; Alexander Kapp; Thomas Werfel
Journal: Dermatology Date: 2004 Impact factor: 5.366

Review 5. Introducing a novel Autoimmune Bullous Skin Disorder Intensity Score (ABSIS) in pemphigus.

Authors: Martin Pfütze; Andrea Niedermeier; Michael Hertl; Rüdiger Eming
Journal: Eur J Dermatol Date: 2007-02-27 Impact factor: 3.328

6. The COSMIN checklist for assessing the methodological quality of studies on measurement properties of health status measurement instruments: an international Delphi study.

Authors: Lidwine B Mokkink; Caroline B Terwee; Donald L Patrick; Jordi Alonso; Paul W Stratford; Dirk L Knol; Lex M Bouter; Henrica C W de Vet
Journal: Qual Life Res Date: 2010-02-19 Impact factor: 4.147

7. Assessment of bullous pemphigoid disease area index during treatment: a prospective study of 30 patients.

Authors: Célia Lévy-Sitbon; Coralie Barbe; Julie Plee; Anne-Laure Goeldel; Frank Antonicelli; Ziad Reguiaï; Damien Jolly; Florent Grange; Philippe Bernard
Journal: Dermatology Date: 2014-07-05 Impact factor: 5.366

8. Protocol of the COSMIN study: COnsensus-based Standards for the selection of health Measurement INstruments.

Authors: L B Mokkink; C B Terwee; D L Knol; P W Stratford; J Alonso; D L Patrick; L M Bouter; H C W de Vet
Journal: BMC Med Res Methodol Date: 2006-01-24 Impact factor: 4.615

9. The Epidermolysis Bullosa Disease Activity and Scarring Index (EBDASI): grading disease severity and assessing responsiveness to clinical change in epidermolysis bullosa.

Authors: S V Jain; A G Harris; J C Su; D Orchard; L J Warren; H McManus; D F Murrell
Journal: J Eur Acad Dermatol Venereol Date: 2016-10-03 Impact factor: 6.166

Review 10. Immunopathology and molecular diagnosis of autoimmune bullous diseases.

Authors: Sidonia Mihai; Cassian Sitaru
Journal: J Cell Mol Med Date: 2007 May-Jun Impact factor: 5.310
