Derek Jones, Lisa Donofrio, Bhushan Hardas, Diane K Murphy, Jean Carruthers, Alastair Carruthers, Jonathan M Sykes, Lela Creutz, Ann Marx, Sara Dill.
*Division of Dermatology, University of California at Los Angeles, Los Angeles, California; †Department of Dermatology, Yale University School of Medicine, New Haven, Connecticut; ‡Allergan plc, Irvine, California; Departments of §Ophthalmology and Visual Sciences, and ‖Dermatology and Skin Science, University of British Columbia, Vancouver, British Columbia, Canada; ¶UC Davis Medical Group, Sacramento, California; #Peloton Advantage, LLC, Parsippany, New Jersey.
Abstract
BACKGROUND: A validated scale is needed for objective and reproducible comparisons of hand appearance before and after treatment in practice and clinical studies. OBJECTIVE: To describe the development and validation of the 5-point photonumeric Allergan Hand Volume Deficit Scale. METHODS: The scale was developed to include an assessment guide, verbal descriptors, morphed images, and real-subject images for each grade. The clinical significance of a 1-point score difference was evaluated in a review of image pairs representing varying differences in severity. Interrater and intrarater reliability were evaluated in a live-subject validation study (N = 296) completed during 2 sessions occurring 3 weeks apart. RESULTS: A score difference of ≥1 point was shown to reflect a clinically significant difference (mean [95% confidence interval] absolute score difference, 1.12 [0.99-1.26] for clinically different image pairs and 0.45 [0.33-0.57] for not clinically different pairs). Intrarater agreement between the 2 validation sessions was almost perfect (mean weighted kappa = 0.83). Interrater agreement was almost perfect during the second session (intraclass correlation coefficient = 0.82, primary end point). CONCLUSION: The Allergan Hand Volume Deficit Scale is a validated and reliable scale for physician rating of hand volume deficit.
With aging, atrophy of the subdermal fat and dermis of the hands can lead to prominent bones, tendons, and veins on the dorsum of the hand. In addition, hands are exposed to high levels of UV solar radiation, which can cause irregular surface pigmentation and thinning of the dermis through the gradual loss and disorganization of supporting collagen, elastin fibers, and connective tissue. Other environmental factors (e.g., cigarette smoking) and genetics may accelerate skin aging. As more patients undergo facial rejuvenation treatments, a mismatch between a youthful face and aged hands may become bothersome and reveal a patient's true age. Accordingly, growing numbers of aesthetically aware patients are seeking hand rejuvenation treatments.

Several treatments are used to restore lost volume and minimize the appearance of veins and tendons in the hand, including injectable hyaluronic acid, poly-L-lactic acid, and calcium hydroxylapatite; autologous fat transfer; vein treatment (sclerotherapy); chemical peels; and laser and light therapies. One photonumeric scale has been validated for photographic and live-subject assessments of the severity of hand aging. Based on photographic assessment with that scale, hyaluronic acid was shown to be effective for hand rejuvenation; live-subject assessments demonstrated that the scale was sensitive enough to detect clinically meaningful and aesthetically pleasing changes in hand appearance after treatment with a calcium hydroxylapatite–based dermal filler. However, although that scale includes morphed images to represent each grade, it does not include representative real-world images or a range of skin types.

This report describes the development and validation of a new photonumeric scale designed to rate the severity of volume deficit in the hands (Allergan Hand Volume Deficit Scale) using a combination of real- and morphed-subject images over a range of Fitzpatrick skin types.
The objectives of this study were to determine the clinically significant difference in scale scores and to establish the interrater and intrarater reliability of this scale for rating hand volume deficits in live subjects.
Methods
Scale Development
Figure 1 summarizes key steps in the creation and validation of the Allergan Hand Volume Deficit Scale. A 9-member team comprising 5 external members (3 board-certified dermatologists, 1 board-certified facial plastic surgeon, and 1 board-certified oculoplastic surgeon) and 4 Allergan employees (2 dermatologists, 1 plastic surgeon, and 1 clinical scientist) developed the scale from a pool of subject images captured by Canfield Scientific, Inc. (Canfield, Fairfield, NJ). A total of 396 untreated men and women aged 18 years or older with Fitzpatrick skin Types I through VI and in good general health volunteered for image capture. All subjects provided informed photo consent before image collection. Subjects were excluded if they had anything that would interfere with visual assessment of the area of interest. Two-dimensional (2D) images of right hands were obtained using a 2D custom camera system for hand imaging (Hand Device and Nikon D90 SLR). Images of the right hand were cropped from the fingertips to 2 cm proximal to the wrist to ensure that the dorsum of the hand was the primary focus and fully visible.
Figure 1.
Development and validation processes for the Allergan Hand Volume Deficit Scale.
Scale descriptors were created for each of the 5 grades of the scale (Table 1). Two members of the Allergan team met with each member of the scale development team for preliminary input on each scale grade. After preliminary scale grades were established, all 9 individuals involved in scale creation had a collaborative discussion about the scale grades and descriptors. The wording for each grade was then finalized by the Allergan team.
TABLE 1.
Descriptors for the Allergan Hand Volume Deficit Scale
An assessment guide with a line drawing of anatomic markers demarcating the dorsal hand area from the metacarpophalangeal joints to 1 cm distal to the wrist was created by Canfield based on detailed instructions from the Allergan team regarding anatomic markers (Figure 2). The drawing was then revised by Canfield multiple times after careful review by the Allergan team.
Figure 2.
Assessment guide for the Allergan Hand Volume Deficit Scale.
A base image demonstrating Grade 2 hand volume deficit was selected and morphed to represent all 5 grades of the scale. A Canfield graphics technician morphed the hand area of interest in the base image to match the descriptors for Grades 0, 1, 3, and 4. Alignment of the morphed images with the scale descriptors was achieved through an interactive process with the Allergan team.

A forced ranking review was performed to delineate the range of severity between Grades 2 and 3 and to confirm selection of the best representative image for Grade 2 on the scale. The 5 external scale developers performed a web-based forced ranking exercise on preselected images that represented the upper and lower boundaries of Grades 2 and 3.

To determine whether there was a clinically significant difference between grades of the scale, the 5 external scale developers performed an online clinical significance review. Multiple image pairs were selected to represent varying degrees of difference in severity (ranging from no difference to a 4-point difference). During the session, the scale developers determined whether there was a clinically significant difference (Yes/No) between the images in each pair. After the session, the individual images from all image pairs were randomly mixed with the other images used in the morphed image scale validation (described in the following paragraph) and were scored by the external scale developers, so that the score difference between the 2 images in each pair could be calculated.

The morphed image scale was validated by having the 5 external scale developers use the scale to rate randomized images representing all grades of the scale during 2 web-based sessions at least 3 days apart. A total of 293 images were rated (120 images in Session 1 and 173 images in Session 2). Interrater and intrarater agreement were acceptable (>0.5), so scale development proceeded using the morphed images.

For both the clinical significance review and the morphed image scale validation review, Canfield provided the scale developers with uniform hardware. Before the reviews, the external scale developers completed web-based PowerPoint training to familiarize themselves with the hardware, the review platform, and the purpose of the clinical significance and morphed image validation reviews. The external scale developers were not allowed to discuss the reviews with one another, and each completed the image review independently.

After the morphed scale was created, photographs of 2 subjects were selected for each grade to represent diversity in sex and Fitzpatrick skin type. The final scale includes scale descriptors for each grade, an assessment guide, the morphed images, and the real-subject images (Figure 3).
Figure 3.
The Allergan Hand Volume Deficit Scale assigns a grade from none (0) to extreme (4) describing the degree of protrusion of tendons and veins in the dorsal surface of the hand.
Scale Validation
The interrater and intrarater reliability of the final scale was evaluated in a live-subject rating validation study. Eight physician raters experienced in using aesthetic photonumeric scales, who were not involved in scale development, participated in two 2-day live validation sessions occurring 3 weeks apart. Before the first live evaluation session, all physician raters were trained on the use of the scale in an interactive group training session using 4 example subjects. Only right hands were rated, to align with the hand shown in the scale. Right hands were chosen because most people are right-handed, and the appearance of the dominant hand is usually worse than that of the nondominant hand. Raters were instructed to rate hands primarily on the basis of tendons rather than veins; the only grade distinction determined by veins is between Grade 0 (no visible tendons or veins) and Grade 1 (no protruding tendons; veins are visible and may be mildly protruding). Raters were also instructed that hands with any tendon showing (excluding the metacarpophalangeal joints) should be rated at least Grade 2.

All subjects who qualified for the initial image capture events were invited to attend the live validation sessions. Because the subjects were participating in validation sessions for facial scales on the same day, they were instructed to arrive at the study center clean shaven; to remove make-up and jewelry; to wear dark pants or jeans and a provided black T-shirt; to avoid excessive alcohol consumption before the sessions; to try not to alter their usual routine (e.g., facial care routine and normal sleep or hydration patterns) between sessions; and to avoid tanning sessions or extensive sun exposure between sessions. On arrival at the study center for the first live validation session, subjects signed informed consent and were then assessed for eligibility, age, sex, race (as reported by the subject), and Fitzpatrick skin type (determined by the investigator).
Subjects were excluded if they had any of the following: photographs included in the scale; anything that would interfere with visual assessment of the hands; any treatment with toxins or fillers, or surgery, that would alter hand appearance within 2 weeks of the first evaluation session, or plans to undergo one of these procedures between the 2 evaluation sessions; or a diagnosis of pregnancy. Two-dimensional images of each subject's right hand were collected at the first live validation session using a hand device and a Nikon D90 SLR camera. The first 5 subjects rated during the first validation session were considered run-in training subjects and were excluded from the analysis.

During the first and second live scale validation sessions, each physician rater evaluated all subjects on all scales (7 additional scales for other anatomic features were evaluated at the same sessions and are reported separately). Each rater had a separate evaluation station with an examination lamp, a table, a stool for subject seating, supplies, and the photonumeric scale mounted and displayed for use in subject evaluation. Subjects presented themselves to each rater individually and proceeded from 1 rating station to the next in the same order until evaluated by all 8 raters. Raters were instructed not to discuss ratings with subjects or other raters. To avoid rater fatigue, raters took at least a 10-minute break every hour and at least a 30-minute lunch break.
Statistics
To determine the utility of the scale grades for detecting clinically significant differences in hand volume deficit, absolute score differences for the image pairs deemed "clinically different" or "not clinically different" during scale development were summarized (mean, SD, range, 95% confidence interval [CI]). For the live scale validation study, intrarater reliability was assessed by comparing each rater's scores from rounds 1 and 2 using weighted kappa with Fleiss–Cohen (quadratic) weights. Kappa scores of 0.00 to 0.20 indicate slight agreement, 0.21 to 0.40 fair agreement, 0.41 to 0.60 moderate agreement, 0.61 to 0.80 substantial agreement, and 0.81 to 1.00 almost perfect agreement. Interrater agreement was measured using the intraclass correlation coefficient ICC(2,1), with 95% CIs calculated using the formula described by Shrout and Fleiss. The a priori primary end point for the interrater agreement analysis was the ICC(2,1) for the second rating session. SAS version 9.3 (SAS Institute, Cary, NC) was used for all statistical analyses.
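The two agreement statistics named above can be sketched in Python as follows. This is an illustrative re-implementation assuming NumPy, not the SAS code used in the study; the function names (`fleiss_cohen_kappa`, `icc_2_1`, `landis_koch`) are hypothetical, grades are assumed to be coded 0 to 4, and the 95% CI for ICC(2,1) is omitted. The ICC uses the Shrout and Fleiss two-way random-effects, single-rater, absolute-agreement form.

```python
import numpy as np

def fleiss_cohen_kappa(round1, round2, n_grades=5):
    """Weighted kappa between two ratings of the same subjects, using
    Fleiss-Cohen (quadratic) agreement weights. Grades are 0..n_grades-1."""
    k = n_grades
    observed = np.zeros((k, k))
    for a, b in zip(round1, round2):
        observed[int(a), int(b)] += 1
    observed /= observed.sum()  # joint proportions
    # Chance-expected proportions from the two rounds' marginal distributions
    expected = np.outer(observed.sum(axis=1), observed.sum(axis=0))
    i, j = np.indices((k, k))
    # Weight 1 on the diagonal, 0 at the largest possible disagreement
    weights = 1.0 - (i - j) ** 2 / (k - 1) ** 2
    p_obs = (weights * observed).sum()
    p_exp = (weights * expected).sum()
    return (p_obs - p_exp) / (1.0 - p_exp)  # undefined if all ratings share one grade

def landis_koch(kappa):
    """Verbal label using the kappa bands quoted in the text."""
    if kappa < 0:
        return "poor"  # below the bands listed in the text
    if kappa <= 0.20:
        return "slight"
    if kappa <= 0.40:
        return "fair"
    if kappa <= 0.60:
        return "moderate"
    if kappa <= 0.80:
        return "substantial"
    return "almost perfect"

def icc_2_1(scores):
    """Shrout-Fleiss ICC(2,1): two-way random effects, absolute agreement,
    single rater. `scores` is an (n_subjects x k_raters) array, no missing values."""
    x = np.asarray(scores, dtype=float)
    n, k = x.shape
    grand = x.mean()
    ss_subjects = k * ((x.mean(axis=1) - grand) ** 2).sum()
    ss_raters = n * ((x.mean(axis=0) - grand) ** 2).sum()
    ss_error = ((x - grand) ** 2).sum() - ss_subjects - ss_raters
    bms = ss_subjects / (n - 1)           # between-subjects mean square
    jms = ss_raters / (k - 1)             # between-raters (judges) mean square
    ems = ss_error / ((n - 1) * (k - 1))  # residual mean square
    return (bms - ems) / (bms + (k - 1) * ems + k * (jms - ems) / n)

# Hypothetical ratings, for illustration only
kw = fleiss_cohen_kappa([0, 1, 2, 3, 4], [0, 1, 2, 4, 3])
print(round(kw, 3), landis_koch(kw))  # 0.9 almost perfect
# A constant offset between 2 raters lowers absolute-agreement ICC
print(round(icc_2_1([[0, 1], [1, 2], [2, 3]]), 3))  # 0.667
```

Quadratic weights penalize a 2-grade disagreement 4 times as heavily as a 1-grade disagreement, which suits an ordinal 0 to 4 scale; ICC(2,1) treats raters as a random sample, so systematic rater offsets count against agreement.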
Sample Size Considerations
The sample size for the live-subject validation sessions was calculated using the method described by Bonett. With up to 10 raters and an ICC of 0.5, a total of 66 subjects were needed for a 95% CI with a width of 0.2 for interrater reliability. To allow for potential loss of subjects between the 2 rounds, at least 80 subjects were to be enrolled. Because 296 subjects were eligible for the hand scale validation analysis, the number of subjects evaluated using the scale was substantially larger than the preplanned sample size of 80, and the number of assessments for some grades of this scale was larger than for other grades. To minimize imbalance in the number of subjects across scale grades while meeting the sample size requirement, each subject was assigned an overall grade based on the mean score across the 8 raters. A subset of 81 subjects with minimal imbalance across the grades (~16 subjects for each of the 5 scale grades) was randomly selected from the eligible subjects using a prespecified procedure. This random selection was performed 20 times. Interrater and intrarater agreement values calculated for each of the 20 subsets were combined using the SAS procedure PROC MIANALYZE to obtain overall interrater and intrarater agreement.
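The planning arithmetic above can be checked with a few lines of Python. This sketch assumes the commonly cited form of Bonett's formula, n = 8z²(1 − ρ)²[1 + (k − 1)ρ]² / [k(k − 1)w²] + 1, rounded up, where ρ is the planning ICC, k the number of raters, and w the desired CI width; the function name is hypothetical. With the planning values in the text (ρ = 0.5, k = 10, w = 0.2, 95% confidence) it reproduces the 66 subjects quoted.

```python
import math
from statistics import NormalDist

def bonett_icc_sample_size(rho, n_raters, width, conf=0.95):
    """Approximate number of subjects so that the CI for an ICC has total
    width `width` (Bonett's planning formula, rounded up)."""
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)  # about 1.96 for 95%
    k = n_raters
    n = (8 * z**2 * (1 - rho) ** 2 * (1 + (k - 1) * rho) ** 2
         / (k * (k - 1) * width**2)) + 1
    return math.ceil(n)

print(bonett_icc_sample_size(rho=0.5, n_raters=10, width=0.2))  # 66
print(bonett_icc_sample_size(rho=0.5, n_raters=8, width=0.2))   # 71
```

Fewer raters supply less information per subject, so the same planning values with the 8 raters who actually took part give 71 subjects, which is still below the 81 subjects in each analysis subset.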
Results
Clinical Significance Determination by Scale Developers
The mean (95% CI) absolute difference in scale scores was 1.12 (0.99–1.26) for clinically different image pairs and 0.45 (0.33–0.57) for pairs deemed not clinically different (Table 2). The 95% CIs for the pairs deemed to be clinically different did not overlap with the CIs for the pairs deemed not clinically different, confirming that a 1-point difference in scores is clinically significant.
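A minimal sketch of the summary described in the Statistics section, together with the CI non-overlap check used in this paragraph, follows. It assumes each difference is one developer's score for image A minus the same developer's score for image B, and it uses a normal-approximation 95% CI; the paper does not state which CI method was used, the function names are hypothetical, and the example scores are made up rather than taken from Table 2.

```python
from statistics import NormalDist, mean, stdev

def summarize_abs_differences(scores_a, scores_b, conf=0.95):
    """Mean, SD, range, and normal-approximation CI of |score_a - score_b|."""
    diffs = [abs(float(a) - float(b)) for a, b in zip(scores_a, scores_b)]
    m, sd = mean(diffs), stdev(diffs)
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)
    half_width = z * sd / len(diffs) ** 0.5
    return {"mean": m, "sd": sd, "range": (min(diffs), max(diffs)),
            "ci": (m - half_width, m + half_width)}

def intervals_overlap(ci1, ci2):
    """True if two closed intervals share at least one point."""
    return ci1[0] <= ci2[1] and ci2[0] <= ci1[1]

# Hypothetical scores for the 2 images of 4 pairs
summary = summarize_abs_differences([0, 1, 2, 3], [1, 3, 2, 4])
print(summary["mean"], summary["range"])  # 1.0 (0.0, 2.0)

# CIs reported in the text: clinically different vs not clinically different
print(intervals_overlap((0.99, 1.26), (0.33, 0.57)))  # False
```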
TABLE 2.
Difference in Scores for Image Pairs Deemed Clinically Different or Not Clinically Different Using the Allergan Hand Volume Deficit Scale
Live-Subject Scale Validation
Of the 296 subjects eligible for scale validation analysis, 288 subjects were selected in at least 1 of the 20 random subsets for analysis of intrarater and interrater agreement. Demographic characteristics of subjects in the final scale validation set are shown in Table 3. Most subjects were women (67%), Caucasian (79%), and had Fitzpatrick skin Type III (26%) or IV (33%). Median age was 48 years, and a broad span of age was represented (range: 18–83 years).
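The paper does not spell out its prespecified subset-selection procedure, so the following is only a hypothetical sketch of one way to draw grade-balanced subsets repeatedly and count how many distinct subjects end up in at least 1 subset. It assumes an overall grade equal to the mean of the raters' scores rounded half-up, and it draws up to 16 subjects per grade without replacement in each of 20 draws; all names and the simulated grade distribution are illustrative, not study data.

```python
import math
import random
from collections import defaultdict

def overall_grade(rater_scores):
    """Mean of the raters' scores, rounded half-up to the nearest grade (assumption)."""
    return math.floor(sum(rater_scores) / len(rater_scores) + 0.5)

def balanced_subset(grade_by_subject, per_grade, rng):
    """Randomly pick up to `per_grade` subjects from each grade, without replacement."""
    by_grade = defaultdict(list)
    for subject, grade in grade_by_subject.items():
        by_grade[grade].append(subject)
    subset = []
    for grade in sorted(by_grade):
        members = by_grade[grade]
        subset.extend(rng.sample(members, min(per_grade, len(members))))
    return subset

# Simulated, deliberately unbalanced grade distribution for 296 subjects
rng = random.Random(2016)
grades = {s: rng.choice([0, 1, 1, 2, 2, 2, 3, 3, 4]) for s in range(296)}

# Repeat the balanced draw 20 times and track subjects ever selected
ever_selected = set()
for _ in range(20):
    ever_selected.update(balanced_subset(grades, per_grade=16, rng=rng))
print(len(ever_selected))
```

Because each draw caps every grade at the same size, subjects in crowded grades are sampled less often per draw, but repeating the draw lets most eligible subjects contribute to at least 1 subset.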
TABLE 3.
Demographics of Subjects in the Live Scale Validation Study
Intrarater agreement between the 2 live-subject rating sessions was almost perfect (mean weighted kappa = 0.83) (Table 4). Interrater agreement was substantial (ICC = 0.78) during the first rating session and almost perfect (ICC = 0.82) during the second rating session (primary end point) (Table 4).
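These overall agreement values were pooled across the 20 random subsets with PROC MIANALYZE, which combines per-subset estimates using Rubin's rules. A minimal Python sketch of that pooling step follows. It assumes each subset supplies a point estimate and its squared standard error, the function name is hypothetical, the degrees-of-freedom and CI calculations that MIANALYZE also reports are omitted, and the example numbers are made up.

```python
from statistics import mean, variance

def rubin_pool(estimates, variances):
    """Combine per-subset estimates with Rubin's rules.

    estimates: point estimate from each subset (e.g., ICC or weighted kappa)
    variances: squared standard error of each estimate
    Returns the pooled estimate and its total variance."""
    m = len(estimates)
    pooled = mean(estimates)
    within = mean(variances)       # average within-subset variance
    between = variance(estimates)  # spread of the estimates across subsets
    total = within + (1 + 1 / m) * between
    return pooled, total

est, var = rubin_pool([0.80, 0.84], [0.001, 0.001])
print(round(est, 3), round(var, 4))  # 0.82 0.0022
```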
TABLE 4.
Physician Intrarater and Interrater Agreement on the Allergan Hand Volume Deficit Scale (Validation Testing With Live Subjects)
Discussion
This study demonstrated substantial to almost perfect interrater and intrarater agreement for the Allergan Hand Volume Deficit Scale, suggesting that repeated assessments of the same subject by one rater, and assessments by different raters, are reliable. A 1-point difference in ratings was shown to reflect a clinically significant difference, indicating that the scale is sensitive enough to detect clinically significant changes in hand volume deficit.

This scale assesses volume deficit on the dorsum of the hands, an area for which patients seek aesthetic treatment. The scale includes verbal descriptors for each grade and a diagram delineating the hand area of interest. These features likely contributed to the high interrater reliability and may make the scale easy for clinicians to use. The use of morphed images to represent each grade helps focus the rater's attention on the change from 1 grade to the next, because all other features remain constant across grades. The inclusion of real-world images representing a diverse range of skin types across sexes and races is also important, as morphed images may not always translate clinically to the broad array of physical appearances or physical changes observed in the aging hand. The scale ratings do not take skin discoloration into consideration because the Allergan Hand Volume Deficit Scale was designed to rate only the severity of hand volume loss, which may be treated with fillers. When using the scale, each hand should be rated separately, as volume loss may differ between the left and right hands in individual patients because of greater use of the dominant hand.

The Merz Hand Grading Scale (MHGS) has been validated for photographic and live assessment of hands.
In a randomized, blinded study, 3 physician raters used the MHGS to rate the hands of 84 live subjects. The study demonstrated overall intrarater reliability (weighted kappa) of 0.74 and interrater reliability (kappa) ranging from 0.59 to 0.71. In the present study, intrarater agreement was 0.83 (weighted kappa) and interrater agreement was 0.82 (ICC) for the Allergan Hand Volume Deficit Scale.

In the authors' experience, some patients present with hand aging as an isolated concern, but it is much more common for patients to have had therapeutic improvement in facial appearance and to present with concerns about the incongruity between their aged hands and their less aged face. Their response to treatment is generally positive if they have been appropriately informed about the potential degree of improvement and possible side effects. The use of a validated scale for formalized and reproducible consultation procedures can help prepare patients for potential treatment outcomes and may thus improve patient satisfaction.
Study Limitations
The verbal descriptors for each grade of the Allergan Hand Volume Deficit Scale are subjective; however, the descriptors were developed and refined through extensive feedback among 9 experts, which helps minimize inherent subjectivity. The clinical significance of scale scores was determined solely by the scale developers; although the scale developers considered a 1-point change on the scale significant, it may or may not be meaningful to patients. A change of less than 1 point may be meaningful to patients desiring a subtle change, whereas other patients may perceive only dramatic changes as meaningful; hence, this scale is not recommended for patients' self-assessment of meaningful improvement. The Michigan Hand Outcomes Questionnaire has an aesthetics subscale for assessing patient satisfaction with hand appearance and may be helpful for assessing patient satisfaction before and after hand treatment.
Conclusions
The Allergan Hand Volume Deficit Scale demonstrated almost perfect interrater and intrarater agreement among physicians, and 1-point score differences were shown to reflect clinically significant differences in hand volume deficit. This volume deficit scale includes user-friendly diagrams, detailed verbal descriptions, and morphed- and real-subject images representative across sexes and skin types. The scale's standardized ratings may be uniformly applied in day-to-day clinical practice and potentially in clinical trials because of its validation in live subjects and use of both morphed and unaltered images.