Derek Jones, Alastair Carruthers, Bhushan Hardas, Diane K Murphy, Jonathan M Sykes, Lisa Donofrio, Jean Carruthers, Lela Creutz, Ann Marx, Sara Dill.
Affiliations: *Division of Dermatology, University of California at Los Angeles, Los Angeles, California; †Department of Dermatology and Skin Science, University of British Columbia, Vancouver, British Columbia, Canada; ‡Allergan plc, Irvine, California; §UC Davis Medical Group, Sacramento, California; ‖Department of Dermatology, Yale University School of Medicine, New Haven, Connecticut; ¶Department of Ophthalmology and Visual Sciences, University of British Columbia, Vancouver, British Columbia, Canada; #Peloton Advantage, LLC, Parsippany, New Jersey.
Abstract
BACKGROUND: A validated scale is needed for objective and reproducible comparisons of horizontal neck lines before and after treatment in practice and clinical studies. OBJECTIVE: To describe the development and validation of the 5-point photonumeric Allergan Transverse Neck Lines Scale. METHODS: The Allergan Transverse Neck Lines Scale was developed to include an assessment guide, verbal descriptors, morphed images, and real subject images for each scale grade. The clinical significance of a 1-point score difference was evaluated in a review of multiple image pairs representing varying differences in severity. Interrater and intrarater reliability was evaluated in a live-subject rating validation study (N = 297) completed during 2 sessions occurring 3 weeks apart. RESULTS: A difference of ≥1 point on the scale was shown to reflect a clinically significant difference (mean [95% confidence interval] absolute score difference, 1.22 [1.09-1.35] for clinically different image pairs and 0.57 [0.42-0.72] for not clinically different pairs). Intrarater agreement between the 2 live-subject rating validation sessions was substantial (mean weighted kappa = 0.78). Interrater agreement was substantial during the second rating session (0.73, primary end point). CONCLUSION: The Allergan Transverse Neck Lines Scale is a validated and reliable scale for rating of severity of neck lines.
Horizontal or transverse neck lines can occur at any age. Neck lines may be associated with the deposition of submental and subplatysmal fat, and they are exacerbated by age-related decreases in elasticity and thickness of the skin of the neck, combined with gravity and the downward pull of the platysma muscle. Horizontal neck lines may be treated with botulinum toxin Type A in cases where the lines are clearly caused by the activity of the platysma muscles, although some groups report having little success with this approach. Use of injectable filler for the treatment of horizontal neck lines has been reported in one case study and in a prospective single-center study in combination with other therapies. Other approaches for reducing the appearance of neck lines include rhytidectomy, fractional laser treatment, fractional radiofrequency treatment, and microfocused ultrasound.
Patients are increasingly seeking treatment for nonfacial rejuvenation, including neck lines, and clinicians need a way to both educate and assess patients regarding treatments. Clinical studies of neck line treatments have assessed outcomes using general numeric wrinkle scales that did not include images and were not validated for the assessment of the neck. This report describes the development and validation of a new photonumeric scale designed to rate horizontal lines of the neck (Allergan Transverse Neck Lines Scale). The scale was created to meet FDA requirements for outcome assessments in clinical trials and to provide a practical tool that physicians can use for the assessment of patients. The objectives of this study were to determine the clinically significant difference in scale scores and to establish the interrater and intrarater reliability of the scale for rating severity of horizontal lines of the neck in live subjects.
Methods
Scale Development
Figure 1 summarizes key steps in the creation and validation of the Allergan Transverse Neck Lines Scale. A 9-member team comprising 5 external members (3 board-certified dermatologists, 1 board-certified oculoplastic surgeon, and 1 board-certified facial plastic surgeon) and 4 Allergan employees (2 dermatologists, 1 plastic surgeon, and 1 clinical scientist) developed the scale from a pool of subject images collected for scale development by Canfield Scientific, Inc (Canfield, Fairfield, NJ). A total of 396 men and women aged 18 years or older with Fitzpatrick skin Types I through VI and in good general health volunteered for image capture. All subjects provided informed photograph consent before image collection. Subjects were excluded if they had anything that would interfere with visual assessment of the area of interest. Canfield photographers obtained full 2-dimensional (2D) images of the face and neck using a 2D custom suite for face and neck imaging (Nikon D7100 Hi Res SLR). Images were cropped horizontally from 1 cm lateral to the neck/shoulder junction on the left and right sides and vertically from 1 cm above the bony menton down to 2 cm below the neck/shoulder junction to produce images of the area of interest.
Figure 1.
Scale development and validation processes.
Scale descriptors were created for each of the 5 grades of the scale (Table 1). Two members of the Allergan team met individually with each member of the scale development team for preliminary input on each scale grade. After preliminary scale grades were established, all 9 individuals involved in scale creation had a collaborative discussion about the scale grades and descriptors. The wording for each grade was then finalized by the Allergan team.
TABLE 1.
Descriptors for the Allergan Transverse Neck Lines Scale
Canfield created an assessment guide with a line drawing of anatomic markers demarcating the anterior third of the neck between each sternocleidomastoid based on detailed instructions from the Allergan team regarding anatomic markers (Figure 2). Canfield revised the drawing multiple times based on careful review by the Allergan team.
Figure 2.
Assessment guide for the Allergan Transverse Neck Lines Scale.
A base image to demonstrate Grade 2 neck lines was selected, and this image was morphed to represent all 5 grades of the scale. A Canfield graphics technician morphed the anatomic area of interest in the base image to match the descriptors provided for Grades 0, 1, 3, and 4. Alignment of the morphed images with the scale descriptors was achieved via an interactive process with the Allergan team.
A forced ranking review was performed to delineate the range of severity between Grades 2 and 3 and to confirm the selection of the best representative image to be used as Grade 2. The 5 external scale developers performed the web-based forced ranking exercise on preselected images that represented the upper and lower boundaries of Grades 2 and 3.
To determine whether there was a clinically significant difference between grades of the scale, the 5 external scale developers were asked to perform an online clinical significance review of image pairs. Multiple image pairs were selected to represent varying degrees of differences in severity (ranging from no difference to a 4-point difference). During the session, the scale developers determined whether there was a clinically significant difference (Yes/No) between images for each pair. After the session, the images from all image pairs were randomly mixed in with other images to be used in the morphed image scale validation (described in the following paragraph) and assigned a score by scale developers so that score differences between the 2 images in each pair could be calculated.
The morphed image scale was validated by having the 5 external scale developers use the scale to rate randomized images representing all scale grades during 2 web-based sessions occurring at least 3 days apart. A total of 299 images were rated (120 images in Session 1 and 179 images in Session 2). The scale had acceptable interrater and intrarater agreement (>0.5), so scale development proceeded using the morphed images.
For both the clinical significance review and the morphed image scale validation review, Canfield provided scale developers uniform hardware to complete the reviews. Before the reviews, the external scale developers completed a web-based PowerPoint training to familiarize themselves with the hardware, the review platform, and the purpose of the clinical significance and morphed image validation reviews. The scale developers were not allowed to discuss the reviews with one another, and each completed the reviews independently.
After the morphed image scale was created, 2 subject photographs representing each grade of the scale were selected to represent diversity in sex and Fitzpatrick skin type per grade. The final scale contains the scale descriptors for each grade, an assessment guide, the morphed images, and the real subject images (Figure 3).
Figure 3.
The Allergan Transverse Neck Lines Scale assigns a grade from none (0) to extreme (4) that describes the presence and depth of transverse lines within the area of the neck demarcated in the diagram in the upper right corner.
Scale Validation
The interrater and intrarater reliability of the final scale was evaluated in a live-subject rating validation study. Eight physician raters experienced in using aesthetic photonumeric scales who were not involved in scale development participated in two 2-day live validation sessions occurring 3 weeks apart. Before the first live evaluation session, all physician raters were trained on the use of the scale in an interactive group training session using 4 example subjects. Raters were instructed to rate only horizontal neck lines, to disregard vertical lines (e.g., platysmal bands on neck), to select a grade based on the most severe line present (with 1 line being sufficient to determine grade), and to assess effaceable versus noneffaceable lines visually and not through attempts to manually efface lines (Figure 3).
All subjects who qualified for the initial image capture events were invited to attend the live validation sessions. Subjects were instructed to arrive clean shaven, remove makeup and jewelry, wear dark pants or jeans and a provided black T-shirt, not drink alcohol excessively before the sessions, try not to alter their usual routine (e.g., their facial care routine and normal sleep or hydration patterns) between sessions, and not have tanning sessions or extensive sun exposure between sessions. Upon arrival at the study center for the first live validation session, subjects signed informed consent and were assessed for eligibility, age, sex, race (as reported by the subject), and Fitzpatrick skin type (determined by the investigator). Subjects were excluded if they had their photographs included in the scale; anything that would interfere with the visual assessment of the area of interest; any treatment with toxin/fillers, dental procedures, or surgery that would alter the area of interest within 2 weeks of the first validation session or plans to have one of these procedures between the 2 sessions; or diagnosis of pregnancy. Two-dimensional images of each subject were collected using a 2D custom studio suite at the first live validation session. The first 5 subjects rated during the first validation session were considered run-in training subjects and were excluded from the analysis.
During the first and second live scale validation sessions, each physician rater evaluated all subjects on all scales (7 additional scales for other anatomic features were evaluated at the same sessions and are reported separately). Raters had separate evaluation stations with an examination lamp, table, a stool for subject seating, supplies, and the photonumeric scale mounted and displayed for use in subject evaluation. Subjects presented themselves to each rater individually and proceeded from one rating station to the next in the same order until evaluated by all 8 raters. Raters were instructed to not discuss ratings with subjects or other raters. Raters took at least a 10-minute break every hour and at least a 30-minute lunch break to avoid rater fatigue.
Statistics
To determine the utility of the scale grades for detecting clinically meaningful differences in horizontal neck lines, absolute score differences for the image pairs deemed “clinically different” or “not clinically different” during scale development were summarized (mean, standard deviation, range, 95% confidence interval [CI]). For the live-subject scale validation study, intrarater reliability was compared between Round 1 and Round 2 scores by calculating weighted kappa scores using Fleiss-Cohen weights. Kappa scores within the range of 0.0 to 0.20 indicate slight agreement, 0.21 to 0.40 indicate fair agreement, 0.41 to 0.60 indicate moderate agreement, 0.61 to 0.80 indicate substantial agreement, and 0.81 to 1.00 indicate almost perfect agreement. Interrater agreement was measured by determining the intraclass correlation coefficient (ICC [2,1]) and 95% CIs calculated using the formula described by Shrout and Fleiss. The a priori primary end point for the interrater agreement analysis was ICC (2,1) for the second rating session. SAS version 9.3 (Cary, NC) was used for all statistical analyses.
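As a rough illustration of the intrarater statistic described above, the following Python sketch computes a weighted kappa with Fleiss-Cohen (quadratic) weights for one rater's grades from two sessions and maps the result onto the agreement bands quoted in the text. It is not the SAS code used in the study; the function names and the example grades are hypothetical, and the "poor" label for negative values is an addition that the text above does not define.

```python
import math


def fleiss_cohen_weighted_kappa(first, second, n_grades=5):
    """Weighted kappa for two lists of integer grades (0 .. n_grades - 1)."""
    if len(first) != len(second) or not first:
        raise ValueError("need two non-empty rating lists of equal length")
    n, k = len(first), n_grades
    # Fleiss-Cohen (quadratic) agreement weights: 1 on the diagonal,
    # decreasing to 0 for the largest possible disagreement.
    weight = [[1 - (i - j) ** 2 / (k - 1) ** 2 for j in range(k)] for i in range(k)]
    # Observed joint proportions of (session 1 grade, session 2 grade).
    observed = [[0.0] * k for _ in range(k)]
    for a, b in zip(first, second):
        observed[a][b] += 1 / n
    row = [sum(observed[i]) for i in range(k)]
    col = [sum(observed[i][j] for i in range(k)) for j in range(k)]
    p_obs = sum(weight[i][j] * observed[i][j] for i in range(k) for j in range(k))
    # Agreement expected by chance from the marginal grade distributions.
    p_exp = sum(weight[i][j] * row[i] * col[j] for i in range(k) for j in range(k))
    if math.isclose(p_exp, 1.0):
        raise ValueError("kappa is undefined when all ratings fall in one grade")
    return (p_obs - p_exp) / (1 - p_exp)


def agreement_label(kappa):
    """Map a kappa value onto the agreement bands quoted in the text."""
    if kappa < 0:
        return "poor"  # assumption: negative values are not covered by the quoted bands
    bands = [(0.20, "slight"), (0.40, "fair"), (0.60, "moderate"),
             (0.80, "substantial"), (1.00, "almost perfect")]
    for upper, label in bands:
        if kappa <= upper:
            return label
    return "almost perfect"


# Hypothetical grades from one rater for 8 subjects in two sessions.
session_1 = [0, 1, 2, 3, 4, 2, 3, 1]
session_2 = [0, 1, 2, 4, 4, 2, 2, 1]
kappa = fleiss_cohen_weighted_kappa(session_1, session_2)
print(round(kappa, 2), agreement_label(kappa))
```

Quadratic weights penalize a 2-grade disagreement four times as heavily as a 1-grade disagreement, which suits an ordinal 5-point scale where near misses are less serious than distant ones.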
Sample Size Considerations
The sample size for the live-subject validation sessions was calculated using the method described by Bonett. With up to 10 raters and an ICC of 0.5, a total of 66 subjects were needed in order to have a 95% CI with a width of 0.2 for interrater reliability. Considering the potential loss of subjects between the 2 rounds, at least 80 subjects were to be enrolled for the scale. Because 297 subjects were eligible for the scale validation analysis, the number of subjects evaluated using this scale was substantially larger than the preplanned sample size of 80, and the overall number of assessments for some grades of this scale was larger than that for the other grades. To minimize the imbalance in the number of subjects across scale grades and to meet the sample size requirement, the mean score across the 8 raters for each subject was used to assign an overall grade for each subject, and a subset of 80 subjects with minimal imbalance across the grades (approximately 16 subjects for each of the 5 scale grades) was randomly selected from the eligible subjects using a prespecified procedure and a preselected randomization seed. This random selection of the subset was performed 20 times. Interrater and intrarater agreements calculated for each of the 20 subsets were combined using SAS procedure PROC MIANALYZE to obtain the overall interrater and intrarater agreements.
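The sample size reasoning above can be sketched in Python. The first function uses the closed-form approximation commonly attributed to Bonett for the number of subjects needed to estimate an ICC with a target CI width; the second is a hypothetical stand-in for the grade-balanced random subset selection, because the study's prespecified procedure and seed are not given in the text. All function names, the seed, and the example grade assignment are assumptions for illustration only.

```python
import math
import random
from statistics import NormalDist


def bonett_sample_size(icc, n_raters, ci_width, confidence=0.95):
    """Approximate number of subjects for an ICC confidence interval of a given width."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    k = n_raters
    n = (8 * z ** 2 * (1 - icc) ** 2 * (1 + (k - 1) * icc) ** 2
         / (k * (k - 1) * ci_width ** 2)) + 1
    return math.ceil(n)


def draw_balanced_subsets(grade_by_subject, per_grade=16, n_draws=20, seed=12345):
    """Draw repeated random subsets with up to `per_grade` subjects from each grade."""
    rng = random.Random(seed)  # fixed seed so the draws are reproducible
    pools = {}
    for subject in sorted(grade_by_subject):
        pools.setdefault(grade_by_subject[subject], []).append(subject)
    subsets = []
    for _ in range(n_draws):
        chosen = []
        for grade in sorted(pools):
            pool = pools[grade]
            chosen.extend(rng.sample(pool, min(per_grade, len(pool))))
        subsets.append(chosen)
    return subsets


# Planning values stated in the text: up to 10 raters, ICC 0.5, CI width 0.2.
print(bonett_sample_size(icc=0.5, n_raters=10, ci_width=0.2))  # 66

# Hypothetical overall grades for 297 eligible subjects (not study data).
grades = {f"S{i:03d}": i % 5 for i in range(297)}
subsets = draw_balanced_subsets(grades)
print(len(subsets), len(subsets[0]))
```

With the stated planning values the approximation gives 65.6, which rounds up to the 66 subjects reported above. Pooling the agreement estimates across the 20 subsets (done with PROC MIANALYZE in the study) is not reproduced here.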
Results
Clinical Significance Determination by Scale Developers
The mean (95% CI) absolute difference in scores was 1.22 (1.09–1.35) for image pairs identified as clinically different and 0.57 (0.42–0.72) for image pairs identified as not clinically different (Table 2). The 95% CIs for clinically different pairs did not overlap with the 95% CIs for pairs deemed not clinically different, confirming that a 1-point difference in scores is clinically significant.
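The comparison above can be illustrated with a short Python sketch that summarizes a list of absolute score differences and checks whether two confidence intervals overlap. The summary uses a normal approximation for the CI, which is an assumption because the text does not state how the intervals were computed; the list of differences is hypothetical, and only the two reported intervals come from the text.

```python
import math
from statistics import NormalDist, mean, stdev


def summarize_abs_differences(diffs, confidence=0.95):
    """Mean, SD, range, and a normal-approximation CI for absolute score differences."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    m, sd = mean(diffs), stdev(diffs)
    half_width = z * sd / math.sqrt(len(diffs))
    return {
        "mean": m,
        "sd": sd,
        "range": (min(diffs), max(diffs)),
        "ci": (m - half_width, m + half_width),
    }


def intervals_overlap(a, b):
    """True if the closed intervals a = (low, high) and b = (low, high) share any point."""
    return a[0] <= b[1] and b[0] <= a[1]


# Hypothetical absolute differences for a handful of image pairs (not study data).
print(summarize_abs_differences([1, 1, 2, 1, 1, 2, 1, 1]))

# Intervals reported in the text.
clinically_different = (1.09, 1.35)
not_clinically_different = (0.42, 0.72)
print(intervals_overlap(clinically_different, not_clinically_different))  # False
```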
TABLE 2.
Differences in Scores for Image Pairs Deemed Clinically Different or Not Clinically Different Using the Allergan Transverse Neck Lines Scale
Live-Subject Scale Validation
Of the 297 subjects eligible for Allergan Transverse Neck Lines Scale validation analysis, 288 subjects were selected in at least 1 of the 20 random subsets. Demographic characteristics of subjects in the final scale validation set are shown in Table 3. Most subjects were female (67%), Caucasian (79%), and had Fitzpatrick skin Type III (27%) or IV (33%). Median age was 48 years, and a broad span of ages was represented (18–83 years).
TABLE 3.
Demographics of Subjects in the Live Scale Validation Study for the Allergan Transverse Neck Lines Scale
Intrarater agreement between the 2 live-subject rating validation sessions was substantial (mean weighted kappa = 0.78) (Table 4). Interrater agreement for the Allergan Transverse Neck Lines Scale was substantial in Session 1 (0.72) and Session 2 (0.73) (Table 4).
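The interrater values above are ICC(2,1) estimates. As a minimal sketch of how such a coefficient can be computed from a subjects-by-raters matrix, the Python function below follows the two-way random effects, single-rater, absolute-agreement form from the mean squares of a two-way ANOVA. The function name and the small example matrix are illustrative assumptions, not study data, and the confidence interval calculation is omitted.

```python
def icc_2_1(ratings):
    """ICC(2,1) for a matrix with one row per subject and one column per rater."""
    n, k = len(ratings), len(ratings[0])
    if n < 2 or k < 2 or any(len(row) != k for row in ratings):
        raise ValueError("need a complete matrix with at least 2 subjects and 2 raters")
    grand = sum(sum(row) for row in ratings) / (n * k)
    subject_means = [sum(row) / k for row in ratings]
    rater_means = [sum(row[j] for row in ratings) / n for j in range(k)]
    # Sums of squares for the two-way layout without replication.
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_subjects = k * sum((m - grand) ** 2 for m in subject_means)
    ss_raters = n * sum((m - grand) ** 2 for m in rater_means)
    ss_error = ss_total - ss_subjects - ss_raters
    bms = ss_subjects / (n - 1)            # between-subjects mean square
    jms = ss_raters / (k - 1)              # between-raters mean square
    ems = ss_error / ((n - 1) * (k - 1))   # residual mean square
    return (bms - ems) / (bms + (k - 1) * ems + k * (jms - ems) / n)


# Illustrative 6-subject, 4-rater matrix (not study data).
example = [
    [9, 2, 5, 8],
    [6, 1, 3, 2],
    [8, 4, 6, 8],
    [7, 1, 2, 6],
    [10, 5, 6, 9],
    [6, 2, 4, 7],
]
print(round(icc_2_1(example), 2))  # 0.29
```

Because the rater mean square appears in the denominator, systematic differences between raters (one rater grading consistently higher) lower ICC(2,1), which is why it is a measure of absolute agreement rather than consistency alone.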
TABLE 4.
Physician Intrarater and Interrater Agreement on the Allergan Transverse Neck Lines Scale (Validation Testing With Live Subjects)
Discussion
This study demonstrated substantial interrater and intrarater agreement for the Allergan Transverse Neck Lines Scale, indicating that the scale is reliable for multiple assessments of the same subject and across different raters. A 1-point difference in scale ratings was shown to reflect clinically significant differences, indicating that the scale has sufficient sensitivity for detecting clinically significant changes in horizontal lines of the neck.
The scale requires that effaceable versus noneffaceable lines be assessed visually, not manually; most physicians with experience in the treatment of neck lines can generally tell whether the line is effaceable with visual inspection alone. The scale uses morphed images to represent each grade to focus the rater's attention on the change from one grade to the next, with all other features remaining constant across scale grades. Real-world images representing a diverse range of skin types across sexes and races are an important addition to the scale because morphed images may not always translate to the broad array of appearances or physical changes observed in the clinic. Representation of both sexes and multiple ethnic groups in rating scales is important, as growing numbers of men and members of diverse ethnic groups are seeking aesthetic facial treatment.
Patients are increasingly seeking aesthetic treatment for areas other than the face, including the neck. In the experience of the authors, transverse neck lines are often observed in younger patients, even those without extensive photodamage. In some middle-aged patients, the neck is much more severely damaged than the face, making neck lines a chief concern. Restoration of a more normal neck appearance can substantially improve self-esteem and confidence. Clinicians need a way to both educate and assess patients for neck line treatments, and the Allergan Transverse Neck Lines Scale provides standardized ratings that may be uniformly applied in day-to-day clinical practice and potentially in clinical trials, due to its validation in live subjects and use of both morphed and unaltered images.
The Allergan Transverse Neck Lines Scale is not used to rate vertical neck lines. In the experience of the authors, neck treatments such as botulinum toxin Type A are especially useful for improving the appearance of the neck and jaw line rather than just reducing lines; the loss of downward pull and the softening of vertical lines are also important considerations with neck treatments. More generic wrinkle scales may be helpful for assessing vertical neck lines.
Study Limitations
The scale developers solely determined the clinical significance of scale scores; although a 1-point change on the scale was considered meaningful to the scale developers, it may or may not be meaningful to subjects. Hence, this scale is not intended for patient self-assessment of meaningful improvement. Use of the FACE-Q appearance appraisal scale, a validated patient satisfaction instrument with a subscale for satisfaction with the neck, may be helpful for capturing the perspective of the patient on the appearance before and after treatment. Finally, the verbal descriptors for each grade on the Allergan Transverse Neck Lines Scale are subjective. However, the descriptors were developed and refined during extensive collaboration among 9 clinical experts to minimize inherent subjectivity.
Conclusions
Because increasing numbers of patients are seeking aesthetic treatment of the neck, there is a need for a validated scale for the assessment of neck lines. The Allergan Transverse Neck Lines Scale includes user-friendly diagrams, detailed verbal descriptions, and morphed and real subject images representative of both sexes and diverse skin types. The scale demonstrated substantial intrarater and interrater agreement among physicians, and a 1-point score difference was shown to reflect clinically significant differences in horizontal neck lines. The scale meets FDA criteria for validated clinical outcome measures in clinical trials and provides standardized ratings that can be uniformly applied by dermatologists and plastic surgeons who treat patients seeking treatment of horizontal lines of the neck.