Charalambos Charalambous1, Agoritsa Koulori2, Aristidis Vasilopoulos3, Zoe Roupa4. 1. Acute Trauma and Surgical Unit. North West of Anglia Foundation Trust, Huntingdon, United Kingdom. 2. Health Centre of Salamis, Salamis, Greece. 3. Helenic Center for Diseases Control and Prevention, Athens, Greece. 4. Nursing Technological Institute of Central Greece, Athens, Greece.
Abstract
INTRODUCTION: Prevention is the ideal strategy to tackle the problem of pressure ulcers. Pressure ulcer risk assessment scales are among the most pivotal preventative measures, but much criticism has been raised regarding the validity and reliability of these scales. OBJECTIVE: To investigate the validity and reliability of the Waterlow pressure ulcer risk assessment scale. METHOD: A narrative literature review was conducted. The bibliography was searched through Cinahl, Pubmed, EBSCO, Medline and Google Scholar, and 26 scientific articles were identified. The articles were chosen for their direct relevance to the objective under study and their scientific relevance. RESULTS: The construct and face validity of the Waterlow scale appear adequate, but with regard to content validity, changes in the categories age and gender could be beneficial. The concurrent validity cannot be assessed. The predictive validity of the Waterlow scale is characterized by high sensitivity and low specificity. The inter-rater reliability has been shown to be inadequate, which may be due to a lack of clear definitions within the categories and differing levels of knowledge between users. CONCLUSION: Due to the limitations identified regarding the validity and reliability of the Waterlow pressure ulcer risk assessment scale, the scale should be used in conjunction with clinical assessment to provide optimum results.
A pressure ulcer (PU) is defined as: a) localised injury to the skin and/or underlying tissue, usually over a bony prominence, as a result of pressure (including pressure associated with shear); b) a number of contributing factors are also associated with PU, the primary of which is impaired mobility (1). PU are identified as one of seven medical conditions that affect a large number of individuals and require expensive treatment (2). Apart from the financial burden, the development of a PU has a negative psychological and physiological impact upon the individual (3). Downie et al. (4) proposed that 95% of all PU are preventable. Jalali and Rezaie (5) reported that prevention is the best strategy to tackle the PU problem. One of the most pivotal preventative measures is the assessment of an individual's risk of PU development (6). This can be achieved using a valid, reliable and clinically relevant risk assessment scale (RAS) (7). Pancorbo–Hidalgo et al. (8) proposed that a RAS makes it easy to identify an individual at risk and to apply preventative measures; however, much criticism has been directed at the clinimetrics of those scales (9).
OBJECTIVE
The aim of the review was to evaluate the validity and reliability of the Waterlow pressure ulcer risk assessment scale and provide an understanding of the current bibliography.
METHOD
The method of the review was a narrative review. The scientific bibliography was searched between March 2017 and April 2017 through the databases Cinahl, Pubmed, EBSCO, Medline and Google Scholar. The key words used were Waterlow AND pressure ulcer AND validity AND reliability AND risk assessment scale, and 26 scientific articles were identified. The inclusion criteria for the articles were the availability of full text, direct relevance to the objective under study, and scientific credibility (peer-reviewed articles).
RESULTS
VALIDITY OF THE WATERLOW SCALE
Validity refers to the degree to which a scale measures what it claims to measure (Haesler 2014). In the case of a PU RAS, it would be how well it predicts an individual's risk of developing a PU (10). Validity is not a one-time process; on the contrary, to establish that a tool is valid, some basic aspects must be fulfilled over time (11).
Face Validity
Face validity refers to whether a scale appears to measure what it is intended to measure (12). Waterlow gives the impression that it fulfils face validity, firstly because the items included relate to PU development (9). Secondly, it utilises a scoring system to produce a sum, which enables the user to allocate the individual to one of the risk categories (13). Because of its subjectivity, face validity is not extensively reported; still, it remains a useful indicator, as users are more likely to complete a tool with high face validity (12).
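The sum-and-allocate mechanism described above can be sketched in a few lines. This is an illustration only: the item names and item scores are hypothetical, and the cut-offs (10, 15 and 20) are assumed from the commonly published Waterlow risk bands rather than taken from this review.

```python
# Illustrative sketch: item names and scores are hypothetical and are
# NOT values copied from the Waterlow card.
def waterlow_risk_category(item_scores):
    """Sum the item scores and map the total to a risk band.

    The cut-offs (10, 15, 20) are an assumption based on the commonly
    published Waterlow bands; they are not stated in this review.
    """
    total = sum(item_scores.values())
    if total >= 20:
        band = "very high risk"
    elif total >= 15:
        band = "high risk"
    elif total >= 10:
        band = "at risk"
    else:
        band = "not at risk"
    return total, band


# Hypothetical assessment of one patient.
example_patient = {
    "build_weight_for_height": 1,
    "skin_type": 1,
    "sex_and_age": 4,
    "continence": 1,
    "mobility": 3,
    "nutrition": 2,
    "special_risks": 5,
}

total, band = waterlow_risk_category(example_patient)
print(total, band)
```

Because every item contributes additively, a one- or two-point disagreement on a subjective item can move a patient across a band boundary.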
Content Validity
Content validity is the degree to which the items included in a scale are suitable for measuring the outcome under investigation in the target population, without sacrificing ease of use (14). It is estimated that over 200 PU risk factors have been identified, although not all of them are suitable for inclusion in a RAS (15). Factors are most commonly selected through extensive bibliographical reviews and/or expert panel consensus (12). Waterlow (13) based her factor selection on personal knowledge, on a literature review, which was very limited at the time, and on the pre-existing Norton RAS (16). The Waterlow scale was revised in 2005 in the light of new evidence, and a more comprehensive explanation was provided concerning the risk factor selection. Nevertheless, even in the revised version the risk factors selected are not supported by sufficient evidence, leaving doubt about the factors included (17). The revised Waterlow scale (17) included the following factors: build/weight for height (BMI), assessment of the skin, gender, age, continence, mobility, nutrition, medication, tissue malnutrition, neurological deficits, and major surgery or trauma. The inclusion of such a wide variety of risk factors enables the scale to cover a wide variety of patients, although it carries the danger of over-predicting an individual's risk and makes the scale complex to use (18). Coleman et al. (19) tried, in a systematic review, to identify the risk factors that independently predict PU development. The results indicated that the factors emerging most frequently as independent predictors were mobility, perfusion and skin status. Other factors such as skin moisture, age, haematological measures, nutrition and sensory perception were regularly reported, but not as frequently. Medication and gender were reported as not significant factors.
Medication as an independent factor in PU development is generally not supported; it is unlikely that any medication will cause a PU, and medication is more likely to be a surrogate indicator (19). Gender was reported as not significant by Coleman et al. (19); in contrast, Waterlow (17) maintains that physiological differences between genders affect tolerance of PU damage. Maklebust and Magnan (20) identified no statistical relationship between gender and PU development, and concluded that if factors that are known to decline with age are included in a RAS, then the use of gender as an independent factor is unnecessary. Overall, the risk factors included in the Waterlow RAS appear to relate to PU development, and the content validity of the tool seems to be at an acceptable level. Adjustments in the categories gender and medication could have a beneficial effect on the content validity of the tool.
Construct Validity
Construct validity involves demonstrating the relationship between the concepts under study and the related theory (21). Construct validity is related to content validity, as the items included in a scale describe the outcome being measured (22). The theory behind the Waterlow RAS is that by including factors related to PU development and scoring them, a sum will be produced indicating an individual's risk of developing a PU (Waterlow 2005). To establish construct validity, Kottner and Dassen (23) measured the convergence between Waterlow, Braden and a Visual Analogue Scale (VAS). The correlation coefficients showed that Waterlow was measuring something similar to the other scales (r=0.51 with the VAS, r=-0.71 with Braden). This degree of convergence between the scales indicates that the Waterlow RAS identifies the domains of risk effectively, supporting the construct validity of the scale (12). Tannen et al. (24) tested the correlation of the Waterlow RAS with tools measuring different outcomes to establish construct validity, and found a statistically significant correlation between the Waterlow RAS and the care dependency scale (r=-0.061, p<0.001). The authors argue that the construct validity of Waterlow as a PU RAS is justified, because PU risk increases as care dependency increases. However, the association between the two scales could also lead to the assumption that Waterlow is measuring general care needs and not the risk of PU development (24). Overall, the construct validity of the Waterlow RAS needs further investigation and assessment, as not enough studies have been produced (23).
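For readers less familiar with the statistic, the coefficients quoted in this section can be read as correlation coefficients of the usual form. Assuming they are Pearson coefficients (the review does not say which coefficient the cited studies used), for paired total scores $x_i$ and $y_i$ from two scales over $n$ patients:

```latex
r = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}
         {\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2}\;\sqrt{\sum_{i=1}^{n} (y_i - \bar{y})^2}},
\qquad -1 \le r \le 1
```

The sign depends on the direction in which each scale is scored. A higher Waterlow total indicates higher risk, whereas a lower Braden total indicates higher risk, so a negative coefficient between those two scales is still evidence of convergence.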
Criterion Validity
Criterion validity assesses the validity of a tool by comparing its performance against the existing gold standard (25). In the case of Waterlow, no tool has been identified as the gold standard, so criterion validity is measured by comparing the scale with other existing relevant scales (26). The degree of agreement between the scales is referred to as concurrent validity (27). Gould et al. (26) examined the concurrent validity of the Norton, Waterlow and Braden RAS against the consensus views of an expert panel. Using patient scenarios, 236 nurses produced a total of 941 assessments. The Waterlow RAS had the highest degree of agreement (20%), although according to the authors this is still low. The lack of a clear gold-standard PU RAS complicates the demonstration of criterion/concurrent validity, as the validity of the other tools is itself questionable (14).
Predictive Validity
Predictive validity can be described as the consistency of a scale in predicting how an individual will perform in the future, based on measurements taken today (12). Therefore, if a patient is assessed with a PU RAS today and found to be at high risk, it is expected that they will develop a PU in the future (28). Predictive validity is measured by specificity (the true negative rate), which refers to the proportion of patients who did not develop a PU and were identified as low risk, and sensitivity (the true positive rate), which refers to the proportion of patients who developed a PU and were identified as high risk (9). In a systematic review by Pancorbo–Hidalgo et al. (8), seven studies were analyzed concerning the predictive validity of Waterlow. Good predictive validity was observed (OR=2.05, 95% CI 1.11-3.76), with high sensitivity (82.4%) but low specificity (27.4%). The results were consistent with other studies supporting the high sensitivity, low specificity status of the Waterlow RAS (29, 30). Transferring the results to clinical practice, patients assessed with the Waterlow RAS have an increased likelihood of being identified as at high risk of developing a PU and of receiving unnecessary preventative equipment, resulting in a higher financial burden on healthcare settings (8). The applicability of predictive validity to PU RAS has been criticized (31), because it would be unethical to withhold available preventative measures from patients so that predictive validity could be measured (14). In reality, when an individual is identified as high risk they receive preventative measures, which in turn will make the Waterlow RAS appear to perform poorly on specificity (32). The inadequate predictive validity of the Waterlow RAS suggests that the scale should not be used on its own for the prediction of PU development, but in conjunction with clinical judgement. Together, they can provide valuable guidance in the allocation of preventative measures (6, 14).
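As a worked illustration of the two measures defined above, the following minimal sketch computes sensitivity and specificity from a 2x2 table. The counts are invented for illustration and are not taken from any of the cited studies; the function name is likewise hypothetical.

```python
def predictive_validity(tp, fp, fn, tn):
    """Return (sensitivity, specificity) from a 2x2 table.

    tp: rated at risk,     developed a PU
    fp: rated at risk,     did not develop a PU
    fn: rated not at risk, developed a PU
    tn: rated not at risk, did not develop a PU
    """
    sensitivity = tp / (tp + fn)  # share of PU cases flagged as at risk
    specificity = tn / (tn + fp)  # share of non-cases rated not at risk
    return sensitivity, specificity


# Hypothetical cohort of 200 patients (invented counts, not study data).
sens, spec = predictive_validity(tp=40, fp=105, fn=10, tn=45)
print(f"sensitivity = {sens:.0%}, specificity = {spec:.0%}")
```

In this hypothetical cohort, 145 patients are flagged as at risk but only 40 of them go on to develop a PU. This is the over-prediction pattern that accompanies high sensitivity combined with low specificity.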
RELIABILITY OF THE WATERLOW RISK ASSESSMENT SCALE
To be practical in everyday clinical practice, Waterlow must be able to produce consistent results (33). There are two major aspects of reliability: inter-rater reliability, which is the degree to which two raters operating independently assign the same ratings (agreement) to an attribute being measured (34), and intra-rater reliability, which refers to the ability of a scale to produce the same ratings at different points in time when used by the same rater. The latter assumes stability in the individual's condition, which can be difficult to achieve, as the condition of patients at risk of PU development can change quite quickly (35, 14).
Inter-Rater Reliability
According to a systematic review by Kottner et al. (23) examining the intra- and inter-rater reliability of Waterlow, no studies examining the intra-rater reliability of the scale were identified. The lack of intra-rater reliability studies might be explained by the fact that in clinical practice it is highly unlikely that the same patient will be assessed by the same health care professional each time their condition needs to be evaluated, so most of the focus has been placed on inter-rater reliability (14). Edwards (36), in a cross-sectional observational study, assessed the inter-rater reliability of the Waterlow in a sample of 40 elderly patients in the community, with or without PU. A second assessor with the same level of knowledge of PU risk assessment as the researcher was chosen, which reduced the possibility of bias due to differences in knowledge between the raters (9). Low levels of agreement between the raters were observed (25%) (33). The categories where disagreement was observed most often were skin type, BMI and mobility, and the authors concluded that the subjectivity arising from the need for the assessor's opinion in the scoring might lead to high disagreement (9). Low levels of inter-rater reliability were also found by Watkinson (37) in a comparative study of Waterlow, Norton and Braden. The agreement between the raters concerning Waterlow reached 55.6%. The raters chosen for the study were nine registered nurses, one enrolled nurse and two student nurses. The sample consisted of nine patients admitted to an acute hospital (37). Watkinson (37) states that the low agreement percentage is a result of differing levels of knowledge concerning PU risk assessment. Cook et al. (38) assessed the inter-rater reliability of an adapted Waterlow, in which minor changes were made to the categories continence, neurological deficit and medication; 15 patients were assessed by two different nurses daily for a period of seven days.
Inter-rater reliability was assessed by percentage agreement and correlation. The results suggested a weak to moderate level of reliability between the raters (55.5%, r=0.36). As in the previous study by Watkinson (37), differing levels of knowledge were reported as having an impact on the results. The inter-rater reliability of the Waterlow seems to be inadequate (14). However, differences in the pathophysiology of the patients make it difficult to identify whether low reliability is caused by the assessors perceiving the patient's state differently or by differences in their interpretation of the tool (33). Additionally, low reliability may be attributable to the lack of clear definitions within some categories (BMI, skin status), which might lead to misunderstanding and incorrect scoring (39, 36), and to differing levels of knowledge amongst the health care professionals using the tool (37, 38). Waterlow (17) states that high reliability is achievable through education and continuous use of the scale.
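The two statistics used in these studies, percentage agreement and correlation, can be sketched as follows. The scores are invented for illustration, the helper names are hypothetical, and the Pearson coefficient is an assumption, since the review does not state which correlation coefficient the cited studies used.

```python
from math import sqrt


def percentage_agreement(rater_a, rater_b):
    """Percentage of patients given identical ratings by both raters."""
    if len(rater_a) != len(rater_b):
        raise ValueError("both raters must assess the same patients")
    matches = sum(a == b for a, b in zip(rater_a, rater_b))
    return 100 * matches / len(rater_a)


def pearson_r(x, y):
    """Pearson correlation coefficient between two lists of scores."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(ss_x * ss_y)


# Hypothetical total scores for eight patients (invented, not study data).
nurse_1 = [12, 18, 9, 22, 15, 11, 17, 20]
nurse_2 = [12, 16, 9, 25, 15, 14, 17, 19]

agreement = percentage_agreement(nurse_1, nurse_2)
r = pearson_r(nurse_1, nurse_2)
print(f"agreement = {agreement:.1f}%, r = {r:.2f}")
```

The two measures answer different questions. In this invented example the raters give identical totals for only half of the patients, yet their scores still rise and fall together, so exact agreement can be low while correlation remains high.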
EASE OF USE
A RAS can be reliable and valid to the highest degree, but without high levels of completeness and usability it will probably never be accepted by users and thus applied in clinical settings (14). Waterlow (17) sought to ensure that the tool is user-friendly by including explanations and cards for different factors and aspects (18). Although no study has directly investigated ease of use, some authors report that the scale is user-friendly and unambiguous (37). In contrast, Banks and Bale (40) reported that community nurses found the scale time-consuming. Community settings differ from hospitals, and difficulties in completing direct skin observation and limited access to patient information might affect its use (41). A clear conclusion cannot be drawn about ease of use, as the volume of evidence is limited, and further investigation focusing solely on this aspect is needed.
CONCLUSION
Based on the evidence, the construct and face validity of Waterlow are acceptable, but with regard to content validity, changes in the categories age and gender could be beneficial. The concurrent validity of the RAS is not clear: the scale seems to measure something similar to other PU RAS, although its association with a scale that measures general care needs raises doubts and requires further investigation. The predictive validity of the Waterlow is characterized by high sensitivity and low specificity. The limitation in the measurement of predictive validity caused by the application of preventative measures does not allow a clear conclusion to be drawn, and suggests that Waterlow should be used together with clinical judgement. Additionally, the inter-rater reliability of Waterlow has been found to be inadequate. This may be due to a lack of clear definitions within the categories and differing levels of knowledge between users. Waterlow is relatively easy to use, although in community settings there are indications that the tool has limitations due to limited access to information. Waterlow is not the perfect RAS in terms of validity and reliability, but it can be helpful if it is used in conjunction with clinical judgement, serving as an indicator of PU risk rather than as a diagnostic tool.