| Literature DB >> 31693797 |
Holly Walton, Aimee Spector, Morgan Williamson, Ildiko Tombor, Susan Michie.
Abstract
OBJECTIVES: To understand whether interventions are effective, we need to know whether the interventions are delivered as planned (with fidelity) and engaged with. To measure fidelity and engagement effectively, high-quality measures are needed. We outline a five-step method which can be used to develop quality measures of fidelity and engagement for complex health interventions. We provide examples from a fidelity study conducted within an evaluation of an intervention aiming to increase independence in dementia.
Keywords: complex health intervention; dementia; engagement; fidelity of delivery; implementation; measures; psychometric; quality
Year: 2019 PMID: 31693797 PMCID: PMC7004004 DOI: 10.1111/bjhp.12394
Source DB: PubMed Journal: Br J Health Psychol ISSN: 1359-107X
The five steps used to develop quality fidelity and engagement measures
| Step | Proposed procedure | How to apply this step |
|---|---|---|
| 1) Review previous measures | 1a) Review measures used in fidelity assessments within your field and/or related fields | Look at checklists used to measure fidelity in your field to see what has been included and to inform decisions about your own checklist (e.g., types of response options). |
| 2) Analyse intervention components and develop an intervention framework | 2a) Analyse intervention components | Read and code intervention materials (e.g., the intervention manual) for key 'components' (aspects of an intervention that need to be delivered); the comment functions in Word or PDF software can be used to code content. Consider what level of coding is appropriate (e.g., identifying components that should be delivered generally, or identifying specific behaviour change techniques; Michie et al.). If intervention development occurs at the same time as this step, review the list of key components once the intervention/manual is updated and finalized. |
| | 2b) Group the list of components into categories | Identify similarities between your intervention components and develop these into groups. For example, at the start of the intervention there may be many components used to deliver information; these may all be grouped under 'Necessary basic information'. |
| | 2c) Develop a comprehensive intervention framework | Develop an intervention framework: a table giving an overview of what the intervention is, what components it includes, and other relevant aspects, such as who these components should be delivered to and how they link to the wider intervention aims and target behaviours. The framework can be structured around the groups of components identified in 2b and could include columns with headings such as 'Key targets' (i.e., the groups from 2b), intervention components, the session number(s) in which each component is delivered, the target behaviour that the component aims to improve, and intervention objectives (see the first sketch after this table). |
| | 2d) Remove redundant components from framework | Review the framework and identify and remove redundant components (e.g., components that overlap with other components). |
| 3) Develop fidelity checklists | 3a) Identify which components from the framework take place in which of the intervention sessions | Use the session-number column in your framework to identify the components for each intervention session. |
| | 3b) Develop one checklist for each of your intervention sessions, based on your framework | For each intervention session, use the intervention manual to put the components in the order in which they will be delivered to participants. Develop a checklist for each session that lists all the standardized components that should be delivered to all participants, and choose which response options you would like to use (e.g., done, done to some extent, and not done). If the intervention has standardized (delivered to all) and tailored (participant choice) components, build the fidelity checklist from the standardized components so that it can be compared across participants, providers, and sites; then consider developing a separate table to supplement this checklist and capture delivery of the tailored components, for example a tailored grid that explores one of the standardized components in more detail (see the first sketch after this table). If measuring both fidelity and engagement, checklists can also include questions on what participants understood and put into practice. |
| | 3c) Tailor checklists for use by your intended audiences | Consider creating different versions of the fidelity checklists for different audiences (e.g., researchers, providers, and participants). Different wording may be appropriate for different audiences; for example, provider and researcher checklists may be written in terms of delivery, whereas participant checklists may be written in terms of what was received. |
| | 3d) Review the checklists | Review the checklists to identify and remove redundancy (e.g., repetition) and jargon. |
| | 3e) Develop simple guidelines for all target users which explain how to complete the checklists | Develop guidelines for all intended users of the checklists (researchers, providers, and participants). Guidelines serve different purposes for different audiences and may therefore include different content. Researcher guidelines could include simple but in-depth: (1) information about the checklists; (2) guidance on how to complete the checklists; (3) guidance on how to decide which score to give; (4) definitions for each component; and (5) illustrative examples of 'done', 'done to some extent', and 'not done' for all components across all sessions. Provider guidelines could include simple explanations of: (1) what the checklists and audio-recordings are for; (2) what to return and how; (3) how to fill out the checklists; and (4) an example checklist. Participant guidelines could include a simple explanation of: (1) how to complete the forms; (2) how to return the forms; and (3) an example checklist. |
| 4) Obtain feedback about the content and wording of the checklists and guidelines | 4a) Ask relevant stakeholders to give feedback on the content and wording of checklists and coding guidelines | Decide whom you need feedback from; this could include intervention team members with expertise in developing the intervention and/or target users of the checklists. Ask stakeholders to provide feedback on the content and wording of the checklists and coding guidelines. |
| | 4b) Edit checklists and guidelines to take this feedback into account | Make changes to the checklists and guidelines in line with the feedback received. Consider consulting relevant guidance (e.g., condition-specific guidance) and/or readability statistics to make the checklists as easy to use as possible (see the second sketch after this table). |
| 5) Pilot and refine checklists and coding guidelines to assess and improve reliability | 5a) Use multiple researchers to test coding guidelines and checklists against some initial intervention transcripts (initial piloting) | Decide on a reliability threshold for researcher coding (e.g., weighted kappa, kappa, or percentage agreement; see the third sketch after this table). Select a percentage of your overall fidelity sample to pilot (e.g., 10% of your overall sample) and transcribe these sessions. Identify a second researcher, and independently 'code' the transcripts before discussion. Coding a transcript means applying the coding guidelines to the transcript and identifying evidence for the delivery of each component within that transcript. To code a transcript, use the comment function in Microsoft Word to mark evidence for each component: add a comment with the number of the component and a note saying whether the component was delivered, partly delivered, or not delivered, and add explanations for coding choices where necessary to facilitate discussion. Each researcher should complete a checklist for each of the sessions that they rate and save the checklists with their initials. |
| | 5b) Discuss discrepancies and amend coding guidelines | Calculate agreement between the two researchers using your chosen reliability statistic and identify the components that were disagreed on (e.g., by creating a spreadsheet to record these disagreements). Meet to discuss coding and discrepancies: go through each disputed component, outline the reasons for the scores given, and use the coded Word documents to support this discussion; then decide together which score is the most appropriate and agree on it. Note down the reasons for disagreement between coders. Create a third version of the checklist to record the final agreed scores; for clarity, save this checklist with 'agreed' and both coders' initials at the end of the file name. Identify and record the reasons for discrepancies, and amend the coding guidelines where necessary to improve clarity. |
| | 5c) Pilot and amend coding guidelines until the selected agreement threshold is achieved | Repeat stages 5a and 5b until the chosen reliability threshold is achieved. Note: more sets of sessions may need to be transcribed and included in the pilot to achieve agreement. If coding guidelines are amended throughout the piloting process, the transcripts used for piloting may need to be re-coded during the main fidelity assessment once the coding guidelines have been finalized. |
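To make steps 2c and 3a-3b concrete, here is a minimal sketch (in Python) of an intervention framework table and a per-session checklist derived from it. The column headings follow the suggestions in the table above; all component names and values are hypothetical illustrations, not actual PRIDE intervention content.

```python
# Step 2c: the intervention framework as structured data. Rows are
# hypothetical examples, not components of the actual PRIDE intervention.
import csv

framework = [
    {"key_target": "Necessary basic information",         # group from step 2b
     "component": "Explain the purpose of the programme",
     "sessions": "1",                                      # delivered in session 1
     "target_behaviour": "Engagement with the programme",
     "objective": "Participant understands what the intervention involves"},
    {"key_target": "Goal setting",
     "component": "Agree one specific activity goal",
     "sessions": "1, 2",
     "target_behaviour": "Increased independent activity",
     "objective": "Participant sets and reviews a personal goal"},
]

# Write the framework out as a table the team can review for redundancy (2d).
with open("intervention_framework.csv", "w", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=framework[0].keys())
    writer.writeheader()
    writer.writerows(framework)

# Steps 3a-3b: one checklist per session, using the session-number column,
# with the response options suggested above.
RESPONSE_OPTIONS = ("done", "done to some extent", "not done")

def session_checklist(framework, session):
    """Ordered standardized components delivered in the given session."""
    return [row["component"] for row in framework
            if str(session) in row["sessions"].split(", ")]

# One researcher's ratings for Session 1 (None = not yet rated).
ratings = {component: None for component in session_checklist(framework, 1)}
ratings["Explain the purpose of the programme"] = "done"
assert all(r in RESPONSE_OPTIONS for r in ratings.values() if r is not None)
```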
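Step 4b's mention of readability statistics can be automated. One illustrative option is the textstat package (our choice for this sketch, not a tool named by the authors):

```python
# Checking the readability of a checklist item (step 4b). The item text is a
# made-up example; textstat is one of several packages that compute these scores.
import textstat

item = "Did your provider explain what the programme involves?"
print(textstat.flesch_reading_ease(item))   # higher score = easier to read
print(textstat.flesch_kincaid_grade(item))  # approximate US school grade level
```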
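For steps 5a-5c, the reliability statistics named above are straightforward to compute. A minimal sketch, assuming each coder's ratings for one session are held as parallel lists in component order (the ratings below are invented):

```python
# Step 5a-5b reliability check. scikit-learn's cohen_kappa_score with
# weights="linear" gives a weighted kappa that respects the ordering of the
# response options.
from sklearn.metrics import cohen_kappa_score

OPTIONS = ["not done", "done to some extent", "done"]  # ordinal order

coder_1 = ["done", "done", "not done", "done to some extent", "done"]
coder_2 = ["done", "done to some extent", "not done", "done to some extent", "done"]

kappa_w = cohen_kappa_score(coder_1, coder_2, labels=OPTIONS, weights="linear")
agreement = 100 * sum(a == b for a, b in zip(coder_1, coder_2)) / len(coder_1)
print(f"Weighted kappa: {kappa_w:.2f}; percentage agreement: {agreement:.1f}%")

# Step 5b: record which components the coders disagreed on, for discussion.
disagreements = [i + 1 for i, (a, b) in enumerate(zip(coder_1, coder_2)) if a != b]
print("Components to discuss:", disagreements)  # 1-based component numbers
```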
Figure 1. An example provider checklist (Session one). [Colour figure can be viewed at http://www.wileyonlinelibrary.com]
Figure 2. An example participant 'your experience' checklist (Session one). [Colour figure can be viewed at http://www.wileyonlinelibrary.com]
Weighted kappa and percentage agreement for standardized components across PRIDE Sessions one, two, and three in both the piloting stage and main assessment stage
Values are weighted kappa (percentage agreement).

| Set of transcripts | Coding pair | Session 1 | Session 2 | Session 3 |
|---|---|---|---|---|
| Piloting coding guidelines and checklists to achieve agreement | | | | |
| 1 | Coding pair 1 (pilot) | 0.21 (59.1) | 0.26 (55.6) | −0.33 (50) |
| | Coding pair 2 | 0.38 (54.6) | 0.4 (66.66) | −0.11 (66.66) |
| 3 | | −0.2 (36.4) | 0.48 (61.1) | −0.25 (41.66) |
| 4 | | 0.47 (63.6) | 0.65 (72.2) | 0.49 (66.66) |
| 5 | | 0.55 (59.1) | 0.62 (77.7) | 0.29 (58.33) |
| 6 | | 0.62 (77.3) | 0.69 (77.7) | 0.31 (50) |
| 7 | | 0.28 (68.2) | 0.16 (50) | 0.59 (66.6) |
| 11 | | 0.56 (77.3) | 0.54 (66.7) | 0.00 (33.3) |
| 2 | | 0.83 (90.9) | 0.71 (77.7) | No session |
| 8 | | No transcript | No transcript | 0.31 (58.3) |
| 9 | | 0.07 (72.7) | 0.41 (61.1) | No transcript |
| 10 | | 0.85 (90.9) | 0.83 (83.3) | No transcript |
| 13 | | 0.81 (86.4) | No transcript | 0.61 (66.66) |
| 12 | | No transcript | 0.45 (55.6) | 0.46 (58.33) |
| 14 | | 0.42 (86.4) | No transcript | 0.57 (66.66) |
| 15 | | No transcript | No transcript | 1.00 (100) |
| 16 | | No transcript | No transcript | 0.68 (75) |
| 17 | | – | 0.83 (83.3) | No transcript |
| 18 | | – | 0.77 (83.3) | 0.64 (83.33) |
| 1 (re-coded, new guidelines) | | – | 0.68 (88.9) | – |
| Main fidelity assessment | | | | |
| 5 (*) | | 0.7 (72.7) | 0.5 (72.2) | 0.3 (66.7) |
| 6 | | – | 0.4 (66.7) | 0.8 (83.3) |
| 7 | | – | 0.4 (72.2) | – |
| 18 (* Session 1) | | 0.4 (68.2) | Pre-coded | Pre-coded |
| 19 (* Session 2) | | 0.6 (77.3) | 0.5 (66.7) | 0.4 (50) |
| 20 | | 0.8 (90.9) | 0.7 (77.7) | 0.6 (75) |
| 23 (*) | | 0.8 (90.9) | 0.5 (55.5) | No transcript |
| 24 | | – | 0.7 (83.3) | 0.8 (91.7) |
Notes:
"–" indicates that agreement had already been reached and no further sessions needed to be coded until the next sampled set.
"No transcript" refers to sessions where transcripts were not available to code.
(*) marks sets in the main fidelity assessment that were selected for double coding.
"Pre-coded" refers to sets that were coded during the piloting phase.
In the original table, further markers flag sets where agreement >0.61 was reached and sets after which the coding guidelines were not changed.
Where weighted kappa did not reach >0.61, >85% agreement was achieved three times in a row and kappa >0.8 was achieved three times in the last five sets; kappa was low owing to the large number of 'not done' responses, despite only three disagreements.
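The final note above reflects a well-known property of Cohen's kappa: when one response category dominates, expected chance agreement is high, so kappa can be low (or even negative) despite high raw agreement. A minimal illustration with invented ratings (not PRIDE data):

```python
# Why kappa can be low despite high percentage agreement when one category
# ('not done' here) dominates. Ratings are invented for the example.
from sklearn.metrics import cohen_kappa_score

coder_1 = ["not done"] * 15 + ["done", "not done", "not done"]
coder_2 = ["not done"] * 15 + ["not done", "done", "done"]

agreement = sum(a == b for a, b in zip(coder_1, coder_2)) / len(coder_1)
kappa = cohen_kappa_score(coder_1, coder_2)

print(f"Percentage agreement: {100 * agreement:.1f}%")  # 83.3%
print(f"Cohen's kappa: {kappa:.2f}")                    # about -0.08
```

High raw agreement paired with low or negative kappa is exactly the pattern described in the note, which is why percentage agreement is reported alongside kappa in the tables above.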
Percentage agreement for delivery of tailored topics and topic components (scored out of 11) in PRIDE Sessions one and two in both the piloting stage and main assessment stage
Values are the mean percentage of the 11 components agreed on (range).

| Topic (number of sets delivered in Sessions 1 and 2) | Session 1 | Session 2 |
|---|---|---|
| Piloting coding guidelines and checklists to achieve agreement | | |
| Keeping mentally active (S1: 9, S2: 2) | 75.7 (54.6–90.9) | 86.4 (81.8–90.9) |
| Keeping physically active (S1: 3, S2: 0) | 84.8 (72.7–90.9) | N/A |
| Keeping socially active (S1: 4, S2: 3) | 86.4 (72.7–90.9) | 87.8 (81.8–90.9) |
| Making decisions (S1: 2, S2: 1) | 86.4 (81.8–90.9) | 81.8 |
| Getting your message across (S1: 4, S2: 1) | 75 (27.3–90.9) | 81.8 |
| Receiving a diagnosis (S1: 1, S2: 2) | 54.6 | 72.7 (63.6–81.8) |
| Keeping healthy (S1: 0, S2: 0) | N/A | 81.8 (63.6–90.9) |
| No topics delivered (S1: 2, S2: 3) | N/A | N/A |
| Main fidelity assessment | | |
| Keeping mentally active (S1: 4, S2: 1) | 93.6 (81.8–100) | 100 |
| Keeping physically active (S1: 2, S2: 2) | 90.9 | 90.9 |
| Keeping socially active (S1: 2, S2: 2) | 90.9 (81.8–100) | 86.4 (81.8–90.9) |
| Making decisions (S1: 2, S2: 1) | 81.8 | 63.6 |
| Getting your message across (S1: 2, S2: 1) | 81.8 (72.7–90.9) | 81.8 |
| Receiving a diagnosis (S1: 2, S2: 0) | 95.5 (90.9–100) | N/A |
| Keeping healthy (S1: 0, S2: 2) | N/A | 77.3 (63.6–90.9) |
| No topic delivered (S1: 0, S2: 3) | N/A | N/A |
N/A = not applicable: topic not delivered in that session.
Agreement is scored out of 11 components, so 11/11 = 100% (e.g., 81.8% corresponds to 9 of the 11 components agreed).