Development and validation of the ExPRESS instrument for primary health care providers' evaluation of external supervision.

Michael Schriver1, Vincent Kalumire Cubaka1,2, Peter Vedsted3, Innocent Besigye4, Per Kallestrup1.   

Abstract

BACKGROUND: External supervision of primary health care facilities to monitor and improve services is common in low-income countries. Currently there are no tools to measure the quality of support in external supervision in these countries. AIM: To develop a provider-reported instrument to assess the support delivered through external supervision in Rwanda and other countries.
METHODS: "External supervision: Provider Evaluation of Supervisor Support" (ExPRESS) was developed in 18 steps, primarily in Rwanda. Content validity was optimised using systematic search for related instruments, interviews, translations, and relevance assessments by international supervision experts as well as local experts in Nigeria, Kenya, Uganda and Rwanda. Construct validity and reliability were examined in two separate field tests, the first using exploratory factor analysis and a test-retest design, the second for confirmatory factor analysis.
RESULTS: We included 16 items in section A ('The most recent experience with an external supervisor'), and 13 items in section B ('The overall experience with external supervisors'). Item-content validity index was acceptable. In field test I, test-retest had acceptable kappa values and exploratory factor analysis suggested relevant factors in sections A and B used for model hypotheses. In field test II, models were tested by confirmatory factor analysis fitting a 4-factor model for section A, and a 3-factor model for section B.
CONCLUSIONS: ExPRESS is a promising tool for evaluation of the quality of support of primary health care providers in external supervision of primary health care facilities in resource-constrained settings. ExPRESS may be used as specific feedback to external supervisors to help identify and address gaps in the supervision they provide. Further studies should determine optimal interpretation of scores and the number of respondents needed per supervisor to obtain precise results, as well as test the functionality of section B.

Keywords:  Africa; Rwanda; Supportive supervision; instrument development; primary health care

Year:  2018        PMID: 29547066      PMCID: PMC5945230          DOI: 10.1080/16549716.2018.1445466

Source DB:  PubMed          Journal:  Glob Health Action        ISSN: 1654-9880            Impact factor:   2.640


Background

Health professionals in resource-constrained primary health care settings are likely to work in overburdened conditions, carry responsibilities above their level of training and receive little or no further clinical training or support [1-4]. Generally, supervision is regarded as a core element in ensuring high-quality care [5]. The more remote the setting in which health professionals work, the higher the level of supervision needed [1]. In low-income countries, external supervision (i.e. supervision delivered by supervisors from outside the facility) of primary health care facilities appears to be common practice [6-9]. External supervision often focuses on management and administration more than on problem solving and feedback [6,7]. Yet, health policies across Africa describe support for providers’ professional development as a component of external supervision [7,10-12], sometimes referred to as supportive supervision [8,13]. External supervisors may thus have a dual role that relates to: (1) managerial quality control of performance; and (2) formative support of providers. It has been suggested that there is a gap between health supervision policies and implementation of formative aspects of external supervision [7,14]. The external supportive supervision model [6-9,13] is described as unique to developing countries [15]. Numerous instruments have been developed in high-income settings to evaluate the quality of provider-centred supervision [16,17] and training [18] practices. The applicability of these instruments in management-centred, external supervision contexts has not been explored. Questionnaire-based outcome measures applied in studies of external, supportive supervision in Africa are commonly non-validated [8].

Supervision context in Rwanda

In Rwanda, external supervisors regularly visit primary health care facilities (health centres) for evaluative and formative supervisory purposes [14,19]. The external supervisors work in teams under the district hospital to which health centres refer. Supervisors are typically clinically experienced nurses with a higher nursing degree [19]. One of the major supervision drivers is the monthly or quarterly performance evaluations, which constitute the core of a nationwide performance-based financing system [14]. The health centres have no medical doctors, and more than 90% of their providers are nurses with a basic secondary school-based nursing degree (known as an A2 degree). The providers do not have a personal supervisor. Supervision encounters may happen between one or more supervisors and one or more providers. The lack of a personal supervisor, together with high turnover, absenteeism and frequent provider shifts between services, makes it likely that providers interact with a new supervisor at each supervision encounter [14,19]. A rating scale to assess external supervision may help assure supervision quality in these diverse contexts. Such an instrument should assess the construct ‘Perceived quality of supportive aspects within external supervision of primary health care providers’. This construct reflects a view of the provider as a direct beneficiary of external supervision despite its managerial and evaluative purposes [6,7]. Our aim was to develop a tool measuring provider-reported quality of supervision, to be used to give feedback to supervisors and supervision teams in Rwanda and so facilitate informed changes in the practice of external supervision [20]. Moreover, we aimed to empower providers with an opportunity to give feedback to supervisors within an otherwise asymmetric power relation [19]. The tool should thus focus on aspects of supervision potentially modifiable by supervisors, and cover key concepts in supportive supervision within health care. We also aimed to make the tool applicable in other African countries.

Methods

Multiple methods were used. Table 1 gives an overview of 18 chronological steps in three phases in the development of the External supervision: Provider evaluation of supervisor support (ExPRESS) tool. While phase 0 and phase 1 represent a pre-designed logical order of steps, phase 2 represents additional steps that emerged as necessary or logical to address problems or shortages discovered during the development process. A detailed view of added, revised and removed items during these steps is included as supplementary material 1.
Table 1.

Phases and steps in the development of ExPRESS.

| Phase | Step | Objective | Description |
|---|---|---|---|
| Phase 0 | 1. Qualitative studies | Increase understanding of external supervision | Seven focus group discussions with providers and supervisors to understand experiences. Reported elsewhere. |
| | 2. Instrument search | Identify supervision measurement instruments | Systematic search for instruments to measure supervision. No existing tool found applicable to external supervision context. |
| Phase 1 | 3. Conceptualisation I | Develop model for questionnaire | Defining construct. Categorisation in normative, formative and restorative functions. Division in section A (individual supervisor) and section B (supervision overall). |
| | 4. Item development I | Develop item pool and adapt relevant items | Of > 400 items, 122 retained in item pool, of which six used directly, 22 modified or inspirational and eight new items added for the first version. |
| | 5. Translation I | Prepare for tests in Rwanda | Forward and backward translation into Kinyarwanda. |
| | 6. Interviewing I | Cognitive testing of items | Individual cognitive interviews of 10 providers and one information expert. Two group discussions with five providers and six supervisors. 17 items modified, 10 items removed, 3 items added. |
| | 7. Field test I | Factor structure and reliability | 134 respondents, 58 retest. Exploratory factor analysis and test–retest reliability. |
| Phase 2 | 8. Conceptualisation II | Refine conceptual model | Systematic refinement of conceptualisation using multiple sources on supportive supervision. |
| | 9. Item development II | Adapt to refined model | 11 items modified, 1 item removed, 5 items added. |
| | 10. Interviewing II | Lexical test | Two interviews with professional native English linguist to test lexical qualities of English version. |
| | 11. Relevance assessment I | Content validation by international experts | Relevance assessment of items by four international experts on supportive supervision analysed via the Content Validity Index. 14 items modified, 5 items removed, 6 items added. |
| | 12. Item development III | Review and revise prior to new translation | 12 items modified, 4 items added. |
| | 13. Translation II | Prepare final Rwandan version | Renewed translation and back-translation of all items due to several changes and modifications. |
| | 14. Relevance assessment II | Content validation by regional experts | Relevance assessment of items by five providers and five external supervisors in Rwanda, Uganda, Kenya and Nigeria analysed via the Content Validity Index. |
| | 15. Interviewing III | Testing response scale | Two group discussions with five providers on the response scale. Response scale changed for section B. |
| | 16. Item development IV | Adding latent variable | Adding three items of a latent variable ‘Solving problems jointly’. |
| | 17. Translation III | Translate added items | Translation and back-translation of added items for the ‘Joint Problem Solving’ latent variable. |
| | 18. Field test II | Confirmatory factor analysis | 154 respondents. Confirmatory factor analysis and Differential Item Functioning. |

In this paper, item numbers corresponding to the questionnaire used in field test I (step 7) are referred to by small letters (a1–a16, b1–b13), and the item numbers in field test II (step 18) are referred to by capital letters (A1–A18, B1–B15).

Phase 0

In the preparatory phase, we conducted qualitative studies (step 1) to understand the practice of external supervision in Rwanda. We used focus group discussions with separate groups of providers and supervisors to explore the relationships between evaluative and formative supervision activities and between supervisors and providers. Methods and results are reported elsewhere [14,19]. We also conducted a systematic search (step 2) for published instruments measuring supervision or mentorship in health care to develop a bank of constructs and items (supplementary material 2 for search strategy). Further, we used reviews of directly or indirectly related instruments [16-18] and Google searches for non-published instruments. Additionally, we searched guidelines about supervision and mentoring within health or social sciences, and performed snowball searches in reference lists.

Phases 1 and 2

Conceptual model

The questionnaire is based on a reflective conceptual framework [21]. In the initial conceptual model (step 3) we categorised items according to Proctor’s tripartition of supervisory tasks into normative (administration and performance evaluation), formative (education) and restorative (personal wellbeing at work) [22]. Further, we divided the questionnaire into a specific A and a generic B section as providers may interact with different supervisors from encounter to encounter. Section A evaluates the most recent supervision experience using items that providers may reasonably assess after each supervision encounter with an individual external supervisor. Section B represents a sum experience with external supervisors to ensure coverage. In phase 2, we refined the conceptual model (step 8) using key articles and guidelines on supportive supervision in a low-resource setting [13,23-30]. Supportive supervision contents were extracted, discussed and categorised, leading to a list of key aspects to cover in the questionnaire (supplementary material 3).

Item development

Two researchers (MS and VKC, in step 4) screened all items identified in the literature search and created an item pool of those appropriate in contexts where: (1) providers may not have a personal supervisor; (2) the supervisor is from an external institution; and (3) supervisors may carry a managerial role. Further, each item should: (1) focus on a specific event related to supervision; and (2) use simple, non-idiomatic phrases. Items in the pool were inductively categorised in themes. Relevance to the instrument construct was assessed as ‘yes’, ‘no’ or ‘maybe’ by two researchers (MS and VKC) independently. Subsequently, an iterative process of discussion among researchers informed by qualitative findings [14,19], supervision literature, conceptual models, item categories and considerations of language, semantics and level of specification, led to the composition of a first combination of items for section A and B to undergo a translation. Following the refined conceptual model, the combination of items was again modified (see Table 1 step 9, and supplementary material 1). Items were developed with focus on both clinical and non-clinical aspects, as both may be supervised in the same encounter. It was difficult to find appropriate items to evaluate the key concept of joint problem solving. At an advanced stage (step 16), a publication [31] provided an idea for how to add ‘solving problems jointly’ as a latent variable (a variable that may not be directly observed but may be indirectly measured through a set of observable items) in section A, using phrases such as ‘engaged me in’ and ‘involved me in’.

Translation

Items were developed in English and translated into Kinyarwanda for testing in Rwanda. We followed a standardized approach [32]. Two translators, a professional translator not knowledgeable about supervision and someone who had published articles about health care supervision in Rwanda, did the translation of items into Kinyarwanda. Two other translators, a native English speaker and someone who spoke English as a second language from early childhood, did the back translation. To obtain consensus of the translation of each item, MS and VKC met with the first translators, and subsequently with all four translators. As items were translated during the development of the instrument, complete translation and back-translation including meetings was done twice (steps 5 and 13). Subsequent addition of a latent variable (‘solving problems jointly’) required a third translation process of three items (step 17), with participation of only one back-translator. When discussions suggested a need to change the original English version, this was done only if there was consensus between the two researchers and the translators.

Interviews

For cognitive testing of items (step 6) we used a combination of ‘think aloud’ and ‘probing’ techniques [33] (see supplementary material 4 for the interview guide). Initially, a local communication expert and a local external supervisor were interviewed, followed by 10 individual interviews with local primary health care providers at health centres. Interviews were held in Kinyarwanda by a trained interviewer with a social science background, who also took notes, item by item. Interviews lasted 1.5–2 hours and were not recorded. After each interview, notes were discussed between the interviewer, MS and VKC, and agreed changes were applied to ExPRESS before the next cognitive test interview. Further, two focus group discussions facilitated by the same interviewer, one with six providers (five females, one male) and one with five external supervisors (three males, two females), examined meaning and relevance item by item, and suggested missing concepts. Interviews and focus group discussions led to several changes of items (see supplementary material 1).

Response scale

Initially, we used a 5-point neutral-centred agreement response scale with the advantage of uniform applicability regardless of whether items are phrased positively or negatively. In four initial, cognitive interviews (step 6) the most positive response option was endorsed for nearly all items, and interviewees did not endorse negative response options. This was in spite of the providers verbally criticising their supervisors on the same items. Therefore, a 5-point quality response scale was applied instead: ‘1 = poor; 2 = fair; 3 = good; 4 = very good; 5 = excellent’, to expand the positive spectrum. Interviewees did not report problems with understanding or using this scale. In step 15, we held two focus group discussions each with five primary health care providers to discuss alternative response scales. For section A, the quality response scale described above, a variant of the quality scale and a 4-point scale (‘no, not at all’, ‘yes, a little’, ‘yes, somewhat’, ‘yes, very much’) were explored. For section B, we explored the quality response scale and a frequency scale (‘never’, ‘sometimes’, ‘usually’, ‘quite often’, ‘always’). First, providers individually chose their preferred scale, and then discussed their preferences. All preferred the quality response scale (poor–excellent) for section A. Due to time-related items in section B, most but not all preferred the frequency response scale, which was applied in field test II.

Data collection in field tests I and II

Questionnaires in field tests I and II were self-administered after brief, face-to-face information by one of two trained assistants. For factor analysis we needed four respondents per item and for test–retest 50 respondents, as recommended [34]. We added 15–20% more respondents due to anticipated missing items. In field test I, all respondents were nurses recruited at their health centre after agreement with facility managers. In field test II, 107 (69%) nurses were recruited in this way, and the rest were nurses recruited from nursing schools, where they attended further training while being employed at a health centre. Only respondents who had experienced external supervision in the previous four months were invited. Participants filled in the questionnaire in privacy. All data in field tests I and II were entered into EpiData 2.0.5.17 using double entry, and analysed in STATA 14.2.
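The sample size rule described above can be illustrated with a short calculation. This is only a sketch: the function name and the choice to count the 16 + 13 items of the field test I version together are assumptions made for illustration, as the text does not spell out the exact calculation.

```python
import math

def required_respondents(n_items, per_item=4, extra=0.15):
    """Respondents for factor analysis: a fixed number per item,
    inflated by a margin for anticipated missing items."""
    base = n_items * per_item
    return math.ceil(base * (1 + extra))

# Assumption: sections A (16 items) and B (13 items) are counted together.
n_items = 16 + 13
low = required_respondents(n_items, extra=0.15)
high = required_respondents(n_items, extra=0.20)
print(f"Target sample: {low} to {high} respondents")
```

Under these assumptions the lower bound is 134, which coincides with the number of respondents reported for field test I.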

Field test I

The purpose of field test I (step 7) was to explore structural validity (the combination of items that would adequately reflect the construct of the questionnaire), and to conduct a test–retest reliability study (testing to what extent a provider would give the same responses about the same supervision experience when asked at two different moments in time). Structural validity was assessed with exploratory factor analysis (EFA), in which factor loadings are used to study the correlation of items. The purpose of this is to identify a meaningful categorisation of items in which each item has a high factor loading with only one group of items, and thus does not cross-load (correlate) with other groups of items. We used polychoric correlation matrices [34], principal axis factoring and promax oblique rotation [35]. We considered a factor loading of ±0.50 or higher as practically significant, and only explored loadings of ±0.30 or higher [36]. Loadings and cross-loadings of ±0.30 to ±0.49 were considered potentially problematic. First, a forced 2-factor structure for the entire questionnaire (sections A and B) was explored. Secondly, structural validity was assessed within sections A and B separately. Here, the number of factors was explored stepwise, starting with the maximum number of potential factors as suggested by the scree plot, the Akaike Information Criterion (AIC), the Bayesian Information Criterion (BIC) and eigenvalues, until at least two and preferably three items loaded 0.5 or greater on all factors [35,37,38]. For test–retest reliability, we considered respondents ‘stable’ if they had not experienced supervision between the first and second time they filled in the questionnaire. The time between responses was 12–14 days. We used weighted Cohen’s kappa [39] with linear and quadratic weights [40]. Additionally, a modified weight of identical answers as 1, directly adjacent as 0.8 and all others as 0 was used, since we expected the majority of retest responses to be within ±1 of the test response. We interpreted kappa values according to Landis and Koch: 0–0.20 slight, 0.21–0.40 fair, 0.41–0.60 moderate, 0.61–0.80 substantial, and 0.81–1 nearly perfect [34].
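The three weighting schemes for Cohen’s kappa can be sketched as follows. This is a minimal illustration and not the STATA routine used in the study: the function names and the example responses are hypothetical, and the weights are written as agreement weights on a 5-point scale (1 for identical answers, decreasing with distance).

```python
from collections import Counter

def linear_weight(i, j, k):
    """Agreement weight that falls linearly with the distance between answers."""
    return 1 - abs(i - j) / (k - 1)

def quadratic_weight(i, j, k):
    """Agreement weight that falls with the squared distance between answers."""
    return 1 - (i - j) ** 2 / (k - 1) ** 2

def modified_weight(i, j, k):
    """Identical answers count 1, directly adjacent answers 0.8, all others 0."""
    distance = abs(i - j)
    if distance == 0:
        return 1.0
    return 0.8 if distance == 1 else 0.0

def weighted_kappa(test, retest, weight, categories=(1, 2, 3, 4, 5)):
    """Weighted kappa: (observed - expected agreement) / (1 - expected agreement)."""
    n = len(test)
    k = len(categories)
    position = {c: i for i, c in enumerate(categories)}
    pairs = Counter((position[a], position[b]) for a, b in zip(test, retest))
    test_totals = Counter(position[a] for a in test)
    retest_totals = Counter(position[b] for b in retest)
    observed = sum(weight(i, j, k) * count for (i, j), count in pairs.items()) / n
    expected = sum(
        weight(i, j, k) * test_totals[i] * retest_totals[j]
        for i in range(k) for j in range(k)
    ) / n ** 2
    return (observed - expected) / (1 - expected)

def landis_koch(kappa):
    """Label a kappa value using the bands listed in the text."""
    if kappa < 0:
        return "below listed range"
    for upper, label in ((0.20, "slight"), (0.40, "fair"),
                         (0.60, "moderate"), (0.80, "substantial")):
        if kappa <= upper:
            return label
    return "nearly perfect"

# Hypothetical test and retest answers of ten respondents to one item.
test = [3, 4, 4, 2, 5, 3, 3, 4, 1, 5]
retest = [3, 4, 3, 2, 4, 3, 4, 4, 2, 5]
for name, weight in (("modified", modified_weight),
                     ("linear", linear_weight),
                     ("quadratic", quadratic_weight)):
    kappa = weighted_kappa(test, retest, weight)
    print(f"{name}: kappa = {kappa:.2f} ({landis_koch(kappa)})")
```

The modified weight gives nearly full credit to answers that differ by one scale point, which reflects the expectation that most retest answers fall within ±1 of the test answer.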

Content validation

We conducted relevance assessments (steps 11 and 14) using the Content Validity Index (CVI) to get a consensus estimate. Each item was scored by experts as: (1) not at all; (2) somewhat; (3) quite; or (4) highly relevant for measuring a given construct. The item-CVI is the fraction of experts who found an item highly or quite relevant. With five or fewer experts, the item-CVI should be 1 (relevant to all experts) to retain an item, whereas for more than five experts an item-CVI above 0.78 was considered acceptable [41,42]. Four international experts on supportive supervision in Sub-Saharan Africa who had published on the matter [13,27,43,44], assessed the relevance of items. We knew of these experts only through their publications, which we considered important points of reference for supportive supervision in Africa. All were contacted by email. For items considered somewhat or not at all relevant, experts explained their score to enable discovery of solutions to item problems. After modifications, experts re-assessed the items [42]. Subsequently (step 14), we conducted a content validity assessment of the revised English questionnaire in Nigeria, Uganda, Kenya and Rwanda, by five external supervisors and five primary health care providers in each country (40 individuals). A collaborator in each country helped to collect the data using a standardized relevance assessment questionnaire. Respondents had to be able to read, write and understand English. Further, they were required to have a minimum of two years of experience as a provider in or a supervisor of public non-hospital primary health care facilities, as well as have visited a primary health care facility to supervise (supervisors), or experienced external supervision at their facility (providers) within the previous four months.
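The item-CVI and the retention rule described above can be sketched as follows. The function names and the example scores are hypothetical; the thresholds are those stated in the text.

```python
def item_cvi(scores):
    """Fraction of experts who scored the item 3 (quite) or 4 (highly relevant)."""
    return sum(1 for score in scores if score >= 3) / len(scores)

def retain_item(scores):
    """With five or fewer experts all must find the item relevant;
    with more than five experts the item-CVI must exceed 0.78."""
    cvi = item_cvi(scores)
    if len(scores) <= 5:
        return cvi == 1.0
    return cvi > 0.78

# Hypothetical relevance scores (1 = not at all ... 4 = highly relevant).
four_experts_agree = [4, 4, 3, 4]
four_experts_split = [4, 3, 2, 4]
ten_raters = [4, 4, 3, 3, 4, 4, 3, 2, 4, 2]

for scores in (four_experts_agree, four_experts_split, ten_raters):
    print(len(scores), "raters:", round(item_cvi(scores), 2), retain_item(scores))
```

With four raters a single low score is enough to flag an item, whereas with ten raters two low scores still leave the item-CVI at 0.80, above the 0.78 threshold.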

Field test II

In field test II (step 18) we conducted a confirmatory factor analysis (CFA) for sections A and B separately, using maximum likelihood with the Satorra-Bentler (SB) estimation, which is robust to non-normality [45,46]. This was relevant since neither section showed multivariate normality. Model fit was considered good with a p-value for the χ² test > 0.01, Tucker Lewis Index (TLI) and Comparative Fit Index (CFI) > 0.95, root mean square error of approximation (RMSEA) < 0.06 and standardized root mean square residual (SRMR) < 0.08, and acceptable if approaching these values [34,47-49]. Since SRMR and the confidence interval of RMSEA were not available for the SB estimation, these are reported based on full (non-SB) maximum likelihood. For section A, we hypothesised a 4-factor structure as the best fit. This was informed by the 3-factor output of the EFA and the later addition of a fourth latent variable, ‘solving problems jointly’. To compare the fit, we had predefined relevant 3- and 2-factor models. As indicated by results of the EFA, we hypothesised that the fit could potentially be improved by removing A2 and moving A9 to ‘Generating comfort’ (see Table 5).
Table 5.

Final items of ExPRESS. Items removed following field test II are indicated with *.

| Item | Latent variable |
|---|---|
| Section A. During the most recent supervision, the supervisor… | |
| A1. Communicated in a friendly way | GC |
| *A2. Explained the purpose of the supervision visit | |
| A3. Wanted to know my opinions | GC |
| A4. Listened to me attentively | GC |
| A5. Treated me with respect | GC |
| A6. Observed how I carry out specific tasks of my work | UW |
| A7. Spent enough time discussing my work tasks with me | UW |
| A8. Was familiar with my area of work | UW |
| A9. Showed appreciation for my work | GC |
| A10. Asked me what problems I experience at work | SPJ |
| A11. Engaged me in discussions to examine problems at work | SPJ |
| A12. Involved me in deciding how to handle problems at work | SPJ |
| *A13. Followed up on previous discussions | |
| A14. Encouraged me to ask questions | BC |
| A15. Gave useful feedback about my work | BC |
| A16. Asked me what I need to learn more about | BC |
| A17. Discussed next steps | BC |
| A18. Checked to make sure I understood everything we discussed | BC |
| Section B. In general, supervisors… | |
| B1. Keep their supervision appointments | C |
| B2. Try not to disturb patient care | C |
| *B3. Treat women and men equally | |
| B4. Gather me and my colleagues for discussing as a group, when needed | C |
| B5. Maintain proper confidentiality of work-related information | C |
| B6. Strengthen the teamwork at my facility, when needed | C |
| B7. Explain the criteria used when assessing my performance | AP |
| B8. Assess my performance in a fair way | AP |
| B9. Give useful feedback after assessing my performance | AP |
| B10. Have sufficient clinical skills and knowledge | CT |
| B11. Explain difficult issues in a clear way | CT |
| B12. Update me when there are major changes in guidelines | CT |
| B13. Help make sure my needs for training are met | CT |
| B14. Help me feel confident at work | CT |
| *B15. Conduct supervision in a way that makes me provide better care | |

The instrument has the following latent variables: GC: Generating comfort, UW: Understanding work, SPJ: Solving problems jointly, BC: Building capacity, C: Collaborating, AP: Assessing performance, CT: Supervisor capacity to teach.

For section B we hypothesised that a 4-factor structure would be the best fit, and had predefined relevant 2-factor, 3-factor and 4-factor models to compare fit, as well as to test whether fit improved by excluding items B3, B5 and B15 (see Table 5). The hypothesised models are included as supplementary material 5.

Confirmatory factor analysis in field test II. Model fit of section A and B. Df: degrees of freedom, CFI: Comparative Fit Index, TLI: Tucker-Lewis Index, RMSEA: root mean square error of approximation, SRMR: standardized root mean square residual. *Based on Satorra-Bentler estimation (not available for confidence intervals of RMSEA nor for SRMR). **Error correlations: A3–A4 and A16–A17. Bold model: the final selected model.

Results

The systematic search identified 21 measurement instruments related to supervision of which five were not published in scientific journals [50-54], six were published but not as papers to validate the instrument [55-60] and 11 were published as validation studies [61-71]. Additionally, three instruments were found where respondents assessed an external event [18,72,73], and one instrument related to the primary health care field [74]. Over 400 items were identified, and 122 were retained in the item pool. The most common reasons for excluding items were that they were complicated in phrasing, vague, or inappropriate for the context of external supervision.

Phase 1

Following categorisation of pooled items as well as discussion and assessment of their relevance, a first version of the questionnaire was composed for cognitive testing. For section A, four items from the item pool were used with no modifications, 14 items were modified or the idea was used to develop another item, and based on qualitative supervision data, three new items were added [14,19]. For section B, two items were used without modifications, eight items were modified or the idea was used to develop a new item, and five new items were added. After cognitive testing, 10 items were removed, three items were added and 17 items modified. After the refined conceptual model, one item was removed, five items added and 11 items were modified (supplementary material 1). All invited participants responded to the field test I questionnaire version (items a1–a16 and b1–b13, supplementary material 1). A total of 134 primary health care nurses, 52% of whom were from districts in the capital, Kigali, participated. Respondents were from 27 health centres, 52% had their most recent supervision within the previous month, and 75% were female, reflecting a predominance of females in the nursing profession (supplementary material 6 for participant characteristics). Respondents had assessed 36 different supervisors in section A (24 did not provide the supervisor name). A total of 111 and 119 respondents had no missing items for section A and section B, respectively. Table 2 shows the range and mean of various descriptive statistical indicators across all items within a section (including field test II).
Table 2.

Data quality and characteristics presented as range and (mean) across all items within a section.

| Parameters | Field test I (N = 134), Section A | Field test I (N = 134), Section B | Field test II (N = 158), Section A | Field test II (N = 158), Section B |
|---|---|---|---|---|
| Item mean | 2.6–4.0 (3.5) | 2.8–4.0 (3.5) | 2.6–4 (3.3) | 2.8–4.3 (3.7) |
| Item SD | 0.9–1.2 (1.1) | 0.8–1.2 (1.0) | 0.9–1.2 (1.1) | 0.9–1.4 (1.1) |
| Item median | 3–4 (3.7) | 3–4 (3.7) | 3–4 (3.4) | 2–5 (3.8) |
| % with lowest response in item | 1–25 (6) | 1–12 (4) | 2–25 (10) | 1–29 (5.8) |
| % with highest response in item | 6–33 (17) | 6–31 (19) | 3–29 (11) | 9–60 (35) |
| Item kurtosis | 1.8–3.6 (2.7) | 2.0–3.3 (2.5) | 1.9–4.0 (2.7) | 1.6–4.0 (2.7) |
| Item skewness, absolute values | 0.2–0.9 (0.5) | 0.1–0.7 (0.3) | 0.0–0.9 (0.5) | 0.2–1.3 (0.8) |

Note: Items of field test I and II are different.

Exploratory factor analysis

In the forced 2-factor structure, all section A items loaded above 0.50 in factor 1 (except a13 loading 0.47) and all section B items loaded above 0.50 in factor 2. Only one item (b13) cross-loaded above 0.3 (0.34). For section A, up to six factors were suggested. Following stepwise exploration of loadings, we found a potential fit of a 3-factor model corresponding to ‘Generating comfort’, ‘Understanding work of providers’ and ‘Building provider capacity’, retaining items a1–a12. Item a1 had the lowest loading (0.56) and communality (0.53). Item a7 cross-loaded with factors of ‘Generating comfort’ and ‘Understanding work’ in several models. These observations of a1 (= A2 in field test II) and a7 (= A9 in field test II) were considered for the CFA models in field test II. Item a13 had loadings and communality below 0.5. For reasons of content validity it was moved to section B instead of being excluded. Items a14–a16 were excluded as they did not represent specific supervisory events, and loaded on a fourth factor with which several items cross-loaded. For section B, up to seven factors were suggested. Using stepwise exploration, a 4-factor model emerged with factors corresponding to ‘Planning’, ‘Team work’, ‘Assessing Performance’ and ‘Capacity to teach’, retaining items b1–b11. Items b13 (≈ B15 in field test II) and b12 loaded on a factor with several cross-loadings, and did not evaluate specific supervisory events. Items b3 (≈ B5 in field test II) and b10 (= B3 in field test II) loaded below 0.5. These were considered for CFA modelling in field test II.

Test–retest reliability

Of 134 providers in field test I, 58 had not experienced supervision since their first response and participated in the retest (see supplementary material 6 for characteristics). Table 3 shows the distribution of differences in test and retest responses, number of missing responses per item and weighted kappa values.
Table 3.

Test–retest differences and kappa values of field test I (N = 58).

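Columns −4 to 4: number of responses by difference (retest minus test). Modified, Linear, Quadratic: weighted kappa under each weighting scheme.

Item | −4 | −3 | −2 | −1 | 0 | 1 | 2 | 3 | 4 | Missing | Modified | Linear | Quadratic
a1 |  |  | 1 | 12 | 31 | 12 |  | 1 |  | 1 | 0.63 | 0.47 | 0.59
a2 |  | 1 | 3 | 15 | 27 | 8 | 1 |  |  | 3 | 0.57 | 0.47 | 0.60
a3 |  |  | 4 | 11 | 32 | 8 | 1 |  |  | 2 | 0.63 | 0.58 | 0.71
a4 |  |  | 4 | 17 | 27 | 7 | 2 |  |  | 1 | 0.64 | 0.60 | 0.72
a5 |  |  | 1 | 8 | 40 | 7 | 1 |  |  | 1 | 0.75 | 0.68 | 0.78
a6 |  |  | 4 | 15 | 26 | 10 | 3 |  |  | 0 | 0.45 | 0.39 | 0.55
a7 |  | 1 | 1 | 13 | 29 | 11 | 1 |  |  | 2 | 0.61 | 0.48 | 0.63
a8 |  |  | 5 | 10 | 31 | 11 | 1 |  |  | 0 | 0.59 | 0.53 | 0.67
a9 |  | 1 | 3 | 17 | 24 | 8 | 3 | 2 |  | 0 | 0.47 | 0.39 | 0.52
a10 |  |  | 3 | 12 | 30 | 7 | 4 |  |  | 2 | 0.50 | 0.46 | 0.57
a11 |  | 1 | 1 | 13 | 28 | 7 | 1 | 2 |  | 5 | 0.61 | 0.50 | 0.59
a12 | 2 |  | 1 | 13 | 32 | 8 | 1 |  |  | 1 | 0.66 | 0.53 | 0.57
a13 | 1 | 1 | 4 | 8 | 31 | 11 |  | 1 |  | 1 | 0.61 | 0.53 | 0.60
a14 |  |  | 3 | 12 | 36 | 6 | 1 |  |  | 0 | 0.65 | 0.58 | 0.69
a15 |  |  | 3 | 20 | 28 | 7 |  |  |  | 0 | 0.59 | 0.45 | 0.62
a16 |  | 1 | 2 | 17 | 30 | 6 | 1 |  |  | 1 | 0.60 | 0.50 | 0.63
% of total | 0% | 1% | 5% | 23% | 53% | 15% | 2% | 1% | 0% |  |  |  |
b1 |  |  | 4 | 10 | 32 | 6 | 4 |  | 1 | 1 | 0.49 | 0.47 | 0.51
b2 |  |  | 4 | 13 | 23 | 12 | 5 |  | 1 | 0 | 0.39 | 0.33 | 0.45
b3 |  |  | 1 | 12 | 27 | 13 | 2 | 1 |  | 2 | 0.55 | 0.42 | 0.55
b4 |  | 1 | 4 | 11 | 35 | 5 | 1 |  |  | 1 | 0.59 | 0.55 | 0.61
b5 |  |  | 6 | 12 | 27 | 10 | 2 |  |  | 1 | 0.42 | 0.38 | 0.50
b6 |  | 1 | 5 | 10 | 29 | 10 | 3 |  |  | 0 | 0.48 | 0.44 | 0.55
b7 |  |  | 5 | 12 | 32 | 6 | 3 |  |  | 0 | 0.47 | 0.46 | 0.55
b8 |  | 1 | 3 | 13 | 30 | 9 | 1 |  |  | 1 | 0.56 | 0.47 | 0.59
b9 |  |  | 2 | 17 | 29 | 6 | 3 |  |  | 1 | 0.54 | 0.46 | 0.60
b10 |  | 1 |  | 11 | 32 | 11 | 2 |  |  | 1 | 0.55 | 0.43 | 0.50
b11 |  |  | 4 | 17 | 27 | 5 | 3 | 1 |  | 1 | 0.42 | 0.37 | 0.48
b12 |  |  | 2 | 17 | 35 | 3 |  |  |  | 1 | 0.72 | 0.61 | 0.74
b13 |  |  | 4 | 15 | 33 | 6 |  |  |  | 0 | 0.57 | 0.49 | 0.59
% of total | 0% | 1% | 6% | 23% | 53% | 14% | 4% | 0% | 0% |  |  |  |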

Item numbers refer to the field test I version and do not correspond to the item numbers of field test II.

More than 90% of all retest responses were within +/- 1 of the test response. In all cases, linear weights had the lowest kappa values, and in most cases, quadratic weights the highest. With the suggested modified weight, all items had moderate to substantial agreement, except b2 with κ = 0.39.
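To make the difference between the linear and quadratic weighting schemes concrete, the following is a minimal sketch of a weighted Cohen's kappa for paired test and retest responses on a 5-point scale. It is an illustration only: the function name and arguments are hypothetical, it is not the authors' analysis code, and the 'modified' weights are not reproduced because they are not defined in this section.

```python
from typing import Sequence


def weighted_kappa(test: Sequence[int], retest: Sequence[int],
                   n_categories: int = 5, scheme: str = "linear") -> float:
    """Weighted Cohen's kappa for paired ordinal ratings coded 1..n_categories."""
    if len(test) != len(retest) or not test:
        raise ValueError("need two non-empty rating lists of equal length")
    if scheme not in ("linear", "quadratic"):
        raise ValueError("scheme must be 'linear' or 'quadratic'")
    k, n = n_categories, len(test)

    # Cross-tabulate test (rows) against retest (columns).
    observed = [[0] * k for _ in range(k)]
    for a, b in zip(test, retest):
        observed[a - 1][b - 1] += 1
    row = [sum(r) for r in observed]
    col = [sum(observed[i][j] for i in range(k)) for j in range(k)]

    def penalty(i: int, j: int) -> float:
        # Disagreement weight: 0 on the diagonal, 1 for the largest possible gap.
        d = abs(i - j) / (k - 1)
        return d if scheme == "linear" else d * d

    cells = [(i, j) for i in range(k) for j in range(k)]
    obs_dis = sum(penalty(i, j) * observed[i][j] for i, j in cells) / n
    exp_dis = sum(penalty(i, j) * row[i] * col[j] for i, j in cells) / (n * n)
    if exp_dis == 0:
        raise ValueError("kappa is undefined when all ratings fall in one category")
    return 1.0 - obs_dis / exp_dis
```

Quadratic weights penalise one-step differences less than linear weights do, which is one reason quadratic kappa tends to be higher when most differences are within one response category.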

Phase 2

Content validation by experts using the CVI

Following relevance assessment by four international experts in supportive supervision, we deleted five, modified 14 and added six items (see supplementary material 1). New and modified items were subsequently assessed by the same experts in a second iteration [42]. Here, only item B1 had an item-CVI below 1 (supplementary material 7). The item was included for field testing due to its relevance in the qualitative studies. The regional relevance assessment by five supervisors and five providers in each of four countries showed an acceptable item-CVI for all items, except in Nigeria for items A2 and A17 and a previous version of item A7 (supplementary material 7). In Rwanda, these items were found relevant, and they were therefore included in field test II.
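As an illustration of how an item-level CVI is typically computed, the sketch below takes one relevance rating per expert and returns the proportion of experts who rated the item as relevant. It assumes a 4-point relevance scale on which ratings of 3 or 4 count as relevant; that scale and cut-off are a common convention assumed here, not something stated in this section, and the function name is hypothetical.

```python
def item_cvi(ratings, relevant_from=3):
    """Proportion of experts rating an item as relevant.

    ratings: one rating per expert (assumed 4-point scale, 1 = not relevant).
    relevant_from: lowest rating counted as relevant (assumed cut-off).
    """
    if not ratings:
        raise ValueError("at least one expert rating is required")
    return sum(1 for r in ratings if r >= relevant_from) / len(ratings)


# With four experts, a single dissenting rating gives 3/4 = 0.75.
print(item_cvi([4, 4, 3, 2]))
```

With a panel of four experts, the only possible values are 0, 0.25, 0.5, 0.75 and 1, so an item-CVI of 0.75 corresponds to one expert rating the item as not relevant.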

Field study II

Among 154 respondents, 72% were female, 90% had more than three years of practice experience and 68% had their most recent supervision within the previous two months (see supplementary material 6 for participant characteristics). Respondents came from 17 different districts and had evaluated 69 different supervisors in section A (eight respondents had not reported the supervisor name). Of 154 respondents, 146 were retained for CFA of section A and 145 for section B, as they had no missing items. Table 2 shows that 35% of respondents endorsed the highest possible response (‘always’) in section B items, compared to 11% (‘excellent’) in section A items. Item B1 (see Table 5) was included in the field test despite a CVI of 0.75 and had the lowest median and mean, suggesting that it reflected a perceived problem. Table 4 shows the goodness-of-fit output of the confirmatory factor analysis.
Table 4.

Confirmatory factor analysis in field test II. Model fit of section A and B.

Model | Chi2* | df* | p* | CFI* | TLI* | RMSEA* | RMSEA (90% CI) | SRMR
A: 1 factor | 375 | 153 | 0.00 | 0.81 | 0.78 | 0.11 | 0.13 (0.12; 0.15) | 0.08
A: 2 factors | 363 | 153 | 0.00 | 0.82 | 0.79 | 0.11 | 0.13 (0.12; 0.14) | 0.08
A: 3 factors | 266 | 153 | 0.00 | 0.89 | 0.88 | 0.08 | 0.10 (0.09; 0.12) | 0.08
A: 4 factors | 206 | 153 | 0.00 | 0.94 | 0.93 | 0.06 | 0.09 (0.07; 0.10) | 0.06
A: 4 factors, exclude A2 | 177 | 136 | 0.00 | 0.95 | 0.94 | 0.06 | 0.08 (0.07; 0.10) | 0.06
A: 4 factors, exclude A2, move A9 | 162 | 136 | 0.00 | 0.96 | 0.95 | 0.05 | 0.08 (0.06; 0.09) | 0.05
A: 4 factors, exclude A2, A13, move A9 | 149 | 120 | 0.00 | 0.96 | 0.95 | 0.06 | 0.08 (0.06; 0.10) | 0.05
A: As above + error correlations** (final) | 119 | 120 | 0.05 | 0.98 | 0.98 | 0.04 | 0.06 (0.04; 0.08) | 0.04
B: 1 factor | 182 | 105 | 0.00 | 0.86 | 0.83 | 0.08 | 0.10 (0.08; 0.12) | 0.07
B: 2 factors | 168 | 105 | 0.00 | 0.88 | 0.85 | 0.08 | 0.09 (0.08; 0.11) | 0.07
B: 3 factors | 150 | 105 | 0.00 | 0.90 | 0.88 | 0.07 | 0.09 (0.07; 0.11) | 0.06
B: 4 factors | 146 | 105 | 0.00 | 0.90 | 0.88 | 0.07 | 0.09 (0.07; 0.11) | 0.06
B: 4 factors, exclude B3 | 107 | 91 | 0.00 | 0.94 | 0.92 | 0.06 | 0.07 (0.05; 0.09) | 0.05
B: 4 factors, exclude B3 B15 | 86 | 78 | 0.01 | 0.95 | 0.93 | 0.06 | 0.07 (0.04; 0.09) | 0.05
B: 4 factors, exclude B3 B15 B5 | 69 | 66 | 0.03 | 0.96 | 0.95 | 0.05 | 0.07 (0.04; 0.09) | 0.05
B: 3 factors, exclude B3 B15 B5 | 76 | 66 | 0.01 | 0.95 | 0.94 | 0.06 | 0.07 (0.04; 0.09) | 0.05
B: 3 factors, exclude B3 B15 (final) | 75 | 66 | 0.02 | 0.95 | 0.94 | 0.06 | 0.07 (0.04; 0.09) | 0.06

df: degrees of freedom; CFI: comparative fit index; TLI: Tucker-Lewis index; RMSEA: root mean square error of approximation; SRMR: standardized root mean square residual.

*Based on Satorra-Bentler estimation (not available for confidence intervals of RMSEA nor for SRMR).

**Error correlations: A3–A4 and A16–A17. Rows marked (final) are the final selected models.

A reasonable fit was found for section A with the hypothesised 4-factor model, improved by excluding item A2 and moving A9 to factor 1 as hypothesised. The model improved further by adding error correlations between items A3 and A4, and between items A16 and A17, which had not been predicted. Conceptually, these error correlations were reasonable and did not indicate redundancy. Item A13 (‘followed up on previous discussions’) had a loading of 0.51 and was previously found irrelevant by an international expert (see supplementary material 1). After discussion, it was judged inappropriate for section A because it is not necessarily linked to support, and it was therefore excluded. Figure 1 shows the final 4-factor, 16-item model.
Figure 1.

Final structural equation model for section A with standardised factor loadings, error terms and error correlations.

GC: Generating comfort, UW: Understanding work, SPJ: Solving problems jointly, BC: Building capacity

For section B, excluding item B3 significantly improved the fit of the proposed 4-factor model. Item B15 was non-specific and somewhat abstract, and was excluded to slightly improve fit. While excluding B5 slightly improved fit, it was retained for content validity reasons. To avoid a factor of two items we adopted the 3-factor comparison model, which also had an appropriate fit. Improvements from error correlations were not conceptually appropriate. The final 3-factor and 13-item model is shown in Figure 2.
Figure 2.

Final structural equation model for section B with standardised factor loadings and error terms.

C: Collaborating; AP: Assessing performance; CT: Capacity to teach.

Cronbach’s alpha was 0.93 for the final 16-item version of section A with item-rest correlations of 0.55 to 0.74. The final 13-item version of section B had alpha 0.87 and item-rest correlations from 0.39 to 0.70.

The final questionnaire is presented in Table 5. The individual supervisor at a specific supervision encounter is assessed in section A, which contains the latent variables generating comfort (5 items), understanding work (3 items), solving problems jointly (3 items) and building capacity (5 items). The overall experience of supervision is assessed in section B, which contains the latent variables collaborating (5 items), assessing performance (3 items) and capacity to teach (5 items).
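For readers who want to reproduce the internal consistency statistics reported above on their own data, the following is a minimal sketch of Cronbach's alpha and the item-rest correlation. It assumes complete responses (one list of item scores per respondent, no missing values); the function names are hypothetical and this is not the authors' analysis code.

```python
from math import sqrt
from statistics import mean, variance


def cronbach_alpha(rows):
    """Cronbach's alpha; rows holds one list of item scores per respondent."""
    k = len(rows[0])
    if k < 2 or len(rows) < 2:
        raise ValueError("need at least two items and two respondents")
    item_vars = [variance([r[i] for r in rows]) for i in range(k)]
    total_var = variance([sum(r) for r in rows])
    if total_var == 0:
        raise ValueError("alpha is undefined when total scores do not vary")
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


def item_rest_correlation(rows, item):
    """Pearson correlation between one item and the sum of the other items."""
    x = [r[item] for r in rows]
    y = [sum(r) - r[item] for r in rows]
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    spread = sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))
    if spread == 0:
        raise ValueError("correlation is undefined for constant scores")
    return cov / spread
```

The item-rest correlation leaves the item itself out of the total, so that an item is not correlated with a score it contributes to.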

Discussion

This study documents the rigorous process of development and validation of the ExPRESS questionnaire using multiple strategies to allow for triangulation. Items were developed through an iterative approach using an item pool derived from 25 existing instruments, and discussions informed by the construct, conceptual framework and qualitative supervision data grounded in the experiences and perceptions of primary health care providers and their supervisors. A standardized translation process, cognitive interviewing and lexical testing resulted in several relevant modifications. Further modifications were made following content validation using the content validity index among international experts as well as among supervisors and primary health care providers in other sub-Saharan African countries. Structural validation was conducted using EFA in field test I, which guided further instrument development and generation of model hypotheses tested in field test II.

Contribution to supervision measurement

To our knowledge, ExPRESS is the only instrument designed and validated for primary health care providers to evaluate the quality of support in external supervision, in which normative functions such as performance control generally dominate. While the tools retrieved for this study assumed a provider-centred supervision approach (with some exceptions [52,60]), ExPRESS is appropriate for managerial supervision that claims to maintain provider support as a key objective. This form of supervision is particularly prevalent in resource-constrained settings. Some existing tools evaluate a specific encounter [51,52,58,61,71], and others a sum of experiences [57,59,60,62-70], although this may not be explicit. ExPRESS is the only instrument divided into two sections to assess both a specific encounter and a sum experience of supervision. This is relevant for diversified external supervision contexts where providers may encounter different supervisors. The items included in ExPRESS generally assess specific events that may or may not take place in the encounter between a provider and a supervisor. This event-orientation of items allows ExPRESS to provide concrete feedback to a named supervisor and/or a supervisory team on areas to improve. Only one tool [52] specified the particular supervision encounter assessed and used event-oriented items, but was neither developed for administration by supervisees nor validated.

Scoring and interpretation

Optimal scoring and interpretation of the instrument remain to be determined. Using scores 1 (lowest) through 5 (highest) as response options, we preliminarily suggest that scores below 80% of the maximum possible score (corresponding to the three lowest response options, if each item is considered separately) indicate a practical need for improvement. This threshold could also be used for items combined. For instance, the three-item latent variable ‘solving problems jointly’ would have a maximum possible score of 5*3 = 15, and thus a score of 11 or below would indicate a need for improvement. In case of missing items, the maximum possible score would be altered (by subtracting 5 per item missing) and the score needed to reach the minimum proportion of 80% would thus be proportionately altered [75].

Criterion validation could be possible using other measures of supervision and achieved competences, and construct validity may be further evaluated by ‘known group’ analysis and item response theory. Further studies are needed to determine the number of assessments necessary per supervisor in section A and per supervision team in section B for achieving appropriate statistical precision. Comparable instruments recommend from 4 [76] to 20 [72] assessments per evaluatee.

ExPRESS is a measure of providers’ expression of supervisors’ behaviour. It should not be interpreted as a measure of supervisor behaviour [77]. Perceptions of the same supervision event may differ between people depending on their personality [78].
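The preliminary scoring rule described above can be sketched as follows. The function name and the returned field names are hypothetical, missing responses are represented as None by assumption, and the 80% threshold is the preliminary suggestion from the text rather than a validated cut-off.

```python
def subscale_summary(responses, threshold=0.80, max_per_item=5):
    """Score one latent variable (or one item) against the 80% rule.

    responses: item scores from 1 to 5, with None for a missing item.
    Returns None when every item is missing.
    """
    answered = [r for r in responses if r is not None]
    if not answered:
        return None
    score = sum(answered)
    # Each missing item lowers the maximum possible score by max_per_item.
    max_score = max_per_item * len(answered)
    proportion = score / max_score
    return {
        "score": score,
        "max_score": max_score,
        "proportion": proportion,
        "needs_improvement": proportion < threshold,
    }


# 'Solving problems jointly' has three items: 11 of 15 is below 80%.
print(subscale_summary([4, 4, 3]))
# One missing item: the maximum drops to 10, and 8 of 10 meets the threshold.
print(subscale_summary([5, None, 3]))
```

Working with the proportion rather than a fixed raw cut-off keeps the rule consistent when items are missing, which matches the proportional adjustment described in the text.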

Strengths and limitations

This study has a number of strengths. The design involved multiple phases and methods including systematic search, qualitative explorations and mirroring steps of item development, content validation and structural validation, leading to relevant modifications throughout the process. By developing the tool in English with the purpose of making it useful across contexts of external supervision and using a standardized translation process, we avoided local language issues and idioms while ensuring cultural and contextual adaptation. Regional relevance assessments indicated high generalizability, and international experts were involved to improve as well as assess the instrument. We also reached the intended number of respondents for the test–retest and for field tests I and II, and respondents represented districts and health centres across Rwanda.

The study has several limitations related to the design, data collection and data analysis. ExPRESS was framed as a reflective measurement model with latent variables reflecting supervisor traits and abilities. However, this is not self-evident and the event-orientation of items could raise reasonable arguments for formative relationships [21]. The responsiveness of ExPRESS, that is its ability to measure change over time, was not evaluated, but would be needed to apply ExPRESS in measuring effects of supervision interventions.

Since the main part of the cognitive testing was conducted on preliminary versions of the questionnaire during phase 1, items A10–A12 did not undergo cognitive or relevance testing in their final form. However, as they did not have higher missing rates than other items in field test II and represented modifications of items previously tested and found relevant, we considered their content validity acceptable.

Test–retest reliability data were collected in field test I, and the results may not transfer to the final questionnaire version. While field test I data were collected from providers during the daytime and at health centres, this was not feasible for the retest data two weeks later, which for many were collected in the evening or outside the health centre. This may have caused an underestimation of agreement between test and retest [34].

In addition, in field test II we collected data on a frequency response scale for section B, as opposed to field test I, where a quality response scale was used. This may in part explain the significant difference in the percentage endorsing the highest 5-point response. A further study may establish the extent to which the frequency response scale contributes to a ceiling effect compared to the quality response scale.

Applying a 5-point ordinal scale as continuous data in CFA and using the maximum likelihood method has been shown to be appropriate [49]. The risk is to wrongly reject a proper model (type I error) [45]. We used Satorra-Bentler (SB) estimation due to questionable normality of section B in particular. The asymptotic distribution free method is applicable for non-continuous data, but was not applied as it may reject properly specified models if sample sizes are small (N < 500) or deviation from normality is minimal [45]. It has been suggested that non-normality is not problematic for the maximum likelihood method until univariate skewness and kurtosis approach 2.0 and 10.0, respectively [45]; our data are below these limits (Table 2).

Recall bias may be a concern for section A, which assesses the most recent supervision. Therefore, we tried to identify participants who were recently supervised. In field test I, almost 50% and in field test II almost 80% of respondents had their most recent supervision experience over a month before answering the questionnaire. Therefore, the assessment may be hampered by recall bias. On the other hand, a more precise measure of an experience may require time to consider the experience [79,80]. We found measurement invariance for all items when comparing respondents supervised more and less than one month prior to the field test.
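The skewness and kurtosis screening mentioned above can be illustrated with a short sketch for a single item. It uses simple moment-based estimators and assumes that kurtosis is reported on the scale where a normal distribution has a value of 3; the section does not state which convention was used, so this is an assumption, and the function names are hypothetical.

```python
def skewness_kurtosis(values):
    """Moment-based skewness and (non-excess) kurtosis of one item's scores."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two observations")
    centre = sum(values) / n
    m2 = sum((v - centre) ** 2 for v in values) / n
    if m2 == 0:
        raise ValueError("undefined for constant scores")
    m3 = sum((v - centre) ** 3 for v in values) / n
    m4 = sum((v - centre) ** 4 for v in values) / n
    return m3 / m2 ** 1.5, m4 / m2 ** 2


def within_ml_limits(values, max_abs_skew=2.0, max_kurtosis=10.0):
    """True when an item stays below the limits of 2.0 and 10.0 cited above."""
    skew, kurt = skewness_kurtosis(values)
    return abs(skew) < max_abs_skew and kurt < max_kurtosis
```

Applied item by item, a check of this kind flags the variables whose distribution would make maximum likelihood estimation questionable.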

Conclusion

External supervision is a common strategy in primary health care management in resource-constrained settings. This paper presents the stepwise development of a novel instrument, ExPRESS, to measure the quality of support delivered through external supervision as assessed by its direct beneficiaries – primary health care providers. The instrument includes a section A assessing an individual external supervisor at a specific supervisory encounter, and a section B assessing external supervisors in general. Items were found relevant by experts of supportive supervision, as well as by providers and supervisors in four African countries. We believe ExPRESS has a high content validity and a reasonable structural validity, and can be useful to evaluate external supervision in resource-constrained primary health care settings. This may include under-resourced settings in high-income countries. It is freely available to collaborators for non-commercial use. Further analyses must focus on scoring, interpretation, responsiveness and using the tool for feedback as well as on setting up a database of representative samples to explore how ExPRESS evaluates the quality of external supervision.