
Development and pilot testing of a tool to assess evidence-based practice skills among French general practitioners.

Nicolas Rousselot1,2, Thomas Tombrey3, Drissa Zongo4,5, Evelyne Mouillet4,5, Jean-Philippe Joseph3,5, Bernard Gay3, Louis Rachid Salmi4,5,6.   

Abstract

BACKGROUND: There is currently an absence of valid and relevant instruments to evaluate how Evidence-based Practice (EBP) training improves, beyond knowledge, physicians' skills. Our aim was to develop and test a tool to assess physicians' EBP skills.
METHODS: The tool we developed includes four parts to assess the necessary skills for applying EBP steps: clinical question formulation; literature search; critical appraisal of literature; synthesis and decision making. We evaluated content and face validity, then tested applicability of the tool and whether external observers could reliably use it to assess acquired skills. We estimated Kappa coefficients to measure concordance between raters.
RESULTS: Twelve general practice (GP) residents and eleven GP teachers from the University of Bordeaux, France, were asked to: formulate four clinical questions (diagnostic, prognosis, treatment, and aetiology) from a proposed clinical vignette, find articles or guidelines to answer four relevant provided questions, analyse an original article answering one of these questions, synthesize knowledge from provided synopses, and decide about the four clinical questions. Concordance between two external raters was excellent for their assessment of participants' appraisal of the significance of article results (K = 0.83), and good for assessment of the formulation of a diagnostic question (K = 0.76), PubMed/Medline (K = 0.71) or guideline (K = 0.67) search, and of appraisal of methodological validity of articles (K = 0.68).
CONCLUSIONS: Our tool allows an in-depth analysis of EBP skills and could thus supplement existing instruments focused on knowledge or on a specific EBP step. The actual usefulness of such tools to improve care and population health remains to be evaluated.

Keywords:  Critical appraisal; Evidence-based practice; General practice; Kappa reliability; Medical education; Skills

Year:  2018        PMID: 30413196      PMCID: PMC6234795          DOI: 10.1186/s12909-018-1368-y

Source DB:  PubMed          Journal:  BMC Med Educ        ISSN: 1472-6920            Impact factor:   2.463


Background

Evidence-based Practice (EBP) is the integration of best research evidence, clinical expertise, and patient values, in a specific care context [1]. This way of practicing medicine developed in the 1980s and has subsequently been integrated worldwide within new teaching approaches, centred on problem-based learning. EBP teaching was introduced in many initial and continuing medical education curricula to improve health care by better integrating relevant information from the scientific literature [2–14]. EBP has been described as having five steps [15, 16]: 1) Formulate a clear clinical question about a patient’s problem; 2) Search the literature, with an appropriate strategy, for relevant articles [17]; 3) Critically appraise the evidence for its validity, clinical relevance and applicability; 4) Implement the useful findings back into clinical practice [18]; and 5) Evaluate the impact. This approach is particularly useful in general practice (GP) to manage primary care situations, where it has been described as the sound simultaneous use of a critical research-based approach and a person-centred approach [19, 20]. Whilst many potential advantages have been suggested [16, 21], some criticisms have also been made [22]. A serious drawback is that it has not been clearly shown that EBP can improve physician skills or patient health [23–25]. Very few randomized clinical trials have documented the effect of EBP, and these trials frequently included non-comparable groups. Further, these trials were often based on subjective judgements, owing to the lack of reliable and valid tools to assess EBP skills [13, 14, 25–28]. Some tools have indeed been proposed, but they are not easily accessible or validated [14, 28–32].
Most existing tools focus on assessing knowledge, rather than skills, particularly for the literature search [21, 33]; they do not assess skills for each step of EBP [34], but rather focus on article critical assessment [30, 31, 33, 35, 36], sometimes without any relation to a clinical situation [35]. Our aim was to develop a tool to assess the skills necessary for the first four steps of the EBP process, and to evaluate whether independent raters could reliably use the tool to assess acquired skills.

Methods

To assess EBP skills, we developed a comprehensive tool, including a test of skills and a scoring grid, based on literature and expert advice. We tested the applicability of the test and evaluated whether independent observers could reliably use the scoring tool to analyse answers to the test to assess acquired skills (Fig. 1). Our validity approach was based on a classical model of clinical evaluation of tool validity [37], which provides a strategy to develop and evaluate the performance of tests. This conceptualisation is similar to the “validity as a test characteristic” described in the health professions education literature [38]. This approach is shared by a large part of French GP teachers, who are also clinicians.
Fig. 1

Main steps of EBP skills assessment tool development and testing


Tool development

Literature sources

Our tool was developed based on syntheses of the medical literature on EBP, published in the Journal of the American Medical Association [2, 13, 17, 18, 30, 39–42], and in the British Medical Journal [3, 23, 33, 43]. We also considered previous published tools’ strengths and limitations [29, 33, 34, 36].

Expert input on content and purpose of tool

Three of the authors supervised tool development: a senior general practitioner (BG) and a senior epidemiologist (LRS), both with recognised experience in EBP teaching in initial and continuing medical education, and an experienced senior librarian (EM) with experience in teaching literature search for health professionals. Whereas previous tools mostly assessed knowledge [44], our aim was to assess skills, defined as the participant using knowledge by actually carrying out EBP steps about a clinical scenario [14, 28]. To assess participants’ skills, we asked them to perform tasks associated with the different EBP steps [14], with open but precise instructions, rather than only asking them how they would undertake those tasks. We then observed their ability to actually complete these tasks. We assessed each of the first four steps of EBP independently, thus allowing participants to undertake all tasks even if they had erred in an earlier step. This also allowed participants to receive feedback regarding their results as part of a formative assessment for each step. Our test was also built as a continuum, from problems described in a clinical situation to decisions made to deal with these problems. Physicians’ daily constraints (computer and Internet access, time… [45–47]) were also considered when designing the test. Our tool was divided into four parts to assess the necessary skills for each of the first four steps of EBP (Table 1). A clinical vignette (Table 2), describing a common but complex situation likely to be seen in primary care, was used to assess the ability to formulate a clear clinical question about a patient’s problem. We asked participants to formulate four clinical questions (diagnostic, prognosis, aetiology, and treatment). The scoring grid for that part was inspired by the first question of the Fresno test [33] and assessed whether the formulated question respected the PICO (Population, Intervention, Comparison, Outcomes) criteria [48].
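The PICO completeness criterion described above can be made concrete with a small sketch. The therapeutic question below is a hypothetical example suggested by the Table 2 vignette (atrial fibrillation, aspirin without anticoagulant), not one of the study's scored items, and the check mirrors only the grid's completeness requirement, not its full four-level conformity rating.

```python
# Hypothetical illustration: representing a PICO-structured question and
# checking that all four components are present. The example question is
# an assumption built from the case vignette, not a study item.

PICO_COMPONENTS = ("Population", "Intervention", "Comparison", "Outcomes")

pico_question = {
    "Population": "75-year-old man with atrial fibrillation and prior ischaemic stroke",
    "Intervention": "oral anticoagulation",
    "Comparison": "aspirin alone",
    "Outcomes": "recurrent stroke and major bleeding",
}

def pico_complete(question):
    """True only if all four PICO components are present and non-empty."""
    return all(question.get(c, "").strip() for c in PICO_COMPONENTS)

print(pico_complete(pico_question))  # True
```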
To assess the ability to search the literature for relevant documents related to the previous clinical questions, we asked participants to find the full text of an original article or guideline for each question. Scoring of this ability was based on recording screenshots of the participants’ computer screens, using the Wink Screen Recording Software 2.0 (available at http://www.debugmode.com/wink/), which captured one screenshot every three seconds during the test. The scoring grid was adapted from a published tool for assessing literature search strategies [34]. To assess critical appraisal skills, we selected four English-language full-text original articles, one for each of the four search questions (diagnostic, prognosis, aetiology, and treatment). Each participant was to appraise the validity of methods, relevance for care, and significance of results of only one of these articles. The scoring grid was based on previous work [1] and on specific criteria for appraising the quality of articles on diagnosis [39], prognosis [40], treatment or prevention [41], and harm [42]. To assess the ability to synthesize and decide about a specific clinical situation, we developed four synopses reporting the critical appraisal of the four articles responding to each of the initial clinical questions. The scoring grid assessed the clarity of the decision and the elements used to justify it, including consideration of the clinical context and a question on the degree to which the participant trusted the study results (Additional file 1).
Table 1

Main characteristics of the EBP skills assessment tool used for each participant during the test

Test part | EBP step | Task | Support used | Skills: performance assessment (a)
First | Formulate a clinical question | Build 4 search questions to answer a clinical problem | 1 case vignette | How complete and relevant are the GPs' PICO questions?
Second | Search relevant clinical articles | Find 4 relevant articles in medical literature (with different strategies) | 4 bibliographic retrieval questions | How thoroughly and efficiently do GPs conduct searches?
Third | Critically appraise literature | Appraise validity, relevance and results significance of an article | 1 original article | Can GPs complete critical appraisals?
Fourth | Implement useful findings in clinical practice | Answer 4 clinical questions | 4 synopses (of 4 original articles) | Can GPs come to a reasonable interpretation of how to apply the evidence?

GP: general practitioners; (a) according to Tilson et al. [14]

Table 2

Summary of the case vignette

A 75-year-old man visits his general practitioner. His medical history includes an ischemic stroke 2 years before, atrial fibrillation, smoking, hypertension, and hypercholesterolemia. He was worried about a risk of epilepsy because of his stroke; asked whether his use of coffee was excessive; asked to refill his prescription (with no anticoagulant but aspirin); and complained about calf pain (without any deep vein thrombosis sign).

Content and face validity

To improve our tool adequacy for its purpose, as part of the “content and face validity” step [37], we asked a panel of experts from the CNGE (French National College of Teachers in General Practice) for a critical review. We asked them to judge the relevance of included items, whether any item was missing, and the format of the tool. Their comments were considered in a pre-test version of the assessment tool and the scoring grid.

Pilot test

We tested the assessment tool with a senior GP teacher of the Department of General Practice of Bordeaux and a volunteer second year GP resident, to evaluate its technical applicability and their understanding of instructions. The scoring grids were adapted and filled in once, jointly by two GP raters (TT, DZ), to formalize and homogenize the scoring procedure.

Evaluation of feasibility

We documented [28, 37]: acceptability of the tool, as reflected by participation, the number of undocumented items, and the satisfaction of participants; the time required to complete the test; and the time required to rate the test. For undocumented items, we tried to judge whether omissions were related to comprehension or to technical problems, for instance failure of the Internet connection.

Selection of participants

Participants in the full test were GP residents in internship with general practitioners near Bordeaux, and GP teachers from the Department of General Medicine of Bordeaux. All had a general practice activity and were contacted by phone. Verbal informed consent was obtained from all participants.

Application workshop

The test was conducted in computer rooms of the University of Bordeaux, during a three-hour session. Each participant was provided with a computer and Internet access. Once the participants had carried out one part of the process, they sent their output by E-mail to the organizer (TT) and then received instructions for the next part. The first part was expected to last 20 min, i.e. 5 min to formulate each of the four clinical questions. The second part was one-hour long, i.e. 15 min to search one document. Each participant had to find four documents: two original articles using PubMed/MEDLINE, one document using research tools to specifically identify guidelines, and one document using a free search on the Web. The order in which participants were to find the different types of documents was randomly allocated, so that three faculty and three residents were searching in the same order. The third part was 45-min long. Each participant had to analyse one of four articles. Here again the article was randomly allocated so that each type of article was analysed by three faculty and three residents. The last part was 40-min long, i.e. 10 min to analyse each of the four synopses and write the decision.
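The balanced random allocation described above (each search order used by three faculty and three residents) can be sketched as follows. This is an illustration only: the four concrete orders, the participant labels, and the helper name are assumptions, and the paper does not state how the allocation was implemented.

```python
import random

# Hypothetical sketch of balanced random allocation: four task orders,
# each assigned to exactly `per_order` participants within a group.
ORDERS = [
    ("PubMed-1", "PubMed-2", "Guideline", "Web"),
    ("Guideline", "PubMed-1", "Web", "PubMed-2"),
    ("Web", "Guideline", "PubMed-1", "PubMed-2"),
    ("PubMed-2", "Web", "Guideline", "PubMed-1"),
]

def allocate(participants, orders=ORDERS, per_order=3, seed=None):
    """Randomly assign each participant an order, with each order
    capped at `per_order` participants of this group."""
    rng = random.Random(seed)
    slots = [o for o in orders for _ in range(per_order)]  # 4 orders x 3 slots
    rng.shuffle(slots)
    return dict(zip(participants, slots))

residents = [f"resident_{i}" for i in range(12)]
assignment = allocate(residents, seed=1)
```

The same call would be repeated for the faculty group, so that each order ends up with three participants from each group.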

Duration, missing data and satisfaction analysis

The duration of tests and scoring was measured and missing or ambiguous data analysed. An anonymous satisfaction questionnaire (Additional file 2) was filled in by participants at the end of the test. After the test, participants received a synopsis of what was expected from them.

Evaluation of reliability of scoring acquired skills

Rating of acquired skills

Two of the authors (TT, DZ) independently corrected all anonymized tests, filling in the scoring grids. They judged, on a four-level Likert scale, the conformity of the output to what was expected to reflect a given skill (for example: completely conform to the expected PICO; rather conform; rather not conform; completely not conform). They separately scored: each of the four clinical questions; each of the three search strategies; appraisal of methodological validity, relevance for care, and significance of results; and each of the four decisions (Table 3, Additional file 1).
Table 3

Results of Likert scales for each assessed task of the EBP steps

Step | NC (n) | Completely conform n (%) | Rather conform n (%) | Rather not conform n (%) | Completely not conform n (%)
Formulating a focused question
  Diagnostic | 1 | 0 (0.0) | 3 (13.0) | 9 (39.1) | 10 (43.5)
  Prognosis | 1 | 2 (8.7) | 7 (30.4) | 5 (21.7) | 8 (34.8)
  Etiologic | 1 | 1 (4.3) | 15 (65.2) | 3 (13.0) | 3 (13.0)
  Therapeutic | 1 | 0 (0.0) | 2 (8.7) | 15 (65.2) | 5 (21.7)
Best information search
  PubMed/MEDLINE | 8 | 0 (0.0) | 0 (0.0) | 4 (17.4) | 11 (47.8)
  Guidelines | 17 | 0 (0.0) | 3 (13.0) | 3 (13.0) | 0 (0.0)
  Free search (Web) | 4 | 4 (17.4) | 4 (17.4) | 7 (30.4) | 3 (13.0)
Critical appraisal
  Methodological validity | 3 | 1 (4.3) | 2 (8.7) | 9 (39.1) | 8 (34.8)
  Relevance for patient care | 3 | 0 (0.0) | 1 (4.3) | 12 (52.2) | 7 (30.4)
  Significance of results | 3 | 0 (0.0) | 0 (0.0) | 4 (17.4) | 16 (69.6)
Synthesis and decision
  Diagnostic article | 1 | 1 (4.3) | 8 (34.8) | 12 (51.2) | 1 (4.3)
  Prognostic article | 1 | 13 (56.5) | 4 (17.4) | 5 (21.7) | 0 (0.0)
  Etiologic article | 1 | 12 (51.2) | 5 (21.7) | 5 (21.7) | 0 (0.0)
  Therapeutic article | 2 | 5 (21.7) | 10 (43.5) | 3 (13.0) | 3 (13.0)

n = number of participants; NC = not completed (missing data)


Agreement analysis

Analyses were done on data in which neither the participant nor the rater was identified, with the SAS statistical software package, version 9.0 (SAS Institute Inc.). A linear weighted Kappa coefficient and its 95% confidence interval (CI) were calculated for each Likert scale to measure concordance between the two assessments [37]. Kappa was considered excellent if higher than 0.8, good if between 0.6 and 0.8, medium if between 0.4 and 0.6, and low if under 0.4 [49]. The main analysis considered missing data as completely not conform; a second analysis excluded missing data. An analysis of the sources of discrepancies between the two raters was done collegially by the two raters and a senior epidemiologist (LRS).
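As an illustration of the agreement statistic used above, the following sketch computes a linear weighted kappa for two raters on the four-level conformity scale. The rating data are invented for the example, not taken from the study, and the study itself used SAS rather than Python.

```python
from collections import Counter

LEVELS = ["completely conform", "rather conform",
          "rather not conform", "completely not conform"]

def linear_weighted_kappa(rater_a, rater_b, n_levels=len(LEVELS)):
    """Linear weighted Cohen's kappa for two raters on an ordinal scale.

    Ratings are level indices 0..n_levels-1; a one-level disagreement
    is penalised less than a two- or three-level one.
    """
    n = len(rater_a)
    assert n == len(rater_b) and n > 0
    # Linear agreement weights: w[i][j] = 1 - |i - j| / (n_levels - 1)
    w = [[1 - abs(i - j) / (n_levels - 1) for j in range(n_levels)]
         for i in range(n_levels)]
    # Observed weighted agreement
    po = sum(w[a][b] for a, b in zip(rater_a, rater_b)) / n
    # Expected weighted agreement from the raters' marginal distributions
    ma, mb = Counter(rater_a), Counter(rater_b)
    pe = sum(w[i][j] * ma[i] * mb[j] / n ** 2
             for i in range(n_levels) for j in range(n_levels))
    return (po - pe) / (1 - pe)

# Invented ratings for six tests (0 = completely conform ... 3 = completely not conform)
rater_1 = [0, 1, 1, 2, 3, 3]
rater_2 = [0, 1, 2, 2, 3, 2]
print(round(linear_weighted_kappa(rater_1, rater_2), 2))  # → 0.71, "good" on the study's scale
```

Handling a missing rating as "completely not conform" (the main analysis) amounts to coding it as level 3 before computing kappa; the second analysis simply drops those pairs.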

Results

Feasibility

Of the 28 general practice residents who were contacted, 12 agreed to participate. Of the 85 GP teachers of the Department of General Practice of Bordeaux, 46 could be contacted by phone, and 14 agreed to participate; three withdrew after initially agreeing, including one who cancelled three days before the workshop and could not be replaced. Eventually, 12 GP second-year residents, two men and 10 women, and 11 GP teachers, 10 men and one woman, participated. The GP teachers were one associate professor, three assistant professors and seven part-time instructors; they were aged 53 years on average.

Test and scoring duration

The workshop followed all steps as planned. The average response time was 171 min for teachers and 185 min for residents. Times differed mainly in the last part of the workshop (33 min for teachers and 44 min for residents), and the allotted time was exceeded in the third part of the test (53 min for teachers and 56 min for residents). Scoring lasted on average 44 min per test for the first rater (total: 17 h) and 30 min per test for the second rater (total: 11 h 50 min).

Missing data

Data were missing for 14.6% of the Likert scales: 16.9% for teachers and 12.5% for residents (Table 3). Most missing data concerned the second part of the test: four of the 23 participants’ computer screenshot files were lost (3 for teachers), possibly due to handling errors by participants. Such errors were also seen once in the first part, three times in the third part, and once in the last part. Instructions were not followed for bibliographic retrieval in 17 of the 69 Likert scales scored (11 for residents); four concerned PubMed/MEDLINE and 13 guideline searches.

Satisfaction

Satisfaction questionnaires were filled in by 22 participants. All participants were satisfied: they found the experience interesting (100%), relevant (82%), useful for clinical practice (100%), but difficult (97%). They indicated that the workshop underscored the need for training (91%), that the tool assessed participants’ familiarity with EBP well (91%), and that it could be used to assess progress with training (86%). Only 46% reported using EBP in their usual practice, the main reasons for not using it being lack of time (94%), poor understanding of English (59%), and lack of skills to use the necessary tools (71%).

Reliability of acquired skills scoring

Concordance between the two raters was excellent for their assessment of participants’ appraisal of the significance of article results (Table 4). It was good for the formulation of a diagnostic question, PubMed/Medline or guideline search, and for methodological validity appraisal. It was lower for all other aspects.
Table 4

Concordance between the two raters’ Likert scale for each question of the EBP steps

Step | Agreement n (%) | Weighted Kappa K (95% CI) | K excluding missing data (95% CI)
Formulating a focused question
  Diagnostic | 19 (82.6) | 0.76 (0.53–0.99) | 0.75 (0.51–0.99)
  Prognosis | 13 (56.5) | 0.58 (0.36–0.81) | 0.56 (0.33–0.79)
  Etiologic | 13 (56.5) | 0.40 (0.07–0.72) | 0.34 (−0.00–0.68)
  Therapeutic | 13 (56.5) | 0.32 (−0.02–0.65) | 0.27 (−0.08–0.61)
Best information search
  PubMed/MEDLINE | 21 (86.7) | 0.75 (0.42–1.00) | 0.71 (0.34–1.00)
  Guidelines | 22 (83.3) | 0.93 (0.79–1.00) | 0.67 (0.10–1.00)
  Free search (Web) | 13 (47.4) | 0.58 (0.31–0.85) | 0.39 (0.10–0.67)
Critical appraisal
  Methodological validity | 18 (78.3) | 0.68 (0.40–0.95) | 0.72 (0.47–0.97)
  Relevance for patient care | 17 (73.9) | 0.59 (0.32–0.86) | 0.53 (0.23–0.83)
  Significance of results | 22 (95.7) | 0.83 (0.51–1.00) | 0.83 (0.50–1.00)
Synthesis and decision
  Diagnostic article | 11 (47.8) | 0.21 (−0.01–0.44) | 0.23 (−0.04–0.50)
  Prognostic article | 9 (39.1) | 0.45 (0.24–0.65) | 0.39 (0.20–0.59)
  Etiologic article | 9 (39.1) | 0.27 (0.00–0.53) | 0.26 (0.07–0.45)
  Therapeutic article | 12 (52.2) | 0.44 (0.19–0.70) | 0.37 (0.11–0.63)

n = number of participants with agreement between raters; CI = confidence interval

The main sources of discrepancy were: differences in appreciation of the PICO criteria (whether a response was “incomplete” or “not conform” depended on its precision, which the two raters did not assess equally); raters’ entry errors and irrelevant responses not scored as “not conform”; errors and omissions in filling in the scoring grid; discrepancies in assessment of article and website quality for the free Web search; and differences in appreciation of decision making and synthesis, depending on the rater’s harshness and expectation that decisions be explained. In case of disagreement between raters, we kept the most favourable assessment for this last question only.

Discussion

We developed the first French-language tool to assess EBP skills of general practitioners. Concordance between raters was excellent for assessment of the participants’ appraisal of the significance of article results. It was good for the formulation of a diagnostic question, PubMed/MEDLINE and guideline searches, and for article methodological validity appraisal. It was lower for all other aspects. Our tool covers all relevant skills, as the main four steps of the EBP process are assessed. In that regard, it complements existing tools, such as the Fresno test [33] and the Berlin questionnaire [36], as both only include the first three steps and focus mostly on critical appraisal [14]. The only published validated test assessing those four steps is the ACE tool [21]. Our tool is again complementary, as the ACE tool assesses knowledge more than skills, using simple true-false questions, whereas our tool includes observation of actual searches and critical appraisals. This focus on knowledge rather than skills is also a limitation of the Fresno test, which mostly covers literature search and critical appraisal, and of the Berlin Questionnaire. We assessed physicians’ skills with open-ended questions asking for the completion of specific tasks; for instance, recording screenshots of actual searches was an innovative form of observation, scored with objective items. These features make our tool and its application closer to and more relevant for clinical practice. It was developed using various kinds of complex questions relating to real-life situations, which, to our knowledge, has not been done before; we believe it could be transposed to many complex clinical situations. We still have to improve parts of the tool before it can be proposed to the EBP teaching community. Concordance between raters was low, notably for the last part of the test, related to synthesis and decision making.
More precise scoring grids and a better application of assessment items are needed to reduce raters’ subjectivity when assessing skills. This was also sometimes seen in the first part of the test, regarding formulation of a search question. This first part was based on the Fresno test, for which good inter-rater reliability has been documented [33], but the Fresno test is composed of questions on a short and simple case vignette with low variability of possible responses, whereas our test was closer to practice. Another potential limitation of our test is the time needed for its completion: three hours, much longer than the ACE tool and Berlin Questionnaire (15–20 min) and the Fresno test (about one hour) [21, 33, 36]. Simplifying our tool might shorten this completion time, but would likely reduce its relevance for practice. Moreover, the time devoted to each part (5 min to build a search question, 15 min to find an original article, 45–60 min to analyse it, and 10 min to synthesize and decide) is a realistic reflection of what can be done in practice. Two possible reasons for the low reliability of some items of our tool are the low level of skills and the variation in the harshness of raters. Another hypothesis is that the tool is not a valid reflection of actual skills. Indeed, a tool well perceived by users (so-called “face validity”), whose content has been agreed by experts (content validity) and which has shown acceptable reliability, might still not adequately measure what it is supposed to measure [37, 50]. Therefore, we still need studies of the construct or criterion validity of our tool. However, the latter is difficult to assess, as there is no gold standard for all EBP skills. A gold standard could be developed through expert judgement based on formal consensus methods [51].
As our tool yields 14 independent scores, it is well suited to identifying which skills a student or physician should focus future training on (formative assessment). However, we still need to develop a way to provide profiles for the four main skills and a judgement of an individual’s overall EBP skills, as a way to compare participants and evaluate our tool’s validity. Other perspectives to further develop our test and evaluate its performance should take into consideration the limitations of our study, notably the small number of participants, which precluded the use of other analytical techniques to evaluate reliability, such as log-linear models. As our work was initiated by the GP Department of the University, we selected participants with practical experience in GP. Indeed, we wanted to assess the ability to use EBP skills to improve patient care in a GP setting. Moreover, the use of the same clinical scenario throughout the whole assessment process is an indirect way to evaluate the potential impact of acquired skills in clinical practice. We also selected both GP residents and teachers to get a heterogeneous sample, as recommended when evaluating reliability [52]. Nevertheless, judging by the responses, we believe that not all residents were EBP novices and that, given their age, not all GP teachers were EBP experts, as already shown elsewhere [45]. This generation contrast, the small number of participants and raters [53], and the focus on a population linked with the University probably limit the generalizability of our results.

Conclusions

Our tool is relevant for practice as it allows an in-depth analysis of EBP skills. It could respond to a real need to better assess EBP skills of general practitioners. It can also be seen as usefully complementing existing tools, but further validation, including comparison with the latter, is needed. The actual usefulness of such tools to improve care and population health remains to be evaluated.

Additional files

Additional file 1: Two parts of the tool to assess EBP skills: 1) Content of the skill assessment form; and 2) Scoring grid. This file gives more information about our tool. (DOCX 61 kb)
Additional file 2: Satisfaction questionnaire. This file presents the satisfaction questionnaire filled in by participants at the end of the test. (DOCX 16 kb)