BACKGROUND AND AIMS: Gaining expertise in procedural skills is essential for achieving clinical competence during anesthesia training. Supervisors have the important responsibility of deciding when a trainee can be allowed to perform various procedures without direct supervision while ensuring patient safety. This requires robust and reliable assessment techniques. Airway management with bag-mask ventilation and tracheal intubation is routinely performed by anesthesia trainees at induction of anesthesia and to save lives during cardiorespiratory arrest. The purpose of this study was to evaluate the construct validity and the inter-rater and test-retest reliability of a tool designed to assess competence in bag-mask ventilation followed by tracheal intubation in anesthesia trainees. MATERIAL AND METHODS: Informed consent was obtained from all participants. Tracheal intubation and bag-mask ventilation skills of 10 junior and 10 senior anesthesia trainees were assessed by two investigators on two occasions, 3-4 weeks apart, using a procedure-specific assessment tool. RESULTS: The average kappa value for inter-rater reliability was 0.91 and 0.99 for the first and second assessments, respectively, with an average agreement of 95%. The average agreement for test-retest reliability was 82%, with a kappa value of 0.39. Senior trainees obtained higher scores than junior trainees in all areas of assessment, with a significant difference for patient positioning, preoxygenation, and laryngoscopy technique, indicating good construct validity. CONCLUSION: The tool designed to assess bag-mask ventilation and tracheal intubation skills in anesthesia trainees demonstrated excellent inter-rater reliability, fair test-retest reliability, and good construct validity. The authors recommend its use for formative and summative assessment of junior anesthesia trainees.
Learning and mastering procedural skills are major challenges in anesthesia practice and are essential in the process of achieving clinical competence.[1,2] Anesthesiologists carry out many complex clinical tasks in their routine work, which trainees are expected to learn and master during training. Increased public awareness of healthcare-related issues has led to greater accountability of healthcare professionals and, rightly, to an increasing focus on patient safety in clinical practice. Supervisors have the important responsibility of deciding when a trainee can be allowed to perform various procedures without direct supervision while ensuring patient safety. Supervisors and trainers must accept that not all trainees learn equally quickly or become equally competent in performing practical procedures;[3,4] reliable and objective assessment is, therefore, mandatory.
Airway management is an inherent part of the routine day-to-day work of anesthesiologists. They are required to perform it not only in the operation theater but also in the Intensive Care Unit, the wards, and the Emergency Department. Failure to perform the technique promptly and correctly can have serious consequences, including death. It is important to ensure that an anesthesia trainee is capable of performing tracheal intubation independently before he or she is included in a cardiac arrest team, where direct supervision by a senior colleague is not always possible.
This requires robust and reliable assessment techniques, such as direct observation by senior anesthesiologists using procedure-specific tools while the trainee performs the procedure on actual patients.[2,5]
When constructing an assessment tool, it is important to search the literature for an existing instrument that is appropriate and has established reliability and validity.[6,7] We retrieved tools for assessing procedures performed by anesthesiologists, including rapid sequence induction of anesthesia and management of difficult airways.[1,3,4,8,9,10] Generic tools for assessment of various anesthetic procedures are also available. However, we could not identify a structured tool with established reliability and validity for assessment of routine airway management. We, therefore, constructed a procedure-specific tool for this purpose. The objectives of this study were to evaluate the inter-rater and test-retest reliability and the construct validity of a tool designed to assess competence in bag-mask ventilation and tracheal intubation. The reliability of a tool is its ability to assess skills consistently when used by different assessors and at different times, while construct validity is the ability of the tool to differentiate among varying levels of expertise.[6,7,11,12,13]
Material and Methods
Approval was granted by the University Ethics Review Committee (1398-Ane-ERC-09), and written informed consent was obtained from all participants. A total of 20 anesthesia trainees, 10 junior and 10 senior, were recruited. Junior trainees were defined as those with more than 2 but fewer than 4 months of anesthesia training, while the senior residents recruited were in the fourth year of training and already performing airway management independently. The study protocol was presented at a departmental faculty meeting to share it with all faculty members. The purpose of the study was explained to the participating residents when informed consent was obtained. The tool was not shared with the residents before the assessments.
The participants' bag-mask ventilation and tracheal intubation skills were assessed using a structured, procedure-specific assessment tool. All three authors participated in constructing the tool, and advice was taken from two other senior anesthesia consultants. The tool comprised five major categories, each with further sub-categories, to evaluate the trainee's performance in all the essential steps of the procedure [Table 1]. A simple 3-point scale was used to assess each step, where:
1 (one) meant “step not performed”;
2 (two) meant “performance below expectations”;
3 (three) meant “performance meets expectations”.
A separate column was provided for steps that were “not applicable” during the performance.
Table 1
Steps of bag-mask ventilation and tracheal intubation assessed by direct observation in anesthesia trainees
“Performance below expectations” was defined in the tool as an unsuccessful attempt or an incorrectly performed step, while “meets expectations” was defined as a step performed adequately and successfully. The procedural steps used for assessment of bag-mask ventilation and tracheal intubation skills are provided in Table 1. Before finalizing the tool, we conducted a pilot study to identify any missing steps and to assess the practicality of using the tool in the operation theater. The pilot study provided a final check on content validity and served to train the investigators in rating trainees' performance by direct observation. The authors also attended a half-day workshop on direct observation of procedural skills.
The residents were assessed while working in their assigned operation theater under the supervision of the assigned consultant anesthesiologist, while anesthetizing patients undergoing elective procedures requiring endotracheal intubation. Routine preoperative assessment was done for each patient. Trainees were not assessed if the patient being anesthetized was pregnant; had oral, faciomaxillary, or neck pathology or an anatomic anomaly, obesity (body mass index >30), rheumatoid arthritis, ankylosing spondylitis, or a history of difficult airway; or was found to have limited mouth opening, buck teeth, a short thick neck with limited mobility, or a Mallampati grade of III or IV.
The assessment was done simultaneously by two of the investigators, who are senior consultant anesthesiologists and registered supervisors for anesthesia training.
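For readers who want to record ratings electronically, the 3-point scale and the “not applicable” column described above could be encoded as in the minimal Python sketch below. It is not part of the published tool: the names `Rating`, `category_sum`, and `max_possible` and the step labels are hypothetical placeholders, and the sketch assumes that steps marked “not applicable” are simply left out of a category's sum, a convention the study does not state.

```python
from enum import Enum
from typing import Mapping


class Rating(Enum):
    """Rating options on the form (hypothetical names, values from the scale)."""
    NOT_PERFORMED = 1       # "step not performed"
    BELOW_EXPECTATIONS = 2  # "performance below expectations"
    MEETS_EXPECTATIONS = 3  # "performance meets expectations"
    NOT_APPLICABLE = None   # separate column; carries no score


def category_sum(ratings: Mapping[str, Rating]) -> int:
    """Sum of scores for one category's sub-steps.

    Assumption: steps marked NOT_APPLICABLE are left out of the sum.
    """
    return sum(r.value for r in ratings.values() if r is not Rating.NOT_APPLICABLE)


def max_possible(ratings: Mapping[str, Rating]) -> int:
    """Highest achievable sum over the applicable steps (3 points each)."""
    return 3 * sum(1 for r in ratings.values() if r is not Rating.NOT_APPLICABLE)


# Hypothetical ratings for one category; step labels are placeholders,
# not the wording used in Table 1.
example_category = {
    "step_a": Rating.MEETS_EXPECTATIONS,
    "step_b": Rating.BELOW_EXPECTATIONS,
    "step_c": Rating.MEETS_EXPECTATIONS,
    "step_d": Rating.NOT_APPLICABLE,
}
print(category_sum(example_category), "/", max_possible(example_category))  # 8 / 9
```

Reporting the sum against the maximum for applicable steps keeps categories comparable when some steps do not apply to a given patient.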
Both assessors completed the structured assessment tool independently. The trainee was observed while managing the airway with bag-mask ventilation and intubating the trachea with a tracheal tube. The assessment time began once the patient had been transferred to the operating table for induction of anesthesia and monitors had been attached, and ended when the endotracheal tube position had been confirmed and the tube secured. Any decision to take over the procedure, in case the trainee was unable to intubate the patient's trachea, was left to the discretion of the supervising consultant. Two attempts at laryngoscopy and intubation were planned; if the trainee was unsuccessful after two attempts, this was to be considered a failed attempt. Each resident was observed performing the same procedure again after 3-4 weeks by the same assessors to evaluate the test-retest reliability of the tool.
Sample size was calculated using PASS version 11 (NCSS LLC, Kaysville, Utah). In a test for agreement between raters using the kappa statistic, a sample size of 20 subjects achieves 80% power to detect a true kappa value of 0.90 in a test of H0: kappa = 0.50 versus H1: kappa ≠ 0.50, using a two-tailed significance level of 0.05.
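The two-attempt rule for laryngoscopy and intubation can be expressed as a small classification function. This Python sketch is illustrative only: the function name `classify_intubation` and its input format (one success flag per attempt, in chronological order) are hypothetical, and a consultant takeover is modelled, as an assumption, by ending the list before a successful attempt, which the sketch counts as a failure.

```python
from typing import Sequence

MAX_ATTEMPTS = 2  # two attempts at laryngoscopy and intubation were allowed


def classify_intubation(attempts: Sequence[bool]) -> str:
    """Classify one observed intubation under the two-attempt rule.

    `attempts` holds one flag per laryngoscopy attempt, in order
    (True = tube placed and position confirmed). Attempts beyond the
    second are ignored. Assumption: an empty or all-False list, e.g.
    after a consultant takeover, is classified as a failure.
    """
    for number, succeeded in enumerate(attempts[:MAX_ATTEMPTS], start=1):
        if succeeded:
            return f"success on attempt {number}"
    return "failed intubation"


for outcome in ([True], [False, True], [False, False]):
    print(outcome, "->", classify_intubation(outcome))
```

Slicing to `MAX_ATTEMPTS` makes the rule explicit: a success on a third attempt does not change the classification.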
Data analysis
Statistical analysis was performed using the Statistical Package for the Social Sciences version 19 (SPSS Inc., Chicago, IL, USA). Inter-rater and test-retest reliability were computed as percent agreement and the kappa statistic. For each item of the structured assessment form, the kappa statistic was used to evaluate the level of agreement between the two assessors' ratings and between the same assessor's ratings at two points in time. Kappa is positive when the observed agreement exceeds the agreement expected by chance and negative when the observed agreement is less than chance agreement. Kappa values were interpreted as follows: 0.0-0.20 slight agreement, 0.21-0.40 fair agreement, 0.41-0.60 moderate agreement, 0.61-0.80 substantial agreement, and 0.81-1.0 almost perfect or perfect agreement. Percent agreement and kappa were computed for each assessment criterion, and the average agreement and average kappa value were also calculated. For construct validity, the sub-category scores of each main criterion were summed for each rater, and the sums were compared between junior and senior residents using the independent-samples t-test or the Mann-Whitney U-test, depending on whether the data were normally distributed. A P value ≤0.05 was considered statistically significant.
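The per-item reliability computation described above can be sketched in Python using the standard definition of Cohen's kappa, κ = (p_o − p_e)/(1 − p_e), where p_o is the observed proportion of agreement and p_e the agreement expected by chance. This is a minimal illustration, not the SPSS procedure the study used: the function names are hypothetical, ratings are assumed to be the integers 1-3 given to each trainee on one item, and because the text does not say how items with undefined kappa (both rating sets identical and constant) entered the average, the sketch leaves them out. The hypothetical example also shows how an item can combine high percent agreement with only fair kappa: when most ratings are 3, much of the agreement is expected by chance.

```python
from collections import Counter
from statistics import mean
from typing import Optional, Sequence, Tuple


def percent_agreement(a: Sequence[int], b: Sequence[int]) -> float:
    """Percentage of trainees given the same rating in both rating sets."""
    if len(a) != len(b) or len(a) == 0:
        raise ValueError("rating sets must be non-empty and of equal length")
    return 100.0 * sum(x == y for x, y in zip(a, b)) / len(a)


def cohen_kappa(a: Sequence[int], b: Sequence[int]) -> Optional[float]:
    """Cohen's kappa, (p_o - p_e) / (1 - p_e), for one item.

    p_e is computed from the marginal proportions of each rating set.
    Returns None when p_e == 1 (both sets use one identical category),
    because kappa is then undefined.
    """
    n = len(a)
    p_o = percent_agreement(a, b) / 100.0
    count_a, count_b = Counter(a), Counter(b)
    p_e = sum(count_a[k] * count_b[k] for k in count_a) / (n * n)
    if p_e == 1.0:
        return None
    return (p_o - p_e) / (1.0 - p_e)


def interpret(kappa: float) -> str:
    """Map kappa to the descriptive bands listed in the text."""
    if kappa < 0:
        return "less than chance"
    if kappa <= 0.20:
        return "slight"
    if kappa <= 0.40:
        return "fair"
    if kappa <= 0.60:
        return "moderate"
    if kappa <= 0.80:
        return "substantial"
    return "almost perfect"


def average_kappa(items: Sequence[Tuple[Sequence[int], Sequence[int]]]) -> Optional[float]:
    """Mean kappa across items, skipping items where kappa is undefined."""
    values = [k for k in (cohen_kappa(a, b) for a, b in items) if k is not None]
    return mean(values) if values else None


# Hypothetical ratings of 10 trainees on one item (not study data).
rater_a = [3, 3, 3, 3, 3, 3, 3, 3, 2, 2]
rater_b = [3, 3, 3, 3, 3, 3, 3, 2, 3, 2]
k = cohen_kappa(rater_a, rater_b)
print(percent_agreement(rater_a, rater_b), round(k, 3), interpret(k))  # 80.0 0.375 fair
```

Here p_o = 0.80 but p_e = 0.8 × 0.8 + 0.2 × 0.2 = 0.68, so κ = 0.12/0.32 = 0.375.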
Results
Twenty anesthesia trainees, 10 junior and 10 senior residents, participated in the study. The average time taken for an assessment was 9 min. There was no failed attempt at tracheal intubation. Inter-rater agreement at the two assessments is presented in Table 2. Percent agreement and kappa values between the two assessors were high for patient positioning, bag-mask ventilation, chin lift/jaw thrust, and leak around the facemask, and the items for absence of a CO2 trace and difficulty in bag-mask ventilation showed 100% agreement. Assessment of professionalism also showed no significant difference between the raters. The average kappa value for inter-rater reliability was 0.91 for the first assessment session and 0.99 for the second, with an average agreement of 95% [Table 2].
Table 2
Inter-rater reliability of the tool for assessment of bag-mask ventilation and tracheal intubation (percentage agreement and kappa values)
Kappa values and percent agreement for test-retest reliability are presented in Table 3. The average agreement for test-retest reliability was 82%, with a kappa value of 0.39. Assessment of construct validity [Table 4] showed that senior trainees obtained higher scores than junior trainees in all areas of assessment. This difference was statistically significant for the sums of scores for patient positioning, preoxygenation, and laryngoscopy technique.
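The junior-versus-senior comparison of summed scores described under Data analysis can be sketched with SciPy. The study used SPSS and does not state which normality check guided the choice between the two tests; this sketch assumes a Shapiro-Wilk test on each group at α = 0.05. The function name `compare_groups` and the example scores are hypothetical and are not study data.

```python
from typing import Sequence, Tuple

from scipy import stats

ALPHA = 0.05  # significance level; the study treated P <= 0.05 as significant


def compare_groups(junior: Sequence[float], senior: Sequence[float]) -> Tuple[str, float]:
    """Compare summed category scores between junior and senior trainees.

    Assumption (not stated in the study): Shapiro-Wilk checks normality in
    each group; if neither group departs from normality, the independent-
    samples t-test is used, otherwise the Mann-Whitney U-test.
    Returns the test name and its two-sided P value.
    """
    normal = all(stats.shapiro(group)[1] > ALPHA for group in (junior, senior))
    if normal:
        result = stats.ttest_ind(junior, senior)
        return "independent-samples t-test", float(result.pvalue)
    result = stats.mannwhitneyu(junior, senior, alternative="two-sided")
    return "Mann-Whitney U-test", float(result.pvalue)


# Hypothetical summed scores for one category, 10 trainees per group.
junior_scores = [9, 10, 10, 11, 11, 12, 12, 12, 13, 14]
senior_scores = [13, 14, 14, 15, 15, 15, 16, 16, 17, 18]
test_name, p_value = compare_groups(junior_scores, senior_scores)
print(test_name, round(p_value, 4), "significant" if p_value <= ALPHA else "not significant")
```

With only 10 trainees per group, a normality test has limited power, which is one reason a rank-based test is often preferred for summed ordinal scores.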
Table 3
Test-retest reliability of the tool for assessment of bag-mask ventilation and tracheal intubation (percentage agreement and kappa values)
Table 4
Construct validity of the assessment tool for bag-mask ventilation and tracheal intubation
Discussion
Competence in cognitive knowledge, judgment, and communication, including history taking and physical examination, is routinely assessed by written, oral, and Objective Structured Clinical Examinations.[6] However, procedural skills have historically been assessed through subjective evaluations by senior colleagues and supervisors without well-defined criteria, or through procedure logs maintained by trainees.[13] Work has been done on defining the minimum number of procedures required to attain competency in anesthetic procedures.[3,4,8] However, the relationship between experience, as judged by the number of procedures performed, and competence is difficult to define and differs markedly among trainees.[4]
End-of-rotation global rating forms are often completed by supervising faculty members who have not directly observed trainees performing the procedure on patients.[6,7,14] This form of assessment cannot reliably assess procedural skills in their entirety and cannot justify decisions to allow trainees to perform procedures without direct supervision.
Direct observation of the trainee performing a procedure on an actual patient is recommended for a more reliable assessment of competence in procedural skills, to enhance the quality of clinical training and ensure patient safety.[5,15,16] Procedure-specific assessment tools are therefore required for all complex procedural skills.[2,5,16] It is essential to ensure that the trainee masters the principal components of airway management before he/she is allowed to perform this procedure without direct supervision.[1] The tool employed in this study was designed specifically for novices in anesthesia; hence, the technique was broken down into its basic steps, forming a checklist with a simple 1-3 rating scale, so that the procedure could be assessed in its entirety, as recommended for assessment of procedural skills.[17]
The inter-rater reliability of the tool was high. During their training, anesthesia trainees work at multiple sites with multiple consultants who are responsible for their assessment and feedback. Good inter-rater reliability is, therefore, a basic requirement for this assessment tool, as it allows the tool to be used by different assessors in different locations depending on the trainee's initial rotations. Many other researchers studying the inter-rater reliability of procedure-specific assessment tools for medical trainees have obtained good to excellent results.[18,19,20]
The agreement and kappa values for test-retest reliability were lower than those for inter-rater reliability. The most probable reason is a learning effect during the 3-4 week interval between the two assessment sessions.
Anesthesia trainees perform bag-mask ventilation and tracheal intubation daily and thus get adequate practice to learn and master these skills in the early months of their training. Their performance might, therefore, have improved in the 3-4 weeks between the two assessments in this study.
We found that senior trainees obtained higher scores for all steps of bag-mask ventilation and intubation, with significant differences in the summed scores for patient positioning, preoxygenation, and laryngoscopy technique [Table 4]. This indicates that the procedure-specific structured assessment tool can discriminate between junior and senior trainees, demonstrating good construct validity. Naik et al.[19] obtained similar results when testing the validity and reliability of an assessment tool for brachial plexus regional anesthesia performance and recommended their tool for routine use during anesthesia training. The tool employed in the current study will mainly be used to assess junior anesthesia trainees in their first 6 months of training. Bag-mask ventilation and tracheal intubation are among the first procedural skills that anesthesia trainees learn at the beginning of training, and they use them for the rest of their professional careers. The authors hope to use the instrument for formative assessment of novices and for judging competence to perform the procedure without direct supervision. The average assessment score obtained by the group of senior trainees could be used to set the score that junior trainees must reach before they are trained and assessed in the more advanced airway management skills required for difficult intubation and rapid sequence induction.
Both percent agreement and kappa statistics were used to analyze the reliability of the tool to strengthen the analysis. Percent agreement does not account for agreement that occurs by chance, for example when raters guess on some scores because of uncertainty.
It may thus overestimate the true agreement between raters, so it is advisable to calculate both percent agreement and kappa when analyzing inter-rater reliability.[21]
A limitation of our study is that the assessments were done in real time, and the assessors were therefore not blinded to the trainees being assessed. This could have biased the assessment scores. Similar studies of assessment tools have rated videotaped performances of procedural skills after masking the trainees' identities, or have employed assessors unknown to the trainees and vice versa.[11,12,18,19] We were not able to use this methodology because of lack of funds. We tried to reduce this bias by including residents who were not rotating with either of the two assessors at the time of assessment. Another limitation of this study is the relatively long interval between the two assessment sessions. This could have lowered the test-retest reliability because of a learning effect, which is the main shortcoming of test-retest reliability studies.[22] We recommend a shorter interval before the second assessment when determining the test-retest reliability of tools for frequently performed procedures such as endotracheal intubation. The absence of criteria for passing or failing the assessment may be considered a limitation of the tool. This has been addressed by adding the statement “demonstrates ability to perform all aspects of the procedure independently,” with a yes/no option, at the end of the procedural steps.
This section must be completed carefully by the assessors, as it records whether the candidate was able to perform the entire procedure successfully and thus whether he/she has “passed or not passed” in performing the skill.
Simulation-based assessment of residents' ability to perform anesthetic skills is now being described.[23] However, financial constraints are a limiting factor in developing countries, where reliable and valid assessment tools like ours would be feasible and practical for routine assessment of trainees. As stated by Cuschieri et al., assessment of trainees is a form of quality assurance for the future.[24] The development of objective procedure-specific tools for evaluating procedural skills, and their integration into training programs, is a present need. We believe that objective assessment by direct observation, using well-defined criteria and rating scales, has the potential to greatly improve the assessment of procedural skills. Future research should assess whether implementing procedure-specific assessment tools in anesthesia training programs improves procedural skills and the quality of patient care.
Conclusion
Our results show that the tool designed by us to assess bag-mask ventilation and tracheal intubation skills in anesthesia trainees demonstrates good construct validity, excellent inter-rater reliability, and fair test-retest reliability. We recommend its use for formative and summative assessment of junior anesthesia trainees.