Literature DB >> 34506599

Exploring pooled analysis of pretested items to monitor the performance of medical students exposed to different curriculum designs.

Pedro Tadao Hamamoto Filho1, Pedro Luiz Toledo de Arruda Lourenção2, Joélcio Francisco Abbade3, Dario Cecílio-Fernandes4, Jacqueline Teixeira Caramori5, Angélica Maria Bicudo6.   

Abstract

Several methods have been proposed for analyzing differences between test scores, such as using mean scores, cumulative deviation, and mixed-effect models. Here, we explore the pooled analysis of pretested Progress Test items to monitor the performance of first-year medical students who were exposed to a new curriculum design. This was a cross-sectional study of students in the first year of a medical program who participated in the annual interinstitutional Progress Tests from 2013 to 2019. We analyzed the performance of first-year students on the 2019 test and compared it with that of first-year students who took the tests from 2013 to 2018 and encountered the same items. For each item, we calculated odds ratios (ORs) with 95% confidence intervals (CIs); we also performed fixed-effect meta-analyses for each content area in the pooled analysis. In all, we used 63 items, which were divided into basic sciences, internal medicine, pediatrics, surgery, obstetrics and gynecology, and public health. Significant differences were found between groups in basic sciences (OR = 1.172 [95% CI 1.005–1.366], p = 0.043) and public health (OR = 1.54 [95% CI 1.25–1.897], p < 0.001), which may reflect the characteristics of the new curriculum. Thus, pooled analysis of pretested items may provide indicators of performance differences. This method may complement the analysis of score differences on benchmark assessments.


Year:  2021        PMID: 34506599      PMCID: PMC8432842          DOI: 10.1371/journal.pone.0257293

Source DB:  PubMed          Journal:  PLoS One        ISSN: 1932-6203            Impact factor:   3.240


Introduction

Over the last 30 years, many medical schools have implemented new undergraduate educational programs that focus on early contact with patients, the inclusion of the humanities, and community-based approaches [1, 2]. Moreover, recognizing the value of problematization and multidisciplinary instruction, teaching methods have been reappraised [3, 4]. When a curriculum is changed, students, faculty, and curriculum managers need meaningful ways to ascertain that the new curriculum has improved upon the previous one [5]. Ultimately, improvements in students’ subsequent professional performance and in patient outcomes would be the best evidence of a curriculum’s effectiveness. However, this information is not easy to obtain because of the difficulty of establishing a direct link between curriculum design, education quality, and health indicators [6, 7]. It is more feasible to measure students’ knowledge, not only to gauge student performance but also to identify gaps and strengths in a new curriculum [8-11].

In this sense, curriculum-based measurements (CBM) are helpful for assessing students’ progression and the effectiveness of a curriculum design [12]. CBM can be aided by benchmark assessments, which are periodic assessments of students’ progress toward achieving their learning objectives. Benchmark assessment provides timely information, allowing educational strategies to be adapted for effective learning [13, 14] at the individual, school, and regional levels [15]. For cross-institutional comparison of student achievement, the Progress Test has been shown to be a viable benchmark assessment tool in medical education when based on longitudinal data [16]. In curriculum comparisons, the Progress Test has been used in two ways: common exams given to different cohorts [11] or at different schools [17], or different exams given to different cohorts (in which case equating procedures are necessary to scale scores and avoid bias) [18].

Several methods have been proposed for analyzing differences between scores, such as using mean scores [17], cumulative deviation [19], and mixed-effect models [20]. However, because different progress tests may differ in difficulty, mean scores may not allow for reliable comparisons across exams. Cumulative deviation, meanwhile, requires a longitudinal appraisal of exams’ standard deviations, making it difficult to draw conclusions from single-point tests. Moreover, when comparing different cohorts, these methods rely on different items, which may threaten the validity and reliability of the comparison. Finally, these methods only allow for comparisons at the group level. A method based on the same items may overcome these challenges, and one that allows individual items to be compared may provide richer information on specific knowledge gaps. Here, we explore a statistical method that can be applied to assess the effectiveness of curricular change.

Methods

Study setting and participants

This analysis used data from the Interinstitutional Progress Test (IPT), in which students of Botucatu Medical School, Universidade Estadual Paulista (BMS-UNESP), have participated since 2005 [21]. This study was approved by the local institutional review board. Written consent from the students was not necessary because the study used an anonymized database of aggregated information.

This cross-sectional study was conducted at BMS-UNESP, in Botucatu, São Paulo State, Brazil. We included students in the first year of the medical program at BMS-UNESP who had participated in the IPT from 2013 to 2019. In Brazil, an undergraduate medical course takes 6 years [22]. Like most Brazilian medical schools, BMS-UNESP originally divided its curriculum into three cycles: basic sciences (1st and 2nd years), clinical sciences (3rd and 4th years), and the clerkship (5th and 6th years). Subject-based teaching was used for the basic sciences, which were organized into traditional subjects (e.g., anatomy, biochemistry, physiology, microbiology, and immunology).

In 2019, a new curriculum was implemented, consisting of two cycles: the pre-clinical cycle (1st to 3rd years) and the clerkship (4th to 6th years). In the new curriculum, the basic sciences are taught using a systems-based approach (e.g., the traditional subjects of cellular biology, biochemistry, and hematology were integrated into a course on the “cell,” while neuroanatomy, physiology, embryology, and neurology were integrated into a course on the “nervous system”). Moreover, social sciences-related disciplines (e.g., epidemiology, sociology, public health), which were previously taught independently, were organized along a structured humanities axis that includes interdisciplinary and community-based approaches.

In this study, we analyzed first-year medical students’ performance on the 2019 annual IPT and compared it with that of first-year students who took the exam from 2013 to 2018 and encountered the same items.

Progress Test

The Progress Test is a longitudinal assessment that measures students’ knowledge on successive, different tests. From the first to the last year of medical training, all students answer the same test and receive feedback on their performance. The Progress Test is based on a blueprint with questions requiring both lower and higher levels of cognitive processing, covering the content that every newly graduated student should have mastered [23, 24]. As is common in Brazilian schools, the Progress Test is given once a year for formative purposes: each student takes the test every year throughout the undergraduate course, and performance does not affect advancement decisions.

BMS-UNESP is one of the main public medical schools in São Paulo state, and it formed a consortium with other medical schools to prepare and administer the state’s annual interinstitutional Progress Test. The consortium develops the annual IPT using only new items; these conform to a fixed blueprint and cover six content areas: basic sciences, internal medicine, surgery, pediatrics, obstetrics and gynecology, and public health (20 items per area, for a total of 120 items). Items are multiple-choice questions with four options and a single correct answer. Items are preferably based on clinical vignettes, targeting applied knowledge rather than knowledge recall [25].

In 2019, the four existing consortia in São Paulo state developed the exam by selecting the best pretested items (items tested between 2013 and 2018, preferably between 2016 and 2018, with good discrimination indices) conforming to the commonly used blueprint. The use of pretested items in 2019 allowed us to compare the performance of the different groups of first-year students (the first-year students in 2019 versus the first-year students from 2013 to 2018) on the same items. Importantly, because performance does not affect advancement decisions, students do not study for the test, and item-sharing between cohorts does not occur.

Statistical analysis

Statistical analyses were performed using MedCalc for Windows, version 19.4 (MedCalc Software, Ostend, Belgium). We presented correct answers as item counts with percentages for each group. Furthermore, we calculated odds ratios (ORs) with 95% confidence intervals (CIs) for each item, and we performed fixed-effect meta-analyses for each content area in the pooled analysis (presented as ORs with 95% CIs). Statistical significance was set at an alpha of 0.05, and the I2 statistic was used to assess heterogeneity among the items’ results (I2 values of 25%, 50%, and 75% are interpreted as representing small, moderate, and high levels of heterogeneity, respectively) [26].
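To make the pooling concrete, the following is a minimal Python sketch of the per-item odds ratio and a fixed-effect meta-analysis with Cochran's Q and I2. It is an illustration under stated assumptions, not the authors' implementation: the paper used MedCalc, whereas this sketch uses simple inverse-variance weighting (rather than, e.g., a Mantel-Haenszel method), and all counts are hypothetical; the real per-item data are in the supporting XLSX file.

```python
import math

def item_odds_ratio(correct_new, n_new, correct_old, n_old):
    """Per-item OR of a correct answer (new vs. old curriculum) with a 95% CI.
    Assumes no zero cells; a continuity correction would be needed otherwise."""
    a, b = correct_new, n_new - correct_new   # new cohort: correct / incorrect
    c, d = correct_old, n_old - correct_old   # old cohorts: correct / incorrect
    log_or = math.log((a * d) / (b * c))
    se = math.sqrt(1/a + 1/b + 1/c + 1/d)     # SE of log(OR), Woolf method
    ci = (math.exp(log_or - 1.96 * se), math.exp(log_or + 1.96 * se))
    return log_or, se, ci

def pool_fixed_effect(items):
    """Inverse-variance fixed-effect pooled OR and I2 for one content area."""
    stats = [item_odds_ratio(*item) for item in items]
    weights = [1 / se ** 2 for _, se, _ in stats]
    pooled_log = sum(w * lo for w, (lo, _, _) in zip(weights, stats)) / sum(weights)
    # Cochran's Q and I2 quantify heterogeneity among the item-level ORs.
    q = sum(w * (lo - pooled_log) ** 2 for w, (lo, _, _) in zip(weights, stats))
    df = len(items) - 1
    i2 = max(0.0, 100 * (q - df) / q) if q > 0 else 0.0
    return math.exp(pooled_log), i2

# Hypothetical item counts: (correct_new, n_new, correct_old, n_old).
area_items = [(77, 96, 33, 90), (60, 96, 55, 95), (48, 96, 50, 88)]
pooled_or, i2 = pool_fixed_effect(area_items)
print(f"pooled OR = {pooled_or:.3f}, I2 = {i2:.1f}%")
```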

Results

Of the 120 items on the full exam, 63 were from our consortium and were therefore eligible for comparison. These items were divided as follows: 17 from basic sciences; 11 from surgery; 9 each from internal medicine, obstetrics and gynecology, and public health; and 8 from pediatrics. Regarding the years in which the items had previously been used, 20, 16, and 21 items were from 2016, 2017, and 2018, respectively; the remaining six items (two from 2013 and four from 2014) were all from the basic sciences. Table 1 summarizes these data as well as the number of students who took the exam each year.
Table 1

Distribution of the number of students and number of items previously tested.

                            2013   2014   2016   2017   2018   Total
Number of students            88     96     90     95     90
Number of items                2      4     20     16     21      63
  Basic sciences               2      4      4      4      3      17
  Internal medicine            0      0      4      1      4       9
  Pediatrics                   0      0      1      2      5       8
  Surgery                      0      0      2      5      4      11
  Obstetrics & gynecology      0      0      3      3      3       9
  Public health                0      0      6      1      2       9
Among the six content areas, significant differences were found for basic sciences and public health. In the other four content areas (internal medicine, pediatrics, surgery, and obstetrics and gynecology), there were no differences in performance between the 2019 students and their counterparts in earlier years (Fig 1).
Fig 1

Forest plots of the pooled analysis according to the exam’s different content areas.

The vertical axis represents the item number on the 2019 exam. Each point represents the OR, with horizontal bars for the respective 95% CI. Points to the left of the vertical line at 1 indicate that students exposed to the old curriculum performed better, whereas points to the right indicate better performance by students in the new curriculum. Significant differences were found in basic sciences and public health.

In the basic sciences, the pooled analysis showed that the 2019 students had superior performance (OR = 1.172 [95% CI 1.005–1.366], p = 0.043). Among the 17 items, the 2019 students’ performance differed significantly on four items: superior on three and inferior on one.

In public health, the 2019 students’ performance was also superior (OR = 1.54 [95% CI 1.25–1.897], p < 0.001). The difference was driven by superior performance on three of the nine items. On one item (relating to epidemiology), the OR reached 7.00 (95% CI 3.62–13.55): when this item appeared on the earlier test, 36.67% of the students answered correctly; in 2019, this percentage increased to 80.21%.

In internal medicine, there were differences on two items: one result favored the new curriculum and the other the former curriculum. However, the pooled analysis showed no statistically significant difference (OR = 0.93 [95% CI 0.735–1.176], p = 0.544). In pediatrics, no item showed a significant difference, nor did the pooled analysis (OR = 1.128 [95% CI 0.886–1.435], p = 0.329). Among the 11 items from surgery, the students exposed to the new curriculum performed better on one item and worse on three; the pooled analysis showed no difference (OR = 1.015 [95% CI 0.725–1.421], p = 0.765). Finally, in obstetrics and gynecology, the 2019 students performed better on two items and worse on one, with the pooled analysis failing to show a significant difference (OR = 1.164 [95% CI 0.944–1.436], p = 0.154).

The heterogeneity analysis showed high percentages of variation across the items, with I2 values ranging from 47.44% (pediatrics) to 80.26% (public health). Table 2 shows the I2 values for each content area.
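As a quick arithmetic check of the largest effect (a sketch treating the reported percentages as exact), the OR for the epidemiology item follows directly from the two proportions of correct answers:

$$\mathrm{OR} = \frac{0.8021/(1-0.8021)}{0.3667/(1-0.3667)} = \frac{4.053}{0.579} \approx 7.00,$$

matching the reported estimate.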
Table 2

I2 statistics for heterogeneity evaluation in the six content areas of the exam.

Content area                I2 value (%)      95% CI      p value
Basic sciences                 61.79       35.33–77.42     0.0004
Internal medicine              55.90        7.00–79.09     0.0202
Pediatrics                     47.44        0.00–76.63     0.0647
Surgery                        64.92       33.25–81.56     0.0015
Obstetrics & gynecology        65.73       30.36–83.14     0.0029
Public health                  80.26       63.31–89.38   < 0.0001

Discussion

Pooled analysis is commonly used in meta-analyses and systematic reviews of clinical trials to summarize the scientific evidence provided by single studies [27]. In medical education, similar approaches have been adopted to estimate the effects of specific educational interventions that are linked by a common objective [28]. However, we found no previous pooled analyses based on items from benchmark assessments. In this exploratory study, we delineated how this approach can be used to compare student performance and serve as an alternative to single-point comparisons, although it does not overcome the limited reliability of single measures.

Our results suggest that students exposed to the new curriculum performed better than their old-curriculum counterparts in basic sciences and public health. Conversely, no significant differences were observed in the applied clinical sciences. These results should be interpreted with caution, as most content areas showed moderate to high heterogeneity. This means that a large proportion of the variation in the observed estimates was due to heterogeneity across the items in the analysis [29, 30], which may be related to the small sample size in the present case. In cases of high heterogeneity, a qualitative appraisal of the results is important for understanding the estimated effect. In this regard, the philosophy of the new curriculum (i.e., greater integration between the basic sciences and the social sciences applicable to medicine) may account for the students’ performance in basic sciences and public health. Notably, students exposed to a new curriculum integrating basic and clinical sciences may be better prepared for the interinstitutional Progress Test [31], which uses vignette-based items targeting higher taxonomic levels [25]. In addition, experiences in community settings may also contribute to early medical education [32]. Together, these curriculum designs may contribute decisively to better educational outcomes [33]. Accordingly, no performance differences were observed in items related to the applied clinical sciences (internal medicine, pediatrics, surgery, and obstetrics and gynecology), probably because students in both the old and new curricula had little exposure to these areas (particularly to the diagnosis and treatment of diseases). It is therefore understandable that we detected a greater impact on student performance in basic sciences and public health, the content areas that were changed most substantially in the new curriculum.

Our study has some limitations that should be mentioned. First, the comparison group is not uniform: the “control” group comprised different cohorts of students, and specific characteristics of these groups may have introduced noise into the results. However, other than the curriculum change, no other institutional changes can explain the detected differences. Second, our sample comprised only about half the items of a full exam (63/120), and reliability is not guaranteed when fewer items are used [34]. Third, the Progress Test is brief relative to the broad range of content it covers; thus, obtaining accurate indices of performance in individual content sub-areas is difficult, as each sub-area is tested with only a few items [35]. Finally, we did not use other tools for the curriculum comparison; therefore, the superiority of the new curriculum is not unequivocal.
However, as stated previously, this is an exploratory study showing a possible use of pooled analysis to compare performance. Our study provides a blueprint for how other investigations might use a similar approach to evaluate programmatic changes in educational settings. This method may be especially useful for: 1) detecting significant differences on tests that employ repeated items; 2) comparing performance at different institutions that use the same test; and 3) reporting the performance of students on benchmark assessments beyond the Progress Test. This method is not expected to completely replace other tools; rather, it can complement them. Moreover, item-by-item comparison may be important for detecting knowledge gaps among students even when the pooled analysis shows no difference. Further studies may address this point and compare different statistical procedures. In conclusion, pooled analysis of pretested items can serve as a statistical method for assessing the effectiveness of curricular changes.

Correct answers with percentages for each item and group, and odds ratio calculation.

(XLSX)
References (27 in total)

1.  "No fear" curricular change: monitoring curricular change in the W. K. Kellogg Foundation's National Initiative on Community Partnerships and Health Professions Education.

Authors:  C J Bland; S Starnaman; D Harris; R Henry; L Hembroff
Journal:  Acad Med       Date:  2000-06       Impact factor: 6.893

2.  Quantifying heterogeneity in a meta-analysis.

Authors:  Julian P T Higgins; Simon G Thompson
Journal:  Stat Med       Date:  2002-06-15       Impact factor: 2.373

3.  Monitoring the medical education revolution.

Authors:  Val Wass; Tessa Richards; Peter Cantillon
Journal:  BMJ       Date:  2003-12-13

4.  Randomized controlled trials and meta-analysis in medical education: what role do they play?

Authors:  David A Cook
Journal:  Med Teach       Date:  2012-04-10       Impact factor: 3.650

5.  Using the cumulative deviation method for cross-institutional benchmarking in the Berlin progress test.

Authors:  Stefan Schauber; Zineb M Nouns
Journal:  Med Teach       Date:  2010       Impact factor: 3.650

6.  Assessment in medical education. (Review)

Authors:  Ronald M Epstein
Journal:  N Engl J Med       Date:  2007-01-25       Impact factor: 91.245

7.  Benchmarking by cross-institutional comparison of student achievement in a progress test.

Authors:  Arno M M Muijtjens; Lambert W T Schuwirth; Janke Cohen-Schotanus; Arnold J N M Thoben; Cees P M van der Vleuten
Journal:  Med Educ       Date:  2008-01       Impact factor: 6.251

8.  Use of the NBME Comprehensive Basic Science Examination as a progress test in the preclerkship curriculum of a new medical school.

Authors:  Teresa R Johnson; Mohammed K Khalil; Richard D Peppler; Diane D Davey; Jonathan D Kibble
Journal:  Adv Physiol Educ       Date:  2014-12       Impact factor: 2.288

9.  Back from basics: integration of science and practice in medical education.

Authors:  Glen Bandiera; Ayelet Kuper; Maria Mylopoulos; Cynthia Whitehead; Mariela Ruetalo; Kulamakan Kulasegaram; Nicole N Woods
Journal:  Med Educ       Date:  2017-10-10       Impact factor: 6.251

10.  Medical education and the healthcare system--why does the curriculum need to be reformed?

Authors:  Gustavo A Quintero
Journal:  BMC Med       Date:  2014-11-12       Impact factor: 8.775

