What do we really know about the appropriateness of radiation emitting imaging for low back pain in primary and emergency care? A systematic review and meta-analysis of medical record reviews.

Gabrielle S Logan¹, Andrea Pike², Bethan Copsey³, Patrick Parfrey¹, Holly Etchegary¹, Amanda Hall^1,2.

Abstract

BACKGROUND: Since 2000, guidelines have been consistent in recommending when diagnostic imaging for low back pain should be obtained to ensure patient safety and reduce unnecessary tests. This systematic review and meta-analysis was conducted to determine the pooled proportion of CT and x-ray imaging of the lumbar spine that were considered appropriate in primary and emergency care.
METHODS: Pubmed, CINAHL, The Cochrane Database of Systematic Reviews and Embase were searched for synonyms of "low back pain", "guidelines", and "adherence" that were published after 2000. Titles, abstracts, and full texts were reviewed for inclusion with forward and backward tracking on included studies. Included studies had data extracted and synthesized. Risk of bias was performed on all studies, and GRADE was performed on included studies that provided data on CT and x-ray separately. A random effect, single proportion meta-analysis model was used.
RESULTS: Six studies were included in the descriptive synthesis, and 5 studies included in the meta-analysis. Five of the 6 studies assessed appropriateness of x-rays; two of the six studies assessed appropriateness of CTs. The pooled estimate for appropriateness of x-rays was 43% (95% CI: 30%, 56%) and the pooled estimate for appropriateness of CTs was 54% (95% CI: 51%, 58%). Studies did not report adequate information to fulfill the RECORD checklist (reporting guidelines for research using observational data). Risk of bias was high in 4 studies, moderate in one, and low in one. GRADE for x-ray appropriateness was low-quality and for CT appropriateness was very-low-quality.
CONCLUSION: While this study determined a pooled proportion of appropriateness for both x-ray and CT imaging for low back pain, there is limited confidence in these numbers due to the downgrading of the evidence using GRADE. Further research on this topic is needed to inform our understanding of x-ray and CT appropriateness in order to improve healthcare systems and decrease patient harms.

Year: 2019 PMID: 31805073 PMCID: PMC6894771 DOI: 10.1371/journal.pone.0225414

Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240

Introduction

Guidelines for the assessment and treatment of low back pain (LBP) have been in circulation since the 1980s with more than 11 countries publishing their own LBP clinical guidelines in the last two decades.[1] While most early versions of LBP guidelines did not recommend routine use of radiographic imaging for assessment of LBP, there were discrepancies about when to image (e.g., some guidelines provided specific criteria or timeframes for imaging and others did not). In the 1980s and 1990s, x-ray imaging was commonly recommended in the assessment of LBP persisting longer than four weeks[1] and Computed Tomography (CT) was often recommended in patients experiencing neurological deficits, including radicular symptoms.[2,3] For the last 25 years, there has been increased congruence among LBP guidelines regarding when and under what circumstances to use diagnostic imaging. Since 2000, the recommendations typically state that diagnostic imaging is warranted only when patients with LBP present with red flag symptoms that suggest the presence of one of four known specific spinal pathologies (severe cauda equina, infection, fracture, and cancer).[4,5] Guidelines have also been updated with respect to the potential direct and indirect patient harms of diagnostic imaging, particularly x-ray and CT, as well as their lack of clinical utility for non-specific LBP. While MRI is another form of diagnostic imaging, it does not expose patients to the ionising radiation that x-ray and CT both emit; thus we are focusing only on those two imaging modalities.

Harms of over-testing

Patient harms

Both x-ray and CT imaging expose patients to ionizing radiation, a known mutagen that can increase risk of cancer, with CT exposing patients to more radiation than x-ray.[6] The human body can tolerate some radiation, but the more exposure that a patient has to radiation, the greater their cancer risk. This risk of radiation is even greater to children and young adults as radiation can affect both male and female fertility.[7] Thus, radiologists typically recommend using x-ray and CT only when medically necessary and clinically justified for patient care.[8,9] In addition to the harms from radiation, imaging can reveal incidental findings, such as anatomical abnormalities, that are extremely common in asymptomatic patients and only weakly correlated with patient symptoms.[10] For example, a systematic review in 2014 found that disc degeneration was present in 96% of asymptomatic adults aged 80 and up, and disc bulges in 80%.[11] Moreover, patients who receive diagnostic imaging do not have better patient outcomes compared to those treated without imaging.[5,10] Chou et al. performed a systematic review and meta-analysis to compare physical outcomes of patients with LBP who received imaging to those who did not.[12] They found that patients who received immediate imaging for non-serious LBP had similar pain and function outcomes both in the short and long term compared to patients who received usual care without imaging.[12] The harm of incidental findings is that patients may have to be sent for further tests or procedures to confirm that the finding is in fact benign, which may delay the patient receiving the appropriate treatment.

Health system burden

In addition to patient harms, over-testing results in a substantial economic burden to healthcare systems.[13] In the United States, spending on all CTs was $975 million in 2000 and had increased to $2.17 billion by 2006.[13,14] In countries with a public healthcare system, it is difficult to quantify the cost of unnecessary imaging in dollars, but in Canada the rate of CT imaging has almost doubled since 2003,[15] suggesting that the cost of imaging has also increased substantially. This increase in spending is also associated with downstream effects such as increased need for follow-up, further investigation of incidental findings, referrals to specialists, and even surgery.[10,16]

Importance of assessing appropriateness

Given the potential patient harms and added health care costs of using diagnostic imaging, it is essential to understand if these tests are being used appropriately according to the current guidelines. This information allows healthcare providers to understand whether and to what degree patient safety and quality of care are compromised with the use of unnecessary tests. A recent systematic review of diagnostic imaging appropriateness for LBP found that approximately one third of imaging referrals were not appropriate; however, this review included imaging referrals from any healthcare provider for any imaging modality (including MRIs).[17] X-ray and CT pose the most direct harm to patients due to their radiation emissions; thus we intend to provide a focused estimate of appropriateness for these tests only. Additionally, since family practice and emergency departments are the most common settings for imaging referrals for patients with LBP, and physicians in both settings follow the same guidelines for ordering imaging, we will focus our question on this provider population. This will also allow us to reduce heterogeneity in our estimate due to potentially different ordering practices or guidelines among different providers.

Aim

We aim to synthesize the evidence from all studies investigating the appropriateness of physician-made referrals for CTs and x-rays for LBP in primary and emergency care, which we refer to collectively as primary care from here on. Our review adds to the literature by providing clinicians, implementation researchers and policy makers with an estimate of imaging appropriateness for CT imaging and x-ray imaging separately that is specific to physicians working in family practice and emergency department settings.

Methods

This study was performed according to the PRISMA methodology.

Search strategy

Four databases (PubMed, CINAHL, EMBASE and The Cochrane Database of Systematic Reviews) were searched for terms related to the PICO keywords of low back pain, guidelines, and adherence. The search string was developed with a research librarian. Databases were searched from inception to May 2018 (see S2 Appendix). Titles and abstracts from each database search were imported to Endnote (version 10), and duplicates were removed before screening. Forward and backward citation tracking was done on all included papers, and the reference lists of relevant systematic reviews and policy documents were checked, to ensure that our database search captured all applicable published research articles.

Inclusion criteria

Studies were included if (i) the design was a retrospective or prospective review/audit of medical records, (ii) the data were on lumbar CT and x-ray images, (iii) the imaging referrals were made by a physician in either general practice or emergency department settings, (iv) the analysis compared the reason for imaging referral to a guideline source, and (v) the outcome was the proportion of appropriate or inappropriate referrals based on adherence to the guidelines. All LBP types were eligible for inclusion. We excluded studies that looked at appropriateness of imaging referred by other providers such as chiropractors, physiotherapists, or nurse practitioners. Only studies that reported individual or aggregate data from chart reviews for CT and x-ray imaging were included. If other tests or imaging modalities (e.g., MRI) were combined with x-rays or CTs, the study authors were contacted to confirm whether x-ray and CT data could be reported separately; if not, the study was excluded. Other study designs, such as self-reported surveys or simulated patient visits, were excluded. Since there was potential for variation in imaging recommendations found in guidelines published prior to the year 2000 that could impact the definition of appropriateness, we excluded all studies in which the data and guidelines were from 2000 and older. Two reviewers (GL, AH) screened titles and abstracts and created a shortlist of full texts to be screened. Full texts were scrutinized by two reviewers (GL, AH) to assess eligibility against the inclusion/exclusion criteria. Any discrepancy was resolved by discussion until consensus on inclusion was reached. Authors of studies that did not have a full text available (abstract or conference proceedings only) were contacted to determine if there was a published full text. Authors of studies that did not report which imaging modalities were included were contacted to determine if MRI was included in the aggregate data.

Data extraction

An electronic data collection form was developed to extract information from all included studies on study characteristics and outcome data. For each study the healthcare setting, LBP type, sample size, and outcome data were extracted. Outcomes included both the proportion of appropriate and inappropriate images. Additional outcome information extracted included: the guidelines source used for comparison, the definition used to assess appropriateness (or inappropriateness), the outcome denominator (if outcome reported the number of patients, images, visits), and measurement error (if reported).

Quality of reporting and risk of bias assessment

Quality of reporting was assessed for each study according to the “Reporting of studies Conducted using Observational Routinely-collected health data” (RECORD) Statement checklist, which is an extension of the “Strengthening the Reporting of Observational Studies in Epidemiology” (STROBE) Statement checklist.[18-21] Every included study was compared to the RECORD Statement’s 35-item checklist to determine if the study reported pertinent information to fulfill the checklist. No widely accepted tool exists for assessing Risk of Bias (RoB) for this type of observational study. Guidance was provided by a review authored by Sanderson et al., which provides a list of specific domains to be considered.[22] RoB for these observational, non-randomised studies was determined by using items that related to the following 4 domains: representativeness of patients, misclassification of patients, misclassification of outcome measurement, and inconsistent data. Overall study RoB was judged to be low if all 4 domains were judged as low risk, moderate if 3 domains were judged as low risk, or high if two or fewer domains were judged as low risk.
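The overall rating rule described above can be written down compactly. The following is a minimal sketch, not the review's own tooling: the function name, the dictionary layout, and the convention that any rating other than "low" counts against a study are assumptions made for illustration.

```python
from typing import Dict

# The four domains named in the text.
DOMAINS = (
    "representativeness of patients",
    "misclassification of patients",
    "misclassification of outcome measurement",
    "inconsistent data",
)


def overall_rob(domain_ratings: Dict[str, str]) -> str:
    """Map per-domain ratings to an overall risk of bias.

    Assumption: each domain is rated with a string, and only the exact
    rating "low" (case-insensitive) counts as low risk.
    """
    missing = [d for d in DOMAINS if d not in domain_ratings]
    if missing:
        raise ValueError(f"missing domain ratings: {missing}")

    n_low = sum(1 for d in DOMAINS if domain_ratings[d].strip().lower() == "low")
    if n_low == 4:
        return "low"       # all four domains at low risk
    if n_low == 3:
        return "moderate"  # exactly three domains at low risk
    return "high"          # two or fewer domains at low risk
```

A study rated low on every domain except, say, inconsistent data would come out as moderate under this rule.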

Data synthesis and analysis

Our main outcome was appropriateness of x-ray or CTs. For this review CT and x-ray appropriateness was broadly defined as suspicion of any of the red flag conditions (fracture, cauda equina, infection, malignancy). Since there is some variation in the guidelines about the exact criteria for appropriateness we anticipated some clinical heterogeneity in the definitions used by studies. Data were summarized separately for appropriateness of x-rays and appropriateness of CTs. We extracted estimates of the proportion of appropriate x-rays or CTs (and 95% confidence intervals) from each included study. In one case, the study only included an estimate of inappropriateness.[48] In this case the authors were contacted to confirm that we could accurately use the complement of their estimate (100% minus the proportion inappropriate) as the proportion of appropriate x-rays. When studies did not provide CIs for their appropriate percentage, we calculated the 95% CI using the formula for calculating confidence intervals for a single proportion in Stata (v 15). Meta-analysis for a single proportion using a random effects model was completed on studies that were determined to be clinically homogeneous.[23] The pooled proportion was calculated with Stata (v 15).

We applied the GRADE (Grading of Recommendations, Assessment, Development and Evaluation) approach to assess certainty of the estimates of appropriateness.[24] Certainty was downgraded based on 4 factors:

Risk of Bias: Twenty-five percent or more of the participants were from studies rated as having a high RoB.

Inconsistency in results: Determined by examining whether the estimates were similar in magnitude (overlapping confidence intervals).

Indirectness of evidence: More than 50% of the participants were outside the target group (e.g., differences in populations, outcome measures, and interventions).

Imprecision of evidence: Determined based on the width of the confidence interval (CI) associated with the proportion of appropriateness (+/- 3%) and the overall sample size (at least 2000 participants).
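The interval and pooling steps described in this section can be illustrated with a short sketch. The text does not state which Stata routine, interval formula, or transformation was used, so everything below is an assumption: a normal-approximation (Wald) interval for a single proportion and a DerSimonian-Laird random-effects model on untransformed proportions. The function names are hypothetical.

```python
import math
from typing import List, Tuple

Z95 = 1.959964  # standard normal quantile for a two-sided 95% interval


def wald_ci(p: float, n: int, z: float = Z95) -> Tuple[float, float]:
    """Normal-approximation (Wald) CI for a single proportion (assumed method)."""
    se = math.sqrt(p * (1.0 - p) / n)
    return max(0.0, p - z * se), min(1.0, p + z * se)


def pool_random_effects(
    studies: List[Tuple[float, int]], z: float = Z95
) -> Tuple[float, float, float]:
    """DerSimonian-Laird random-effects pooling of raw proportions.

    `studies` is a list of (proportion, sample size) pairs. Proportions of
    exactly 0 or 1 are rejected because their Wald variance is zero.
    Returns (pooled proportion, lower bound, upper bound).
    """
    if any(not 0.0 < p < 1.0 for p, _ in studies):
        raise ValueError("proportions must lie strictly between 0 and 1")

    props = [p for p, _ in studies]
    variances = [p * (1.0 - p) / n for p, n in studies]

    # Fixed-effect (inverse-variance) weights and mean, needed for Cochran's Q.
    w = [1.0 / v for v in variances]
    fixed = sum(wi * pi for wi, pi in zip(w, props)) / sum(w)
    q = sum(wi * (pi - fixed) ** 2 for wi, pi in zip(w, props))
    df = len(studies) - 1

    # Between-study variance (tau squared), truncated at zero.
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0

    # Random-effects weights add tau squared to each within-study variance.
    w_star = [1.0 / (v + tau2) for v in variances]
    pooled = sum(wi * pi for wi, pi in zip(w_star, props)) / sum(w_star)
    se = math.sqrt(1.0 / sum(w_star))
    return pooled, pooled - z * se, pooled + z * se


if __name__ == "__main__":
    # (proportion appropriate, sample size) for the four pooled x-ray studies,
    # as listed in Table 1.
    xray = [(0.34, 100), (0.47, 126), (0.56, 3908), (0.32, 50)]
    print(wald_ci(0.34, 100))
    print(pool_random_effects(xray))
```

With the four pooled x-ray studies from Table 1 as input, this sketch gives a pooled proportion of roughly 0.43 with an interval of about 0.30 to 0.56, in line with the x-ray estimate reported in the Results. That agreement does not establish which routine the authors used.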

Results

We identified a total of 919 publications from database searching (n = 918) and additional sources (n = 1), which was reduced to 696 studies after deduplication (Fig 1). We reviewed 185 full texts of which 22 were excluded for very specific reasons (see S2 Appendix).[25-46] Of the six final included studies,[47-52] one study was published in Spanish but was translated for analysis,[52] and two studies were abstracts only for which there was no full publication according to the authors of the abstracts.[47,48]

Fig 1

PRISMA flow diagram of search strategy.

Study characteristics

The studies were conducted in Finland, Ireland, Spain, and the United States (Table 1). In all studies, imaging referrals were made by physicians from a mixture of primary care clinics and hospital settings. Sample sizes ranged from 30 to 3908. The duration of LBP in the different studies was undefined. Five of the six studies assessed appropriateness of x-rays; two of the six studies assessed appropriateness of CTs. The studies used a range of different guidelines to select the criteria for determining appropriateness. Across the six included studies, nine different guidelines were used; some studies were directed by more than one guideline source.

Table 1

Study characteristics and reported outcomes of appropriateness organised by image type.

Study / Country	Setting¹	Patient age	Database / Data source	Definition of appropriateness (guideline source)	Denominator (sample size)²	% Appropriate (95% CI)	Risk of bias
x-ray
Baez 2011, USA	Mixed	18-40 years	EMR / Imaging referral³	Adherence to ACR, ACP and APS guidelines	Consecutive patients (18-40 yrs) who received lumbar spine imaging (n = 100)	34% (25, 43%)	High
Culleton 2013, Ireland	Mixed	≥65 years	EMR / Radiology findings	Adherence to RCR guidelines	All referrals for lumbar spine x-rays in patients >65 yrs over a 5 month period (n = 414)	18% (14, 22%)	High
Muntion-Alfaro 2006, Spain	Mixed	NR	Medical records / Unclear	Adherence to red flag indicators listed in RCGP, AHCPR, and ICSI guidelines	Consecutive patients who presented at 1 GP clinic with low back pain who received a referral for an x-ray exam over a 1 year period (n = 126)	47% (43, 51%)	Moderate
Schlemmer 2015, USA	ED	NR	Insurance claims / Imaging referral³	Adherence to red flag indicators, or >6 weeks of LBP as listed in the ACR and NCQA guidelines	All patients with a claim for a lumbar spine x-ray examination over a 1 year period. Note: this included only one x-ray claim per patient. (n = 3908)	56% (55, 58%)	Low
Tahvonen 2016, Finland	Mixed	NR	Medical records / Imaging referral, medical notes	Unclear (EC)	Consecutive patients (>16 yrs) who received lumbar spine imaging referrals over a 6 month period (n = 50)	32% (19, 45%)	High
CTs
Oikarinen 2009, Finland	Mixed	<35 years	Medical records / Imaging referral³	Adherence to symptoms of fracture as listed in EC guidelines	Consecutive patients (<35 yrs) who received a lumbar spine CT examination starting in January 2005 (n = 30)	23% (8, 39%)	High
Schlemmer 2015, USA	ED	NR	Insurance claims / Imaging referral³	Adherence to red flag indicators, or >6 weeks of LBP as listed in the ACR and NCQA guidelines	All patients with a claim for a lumbar spine CT examination over a 1 year period. Note: this included only one CT claim per patient. (n = 648)	56% (52, 60%)	Low

1 A mixed setting refers to studies that used a data source of imaging referrals in which the referring physician could be practicing in a family practice, in-hospital or emergency department setting.

2 The total number of lumbar spine imaging/referrals reviewed.

3 In addition to the referral, patient charts may have been accessed to determine patient information for determining appropriateness.

NR: not reported.

EBG: Evidence Based Guidelines.

NCQA: National Committee for Quality Assurance; RCGP: Royal College of General Practitioners; AHCPR: Agency for Health Care Policy and Research; ICSI: Institute for Clinical Systems Improvement; RCR: Royal College of Radiologists; ACR: American College of Radiology; ACP: American College of Physicians; APS: American Pain Society; EC: European Commission

The type of low back pain (e.g. acute, chronic) was not specified in any of the studies.

Reporting quality using the RECORD checklist

Study design

The included studies were all retrospective chart reviews/audits (see S2 Appendix), though not all used common terms to indicate that.[47] The majority of studies were a general chart audit/review done specifically to quantify appropriate imaging for LBP. However, one study’s objective was to quantify appropriateness of CT imaging in young patients and included more than CT imaging of the lumbar spine (e.g., thoracic spine, head, etc.).[49]

Setting

All included studies were general chart reviews of medical records conducted in a primary care provider setting, and all reported adequate information on setting according to the RECORD checklist. The settings were identified as a hospital or health centre, with only one study reporting data from the ED setting alone.[51]

Participants and study size

Participants were largely identified either by patient records, or records of images. Coding used to identify the included records was clearly described in only two studies.[51,52] These two studies were the only studies to justify their sample sizes.

Data sources/variables

Most studies took the information from the patients’ hospital or clinic charts directly. If there was a specific database or computer program that was accessed, it was not communicated in the published paper. Electronic medical records were specified in three studies, but the applications were not identified by name.[48,51,52] One study utilized an insurance claims database.[51]

Data access, cleaning, linkage, and supplementary information

These reporting criteria were discussed poorly or not at all in the studies. If linkage was involved, it was not clarified, and if data cleaning occurred, the details were not explained sufficiently. No study mentioned the level of database access researchers had. Only Schlemmer et al. provided supplementary data that was available for access online.[51]

Risk of bias

The four domains that were assessed for RoB were representativeness of patients, misclassification of patients, misclassification of outcome measurement, and inconsistency in data reporting (Fig 2). Four studies were judged to have a high risk of bias, one to have moderate RoB[52] and one to have low RoB.[51]

Fig 2

Risk of bias of included studies as determined by the representativeness of patients, risk of misclassification of patients, misclassification of the outcome of interest, and inconsistent data.

Estimates of appropriateness

X-rays

We found five studies with 4,598 participants that reported the appropriateness of x-rays, four of which used the reason for referral to determine appropriateness (Table 1).[47,50-52] One study, by Culleton et al., used the radiology findings report interpreting the image to determine appropriateness.[48] It was excluded from the meta-analysis due to the heterogeneity of outcome assessment and data source. From the four studies with 4,184 participants, we found low quality evidence that 43% (95% CI: 30%, 56%) of x-rays were appropriate (Fig 3). The quality of evidence was downgraded for two reasons: inconsistency and indirectness (Table 2). The estimate was determined to be inconsistent based on non-overlapping confidence intervals of individual estimates across studies. The estimate was also downgraded due to indirectness, as one of the studies was conducted solely in an ED setting while all others were conducted in mixed-setting health centres with both general and ED physicians.

Fig 3

Meta-analysis for proportion of appropriate x-rays and CT scans for low back pain.

Table 2

GRADE summary of findings for the outcome of appropriateness of x-ray and CT imaging for patients with low back pain.

Appropriateness of x-ray and CT imaging in patients with LBP ordered by primary and emergency care physicians
Population: Patients with any type of low back pain
Setting: Emergency department, General Practice, Hospital
Comparison: Back pain guidelines for imaging, assumed to focus on red flag indicators
Outcome	Effect	Number of participants in studies	Certainty
Appropriateness of x-ray	43% (30 to 56%)	n = 4,184; four studies	Low²,⁴ ⨁⨁OO
Appropriateness of CTs	54% (51 to 58%)	n = 678; two studies	Very low²,³,⁴ ⨁OOO

* GRADE Working Group grades of evidence.

High quality: Further research is very unlikely to change our confidence in the estimate of effect.

Moderate quality: Further research is likely to have an important impact on our confidence in the estimate of effect and may change the estimate.

Low quality: Further research is very likely to have an important impact on our confidence in the estimate of effect and is likely to change the estimate.

Very low quality: We are very uncertain about the estimate.

1 Downgraded due to Risk of Bias

2 Downgraded on Inconsistency

3 Downgraded imprecision

4 Downgraded on indirectness

CTs

We found two studies with 678 participants that reported the appropriateness of CTs (Table 1). Both studies used the reason for referral to determine appropriateness but used different criteria to define the outcome. Schlemmer et al.[51] defined appropriateness as any red flag condition or pain that had persisted for more than 6 weeks, and Oikarinen et al.[49] restricted the definition to only situations of trauma. Using both studies, we found very low-quality evidence that 54% (95% CI: 51%, 58%) of CTs for LBP were appropriate (Fig 3). Similar to the outcome of x-ray appropriateness, the certainty of the estimate for CT appropriateness was downgraded due to inconsistency because of non-overlapping confidence intervals and due to indirectness because there were differences in the setting that would influence the outcome. Additionally, the estimate was downgraded due to imprecision: although the confidence intervals were somewhat narrow, the estimate is based on a sample size of fewer than 2000 participants, which challenges the certainty of the estimate (Table 2).
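The downgrading reported for the two outcomes can be tallied with a small sketch. The starting level and the one-level-per-factor step are assumptions inferred from Table 2 (two downgrades giving low, three giving very low); the text does not state them explicitly, and the function name is hypothetical.

```python
from typing import Iterable

# GRADE certainty levels, ordered from highest to lowest.
LEVELS = ["high", "moderate", "low", "very low"]


def grade_certainty(downgrades: Iterable[str], start: str = "high") -> str:
    """Drop one level per distinct downgrading factor (assumed rule).

    Assumption: evidence starts at `start` and cannot fall below "very low".
    """
    steps = len(set(downgrades))
    index = LEVELS.index(start) + steps
    return LEVELS[min(index, len(LEVELS) - 1)]


if __name__ == "__main__":
    # Factors reported in the text for each outcome.
    print(grade_certainty(["inconsistency", "indirectness"]))
    print(grade_certainty(["inconsistency", "indirectness", "imprecision"]))
```

Under these assumptions, the two factors cited for x-rays lead to low certainty and the three factors cited for CTs lead to very low certainty, matching Table 2.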

Discussion

Few studies have been published reporting separately on the appropriateness of x-ray and CT scans ordered by primary care physicians (in general practice or emergency medicine) for patients with LBP. Among the studies we identified, most were conducted in European countries. No audit was conducted in countries such as Canada and Australia despite these countries having ongoing national campaigns to reduce unnecessary imaging for LBP (e.g., Choosing Wisely Canada).[7] From the available evidence, we found that only about half of x-rays and CTs are being ordered according to guidelines. However, due to several factors related to inconsistency and indirectness, we have low certainty in this estimate. Our lack of certainty stems largely from the variation in, or lack of reporting of, how appropriateness was defined in these studies. Moreover, the majority of the studies we identified were conducted with very small sample sizes (and were thus underpowered to provide reliable estimates) and were of low methodological and reporting quality. In order to advance the science in this area, better quality studies that are adequately powered and adhere to guidelines for conducting and reporting clinical audits using routinely collected data are required.

While another systematic review has investigated imaging appropriateness, its estimate was heterogeneous because it included multiple provider types and multiple imaging modalities, including MRI.[17] Our review adds to the current knowledge base in this area by answering a specific question regarding the appropriateness of radiation emitting x-ray and CT for patients with LBP in settings where patients typically seek care. Given that there have been several recent (past 5 years) international campaigns targeting physicians in general practice and emergency departments to reduce imaging, providing a robust assessment of the appropriateness specific to this recommendation is necessary to help clarify the issue and set targets for change.[7]

With respect to the estimate of imaging appropriateness, it is important to note that we found wide variation in the methods and reporting of the included studies. The six included studies cited 9 different guideline sources, which were not always internationally recognized. In addition, although the names and sometimes references of guidelines were mentioned as the source for determining appropriateness, it was not clear which criteria were used to define the outcome. For example, many guidelines recommended imaging only when red flags were present, and others provided additional criteria, which recommended imaging after a certain duration of LBP and non-response to treatment. It was unclear how these criteria were operationalized to code the reasons for referral as appropriate or not. This could lead to misclassification of the outcome or low reliability of the results. Better reporting of criteria for defining appropriateness and examples of operationalizing the coding protocol would improve our understanding of possible heterogeneity in the outcomes across studies. Other sources of potential heterogeneity included the differences in inclusion criteria regarding patient population, the setting in which imaging referrals were made, and the medical record data sources. For example, two studies looked at patients who were under the age of 40, while one study looked only at patients older than 65 years. While most studies included a mixture of settings with referrals made from hospital-based or general practice-based physicians, one study focused solely on referrals made within an emergency department setting. Lastly, one study collected data from an insurance database, while two looked at EMR, and three did not describe the database other than to mention medical records. These potential sources of clinical heterogeneity may explain some of the inconsistency in the estimates across studies.

Strengths

As with most systematic reviews and meta-analyses, we adhered to the PRISMA guidance for conducting and reporting systematic reviews and meta-analysis using observational data.[53,54] This included a) having two reviewers screen studies and extract data, b) providing an assessment of methodological quality and heterogeneity among the included studies, and c) forward and backward citation tracking to ensure all relevant studies were captured. We focused on a specific question, namely what proportion of radiation emitting imaging for patients with LBP in ED and primary care settings was appropriate, which allowed us to understand how frequently orders for these potentially harmful modalities are appropriate. Exclusion of older guidelines allowed us to focus on recent studies that are most applicable to the current guideline recommendations and current health care provider practice. Finally, we used the RECORD checklist to provide a robust assessment of the quality of reporting, which allowed us to make sound recommendations for advancing the quality and replicability of the science in these types of study designs.

Limitations

Despite its strengths, this study is limited in a few ways. First, due to resource constraints we chose to use a more specific search strategy meaning that it may not have been sufficiently sensitive to identify an exhaustive list of all potentially relevant studies. However, after consultation with a research librarian about this decision we included forward and backward citation tracking to enhance our specific search of electronic databases. While additional citation tracking did identify several potentially relevant studies all but one[51] were later excluded for various reasons (see S2 Appendix). Other limitations of this systematic review involve the quality, risk of bias assessments, and heterogeneity of the included studies. Many of the studies were not described in sufficient detail to assess the quality for replicability. Since a tool does not already exist to help grade the studies that are reporting routinely collected health data, the domains for potential introduction of bias were selected based on expert opinion. This makes it difficult to compare to other systematic reviews. As mentioned, the clinical heterogeneity of the included studies with respect to the definition of appropriateness and differences in the inclusion criteria of patient ages also limits the certainty of our findings around the estimate of appropriateness, which we have reflected in our GRADE assessment.

Future research

Based on this review’s findings, we identified several areas for future research that would improve our knowledge about the appropriateness of LBP imaging. First, only 2 studies assessed the appropriateness of CT images for LBP that were ordered by physicians. One of these studies had a very small sample size and high risk of bias, and the other was methodologically sound but was conducted in an ED setting. Future studies in other countries, using methods similar to those of Schlemmer et al. in both general practice and emergency settings, would help confirm the appropriateness of CTs for LBP. This would involve adhering to the RECORD statement for improved reporting quality. Additionally, for both x-rays and CTs, we found that the definition of appropriateness varied among studies and in many cases was unclear or too vague to allow meaningful interpretation or replication. Thus, as an essential first step, we recommend that future research clearly report the definition of appropriateness used and how that definition was operationalized for coding purposes. Second, and possibly most important, this field of research would benefit from a standardized definition of appropriateness for x-rays and CTs. This could be based on a spectrum to reflect some variation in the guidelines, ranging from a very strict cut-off (e.g., appropriate only if trauma-indicated, as used in the Oikarinen et al. study) to more inclusive definitions (e.g., any red flag indicated and/or pain lasting longer than 6 weeks, as used in Schlemmer et al.).[49,51]
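As a rough illustration of what reporting an operationalized definition could look like, the sketch below codes the two ends of the spectrum described above. Everything in it is a hypothetical assumption rather than the coding scheme of any included study: the `Referral` fields, the label strings, the choice to record trauma as one of the red flags, and the 6-week threshold applied as a strict inequality.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Referral:
    """One imaging referral abstracted from a medical record (hypothetical fields)."""
    red_flags: frozenset = frozenset()  # e.g. {"trauma", "malignancy"}
    pain_weeks: float = 0.0             # duration of low back pain in weeks


def is_appropriate(referral, definition):
    """Classify a referral under a named definition of appropriateness."""
    if definition == "strict":
        # Strict end of the spectrum: appropriate only if trauma-indicated.
        return "trauma" in referral.red_flags
    if definition == "inclusive":
        # Inclusive end: any red flag and/or pain lasting longer than 6 weeks.
        return bool(referral.red_flags) or referral.pain_weeks > 6
    raise ValueError(f"unknown definition: {definition!r}")


def proportion_appropriate(referrals, definition):
    """Appropriate referrals divided by all referrals, one referral per patient."""
    appropriate = sum(is_appropriate(r, definition) for r in referrals)
    return appropriate / len(referrals)
```

Stating the rule this explicitly makes clear that the same set of records can yield very different proportions depending on the definition chosen, which is why a standardized definition would make studies easier to compare and pool.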

Implications for practice

The results of this systematic review show that, in several countries, about half of the referrals for LBP imaging (x-rays and CTs) are not appropriate according to the guidelines. Because of the patient harms associated with x-ray and CT scans, including radiation exposure, high rates of incidental findings, and risk of delayed recovery, non-adherence to the guidelines represents low-value care for patients.[27] Hence, future research should aim to better understand why these referrals are made.

Conclusion

Recently there has been a push to reduce unnecessary and inappropriate imaging, not only to save costs, but also to provide better patient care.[10] This review provides an estimate of appropriateness for radiation emitting imaging for LBP, which indicates that only about half of imaging is appropriate according to recent guidelines. However, due to a lack of published research, this estimate was not informed by data from many of the countries promoting the reduction of inappropriate imaging, such as Canada, Australia and the UK. Moving forward, more countries need to undertake high-quality studies with sufficiently large sample sizes and clear definitions of appropriateness.

Search strategies.

(DOCX)

Studies identified in search strategy (including forward and backward tracking) and the reason(s) they were excluded from descriptive synthesis and meta-analysis.

(DOCX)

RECORD and STROBE checklist items for included studies in descriptive synthesis.

(DOCX)

28 Aug 2019

PONE-D-19-16551

What do we really know about the appropriateness of radiation emitting imaging for low back pain in primary care? A systematic review and meta-analysis of medical record reviews

PLOS ONE

Dear Mrs. Logan,

Thank you for submitting your manuscript to PLOS ONE. After careful consideration, we feel that it has merit but does not fully meet PLOS ONE’s publication criteria as it currently stands. Therefore, we invite you to submit a revised version of the manuscript that addresses the points raised during the review process.

The review question is important, but there are several aspects of the paper that need to be clarified. The three reviewers provided consistent recommendations for changes and I would strongly encourage that the authors address all the comments (particularly those made by reviewer #1). The introduction is quite long and I would suggest to cut some words. The authors should also carefully justify what their analyses add to the Jenkins et al (2018) review, particularly in light of all the sensitivity analyses they presented.

We would appreciate receiving your revised manuscript by Oct 12 2019 11:59PM. When you are ready to submit your revision, log on to https://www.editorialmanager.com/pone/ and select the 'Submissions Needing Revision' folder to locate your manuscript file.

If you would like to make changes to your financial disclosure, please include your updated statement in your cover letter.

To enhance the reproducibility of your results, we recommend that if applicable you deposit your laboratory protocols in protocols.io, where a protocol can be assigned its own identifier (DOI) such that it can be cited independently in the future. For instructions see: http://journals.plos.org/plosone/s/submission-guidelines#loc-laboratory-protocols

Please include the following items when submitting your revised manuscript:

A rebuttal letter that responds to each point raised by the academic editor and reviewer(s). This letter should be uploaded as separate file and labeled 'Response to Reviewers'.

A marked-up copy of your manuscript that highlights changes made to the original version. This file should be uploaded as separate file and labeled 'Revised Manuscript with Track Changes'.

An unmarked version of your revised paper without tracked changes. This file should be uploaded as separate file and labeled 'Manuscript'.

Please note while forming your response, if your article is accepted, you may have the opportunity to make the peer review history publicly available. The record will include editor decision letters (with reviews) and your responses to reviewer comments. If eligible, we will contact you to opt in or out.

We look forward to receiving your revised manuscript.

Kind regards,

Gustavo Machado, PhD
Academic Editor
PLOS ONE

Journal Requirements:

1. When submitting your revision, we need you to address these additional requirements. Please ensure that your manuscript meets PLOS ONE's style requirements, including those for file naming. The PLOS ONE style templates can be found at http://www.journals.plos.org/plosone/s/file?id=wjVg/PLOSOne_formatting_sample_main_body.pdf and http://www.journals.plos.org/plosone/s/file?id=ba62/PLOSOne_formatting_sample_title_authors_affiliations.pdf

2. Please include captions for your Supporting Information files at the end of your manuscript, and update any in-text citations to match accordingly. Please see our Supporting Information guidelines for more information: http://journals.plos.org/plosone/s/supporting-information.

3. Please amend your manuscript to include your abstract after the title page.

Reviewers' comments:

Reviewer's Responses to Questions

Comments to the Author

1. Is the manuscript technically sound, and do the data support the conclusions?

The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented.

Reviewer #1: Partly
Reviewer #2: Yes
Reviewer #3: Yes

2. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #1: Yes
Reviewer #2: Yes
Reviewer #3: N/A

3. Have the authors made all data underlying the findings in their manuscript fully available?

The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data—e.g. participant privacy or use of data from a third party—those must be specified.

Reviewer #1: Yes
Reviewer #2: Yes
Reviewer #3: Yes

4. Is the manuscript presented in an intelligible fashion and written in standard English?

PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here.

Reviewer #1: Yes
Reviewer #2: No
Reviewer #3: Yes

5. Review Comments to the Author

Please use the space provided to explain your answers to the questions above.
You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters)

Reviewer #1: Thank you for asking me to review this interesting paper. The paper presented findings from a systematic review and meta-analysis specific to the appropriate use of X-ray and CT for low back pain. The paper reported generally sound methodology and is well written; however, I have a few concerns regarding the included papers and data analysis.

Major revisions:

1. Two of the included studies (Baez, 2011 and Culleton, 2013) are only referenced in abstract form. Were full studies published to enable complete data extraction and analysis of risk of bias, or were the authors contacted for further details? More information regarding this should be included, or the reference should be changed to that of the published full text. In addition, one study (Muntion-Alfaro, 2006) was published in Spanish with only the abstract available in English - was the full text translated?

2. More detail regarding the numerator and denominator extracted from the studies to calculate the proportion of appropriate imaging is required. For example was it the proportion of: number of appropriate image referrals/total number of image referrals OR was it the proportion of: number of patients referred for imaging/number of patients determined as appropriate for imaging? Both give different measures of appropriateness but with different denominators they are not directly comparable. Currently it is unclear which measure has been used, or whether both have been used which will impact on the suitability of meta-analysis. This should be made clear in Table 1.

3. The suitability of the meta-analyses I think needs to be further considered. It is unclear in the methods what factors were considered when assessing for clinical homogeneity and this should be further described. The CT meta-analysis only includes 2 studies, of which one is weighted 94% so I am questioning the suitability of pooling these, especially as one of the studies had more limited determination of appropriateness compared to the other (ie. trauma indications only). For the X-ray meta-analysis it is unclear if the outcome measures are comparable (see point 2 above).

Minor revisions:

Abstract -

1. The conclusion is not clearly articulated - try to make the main finding of the review and possible implications more clear.

Introduction -

1. The introduction is quite long and I feel the section of harms of over testing could be summarised more succinctly. Given the aim of doing a review specific to X-ray and CT as opposed to including MRI, some more explanation of why you chose to do this could be useful. In particular you state that X-ray and CT provide the most direct harm, but with no references to support this. Although MRI doesn't have ionising radiation it could arguably reveal more incidental findings and have more costs associated. The use of CT/MRI has also been increasing over time (Downie A, Hancock M, Jenkins H, et al How common is imaging for low back pain in primary and emergency care? Systematic review and meta-analysis of over 4 million imaging requests across 21 years British Journal of Sports Medicine Published Online First: 13 February 2019. doi: 10.1136/bjsports-2018-100087).

2. Page 2 under 'Harms of over-testing' references are required for the first 2 statements.

3. Page 3 top paragraph - I would also consider the potential increased risk of carcinogenic changes in children

4. Page 4 top paragraph, line 3. Swap is and also: This financial increase is also associated with...

Methods -

1. Was the review registered in Prospero, if so please provide details.

2. Page 5 Under inclusion criteria the numbers (i), (ii) etc. for the five points are repetitive and not sequential in a list: (ii) and (iii) are both listed twice - please modify

3. Page 6 Excluded studies prior to 2000 due to guidelines - but wouldn't this more depend on which guidelines were used rather than the year of data collection (later studies may have used older guidelines) - perhaps consider exclusion on the type of guideline.

4. Page 6 Under data extraction, last sentence: remove the 'was extracted' from the end of the sentence and add 'extracted' to the beginning of the sentence - 'Additional outcome information extracted included...'

Discussion -

1. Page 15 paragraph 2. I would disagree with the statement: 'Prior to our review, it was difficult to say anything regarding the appropriateness of imaging for LBP according to the guidelines'. I am an author on the 2018 review into this topic that is then discussed in the same paragraph. This review also looked at imaging appropriateness and conclusions can be made from the data presented in the review (indeed, you referenced one of these conclusions in the introduction). Although the 2018 review did have heterogeneity of included studies as mentioned, the data analysis in the review accounted for this by performing different meta-analyses with respect to the guidelines used to assess appropriateness and the outcome measure used, and by performing sensitivity analysis to account for clinical setting, type of imaging and year of publication. In this paragraph it would be better to see a comparison of the results of your study to that of the previous review with a discussion of possible reasons for similarities/differences, which would include the more specific inclusion criteria of the current review.

2. Page 15 paragraph 2. You state that there have been several campaigns to reduce X-ray and CT use but only provide one reference specific to Canada. I am also not certain whether such campaigns are specific only to Xray and CT in general, or whether they often include all imaging which would include MRI.

3. Page 15 paragraph 3 to Page 16 paragraph 2 - this information may be better moved to under limitations.

4. Page 16 Strengths: Most of the strengths you have listed are fairly standard practice for SLRs. Are there any particular strengths that you feel are more unique to your review - ie. in the way you analysed the data, the question you asked etc.

5. Page 16-17 Limitations: If the meta-analyses are left as is (after considering the points made above) then I feel there should be more discussion of the potential limitations of these.

Conclusion -

1. Statement again 'Before this review, it was difficult to say anything regarding how appropriate imaging for LBP is according to the guidelines'. I would again disagree with this statement as discussed above. I would remove this and re-phrase the conclusion accordingly

Reviewer #2: I thank the authors for the opportunity to review this manuscript. The authors aimed to investigate the proportion of XR and CT imaging requests for low back pain that were appropriate. This is an extremely important question. Reducing the inappropriate use of imaging is a priority for numerous healthcare organisations and initiatives that aim to reduce low-value care (e.g. Choosing Wisely). However, before resources are spent on strategies to reduce imaging, it is important to understand the size of this problem.

Although the review question is important, I don't think the rationale is strong enough for why this review is sufficiently different from the review by Jenkins et al (2018). The authors should carefully justify what their analyses add to the Jenkins et al (2018) review, particularly in light of all the sensitivity analyses that are presented in Table S5 (https://www.sciencedirect.com/science/article/pii/S1529943018302031?via%3Dihub#ec0015).

There are also numerous issues with grammar that need to be addressed. For example, the following phrases in the Abstract need to be revised:

- 'pooled proportion of appropriateness of CT and XR imaging for low back pain' should be 'pooled proportion of CT and XR imaging for low back pain that were considered appropriate'

- 'Four studies reported XR appropriateness, one study reported CT appropriateness' should be 'Four studies reported on the appropriateness of XR imaging, one on the appropriateness of CT, ...'

- the abstract conclusion is similar

- The authors should carefully scan the manuscript for similar examples and correct them

Abstract

- I am unclear what the RECORD checklist is from just reading the abstract. Is it possible to provide a brief explanation in the abstract methods?

- I think the abstract conclusion could better reflect the results. For example, 'There is low to very-low quality evidence that only half of XRs and CTs ordered for LBP are appropriate'

- I would also add the need for future research to properly examine 'appropriateness' given the low quality of the evidence

Introduction

- remove the abbreviation for diagnostic imaging (DI) as it is not a commonly used phrase

- Page 3, 1st paragraph: the authors need to acknowledge that CT exposes patients to substantially more potentially harmful radiation than XRs

- Page 3, 2nd paragraph: can the authors also provide data for younger age groups?

- Page 3, 2nd paragraph: the authors could also mention that incidental findings can lead to surgery

- Page 4, 2nd paragraph: the authors should elaborate on why CT and XR pose more direct harms to patients when compared with MRI. I’m not really sure why MRI imaging was excluded from this review.

Method

- Page 8: For the GRADE criteria 'indirectness of evidence', could the authors provide an example of participants being outside the target group?

Results

- Page 13, 1st paragraph: I think it is a big assumption that all people presenting to ED with LBP are doing so because of trauma. Do the authors have a reference to support this?

- Appendix 3 is difficult to interpret because there is no reference to what each item means. I suggest including a description of the items directly under the table.

Table 1

- in the column labelled 'definition of appropriateness', 'no red flags' and 'red flag indicators' appear to be contradictory. Wouldn't the presence of red flags be an indication for appropriateness?

- for 'Culleton 2013', there seems to be an 'NR' value included by mistake

- please add the setting for each study in this table (i.e. primary care or emergency)

Table 2

- the first row in table 2 mentions 'primary care physicians' but my understanding is that studies from ED were also included in this review. My understanding is that ED physicians are not primary care physicians. Could the authors please clarify this and ensure the terminology used throughout the manuscript is consistent in regards to this issue

Discussion

- Page 14, 1st paragraph: the authors could also make reference to the Choosing Wisely campaign in Australia

- Page 16, 3rd paragraph: please remove the use of a random effects meta-analysis as a strength of this review

Reviewer #3: Thank you for asking me to review this manuscript. This study is a systematic review and meta-analysis of appropriateness of radiation emitting imaging for low back pain. The manuscript is well written. Please see below some minor comments/suggestions for improvement:

1. Introduction reads well.

2. It is unclear whether the protocol was registered/study followed a registered protocol.

3. Authors have mentioned that they searched Pubmed, CINAHL, and Embase in the abstract but mentioned four databases in the manuscript. Suggest adding the fourth database - The Cochrane Database of Systematic Reviews - in the abstract as well.

4. Page 7 - it’s not really an ‘effect size,’ it’s a proportion or pooled proportion. Suggest changing these terms throughout e.g. in Table 2.

5. No data from low-middle income countries - discussion point

6. Authors have lumped proportions with different denominators (% of images vs % patients presenting for care). Does it make sense to do this? Perhaps pooling the proportions with the same denominator would be better. Probably ok to lump scan types in together

7. Are the numbers in Table 2 the number of patients presenting for care, or the number of patients who were referred for imaging? Please make this clear in the manuscript.

8. Details on the number of studies assessing appropriateness of x-rays and CTs do not match in the abstract and manuscript. I’d suggest using consistent language to avoid confusion. For eg, abstract says “Four studies reported x-rays appropriateness, one study reported CT appropriateness, and one study reported on both imaging modalities.” Manuscript says “Five of 6 studies assessed appropriateness of x-rays; two of the six studies assessed appropriateness of CTs”

9. In study methods the authors do not mention whether the study followed PRISMA guidance. It was only mentioned in the Strengths section. Please consider adding it in the Methods section as well.

10. In ‘Estimates of Appropriateness’ section, when describing x-Rays, I suggest adding number of participants (similar to what you have done in ‘CTs’) to make it consistent. Eg, we found five studies with 5010 participants that reported the appropriateness of x-rays.

11. Title of the study is “….appropriateness of imaging for back pain in primary care” but includes studies in emergency department and hospital settings.

12. From Table 1 “A mixed setting refers to studies that used a data source of imaging referrals in which the referring physician could be practicing in a family practice, in-hospital or emergency department setting” Some clarity is needed on how the authors have defined primary care. In some healthcare systems hospital-based care is not considered primary care.

13. Suggest using word ‘imaging’ instead of ‘images’ in inclusion criteria, second point, page 5.

14. In ‘Data Access, Cleaning, Linkage, and Supplementary Information’ section, page 12, please add ‘for’ in the sentence ‘No study mentioned the level of database access researchers’.

15. Consider rewording the sentence - Of the six studies, nine different guidelines were used (in study characteristics, page 9).

6. PLOS authors have the option to publish the peer review history of their article. If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: No
Reviewer #2: No
Reviewer #3: Yes: Ms Sweekriti Sharma Dr Adrian Traeger

[NOTE: If reviewer comments were submitted as an attachment file, they will be attached to this email and accessible via the submission site. Please log into your account, locate the manuscript record, and check for the action link "View Attachments". If this link does not appear, there are no attachment files to be viewed.]

While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool, https://pacev2.apexcovantage.com/. PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Registration is free. Then, login and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email us at figures@plos.org. Please note that Supporting Information files do not need this step.

17 Oct 2019

Reviewer #1: Thank you for asking me to review this interesting paper. The paper presented findings from a systematic review and meta-analysis specific to the appropriate use of X-ray and CT for low back pain.
The paper reported generally sound methodology and is well written; however, I have a few concerns regarding the included papers and data analysis. Major revisions: 1. Two of the included studies (Baez, 2011 and Culleton, 2013) are only referenced in abstract form. Were full studies published to enable complete data extraction and analysis of risk of bias, or were the authors contacted for further details? More information regarding this should be included, or the reference should be changed to that of the published full text. In addition, one study (Muntion-Alfaro, 2006) was published in Spanish with only the abstract available in English - was the full text translated? RESPONSE: Thank you for pointing this out for clarification. We did contact Baez 2011 and Culleton 2013 (and mentioned this in the Methods section on page 6), but neither author had published full manuscripts. The full article by Muntion-Alfaro 2006 was translated from Spanish into English. We have clarified this process in the manuscript (Pg 7 & 8). 2. More detail regarding the numerator and denominator extracted from the studies to calculate the proportion of appropriate imaging is required. For example was it the proportion of: number of appropriate image referrals/total number of image referrals OR was it the proportion of: number of patients referred for imaging/number of patients determined as appropriate for imaging? Both give different measures of appropriateness but with different denominators they are not directly comparable. Currently it is unclear which measure has been used, or whether both have been used which will impact on the suitability of meta-analysis. This should be made clear in Table 1. RESPONSE: Thank you for your comment. We have added in the specific definitions to clarify the numerator and denominator used in each study in Table 1 on pg 10. 
Importantly, we want to be clear that in the calculations of appropriateness for each of the included studies we are confident that the numerator included the number of referrals that met the criteria for appropriateness and the denominator included the total number of images during the study period. In all cases the denominator included only one image per patient. Therefore, the denominators for the included studies are considered clinically homogenous 3. The suitability of the meta-analyses I think needs to be further considered. It is unclear in the methods what factors were considered when assessing for clinical homogeneity and this should be further described. The CT meta-analysis only includes 2 studies, of which one is weighted 94% so I am questioning the suitability of pooling these, especially as one of the studies had more limited determination of appropriateness compared to the other (ie. trauma indications only). For the X-ray meta-analysis it is unclear if the outcome measures are comparable (see point 2 above). RESPONSE: Thank you for this comment and raising two important points regarding the rationale for choosing to include a meta-analysis in this review. On the first point regarding the CT scans and clinical homogeneity, we anticipated a level of clinical heterogeneity (particularly in how the outcome of appropriateness would be defined). For this review, appropriateness was defined as suspicion of any of the red flag conditions (fracture, cauda equina, infection, malignancy). Since our question was about appropriateness in general we made the choice to pool these estimates to answer our broad question. We have added this additional information to the data synthesis section on page 7. We decided to downgrade the quality of the evidence based on indirectness in cases where a large portion of the study data came from studies that focused on a sub-group of participants that may influence the estimate. This can be noted in our GRADE assessment. 
I hope this answers your questions. Minor revisions: Abstract- 1. The conclusion is not clearly articulated - try to make the main finding of the review and possible implications more clear. RESPONSE: Thank you. We have reworded the conclusion statement in the abstract. Introduction - 1. The introduction is quite long and I feel the section of harms of over testing could be summarised more succinctly. Given the aim of doing a review specific to X-ray and CT as opposed to including MRI, some more explanation of why you chose to do this could be useful. In particular you state that X-ray and CT provide the most direct harm, but with no references to support this. Although MRI doesn't have ionising radiation it could arguably reveal more incidental findings and have more costs associated. The use of CT/MRI has also been increasing over time (Downie A, Hancock M, Jenkins H, et al How common is imaging for low back pain in primary and emergency care? Systematic review and meta-analysis of over 4 million imaging requests across 21 years British Journal of Sports Medicine Published Online First: 13 February 2019. doi: 10.1136/bjsports-2018-100087). RESPONSE: Thank you for this comment. However, we did not state that CT and x-ray expose patients to more direct harms than MRI. We simply mentioned that there are direct harms to the patient from these imaging types. We have edited the introduction to be more concise and have tried to clarify this misunderstanding. 2. Page 2 under 'Harms of over-testing' references are required for the first 2 statements. RESPONSE: Thank you. As you have recommended decreasing the introduction, and since these sentences were not necessary, we have deleted them to be concise. 3. Page 3 top paragraph - I would also consider the potential increased risk of carcinogenic changes in children RESPONSE: Thank you for this comment. We have added a minor comment on paediatric populations. 
We have refrained from further elaboration, as this is not the population of focus for this systematic review. 4. Page 4 top [ paragraph, line 3. Swap is and also: This financial increase is also associated with... RESPONSE: Thank you for catching this grammar error. We have adjusted it. Methods - 1. Was the review registered in Prospero, if so please provide details. RESPONSE: This review was not registered in Prospero due to time restrictions on the project. 2. Page 5 Under inclusion criteria the numbers (i), (ii) etc. for the five points are repetitive and not sequential in a list: (ii) and (iii) are both listed twice - please modify RESPONSE: Thank you for catching this. We have corrected this mistake. 3. Page 6 Excluded studies prior to 2000 due to guidelines - but wouldn't this more depend on which guidelines were used rather than the year of data collection (later studies may have used older guidelines) - perhaps consider exclusion on the type of guideline. RESPONSE: This is in fact what we were trying to communicate. We apologise for the lack on clarity and have reworded it (pg 6). 4. Page 6 Under data extraction, last sentence: remove the 'was extracted' from the end of the sentence and add 'extracted' to the beginning of the sentence - 'Additional outcome information extracted included...' RESPONSE: We have made this edit. Thank you. Discussion - 1. Page 15 paragraph 2. I would disagree with the statement: 'Prior to our review, it was difficult to say anything regarding the appropriateness of imaging for LBP according to the guidelines'. I am an author on the 2018 review into this topic that is then discussed in the same paragraph. This review also looked at imaging appropriateness and conclusions can be made from the data presented in the review (indeed, you referenced one of these conclusions in the introduction). 
Although the 2018 review did have heterogeneity of included studies as mentioned, the data analysis in the review accounted for this by performing different meta-analyses with respect to the guidelines used to assess appropriateness and the outcome measure used, and by performing sensitivity analysis to account for clinical setting, type of imaging and year of publication. In this paragraph it would be better to see a comparison of the results of your study to that of the previous review with a discussion of possible reasons for similarities/differences, which would include the more specific inclusion criteria of the current review.
RESPONSE: Thank you for your comment. While we thought that this statement was broad enough, we understand your point of view and have removed the sentence.

2. Page 15 paragraph 2. You state that there have been several campaigns to reduce X-ray and CT use but only provide one reference specific to Canada. I am also not certain whether such campaigns are specific only to X-ray and CT in general, or whether they often include all imaging which would include MRI.
RESPONSE: We only cited one Choosing Wisely Campaign here, but are aware that there are others in different jurisdictions. We were hoping to minimise our word count by only referring to one Choosing Wisely Campaign. However, we have edited the statement to be more accurate with Choosing Wisely recommendations.

3. Page 15 paragraph 3 to Page 16 paragraph 2 - this information may be better moved to under limitations.
RESPONSE: Thank you for your suggestion. We think the section on heterogeneity of appropriateness definitions is an important point for the discussion for systematic reviews beyond its limitations. We have left this paragraph as is but we have also referred to it in the limitations as you have thoughtfully suggested.

4. Page 16 Strengths: Most of the strengths you have listed are fairly standard practice for SLRs.
Are there any particular strengths that you feel are more unique to your review – i.e., in the way you analysed the data, the question you asked etc.?
RESPONSE: Thank you for your comment. We have added some adjustments to our strengths section.

5. Page 16-17 Limitations: If the meta-analyses are left as is (after considering the points made above) then I feel there should be more discussion of the potential limitations of these.
RESPONSE: Thank you for this suggestion. We hope our previous responses have clarified our rationale for pooling the data and that our discussion of the clinical heterogeneity is satisfactory for the reviewer. We have also edited the limitations section to reflect how these decisions may impact the certainty of the estimates presented.

Conclusion

1. Statement again ‘Before this review, it was difficult to say anything regarding how appropriate imaging for LBP is according to the guidelines’. I would again disagree with this statement as discussed above. I would remove this and re-phrase the conclusion accordingly.
RESPONSE: We have taken your suggestion and removed the sentence. Thank you for your comment.

Reviewer #2: I thank the authors for the opportunity to review this manuscript. The authors aimed to investigate the proportion of XR and CT imaging requests for low back pain that were appropriate. This is an extremely important question. Reducing the inappropriate use of imaging is a priority for numerous healthcare organisations and initiatives that aim to reduce low-value care (e.g. Choosing Wisely). However, before resources are spent on strategies to reduce imaging, it is important to understand the size of this problem. Although the review question is important, I don't think the rationale is strong enough for why this review is sufficiently different from the review by Jenkins et al (2018).
The authors should carefully justify what their analyses add to the Jenkins et al (2018) review, particularly in light of all the sensitivity analyses that are presented in Table S5 (https://www.sciencedirect.com/science/article/pii/S1529943018302031?via%3Dihub#ec0015). There are also numerous issues with grammar that need to be addressed. For example, the following phrases in the Abstract need to be revised:
- 'pooled proportion of appropriateness of CT and XR imaging for low back pain' should be 'pooled proportion of CT and XR imaging for low back pain that were considered appropriate'
- 'Four studies reported XR appropriateness, one study reported CT appropriateness' should be 'Four studies reported on the appropriateness of XR imaging, one on the appropriateness of CT, ...'
- the abstract conclusion is similar
- The authors should carefully scan the manuscript for similar examples and correct them
RESPONSE TO ABOVE FOUR POINTS: Thank you and apologies for these errors. We have made grammar edits as we have noticed them.

Abstract
- I am unclear what the RECORD checklist is from just reading the abstract. Is it possible to provide a brief explanation in the abstract methods?
- I think the abstract conclusion could better reflect the results. For example, 'There is low to very-low quality evidence that only half of XRs and CTs ordered for LBP are appropriate'
- I would also add the need for future research to properly examine 'appropriateness' given the low quality of the evidence
RESPONSE: Thank you. Edits have been made.

Introduction
- remove the abbreviation for diagnostic imaging (DI) as it is not a commonly used phrase
- Page 3, 1st paragraph: the authors need to acknowledge that CT exposes patients to substantially more potentially harmful radiation than XRs
- Page 3, 2nd paragraph: can the authors also provide data for younger age groups?
- Page 3, 2nd paragraph: the authors could also mention that incidental findings can lead to surgery
- Page 4, 2nd paragraph: the authors should elaborate on why CT and XR pose more direct harms to patients when compared with MRI. I’m not really sure why MRI imaging was excluded from this review.
RESPONSE TO ABOVE 5 POINTS: Thank you. We have removed the DI abbreviation and clarified that CT emits more radiation than XRs. We have briefly mentioned paediatric populations and the risk of procedures from imaging. We have chosen not to expand on paediatric populations because that is beyond the scope of this paper and not the population of interest. For the sake of length we have also not expanded on surgery risks. Finally, we have clarified why CT and x-ray are the primary focus of this manuscript. Thank you for all of these suggestions and we hope we have addressed them adequately.

Method
- Page 8: For the GRADE criteria 'indirectness of evidence', could the authors provide an example of participants being outside the target group?
RESPONSE: We have added examples to the indirectness of evidence bullet point “(e.g., differences in populations, outcome measures, and interventions)”. Thank you for this suggestion.

Results
- Page 13, 1st paragraph: I think it is a big assumption that all people presenting to ED with LBP are doing so because of trauma. Do the authors have a reference to support this?
RESPONSE: Thank you for your insight and we agree that this is an assumption. We have removed the sentence, as it was not necessary for the point being made.
- Appendix 3 is difficult to interpret because there is no reference to what each item means. I suggest including a description of the items directly under the table.
RESPONSE: This appendix is based on a large and thorough checklist created for reporting reliable information for observational studies.
Adding in the description of each item in the checklist would significantly add to the size of this table, thus we have referenced the original table to ensure that this is easy to find.

Table 1
- in the column labelled 'definition of appropriateness', 'no red flags' and 'red flag indicators' appear to be contradictory. Wouldn't the presence of red flags be an indication for appropriateness?
- for 'Culleton 2013', there seems to be an 'NR' value included by mistake
- please add the setting for each study in this table (i.e. primary care or emergency)
RESPONSE: Thank you for catching these mistakes in our Table 1. We have corrected the definition of appropriateness for the Muntion-Alfaro study, and removed the unnecessary NR in the Culleton row. The setting for each study is found in Table 1 in the second column.

Table 2
- the first row in table 2 mentions 'primary care physicians' but my understanding is that studies from ED were also included in this review. My understanding is that ED physicians are not primary care physicians. Could the authors please clarify this and ensure the terminology used throughout the manuscript is consistent in regard to this issue
RESPONSE: Thank you for this insight. While we understand that there is some debate regarding whether primary care is inclusive of the emergency department, in our context it is often used to refer to both general and emergency physicians, as they are the first point of contact into the healthcare system. We have edited our introduction to address this issue briefly and continue to discuss both settings as primary care. We hope this explanation is satisfactory.

Discussion
- Page 14, 1st paragraph: the authors could also make reference to the Choosing Wisely campaign in Australia
- Page 16, 3rd paragraph: please remove the use of a random effects meta-analysis as a strength of this review
RESPONSE: Thank you.
We have removed the random effects sentence, and we did not make reference to other jurisdictions for the sake of word count.

Reviewer #3: Thank you for asking me to review this manuscript. This study is a systematic review and meta-analysis of appropriateness of radiation emitting imaging for low back pain. The manuscript is well written. Please see below some minor comments/suggestions for improvement:

1. Introduction reads well.
RESPONSE: Thank you for your comment.

2. It is unclear whether the protocol was registered/study followed a registered protocol.
RESPONSE: We appreciate the suggestion. However, we did not register a protocol due to time constraints.

3. Authors have mentioned that they searched Pubmed, CINAHL, and Embase in the abstract but mentioned four databases in the manuscript. Suggest adding the fourth database, The Cochrane Database of Systematic Reviews, in the abstract as well.
RESPONSE: Thank you for your suggestion. We have added it to the abstract.

4. Page 7 – it’s not really an ‘effect size,’ it’s a proportion or pooled proportion. Suggest changing these terms throughout e.g. in Table 2.
RESPONSE: Thank you for your suggestion. We have made this change.

5. No data from low-middle income countries – discussion point
RESPONSE: Thank you for this observation. Due to our already extended discussion we have chosen not to discuss this for the sake of discussion length.

6. Authors have lumped proportions with different denominators (% of images vs % patients presenting for care). Does it make sense to do this? Perhaps pooling the proportions with the same denominator would be better. Probably ok to lump scan types in together
RESPONSE: Please see the response to Comment 3 from Reviewer 1, found on page 2. We hope that response adequately addresses the concerns from this reviewer and have revised Table 1 to be more clear. Thank you for your comments.

7. Are the numbers in Table 2 the number of patients presenting for care, or the number of patients who were referred for imaging? Please make this clear in the manuscript.
RESPONSE: We have clarified this point. Thank you.

8. Details on the number of studies assessing appropriateness of x-rays and CTs do not match in the abstract and manuscript. I’d suggest using consistent language to avoid confusion. For example, the abstract says “Four studies reported x-rays appropriateness, one study reported CT appropriateness, and one study reported on both imaging modalities.” The manuscript says “Five of 6 studies assessed appropriateness of x-rays; two of the six studies assessed appropriateness of CTs”
RESPONSE: Thank you. We agree that the wording is confusing, and have made changes to be clearer.

9. In study methods the authors do not mention whether the study followed PRISMA guidance. It was only mentioned in the Strengths section. Please consider adding it in the Methods section as well.
RESPONSE: This has been mentioned in the Methods section. Thank you for your suggestion.

10. In ‘Estimates of Appropriateness’ section, when describing x-rays, I suggest adding number of participants (similar to what you have done in ‘CTs’) to make it consistent. E.g., we found five studies with 5010 participants that reported the appropriateness of x-rays.
RESPONSE: We have made this change.

11. Title of the study is “….appropriateness of imaging for back pain in primary care” but includes studies in emergency department and hospital settings.
RESPONSE: This comment has been made by a previous reviewer and we have addressed it above.

12. From Table 1: “A mixed setting refers to studies that used a data source of imaging referrals in which the referring physician could be practicing in a family practice, in-hospital or emergency department setting”. Some clarity is needed on how the authors have defined primary care. In some healthcare systems hospital-based care is not considered primary care.
RESPONSE: This is a similar comment to another reviewer and we have addressed it above (pg 7 of this response letter). Thank you for your comment!

13. Suggest using word ‘imaging’ instead of ‘images’ in inclusion criteria, second point, page 5.
RESPONSE: We have made this change. Thank you.

14. In ‘Data Access, Cleaning, Linkage, and Supplementary Information’ section, page 12, please add ‘for’ in the sentence ‘No study mentioned the level of database access researchers’.
RESPONSE: There is in fact a word missing from that sentence and we have corrected it. Thank you for catching our mistake.

15. Consider rewording the sentence- Of the six studies, nine different guidelines were used (in study characteristics, page 9).
RESPONSE: We have made this change. Thank you.

Submitted filename: Response to PLOS ONE reviewers (1).docx

5 Nov 2019

What do we really know about the appropriateness of radiation emitting imaging for low back pain in primary and emergency care? A systematic review and meta-analysis of medical record reviews

PONE-D-19-16551R1

Dear Gabrielle,

We are pleased to inform you that your manuscript has been judged scientifically suitable for publication and will be formally accepted for publication once it complies with all outstanding technical requirements.

Within one week, you will receive an e-mail containing information on the amendments required prior to publication. When all required modifications have been addressed, you will receive a formal acceptance letter and your manuscript will proceed to our production department and be scheduled for publication.

Shortly after the formal acceptance letter is sent, an invoice for payment will follow. To ensure an efficient production and billing process, please log into Editorial Manager at https://www.editorialmanager.com/pone/, click the "Update My Information" link at the top of the page, and update your user information.
If you have any billing related questions, please contact our Author Billing department directly at authorbilling@plos.org.

If your institution or institutions have a press office, please notify them about your upcoming paper to enable them to help maximize its impact. If they will be preparing press materials for this manuscript, you must inform our press team as soon as possible and no later than 48 hours after receiving the formal acceptance. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information, please contact onepress@plos.org.

With kind regards,
Gustavo Machado, PhD
Academic Editor
PLOS ONE

12 Nov 2019

PONE-D-19-16551R1

What do we really know about the appropriateness of radiation emitting imaging for low back pain in primary and emergency care? A systematic review and meta-analysis of medical record reviews

Dear Dr. Logan:

I am pleased to inform you that your manuscript has been deemed suitable for publication in PLOS ONE. Congratulations! Your manuscript is now with our production department.

If your institution or institutions have a press office, please notify them about your upcoming paper at this point, to enable them to help maximize its impact. If they will be preparing press materials for this manuscript, please inform our press team within the next 48 hours. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information please contact onepress@plos.org.

For any other questions or concerns, please email plosone@plos.org.

Thank you for submitting your work to PLOS ONE.

With kind regards,
PLOS ONE Editorial Office Staff
on behalf of Dr. Gustavo de Carvalho Machado
Academic Editor
PLOS ONE

46 in total

1. Appropriate use of lumbar imaging for evaluation of low back pain. (Review)

Authors: Roger Chou; Richard A Deyo; Jeffrey G Jarvik
Journal: Radiol Clin North Am Date: 2012-07 Impact factor: 2.303

2. Imaging for low back pain: is clinical use consistent with guidelines? A systematic review and meta-analysis.

Authors: Hazel J Jenkins; Aron S Downie; Chris G Maher; Niamh A Moloney; John S Magnussen; Mark J Hancock
Journal: Spine J Date: 2018-05-03 Impact factor: 4.166

3. Preferred reporting items for systematic reviews and meta-analyses: the PRISMA statement.

Authors: David Moher; Alessandro Liberati; Jennifer Tetzlaff; Douglas G Altman
Journal: Ann Intern Med Date: 2009-07-20 Impact factor: 25.391

4. Eleventh annual Warren K. Sinclair keynote address-science, radiation protection and NCRP: building on the past, looking to the future.

Authors: Jerrold T Bushberg
Journal: Health Phys Date: 2015-02 Impact factor: 1.316

5. Early diagnostic evaluation of low back pain.

Authors: R A Deyo
Journal: J Gen Intern Med Date: 1986 Sep-Oct Impact factor: 5.128

6. Lumbar spine films in primary care: current use and effects of selective ordering criteria.

Authors: R A Deyo; A K Diehl
Journal: J Gen Intern Med Date: 1986 Jan-Feb Impact factor: 5.128

7. Over-imaging in uncomplicated low back pain: a 12-month audit of a general medical unit.

Authors: M H Rego; S Nagiah
Journal: Intern Med J Date: 2016-12 Impact factor: 2.048

8. Adherence of Irish general practitioners to European guidelines for acute low back pain: a prospective pilot study.

Authors: Brona M Fullen; Thomas Maher; Gerard Bury; Aodan Tynan; Leslie E Daly; Deirdre A Hurley
Journal: Eur J Pain Date: 2006-11-27 Impact factor: 3.931

9. Clinical guidelines for the management of low back pain in primary care: an international comparison.

Authors: B W Koes; M W van Tulder; R Ostelo; A Kim Burton; G Waddell
Journal: Spine (Phila Pa 1976) Date: 2001-11-15 Impact factor: 3.468

10. Utilization of medical services for the treatment of acute low back pain: conformance with clinical guidelines.

Authors: W S Schroth; J M Schectman; E G Elinsky; J C Panagides
Journal: J Gen Intern Med Date: 1992 Sep-Oct Impact factor: 5.128

3 in total

1. Physical Therapists Are Routinely Performing the Requisite Skills to Directly Refer for Musculoskeletal Imaging: An Observational Study.

Authors: Lance M Mabry; Richard Severin; Angela S Gisselman; Michael D Ross; Todd E Davenport; Brian A Young; Aaron P Keil; Don L Goss
Journal: J Man Manip Ther Date: 2022-08-13

2. Barriers to following imaging guidelines for the treatment and management of patients with low-back pain in primary care: a qualitative assessment guided by the Theoretical Domains Framework.

Authors: Andrea Pike; Andrea Patey; Rebecca Lawrence; Kris Aubrey-Bassler; Jeremy Grimshaw; Sameh Mortazhejri; Shawn Dowling; Yamile Jasaui; Amanda Hall
Journal: BMC Prim Care Date: 2022-06-03

3. Characterizing and quantifying low-value diagnostic imaging internationally: a scoping review. (Review)

Authors: Elin Kjelle; Eivind Richter Andersen; Arne Magnus Krokeide; Lesley J J Soril; Leti van Bodegom-Vos; Fiona M Clement; Bjørn Morten Hofmann
Journal: BMC Med Imaging Date: 2022-04-21 Impact factor: 2.795

3 in total