Methodological quality of a systematic review on physical therapy for temporomandibular disorders: influence of hand search and quality scales.

Bart Craane, Pieter Ubele Dijkstra, Karel Stappaerts, Antoon De Laat.

Abstract

The validity of a systematic review depends on the completeness with which randomised clinical trials (RCTs) are identified and on the quality of the included RCTs. The aim of this study was to analyse the effects of hand search on the number of identified RCTs and of four quality lists on the outcome of quality assessment of RCTs evaluating the effect of physical therapy on temporomandibular disorders. In addition, we investigated the association between publication year and the methodological quality of these RCTs. Cochrane, Medline and Embase databases were searched electronically. The references of the included studies were checked for additional trials. Studies not electronically identified were labelled as "obtained by means of hand search". The included RCTs (69) concerning physical therapy for temporomandibular disorders were assessed using four different quality lists: the Delphi list, the Jadad list, the Megens & Harris list and the Risk of Bias list. The association between the quality scores and the year of publication was calculated. After electronic database search, hand search resulted in an additional 17 RCTs (25%). The mean quality score of the RCTs, expressed as a percentage of the maximum score, was low to moderate and varied from 35.1% for the Delphi list to 54.3% for the Risk of Bias list. The agreement among the four quality assessment lists, calculated by the intraclass correlation coefficient, was 0.603 (95% CI, 0.389; 0.749). The Delphi list scored significantly lower than the other lists. The Risk of Bias list scored significantly higher than the Jadad list. A moderate association was found between year of publication and scores on the Delphi list (r = 0.50), the Risk of Bias list (r = 0.33) and the Megens & Harris list (r = 0.48).

Year:  2010        PMID: 21128088      PMCID: PMC3259329          DOI: 10.1007/s00784-010-0490-y

Source DB:  PubMed          Journal:  Clin Oral Investig        ISSN: 1432-6981            Impact factor:   3.573


Introduction

Temporomandibular disorders (TMD) is a collective term embracing a number of clinical problems that involve the masticatory musculature, the temporomandibular joint and associated structures, or both [1]. Physical therapy (PT) is defined as “treatment modalities (including exercise, heat and cold application, electrotherapy, massage, stretching, mobilisation, instructions) in order to prevent, correct and alleviate movement dysfunction and pain of anatomic or physiologic origin” and is frequently used as part of the conservative and non-invasive management of TMD. Although papers on physical treatment for TMD have been published since 1952 [2], the first evidence for its effectiveness based on randomised clinical trials (RCTs) was described in the studies of Kopp and Stenn et al. [3, 4]. In a recent systematic review, 69 RCTs regarding PT for TMD were identified up to February 2010.

Retrieving evidence from large electronic databases such as Medline, Embase and the Cochrane Central Register of Controlled Trials is challenging. The use of adequate search strategies can increase the number of relevant studies while minimising the number of non-relevant studies. In addition to the electronic search strategies, hand searching of all the references of the electronically identified RCTs, as well as the references of the newly discovered RCTs (manual cross-reference search), may further increase the number of relevant RCTs. The first aim of the present study was to assess the influence of hand searching on the number of RCTs found in a systematic review.

Quality assessment of the identified RCTs is important. Various methods, such as quality scales, criteria lists and checklists, can be used [5]. Quality of RCTs defined as ‘the likelihood of the trial design to generate unbiased results’ covers only the dimension of internal validity [6]. Most quality lists, however, measure at least three dimensions: internal validity, external validity and statistical validity [7, 8]. Even an ethical component can be distinguished in the concept of quality. The ethical principles of beneficence (doing the best for one’s patients and clients), non-maleficence (doing no harm), patients’ autonomy, justice and equity are positively associated with the quality of a trial [9]. It is not yet clear how the choice of quality list affects the outcome of the quality assessment of a particular study. The second aim of the present study therefore was to analyse the effect of four quality lists (Delphi, Jadad, Megens & Harris and Risk of Bias) on the quality assessment of RCTs. The four lists were applied to the set of 69 RCTs regarding PT for TMD.

PT is a relatively young profession that is evolving over time. In recent decades, the number of published RCTs regarding the effect of PT interventions on musculoskeletal problems in general, and on TMD in particular, has increased. Assessing the methodological quality of the RCTs in our recent systematic review prompted the question: ‘Has the methodological quality of RCTs increased over time?’ Consequently, the third aim of this study was to analyse the association between publication year and methodological quality as assessed by the different criteria lists.

In summary, based on a recently completed systematic review on the effectiveness of PT on TMD, the aims of the present study were: (1) to analyse the importance of hand search in identifying relevant studies; (2) to analyse the influence of different quality lists on the results of the quality assessment of RCTs; (3) to analyse the association between publication year and the quality of the RCTs (assessed by four different criteria lists).

Material and methods

Importance of hand search

Three databases, Cochrane, Medline and Embase, were searched electronically via OVID (last search date: February 2010) for relevant RCTs concerning the effects of PT on TMD. The search strategies were based on the strategy developed for Medline but revised appropriately for each database to take into account differences in controlled vocabulary (MeSH) and syntax rules (Appendix). All identified studies were screened for their relevance. A study was included in the review process if the title, abstract or full text indicated an RCT regarding PT and TMD. In addition to these databases, the Web of Science was also searched. All studies identified in the database search and published in 2000 or later were entered into the Web of Science to search for publications citing them (Cited Reference Search). The publications found in Web of Science were then screened for relevance on their title, abstract or full text. In the next step, the references of all the included RCTs were checked manually for relevant RCTs (reference check), and finally the references of (systematic) reviews concerning PT and TMD that were identified through the electronic search were checked manually for relevant RCTs. All RCTs not identified by means of electronic databases were labelled as “obtained by means of hand search”.
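The labelling rule described above is in effect a set difference: any included RCT that none of the electronic databases returned counts as a hand-search result. The sketch below illustrates this under stated assumptions; the identifiers (`rct_a` and so on) and the function name `label_sources` are hypothetical placeholders, not records or tooling from the review.

```python
# Hypothetical study identifiers; the review itself worked with records
# retrieved via OVID, not with these placeholder names.
electronic_hits = {
    "cochrane": {"rct_a", "rct_b", "rct_c"},
    "medline": {"rct_b", "rct_c", "rct_d"},
    "embase": {"rct_c", "rct_e"},
}
included = {"rct_a", "rct_b", "rct_c", "rct_d", "rct_e", "rct_f", "rct_g"}


def label_sources(included, electronic_hits):
    """Split included RCTs into electronically identified and hand-searched sets."""
    # Union of all database results, restricted to studies that were included.
    found_electronically = set().union(*electronic_hits.values()) & included
    # Everything else was "obtained by means of hand search".
    hand_search = included - found_electronically
    return found_electronically, hand_search


electronic, hand = label_sources(included, electronic_hits)
print("electronic:", sorted(electronic))
print("hand search:", sorted(hand))
```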

Influence of criteria list used

All included RCTs (n = 69) were assessed on their methodological quality by one observer (BC) using four different quality lists. The Delphi list was developed by consensus among experts. It consists of ten items (scoring range, 0 to 10). The Delphi list assesses three dimensions of quality: internal validity, external validity and statistical considerations [10]. The Risk of Bias list was developed by a workgroup of methodologists, editors and review authors and is recommended by The Cochrane Collaboration [11]. It consists of six items (scoring range, 0 to 6). The Megens & Harris list [12] was developed by the McMaster Occupational Therapy Evidence-Based Practice Research Group [13, 14]. Its maximum score is 10, or 11 if adherence to a home programme was investigated (scoring range, 0 to 10 or 0 to 11). The Jadad list [6] is a criteria list initially compiled by a multidisciplinary panel of six “judges” and narrowed down by means of the Nominal Group Consensus Technique [7]. It consists of three items which assess internal validity (scoring range, 0 to 5). An overview of the lists is given in Table 1.
Table 1

Overview of four quality lists: Delphi, Risk of Bias (RoB), Megens and Harris (M&H) and Jadad

| Criterion | Delphi | RoB | M&H | Jadad |
|---|---|---|---|---|
| Randomisation | Was the method of randomisation performed? | Was the allocation sequence adequately generated? | Was the study randomised (this includes the use of words such as randomly, random and randomisation)? | Was the study described as randomised (this includes the use of words such as randomly, random and randomisation)? |
| | | | +1 Described | +1 Described and appropriate |
| | | | | −1 Described and inappropriate |
| | If subjects were randomly allocated to treatment groups, was the method of random allocation concealed? | Was the allocation adequately concealed? | | |
| Similarity of groups | Were the groups similar at baseline regarding the most important prognostic characteristics? | | Were the groups similar at baseline? | |
| Inclusion/exclusion criteria | Were both inclusion and exclusion criteria specified? | | Were the inclusion and exclusion criteria listed for the subjects? | |
| Blinding | Was the outcome assessor blinded? | Was knowledge of the allocated interventions adequately prevented during the study? | Was the patient, the treatment provider and the assessor blinded? | Was the study described as double blind? (blinding of patients and evaluators, not necessarily therapist) |
| | Was the patient blinded? | | | +1 Method of blinding described and appropriate |
| | Was the care provider blinded? | | | −1 Method of blinding not appropriate |
| Statistics | Were point estimates and measures of variability presented for primary outcome measure(s)? | | | |
| | Did the analysis include an ‘intention-to-treat’ analysis? | | | |
| Drop-outs and completeness of data | Were the drop-outs described and acceptable? | Were incomplete outcome data adequately addressed? | Were the drop-outs reported? | Was there a description of withdrawals and drop-outs? (explicit statement that all included patients were analysed or if the number and reasons for dropouts in all groups are given separately) |
| Other criteria for trial quality | | | Was the treatment protocol sufficiently described to be replicable? | |
| | | | Was the validity of data obtained with the outcome measures addressed? | |
| | | | Was the reliability of data obtained with the outcome measures investigated? | |
| | | | Was the follow-up minimum 6 months? | |
| | | | Was home programme adherence investigated (if a home programme was included)? | |
| | | Are reports of the study free of suggestion of selective outcome reporting? | | |
| | | Was the study apparently free of other problems that could put it at a risk of bias? | | |

A score of 1 was given for each item fulfilled by the RCT. A score of 0 was given if the item was not fulfilled or was unclearly reported. The scores were summed and, for comparison between lists, the percentage of the total possible score was calculated (the quality score, QS). This percentage was used for the statistical analysis. The agreement among the four quality lists for the complete set of 69 RCTs was calculated by the intraclass correlation coefficient (ICC) as described by Portney and Watkins [15]. Since the four scales can be regarded as a random sample of all possible quality lists, the ICC expresses inter-scale agreement in a single rating. Differences between the quality lists were analysed with repeated measures ANOVA and a post hoc analysis (Bonferroni corrected).
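The scoring and agreement steps above can be sketched in a few lines. This is a minimal illustration, not the authors' SPSS procedure: the function names are hypothetical, and the ICC form is an assumption. The text only says that the lists are treated as a random sample and that agreement refers to a single rating, which is consistent with the two-way random-effects, single-rating coefficient often written ICC(2,1); the exact formula used in the study is not stated.

```python
def quality_score(items_fulfilled, max_items):
    """Quality score (QS): fulfilled items as a percentage of the maximum score."""
    return 100.0 * items_fulfilled / max_items


def icc_2_1(ratings):
    """ICC(2,1): two-way random effects, absolute agreement, single rating.

    `ratings` is a list of rows, one per trial, each holding one score per
    quality list. Assumed model; the source does not name the exact ICC form.
    """
    n = len(ratings)        # number of trials (rows)
    k = len(ratings[0])     # number of quality lists (columns)
    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]

    # Two-way ANOVA sums of squares without replication.
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_error = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))
    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )


# Carmeli 2001 fulfilled 3 of 10 Delphi items and 3 of 6 Risk of Bias items.
print(quality_score(3, 10), quality_score(3, 6))

# Five rows of Table 2 (Delphi, RoB, M&H, Jadad percentages), for illustration
# only; the reported ICC of 0.603 is based on all 69 trials.
subset = [
    [40, 67, 40, 40],    # Al-Badawi 2004
    [10, 33, 20, 20],    # Alvaraz 2002
    [60, 67, 64, 80],    # Carlson 2001
    [30, 50, 50, 60],    # Carmeli 2001
    [80, 100, 70, 80],   # Peroz 2004
]
print(round(icc_2_1(subset), 3))
```

Expressing every list as a percentage puts the four scales on a common 0 to 100 range, which is what makes a single agreement coefficient across lists meaningful.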

Quality of RCTs related to the year of publication

The quality of the RCTs, assessed as the percentage of positive items scored on the different quality lists, was correlated (Pearson’s r) with the year of publication (from 1978 to 2009). For all statistical calculations, we used SPSS® Software Version 16.
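As a rough illustration of the correlation step, the sketch below computes Pearson's r between publication year and quality score in plain Python. The function name `pearson_r` is hypothetical, and the six (year, Delphi %) pairs are a small subset of Table 2 chosen only to show the call; they will not reproduce the coefficients reported for all 69 trials.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson product-moment correlation between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


# (publication year, Delphi quality score in %) for six trials from Table 2.
subset = [
    (1978, 30),  # Dohrman 1978
    (1979, 30),  # Stenn 1979
    (1986, 0),   # Talaat 1986
    (1994, 50),  # Reid 1994
    (2001, 60),  # Carlson 2001
    (2008, 70),  # Turner 2008
]
years = [year for year, _ in subset]
scores = [score for _, score in subset]
print(round(pearson_r(years, scores), 3))
```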

Results

After removing duplicate studies (281), the electronic and hand search of the literature resulted in 407 articles. After applying the inclusion and exclusion criteria, 69 RCTs concerning PT and TMD remained for systematic review. Reasons for exclusion were: no data on treatment effect (251), review (29), not a randomised controlled trial (37), data of a subsequently published trial (7), physical therapy after neoplastic conditions or systemic diseases (2), no TMD pathology (4), no PT as previously defined (5), irrelevant outcome variables (2), and therapy on painless TMD symptoms (1). The source of identification of the included studies is presented in Fig. 1. The electronic search identified 52 (75%) of the studies included in the review. Hand search resulted in an additional 17 (25%) RCTs. The Cochrane Central Register of Controlled Trials provided 35 (51%), the Embase database 36 (52%) and the Medline database 39 (57%) of the included studies. Twenty (29%) studies were identified in all three databases.
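The study flow and the percentages in this paragraph can be checked with simple arithmetic. The sketch below uses only counts stated above; the variable names are illustrative.

```python
screened = 407  # articles remaining after removal of 281 duplicates

exclusions = {
    "no data on treatment effect": 251,
    "review": 29,
    "not a randomised controlled trial": 37,
    "data of a subsequently published trial": 7,
    "PT after neoplastic conditions or systemic diseases": 2,
    "no TMD pathology": 4,
    "no PT as previously defined": 5,
    "irrelevant outcome variables": 2,
    "therapy on painless TMD symptoms": 1,
}
excluded = sum(exclusions.values())
included = screened - excluded

# Number of included RCTs per source of identification.
sources = {
    "electronic search": 52,
    "hand search": 17,
    "Cochrane": 35,
    "Embase": 36,
    "Medline": 39,
    "all three databases": 20,
}
shares = {name: round(100 * count / included) for name, count in sources.items()}

print(excluded, included)
for name, pct in shares.items():
    print(f"{name}: {pct}%")
```

The electronic and hand-search counts add up to the 69 included trials, while the three database counts overlap and therefore sum to more than 52.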
Fig. 1

Number of RCTs according to the source of identification (Cochrane = the Cochrane Central Register of Controlled Trials)

Influence of criteria lists

Scrutinising the criteria composing the different quality lists resulted in the following observations. All four lists include items about ‘randomisation’, ‘blinding’ and ‘dropouts’, but the requirement to score positively on the randomisation item differs between lists. The Delphi list differentiates between the ‘levels of blinding’ (patient, therapist or observer), whereas the Jadad list includes ‘a description of the blinding method’. The Delphi list and the Risk of Bias list assess ‘treatment allocation’ and ‘statistical analysis’. ‘The presentation of the data’ is assessed only in the Delphi list. The Megens & Harris list is the only one that scores ‘the length of follow-up’, ‘home programme’, ‘reliability’ and ‘validity of the outcome measurement’ and ‘description of treatment protocol’. Only the Delphi and the Megens & Harris lists assess ‘the similarity of the groups at baseline’. The Risk of Bias list contains ‘selective outcome reporting’ and ‘other potential threats to validity’.

In Table 2, the included studies are presented with their quality scores according to the different quality assessment methods. The Delphi scores varied between 0 and 8 points out of 10. The Risk of Bias scores varied between 0 and 6 out of 6. The Megens & Harris scores varied between 2 and 9 out of 10 and between 2 and 11 out of 11 (if ‘home programme adherence’ was investigated). The Jadad scores varied between 0 and 4 out of 5. Two studies scored maximum scores for the Risk of Bias list and one study scored maximum in the Megens & Harris list. None of the studies were assigned maximum scores on any other criteria lists. The mean (SE) quality score of the 69 RCTs, expressed as a percentage of the maximum possible score, was 35.1 (2.2) for the Delphi list, 48.7 (2.4) for the Jadad list, 49.5 (2.2) for the Megens & Harris list and 54.3 (2.4) for the Risk of Bias list.

The agreement between the four quality assessment lists (ICC) was 0.603 (95% CI, 0.389; 0.749). In repeated measures ANOVA, a significant difference was found between the scores of the different scales (F(3, 204) = 44.28, p < 0.001). Post hoc analysis (Bonferroni corrected) showed that the Delphi list scored significantly lower than the other three lists and that the Risk of Bias list scored significantly higher than the Jadad list (Table 3).
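The degrees of freedom of the reported F statistic follow directly from the design: four lists applied to the same 69 trials. The sketch below reproduces them and shows what a Bonferroni correction would look like. Two things here are assumptions rather than facts from the study: that all six pairwise comparisons between lists were tested, and that the overall significance level was 0.05.

```python
from itertools import combinations

lists = ["Delphi", "Risk of Bias", "Megens & Harris", "Jadad"]
n_trials = 69

# Repeated measures ANOVA with one within-subject factor (the quality list).
df_lists = len(lists) - 1                    # numerator degrees of freedom
df_error = (n_trials - 1) * (len(lists) - 1) # denominator degrees of freedom

# Bonferroni: divide the overall alpha by the number of comparisons.
pairs = list(combinations(lists, 2))         # assumed: all pairwise comparisons
alpha = 0.05                                 # assumed overall significance level
alpha_per_comparison = alpha / len(pairs)

print(df_lists, df_error)
print(len(pairs), round(alpha_per_comparison, 4))
```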
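Table 2

Results of the quality score for the different criteria lists expressed as a percentage of the maximum possible positive items scored

| Author | Delphi (%) | RoB (%) | M&H (%) | Jadad (%) |
|---|---|---|---|---|
| Al-Badawi 2004 | 40 | 67 | 40 | 40 |
| Alvaraz 2002 | 10 | 33 | 20 | 20 |
| Bakke 2008 | 40 | 33 | 60 | 40 |
| Bender 1991 | 20 | 33 | 30 | 20 |
| Bertolucci 1995 | 20 | 33 | 30 | 20 |
| Brooke 1983 | 10 | 50 | 30 | 40 |
| Burgess 1988 | 30 | 50 | 40 | 40 |
| Carlson 2001 | 60 | 67 | 64 | 80 |
| Carmeli 2001 | 30 | 50 | 50 | 60 |
| Conti 1997 | 40 | 50 | 50 | 60 |
| Crockett 1986 | 20 | 33 | 36 | 20 |
| Dahlstrom 1984 | 0 | 50 | 27 | 40 |
| Dalen 1986 | 10 | 0 | 27 | 20 |
| De Abreu 2005 | 40 | 67 | 50 | 60 |
| DeLaat 2003 | 40 | 67 | 60 | 60 |
| Dogu 2009 | 20 | 50 | 40 | 20 |
| Dohrman 1978 | 30 | 50 | 36 | 60 |
| Dworkin 1994 | 60 | 83 | 73 | 80 |
| Dworkin 2002a | 40 | 50 | 64 | 40 |
| Dworkin 2002b | 40 | 50 | 36 | 40 |
| Erlandson 1989 | 10 | 50 | 30 | 40 |
| Funch 1984 | 30 | 33 | 73 | 60 |
| Gardea 2001 | 60 | 67 | 70 | 60 |
| Gavish 2006 | 40 | 67 | 60 | 60 |
| Glaros 2007 | 50 | 67 | 64 | 60 |
| Glas 2000 | 30 | 67 | 46 | 60 |
| Gray 1994 | 30 | 33 | 40 | 80 |
| Ismaïl 2007 | 50 | 50 | 30 | 40 |
| Kavuncu 1999 | 20 | 50 | 30 | 40 |
| Klobas 2006 | 40 | 100 | 73 | 80 |
| Komiyama 1999 | 30 | 50 | 60 | 40 |
| Kopp 1979 | 20 | 67 | 30 | 60 |
| Kruger 1998 | 20 | 33 | 30 | 20 |
| Kulekcioglu 2003 | 40 | 50 | 50 | 40 |
| Linde 1985 | 30 | 50 | 36 | 40 |
| Magnussen 1999 | 20 | 0 | 46 | 40 |
| Maloney 2002 | 20 | 33 | 40 | 20 |
| Mazzetto 2007 | 40 | 50 | 50 | 80 |
| Michelotti 2004 | 50 | 50 | 64 | 40 |
| Minakuchi 2004 | 70 | 83 | 70 | 80 |
| Monteiro 1988 | 0 | 33 | 18 | 20 |
| Moystad 1990 | 40 | 50 | 50 | 40 |
| Mulet 2007 | 60 | 83 | 82 | 80 |
| Nunez 2006 | 30 | 50 | 40 | 40 |
| Okeson 1983 | 20 | 50 | 36 | 40 |
| Olson 1987 | 20 | 67 | 40 | 60 |
| Peroz 2004 | 80 | 100 | 70 | 80 |
| Reid 1994 | 50 | 50 | 60 | 60 |
| Schiffman 1996 | 60 | 67 | 60 | 60 |
| Schiffman 2007 | 70 | 100 | 100 | 80 |
| Shin 1997 | 30 | 50 | 40 | 40 |
| Stam 1984 | 30 | 67 | 60 | 40 |
| Stegenga 1993 | 20 | 50 | 50 | 40 |
| Stenn 1979 | 30 | 33 | 30 | 40 |
| Talaat 1986 | 0 | 33 | 20 | 20 |
| Taube 1988 | 30 | 67 | 40 | 60 |
| Taylor 1987 | 50 | 67 | 40 | 60 |
| Taylor 1994 | 40 | 50 | 40 | 40 |
| Townsend 2001 | 20 | 50 | 70 | 20 |
| Treacy 1999 | 30 | 67 | 50 | 40 |
| Truelove 2006 | 70 | 83 | 82 | 80 |
| Tullberg 2003 | 70 | 83 | 60 | 80 |
| Turk 1993 | 20 | 33 | 27 | 40 |
| Turner 2008 | 70 | 83 | 90 | 60 |
| Wahlund 2003 | 30 | 67 | 64 | 60 |
| Wright 1995 | 50 | 50 | 80 | 60 |
| Wright 2000 | 50 | 67 | 73 | 60 |
| Yuasa 2001 | 10 | 17 | 30 | 0 |
| Yoshida 2005 | 40 | 67 | 60 | 60 |
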
Table 3

The mean quality scores (with standard error) expressed as a percentage of the maximum possible score

| Scale | Mean score | Std. error | 95% confidence interval |
|---|---|---|---|
| Delphi | 35.1 | 2.2 | 30.6; 39.5 |
| Risk of Bias | 54.3 | 2.4 | 49.5; 59.2 |
| Megens & Harris | 49.5 | 2.2 | 45.1; 53.9 |
| Jadad | 48.7 | 2.4 | 43.9; 53.5 |

Quality of RCTs related to year of publication

The correlation between trial quality and the year of publication was 0.497 (95% CI, 0.295; 0.656) for the Delphi list, 0.329 (95% CI, 0.101; 0.525) for the Risk of Bias list, 0.481 (95% CI, 0.276; 0.644) for the Megens & Harris list, and 0.219 (95% CI, −0.018; 0.433) for the Jadad list.
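Confidence intervals for a Pearson correlation are commonly obtained with the Fisher z-transformation. The source does not say how its intervals were computed, so the method below is an assumption, as are the function name `pearson_ci` and the critical value 1.96; with n = 69 it appears to give limits within about 0.001 of those reported above.

```python
from math import atanh, sqrt, tanh


def pearson_ci(r, n, z_crit=1.96):
    """Approximate 95% CI for Pearson's r via the Fisher z-transformation."""
    z = atanh(r)              # transform r to the z scale
    se = 1.0 / sqrt(n - 3)    # standard error on the z scale
    return tanh(z - z_crit * se), tanh(z + z_crit * se)


reported = [
    ("Delphi", 0.497),
    ("Risk of Bias", 0.329),
    ("Megens & Harris", 0.481),
    ("Jadad", 0.219),
]
for name, r in reported:
    lower, upper = pearson_ci(r, n=69)
    significant = lower > 0 or upper < 0  # interval excludes zero
    print(f"{name}: r = {r:.3f}, 95% CI {lower:.3f}; {upper:.3f}, excludes 0: {significant}")
```

An interval that includes zero, as for the Jadad list, corresponds to a correlation that is not statistically significant at the 5% level.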

Discussion

Hand search identified 17 RCTs (25%) that were not found in the electronic databases. In a recent study, Egger and Smith concluded that the Cochrane Central Register of Controlled Trials is still likely to be the best source of information and should be the first one to be examined by those carrying out systematic reviews [16]. In the present study, 51% of the studies were found in the Cochrane Central Register of Controlled Trials, 52% in Embase and 57% in Medline. This illustrates that consulting other databases as well is important to reduce selection bias in identifying studies to be included. In addition, since the Cochrane, Medline and Embase searches together yielded only 75% of the included reports, the present study indicates that hand search plays a valuable role in identifying randomised controlled trials. Similar results were found in a previous report in which 82% of the studies were identified by means of complex electronic searches [17]. The present results, therefore, concur with Richards [18], who commented that although complex electronic searches using a range of databases may identify the majority of trials, hand searching is still valuable in identifying randomised trials. Crumley et al. also highlighted the importance of searching multiple sources when conducting a systematic review [19]. For example, only 23 of 33 (67%) studies were found while searching Embase in a study by Al-Hajeri et al. [20]. There are several possible reasons why electronic searches fail: lack of relevant indexing terms, inconsistency among indexers, and reports published as abstracts and/or included in supplements that are not routinely indexed by electronic databases [21, 22]. The Cochrane Collaboration has recognised the importance of searching journals page-by-page and reference-by-reference to trace as many relevant articles as possible and has set up a worldwide journal hand searching programme to identify RCTs [23].
The use of a criteria list allows estimating the methodological quality of the design and conduct of the trial. The items of the different criteria lists focus on different methodological aspects of RCTs and enable assessment of methodological quality by a summation of criteria scores. Calculating summary scores inevitably involves assigning a particular ‘weight’ to different items in the scale, and it is difficult to justify the weights assigned. Therefore, the summation scores must be simply interpreted as a ‘number of items scored positively’ on the list. The summation of these quality scores results in a hierarchical list in which more positive items indicate a better methodological quality [24]. However, different sets of criteria applied to the same set of trials do not always provide similar results [25]. The present study compared the overall QS resulting from different quality lists and showed significant differences in mean scores expressed as a percentage. These observed differences probably result in part from the variation of items included in the different lists. Only 3 out of 15 different items used in the four quality scales are represented in all four of them: ‘randomisation’, ‘blinding’ and ‘drop-outs’. Additionally, the ‘wording’ of similar items is different in the different lists. In the Delphi and Risk of Bias lists, assessment of randomisation requires more specific information, while in the Megens & Harris and the Jadad list, the simple use of words such as randomly, random and randomisation is sufficient to score positive for this item. ‘Blinding’ is represented in all four lists, but the Delphi list discriminates between outcome assessor, therapist and patient and consequently ‘blinding’ scores 3 items out of 10. By contrast, in the Risk of Bias method, blinding is represented as only 1 item out of 6, and in the Megens & Harris list as 1 item out of 10 or 11. 
In the Jadad list, an extra point can be earned if the method of blinding is explicitly described, and therefore ‘blinding’ accounts for 2 items out of 5. In most PT interventions, blinding of the therapist and patient is impossible. Consequently, the ‘weight’ of blinding as 3 out of 10 items for the Delphi list and 1 out of 6 for the Risk of Bias list could cause lower quality scores for PT studies using the Delphi list. A typical example in the present review was the study of Carmeli et al. [26], which scored 3 on the Risk of Bias list and also 3 on the Delphi list. Whereas ‘blinding’ represents 1 item out of 6 for the Risk of Bias list (=17%), it counts for 3 items out of 10 for the Delphi list (=30%).

Well-conducted RCTs provide the best evidence on the efficacy of a particular treatment. Since the publication of a study undertaken for Britain’s Medical Research Council by Hill in 1948, which may have been the first to have all the methodological elements of a modern RCT [27], the number of RCTs published each year has increased immensely: according to PubMed, over 9,000 new RCTs were published in 2008. For the practising clinician, it becomes impossible to keep up with the recent evidence. Systematic reviews can be of great help in appraising and synthesising this information. Of course, the validity of the conclusions of a systematic review depends on the quality of the included studies, and one could wonder whether the methodological quality of RCTs has improved over the years. The present study analysed the correlation of the different quality scores with the year of publication and showed improvement of the methodological quality of RCTs as assessed by the Delphi list, the Megens & Harris list and the Risk of Bias list. The correlation between year of publication and the results obtained with the Jadad list was not significant.
A possible reason for this finding is the low number of items included (3 items versus 10 or 11 for the Delphi and Megens & Harris lists). Similar to our findings, Falagas et al. [28] observed a temporal evolution of methodological quality of RCTs in various research fields (including PT), but they concluded that only certain aspects of the methodological quality improved significantly over time. In our study, we did not analyse the temporal trend for the different items separately. The results of the study of Falagas et al. may explain the different correlations for the different lists, since the contents of the assessment differ per list. However, it must be noted that the 95% confidence intervals around the correlations found in the present study overlap for all lists. Our findings are in contrast with those of Koes et al. [29], who did not find an association between the year of publication and the methodological quality of studies of physiotherapeutic interventions. Although the highest methodological scores were attained during the last decade, Fernández-de-las-Peñas compared the methodological quality of RCTs evaluating PT in tension-type headache, migraine and cervicogenic headache published before and after 2000 and found no significant differences [30].
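The relative weight of blinding in each list, discussed earlier in this section, is simply the number of blinding items divided by the maximum score. The sketch below works this out from the item counts given in the text; the dictionary layout is illustrative only.

```python
# (blinding items, maximum score) per list, as stated in the discussion.
blinding_items = {
    "Delphi": (3, 10),
    "Risk of Bias": (1, 6),
    "Megens & Harris (maximum 10)": (1, 10),
    "Megens & Harris (maximum 11)": (1, 11),
    "Jadad": (2, 5),
}

# Share of the total score that depends on blinding, in percent.
weights = {
    name: 100 * blinding / total
    for name, (blinding, total) in blinding_items.items()
}

for name, pct in weights.items():
    print(f"{name}: {pct:.0f}% of the maximum score")
```

Because blinding of therapist and patient is rarely feasible in PT trials, a list that gives blinding a larger share of the total score will tend to produce lower percentages for the same trial.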

Conclusion

Hand searching contributes considerably to the search results for RCTs. Different quality lists lead to significantly different scores. Therefore, a specific criteria list must be carefully chosen when quality scores are taken into account in drawing conclusions on evidence. The quality of RCTs regarding PT for TMD does improve over time if assessed by the Delphi list, the Megens & Harris list and the Risk of Bias list.

Electronic supplementary material: electronic search strategy for the Cochrane Central Register of Controlled Trials (CENTRAL), Medline and Embase (DOCX 13 kb).
  81 in total

1.  Transcutaneous electrical nerve stimulation in the treatment of myofascial pain dysfunction.

Authors:  L R Kruger; W J van der Linden; P E Cleaton-Jones
Journal:  S Afr J Surg       Date:  1998-02       Impact factor: 0.375

2.  Usefulness of posture training for patients with temporomandibular disorders.

Authors:  E F Wright; M A Domenech; J R Fischer
Journal:  J Am Dent Assoc       Date:  2000-02       Impact factor: 3.634

3.  Effect of indomethacin phonophoresis on the relief of temporomandibular joint pain.

Authors:  S M Shin; J K Choi
Journal:  Cranio       Date:  1997-10       Impact factor: 2.020

4.  A comparison of treatment modes in the management of myofascial pain dysfunction syndrome.

Authors:  D J Crockett; M E Foreman; L Alden; B Blasberg
Journal:  Biofeedback Self Regul       Date:  1986-12

5.  Management of mouth opening in patients with temporomandibular disorders through low-level laser therapy and transcutaneous electrical neural stimulation.

Authors:  Silvia Cristina Núñez; Aguinaldo Silva Garcez; Selly Sayuri Suzuki; Martha Simões Ribeiro
Journal:  Photomed Laser Surg       Date:  2006-02       Impact factor: 2.796

6.  Effects of intraoral appliance and biofeedback/stress management alone and in combination in treating pain and depression in patients with temporomandibular disorders.

Authors:  D C Turk; H S Zaki; T E Rudy
Journal:  J Prosthet Dent       Date:  1993-08       Impact factor: 3.426

7.  Biofeedback and relaxation therapy for chronic temporomandibular joint pain: predicting successful outcomes.

Authors:  D P Funch; E N Gale
Journal:  J Consult Clin Psychol       Date:  1984-12

8.  Transcutaneous nerve stimulation in a group of patients with rheumatic disease involving the temporomandibular joint.

Authors:  A Møystad; B S Krogstad; T A Larheim
Journal:  J Prosthet Dent       Date:  1990-11       Impact factor: 3.426

9.  The additional value of a home physical therapy regimen versus patient education only for the treatment of myofascial pain of the jaw muscles: short-term results of a randomized clinical trial.

Authors:  Ambra Michelotti; Michel H Steenks; Mauro Farella; Francesca Parisini; Roberta Cimino; Roberto Martina
Journal:  J Orofac Pain       Date:  2004

10.  Low intensity laser application in temporomandibular disorders: a phase I double-blind study.

Authors:  Marcelo O Mazzetto; Thaise G Carrasco; Eliana F Bidinelo; Renata C de Andrade Pizzo; Rafaela G Mazzetto
Journal:  Cranio       Date:  2007-07       Impact factor: 2.020
