The diagnostic accuracy of headache measurement instruments: A systematic review and meta-analysis focusing on headaches associated with musculoskeletal symptoms.

Hedwig A van der Meer, Corine M Visscher, Tom Vredeveld, Maria WG Nijhuis van der Sanden, Raoul HH Engelbert, Caroline M Speksnijder.

Abstract

AIM: To systematically review the available literature on the diagnostic accuracy of questionnaires and measurement instruments for headaches associated with musculoskeletal symptoms.
DESIGN: Articles were eligible for inclusion when the diagnostic accuracy (sensitivity/specificity) was established for measurement instruments for headaches associated with musculoskeletal symptoms in an adult population. The databases searched were PubMed (1966-2018), Cochrane (1898-2018) and Cinahl (1988-2018). Methodological quality was assessed with the Quality Assessment of Diagnostic Accuracy Studies tool (QUADAS-2) and COnsensus-based Standards for the selection of health Measurement INstruments (COSMIN) checklist for criterion validity. When possible, a meta-analysis was performed. The Grading of Recommendations Assessment, Development and Evaluation (GRADE) recommendations were applied to establish the level of evidence per measurement instrument.
RESULTS: From 3450 articles identified, 31 articles were included in this review. Eleven measurement instruments for migraine were identified, of which the ID-Migraine is recommended with a moderate level of evidence and a pooled sensitivity of 0.87 (95% CI: 0.85-0.89) and specificity of 0.75 (95% CI: 0.72-0.78). Six measurement instruments examined both migraine and tension-type headache and only the Headache Screening Questionnaire - Dutch version has a moderate level of evidence with a sensitivity of 0.69 (95% CI 0.55-0.80) and specificity of 0.90 (95% CI 0.77-0.96) for migraine, and a sensitivity of 0.36 (95% CI 0.21-0.54) and specificity of 0.86 (95% CI 0.74-0.92) for tension-type headache. For cervicogenic headache, only the cervical flexion rotation test was identified and had a very low level of evidence with a pooled sensitivity of 0.83 (95% CI 0.72-0.94) and specificity of 0.82 (95% CI 0.73-0.91).
DISCUSSION: The current review is the first to establish an overview of the diagnostic accuracy of measurement instruments for headaches associated with musculoskeletal factors. However, as most measurement instruments were validated in one study, pooling was not always possible. Risk of bias was a serious problem for most studies, decreasing the level of evidence. More research is needed to enhance the level of evidence for existing measurement instruments for multiple headaches.

Keywords:  Diagnostics; headache; migraine; tension-type headache

Year:  2019        PMID: 30997838      PMCID: PMC6710620          DOI: 10.1177/0333102419840777

Source DB:  PubMed          Journal:  Cephalalgia        ISSN: 0333-1024            Impact factor:   6.292


Introduction

Primary headaches such as tension-type headache (TTH) and migraine are associated with various musculoskeletal factors. TTH is, for example, associated with pericranial tenderness, myofascial trigger points and lower muscle coordination of the upper neck flexors (1–4). Furthermore, migraine may be triggered by myofascial trigger points or bruxism (1,5–7). These primary headaches are not caused by musculoskeletal dysfunction but are associated with different musculoskeletal symptoms (8). Several secondary headaches, in contrast, are caused by musculoskeletal problems, such as cervicogenic headache (CGH), headache after whiplash trauma and secondary headache attributed to temporomandibular dysfunction (TMD) (8). The physiotherapist (PT) is a specialist in the musculoskeletal field and often treats patients with headaches associated with musculoskeletal symptoms. The type of headache must be diagnosed within the physiotherapeutic diagnostic process in order to choose the proper treatment options and to collaborate with medical specialists when needed (9). The International Headache Society (IHS) published the International Classification of Headache Disorders – 3rd edition (ICHD-3), which contains clear diagnostic criteria for all types of headache (8). Several headache measurement instruments have been developed for PTs and other health care professionals to classify different headache types (10–14). The ability of a test to discriminate between people who have the target condition and people who do not is called the diagnostic accuracy of the test (15). Diagnostic accuracy is often quantified through measures of sensitivity and specificity (15). Insight into the diagnostic accuracy of these instruments for headaches associated with musculoskeletal symptoms is needed to determine the type of headache. To our knowledge, there is currently no overview of the diagnostic accuracy of the different headache measurement instruments in relation to the level of evidence.
Therefore, the aim of this study was to systematically review the available literature on the diagnostic accuracy of questionnaires and measurement instruments for headaches associated with musculoskeletal symptoms.

Methods

Protocol and registration

This review was performed according to the PRISMA statement (17) and registered in PROSPERO (registration number: CRD42017062472). Because of the large number of articles retrieved by the original search strategy, two review questions were created. The current review focuses on the diagnostic accuracy of measurement instruments for headaches associated with musculoskeletal symptoms. A second review (in preparation) will focus on the clinimetric properties of the instruments that measure other outcomes, based on the International Classification of Functioning, Disability and Health (ICF) (16); for example, measurement instruments for pain, range of motion, limitations in activity, and quality of life.

Eligibility criteria

Only full-text original articles were included that concerned the diagnostic accuracy, expressed as sensitivity and specificity, of diagnostic headache tests usable by PTs. Further inclusion criteria were: a) adult patients (≥18 years) and b) patients who experienced headaches associated with musculoskeletal symptoms. These include migraine, TTH, CGH, headache after whiplash and headache attributed to TMD (8,19,20). There was no minimum sample size for inclusion. No restrictions were put on the year of publication. Intervention studies, prediction models and measurement instruments not usable by PTs (e.g. imaging, nerve blocks) (21) were excluded. Only articles in English were included.

Information sources

The electronic databases PubMed (1966–2018), Cochrane (1898–2018) and Cinahl (1988–2018) were searched for literature. The last search was performed on 25 October 2018. If full texts could not be obtained, the corresponding author was contacted through email to request the full text.

Search

The search strategies included search terms for the construct (e.g. pain, diagnosis), the target population (e.g. migraine, TTH), the instrument (e.g. questionnaire, test) and the methodological PubMed search filter for measurement instruments (21). The search filters for the Cochrane and Cinahl databases were derived from the PubMed search filter. The full search strategies for each database can be found in Supplemental material 1. References of retrieved articles were screened for additional relevant studies.

Study selection

Two reviewers (HvdM, CMV) independently assessed titles, abstracts and reference lists of the studies, using the online program Covidence (22). In case of disagreement between the two reviewers, a third reviewer (CMS) made the decision regarding inclusion of the article. After initial screening of the titles and abstracts, HvdM and CMV read the full texts of included articles and screened these for eligibility. All reviewers are orofacial physiotherapists and researchers in this field.

Data collection process

Two reviewers (HvdM, CMS) independently extracted data from the included articles and recorded them in a pre-made, empty template of Table 1. The data extracted were: first author, year of publication, target population, information about the index test (aim, language and name), reference test, study population, and diagnostic accuracy (sensitivity/specificity).

Table 1.

Study characteristics of included articles stratified for the target populations of patients with migraine, migraine and tension-type headache and cervicogenic headache.

| Measurement instrument | Author, year | Language of index test | Aim of index test | Reference test | N (%F) | Age: mean ± SD | Sensitivity | Specificity |
|---|---|---|---|---|---|---|---|---|
| Target population: Migraine | | | | | | | | |
| 3-Question Screen | Cady, 2004 (10)† | English | Triage | ICHD (2) | 3014 (85.2) | 40.0 ± – | 0.78* | 0.27* |
| | Pryse-Phillips, 2002 (59) | English | Triage | Neurologist | 476 (81.9) | 40.4 ± – | 0.86 | 0.73 |
| | Wahab, 2016 (41)† | English | Triage | ICHD-3 (15) | 1513 (50.1) | 23.3 ± 2.5 | 0.66 | 0.98 |
| Diagnostic screen | Michel, 1993 (II) (37) | English | Triage | Neurologist | 160 (83.3) | 39.9 ± 0.7 | 0.44 | 0.93 |
| ID-Migraine | Brighina, 2006 (44) | Italian | Triage | ICHD-II (4) | 222 (73.4) | 37.8 ± 11.0 | 0.95 | 0.72 |
| | de Mattos, 2017 (45) | Portuguese | Triage | ICHD-II (4) | 232 (82.0) | 48.9 ± 11.2 | 0.92 | 0.60 |
| | Ertas, 2008 (46) | Turkish | Triage | ICHD-II (4) | 2625 (58.5) | 43.3–47.3 ± 16–18 | 0.80–0.88 | 0.74–0.76 |
| | Gil-Gouveia, 2009 (47)† | Portuguese | Triage | ICHD-II (4) | 142 (82.8) | 39.2 ± 13.9 | 0.94 | 0.60 |
| | Karli, 2007 (49)† | Turkish | Triage | ICHD-II (4) | 3682 (62.9) | 45.2 ± 17.0 | 0.92 | 0.63 |
| | Kim, 2006 (50) | Korean | Triage | ICHD-II (4) | 176 (81.2) | 30.7 ± 9.3 | 0.58 | 0.98 |
| | Lipton, 2003 (12) | English | Triage | ICHD (2) | 451 (75.6) | 39.3 ± 10.1 | 0.81 | 0.75 |
| | Lipton, 2016 (34)† | English | Triage | ICHD-3 (15) | 111 (82.9) | 46.2 ± 13.4 | 0.81 | 0.89 |
| | Siva, 2008 (40)† | Turkish | Triage | ICHD-II (4) | 227 (65.6) | 31.9 ± 5.9 | 0.71 | 0.79 |
| MSMDQ | Rueda-Sanchez, 2004 (38) | Spanish | Unclear | Neurologist | 170 (–) | | 0.38 | 0.99 |
| Migraine Assessment Tool | Marcus, 2004 (35) | English | Triage | Neurologist | 80 (88.8) | 33.7 ± 9.9 | 0.89 | 0.79 |
| Migraine Screen Questionnaire | Láinez, 2005 (11) | English | Triage | ICHD (2) | 140 (73.0) | 39.2 ± 13.0 | 0.93 | 0.81 |
| | Láinez, 2010 (51) | English | Triage | ICHD-II (4) | 9670 (61.9) | 48.9 ± 17.2 | 0.82 | 0.97 |
| Migraine-specific questionnaire | Kallela, 2001 (48) | Finnish | Triage | ICHD (2) | 94 (71.3) | 44.6 ± 18.0 | 0.99 | 0.96 |
| Migraine-4 | Walters, 2015 (42) | English | Triage | ICHD-3 (15) | 1829 (71.5) | 19.1 ± 2.1 | 0.94 | 0.92 |
| Modified Algorithm for IHS Migraine | Michel, 1993 (I) (36) | English | Replacement | Neurologist | 267 (70.3) | | 0.95–0.98 | 0.53–0.78 |
| Screening items | Wang, 2008 (43) | | Triage | ICHD-II (4) | 755 (71.0) | 37 ± 15 | 0.89* | 0.67* |
| Structured Migraine Interview Questionnaire | Shaik, 2015 (39) | Malay | Unclear | ICHD-II (4) | 157 (100) | 26.8 ± 8.3 | 0.97 | 0.63 |
| Target population: Migraine and tension-type headache | | | | | | | | |
| Computerized Headache Assessment Test | Maizels, 2007 (54) | English | Replacement | Headache nurse | 117 (–) | | M: 0.83–1.00; TTH: 1.00 | M: –; TTH: – |
| German language questionnaire | Fritsche, 2007 (13)† | German | Replacement | ICHD-II (4) | 278 (51.1) | 43.9 ± – | M: 0.73; TTH: 0.85 | M: 0.96; TTH: 0.98 |
| | Yoon, 2008 (56)† | German | Replacement | ICHD-II (4) | 193 (68.4) | 45.4 ± 12.4 | M: 0.85; TTH: 0.60 | M: 0.85; TTH: 0.88 |
| Headache Screening Questionnaire – Dutch version | van der Meer, 2017 (14) | Dutch | Triage | ICHD-3 (15) | 105 (78.1) | 40.3 ± 14.5 | M: 0.69; PM: 0.89; TTH: 0.36; PTTH: 0.92 | M: 0.90; PM: 0.54; TTH: 0.86; PTTH: 0.48 |
| Headache questions | Hagen, 2010 (53) | Norwegian | Unclear | ICHD-II (4) | 297 (49.0) | 52.3 ± – | M: 0.49–0.67; TTH: 0.96; CTTH: 0.64 | M: 0.91–0.95; TTH: 0.69; CTTH: 1.00 |
| Self-administered headache questionnaire | Rasmussen, 1991 (55) | Danish | Replacement | Neurologist | 713 (–) | | M: 0.51; TTH: 0.43 | M: 0.92; TTH: 0.96 |
| Structured Headache Questionnaire | el-Sherbiny, 2017 (52) | Arabic | Unclear | ICHD-3 (15) | 232 (72.8) | 41.2 ± 10.9 | M: 0.86; CM: 0.71; TTH: 0.93; CTTH: 0.70 | M: 0.94; CM: 0.98; TTH: 0.93; CTTH: 0.96 |
| Target population: Cervicogenic headache | | | | | | | | |
| Cervical Flexion-Rotation Test (CFRT) | Hall, 2010 (57)† | n/a | Unclear | Sjaastad criteria (32) | 60 (63.3) | 30–35 ± 6.5–10.9 | 0.70 | 0.70 |
| | Ogince, 2007 (58)† | n/a | Unclear | Sjaastad criteria (32) | 58 (65.5) | 37–46 ± – | 0.91 | 0.91 |

The columns from measurement instrument to reference test describe the index test, N (%F) and age describe the population, and sensitivity and specificity describe the diagnostic accuracy.

* Not given in article, therefore calculated based on the published 2 × 2 table.

† Articles included in meta-analysis, as shown in Table 3.

MSMDQ: Michel's Standardized Migraine Diagnosis Questionnaire; –: missing data; F: female; SD: standard deviation; M: migraine; CM: chronic migraine; PM: probable migraine; TTH: tension-type headache; CTTH: chronic tension-type headache; PTTH: probable tension-type headache; n/a: not applicable.

Table 3.

Pooled sensitivity and specificity of the 3-Question screen, ID-Migraine, German language questionnaire and Cervical Flexion-Rotation Test.

| Measurement instrument | Target population | Number of studies; author, year | Pooled sensitivity (95% CI) | Pooled specificity (95% CI) |
|---|---|---|---|---|
| 3-Question screen | Migraine | 2; Cady, 2004 (10); Wahab, 2016 (41) | 0.73 (0.71–0.75) | 0.93 (0.9–0.94) |
| ID-Migraine | Migraine | 4; Lipton, 2016 (34); Siva, 2008 (40); Gil-Gouveia, 2009 (47); Karli, 2007 (49) | 0.87 (0.85–0.89) | 0.75 (0.72–0.78) |
| German language questionnaire | Migraine | 2; Fritsche, 2007 (13); Yoon, 2008 (56) | 0.69 (0.63–0.75) | 0.90 (0.86–0.94) |
| German language questionnaire | TTH | 2; Fritsche, 2007 (13); Yoon, 2008 (56) | 0.81 (0.75–0.87) | 0.96 (0.94–0.98) |
| Cervical Flexion-Rotation Test | Cervicogenic headache | 2; Hall, 2010 (57); Ogince, 2007 (58) | 0.83 (0.72–0.94) | 0.82 (0.73–0.91) |

N: number; CI: confidence interval; TTH: tension-type headache.
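
The pooled values in Table 3 were obtained with a bivariate model fitted in R. As a much simpler illustration of what pooling a proportion across studies involves, the sketch below swaps in a univariate, fixed-effect, inverse-variance pooling on the logit scale. It is not the bivariate model used in this review, it ignores the correlation between sensitivity and specificity, and the function name and study counts are hypothetical.

```python
import math

def pool_proportion(events, totals, z=1.96):
    """Fixed-effect inverse-variance pooling of proportions on the logit scale."""
    weighted_sum = 0.0
    weight_total = 0.0
    for e, n in zip(events, totals):
        # A 0.5 continuity correction keeps the logit finite when e == 0 or e == n.
        a, b = e + 0.5, n - e + 0.5
        logit = math.log(a / b)
        variance = 1 / a + 1 / b
        weighted_sum += logit / variance
        weight_total += 1 / variance
    pooled_logit = weighted_sum / weight_total
    se = math.sqrt(1 / weight_total)

    def expit(x):
        # Back-transform from the logit scale to a proportion.
        return 1 / (1 + math.exp(-x))

    return (expit(pooled_logit),
            expit(pooled_logit - z * se),
            expit(pooled_logit + z * se))

# Hypothetical true-positive counts and numbers of patients with the condition.
estimate, lower, upper = pool_proportion(events=[45, 80], totals=[60, 100])
print(f"pooled sensitivity {estimate:.2f} ({lower:.2f}-{upper:.2f})")
```

Larger studies receive more weight because their logit variance is smaller, which is why pooled intervals tend to be narrower than those of the individual studies.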


Risk of bias in individual studies

The methodological quality of the included studies was assessed using the Quality Assessment of Diagnostic Accuracy Studies tool (QUADAS-2) (23,24). This tool assesses the risk of bias within four domains: Patient selection, index test, reference standard, and flow and timing (24). Concerns regarding applicability were also determined for the first three domains (24). Methodological quality of studies regarding the criterion validity was assessed using the COSMIN checklist (25). Criterion validity is defined as the degree to which the scores of an instrument are an adequate reflection of a gold standard (26). Within diagnostic accuracy, criterion validity is an essential measurement property. For criterion validity, box H of the COSMIN was used (25). Data extraction and assessment of methodological quality were performed by two reviewers independently (HvdM, CMS). HvdM was trained to use the QUADAS-2 tool and CMS was trained by the COSMIN team on quality appraisal and data extraction. The protocol for methodological assessment using the QUADAS-2 tool for this review was made available for the review authors (Supplemental material 2). The protocol for the COSMIN checklist is published elsewhere (25).

Summary measures

Sensitivity and specificity were used as measures of diagnostic accuracy.
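
As a minimal sketch of how these two measures follow from a 2 × 2 table of index test against reference test, the Python snippet below uses hypothetical counts that are not taken from any included study. The function name and the Wald-type 95% interval are illustrative assumptions rather than the procedure of this review.

```python
import math

def accuracy_from_2x2(tp, fp, fn, tn, z=1.96):
    """Return sensitivity and specificity, each as (estimate, lower, upper).

    tp, fp, fn, tn are the four cells of a 2 x 2 table that compares the
    index test with the reference test.
    """
    def proportion_with_ci(successes, total):
        p = successes / total
        half_width = z * math.sqrt(p * (1 - p) / total)
        # Clip to [0, 1] because a Wald interval can overshoot the bounds.
        return p, max(0.0, p - half_width), min(1.0, p + half_width)

    # Sensitivity: share of people with the condition who test positive.
    sensitivity = proportion_with_ci(tp, tp + fn)
    # Specificity: share of people without the condition who test negative.
    specificity = proportion_with_ci(tn, tn + fp)
    return sensitivity, specificity

# Hypothetical counts, for illustration only.
sens, spec = accuracy_from_2x2(tp=80, fp=15, fn=20, tn=85)
print(f"sensitivity {sens[0]:.2f} ({sens[1]:.2f}-{sens[2]:.2f})")
print(f"specificity {spec[0]:.2f} ({spec[1]:.2f}-{spec[2]:.2f})")
```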

Synthesis of results

A best evidence synthesis was performed using the GRADE recommendations for diagnostic accuracy studies with the GRADEpro online software (27). These recommendations provide a step-by-step assessment to determine the certainty of evidence of a diagnostic test, which results in a comprehensive and transparent approach for developing the recommendations for these tests. To determine the impact of the test, both the sensitivity and specificity of the test must be known, as well as the prevalence of the target condition (27). Based on the prevalence of the target condition, the pre-test probability of the presence of the headache was determined for a population of 1000 people (27). The test sensitivity and specificity were used to determine how many people would be accurately diagnosed (true positive) or accurately excluded from having the headache (true negative). Pooled sensitivity and specificity were used for each measurement instrument that had been evaluated in multiple studies. The pooled measurements were calculated using the ‘rmeta’ package for the R statistical software (28). A bivariate model resulting in a summary estimate for sensitivity and specificity together was used, as recommended by the Cochrane Collaboration (29,30). This model takes potential threshold effects and the correlation between sensitivity and specificity into account (29,30). The pooled sensitivity and specificity were used for the GRADE recommendations. When there was only one study for a measurement instrument, the published sensitivity and specificity of that measurement instrument were used. Finally, a summary receiver operating characteristics (S-ROC) curve was created using the ‘mada’ package for the R statistical software (29,31,32). Factors determining the quality of evidence according to the GRADE approach are: a) limitations in study design or execution (risk of bias); b) inconsistency of results; c) indirectness of evidence; d) imprecision; and e) publication bias (27).
For limitations, the risk of bias assessment from the QUADAS-2 was used to determine if downgrading of the evidence was needed. When ≥50% of the assessed domains scored a “high” or “unclear” risk of bias, this was considered “serious” and the level of evidence was downgraded by one. When ≥75% of the assessed domains scored a “high” or “unclear” risk of bias, this was considered “very serious” and the level of evidence was downgraded by two. Inconsistency refers to unexplained heterogeneity of the results between multiple studies, after which the level of evidence may be downgraded. The indirectness of evidence was determined by the applicability assessment of the QUADAS-2 tool with the same rules as the risk of bias assessment. In the case where there was only one article studying a measurement tool, the evidence was downgraded for imprecision. All steps of the synthesis of results are depicted in Figure 1.
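
The downgrading rules above can be sketched in a few lines of Python. This is an illustration under stated assumptions: the function names are hypothetical, the ratings are passed as one flat list of QUADAS-2 domain scores, and the certainty is assumed to start at "High" before any downgrading.

```python
LEVELS = ["Very low", "Low", "Moderate", "High"]

def risk_of_bias_downgrade(domain_ratings):
    """Number of levels to downgrade, given QUADAS-2 domain ratings."""
    flagged = sum(r.lower() in ("high", "unclear") for r in domain_ratings)
    share = flagged / len(domain_ratings)
    if share >= 0.75:
        return 2  # "very serious"
    if share >= 0.50:
        return 1  # "serious"
    return 0      # not serious

def certainty(total_downgrades, start="High"):
    """Lower the starting level, never going below 'Very low'."""
    index = max(0, LEVELS.index(start) - total_downgrades)
    return LEVELS[index]

# Hypothetical instrument: two of four domains flagged, evaluated in one study.
rob = risk_of_bias_downgrade(["High", "Low", "Unclear", "Low"])
imprecision = 1  # a single-study instrument is downgraded for imprecision
print(certainty(rob + imprecision))  # Low
```

The same thresholds would apply to the applicability ratings when judging indirectness.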
Figure 1.

Flow of steps after article inclusion.


Risk of bias across studies

Methods to detect publication bias are not very reliable in diagnostic accuracy studies (30). As diagnostic accuracy studies have sensitivity and specificity values as outcome measures rather than a stated null hypothesis with a p-value, it is unlikely for publication bias to be associated with statistical nonsignificance (33). Therefore, no publication bias assessment was applied in this review.

Results

The search in all three databases yielded 4129 articles, which were imported into Covidence (22). After removal of duplicates and eligibility assessment on title and abstract, 150 articles remained for full-text assessment. Of these, 52 articles were excluded based on the inclusion and exclusion criteria (Supplemental material 3) and 67 articles assessed clinimetric outcome measures other than diagnostic accuracy. These 67 articles will be included in the second review regarding clinimetric outcome measures based on the ICF. This left 31 articles for inclusion in the current review. The complete flowchart of the study selection can be found in Figure 2. No authors were contacted to obtain the full texts of any study.
Figure 2.

Study flow diagram.


Study characteristics

The headaches associated with musculoskeletal symptoms included in this review are migraine, TTH and CGH. No studies were found on the diagnostic accuracy of instruments for secondary headache attributed to TMD or headache attributed to whiplash injury. Table 1 shows the study characteristics of the 31 included studies, stratified by target population of the measurement instrument. Of the 31 studies, 22 had migraine as the target population (10–12,34–51). Seven articles had both migraine and TTH as target population (13,14,52–56), and two articles examined patients with CGH (57,58). In total, 28,246 people were included in the 31 studies. Of the included population, 64% were female, though three articles did not describe the gender distribution (38,54,55). Mean age varied from 19 (42) to 52 years (53). For migraine, 11 different measurement instruments were studied (10–12,34–37,40–43,44–51,59). ID-Migraine was the most studied measurement instrument, with nine studies in five languages (12,34,40,44–47,49,50). Eight of the 11 migraine instruments were screening instruments, one was a replacement test for the diagnostic process, and for two instruments the aim of the test was unclear. Of the seven studies on both migraine and TTH, only two examined the same questionnaire (13,56). Of the resulting six instruments, one was a screening test, three were replacement tests, and the aim of two was unclear. Both studies on CGH examined the cervical flexion-rotation test (CFRT) (57,58). The aim of the CFRT relative to the ICHD-3 criteria for cervicogenic headache is unclear.

Risk of bias within studies

The risk of bias was assessed for patient selection, index test, reference standard and flow and timing. The summarized assessment of the QUADAS-2 can be found in Table 2. The complete assessment, including reasons for the given scores, can be found in Supplemental material 4. Only one study received a low risk of bias on all domains (44). Twenty-two articles received a “high” risk of bias on ≥1 domain (10–14,35,37,39–41,43,45–50,55–59). The remaining articles received an “unclear” risk of bias on ≥1 domain (12,35,37,41,50–53). Risk of bias for the index test and the reference standard was generally scored unclear, because it was uncertain whether the index test had been conducted and interpreted without knowledge of the results of the reference standard.

Table 2.

Methodological quality assessment with QUADAS-2 and clinimetric evaluation of the criterion validity with the COSMIN checklist Box H.

Columns 1a to 4 rate the risk of bias; columns 1b to 3b rate applicability concerns.

| Measurement instrument | Study | 1a. Patient selection | 2a. Index test | 3a. Reference standard | 4. Flow and timing | 1b. Patient selection | 2b. Index test | 3b. Reference standard | COSMIN Box H |
|---|---|---|---|---|---|---|---|---|---|
| Target population: Migraine | | | | | | | | | |
| 3-Question Screen | Cady, 2004 (10) | High | Unclear | Unclear | High | Low | Low | Low | Poor |
| | Pryse-Phillips, 2002 (59) | High | Unclear | High | High | Low | Low | Low | Poor |
| | Wahab, 2016 (41) | Unclear | Unclear | Unclear | Low | Low | Low | Low | Fair |
| Diagnostic Screen | Michel, 1993 (37) | Unclear | Unclear | Unclear | High | Low | Low | Low | Fair |
| ID-Migraine | Brighina, 2006 (44) | Low | Low | Low | Low | Low | Low | Low | Fair |
| | de Mattos, 2017 (45) | High | Low | Low | Unclear | Low | Low | Low | Fair |
| | Ertas, 2008 (46) | High | Low | Unclear | Low | Low | Low | Low | Fair |
| | Gil-Gouveia, 2009 (47) | High | Low | Low | Low | Low | Low | Low | Fair |
| | Karli, 2007 (49) | High | Unclear | Unclear | High | Low | Low | Low | Poor |
| | Kim, 2006 (50) | Unclear | Low | Unclear | Low | Low | Low | Low | Fair |
| | Lipton, 2003 (12) | Unclear | Unclear | Low | Unclear | Low | Low | Low | Fair |
| | Lipton, 2016 (34) | High | Unclear | Unclear | Unclear | Low | Low | Low | Fair |
| | Siva, 2008 (40) | High | Low | Low | Unclear | Low | Low | Low | Fair |
| MSMDQ | Rueda-Sánchez, 2004 (38) | Low | Unclear | Unclear | High | Low | Low | Low | Fair |
| MAT | Marcus, 2004 (35) | Low | Low | Unclear | Low | Low | Low | Low | Good |
| Migraine Screen Questionnaire | Láinez, 2010 (51) | Low | Low | Unclear | Low | Low | Low | Low | Fair |
| | Láinez, 2005 (11) | High | High | Low | Unclear | Low | Low | Low | Fair |
| MSQ | Kallela, 2001 (48) | Low | Low | Unclear | High | Low | Low | Low | Fair |
| Migraine-4 | Walters, 2015 (42) | Low | Unclear | Unclear | High | Low | Low | Low | Fair |
| MA-IHS-M | Michel, 1993 (36) | Unclear | Low | Low | Unclear | Low | Low | Low | Fair |
| Screening items | Wang, 2008 (43) | Unclear | Unclear | Unclear | High | Low | Low | Low | Fair |
| SMIQ | Shaik, 2015 (39) | High | Unclear | Unclear | Low | Low | Low | Low | Fair |
| Target population: Migraine and tension-type headache | | | | | | | | | |
| CHAT | Maizels, 2007 (54) | High | Low | Unclear | Unclear | Low | Low | Low | Poor |
| German Language Questionnaire | Fritsche, 2007 (13) | High | Low | Low | High | Low | Low | Low | Poor |
| | Yoon, 2008 (56) | High | Low | Unclear | High | Low | Low | Low | Poor |
| HSQ-DV | van der Meer, 2017 (14) | Low | Low | Low | High | Low | Low | Low | Excellent |
| Headache questions | Hagen, 2010 (53) | Low | Unclear | Unclear | Unclear | Low | Low | Low | Fair |
| SAHQ | Rasmussen, 1991 (55) | Low | Low | High | Unclear | Low | Low | Low | Poor |
| SHQ | El-Sherbiny, 2017 (52) | Unclear | Low | Unclear | Unclear | Low | Low | Low | Fair |
| Target population: Cervicogenic headache | | | | | | | | | |
| Cervical Flexion-Rotation Test | Hall, 2010 (57) | High | Low | Unclear | Unclear | Low | Low | Low | Fair |
| | Ogince, 2007 (58) | High | Unclear | Unclear | High | Low | Low | Low | Poor |

MSMDQ: Michel's Standardized Migraine Diagnosis Questionnaire; MAT: Migraine Assessment Tool; MSQ: Migraine-specific questionnaire; MA-IHS-M: Modified Algorithm for IHS Migraine; SMIQ: Structured Migraine Interview Questionnaire; CHAT: Computerized Headache Assessment Test; HSQ-DV: Headache Screening Questionnaire – Dutch Version; SAHQ: Self-Administered Headache Questionnaire; SHQ: Structured Headache Questionnaire. An extended version of this table including explanation of judgement can be found in Appendix 4.

The clinimetric evaluation of the criterion validity was established with the COSMIN Box H. One study scored excellent (14), one good (35), 21 fair (11,12,34,36–48,50–53,57) and the remaining eight scored poor (10,13,50,55–57,59). Of the studies scoring poor, all but two (54,55) also scored a high risk of bias on ≥2 domains (10,12,13,50,55,57,59).

Migraine measurement instruments

Results of individual studies

The sensitivity of the measurement instruments for migraine ranged from 0.38 (38) to 0.99 (48) (see Table 1). Only three studies had a sensitivity below 0.70 (38,41,50) and eight studies found a sensitivity of 0.90 or higher (11,39,42,44,45,47–49). Half of these studies with a high sensitivity examined the ID-Migraine (44,45,47,49). Specificity ranged from 0.27 (10) to 0.99 (38). Six studies found a specificity of 0.70 or lower (10,39,43,45,47,49), and a specificity above 0.90 was found in six other studies (38,41,42,48,50,51). Eleven studies had both sensitivity and specificity above 0.70 (11,12,34,35,40,42,44,46,48,51,59), of which two studies had both above 0.90 (42,48).

Synthesis of results

For two measurement instruments, the sensitivity and specificity could be pooled. For the 3-question Screen the pooled sensitivity was 0.73 and specificity was 0.93 (Table 3) based on two (10,41) out of three studies, due to missing data in one article (59). The pooled sensitivity for the ID-Migraine was 0.87 and specificity was 0.75 (Table 3, Figures 3(a) and 3(b)). The results were based on four studies (34,40,47,49) as the other five studies (12,44–46,50) did not have sufficient data available to perform the analyses.
Figure 3.

(a) Summary Receiver Operating Characteristics (S-ROC) curves for pooled sensitivity and specificity of the 3-question screen; (b) S-ROC curves for pooled sensitivity and specificity of the ID-migraine; (c) S-ROC curves for pooled sensitivity and specificity of the German questionnaire for migraine; (d) S-ROC curves for pooled sensitivity and specificity of the German questionnaire for tension-type headache; (e) S-ROC curves for pooled sensitivity and specificity of the cervical flexion rotation test.

According to the GRADE recommendations, there was a very low level of evidence for seven measurement instruments for migraine: Diagnostic Screen (37), Michel's Standardized Migraine Diagnosis Questionnaire (38), Migraine Specific Questionnaire (48), Migraine-4 (42), Modified Algorithm for IHS Migraine (36), Screening Items (43), and the Structured Migraine Interview Questionnaire (39) (see Table 4). For two measurement instruments, there was a low level of evidence: the 3-question Screen (10,41) and the Migraine Screen Questionnaire (11,51). There was a moderate level of evidence for the ID-Migraine (34,40,47,49) and also for the Migraine Assessment Tool (35).
Table 4.

GRADE recommendations for measurement instruments for target population Migraine, stratified per measurement instrument.

Sensitivity (95% CI)Factors that may decrease certainty of evidence
Effect per 1.000 patients tested*
Measurement instrumentSpecificity (95% CI)OutcomeNumber of studies (number of patients)Study designRisk of biasIndirectnessInconsistencyImprecisionPublication biasPre-test probability of 14.7%*Test accuracy CoE
3-Question Screen (10,41,59)0.73 (0.71–0.75)[]TPTwo studies 2539 patientsCross-sectional (cohort type accuracy study)Serious±Not seriousSeriousNot seriousNone107 (104 to 110)⨁⨁◯◯ Low
FN40 (37 to 43)
0.93 (0.92–0.94)[]TNTwo studies 1988 patientsSerious±Not seriousSeriousNot seriousNone793 (785 to 802)⨁⨁◯◯ Low
FP60 (51 to 68)
Diagnostic Screen (37)0.44 (0.35–0.53)TPOne study 125 patientsCross-sectional (cohort type accuracy study)Very Serious¥Not seriousNot seriousSerious×None65 (51 to 78)⨁◯◯◯ Very low
FN82 (69 to 96)
0.93 (0.85–1.00)TNOne study 41 patientsVery serious¥Not seriousNot seriousSerious×None793 (725 to 530)⨁◯◯◯ Very low
FP60 (0 to 128)
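ID-Migraine (34,40,47,49) | Sensitivity 0.87 (0.85–0.89)[] | Four studies, 1257 patients | Cross-sectional (cohort type accuracy study) | Serious± | Not serious | Not serious | Not serious | None | TP 128 (125 to 131); FN 19 (16 to 22) | ⨁⨁⨁◯ Moderate
ID-Migraine (34,40,47,49) | Specificity 0.75 (0.72–0.78)[] | Four studies, 1109 patients | Cross-sectional (cohort type accuracy study) | Serious± | Not serious | Not serious | Not serious | None | TN 640 (614 to 665); FP 213 (188 to 239) | ⨁⨁⨁◯ Moderate
Michel's Standardized Migraine Diagnosis Questionnaire (38) | Sensitivity 0.38 (0.26–0.52) | One study, ? patients | Cross-sectional (cohort type accuracy study) | Very serious¥ | Serious | Not serious | Serious× | None | TP 56 (38 to 76); FN 91 (71 to 109) | ⨁◯◯◯ Very low
Michel's Standardized Migraine Diagnosis Questionnaire (38) | Specificity 0.99 (0.95–1.00) | One study, ? patients | Cross-sectional (cohort type accuracy study) | Very serious¥ | Serious | Not serious | Serious× | None | TN 844 (810 to 853); FP 9 (0 to 43) | ⨁◯◯◯ Very low
Migraine Assessment Tool (35) | Sensitivity 0.89 (0.80–0.98)[] | One study, 46 patients | Cross-sectional (cohort type accuracy study) | Not serious | Not serious | Not serious | Serious× | None | TP 131 (118 to 144); FN 16 (3 to 29) | ⨁⨁⨁◯ Moderate
Migraine Assessment Tool (35) | Specificity 0.79 (0.65–0.93)[] | One study, 34 patients | Cross-sectional (cohort type accuracy study) | Not serious | Not serious | Not serious | Serious× | None | TN 674 (554 to 793); FP 179 (60 to 299) | ⨁⨁⨁◯ Moderate
Migraine Screen Questionnaire (11,51) | Sensitivity 0.82–0.93 | Two studies, ? patients | Cross-sectional (cohort type accuracy study) | Serious± | Serious b | Not serious | Not serious | None | TP 121 to 137; FN 10 to 26 | ⨁⨁◯◯ Low
Migraine Screen Questionnaire (11,51) | Specificity 0.81–0.97 | Two studies, ? patients | Cross-sectional (cohort type accuracy study) | Serious± | Serious b | Not serious | Not serious | None | TN 691 to 827; FP 26 to 162 | ⨁⨁◯◯ Low
Migraine Specific Questionnaire (48) | Sensitivity 0.99 (0.97–1.00)[] | One study, 69 patients | Cross-sectional (cohort type accuracy study) | Serious± | Serious | Not serious | Serious× | None | TP 146 (143 to 147); FN 1 (0 to 4) | ⨁◯◯◯ Very low
Migraine Specific Questionnaire (48) | Specificity 0.96 (0.88–1.00)[] | One study, 25 patients | Cross-sectional (cohort type accuracy study) | Serious± | Serious | Not serious | Serious× | None | TN 819 (751 to 853); FP 34 (0 to 102) | ⨁◯◯◯ Very low
Migraine-4 (42) | Sensitivity 0.94 (0.87–0.98) | One study, ? patients | Cross-sectional (cohort type accuracy study) | Very serious¥ | Not serious | Not serious | Serious× | None | TP 138 (128 to 144); FN 9 (3 to 19) | ⨁◯◯◯ Very low
Migraine-4 (42) | Specificity 0.92 (0.90–0.94) | One study, ? patients | Cross-sectional (cohort type accuracy study) | Very serious¥ | Not serious | Not serious | Serious× | None | TN 785 (768 to 802); FP 68 (51 to 85) | ⨁◯◯◯ Very low
Modified Algorithm for IHS Migraine (36) | Sensitivity 0.95–0.98 | One study, 126 patients | Cross-sectional (cohort type accuracy study) | Serious± | Serious | Serious | Serious× | None | TP 144 to 144; FN 3 to 7 | ⨁◯◯◯ Very low
Modified Algorithm for IHS Migraine (36) | Specificity 0.53–0.78 | One study, 141 patients | Cross-sectional (cohort type accuracy study) | Serious± | Serious | Serious | Serious× | None | TN 452 to 665; FP 188 to 401 | ⨁◯◯◯ Very low
Screening Items (43) | Sensitivity 0.89 (0.86–0.92)[] | One study, 363 patients | Cross-sectional (cohort type accuracy study) | Very serious¥ | Not serious | Not serious | Serious× | None | TP 131 (126 to 135); FN 16 (12 to 21) | ⨁◯◯◯ Very low
Screening Items (43) | Specificity 0.67 (0.63–0.72)[] | One study, 392 patients | Cross-sectional (cohort type accuracy study) | Very serious¥ | Not serious | Not serious | Serious× | None | TN 572 (537 to 614); FP 281 (239 to 316) | ⨁◯◯◯ Very low
Structured Migraine Interview Questionnaire (39) | Sensitivity 0.97 (0.94–1.00)[] | One study, 100 patients | Cross-sectional (cohort type accuracy study) | Very serious¥ | Not serious | Not serious | Serious× | None | TP 143 (138 to 147); FN 4 (0 to 9) | ⨁◯◯◯ Very low
Structured Migraine Interview Questionnaire (39) | Specificity 0.63 (0.50–0.76)[] | One study, 57 patients | Cross-sectional (cohort type accuracy study) | Very serious¥ | Not serious | Not serious | Serious× | None | TN 542 (427 to 648); FP 316 (205 to 426) | ⨁◯◯◯ Very low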

Prevalence in the general population of 14.7% is used (65). CoE: certainty of evidence.

±“Unclear” or “high” risk of bias on at least 50% but less than 75% of the domains on QUADAS-2.

¥“Unclear” or “high” risk of bias on ≥75% of the domains on QUADAS-2.

×Results based on the outcome of one single study.

95% confidence interval (CI) calculated by reviewers.
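
The “effect per 1,000 patients tested” figures in the GRADE tables follow arithmetically from sensitivity, specificity and the assumed prevalence. The Python sketch below is an illustration of that arithmetic, not the reviewers' own code; the function name `effects_per_1000` is hypothetical. It reproduces the ID-Migraine row from the pooled estimates and the 14.7% prevalence.

```python
def effects_per_1000(sensitivity, specificity, prevalence, n=1000):
    """Expected TP, FN, TN and FP counts among n tested patients."""
    with_condition = n * prevalence          # patients who truly have the condition
    without_condition = n - with_condition   # patients who do not
    tp = with_condition * sensitivity
    tn = without_condition * specificity
    return {
        "TP": round(tp),
        "FN": round(with_condition - tp),
        "TN": round(tn),
        "FP": round(without_condition - tn),
    }


# ID-Migraine: pooled sensitivity 0.87, pooled specificity 0.75, prevalence 14.7%
print(effects_per_1000(0.87, 0.75, 0.147))
# {'TP': 128, 'FN': 19, 'TN': 640, 'FP': 213}
```

Passing the confidence limits instead of the point estimates (for example 0.85 and 0.89 for sensitivity) gives the bracketed ranges, such as 125 to 131 true positives.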


Combined migraine and TTH measurement instruments

The aim of the index tests differed across the seven included articles: four were ‘replacement’ tests (13,54–56), one was a ‘triage’ test (14), and for two the aim was unclear (52,53). Three articles established the diagnostic accuracy for several migraine and TTH ICHD diagnoses aside from the “standard” diagnoses, including chronic migraine, chronic TTH, probable migraine, and probable TTH (14,52,53). For migraine, the sensitivity ranged from 0.49 (53) to 1.00 (54) and the specificity ranged from 0.85 (56) to 0.96 (13). For chronic migraine, the sensitivity and specificity were 0.71 and 0.98 respectively (52). Probable migraine had a sensitivity of 0.89 and a specificity of 0.54 (14). The sensitivity for TTH ranged from 0.36 (14) to 1.00 (54) and the specificity ranged from 0.69 (53) to 0.98 (13). One study did not establish the specificity of its test (54). Chronic TTH was tested in two studies, for which the sensitivity was 0.64 (53) to 0.70 (52) and the specificity 0.96 (52) to 1.00 (53). The test for probable TTH had a sensitivity of 0.92 and a specificity of 0.48 (14). Five studies had a sensitivity above 0.70 for migraine, chronic migraine, and probable migraine (13,14,52,54,56), and five studies had a sensitivity above 0.70 for TTH, chronic TTH, and probable TTH (see Table 1) (13,14,52–54). All six studies that reported specificity had a specificity of 0.70 or higher for migraine, chronic migraine, and probable migraine and for TTH, chronic TTH, and probable TTH (13,14,52,53,55,56).

One instrument, the German Language Questionnaire, was supported by two studies (13,56). The pooled sensitivity and specificity for migraine were 0.69 and 0.90 respectively (Table 3, Figure 3(c)). For TTH, the pooled sensitivity and specificity were 0.81 and 0.96 respectively (Table 3, Figure 3(d)). The five other measurement instruments (14,52–55) were each supported by one study and therefore downgraded for imprecision (see also Table 5).
Table 5.

GRADE recommendations for measurement instruments for target populations Migraine and Tension-Type Headache, stratified per measurement instrument.

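Risk of bias, indirectness, inconsistency, imprecision and publication bias are the factors that may decrease certainty of evidence.

Measurement instrument | Target population | Sensitivity or specificity (95% CI) | № of studies (№ of patients) | Study design | Risk of bias | Indirectness | Inconsistency | Imprecision | Publication bias | Effect per 1,000 patients tested (pre-test probability of 14.7%* / 62.6%**) | Test accuracy CoE
Computerized Headache Assessment Test (CHAT) (54) | Migraine | Sensitivity 0.98[] (0.93–1.00) | One study, 41 patients | Cross-sectional (cohort type accuracy study) | Very serious¥ | Serious | Not serious | Serious× | None | TP 144 (137 to 147); FN 3 (0 to 10) | ⨁◯◯◯ Very low
Computerized Headache Assessment Test (CHAT) (54) | Migraine | Specificity 1.00[] (1.00–1.00) | One study, 76 patients | Cross-sectional (cohort type accuracy study) | Very serious¥ | Very serious | Not serious | Serious× | None | TN 853 (853 to 853); FP 0 (0 to 0) | ⨁◯◯◯ Very low
Computerized Headache Assessment Test (CHAT) (54) | TTH | Sensitivity 1.00[] (1.00–1.00) | One study, 14 patients | Cross-sectional (cohort type accuracy study) | Very serious¥ | Serious | Not serious | Serious× | None | TP 626 (626 to 626); FN 0 (0 to 0) | ⨁◯◯◯ Very low
Computerized Headache Assessment Test (CHAT) (54) | TTH | Specificity 1.00[] (1.00–1.00) | One study, 14 patients | Cross-sectional (cohort type accuracy study) | Very serious¥ | Very serious | Not serious | Serious× | None | TN 374 (374 to 374); FP 0 (0 to 0) | ⨁◯◯◯ Very low
German Language Questionnaire (13,56) | Migraine | Sensitivity 0.69[] (0.63–0.75) | Two studies, 217 patients | Cross-sectional (cohort type accuracy study) | Serious± | Serious | Not serious | Not serious | None | TP 101 (81 to 118); FN 46 (29 to 66) | ⨁⨁◯◯ Low
German Language Questionnaire (13,56) | Migraine | Specificity 0.90[] (0.86–0.94) | Two studies, 254 patients | Cross-sectional (cohort type accuracy study) | Serious± | Serious | Not serious | Not serious | None | TN 768 (657 to 819); FP 85 (34 to 196) | ⨁⨁◯◯ Low
German Language Questionnaire (13,56) | TTH | Sensitivity 0.81[] (0.75–0.87) | Two studies, 177 patients | Cross-sectional (cohort type accuracy study) | Serious± | Serious | Not serious | Not serious | None | TP 507 (470 to 545); FN 119 (81 to 156) | ⨁⨁◯◯ Low
German Language Questionnaire (13,56) | TTH | Specificity 0.96[] (0.94–0.98) | Two studies, 294 patients | Cross-sectional (cohort type accuracy study) | Serious± | Serious | Not serious | Not serious | None | TN 359 (352 to 367); FP 15 (7 to 22) | ⨁⨁◯◯ Low
Headache Screening Questionnaire – Dutch Version (14) | Migraine | Sensitivity 0.69 (0.55–0.80) | One study, 55 patients | Cross-sectional (cohort type accuracy study) | Not serious | Not serious | Not serious | Serious× | None | TP 101 (81 to 118); FN 46 (29 to 66) | ⨁⨁⨁◯ Moderate
Headache Screening Questionnaire – Dutch Version (14) | Migraine | Specificity 0.90 (0.77–0.96) | One study, 50 patients | Cross-sectional (cohort type accuracy study) | Not serious | Not serious | Not serious | Serious× | None | TN 768 (657 to 819); FP 85 (34 to 196) | ⨁⨁⨁◯ Moderate
Headache Screening Questionnaire – Dutch Version (14) | TTH | Sensitivity 0.36 (0.21–0.54) | One study, 36 patients | Cross-sectional (cohort type accuracy study) | Not serious | Not serious | Not serious | Serious× | None | TP 225 (131 to 338); FN 401 (288 to 495) | ⨁⨁⨁◯ Moderate
Headache Screening Questionnaire – Dutch Version (14) | TTH | Specificity 0.86 (0.74–0.92) | One study, 69 patients | Cross-sectional (cohort type accuracy study) | Not serious | Not serious | Not serious | Serious× | None | TN 322 (277 to 344); FP 52 (30 to 97) | ⨁⨁⨁◯ Moderate
Headache Questions (53) | Migraine | Sensitivity 0.49 (–)[] | One study, ? patients | Cross-sectional (cohort type accuracy study) | Very serious¥ | Not serious | Serious | Serious× | None | TP 72 (- to -); FN 75 (- to -) | ⨁◯◯◯ Very low
Headache Questions (53) | Migraine | Specificity 0.91 (–)[] | One study, ? patients | Cross-sectional (cohort type accuracy study) | Very serious¥ | Not serious | Serious | Serious× | None | TN 776 (- to -); FP 77 (- to -) | ⨁◯◯◯ Very low
Headache Questions (53) | TTH | Sensitivity 0.96 (0.94–0.98) | One study, ? patients | Cross-sectional (cohort type accuracy study) | Very serious¥ | Not serious | Not serious | Serious× | None | TP 601 (588 to 613); FN 25 (13 to 38) | ⨁◯◯◯ Very low
Headache Questions (53) | TTH | Specificity 0.69 (0.63–0.75) | One study, ? patients | Cross-sectional (cohort type accuracy study) | Very serious¥ | Not serious | Not serious | Serious× | None | TN 258 (236 to 281); FP 116 (93 to 138) | ⨁◯◯◯ Very low
Self-administered Headache Questionnaire (55) | Migraine | Sensitivity 0.51[] (0.41–0.61) | One study, 93 patients | Cross-sectional (cohort type accuracy study) | Serious± | Not serious | Not serious | Serious× | None | TP 75 (60 to 90); FN 72 (57 to 87) | ⨁⨁◯◯ Low
Self-administered Headache Questionnaire (55) | Migraine | Specificity 0.92[] (0.90–0.94) | One study, 619 patients | Cross-sectional (cohort type accuracy study) | Serious± | Not serious | Not serious | Serious× | None | TN 785 (768 to 802); FP 68 (51 to 85) | ⨁⨁◯◯ Low
Self-administered Headache Questionnaire (55) | TTH | Sensitivity 0.43[] (0.39–0.47) | One study, 468 patients | Cross-sectional (cohort type accuracy study) | Serious± | Not serious | Not serious | Serious× | None | TP 269 (244 to 294); FN 357 (332 to 382) | ⨁⨁◯◯ Low
Self-administered Headache Questionnaire (55) | TTH | Specificity 0.96[] (0.94–0.98) | One study, 244 patients | Cross-sectional (cohort type accuracy study) | Serious± | Not serious | Not serious | Serious× | None | TN 359 (352 to 367); FP 15 (7 to 22) | ⨁⨁◯◯ Low
Structured Headache Questionnaire (52) | Migraine | Sensitivity 0.86 (0.78–0.97) | One study, ? patients | Cross-sectional (cohort type accuracy study) | Very serious¥ | Not serious | Not serious | Serious× | None | TP 126 (115 to 143); FN 21 (4 to 32) | ⨁◯◯◯ Very low
Structured Headache Questionnaire (52) | Migraine | Specificity 0.94 (0.86–0.98) | One study, ? patients | Cross-sectional (cohort type accuracy study) | Very serious¥ | Not serious | Not serious | Serious× | None | TN 802 (734 to 836); FP 51 (17 to 119) | ⨁◯◯◯ Very low
Structured Headache Questionnaire (52) | TTH | Sensitivity 0.93 (0.79–0.98) | One study, ? patients | Cross-sectional (cohort type accuracy study) | Very serious¥ | Not serious | Not serious | Serious× | None | TP 582 (495 to 613); FN 44 (13 to 131) | ⨁◯◯◯ Very low
Structured Headache Questionnaire (52) | TTH | Specificity 0.93 (0.86–1.00) | One study, ? patients | Cross-sectional (cohort type accuracy study) | Very serious¥ | Not serious | Not serious | Serious× | None | TN 348 (322 to 374); FP 26 (0 to 52) | ⨁◯◯◯ Very low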

*Prevalence in the general population of 14.7% is used for migraine.

**Prevalence in the general population of 62.6% is used for TTH (65).

CoE: certainty of evidence.

±“Unclear” or “high” risk of bias on at least 50% but less than 75% of the domains on QUADAS-2.

¥“Unclear” or “high” risk of bias on ≥75% of the domains on QUADAS-2.

×Results based on the outcome of one single study.

95% confidence interval (CI) calculated by reviewers.

Not possible to calculate 95% CI.

There was a very low level of evidence for the Computerized Headache Assessment Test (CHAT) (54), the Headache Questions (53) and the Structured Headache Questionnaire (52). The German Language Questionnaire (13,56) and the Self-administered Headache Questionnaire (55) are both supported by a low level of evidence. Only the Headache Screening Questionnaire (HSQ) – Dutch Version was found to have a moderate level of evidence (14).
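
The certainty ratings in Table 5 are consistent with the usual GRADE downgrading rule: a body of evidence starts at high certainty, drops one level for each factor rated “serious” and two levels for each factor rated “very serious”, and cannot fall below “very low”. The Python sketch below is an illustrative assumption about how the ratings combine, not the authors' documented procedure; `grade_certainty` and the two lookup tables are hypothetical names.

```python
# Certainty levels from best to worst.
LEVELS = ["High", "Moderate", "Low", "Very low"]

# Number of levels lost per rating (assumed: serious = 1, very serious = 2).
DOWNGRADE = {"Not serious": 0, "None": 0, "Serious": 1, "Very serious": 2}


def grade_certainty(risk_of_bias, indirectness, inconsistency,
                    imprecision, publication_bias="None"):
    """Combine the five factor ratings into one certainty level."""
    ratings = (risk_of_bias, indirectness, inconsistency,
               imprecision, publication_bias)
    steps = sum(DOWNGRADE[rating] for rating in ratings)
    # Floor the result at the lowest level.
    return LEVELS[min(steps, len(LEVELS) - 1)]


# Headache Screening Questionnaire: only imprecision is rated serious.
print(grade_certainty("Not serious", "Not serious", "Not serious", "Serious"))
# German Language Questionnaire: serious risk of bias and indirectness.
print(grade_certainty("Serious", "Serious", "Not serious", "Not serious"))
# CHAT (sensitivity rows): very serious risk of bias plus two serious factors.
print(grade_certainty("Very serious", "Serious", "Not serious", "Serious"))
```

Run as written, the three calls print Moderate, Low and Very low, matching the corresponding rows of Table 5.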

Cervicogenic headache measurement instruments

The two included studies for CGH established the diagnostic accuracy of the Cervical Flexion-Rotation Test (CFRT) (57,58). Both sensitivity and specificity ranged from 0.70 (57) to 0.91 (58). The pooled sensitivity was 0.83 and the pooled specificity was 0.82 (Table 3, Figure 3(e)). Based on the GRADE recommendations (Table 6), there is a very low level of evidence for the use of the CFRT for patients with cervicogenic headache (57,58).
Table 6.

GRADE recommendations for measurement instruments for target population Cervicogenic Headache.

Risk of bias, indirectness, inconsistency, imprecision and publication bias are the factors that may decrease certainty of evidence.

Measurement instrument | Sensitivity or specificity (95% CI) | Number of studies (number of patients) | Study design | Risk of bias | Indirectness | Inconsistency | Imprecision | Publication bias | Effect per 1,000 patients tested (pre-test probability of 4.1%*) | Test accuracy CoE
Cervical Flexion Rotation Test (57,58) | Sensitivity 0.83[] (0.72–0.94) | Two studies, 43 patients | Cross-sectional (cohort type accuracy study) | Very serious¥ | Not serious | Serious | Not serious | None | TP 34 (30 to 39); FN 7 (2 to 11) | ⨁◯◯◯ Very low
Cervical Flexion Rotation Test (57,58) | Specificity 0.82[] (0.73–0.91) | Two studies, 74 patients | Cross-sectional (cohort type accuracy study) | Very serious¥ | Not serious | Serious | Not serious | None | TN 786 (700 to 873); FP 173 (86 to 259) | ⨁◯◯◯ Very low

*Prevalence in the general population of 4.1% is used (76).

CoE: certainty of evidence.

¥“Unclear” or “high” risk of bias on ≥75% of the domains on QUADAS-2.

95% confidence interval (CI) calculated by reviewers.
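
The footnote above states that these confidence intervals were calculated by the reviewers, but the method is not named. As a plausibility check only, the Python sketch below assumes a normal-approximation (Wald) interval and treats the patient counts in Table 6 as the denominators; both are assumptions, and `wald_ci` is a hypothetical helper. Under those assumptions it reproduces the CFRT intervals.

```python
import math


def wald_ci(proportion, n, z=1.96):
    """Normal-approximation 95% CI for a proportion, clipped to [0, 1]."""
    half_width = z * math.sqrt(proportion * (1 - proportion) / n)
    lower = max(0.0, proportion - half_width)
    upper = min(1.0, proportion + half_width)
    return round(lower, 2), round(upper, 2)


# CFRT pooled sensitivity 0.83 in 43 patients with CGH
print(wald_ci(0.83, 43))  # (0.72, 0.94)
# CFRT pooled specificity 0.82 in 74 patients without CGH
print(wald_ci(0.82, 74))  # (0.73, 0.91)
```

A Wald interval collapses to zero width when the observed proportion is exactly 0 or 1 and is generally unreliable in small samples, which is worth keeping in mind for entries such as 1.00 (1.00–1.00) based on 14 patients in Table 5.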


Discussion

Within this review, 11 tools were identified for migraine alone (10–12,34–37,40–51,59), six for the combination of migraine and TTH (13,14,52–56), and one for CGH (57,58). The sensitivity and specificity of the measurement instruments for migraine ranged from 0.38 (38) to 0.99 (48) and from 0.27 (10) to 0.99 (37) respectively. The sensitivity and specificity for migraine based on the combined measurement instruments ranged from 0.49 (53) to 1.00 (54) and from 0.85 (56) to 0.96 (13) respectively. For TTH, the sensitivity and specificity ranged from 0.36 (14) to 1.00 (54) and from 0.69 (53) to 0.98 (13) respectively. For the CFRT, the only measurement instrument for cervicogenic headache, both the sensitivity and specificity ranged from 0.70 (57) to 0.91 (58).

All measurement tools for migraine and TTH were questionnaires. The measurement tool for CGH was a physical examination test. The diagnoses of migraine and TTH are based solely on information from the history of the patient (15), allowing the diagnosis to be derived from a questionnaire. However, the choice of gold standard within headache research is inconsistent. Some studies used the first, second or third edition of the International Classification of Headache Disorders (ICHD) (15,60,61), others used the diagnosis of a neurologist or a headache nurse, and for CGH the Sjaastad criteria were used (62). As the ICHD is based on the most recent scientific findings and clinical expertise from experts worldwide, the newest version of the ICHD is recommended as the gold standard (15,63).

The aim of each measurement instrument is described in Table 1. This was unclear for five measurement instruments. Nine measurement instruments are meant to be used as a screening tool in a broader population before seeing a medical specialist for a definitive diagnosis. 
These screening instruments are recommended for health care providers such as PTs, as they are not trained for medical diagnoses but do see these patients often and can refer them to the medical specialist (64). Three of the measurement instruments studied were meant as a replacement test for the gold standard. This may be efficient for research purposes, as it allows researchers to diagnose patients without an extensive visit to a specialist. However, no conclusion was drawn from the included articles as to whether the measurement instruments were better than the gold standard (the medical specialist); therefore, the presence of a medical specialist is still recommended in clinical practice.

For each measurement tool, the cut-off criteria to recognize headache should be described to allow for comparison of outcomes between studies. In reality, cut-off criteria differed between studies, which resulted in highly variable sensitivity and specificity. The lack of established cut-off points was taken into account within the ‘Index Test’ domain when assessing both methodological quality and risk of bias.

Of the 11 measurement instruments found for migraine, only three were supported by evidence from two or more articles: the 3-question screen (10,41,59), the ID-Migraine (12,34,40,44–47,49,50) and the Migraine Screen Questionnaire (11,51). Several studies introduced serious patient selection bias by only recruiting patients with the headache they were interested in studying (10). By doing so, there were no false positives or true negatives present, which resulted in more favourable diagnostic accuracy outcome measures. Other studies excluded participants who had a secondary headache (45), or who did not screen positive on a preliminary screening for migraine (45,46,49). One study selected its participants so that 50% had a confirmed migraine diagnosis prior to the index test and 50% did not have migraine (11). 
This also introduced selection bias in favour of the outcomes, as the prevalence of the studied disorder (50% in the tested group versus 14.7% in the general population) determines the pre-test probability and thus the chance of correct diagnosis (65,66). Furthermore, serious bias was introduced in the “flow and timing” domain, as some articles did not properly describe the order of receiving the index test and the reference standard diagnosis. Other studies did not include all participants in the analysis (11,12,34,37,38,40,42,43,48,49,59). The introduced biases on both domains resulted in a downgrade of the certainty of evidence for all measurement instruments except the Migraine Assessment Tool (35). However, as this tool was studied in only one article, the level of evidence was also downgraded for imprecision. Therefore, there are no measurement instruments for migraine with a high level of evidence.

Out of the six measurement instruments that looked at both migraine and TTH, only the German Language Questionnaire is supported by two articles (13,56). However, due to a serious risk of bias and indirectness, there is only a low level of evidence for this questionnaire. In both studies, only patients with headaches that were also studied in the questionnaire were included, which introduced a serious selection bias (13,56). Similarly, the Computerized Headache Assessment Test (CHAT) presented a sensitivity of 1.00 for both migraine and TTH, but no true negatives or false positives were available, and no specificity was presented (54). In this study, the gold standard was the diagnosis established by a headache nurse (54). As stated before, this is an unreliable gold standard for a headache diagnosis (63).

The seven articles differed in population. Some study samples were retrieved from the general population (53,55,56), others from urgent care or family practice (54), and others from a headache clinic (13,14). 
In one study, the sample origin was unclear (52). The prevalence used in the GRADE recommendations was for the general population, but in health care settings the prevalence is higher. This increases the pre-test probability of a positive headache diagnosis and must be taken into consideration when interpreting the results of those studies (14,54,56). Regarding the flow and timing of these studies, not all participants received both the index test and reference standard (52–54,56). Other studies did not include all participants in the final analyses (13,14,53,55). By excluding participants in these ways, the generalizability of the results is compromised. All these components resulted in a very low to moderate level of evidence for the six combined migraine and TTH measurement instruments.

Both articles studying the diagnostic accuracy of the cervical flexion rotation test (CFRT) for CGH showed selection bias, as participants were selected based on headache type (57,58). In one study, the sensitivity and specificity were both 0.70 (57), whereas in the other study the sensitivity was 0.91 and the specificity 0.90 (58). In the study with lower diagnostic accuracy, the control group consisted of patients with other headache forms (migraine or multiple headache forms) (57). This makes differentiating between headache types more difficult, as other headaches are also related to neck problems (5,67,68). The study with higher diagnostic accuracy compared patients with CGH with asymptomatic participants and several patients with migraine (58), which made it easier to recognize the CGH. When this test is applied in the clinic, patients will have a headache complaint and will not be asymptomatic, so the sensitivity and specificity of 0.70 will likely be more accurate. 
Just as in the current review, in another recent systematic review describing physical examination tests for screening and diagnosis of CGH, the CFRT was determined to be the most useful test, with the highest reliability and strongest diagnostic accuracy (69). There is, however, a debate in the literature on the reliability of manual ROM tests of the spine (70). Inter-examiner reliability for the cervical spine passive ROM ranged from poor to substantial. The manual tests of the upper cervical spine (C1/2, C2/3) have a fair to substantial level of reliability (70). The reliability of the CFRT has been established to be good to excellent (71). However, CFRT reliability was established by comparing a manual diagnosis of C1/2 dysfunction with the outcome of the CFRT (71). If the reliability of the manual diagnosis of dysfunction is only fair, then the reliability of the CFRT is questionable. However, in another study where the cervical ROM was measured with a device (CROM), a significant difference was found between the ROM in patients with CGH and the ROM in patients with migraine and healthy subjects, which confirms the findings of the included papers of this review (57,58,72). In conclusion, the CFRT is a valid and reliable measure to recognize CGH, though the reliability is higher when using a CROM device rather than assessing the ROM manually.

Strengths and limitations of the study

The current review is, to the authors' knowledge, the first review establishing an overview of the diagnostic accuracy of measurement instruments for headaches associated with musculoskeletal symptoms. By using the QUADAS-2 and COSMIN tools, the methodological quality was assessed in a well-known and internationally accepted manner (24,25). By using the GRADE recommendations, the findings of this review are transparent and easy to translate to clinical practice (27).

There are, however, also a few limitations of this study. Comparison between index and reference test was not easy, as the validation of the index test was performed in a different population compared to the population in which the reference standard was developed. It is important to keep in mind that the diagnostic accuracy is dependent on the prevalence of the target condition in the population; the study sample needs to be taken into consideration when interpreting the results. The prevalence of the target condition is the pre-test probability of a person having that condition, and a good measurement instrument will increase the chance of recognizing the target condition correctly. However, if the study sample is biased by having a very high prevalence of the target condition whereas the measurement instrument would normally be used in a setting with a low prevalence of the target condition, the diagnostic accuracy is not valid for that specific population. Validation studies of measurement instruments should therefore always test the measurement instrument in the population and setting for which it is being validated. Also, some measurement tools were used in different languages and cultures, which must also be considered when interpreting these results. In this review, great variability was found between the different studies, as illustrated in the S-ROC curves in Figure 3(a) and (c). 
These S-ROC curves show the uncertainty of the findings compared to reality, so the pooled data should be used with caution. The clear gap in diagnostic accuracy between studies for some measurement instruments showed the necessity of confirmation by multiple studies within the same population and against the same reference standard.
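
To make the prevalence argument above concrete, the Python sketch below applies Bayes' rule to the pooled ID-Migraine estimates (sensitivity 0.87, specificity 0.75) at the two prevalences mentioned in this review: 14.7% for the general population and 50% for a sample enriched with confirmed cases. Predictive values are used here only as an illustration of the post-test probability; they are not reported in the review, and `predictive_values` is a hypothetical helper name.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Return (PPV, NPV): the probability that a positive or negative
    test result is correct, given the pre-test probability."""
    true_pos = sensitivity * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    true_neg = specificity * (1 - prevalence)
    false_neg = (1 - sensitivity) * prevalence
    ppv = true_pos / (true_pos + false_pos)
    npv = true_neg / (true_neg + false_neg)
    return ppv, npv


for label, prevalence in [("general population", 0.147),
                          ("enriched sample", 0.50)]:
    ppv, npv = predictive_values(0.87, 0.75, prevalence)
    print(f"{label}: PPV={ppv:.2f}, NPV={npv:.2f}")
```

With identical sensitivity and specificity, the probability that a positive result is correct is about 0.37 at 14.7% prevalence and about 0.78 at 50% prevalence, which is why results from an enriched sample look more favourable than they would in the general population.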

Implications for practice

The findings of the current review support the use of the ID-Migraine questionnaire to diagnose migraine with a moderate level of certainty (Table 4). However, patients with headaches often experience multiple headache forms (7,13,74). This warrants a measurement instrument that can diagnose more than one headache. From the questionnaires that looked at both migraine and TTH, the HSQ has the highest level of evidence within this review (Table 5). To establish if there is a migraine and/or a TTH present, this questionnaire is therefore recommended. As CGH needs to be confirmed by physical examination (15), the CFRT is recommended (Table 6). No other measurement instruments for secondary headache related to musculoskeletal complaints were found. Therefore, for these headache types, such as secondary headache attributed to temporomandibular disorders or headache attributed to whiplash injury, no recommendations can be made.

Implications for future research

Currently, there are many questionnaires for migraine and TTH, most of them validated by one study. Future research should use the recommended measurement instruments and validate them in different samples of the same population to increase the level of certainty that the diagnostic accuracy is realistic. Researchers should use the QUADAS-2 and COSMIN tools when designing these studies to enhance their methodological quality. Furthermore, additional clinimetric properties of measurement instruments for headache should be examined. Clinimetric properties such as reliability and responsiveness are important to enhance the care of headache complaints and monitor the course of these complaints. For that reason, the authors are conducting a complementary review to establish the clinimetric properties of measurement instruments for these symptoms and factors (Figure 2).

In conclusion, only a few measurement instruments reached a moderate level of evidence for diagnostic accuracy. For migraine, the ID-Migraine is recommended. For migraine and TTH, the HSQ is recommended, and the CFRT is advised for CGH. However, more studies are needed to validate these instruments further and to enhance the level of evidence.

Article highlights

ID-Migraine is the most studied measurement instrument for migraine in terms of diagnostic accuracy and has a moderate level of certainty.

Six measurement instruments were examined that establish the diagnostic accuracy for both migraine and tension-type headache.

The Headache Screening Questionnaire has the highest level of evidence to screen for both migraine and tension-type headache.

Only the Cervical Flexion Rotation Test was studied for its diagnostic accuracy for cervicogenic headache, but the level of evidence is very low.

Supplemental material (Supplemental Material1 to Supplemental Material4) for The diagnostic accuracy of headache measurement instruments: A systematic review and meta-analysis focusing on headaches associated with musculoskeletal symptoms by Hedwig A van der Meer, Corine M Visscher, Tom Vredeveld, Maria WG Nijhuis van der Sanden, Raoul HH Engelbert and Caroline M Speksnijder in Cephalalgia
