The diagnostic accuracy of headache measurement instruments: A systematic review and meta-analysis focusing on headaches associated with musculoskeletal symptoms.

Hedwig A van der Meer^1,2,3,4,5,6, Corine M Visscher^2, Tom Vredeveld^3, Maria WG Nijhuis-van der Sanden^4, Raoul HH Engelbert^3,5, Caroline M Speksnijder^6.

Abstract

AIM: To systematically review the available literature on the diagnostic accuracy of questionnaires and measurement instruments for headaches associated with musculoskeletal symptoms.
DESIGN: Articles were eligible for inclusion when the diagnostic accuracy (sensitivity/specificity) was established for measurement instruments for headaches associated with musculoskeletal symptoms in an adult population. The databases searched were PubMed (1966-2018), Cochrane (1898-2018) and Cinahl (1988-2018). Methodological quality was assessed with the Quality Assessment of Diagnostic Accuracy Studies tool (QUADAS-2) and COnsensus-based Standards for the selection of health Measurement INstruments (COSMIN) checklist for criterion validity. When possible, a meta-analysis was performed. The Grading of Recommendations Assessment, Development and Evaluation (GRADE) recommendations were applied to establish the level of evidence per measurement instrument.
RESULTS: From 3450 articles identified, 31 articles were included in this review. Eleven measurement instruments for migraine were identified, of which the ID-Migraine is recommended with a moderate level of evidence and a pooled sensitivity of 0.87 (95% CI: 0.85-0.89) and specificity of 0.75 (95% CI: 0.72-0.78). Six measurement instruments examined both migraine and tension-type headache and only the Headache Screening Questionnaire - Dutch version has a moderate level of evidence with a sensitivity of 0.69 (95% CI 0.55-0.80) and specificity of 0.90 (95% CI 0.77-0.96) for migraine, and a sensitivity of 0.36 (95% CI 0.21-0.54) and specificity of 0.86 (95% CI 0.74-0.92) for tension-type headache. For cervicogenic headache, only the cervical flexion rotation test was identified and had a very low level of evidence with a pooled sensitivity of 0.83 (95% CI 0.72-0.94) and specificity of 0.82 (95% CI 0.73-0.91).
DISCUSSION: The current review is the first to establish an overview of the diagnostic accuracy of measurement instruments for headaches associated with musculoskeletal factors. However, as most measurement instruments were validated in one study, pooling was not always possible. Risk of bias was a serious problem for most studies, decreasing the level of evidence. More research is needed to enhance the level of evidence for existing measurement instruments for multiple headaches.

Keywords: Diagnostics; headache; migraine; tension-type headache

Year: 2019 PMID: 30997838 PMCID: PMC6710620 DOI: 10.1177/0333102419840777

Source DB: PubMed Journal: Cephalalgia ISSN: 0333-1024 Impact factor: 6.292

Introduction

Primary headaches like tension-type headache (TTH) and migraine are associated with various musculoskeletal factors. TTH is, for example, associated with pericranial tenderness, myofascial trigger points and lower muscle coordination of the upper neck flexors (1–4). Furthermore, migraine may be triggered by myofascial trigger points or bruxism (1,5–7). These primary headaches are not caused by musculoskeletal dysfunction but are associated with different musculoskeletal symptoms (8). There are several secondary headaches that are actually caused by musculoskeletal problems, such as cervicogenic headache (CGH), headache after whiplash trauma and secondary headache attributed to temporomandibular dysfunction (TMD) (8). The physiotherapist (PT) is a specialist in the musculoskeletal field, and often treats patients with headaches associated with musculoskeletal symptoms. The type of headache must be diagnosed within the physiotherapeutic diagnostic process to choose the proper treatment options and collaborate with medical specialists when needed (9). The International Headache Society (IHS) published the International Classification of Headache Disorders – 3rd edition (ICHD-3), which contains clear diagnostic criteria for all types of headache (8). Several headache measurement instruments have been developed for PTs and other health care professionals to classify different headache types (10–14). The ability of a test to discriminate between people with the target condition and people who are healthy or otherwise do not have it is called the diagnostic accuracy of the test (15). The diagnostic accuracy is often quantified through measures of sensitivity and specificity (15). Insight into the diagnostic accuracy of these instruments for headaches associated with musculoskeletal symptoms is needed to determine the type of headache. Currently there is, to our knowledge, no overview of diagnostic accuracy of the different headache measurement instruments related to the level of evidence.
Therefore, the aim of this study was to systematically review the available literature on the diagnostic accuracy of questionnaires and measurement instruments for headaches associated with musculoskeletal symptoms.

Methods

Protocol and registration

This review was performed according to the PRISMA statement (17) and registered in PROSPERO (registration number: CRD42017062472). Because of the large number of articles retrieved by the original search strategy, two review questions were formulated. The focus of the current review is the diagnostic accuracy of measurement instruments for headaches associated with musculoskeletal symptoms. A second review (in preparation) will focus on the clinimetric properties of the instruments that measure other outcomes, based on the International Classification of Functioning, Disability and Health (16); for example, measurement instruments for pain, range of motion, limitations in activity, and quality of life.

Eligibility criteria

Only full-text original articles were included concerning the diagnostic accuracy, expressed in sensitivity and specificity, of diagnostic headache tests usable for PTs. Further inclusion criteria were: a) adult patients (≥18 years) and b) patients who experienced headaches associated with musculoskeletal symptoms. These include migraine, TTH, CGH, headache after whiplash and headache attributed to TMD (8,19,20). There was no minimum sample size for inclusion. No restrictions were put on the year of publication. Intervention studies, prediction models and measurement instruments not usable for PTs (e.g. imaging, nerve blocks) (21) were excluded. Only articles in English were included.

Information sources

The electronic databases PubMed (1966–2018), Cochrane (1898–2018) and Cinahl (1988–2018) were searched for literature. The last search was performed on 25 October 2018. If full texts could not be obtained, the corresponding author was contacted through email to request the full text.

Search

The search strategies included search terms for the construct (e.g. pain, diagnosis), the target population (e.g. migraine, TTH), the instrument (e.g. questionnaire, test) and the methodological PubMed search filter for measurement instruments (21). The search filters for the Cochrane and Cinahl databases were derivatives from the PubMed search filter. The full search strategies for each database can be found in Supplemental material 1. References of retrieved articles were screened for additional relevant studies.

Study selection

Two reviewers (HvdM, CMV) independently assessed titles, abstracts and reference lists of the studies, using the online program Covidence (22). In case of disagreement between the two reviewers, a third reviewer (CMS) made the decision regarding inclusion of the article. After initial screening of the titles and abstracts, HvdM and CMV read the full texts of included articles and screened these for eligibility. All reviewers are orofacial physiotherapists and researchers in this field.

Data collection process

Two reviewers (HvdM, CMS) independently extracted data from the included articles and recorded them in a pre-made, empty template of Table 1. The data extracted were: first author, year of publication, target population, information about the index test (aim, language and name), reference test, study population, and diagnostic accuracy (sensitivity/specificity).

Table 1.

Study characteristics of included articles stratified for the target populations of patients with migraine, migraine and tension-type headache and cervicogenic headache.

		Index test			Population		Diagnostic accuracy
Measurement instrument	Author, year	Language of index test	Aim of index test	Reference test	N (%F)	Age: mean ± SD	Sensitivity	Specificity
Target Population: Migraine
3-Question Screen	[†]Cady, 2004 (10)	English	Triage	ICHD (2)	3014 (85.2)	40.0 ± –	0.78*	0.27*
	Pryse-Phillips, 2002 (59)	English	Triage	Neurologist	476 (81.9)	40.4 ± –	0.86	0.73
	[†]Wahab, 2016 (41)	English	Triage	ICHD-3 (15)	1513 (50.1)	23.3 ± 2.5	0.66	0.98
Diagnostic screen	Michel, 1993 (II) (37)	English	Triage	Neurologist	160 (83.3)	39.9 ± 0.7	0.44	0.93
ID-Migraine	Brighina, 2006 (44)	Italian	Triage	ICHD-II (4)	222 (73.4)	37.8 ± 11.0	0.95	0.72
	de Mattos, 2017 (45)	Portuguese	Triage	ICHD-II (4)	232 (82.0)	48.9 ± 11.2	0.92	0.60
	Ertas, 2008 (46)	Turkish	Triage	ICHD-II (4)	2625 (58.5)	43.3–47.3 ± 16–18	0.80–0.88	0.74–0.76
	[†]Gil-Gouveia, 2009 (47)	Portuguese	Triage	ICHD-II (4)	142 (82.8)	39.2 ± 13.9	0.94	0.60
	[†]Karli, 2007 (49)	Turkish	Triage	ICHD-II (4)	3682 (62.9)	45.2 ± 17.0	0.92	0.63
	Kim, 2006 (50)	Korean	Triage	ICHD-II (4)	176 (81.2)	30.7 ± 9.3	0.58	0.98
	Lipton, 2003 (12)	English	Triage	ICHD (2)	451 (75.6)	39.3 ± 10.1	0.81	0.75
	[†]Lipton, 2016 (34)	English	Triage	ICHD-3 (15)	111 (82.9)	46.2 ± 13.4	0.81	0.89
	[†]Siva, 2008 (40)	Turkish	Triage	ICHD-II (4)	227 (65.6)	31.9 ± 5.9	0.71	0.79
MSMDQ	Rueda-Sánchez, 2004 (38)	Spanish	Unclear	Neurologist	170 (–)	–	0.38	0.99
Migraine Assessment Tool	Marcus, 2004 (35)	English	Triage	Neurologist	80 (88.8)	33.7 ± 9.9	0.89	0.79
Migraine Screen Questionnaire	Láinez, 2005 (11)	English	Triage	ICHD (2)	140 (73.0)	39.2 ± 13.0	0.93	0.81
	Láinez, 2010 (51)	English	Triage	ICHD-II (4)	9670 (61.9)	48.9 ± 17.2	0.82	0.97
Migraine-specific questionnaire	Kallela, 2001 (48)	Finnish	Triage	ICHD (2)	94 (71.3)	44.6 ± 18.0	0.99	0.96
Migraine-4	Walters, 2015 (42)	English	Triage	ICHD-3 (15)	1829 (71.5)	19.1 ± 2.1	0.94	0.92
Modified Algorithm for IHS Migraine	Michel, 1993 (I) (36)	English	Replacement	Neurologist	267 (70.3)	–	0.95–0.98	0.53–0.78
Screening items	Wang, 2008 (43)	–	Triage	ICHD-II (4)	755 (71.0)	37 ± 15	0.89*	0.67*
Structured Migraine Interview Questionnaire	Shaik, 2015 (39)	Malay	Unclear	ICHD-II (4)	157 (100)	26.8 ± 8.3	0.97	0.63
Target population: Migraine and tension-type headache
Computerized Headache Assessment Test	Maizels, 2007 (54)	English	Replacement	Headache nurse	117 (–)	–	M: 0.83–1.00 TTH: 1.00	– –
German language questionnaire	[†]Fritsche, 2007 (13)	German	Replacement	ICHD-II (4)	278 (51.1)	43.9 ± –	M: 0.73 TTH: 0.85	M: 0.96 TTH: 0.98
German language questionnaire	[†]Yoon, 2008 (56)	German	Replacement	ICHD-II (4)	193 (68.4)	45.4 ± 12.4	M: 0.85 TTH: 0.60	M: 0.85 TTH: 0.88
Headache Screening Questionnaire – Dutch version	van der Meer, 2017 (14)	Dutch	Triage	ICHD-3 (15)	105 (78.1)	40.3 ± 14.5	M: 0.69 PM: 0.89 TTH: 0.36 PTTH: 0.92	M: 0.90 PM: 0.54 TTH: 0.86 PTTH: 0.48
Headache questions	Hagen, 2010 (53)	Norwegian	Unclear	ICHD-II (4)	297 (49.0)	52.3 ± –	M: 0.49–0.67 TTH: 0.96 CTTH: 0.64	M: 0.91–0.95 TTH: 0.69 CTTH: 1.00
Self-administered headache questionnaire	Rasmussen, 1991 (55)	Danish	Replacement	Neurologist	713 (–)	–	M: 0.51 TTH: 0.43	M: 0.92 TTH: 0.96
Structured Headache Questionnaire	el-Sherbiny, 2017 (52)	Arabic	Unclear	ICHD-3 (15)	232 (72.8)	41.2 ± 10.9	M: 0.86 CM: 0.71 TTH: 0.93 CTTH: 0.70	M: 0.94 CM: 0.98 TTH: 0.93 CTTH: 0.96
Target population: Cervicogenic headache
Cervical Flexion-Rotation Test (CFRT)	[†]Hall, 2010 (57)	n/a	Unclear	Sjaastad criteria (32)	60 (63.3)	30–35 ± 6.5–10.9	0.70	0.70
	[†]Ogince, 2007 (58)	n/a	Unclear	Sjaastad criteria (32)	58 (65.5)	37–46 ± –	0.91	0.91

*: Not given in article, therefore calculated based on the published 2 × 2 table.

†: Articles included in meta-analysis, as shown in Table 3.

MSMDQ: Michel's Standardized Migraine Diagnosis Questionnaire; –: missing data; F: female; SD: standard deviation; M: migraine; CM: chronic migraine; PM: probable migraine; TTH: tension-type headache; CTTH: chronic tension-type headache; PTTH: probable tension-type headache; n/a: not applicable.

Table 3.

Pooled sensitivity and specificity of the 3-Question screen, ID-Migraine, German language questionnaire and Cervical Flexion-Rotation Test.

Measurement instrument	Target population	Number of studies; author, year	Pooled sensitivity (95% CI)	Pooled specificity (95% CI)
3-Question screen	Migraine	2; Cady, 2004 (10) Wahab, 2016 (41)	0.73 (0.71–0.75)	0.93 (0.92–0.94)
ID-Migraine	Migraine	4; Lipton, 2016 (34) Siva, 2008 (40) Gil-Gouveia, 2009 (47) Karli, 2007 (49)	0.87 (0.85–0.89)	0.75 (0.72–0.78)
German language questionnaire	Migraine TTH	2; Fritsche, 2007 (13) Yoon, 2008 (56)	0.69 (0.63–0.75) 0.81 (0.75–0.87)	0.90 (0.86–0.94) 0.96 (0.94–0.98)
Cervical Flexion-Rotation Test	Cervicogenic headache	2; Hall, 2010 (57) Ogince, 2007 (58)	0.83 (0.72–0.94)	0.82 (0.73–0.91)

N: number; CI: confidence interval; TTH: tension-type headache.

Risk of bias in individual studies

The methodological quality of the included studies was assessed using the Quality Assessment of Diagnostic Accuracy Studies tool (QUADAS-2) (23,24). This tool assesses the risk of bias within four domains: Patient selection, index test, reference standard, and flow and timing (24). Concerns regarding applicability were also determined for the first three domains (24). Methodological quality of studies regarding the criterion validity was assessed using the COSMIN checklist (25). Criterion validity is defined as the degree to which the scores of an instrument are an adequate reflection of a gold standard (26). Within diagnostic accuracy, criterion validity is an essential measurement property. For criterion validity, box H of the COSMIN was used (25). Data extraction and assessment of methodological quality were performed by two reviewers independently (HvdM, CMS). HvdM was trained to use the QUADAS-2 tool and CMS was trained by the COSMIN team on quality appraisal and data extraction. The protocol for methodological assessment using the QUADAS-2 tool for this review was made available for the review authors (Supplemental material 2). The protocol for the COSMIN checklist is published elsewhere (25).

Summary measures

Sensitivity and specificity were used as measures of diagnostic accuracy.
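
As an illustration of these two measures, the following minimal sketch computes sensitivity and specificity from the four cells of a 2 × 2 table. The function names and the counts are hypothetical and not taken from any included study. The Wald interval is one common way to attach an approximate 95% confidence interval to a proportion; it is shown here as an assumption, since the review does not state which interval method was used.

```python
import math


def sensitivity_specificity(tp, fn, tn, fp):
    """Return (sensitivity, specificity) from the four cells of a 2x2 table."""
    sensitivity = tp / (tp + fn)  # share of people with the headache who test positive
    specificity = tn / (tn + fp)  # share of people without it who test negative
    return sensitivity, specificity


def wald_interval(proportion, n, z=1.96):
    """Approximate 95% CI for a proportion (Wald method), clipped to [0, 1]."""
    half_width = z * math.sqrt(proportion * (1 - proportion) / n)
    return max(0.0, proportion - half_width), min(1.0, proportion + half_width)


# Hypothetical counts for illustration only
sens, spec = sensitivity_specificity(tp=55, fn=70, tn=38, fp=3)
low, high = wald_interval(sens, n=55 + 70)
print(f"sensitivity {sens:.2f} (95% CI {low:.2f}-{high:.2f}), specificity {spec:.2f}")
```

The Wald interval behaves poorly for proportions close to 0 or 1 and for small samples, which is why the sketch clips the bounds to the valid range.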

Synthesis of results

A best evidence synthesis was performed using the GRADE recommendations for diagnostic accuracy studies with the GRADEpro online software (27). These recommendations provide a step-by-step assessment to determine the certainty of evidence of a diagnostic test, which results in a comprehensive and transparent approach for developing the recommendations for these tests. To determine the impact of the test, both the sensitivity and specificity of the test must be known as well as the prevalence of the target condition (27). Based on the prevalence of the target condition, the pre-test probability of the presence of the headache was determined for a population of 1000 people (27). The test sensitivity and specificity were used to determine how many people would be accurately diagnosed (true positive) or excluded from having the headache (true negative). A pooled sensitivity and specificity were used for each measurement instrument when there were multiple studies for one measurement tool. The pooled measurements were calculated using the ‘rmeta’ package for the R statistical software (28). A bivariate model resulting in a summary estimate for sensitivity and specificity together was used, as recommended by the Cochrane Collaboration (29,30). This model takes potential threshold effects and the correlation between sensitivity and specificity into account (29,30). The pooled sensitivity and specificity were used for the GRADE recommendations. When there was only one study for a measurement instrument, the published sensitivity and specificity of that measurement instrument were used. Finally, a summary receiver operating characteristics (S-ROC) curve was created using the ‘mada’ package for the R statistical software (29,31,32). Factors determining the quality of evidence according to the GRADE approach are: a) limitations in study design or execution (risk of bias); b) inconsistency of results; c) indirectness of evidence; d) imprecision; and e) publication bias (27).
For limitations, the risk of bias assessment from the QUADAS-2 was used to determine if downgrading of the evidence was needed. When ≥50% of the assessed domains scored a “high” or “unclear” risk of bias, this was considered “serious” and the level of evidence was downgraded by one. When ≥75% of the assessed domains scored a “high” or “unclear” risk of bias, this was considered “very serious” and the level of evidence was downgraded by two. Inconsistency refers to unexplained heterogeneity of the results between multiple studies, after which the level of evidence may be downgraded. The indirectness of evidence was determined by the applicability assessment of the QUADAS-2 tool with the same rules as the risk of bias assessment. In the case where there was only one article studying a measurement tool, the evidence was downgraded for imprecision. All steps of the synthesis of results are depicted in Figure 1.
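
The downgrading rule for risk of bias can be sketched as a small function. This is a minimal illustration, assuming the QUADAS-2 domain ratings are supplied as a flat list of strings; the function name is hypothetical, and how ratings from several studies of one instrument were combined is not specified in the text, so the examples use single studies from Table 2.

```python
from typing import Sequence


def risk_of_bias_downgrade(ratings: Sequence[str]) -> int:
    """Return the number of GRADE levels to subtract (0, 1 or 2).

    `ratings` holds the QUADAS-2 risk-of-bias judgements for the assessed
    domains, each being "Low", "High" or "Unclear".
    """
    if not ratings:
        raise ValueError("at least one domain rating is required")
    flagged = sum(r.strip().lower() in {"high", "unclear"} for r in ratings)
    share = flagged / len(ratings)
    if share >= 0.75:
        return 2  # "very serious": downgrade by two levels
    if share >= 0.50:
        return 1  # "serious": downgrade by one level
    return 0      # "not serious": no downgrade


# Ratings as listed in Table 2 for three single-study instruments
print(risk_of_bias_downgrade(["Unclear", "Unclear", "Unclear", "High"]))  # Diagnostic Screen (37)
print(risk_of_bias_downgrade(["Low", "Low", "Unclear", "High"]))          # Migraine-specific questionnaire (48)
print(risk_of_bias_downgrade(["Low", "Low", "Unclear", "Low"]))           # Migraine Assessment Tool (35)
```

The same function could be applied to the three applicability domains to judge indirectness, since the text states that the same rules were used there.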

Figure 1.

Flow of steps after article inclusion.
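
The per-1000 calculation described in the synthesis of results can be sketched as follows. This is a minimal illustration with a hypothetical function name; it assumes the counts are rounded to whole people and that false negatives and false positives are taken as the remainder of each group. The inputs are the pooled ID-Migraine estimates and the 14.7% pre-test probability for migraine reported in Table 4.

```python
def natural_frequencies(sensitivity, specificity, prevalence, population=1000):
    """Expected TP, FN, TN and FP counts when `population` people are tested."""
    with_condition = round(prevalence * population)
    without_condition = population - with_condition
    # Python's round() resolves exact .5 ties to the nearest even integer;
    # none of the values used below is an exact tie.
    true_pos = round(sensitivity * with_condition)
    true_neg = round(specificity * without_condition)
    return {
        "TP": true_pos,
        "FN": with_condition - true_pos,
        "TN": true_neg,
        "FP": without_condition - true_neg,
    }


# Pooled ID-Migraine estimates: sensitivity 0.87, specificity 0.75
print(natural_frequencies(0.87, 0.75, 0.147))
# {'TP': 128, 'FN': 19, 'TN': 640, 'FP': 213}
```

Of 1000 people tested, 147 are expected to have migraine; the ID-Migraine would correctly identify 128 of them and wrongly flag 213 of the 853 people without migraine.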

Risk of bias across studies

Methods to detect publication bias are not very reliable in diagnostic accuracy studies (30). As diagnostic accuracy studies have sensitivity and specificity values as outcome measures rather than a stated null hypothesis with a p-value, it is unlikely for publication bias to be associated with statistical nonsignificance (33). Therefore, no publication bias assessment was applied in this review.

Results

The search in all three databases resulted in 4129 articles, which were imported into Covidence (22). After removal of duplicates and assessment of eligibility on title/abstract, 150 articles remained for full-text assessment. Of these, 52 articles were excluded based on the inclusion and exclusion criteria (Supplemental material 3) and 67 articles assessed clinimetric outcome measures other than diagnostic accuracy. These 67 articles will be included in the second review regarding clinimetric outcome measures based on the ICF. This left 31 articles to be included in the current review. The complete flowchart of the study selection can be found in Figure 2. No authors were contacted to obtain the full texts of any study.

Figure 2.

Study flow diagram.

Study characteristics

The included headaches associated with musculoskeletal symptoms in this review are migraine, TTH and CGH. No studies were found on the diagnostic accuracy of instruments for secondary headache attributed to TMD or headache attributed to whiplash injury. Table 1 shows the study characteristics of the 31 included studies, stratified by target population of the measurement instrument. Of the 31 studies, 22 articles had migraine as the target population (10–12,34–51). Seven articles had both migraine and TTH as target population (13,14,52–56), and two articles examined patients with CGH (57,58). In total, 28,246 people were included in the 31 studies. Of the included population, 64% were female, though three articles did not describe the gender distribution (38,54,55). Mean age varied from 19 (42) to 52 years (53). For migraine, 11 different measurement instruments were studied (10–12,34–37,40–43,44–51,59). ID-Migraine was the most studied measurement instrument, with nine studies in five languages (12,34,40, 44–47,49,50). Eight of these instruments were screening instruments, one was a replacement test for the diagnostic process, and for two instruments the aim of the test was unclear. Out of the seven studies for both migraine and TTH, only two articles looked at the same questionnaire (13,56). Of the six instruments, one was a screening test, three were replacement tests, and the aim of two was unclear. Both studies on CGH researched the cervical flexion-rotation test (CFRT) (57,58). The aim of the CFRT compared to the ICHD-3 criteria for cervicogenic headache is unclear.

Risk of bias within studies

The risk of bias was assessed for patient selection, index test, reference standard and flow and timing. The summarized assessment of the QUADAS-2 can be found in Table 2. The complete assessment, including reasons for the given scores, can be found in Supplemental material 4. Only one study received a low risk of bias on all domains (44). Twenty-two articles received a “high” risk of bias on ≥1 domain (10–14,35,37, 39–41,43,45–50,55–59). The remaining articles received an “unclear” risk of bias on ≥1 domain (12,35,37, 41,50–53). Risk of bias for the index test and the reference standard was generally scored unclear, because it was uncertain whether the index test was conducted and interpreted without knowledge of the results of the reference standard.

Table 2.

Methodological quality assessment with QUADAS-2 and clinimetric evaluation of the criterion validity with the COSMIN checklist Box H.

		Risk of Bias				Applicability concerns
Measurement instrument	Study	1a. Patient selection	2a. Index test	3a. Reference standard	4. Flow and timing	1b. Patient selection	2b. Index test	3b. Reference standard	COSMIN Box H
Target population: Migraine
3-Question Screen	Cady, 2004 (10)	High	Unclear	Unclear	High	Low	Low	Low	Poor
	Pryse-Phillips, 2002 (59)	High	Unclear	High	High	Low	Low	Low	Poor
	Wahab, 2016 (41)	Unclear	Unclear	Unclear	Low	Low	Low	Low	Fair
Diagnostic Screen	Michel, 1993 (37)	Unclear	Unclear	Unclear	High	Low	Low	Low	Fair
ID-Migraine	Brighina, 2006 (44)	Low	Low	Low	Low	Low	Low	Low	Fair
	de Mattos, 2017 (45)	High	Low	Low	Unclear	Low	Low	Low	Fair
	Ertas, 2008 (46)	High	Low	Unclear	Low	Low	Low	Low	Fair
	Gil-Gouveia, 2009 (47)	High	Low	Low	Low	Low	Low	Low	Fair
	Karli, 2007 (49)	High	Unclear	Unclear	High	Low	Low	Low	Poor
	Kim, 2006 (50)	Unclear	Low	Unclear	Low	Low	Low	Low	Fair
	Lipton, 2003 (12)	Unclear	Unclear	Low	Unclear	Low	Low	Low	Fair
	Lipton, 2016 (34)	High	Unclear	Unclear	Unclear	Low	Low	Low	Fair
	Siva, 2008 (40)	High	Low	Low	Unclear	Low	Low	Low	Fair
MSMDQ	Rueda-Sánchez, 2004 (38)	Low	Unclear	Unclear	High	Low	Low	Low	Fair
MAT	Marcus, 2004 (35)	Low	Low	Unclear	Low	Low	Low	Low	Good
Migraine Screen Questionnaire	Láinez, 2010 (51)	Low	Low	Unclear	Low	Low	Low	Low	Fair
Migraine Screen Questionnaire	Láinez, 2005 (11)	High	High	Low	Unclear	Low	Low	Low	Fair
MSQ	Kallela, 2001 (48)	Low	Low	Unclear	High	Low	Low	Low	Fair
Migraine-4	Walters, 2015 (42)	Low	Unclear	Unclear	High	Low	Low	Low	Fair
MA-IHS-M	Michel, 1993 (36)	Unclear	Low	Low	Unclear	Low	Low	Low	Fair
Screening items	Wang, 2008 (43)	Unclear	Unclear	Unclear	High	Low	Low	Low	Fair
SMIQ	Shaik, 2015 (39)	High	Unclear	Unclear	Low	Low	Low	Low	Fair
Target population: Migraine and tension-type headache
CHAT	Maizels, 2007 (54)	High	Low	Unclear	Unclear	Low	Low	Low	Poor
German Language Questionnaire	Fritsche, 2007 (13)	High	Low	Low	High	Low	Low	Low	Poor
German Language Questionnaire	Yoon, 2008 (56)	High	Low	Unclear	High	Low	Low	Low	Poor
HSQ-DV	van der Meer, 2017 (14)	Low	Low	Low	High	Low	Low	Low	Excellent
Headache questions	Hagen, 2010 (53)	Low	Unclear	Unclear	Unclear	Low	Low	Low	Fair
SAHQ	Rasmussen, 1991 (55)	Low	Low	High	Unclear	Low	Low	Low	Poor
SHQ	El-Sherbiny, 2017 (52)	Unclear	Low	Unclear	Unclear	Low	Low	Low	Fair
Target population: cervicogenic headache
Cervical Flexion-Rotation Test	Hall, 2010 (57)	High	Low	Unclear	Unclear	Low	Low	Low	Fair
	Ogince, 2007 (58)	High	Unclear	Unclear	High	Low	Low	Low	Poor

MSMDQ: Michel's Standardized Migraine Diagnosis Questionnaire; MAT: Migraine Assessment Tool; MSQ: Migraine-specific questionnaire; MA-IHS-M: Modified Algorithm for IHS Migraine; SMIQ: Structured Migraine Interview Questionnaire; CHAT: Computerized Headache Assessment Test; HSQ-DV: Headache Screening Questionnaire – Dutch Version; SAHQ: Self-Administered Headache Questionnaire; SHQ: Structured Headache Questionnaire. An extended version of this table including explanation of judgement can be found in Appendix 4.

The clinimetric evaluation of the criterion validity was established with the COSMIN Box H. One study scored excellent (14), one good (35), 21 fair (11,12,34,36–48,50–53,57) and the remaining eight scored poor (10,13,50,55–57,59). Of the studies scoring poor, all but two (54,55) also scored a high risk of bias on ≥2 domains (10,12,13,50,55,57,59).

Migraine measurement instruments

Results of individual studies

The sensitivity of the measurement instruments for migraine ranged from 0.38 (38) to 0.99 (48) (see Table 1). Only three studies had a sensitivity below 0.70 (38,41,50) and eight studies found a sensitivity of 0.90 or higher (11,39,42,44, 45,47–49). Half of these studies with a high sensitivity examined the ID-Migraine (44,45,47,49). Specificity ranged from 0.27 (10) to 0.99 (38). Six studies found a specificity of 0.70 or lower (10,39,43,45, 47,49), and a specificity above 0.90 was found in six other studies (38,41,42,48,50,51). Eleven studies had both sensitivity and specificity above 0.70 (11,12,34, 35,40,42,44,46,48,51,59), of which two studies had both above 0.90 (42,48).

Synthesis of results

For two measurement instruments, the sensitivity and specificity could be pooled. For the 3-question Screen the pooled sensitivity was 0.73 and specificity was 0.93 (Table 3) based on two (10,41) out of three studies, due to missing data in one article (59). The pooled sensitivity for the ID-Migraine was 0.87 and specificity was 0.75 (Table 3, Figures 3(a) and 3(b)). The results were based on four studies (34,40,47,49) as the other five studies (12,44–46,50) did not have sufficient data available to perform the analyses.

Figure 3.

(a) Summary Receiver Operating Characteristics (S-ROC) curves for pooled sensitivity and specificity of the 3-question screen; (b) S-ROC curves for pooled sensitivity and specificity of the ID-migraine; (c) S-ROC curves for pooled sensitivity and specificity of the German questionnaire for migraine; (d) S-ROC curves for pooled sensitivity and specificity of the German questionnaire for tension-type headache; (e) S-ROC curves for pooled sensitivity and specificity of the cervical flexion rotation test.

According to the GRADE recommendations, there was a very low level of evidence for seven measurement instruments for migraine (see Table 4): Diagnostic Screen (37), Michel's Standardized Migraine Diagnosis Questionnaire (38), Migraine Specific Questionnaire (48), Migraine-4 (42), Modified Algorithm for IHS Migraine (36), Screening Items (43), and the Structured Migraine Interview Questionnaire (39). For two measurement instruments, there was a low level of evidence: the 3-question Screen (10,41) and the Migraine Screen Questionnaire (11,51). There was a moderate level of evidence for the ID-Migraine (34,40,47,49) and also for the Migraine Assessment Tool (35).

Table 4.

GRADE recommendations for measurement instruments for target population Migraine, stratified per measurement instrument.

					Factors that may decrease certainty of evidence					Effect per 1000 patients tested*
Measurement instrument	Sensitivity (TP, FN rows) or specificity (TN, FP rows), 95% CI	Outcome	Number of studies (number of patients)	Study design	Risk of bias	Indirectness	Inconsistency	Imprecision	Publication bias	Pre-test probability of 14.7%*	Test accuracy CoE
3-Question Screen (10,41,59)	0.73 (0.71–0.75)[‡]	TP	Two studies 2539 patients	Cross-sectional (cohort type accuracy study)	Serious^±	Not serious	Serious	Not serious	None	107 (104 to 110)	⨁⨁◯◯ Low
	0.73 (0.71–0.75)[‡]	FN	Two studies 2539 patients		Serious^±	Not serious	Serious	Not serious	None	40 (37 to 43)	⨁⨁◯◯ Low
	0.93 (0.92–0.94)[‡]	TN	Two studies 1988 patients		Serious^±	Not serious	Serious	Not serious	None	793 (785 to 802)	⨁⨁◯◯ Low
	0.93 (0.92–0.94)[‡]	FP	Two studies 1988 patients		Serious^±	Not serious	Serious	Not serious	None	60 (51 to 68)	⨁⨁◯◯ Low
Diagnostic Screen (37)	0.44 (0.35–0.53)	TP	One study 125 patients	Cross-sectional (cohort type accuracy study)	Very serious^¥	Not serious	Not serious	Serious^×	None	65 (51 to 78)	⨁◯◯◯ Very low
	0.44 (0.35–0.53)	FN	One study 125 patients		Very serious^¥	Not serious	Not serious	Serious^×	None	82 (69 to 96)	⨁◯◯◯ Very low
	0.93 (0.85–1.00)	TN	One study 41 patients		Very serious^¥	Not serious	Not serious	Serious^×	None	793 (725 to 853)	⨁◯◯◯ Very low
	0.93 (0.85–1.00)	FP	One study 41 patients		Very serious^¥	Not serious	Not serious	Serious^×	None	60 (0 to 128)	⨁◯◯◯ Very low
ID-Migraine (34,40,47,49)	0.87 (0.85–0.89)[‡]	TP	Four studies 1257 patients	Cross-sectional (cohort type accuracy study)	Serious^±	Not serious	Not serious	Not serious	None	128 (125 to 131)	⨁⨁⨁◯ Moderate
	0.87 (0.85–0.89)[‡]	FN	Four studies 1257 patients		Serious^±	Not serious	Not serious	Not serious	None	19 (16 to 22)	⨁⨁⨁◯ Moderate
	0.75 (0.72–0.78)[‡]	TN	Four studies 1109 patients		Serious^±	Not serious	Not serious	Not serious	None	640 (614 to 665)	⨁⨁⨁◯ Moderate
	0.75 (0.72–0.78)[‡]	FP	Four studies 1109 patients		Serious^±	Not serious	Not serious	Not serious	None	213 (188 to 239)	⨁⨁⨁◯ Moderate
Michel's Standardized Migraine Diagnosis Questionnaire (38)	0.38 (0.26–0.52)	TP	One study ? patients	Cross-sectional (cohort type accuracy study)	Very serious^¥	Serious	Not serious	Serious^×	None	56 (38 to 76)	⨁◯◯◯ Very low
	0.38 (0.26–0.52)	FN	One study ? patients		Very serious^¥	Serious	Not serious	Serious^×	None	91 (71 to 109)	⨁◯◯◯ Very low
	0.99 (0.95–1.00)	TN	One study ? patients		Very serious^¥	Serious	Not serious	Serious^×	None	844 (810 to 853)	⨁◯◯◯ Very low
	0.99 (0.95–1.00)	FP	One study ? patients		Very serious^¥	Serious	Not serious	Serious^×	None	9 (0 to 43)	⨁◯◯◯ Very low
Migraine Assessment Tool (35)	0.89 (0.80–0.98)[‡]	TP	One study 46 patients	Cross-sectional (cohort type accuracy study)	Not serious	Not serious	Not serious	Serious^×	None	131 (118 to 144)	⨁⨁⨁◯ Moderate
	0.89 (0.80–0.98)[‡]	FN	One study 46 patients		Not serious	Not serious	Not serious	Serious^×	None	16 (3 to 29)	⨁⨁⨁◯ Moderate
	0.79 (0.65–0.93)[‡]	TN	One study 34 patients		Not serious	Not serious	Not serious	Serious^×	None	674 (554 to 793)	⨁⨁⨁◯ Moderate
	0.79 (0.65–0.93)[‡]	FP	One study 34 patients		Not serious	Not serious	Not serious	Serious^×	None	179 (60 to 299)	⨁⨁⨁◯ Moderate
Migraine Screen Questionnaire (11,51)	0.82–0.93	TP	Two studies ? patients	Cross-sectional (cohort type accuracy study)	Serious^±	Serious ^b	Not serious	Not serious	None	121 to 137	⨁⨁◯◯ Low
	0.82–0.93	FN	Two studies ? patients		Serious^±	Serious ^b	Not serious	Not serious	None	10 to 26	⨁⨁◯◯ Low
	0.81–0.97	TN	Two studies ? patients		Serious^±	Serious ^b	Not serious	Not serious	None	691 to 827	⨁⨁◯◯ Low
	0.81–0.97	FP	Two studies ? patients		Serious^±	Serious ^b	Not serious	Not serious	None	26 to 162	⨁⨁◯◯ Low
Migraine Specific Questionnaire (48)	0.99 (0.97–1.00)[‡]	TP	One study 69 patients	Cross-sectional (cohort type accuracy study)	Serious^±	Serious	Not serious	Serious^×	None	146 (143 to 147)	⨁◯◯◯ Very low
	0.99 (0.97–1.00)[‡]	FN	One study 69 patients		Serious^±	Serious	Not serious	Serious^×	None	1 (0 to 4)	⨁◯◯◯ Very low
	0.96 (0.88–1.00)[‡]	TN	One study 25 patients		Serious^±	Serious	Not serious	Serious^×	None	819 (751 to 853)	⨁◯◯◯ Very low
	0.96 (0.88–1.00)[‡]	FP	One study 25 patients		Serious^±	Serious	Not serious	Serious^×	None	34 (0 to 102)	⨁◯◯◯ Very low
Migraine-4 (42)	0.94 (0.87–0.98)	TP	One study ? patients	Cross-sectional (cohort type accuracy study)	Very serious^¥	Not serious	Not serious	Serious^×	None	138 (128 to 144)	⨁◯◯◯ Very low
	0.94 (0.87–0.98)	FN	One study ? patients		Very serious^¥	Not serious	Not serious	Serious^×	None	9 (3 to 19)	⨁◯◯◯ Very low
	0.92 (0.90–0.94)	TN	One study ? patients		Very serious^¥	Not serious	Not serious	Serious^×	None	785 (768 to 802)	⨁◯◯◯ Very low
	0.92 (0.90–0.94)	FP	One study ? patients		Very serious^¥	Not serious	Not serious	Serious^×	None	68 (51 to 85)	⨁◯◯◯ Very low
Modified Algorithm for IHS Migraine (36)	0.95–0.98	TP	One study 126 patients	Cross-sectional (cohort type accuracy study)	Serious^±	Serious	Serious	Serious^×	None	140 to 144	⨁◯◯◯ Very low
	0.95–0.98	FN	One study 126 patients		Serious^±	Serious	Serious	Serious^×	None	3 to 7	⨁◯◯◯ Very low
	0.53–0.78	TN	One study 141 patients		Serious^±	Serious	Serious	Serious^×	None	452 to 665	⨁◯◯◯ Very low
	0.53–0.78	FP	One study 141 patients		Serious^±	Serious	Serious	Serious^×	None	188 to 401	⨁◯◯◯ Very low
Screening Items (43)	0.89 (0.86–0.92)[‡]	TP	One study 363 patients	Cross-sectional (cohort type accuracy study)	Very serious^¥	Not serious	Not serious	Serious^×	None	131 (126 to 135)	⨁◯◯◯ Very low
	0.89 (0.86–0.92)[‡]	FN	One study 363 patients		Very serious^¥	Not serious	Not serious	Serious^×	None	16 (12 to 21)	⨁◯◯◯ Very low
	0.67 (0.63–0.72)[‡]	TN	One study 392 patients		Very serious^¥	Not serious	Not serious	Serious^×	None	572 (537 to 614)	⨁◯◯◯ Very low
	0.67 (0.63–0.72)[‡]	FP	One study 392 patients		Very serious^¥	Not serious	Not serious	Serious^×	None	281 (239 to 316)	⨁◯◯◯ Very low
Structured Migraine Interview Questionnaire (39)	0.97 (0.94–1.00)[‡]	TP	One study 100 patients	Cross-sectional (cohort type accuracy study)	Very serious^¥	Not serious	Not serious	Serious^×	None	143 (138 to 147)	⨁◯◯◯ Very low
	0.97 (0.94–1.00)[‡]	FN	One study 100 patients		Very serious^¥	Not serious	Not serious	Serious^×	None	4 (0 to 9)	⨁◯◯◯ Very low
	0.63 (0.50–0.76)[‡]	TN	One study 57 patients		Very serious^¥	Not serious	Not serious	Serious^×	None	542 (427 to 648)	⨁◯◯◯ Very low
	0.63 (0.50–0.76)[‡]	FP	One study 57 patients		Very serious^¥	Not serious	Not serious	Serious^×	None	316 (205 to 426)	⨁◯◯◯ Very low

Prevalence in the general population of 14.7% is used (65). CoE: certainty of evidence.

±“Unclear” or “high” risk of bias on ≥50% and <75% of the domains on QUADAS-2.

¥“Unclear” or “high” risk of bias on ≥75% of the domains on QUADAS-2.

×Results based on the outcome of one single study.

‡95% confidence interval (CI) calculated by reviewers.

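The "effect per 1,000 patients tested" columns in the GRADE tables follow from the sensitivity, the specificity and the assumed pre-test probability. The sketch below illustrates that arithmetic in Python. The function name `effect_per_1000` is hypothetical, and rounding to whole patients is an assumption about how the tabulated values were produced, not a description of the reviewers' software.

```python
def effect_per_1000(sensitivity, specificity, prevalence, n=1000):
    """Expected TP, FN, TN and FP counts when n patients are tested.

    Assumption: counts are rounded to whole patients, as in the GRADE tables.
    """
    with_condition = prevalence * n          # patients who have the headache type
    without_condition = n - with_condition   # patients who do not

    tp = sensitivity * with_condition        # correctly identified
    fn = with_condition - tp                 # missed
    tn = specificity * without_condition     # correctly ruled out
    fp = without_condition - tn              # incorrectly labelled

    counts = {"TP": tp, "FN": fn, "TN": tn, "FP": fp}
    return {name: round(value) for name, value in counts.items()}


# Pooled ID-Migraine estimates at the 14.7% general-population prevalence.
print(effect_per_1000(0.87, 0.75, 0.147))
```

With the pooled ID-Migraine estimates this gives 128 true positives, 19 false negatives, 640 true negatives and 213 false positives per 1,000 patients, which agrees with the point estimates in the ID-Migraine rows. Passing the lower and upper confidence limits instead of the point estimates gives the bracketed ranges.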

Combined migraine and TTH measurement instruments

The aim of the index tests differed between the seven included articles: four were ‘replacement’ tests (13,54–56), one was a ‘triage’ test (14), and for two the aim was unclear (52,53). Three articles established the diagnostic accuracy for several migraine and TTH ICHD diagnoses in addition to the “standard” diagnoses, including chronic migraine, chronic TTH, probable migraine, and probable TTH (14,52,53). For migraine, the sensitivity ranged from 0.49 (53) to 1.00 (54) and the specificity ranged from 0.85 (56) to 0.96 (13). For chronic migraine, the sensitivity and specificity were 0.71 and 0.98, respectively (52). Probable migraine had a sensitivity of 0.89 and a specificity of 0.54 (14). The sensitivity for TTH ranged from 0.36 (14) to 1.00 (54) and the specificity ranged from 0.69 (53) to 0.98 (13). One study did not establish the specificity of its test (54). Chronic TTH was tested in two studies, for which the sensitivity was 0.64 (53) to 0.70 (52) and the specificity 0.96 (52) to 1.00 (53). The test for probable TTH had a sensitivity of 0.92 and a specificity of 0.48 (14). Five studies had a sensitivity above 0.70 for migraine, chronic migraine, and probable migraine (13,14,52,54,56), which was also found for TTH, chronic TTH, and probable TTH in five studies (see Table 1) (13,14,52–54). All six studies that reported specificity had a specificity of 0.70 or higher for migraine, chronic migraine, and probable migraine and for TTH, chronic TTH, and probable TTH (13,14,52,53,55,56). One instrument, the German Language Questionnaire, was supported by two studies (13,56). The pooled sensitivity and specificity for migraine were 0.69 and 0.90, respectively (Table 3, Figure 3(c)). For TTH, the pooled sensitivity and specificity were 0.81 and 0.96, respectively (Table 3, Figure 3(d)). The five other measurement instruments (14,52–55) were supported by one study each and were therefore downgraded for imprecision (see also Table 5).

Table 5.

GRADE recommendations for measurement instruments for target populations Migraine and Tension-Type Headache, stratified per measurement instrument.

		Sensitivity (95% CI)				Factors that may decrease certainty of evidence					Effect per 1.000 patients tested*
Measurement instrument	Target population	Specificity (95% CI)	Outcome	№ of studies (№ of patients)	Study design	Risk of bias	Indirectness	Inconsistency	Imprecision	Publication bias	Pre-test probability of 14.7%* /62.6%**	Test accuracy CoE
Computerized Headache Assessment Test (CHAT) (54)	Migraine	0.98[‡] (0.93–1.00)	TP	One study 41 patients	Cross-sectional (cohort type accuracy study)	Very serious^¥	Serious	Not serious	Serious^×	None	144 (137 to 147)	⨁◯◯◯ Very low
		0.98[‡] (0.93–1.00)	FN	One study 41 patients		Very serious^¥	Serious	Not serious	Serious^×	None	3 (0 to 10)	⨁◯◯◯ Very low
		1.00[‡] (1.00–1.00)	TN	One study 76 patients		Very serious^¥	Very serious	Not serious	Serious^×	None	853 (853 to 853)	⨁◯◯◯ Very low
		1.00[‡] (1.00–1.00)	FP	One study 76 patients		Very serious^¥	Very serious	Not serious	Serious^×	None	0 (0 to 0)	⨁◯◯◯ Very low
	TTH	1.00[‡] (1.00–1.00)	TP	One study 14 patients		Very serious^¥	Serious	Not serious	Serious^×	None	626 (626 to 626)	⨁◯◯◯ Very low
		1.00[‡] (1.00–1.00)	FN	One study 14 patients		Very serious^¥	Serious	Not serious	Serious^×	None	0 (0 to 0)	⨁◯◯◯ Very low
		1.00[‡] (1.00–1.00)	TN	One study 14 patients		Very serious^¥	Very serious	Not serious	Serious^×	None	374 (374 to 374)	⨁◯◯◯ Very low
		1.00[‡] (1.00–1.00)	FP	One study 14 patients		Very serious^¥	Very serious	Not serious	Serious^×	None	0 (0 to 0)	⨁◯◯◯ Very low
German Language Questionnaire (13,56)	Migraine	0.69[‡] (0.63–0.75)	TP	Two studies 217 patients	Cross-sectional (cohort type accuracy study)	Serious^±	Serious	Not serious	Not serious	None	101 (81 to 118)	⨁⨁◯◯ Low
		0.69[‡] (0.63–0.75)	FN	Two studies 217 patients		Serious^±	Serious	Not serious	Not serious	None	46 (29 to 66)	⨁⨁◯◯ Low
		0.90[‡] (0.86–0.94)	TN	Two studies 254 patients		Serious^±	Serious	Not serious	Not serious	None	768 (657 to 819)	⨁⨁◯◯ Low
		0.90[‡] (0.86–0.94)	FP	Two studies 254 patients		Serious^±	Serious	Not serious	Not serious	None	85 (34 to 196)	⨁⨁◯◯ Low
	TTH	0.81[‡] (0.75–0.87)	TP	Two studies 177 patients		Serious^±	Serious	Not serious	Not serious	None	507 (470 to 545)	⨁⨁◯◯ Low
		0.81[‡] (0.75–0.87)	FN	Two studies 177 patients		Serious^±	Serious	Not serious	Not serious	None	119 (81 to 156)	⨁⨁◯◯ Low
		0.96[‡] (0.94–0.98)	TN	Two studies 294 patients		Serious^±	Serious	Not serious	Not serious	None	359 (352 to 367)	⨁⨁◯◯ Low
		0.96[‡] (0.94–0.98)	FP	Two studies 294 patients		Serious^±	Serious	Not serious	Not serious	None	15 (7 to 22)	⨁⨁◯◯ Low
Headache Screening Questionnaire – Dutch Version (14)	Migraine	0.69 (0.55–0.80)	TP	One study 55 patients	Cross-sectional (cohort type accuracy study)	Not serious	Not serious	Not serious	Serious^×	None	101 (81 to 118)	⨁⨁⨁◯ Moderate
		0.69 (0.55–0.80)	FN	One study 55 patients		Not serious	Not serious	Not serious	Serious^×	None	46 (29 to 66)	⨁⨁⨁◯ Moderate
		0.90 (0.77–0.96)	TN	One study 50 patients		Not serious	Not serious	Not serious	Serious^×	None	768 (657 to 819)	⨁⨁⨁◯ Moderate
		0.90 (0.77–0.96)	FP	One study 50 patients		Not serious	Not serious	Not serious	Serious^×	None	85 (34 to 196)	⨁⨁⨁◯ Moderate
	TTH	0.36 (0.21–0.54)	TP	One study 36 patients		Not serious	Not serious	Not serious	Serious^×	None	225 (131 to 338)	⨁⨁⨁◯ Moderate
		0.36 (0.21–0.54)	FN	One study 36 patients		Not serious	Not serious	Not serious	Serious^×	None	401 (288 to 495)	⨁⨁⨁◯ Moderate
		0.86 (0.74–0.92)	TN	One study 69 patients		Not serious	Not serious	Not serious	Serious^×	None	322 (277 to 344)	⨁⨁⨁◯ Moderate
		0.86 (0.74–0.92)	FP	One study 69 patients		Not serious	Not serious	Not serious	Serious^×	None	52 (30 to 97)	⨁⨁⨁◯ Moderate
Headache Questions (53)	Migraine	0.49 (–)[†]	TP	One study ? patients	Cross-sectional (cohort type accuracy study)	Very serious^¥	Not serious	Serious	Serious^×	None	72 (- to -)	⨁◯◯◯ Very low
		0.49 (–)[†]	FN	One study ? patients		Very serious^¥	Not serious	Serious	Serious^×	None	75 (- to -)	⨁◯◯◯ Very low
		0.91 (–)[†]	TN	One study ? patients		Very serious^¥	Not serious	Serious	Serious^×	None	776 (- to -)	⨁◯◯◯ Very low
		0.91 (–)[†]	FP	One study ? patients		Very serious^¥	Not serious	Serious	Serious^×	None	77 (- to -)	⨁◯◯◯ Very low
	TTH	0.96 (0.94–0.98)	TP	One study ? patients		Very serious^¥	Not serious	Not serious	Serious^×	None	601 (588 to 613)	⨁◯◯◯ Very low
		0.96 (0.94–0.98)	FN	One study ? patients		Very serious^¥	Not serious	Not serious	Serious^×	None	25 (13 to 38)	⨁◯◯◯ Very low
		0.69 (0.63–0.75)	TN	One study ? patients		Very serious^¥	Not serious	Not serious	Serious^×	None	258 (236 to 281)	⨁◯◯◯ Very low
		0.69 (0.63–0.75)	FP	One study ? patients		Very serious^¥	Not serious	Not serious	Serious^×	None	116 (93 to 138)	⨁◯◯◯ Very low
Self-administered Headache Questionnaire (55)	Migraine	0.51[‡] (0.41–0.61)	TP	One study 93 patients	Cross-sectional (cohort type accuracy study)	Serious^±	Not serious	Not serious	Serious^×	None	75 (60 to 90)	⨁⨁◯◯ Low
		0.51[‡] (0.41–0.61)	FN	One study 93 patients		Serious^±	Not serious	Not serious	Serious^×	None	72 (57 to 87)	⨁⨁◯◯ Low
		0.92[‡] (0.90–0.94)	TN	One study 619 patients		Serious^±	Not serious	Not serious	Serious^×	None	785 (768 to 802)	⨁⨁◯◯ Low
		0.92[‡] (0.90–0.94)	FP	One study 619 patients		Serious^±	Not serious	Not serious	Serious^×	None	68 (51 to 85)	⨁⨁◯◯ Low
	TTH	0.43[‡] (0.39–0.47)	TP	One study 468 patients		Serious^±	Not serious	Not serious	Serious^×	None	269 (244 to 294)	⨁⨁◯◯ Low
		0.43[‡] (0.39–0.47)	FN	One study 468 patients		Serious^±	Not serious	Not serious	Serious^×	None	357 (332 to 382)	⨁⨁◯◯ Low
		0.96[‡] (0.94–0.98)	TN	One study 244 patients		Serious^±	Not serious	Not serious	Serious^×	None	359 (352 to 367)	⨁⨁◯◯ Low
		0.96[‡] (0.94–0.98)	FP	One study 244 patients		Serious^±	Not serious	Not serious	Serious^×	None	15 (7 to 22)	⨁⨁◯◯ Low
Structured Headache Questionnaire (52)	Migraine	0.86 (0.78–0.97)	TP	One study ? patients	Cross-sectional (cohort type accuracy study)	Very serious^¥	Not serious	Not serious	Serious^×	None	126 (115 to 143)	⨁◯◯◯ Very low
		0.86 (0.78–0.97)	FN	One study ? patients		Very serious^¥	Not serious	Not serious	Serious^×	None	21 (4 to 32)	⨁◯◯◯ Very low
		0.94 (0.86–0.98)	TN	One study ? patients		Very serious^¥	Not serious	Not serious	Serious^×	None	802 (734 to 836)	⨁◯◯◯ Very low
		0.94 (0.86–0.98)	FP	One study ? patients		Very serious^¥	Not serious	Not serious	Serious^×	None	51 (17 to 119)	⨁◯◯◯ Very low
	TTH	0.93 (0.79–0.98)	TP	One study ? patients		Very serious^¥	Not serious	Not serious	Serious^×	None	582 (495 to 613)	⨁◯◯◯ Very low
		0.93 (0.79–0.98)	FN	One study ? patients		Very serious^¥	Not serious	Not serious	Serious^×	None	44 (13 to 131)	⨁◯◯◯ Very low
		0.93 (0.86–1.00)	TN	One study ? patients		Very serious^¥	Not serious	Not serious	Serious^×	None	348 (322 to 374)	⨁◯◯◯ Very low
		0.93 (0.86–1.00)	FP	One study ? patients		Very serious^¥	Not serious	Not serious	Serious^×	None	26 (0 to 52)	⨁◯◯◯ Very low

*Prevalence in the general population of 14.7% is used for migraine.

**Prevalence in the general population of 62.6% is used for TTH (65).

CoE: certainty of evidence.

±“Unclear” or “high” risk of bias on ≥50% and <75% of the domains on QUADAS-2.

¥“Unclear” or “high” risk of bias on ≥75% of the domains on QUADAS-2.

×Results based on the outcome of one single study.

‡95% confidence interval (CI) calculated by reviewers.

†Not possible to calculate 95% CI.

There was a very low level of evidence for the Computerized Headache Assessment Test (CHAT) (54), the use of Headache Questions (53) and the Structured Headache Questionnaire (52). The German Language Questionnaire (13,56) and the Self-administered Headache Questionnaire (55) are both supported with a low level of evidence. Only the Headache Screening Questionnaire (HSQ) – Dutch Version was found to have a moderate level of evidence (14).

Cervicogenic headache measurement instruments

The two included studies for CGH established the diagnostic accuracy of the Cervical Flexion-Rotation Test (CFRT) (57,58). Both sensitivity and specificity ranged from 0.70 (57) to 0.91 (58). The pooled sensitivity was 0.83 and the pooled specificity was 0.82 (Table 3, Figure 3(e)). Based on the GRADE recommendations (Table 6), there is a very low level of evidence for the use of the CFRT for patients with cervicogenic headache (57,58).

Table 6.

GRADE recommendations for measurement instruments for target population Cervicogenic Headache.

	Sensitivity (95% CI)				Factors that may decrease certainty of evidence					Effect per 1.000 patients tested*
Measurement instrument	Specificity (95% CI)	Outcome	Number of studies (number of patients)	Study design	Risk of bias	Indirectness	Inconsistency	Imprecision	Publication bias	Pre-test probability of 4.1%*	Test accuracy CoE
Cervical Flexion Rotation Test (57,58)	0.83[‡] (0.72–0.94)	TP	Two studies 43 patients	Cross-sectional (cohort type accuracy study)	Very serious^¥	Not serious	Serious	Not serious	None	34 (30 to 39)	⨁◯◯◯ Very low
	0.83[‡] (0.72–0.94)	FN	Two studies 43 patients		Very serious^¥	Not serious	Serious	Not serious	None	7 (2 to 11)	⨁◯◯◯ Very low
	0.82[‡] (0.73–0.91)	TN	Two studies 74 patients		Very serious^¥	Not serious	Serious	Not serious	None	786 (700 to 873)	⨁◯◯◯ Very low
	0.82[‡] (0.73–0.91)	FP	Two studies 74 patients		Very serious^¥	Not serious	Serious	Not serious	None	173 (86 to 259)	⨁◯◯◯ Very low

*Prevalence in the general population of 4.1% is used (76).

CoE: certainty of evidence.

¥“Unclear” or “high” risk of bias on ≥75% of the domains on QUADAS-2.

‡95% confidence interval (CI) calculated by reviewers.


Discussion

Within this review, 11 tools were identified for migraine alone (10–12,34–37,40–51,59), six for the combination of migraine and TTH (13,14,52–56), and one for CGH (57,58). The sensitivity and specificity of the measurement instruments for migraine ranged from 0.38 (38) to 0.99 (48) and from 0.27 (10) to 0.99 (37), respectively. The sensitivity and specificity for migraine based on the combined measurement instruments ranged from 0.49 (53) to 1.00 (54) and from 0.85 (56) to 0.96 (13), respectively. For TTH, the sensitivity and specificity ranged from 0.36 (14) to 1.00 (54) and from 0.69 (53) to 0.98 (13), respectively. For the CFRT, the only measurement instrument for cervicogenic headache, both the sensitivity and specificity ranged from 0.70 (57) to 0.91 (58).

All measurement tools for migraine and TTH were questionnaires. The measurement tool for CGH was a physical examination test. The diagnoses of migraine and TTH are based solely on information from the patient's history (15), which allows the diagnosis to be derived from a questionnaire. However, the choice of gold standard within headache research is inconsistent. Some studies used the first, second or third edition of the International Classification of Headache Disorders (ICHD) (15,60,61), others used the diagnosis of a neurologist or a headache nurse, and for CGH the Sjaastad criteria were used (62). As the ICHD is based on the most recent scientific findings and clinical expertise from experts worldwide, the newest version of the ICHD is recommended as the gold standard (15,63).

The aim of each measurement instrument is described in Table 1. This was unclear for five measurement instruments. Nine measurement instruments are meant to be used as a screening tool in a broader population before seeing a medical specialist for a definitive diagnosis. 
These screening instruments are recommended for health care providers such as physical therapists (PTs), as they are not trained to make medical diagnoses but do see these patients often and can refer them to a medical specialist (64). Three of the measurement instruments studied were meant as a replacement test for the gold standard. This may be efficient for research purposes, as it allows researchers to diagnose patients without an extensive visit to a specialist. However, no conclusion was drawn from the included articles as to whether the measurement instruments were better than the gold standard (the medical specialist); the involvement of a medical specialist is therefore still recommended in clinical practice.

For each measurement tool, the cut-off criteria to recognize headache should be described to allow for comparison of outcomes between studies. In reality, cut-off criteria differed between studies, which resulted in highly variable sensitivity and specificity. The lack of established cut-off points was taken into account within the ‘Index Test’ domain when assessing both methodological quality and risk of bias.

Of the 11 measurement instruments found for migraine, only three were supported by evidence from two or more articles: the 3-question screen (10,41,59), the ID-Migraine (12,34,40,44–47,49,50) and the Migraine Screen Questionnaire (11,51). Several studies introduced serious patient selection bias by only recruiting patients with the headache they were interested in studying (10). As a result, no false positives or true negatives were present, which resulted in more favourable diagnostic accuracy outcome measures. Other studies excluded participants who had a secondary headache (45), or who did not screen positive on a preliminary screening for migraine (45,46,49). One study selected its participants so that 50% had a confirmed migraine diagnosis prior to the index test and 50% did not have migraine (11). 
This also introduced selection bias in favour of the outcomes, as the prevalence of the studied disorder (50% in the tested group versus 14.7% in the general population) determines the pre-test probability and thus the chance of a correct diagnosis (65,66). Furthermore, serious bias was introduced in the “flow and timing” domain, as some articles did not properly describe the order in which the index test and the reference standard diagnosis were administered. Other studies did not include all participants in the analysis (11,12,34,37,38,40,42,43,48,49,59). The biases introduced on both domains resulted in a downgrade of the certainty of evidence for all measurement instruments except the Migraine Assessment Tool (35). However, as this tool was studied in only one article, its level of evidence was downgraded for imprecision. Therefore, there are no measurement instruments for migraine with a high level of evidence.

Of the six measurement instruments that looked at both migraine and TTH, only the German Language Questionnaire is supported by two articles (13,56). However, due to a serious risk of bias and indirectness, there is only a low level of evidence for this questionnaire. In both studies, only patients with headaches that were also studied in the questionnaire were included, which introduced a serious selection bias (13,56). Similarly, the Computerized Headache Assessment Test (CHAT) presented a sensitivity of 1.00 for both migraine and TTH, but no true negatives or false positives were available, and no specificity was presented (54). In this study, the gold standard was the diagnosis established by a headache nurse (54). As stated before, this is an unreliable gold standard for a headache diagnosis (63). The seven articles differed in population. Some study samples were retrieved from the general population (53,55,56), others from urgent care or family practice (54), and others from a headache clinic (13,14). 
In one study, the sample origin was unclear (52). The prevalence used in the GRADE recommendations was that of the general population, but in health care settings the prevalence is higher. This increases the pre-test probability of a positive headache diagnosis and must be taken into consideration when interpreting the results of those studies (14,54,56). Regarding the flow and timing of these studies, not all participants received both the index test and the reference standard (52–54,56). Other studies did not include all participants in the final analyses (13,14,53,55). Excluding participants in these ways compromises the generalizability of the results. All these components resulted in a very low to moderate level of evidence for the six combined migraine and TTH measurement instruments.

Both articles studying the diagnostic accuracy of the cervical flexion rotation test (CFRT) for CGH showed selection bias, as participants were selected based on headache type (57,58). In one study, the sensitivity and specificity were both 0.70 (57), whereas in the other study the sensitivity was 0.91 and the specificity 0.90 (58). In the study with lower diagnostic accuracy, the control group consisted of patients with other headache forms (migraine or multiple headache forms) (57). This makes differentiating between headache types more difficult, as other headaches are also related to neck problems (5,67,68). The study with higher diagnostic accuracy compared patients with CGH with asymptomatic participants and several patients with migraine (58), which made it easier to recognize CGH. When this test is applied in the clinic, patients will have a headache complaint and will not be asymptomatic, so the sensitivity and specificity of 0.70 will likely be more accurate. 
In line with the current review, another recent systematic review of physical examination tests for the screening and diagnosis of CGH determined the CFRT to be the most useful test, with the highest reliability and strongest diagnostic accuracy (69). There is, however, a debate in the literature on the reliability of manual ROM tests of the spine (70). Inter-examiner reliability for passive ROM of the cervical spine ranged from poor to substantial. The manual tests of the upper cervical spine (C1/2, C2/3) have a fair to substantial level of reliability (70). The reliability of the CFRT has been established to be good to excellent (71). However, CFRT reliability was established by comparing a manual diagnosis of C1/2 dysfunction with the outcome of the CFRT (71). If the reliability of the manual diagnosis of dysfunction is only fair, then the reliability of the CFRT is questionable. However, in another study where the cervical ROM was measured with a device (CROM), a significant difference was found between the ROM in patients with CGH compared to patients with migraine and healthy subjects, which confirms the findings of the included papers of this review (57,58,72). In conclusion, the CFRT is a valid and reliable measure to recognize CGH, though the reliability is higher when using a CROM device rather than assessing the ROM manually.
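The gap between the two CFRT studies can also be expressed as likelihood ratios, which summarise sensitivity and specificity in a single pair of numbers. The Python sketch below is illustrative only: likelihood ratios are not reported in this review, the function name `likelihood_ratios` is hypothetical, and the inputs are the sensitivity and specificity values quoted above.

```python
def likelihood_ratios(sensitivity, specificity):
    """Return (LR+, LR-) for a test.

    LR+ = sensitivity / (1 - specificity)
    LR- = (1 - sensitivity) / specificity
    Note: a specificity of exactly 1.0 would make LR+ undefined (division by zero).
    """
    lr_positive = sensitivity / (1 - specificity)
    lr_negative = (1 - sensitivity) / specificity
    return lr_positive, lr_negative


# Sensitivity and specificity of the CFRT as quoted in the text.
cfrt_estimates = {
    "study (57)": (0.70, 0.70),
    "study (58)": (0.91, 0.90),
    "pooled": (0.83, 0.82),
}

for label, (sens, spec) in cfrt_estimates.items():
    lr_pos, lr_neg = likelihood_ratios(sens, spec)
    print(f"{label}: LR+ = {lr_pos:.2f}, LR- = {lr_neg:.2f}")
```

Under these inputs, a positive CFRT in the study with headache controls (57) shifts the odds of CGH far less than in the study that also included asymptomatic participants (58), which is the same contrast the paragraph above describes in terms of sensitivity and specificity.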

Strengths and limitations of the study

The current review is, to the authors' knowledge, the first review establishing an overview of the diagnostic accuracy of measurement instruments for headaches associated with musculoskeletal symptoms. By using the QUADAS-2 and COSMIN tools, the methodological quality was assessed in a well-known and internationally accepted manner (24,25). By using the GRADE recommendations, the findings of this review are transparent and easy to translate to clinical practice (27). There are, however, also a few limitations of this study. Comparison between index and reference test was not easy, as the validation of the index test was performed in a different population compared to the population in which the reference standard was developed. It is important to keep in mind that the diagnostic accuracy is dependent on the prevalence of the target condition in the population; the study sample needs to be taken into consideration when interpreting the results. The prevalence of the target condition is the pre-test probability of a person having that condition, and a good measurement instrument will increase the chance of recognizing the target condition correctly. However, if the study sample is biased by having a very high prevalence of the target condition whereas the measurement instrument would normally be used in a setting with a low prevalence of the target condition, the diagnostic accuracy is not valid for that specific population. Validation studies of measurement instruments should therefore always test the measurement instrument in the population and setting for which it is being validated. Also, some measurement tools were used in different languages and cultures, which must also be considered when interpreting these results. In this review, great variability was found between the different studies, as illustrated in the S-ROC curves in Figure 3(a) and (c). 
These S-ROC curves show the uncertainty of the findings compared to reality, so the pooled data should be used with caution. The clear gap in the diagnostic accuracy of some measurement instruments between studies showed the necessity of confirmation by multiple studies within the same population and against the same reference standard.
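The role of prevalence as pre-test probability can be made concrete with Bayes' rule. The Python sketch below compares the post-test probability of migraine after a positive or negative result at the general-population prevalence of 14.7% and at the 50% prevalence of an enriched study sample. It is a worked illustration, not part of the review's analysis: the function name `predictive_values` is hypothetical, and the pooled ID-Migraine sensitivity and specificity are used only as example inputs.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Return (PPV, NPV) from sensitivity, specificity and pre-test probability."""
    true_pos = sensitivity * prevalence
    false_neg = (1 - sensitivity) * prevalence
    true_neg = specificity * (1 - prevalence)
    false_pos = (1 - specificity) * (1 - prevalence)

    ppv = true_pos / (true_pos + false_pos)  # P(condition | positive test)
    npv = true_neg / (true_neg + false_neg)  # P(no condition | negative test)
    return ppv, npv


# Example inputs: pooled ID-Migraine sensitivity 0.87 and specificity 0.75.
for prevalence in (0.147, 0.50):
    ppv, npv = predictive_values(0.87, 0.75, prevalence)
    print(f"prevalence = {prevalence:.3f}: PPV = {ppv:.2f}, NPV = {npv:.2f}")
```

With identical sensitivity and specificity, the probability that a positive result is correct is roughly 0.37 at 14.7% prevalence and roughly 0.78 at 50% prevalence, which is why results from a high-prevalence sample cannot be carried over directly to a low-prevalence setting.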

Implications for practice

The findings of the current review support the use of the ID-Migraine questionnaire to diagnose migraine with a moderate level of certainty (Table 4). However, patients with headaches often experience multiple headache forms (7,13,74). This warrants a measurement instrument that can diagnose more than one headache. From the questionnaires that looked at both migraine and TTH, the HSQ has the highest level of evidence within this review (Table 5). To establish if there is a migraine and/or a TTH present, this questionnaire is therefore recommended. As CGH needs to be confirmed by physical examination (15), the CFRT is recommended (Table 6). No other measurement instruments for secondary headache related to musculoskeletal complaints were found. Therefore, for these headache types, such as secondary headache attributed to temporomandibular disorders or headache attributed to whiplash injury, no recommendations can be made.

Implications for future research

Currently, there are many questionnaires for migraine and TTH, most of them validated by one study. Future research should use the recommended measurement instruments and validate them in different samples of the same population to increase the level of certainty that the diagnostic accuracy is realistic. The QUADAS-2 and COSMIN tools should be used when designing their studies to enhance their methodological quality. Furthermore, additional clinimetric properties of measurement instruments for headache should be examined. Clinimetric properties such as reliability and responsiveness are important to enhance the care of headache complaints and monitor the course of these complaints. For that reason, the authors are conducting a complementary review to establish the clinimetric properties of measurement instruments for these symptoms and factors (Figure 2). In conclusion, only a few measurement instruments reached a moderate level of evidence for the diagnostic accuracy. For migraine, the ID-Migraine is recommended. For migraine and TTH, the HSQ is recommended, and the CFRT is advised to be used for CGH. However, more studies are needed to validate these instruments further to enhance the level of evidence.

Article highlights

ID-migraine is the most studied diagnostic accuracy measurement instrument for migraine and has a moderate level of certainty.
Six measurement instruments are examined that establish the diagnostic accuracy for both migraine and tension-type headache.
The Headache Screening Questionnaire has the highest level of evidence to screen for both migraine and tension-type headache.
Only the Cervical Flexion Rotation Test studies the diagnostic accuracy for cervicogenic headache, but the level of evidence is very low.

Supplemental material, Supplemental Material 1–4 for The diagnostic accuracy of headache measurement instruments: A systematic review and meta-analysis focusing on headaches associated with musculoskeletal symptoms by Hedwig A van der Meer, Corine M Visscher, Tom Vredeveld, Maria WG Nijhuis van der Sanden, Raoul HH Engelbert and Caroline M Speksnijder in Cephalalgia
