
Diagnostic accuracy of screening questionnaires for obstructive sleep apnoea in adults in different clinical cohorts: a systematic review and meta-analysis.

Lizelle Bernhardt¹,², Emer M Brady³, Suzanne C Freeman⁴, Helena Polmann⁵, Jéssica Conti Réus⁵, Carlos Flores-Mir⁶, Graziela De Luca Canto⁵, Noelle Robertson⁷, Iain B Squire³.

Abstract

PURPOSE: The majority of individuals with clinically significant obstructive sleep apnoea (OSA) are undiagnosed and untreated. A simple screening tool may support risk stratification, identification, and appropriate management of at-risk patients. Therefore, this systematic review and meta-analysis evaluated and compared the accuracy and clinical utility of existing screening questionnaires for identifying OSA in different clinical cohorts.
METHODS: We conducted a systematic review and meta-analysis of observational studies assessing the diagnostic value of OSA screening questionnaires. We identified prospective studies, validated against polysomnography, and published to December 2020 from online databases. To pool the results, we used random effects bivariate binomial meta-analysis.
RESULTS: We included 38 studies across three clinical cohorts in the meta-analysis. In the sleep clinic cohort, the Berlin questionnaire's pooled sensitivity for apnoea-hypopnoea index (AHI) ≥ 5, ≥ 15, and ≥ 30 was 85%, 84%, and 89%, and pooled specificity was 43%, 30%, and 33%, respectively. The STOP questionnaire's pooled sensitivity for AHI ≥ 5, ≥ 15, and ≥ 30 was 90%, 90%, and 95%, and pooled specificity was 31%, 29%, and 21%, respectively. The pooled sensitivity of the STOP-Bang questionnaire for AHI ≥ 5, ≥ 15, and ≥ 30 was 92%, 95%, and 96%, and pooled specificity was 35%, 27%, and 28%, respectively. In the surgical cohort (AHI ≥ 15), the pooled sensitivities of the Berlin and STOP-Bang questionnaires were 76% and 90%, and the pooled specificities were 47% and 27%, respectively.
CONCLUSION: Among the identified questionnaires, the STOP-Bang questionnaire had the highest sensitivity to detect OSA but lacked specificity. Subgroup analysis considering other at-risk populations was not possible. Our observations are limited by the low certainty level in available data.

Keywords: Diagnostic accuracy; Meta-analysis; Obstructive sleep apnoea; Screening questionnaires; Systematic review

Year: 2021 PMID: 34406554 PMCID: PMC8370860 DOI: 10.1007/s11325-021-02450-9

Source DB: PubMed Journal: Sleep Breath ISSN: 1520-9512 Impact factor: 2.655

Introduction

With an estimated 425 million individuals affected worldwide, clinically important obstructive sleep apnoea (OSA) poses a global public health problem [1]. Characterised by upper airway collapse, exaggerated negative intrathoracic pressure, oxidative stress, and systemic inflammation, OSA is associated with significant cardiovascular and metabolic complications, including hypertension, stroke, heart failure, and diabetes [2–7]. Despite the high prevalence and associated sequelae, most individuals with OSA remain undiagnosed, posing a significant risk to the individual patient and to health care systems as complications develop [1, 8–10]. Barriers to the diagnosis and treatment of OSA are multifaceted and include geographical variation and inequity in the availability of sleep services and access to polysomnography (PSG), which is often limited by cost and long waiting times [11]. A simple and reliable screening tool may support risk stratification and help triage patients at risk of OSA for referral to specialist services and appropriate management [12–14]. Clinical prediction formulae have been developed but are limited by complexity and the requirement for a computer or mathematical calculations [15]. In contrast, OSA screening questionnaires are less complicated and may be a viable alternative to clinical prediction formulae in specific settings. To date, there have been four systematic reviews exploring the accuracy of OSA screening tools in adults [12, 16–18]. One of the first systematic reviews and meta-analyses to explore the accuracy of screening tools for OSA identified four screening questionnaires; however, due to heterogeneity pertaining to the questionnaire, OSA definition, and threshold, these were not meta-analysed [16]. Ramachandran [17] reported that clinical prediction models performed better than the eight questionnaires studied to predict OSA in pre-operative cohorts.
Abrishami [12] focused on a ‘sleep disorder’ cohort and a cohort ‘without a history of sleep disorders’ and concluded that questionnaires were useful for early detection of OSA, especially in the surgical population. Although the authors found it difficult to draw a definite conclusion about questionnaire accuracy, they recommended the STOP and STOP-Bang questionnaires for screening in a surgical population [12]. Recently, Chui [18] compared the diagnostic accuracy of the Berlin, STOP-Bang, STOP, and Epworth Sleepiness Scale. In line with Abrishami [12], they reported the STOP-Bang to have the highest sensitivity in both the sleep clinic and surgical populations. Since the publication of these systematic reviews, new OSA screening questionnaires have emerged, further validation studies have been conducted, and different clinical settings and patient cohorts have been considered. As test performance often varies across clinical cohorts, it is recommended that tools are evaluated in clinically relevant cohorts [19]. Hence, the objective of this systematic review and meta-analysis was to evaluate the accuracy and clinical utility of existing questionnaires, when used alone, as screening tools for the identification of OSA in adults in different clinical cohorts.

Methods

The protocol was registered with the International Prospective Register of Systematic Reviews (PROSPERO) (CRD42018104018), and the review was conducted according to the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines [20].

Types of studies

We included observational studies that met the following eligibility criteria.

Inclusion criteria: (1) prospective studies measuring the diagnostic value of screening questionnaires for OSA; (2) studies in adults (> 18 years of age); (3) studies in which the accuracy of the questionnaire was validated by level one or two PSG; (4) OSA was defined as apnoea-hypopnoea index (AHI) or Respiratory Disturbance Index (RDI) > 5; (5) data allowed for construction of 2 × 2 contingency tables; (6) publication in English, Spanish, or Portuguese.

Exclusion criteria: (1) studies measuring the diagnostic value of clinical scales, scores, and prediction equations as screening tools for OSA; (2) conference proceedings, reviews, or case reports; (3) insufficient data for analysis after several attempts to contact the author; (4) studies in children (< 18 years of age); (5) level three and four portable studies were used as the reference standard; (6) studies conducted in in-patient settings; (7) publication language is other than English, Spanish, or Portuguese.

Index test: the test under evaluation was only OSA screening questionnaires (self-reported or clinician completed).

Reference standard: the reference standard was a level one or two PSG.

Target condition: the target condition was OSA, defined by AHI or RDI at the following diagnostic cut-offs:
AHI/RDI ≥ 5: OSA
AHI/RDI ≥ 15: moderate to severe OSA
AHI/RDI ≥ 30: severe OSA

Search methods for identification of studies

Comprehensive literature searches in CINAHL PLUS, Scopus, PubMed, Web of Science, and the Latin American and Caribbean Health Sciences Literature (LILACS) database were conducted from inception to 18 December 2020. Detailed individual search strategies (Online Resources 1 and 2), with appropriate truncation and word combinations, were developed for each database. Additional records were identified from grey literature sources comprising EThOS, OpenGrey, Google Scholar, ProQuest, and the New York Grey Literature Report. The reference lists of the articles included in the final analysis and of related review articles were manually searched for references that might have been missed by the electronic database searches.

Data collection and analysis

Study selection

Two reviewers (LB, EB) screened the titles and abstracts of the electronic search results independently to identify studies eligible for inclusion in the review. Records classified as ‘excluded’ by both reviewers were excluded. The full text of any study about which there was disagreement or uncertainty was assessed independently against the selection criteria and resolved through discussion and consultation with a third reviewer (IS or NR). Duplicates were identified and excluded before recording the selection process in sufficient detail to complete the PRISMA flow diagram and tables describing the characteristics of the excluded studies (Online Resource 3) [20].

Data extraction and management

Two reviewers (LB, EB) independently conducted data extraction on all included studies and extracted the data required to reconstruct the 2 × 2 contingency tables, including true positive (TP), false positive (FP), true negative (TN), and false negative (FN) values. Where these values were not reported, we derived them using equations when the data allowed. A data collection form tailored to the research question and fulfilling the data entry requirements of MetaDTA (Diagnostic Test Accuracy Meta-Analysis v1.43) was utilised [21]. HP and JR extracted the study characteristics and demographic data for all included studies, and LB and EB entered the data into Review Manager 5.3 [22]. No studies with inconclusive results were identified.
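The review does not state which equations were used to recover missing cell counts. As a minimal sketch, assuming a study reports its sample size, OSA prevalence, sensitivity, and specificity, the 2 × 2 table can be back-calculated as follows. The function name, the `TwoByTwo` container, and the input values are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class TwoByTwo:
    tp: int  # questionnaire positive, PSG positive
    fp: int  # questionnaire positive, PSG negative
    fn: int  # questionnaire negative, PSG positive
    tn: int  # questionnaire negative, PSG negative


def reconstruct_counts(n, prevalence, sensitivity, specificity):
    """Back-calculate 2 x 2 cell counts from reported summary measures."""
    diseased = round(n * prevalence)   # PSG-confirmed OSA
    healthy = n - diseased             # PSG-negative participants
    tp = round(sensitivity * diseased)
    fn = diseased - tp
    tn = round(specificity * healthy)
    fp = healthy - tn
    return TwoByTwo(tp=tp, fp=fp, fn=fn, tn=tn)


# Hypothetical study (illustrative values, not taken from any included paper):
# 200 participants, 60% prevalence, sensitivity 0.90, specificity 0.40.
table = reconstruct_counts(n=200, prevalence=0.60, sensitivity=0.90, specificity=0.40)
print(table)
```

Rounding to whole participants means the reconstructed table may differ by one or two counts from the original data, so any such reconstruction is best checked against other figures reported in the primary study.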

Assessment of methodological quality

The quality of the included studies was appraised independently by two reviewers (LB, EB) using the Quality Assessment of Diagnostic Accuracy Studies tool (QUADAS-2), with disagreements resolved through consultation with a third reviewer (IS or NR) [23].

Statistical analysis and data synthesis

Statistical analysis was performed according to Chapter 10 of the Cochrane Handbook for Systematic Reviews of Diagnostic Test Accuracy [24]. Questionnaire screening was considered positive for OSA if the questionnaire score was above the defined threshold specified in the primary study and negative if the questionnaire score was below the defined threshold. The TP, FP, TN, and FN counts were produced by cross-classifying the questionnaire results against the PSG results, reflecting the ability of each questionnaire to classify OSA correctly. The sensitivity and specificity of individual studies were calculated using 2 × 2 contingency tables and presented as forest plots. The meta-analysis was conducted using MetaDTA version 1.43, which models sensitivity and specificity by fitting the random effects bivariate binomial model of Chu and Cole [25, 26]. The summary receiver operating characteristic (SROC) plot was drawn using the hierarchical SROC parameters, which are estimated from the bivariate model parameters using the equivalence equations of Harbord [27]. Following guidance from the Cochrane Handbook for Systematic Reviews of Diagnostic Test Accuracy, we did not pool the positive and negative predictive values because the prevalence of OSA varied across studies [24]. As per the Cochrane DTA handbook, we investigated heterogeneity by plotting the observed study results and SROC curve in the ROC space alongside the 95% confidence region [24]. We conducted a meta-regression to investigate differences in sensitivity and specificity between questionnaires, including the type of questionnaire as a covariate. Meta-regression was conducted in R version 4.0.1 using the lme4 package [28]. To assess the robustness of the meta-analysis, sensitivity analyses were conducted by excluding studies based on their QUADAS-2 assessment score [23]. Those identified as high risk in any QUADAS-2 domain or as unclear in four domains were excluded.
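For orientation, the random effects bivariate binomial model is commonly written in the following form. The notation is an assumption made here for illustration and is not taken from the review: $i$ indexes studies, $Se_i$ and $Sp_i$ are the study-specific sensitivity and specificity, and $\mu$, $\sigma$, and $\rho$ are the pooled logit means, between-study standard deviations, and their correlation.

```latex
\begin{aligned}
TP_i \mid Se_i &\sim \mathrm{Binomial}\left(TP_i + FN_i,\; Se_i\right),\\
TN_i \mid Sp_i &\sim \mathrm{Binomial}\left(TN_i + FP_i,\; Sp_i\right),\\
\begin{pmatrix}\operatorname{logit} Se_i\\ \operatorname{logit} Sp_i\end{pmatrix}
&\sim \mathcal{N}\!\left(
\begin{pmatrix}\mu_{Se}\\ \mu_{Sp}\end{pmatrix},
\begin{pmatrix}
\sigma_{Se}^{2} & \rho\,\sigma_{Se}\sigma_{Sp}\\
\rho\,\sigma_{Se}\sigma_{Sp} & \sigma_{Sp}^{2}
\end{pmatrix}
\right)
\end{aligned}
```

Under this formulation the pooled sensitivity and specificity are obtained by back-transforming the means, $\operatorname{logit}^{-1}(\mu_{Se})$ and $\operatorname{logit}^{-1}(\mu_{Sp})$, and the correlation $\rho$ captures the trade-off between sensitivity and specificity across studies.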
Different AASM (American Academy of Sleep Medicine) scoring criteria and desaturation (and arousal) thresholds were applied to the included studies. We conducted additional sensitivity analyses by analysing studies that applied the ≥ 3% desaturation scoring criteria together and those that applied the ≥ 4% desaturation scoring criteria (summarised in Table 1).

Table 1

Study characteristics

Study	Country	Questionnaire	N	Validation tool	OSA definition	Apnoea definition	Hypopnoea definition	% desaturation	Scoring criteria
Sleep clinic population
Abdullah 2018 [29]	Malaysia	STOP-Bang	134	Level 1—Lab PSG	AHI ≥ 5	Complete cessation of airflow for ≥ 10 s	A reduction of airflow of ≥ 50% for ≥ 10 s associated with 3% desaturation	3%	AASM 2012
Alhouqani 2015 [30]	UAE	STOP-Bang	193	Level 1—Lab PSG	AHI ≥ 5	A cessation of airflow for > 10 s	≥ 30% decrease in airflow with 3% oxygen desaturation and/or arousal	3%	AASM 2012
Amra 2013 [31]	Iran	Berlin	157	Level 1—Lab PSG	AHI > 5	Complete cessation of airflow for ≥ 10 s	A reduction of airflow of ≥ 50% for ≥ 10 s associated with ≥ 3% desaturation	≥ 3%	AASM 1999
Amra 2018 [32]	China	Berlin, STOP-Bang	400	Level 1—Lab PSG	AHI ≥ 5	≥ 90% drop in airflow from baseline for ≥ 10 s	≥ 30% decrease in airflow with 3% oxygen desaturation and/or arousal	3%	AASM 2012
Oktay Arslan 2020 [33]	Turkey	Berlin	1003	Level 1—Lab PSG	AHI ≥ 5	≥ 90% drop in airflow from baseline for ≥ 10 s	≥ 30% decrease in airflow with 3% oxygen desaturation and/or arousal	3%	AASM 2012
Avincsal 2017 [34]	Turkey	STOP-Bang	162	Level 1—Lab PSG	AHI > 5	≥ 90% drop in airflow from baseline, which lasted ≥ 10 s	≥ 30% reduction in airflow that lasted ≥ 10 s and was associated with ≥ 4% desaturation	≥ 4%	AASM 2007
BaHammam 2015 [35]	Saudi Arabia	STOP-Bang	100	Level 1—Lab PSG	AHI ≥ 5	≥ 90% drop in airflow from baseline, which lasted ≥ 10 s	≥ 30% decrease in airflow with 3% oxygen desaturation and/or arousal	≥ 3%	AASM 2012
Boynton 2013 [36]	USA	STOP-Bang	219	Level 1—Lab PSG	AHI > 5	Complete absence of airflow for ≥ 10 s	≥ 50% decrease in airflow followed by an arousal, awakening, or ≥ 3% desaturation	≥ 3%	AASM 2007
Deflandre 2018 [37]	Belgium	STOP-Bang, OSA50	159	Level 1—Lab PSG	AHI > 30	≥ 90% drop in airflow from baseline, which lasted ≥ 10 s	≥ 30% reduction in airflow that lasted ≥ 10 s and associated with ≥ 4% desaturation	≥ 4%	AASM 2012
Delgado-Vargas 2020 [38]	Spain	STOP-Bang	193	Level 1—Lab PSG	AHI ≥ 5	≥ 90% drop in airflow from baseline, which lasted ≥ 10 s	≥ 30% reduction in airflow that lasted ≥ 10 s and associated with ≥ 4% desaturation	≥ 4%	AASM 2012
Duarte 2017 [39]	Brazil	STOP-Bang	456	Level 1—Lab PSG	AHI ≥ 5	≥ 90% drop in airflow from baseline, which lasted ≥ 10 s	≥ 30% decrease in airflow with 3% oxygen desaturation and/or arousal	≥ 3%	AASM 2012
Duarte 2020 [40]	Brazil	STOP-Bang	3606	Level 1—Lab PSG	AHI ≥ 5	≥ 90% drop in airflow from baseline, which lasted ≥ 10 s	≥ 30% decrease in airflow with 3% oxygen desaturation and/or arousal	≥ 3%	AASM 2012
El Sayed 2012 [41]	Egypt	Berlin, STOP, STOP-Bang	234	Level 1—Lab PSG	AHI ≥ 5	≥ 90% drop in airflow from baseline, which lasted ≥ 10 s	≥ 30% reduction in airflow that lasted ≥ 10 s and associated with ≥ 4% desaturation	≥ 4%	AASM 2007
Ha 2014 [42]	China	Berlin, STOP, STOP-Bang, ASA checklist	141	Level 1—Lab PSG	RDI > 5	≥ 90% drop in airflow from baseline, which lasted ≥ 10 s	≥ 30% reduction in airflow that lasted ≥ 10 s and associated with ≥ 4% desaturation	≥ 4%	AASM 2007
Hu 2019 [43]	China	STOP-Bang	196	Level 1—Lab PSG	AHI ≥ 5	≥ 90% drop in airflow from baseline, which lasted ≥ 10 s	≥ 30% decrease in airflow with 3% oxygen desaturation and/or arousal	3%	AASM 2012
Kashaninasab 2017 [44]	Iran	Berlin, STOP, STOP-Bang	250	Level 1—Lab PSG	AHI ≥ 5	A decrease in airflow that lasted ≥ 10 s	A reduction in airflow by ≥ 50% or a 3% desaturation	≥ 3%	AASM 2005
Khaledi-Paveh 2016 [45]	Iran	Berlin	100	Level 1—Lab PSG	AHI ≥ 5	≥ 90% drop in airflow from baseline, which lasted ≥ 10 s	≥ 30% reduction in airflow that lasted ≥ 10 s and associated with ≥ 4% desaturation	≥ 4%	AASM 2007
Kim 2015 [46]	Korea	Berlin, STOP-Bang, SA-SDQ	592	Level 1—Lab PSG	AHI > 5	≥ 90% drop in airflow from baseline, which lasted ≥ 10 s	≥ 30% reduction in airflow that lasted ≥ 10 s and was associated with ≥ 4% desaturation	≥ 4%	AASM 2007
Ong 2010 [47]	Singapore	STOP-Bang	314	Level 1—Lab PSG	AHI > 5	Cessation of airflow for > 10 s	≥ 30% decrease in airflow with 4% oxygen desaturation or with 3% oxygen desaturation and arousal	Unclear	Rechtschaffen & Kales 1968
Pataka 2016 [48]	Greece	STOP, STOP-Bang, Berlin	204	Level 1—Lab PSG	AHI ≥ 5	≥ 90% drop in airflow from baseline, which lasted ≥ 10 s	≥ 30% reduction in airflow that lasted ≥ 10 s and was associated with ≥ 4% desaturation	≥ 4%	AASM 2007
Pecotic 2012 [49]	Croatia	STOP	425	Level 1—Lab PSG	AHI > 5	Complete cessation of airflow for ≥ 10 s	A decrease in airflow > 50% from baseline for ≥ 10 s and ≥ 3% desaturation	≥ 3%	Not referenced
Pereira 2013 [50]	Canada	Berlin, STOP-Bang	128	Level 1—Lab PSG	AHI ≥ 5	A cessation of airflow ≥ 50% for ≥ 10 s	A decrease in airflow of > 50% for ≥ 10 s followed by ≥ 3% oxygen desaturation	≥ 3%	AASM 2007
Perumalsamy 2017 [51]	Chennai	STOP-Bang, Berlin	62	Level 1—Lab PSG	AHI ≥ 5	Not described	Not described	Not described	AASM Manual—not referenced
Sadeghniiat-Haghighi 2015 [52]	Iran	STOP, STOP-Bang	603	Level 1—Lab PSG	AHI > 5	Total cessation of airflow for ≥ 10 s	Reduction of airflow for > 50% for ≥ 10 s with 3% desaturation or with arousal	3%	AASM 2007
Saleh 2011 [53]	Egypt	Berlin	100	Level 1—Lab PSG	AHI > 5	≥ 90% drop in airflow from baseline, which lasted ≥ 10 s	≥ 30% reduction in airflow that lasted ≥ 10 s and was associated with ≥ 4% desaturation	≥ 4%	AASM 2007
Sangkum 2017 [54]	USA	STOP, STOP-Bang	208	Level 1—Lab PSG	AHI > 5	≥ 90% drop in airflow from baseline, which lasted ≥ 10 s	≥ 30% decrease in airflow with 3% oxygen desaturation and/or arousal	≥ 3%	AASM 2012
Suksakorn 2014 [55]	Thailand	Berlin	132	Level 1—Lab PSG	AHI > 5	Not reported or referenced	Not reported or referenced	Not reported or referenced	Not reported or referenced
Vana 2013 [56]	USA	STOP-Bang	47	Level 1—Lab PSG	AHI > 5	Complete cessation of airflow > 10 s	≥ 30% reduction in airflow that lasted ≥ 10 s and associated with 4% desaturation or arousal	4%	AASM 2007
Yüceege 2015 [57]	Turkey	Berlin	433	Level 1—Lab PSG	AHI > 5	≥ 90% drop in airflow from baseline, which lasted ≥ 10 s	A ≥ 50% drop in flow for ≥ 10 s associated with ≥ 3% oxygen desaturation	≥ 3%	AASM 2007
Surgical population
Chung 2008a [58]	Canada	Berlin, STOP, ASA checklist	177	Level 1—Lab PSG	AHI > 5	Complete cessation of breathing	A decrease of > 50% of airflow for ≥ 10 s and > 3% desaturation or an arousal	> 3%	AASM 1999
Chung 2008b [59]	Canada	STOP, STOP-Bang	177	Level 1—Lab PSG	AHI ≥ 5	Complete cessation of breathing	A decrease of > 50% of airflow for ≥ 10 s and > 3% desaturation or an arousal	≥ 3%	AASM 1999
Chung 2013 [60]	Canada	STOP-Bang	384	Level 2—portable PSG	AHI > 5	≥ 90% drop in airflow from baseline, which lasted ≥ 10 s	≥ 30% reduction in airflow that lasts ≥ 10 s and ≥ 4% desaturation	≥ 4%	AASM 2007
Chung 2014 [61]	Canada	STOP-Bang	516	Level 2—portable PSG	AHI > 5	≥ 90% drop in airflow from baseline, which lasted ≥ 10 s	≥ 50% reduction in airflow that lasted ≥ 10 s followed by ≥ 3% oxygen desaturation	≥ 3%	AASM 2007
Deflandre 2017 [62]	Belgium	STOP-Bang, OSA50	150	Level 1—Lab PSG	AHI > 5	≥ 90% drop in airflow from baseline, which lasted ≥ 10 s	≥ 30% decrease in airflow for ≥ 10 s with ≥ 3% oxygen desaturation and/or arousal	≥ 3%	AASM 2012
Nunes 2015 [63]	Brazil	STOP-Bang, Berlin	81	Level 1—Lab PSG	AHI ≥ 15	Apnoea was defined as complete cessation of airflow for ≥ 10 s	≥ 50% reduction in airflow for 10 s associated with desaturation of > 3% or an arousal	> 3%	AASM 2007
Xia 2018 [64]	China	STOP-Bang	790	Level 1—Lab PSG	AHI ≥ 5	≥ 90% drop in airflow from baseline, which lasted ≥ 10 s	≥ 30% reduction in air flow and ≥ 4% desaturation	≥ 4%	AASM 2012
Resistant hypertension
Giampá 2018 [65]	Brazil	Berlin	119	Level 1—Lab PSG	AHI > 5	≥ 90% drop in airflow from baseline, which lasted ≥ 10 s	At least 30% decrease in airflow with 3% desaturation and/or arousal	3%	AASM 2007
Margello 2014 [66]	Brazil	Berlin	422	Level 1—Lab PSG	AHI ≥ 5	≥ 90% drop in airflow from baseline, which lasted ≥ 10 s	At least 30% decrease of airflow for ≥ 10 s accompanied by desaturation ≥ 4%	≥ 4%	AASM 2007
Asthma population
Lu 2017 [67]	China	Berlin, STOP-Bang	123	Level 1—Lab PSG	AHI ≥ 5	Apnoea was defined as the absence of oronasal airflow for ≥ 10 s	≥ 50% reduction in airflow accompanied by oxygen desaturation ≥ 4%	≥ 4%	AASM 2007
Community clinic
Gantner 2010 [68]	China	Berlin	143	Level 1—Lab PSG	AHI ≥ 15	Not considered necessary to distinguish apnoeas from hypopnoeas	> 50% reduction in airflow for ≥ 10 s associated with > 3% desaturation or arousal	> 3%	AASM 1999
Highway bus drivers
Firat 2012 [69]	Turkey	Berlin, STOP, STOP-Bang, OSA50	85	Level 1—Lab PSG (daytime)	AHI > 15	≥ 90% decrease in airflow persisting for ≥ 10 s	≥ 50% decrease in the airflow with 3% desaturation or arousal, persisting for at least 10 s	≥ 3%	AASM 2007
Neurology population
ElKholy 2017 [70]	Egypt	Berlin	30	Level 1—Lab PSG	AHI ≥ 5	Not reported	Not reported	Not reported	Rechtschaffen & Kales 1968
Primary care
Bouloukaki 2013 [71]	Crete	Berlin	129	Level 1—Lab PSG	AHI ≥ 5	≥ 90% drop in airflow from baseline, which lasted ≥ 10 s	≥ 30% reduction in airflow that lasted ≥ 10 s and was associated with ≥ 4% desaturation	≥ 4%	AASM 2007
Respiratory population
Yunus 2013 [72]	Malaysia	Berlin	144	Level 1—Lab PSG	AHI ≥ 5	Not reported	Not reported	Not reported	Not reported
Snoring clinic
Banhiran 2014 [73]	Thailand	STOP, STOP-Bang	303	Level 1—Lab PSG	AHI ≥ 5	≥ 90% drop in airflow from baseline, which lasted ≥ 10 s	≥ 30% reduction in airflow that lasted ≥ 10 s and was associated with ≥ 4% desaturation	≥ 4%	AASM 2007

We neither explored reporting bias nor assessed publication bias, owing to uncertainty about the determinants of publication bias in diagnostic accuracy studies and the inadequacy of tests for detecting funnel plot asymmetry [74].

Results

Search results and study characteristics

Search results are summarised in Fig. 1.

Fig. 1

Flowchart of search results

Of 45 studies, 29 were included for meta-analysis in the sleep clinic population (n = 10,951), 7 were included for meta-analysis in the surgical population (n = 2275), and 2 were included in the resistant hypertension population (n = 541). The remaining 7 studies were excluded from the meta-analysis due to heterogeneity of included populations. Study characteristics and demographic data of the included studies are summarised in Tables 1 and 2 [29–73]. Overall, 10 clinical settings were identified, of which the sleep clinic, surgical, and resistant hypertension cohorts had sufficient studies for inclusion in the meta-analysis.

Table 2

Demographic data

Study	Age (mean ± SD)	Male (%)	BMI (kg/m²) (mean ± SD)	NC (cm) (mean ± SD)	WC (cm) (mean ± SD)	AHI (mean ± SD)
Sleep clinic population
Abdullah 2018 [29]	41.22 ± 12.66	63.4	n/a	n/a	n/a	n/a
Alhouqani 2015 [30]	42.87 ± 11.838	77.7	34.9 ± 8.602	39.5 ± 3.463	n/a	34.87 ± 31.273
Amra 2013 [31]	52.3 ± 13.6	55.4	31.5 ± 6	n/a	n/a	37.8 ± 30.8
Amra 2018 [32]	49.29 ± 9.75	58.5	32.4 ± 7.43	40.8 ± 3.13	n/a	n/a
Oktay Arslan 2020 [33]	50.65 ± 11.38	70	32.5 ± 5.9	41.6 ± 3.3	n/a	n/a
Avincsal 2017 [34]	50 ± 0.79	69.8	34.2 ± 0.42	41.3 ± 0.32	n/a	n/a
BaHammam 2015 [35]	46.6 ± 14	61.0	34.4 ± 7.8	38.0 ± 3.81	n/a	50 ± 37
Boynton 2013 [36]	46.3 ± 13.9	44.8	33.4 ± 8.76	39.9 ± 5.4	n/a	n/a
Deflandre 2018 [37]	55.8 ± 14	68.0	31.8 ± 12.07	n/a	n/a	n/a
Delgado-Vargas 2020 [38]	50.42 ± 12.05	73	29.83 ± 6.12	n/a	n/a	n/a
Duarte 2017 [39]	43.7 ± 12.5	63.0	32.1 ± 7.8	40.8 ± 4.3	n/a	24.6 ± 25.2
Duarte 2020 [40]	45.7 ± 14.6	54	32.9 ± 7.7	40.5 ± 4.8	n/a	n/a
El Sayed 2012 [41]	50.38 ± 11.29	85.5	37.8 ± 9.54	42.4 ± 4.26	n/a	45.57 ± 32.74
Ha 2014 [42]	45 ± 11	81.6	26.0 ± 4	n/a	n/a	25 ± 24
Hu 2019 [43]	18–70 (range)	84	27.23 ± 4.03	40.41 ± 3.7	n/a	n/a
Kashaninasab 2017 [44]	48.1 ± 12	76.0	n/a	40.2 ± 3.8	n/a	44.1 ± 31.2
Khaledi-Paveh 2016 [45]	47.8 ± 14.1	60.0	29.5 ± 6.1	n/a	n/a	24 ± 21.5
Kim 2015 [46]	47.6 ± 12.7	83.3	24.7 ± 3.5	39.2 ± 3.5	n/a	25.9 ± 21.8
Ong 2010 [47]	46.8 ± 15	70.5	27.9 ± 6	39.8 ± 4.1	n/a	26.2 ± 26.9
Pataka 2016 [48]	51.8 ± 13.8	77.5	32.8 ± 6.2	41.6 ± 3.9	1 ± 0.4	29.7 ± 24.7
Pecotic 2012 [49]	55	70.0	30.1 ± 4.7	n/a	n/a	31.4 ± 22.6
Pereira 2013 [50]	50 ± 12.3	65.6	31.0 ± 6.6	41.0 ± 4.4	n/a	33.1 ± 28
Perumalsamy 2017 [51]	53	59.7	n/a	n/a	n/a	n/a
Sadeghniiat-Haghighi 2015 [52]	45.8 ± 12.7	74.8	29.2 ± 5.9	39.7 ± 3.6	n/a	n/a
Saleh 2011 [53]	45	51.0	33.1	n/a	n/a	n/a
Sangkum 2017 [54]	52 ± 0.9	36.0	36.9 ± 0.7	40.6 ± 0.4	0.95 ± 0.01	n/a
Suksakorn 2014 [55]	48.15 ± 8.8	68.2	29.2 ± 6.8	n/a	n/a	28 ± 29.7
Vana 2013 [56]	46.4 ± 13.2	34.0	36.3 ± 9.2	n/a	n/a	n/a
Yüceege 2015 [57]	47.5 ± 10.5	65.8	31.1 ± 5.6	39.4 ± 3.9	102.9 ± 12.9	28.27 ± 26.5
Surgical population
Chung 2008a [58]	55 ± 13	49.7	30.0 ± 6	39.0 ± 6	n/a	20 ± 6
Chung 2008b [59]	55 ± 13	49.7	30.0 ± 6	39.0 ± 6	n/a	20 ± 6
Chung 2013 [60]	60 ± 11	46.0	31.2 ± 7	39.1 ± 4	n/a	n/a
Chung 2014 [61]	59.5 ± 12	54.0	30.6 ± 7	39.0 ± 4	n/a	n/a
Deflandre 2017 [62]	59.66 ± 12.41	70.0	32.4 ± 2.26	42.0 ± 4.64	n/a	n/a
Nunes 2015 [63]	56 ± 7	70.0	29.5 ± 5	n/a	n/a	n/a
Xia 2018 [64]	41.4 ± 10	58.5	28.5 ± 4.7	40.0 ± 4	n/a	13.1 ± 4.4
Resistant hypertension
Giampá 2018 [65]	52 ± 9	43	52 ± 9	40.0 ± 4	104 ± 14	27 ± 24
Margello 2014 [66]	62.4 ± 9.9	31.3	37.8 ± 3.7	37.8 ± 3.7	101 ± 12.1	n/a
Asthma population
Lu 2017 [67]	47.56 ± 12.12	57.7	26.4 ± 2.99	36.3 ± 2.97	n/a	15.07 ± 12.87
Community clinic
Gantner 2010 [68]	62.2 ± 7.6	40.5	26.6 ± 3.7	37.5 ± 4	n/a	n/a
Highway bus drivers
Firat 2012 [69]	48 ± 5.7	100	29.1 ± 3.8	41.1 ± 2.8	99.6 ± 9.8	21.1 ± 17.4
Neurology population
ElKholy 2017 [70]	50.67 ± 14.94	60	n/a	n/a	n/a	29.11 ± 33.16
Primary care
Bouloukaki 2013 [71]	47 ± 13	61.9	35.0 ± 25.1	n/a	n/a	41 ± 32
Respiratory population
Yunus 2013 [72]	44.7 ± 11.5	64	36.3 ± 11	39.3 ± 4.9	94.1 ± 16.9	38.8 ± 31.9
Snoring clinic
Banhiran 2014 [73]	49.6	61.4	27.5	37.05	n/a	n/a

OSA obstructive sleep apnoea, AHI apnoea-hypopnoea index, RDI respiratory disturbance index, Lab laboratory, PSG polysomnography, AASM American Academy of Sleep Medicine, SD standard deviation, kg kilogramme, m metre, cm centimetre, NC neck circumference, WC waist circumference, n/a not applicable.

Methodological quality of included studies

Results of the QUADAS-2 assessment are summarised in Fig. 2 and Online Resource 4 [29-73].

Fig. 2

Risk of bias summary using the QUADAS-2 tool

In the patient selection domain, 3 studies were rated as high risk of bias due to the case–control study design. For both the index test and reference standard domains, 18 studies were rated as unclear risk of bias due to inadequate information related to blinding; it was unclear if the index test and reference standard findings were interpreted without the knowledge of the other. Thirty-four studies were rated as unclear risk of bias in the flow and timing domain due to lack of reporting on the time interval between the index test and the reference standard. Applicability was rated as low risk in all 45 studies.

Sleep clinic population

In the sleep clinic population (N = 10,951) (Fig. 3), the Berlin (score cut-off ≥ 2) (Online Resource 5), STOP (score cut-off ≥ 2), and STOP-Bang (score cut-off ≥ 3) (Online Resource 6) questionnaires were included in the meta-analysis [58, 75]. The ASA checklist, SA-SDQ, and STOP-Bang (cut-off ≥ 5) questionnaires were excluded due to insufficient studies.

Fig. 3

Questionnaire studies in sleep clinic population

Predictive parameters of the Berlin questionnaire (score cut-off ≥ 2)

The prevalence of AHI ≥ 5 (all OSA), AHI ≥ 15 (moderate to severe), and AHI ≥ 30 (severe) OSA was 84%, 64%, and 50%, respectively. The pooled sensitivity of the Berlin questionnaire to predict all OSA, moderate–severe, and severe OSA was 85% (95% confidence interval (CI): 79%, 89%), 84% (95% CI: 79%, 89%), and 89% (95% CI: 80%, 94%), respectively. Pooled sensitivity remained consistent across OSA severity. Pooled specificity was 43% (95% CI: 30%, 58%), 30% (95% CI: 20%, 41%), and 33% (95% CI: 21%, 46%), respectively. The corresponding diagnostic odds ratios (DORs) were 4.3 (95% CI: 0.7, 7.8), 2.3 (95% CI: 1.3, 3.3), and 3.9 (95% CI: 2.1, 5.7) (Fig. 4, Table 3).

Fig. 4

Forest plots for Berlin questionnaire in sleep clinic population (generated using the software Review Manager 5.3, The Cochrane Collaboration)

Table 3

Summary statistics for Berlin, STOP, and STOP-Bang questionnaires in the sleep clinic population

Predictive parameters	Berlin AHI ≥ 5	STOP AHI ≥ 5	STOPB AHI ≥ 5	Berlin AHI ≥ 15	STOP AHI ≥ 15	STOPB AHI ≥ 15	Berlin AHI ≥ 30	STOP AHI ≥ 30	STOPB AHI ≥ 30
Studies; participants	13 studies; n = 3503	7 studies; n = 2063	21 studies; n = 9250	11 studies; n = 3374	6 studies; n = 1638	19 studies; n = 8819	8 studies; n = 1345	6 studies; n = 1637	16 studies; n = 7203
Prevalence (% and range)	83.8 (14.71–97.2)	66.94 (13.79–97.20)	79.98 (13.73–97.20)	64 (25.49–86.72)	58.42 (25.49–86.75)	58.78 (25.00–86.75)	50.11 (27.0–72.65)	45.94 (28.37–76.00)	39.25 (28.31–83.0)
Sensitivity (95% CI)	0.848 (0.79, 0.891)	0.904 (0.824, 0.95)	0.919 (0.874, 0.949)	0.843 (0.785, 0.887)	0.903 (0.754, 0.966)	0.945 (0.920, 0.963)	0.886 (0.804, 0.936)	0.945 (0.883, 0.975)	0.959 (0.930, 0.976)
Specificity (95% CI)	0.433 (0.296, 0.582)	0.306 (0.148, 0.528)	0.345 (0.248, 0.457)	0.298 (0.204, 0.413)	0.29 (0.098, 0.606)	0.271 (0.181, 0.384)	0.334 (0.211, 0.458)	0.214 (0.104, 0.391)	0.282 (0.199, 0.384)
False positive rate	0.567 (0.418, 0.704)	0.694 (0.472, 0.852)	0.655 (0.543, 0.752)	0.702 (0.587, 0.796)	0.71 (0.394, 0.902)	0.729 (0.616, 0.819)	0.666 (0.515, 0.789)	0.786 (0.609, 0.896)	0.718 (0.616, 0.801)
Log LR + ve (95% CI)	1.497 (1.066, 1.927)	1.304 (0.970, 1.637)	1.403 (1.232, 1.598)	1.201 (1.049, 1.353)	1.273 (0.904, 1.642)	1.296 (1.125, 1.466)	1.330 (0.110, 1.550)	1.203 (1.027, 1.279)	1.336 (1.184, 1.488)
Log LR − ve (95% CI)	0.350 (0.155, 0.546)	0.312 (0.118, 0.506)	0.235 (0.183, 0.301)	0.527 (0.361, 0.693)	0.333 (0.194, 0.472)	0.203 (0.123, 0.466)	0.343 (0.210, 0.475)	0.256 (0.154, 0.357)	0.146 (0.095, 0.196)
DOR (95% CI)	4.270 (0.718, 7.822)	4.174 (0.767, 7.581)	5.969 (4.410, 7.529)	2.279 (1.309, 3.249)	3.825 (1.7, 5.949)	6.383 (3.255, 9.511)	3.882 (2.06, 5.704)	4.704 (2.615, 6.794)	9.168 (5.932, 12.405)

AHI apnoea-hypopnoea index, STOPB STOP-Bang, CI confidence interval, LR likelihood ratio, DOR diagnostic odds ratio, + positive, − negative.
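The likelihood ratio and DOR rows of Table 3 can be related to the sensitivity and specificity rows using the standard definitions. The sketch below is an illustrative cross-check and not the authors' procedure; the function name is hypothetical, and the inputs are the pooled Berlin estimates for AHI ≥ 5 from Table 3.

```python
def likelihood_ratios(sensitivity, specificity):
    """Return (LR+, LR-, DOR) implied by a sensitivity/specificity pair."""
    lr_pos = sensitivity / (1.0 - specificity)   # positive likelihood ratio
    lr_neg = (1.0 - sensitivity) / specificity   # negative likelihood ratio
    dor = lr_pos / lr_neg                        # diagnostic odds ratio
    return lr_pos, lr_neg, dor


# Pooled Berlin estimates for AHI >= 5 as reported in Table 3.
lr_pos, lr_neg, dor = likelihood_ratios(sensitivity=0.848, specificity=0.433)
print(f"LR+ = {lr_pos:.3f}, LR- = {lr_neg:.3f}, DOR = {dor:.3f}")
```

The results fall close to the tabulated 1.497, 0.350, and 4.270, with the small differences presumably reflecting rounding of the inputs. The tabulated values therefore appear to be on the untransformed ratio scale, despite the "Log" in the row labels.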


Predictive parameters of the STOP questionnaire (score cut-off ≥ 2)

The prevalence of AHI ≥ 5 (all OSA), AHI ≥ 15 (moderate to severe), and AHI ≥ 30 (severe) OSA was 67%, 58%, and 46%, respectively. The pooled sensitivity of the STOP questionnaire to predict all OSA, moderate–severe, and severe OSA was 90% (95% CI: 82%, 95%), 90% (95% CI: 75%, 97%), and 95% (95% CI: 88%, 98%), respectively. The pooled specificity was 31% (95% CI: 15%, 53%), 29% (95% CI: 10%, 61%), and 21% (95% CI: 10%, 39%), respectively. The corresponding DORs were 4.2 (95% CI: 0.8, 7.6), 3.8 (95% CI: 1.7, 5.9), and 4.7 (95% CI: 2.6, 6.8), respectively (Fig. 5, Table 3). Greater uncertainty and variability in specificity were noted in the CI width and scatter of individual study estimates.

Fig. 5

Forest plots for STOP questionnaire in sleep clinic population (generated using the software Review Manager 5.3, The Cochrane Collaboration)

Predictive parameters of the STOP-Bang questionnaire (score cut-off ≥ 3)

The prevalence of AHI ≥ 5 (all OSA), AHI ≥ 15 (moderate to severe), and AHI ≥ 30 (severe) OSA was 80%, 59%, and 39%, respectively. The pooled sensitivity of the STOP-Bang questionnaire to predict all OSA, moderate–severe, and severe OSA was 92% (95% CI: 87%, 95%), 95% (95% CI: 92%, 96%), and 96% (95% CI: 93%, 98%) respectively. The pooled specificity was 35% (95% CI: 25%, 46%), 27% (95% CI: 18%, 34%), and 28% (95% CI: 20%, 38%) respectively. The corresponding DOR were 6.0 (95% CI: 4.4, 7.6), 6.4 (95% CI: 3.3, 9.5), and 9.2 (95% CI: 5.9, 12.4) respectively (Fig. 6, Table 3). Greater uncertainty and variability in specificity were noted in the CI width and scatter of individual trial estimates, particularly for AHI ≥ 5.
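Because predictive values depend on prevalence, a rough calculation can show how the pooled estimates above would translate into positive and negative predictive values. The Python sketch below is illustrative only: it applies Bayes' theorem to the rounded pooled sensitivity, specificity, and prevalence quoted in this paragraph, and the function name `predictive_values` and the dictionary layout are assumptions introduced here. It is not a pooled analysis and does not reproduce any study-level predictive values.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Return (PPV, NPV) for a binary test via Bayes' theorem."""
    true_pos = sensitivity * prevalence
    false_neg = (1 - sensitivity) * prevalence
    true_neg = specificity * (1 - prevalence)
    false_pos = (1 - specificity) * (1 - prevalence)
    ppv = true_pos / (true_pos + false_pos)
    npv = true_neg / (true_neg + false_neg)
    return ppv, npv


# Rounded pooled STOP-Bang estimates from the paragraph above
# (sleep clinic population): (sensitivity, specificity, prevalence).
pooled = {
    "AHI >= 5": (0.92, 0.35, 0.80),
    "AHI >= 15": (0.95, 0.27, 0.59),
    "AHI >= 30": (0.96, 0.28, 0.39),
}

results = {label: predictive_values(*values) for label, values in pooled.items()}

for label, (ppv, npv) in results.items():
    print(f"{label}: PPV = {ppv:.2f}, NPV = {npv:.2f}")
```

Under these assumptions the NPV rises with the AHI threshold while the PPV falls, which is the pattern expected when prevalence decreases and sensitivity stays high.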

Fig. 6

Forest plots for STOP-Bang questionnaire in sleep clinic population (generated using the software Review Manager 5.3, The Cochrane Collaboration)

SROC plots were used to display the results of individual questionnaires in the ROC space, plotting each questionnaire as a single sensitivity–specificity point [24]. When we plotted the SROC for all three questionnaires on the same axes, the confidence regions of the Berlin, STOP, and STOP-Bang questionnaires, for all OSA (AHI ≥ 5) (Fig. 7) and severe OSA (AHI ≥ 30) (Fig. 9), overlapped, suggesting that there was no statistically significant difference in sensitivity among the 3 questionnaires.

Fig. 7

Summary ROC for Berlin, STOP, and STOP-Bang questionnaires AHI ≥ 5 (generated using the software Review Manager 5.3, The Cochrane Collaboration)

Fig. 9

Summary ROC for Berlin, STOP, and STOP-Bang questionnaires AHI ≥ 30 (generated using the software Review Manager 5.3, The Cochrane Collaboration)

For moderate to severe OSA (AHI ≥ 15), Figure 8 shows no overlap of the confidence regions for the Berlin and STOP-Bang questionnaires, suggesting a possible difference in sensitivity between the two questionnaires. A meta-regression model assuming equal variances for logit sensitivity and logit specificity suggested that the expected sensitivity or specificity differed between the two tests (chi-square = 14.1, 2 df, p = 0.0008) (Fig. 9).
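The reported p value can be checked by hand, because the upper-tail probability of a chi-square statistic with 2 degrees of freedom has the closed form exp(−x/2). The short Python sketch below is only an arithmetic check of the test statistic quoted above; the helper name `chi2_sf_2df` is introduced here for illustration and is not part of the review's analysis.

```python
import math


def chi2_sf_2df(statistic):
    """Upper-tail probability of a chi-square variate with 2 degrees of freedom.

    For 2 df the survival function reduces to exp(-x / 2).
    """
    return math.exp(-statistic / 2.0)


# Test statistic reported for the Berlin versus STOP-Bang comparison.
p_value = chi2_sf_2df(14.1)
print(f"p = {p_value:.5f}")
```

The result, about 0.00087, is consistent with the reported p = 0.0008 allowing for truncation.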

Fig. 8

Summary ROC for Berlin, STOP, and STOP-Bang questionnaires AHI ≥ 15 (generated using the software Review Manager 5.3, The Cochrane Collaboration)

Surgical population

In the surgical population (n = 2710) (Fig. 10), we identified the Berlin, STOP, and STOP-Bang questionnaires for inclusion in the meta-analysis. The ASA checklist and OSA50 questionnaires were excluded from meta-analysis due to an insufficient number of studies. Nunes included two surgical cohorts, abdominal and coronary artery bypass grafting, which were entered as separate cohorts [63].

Fig. 10

Questionnaire studies in surgical population

Two studies were included in the meta-analysis of the Berlin questionnaire for moderate to severe OSA (AHI ≥ 15) (Fig. 11). Due to insufficient data, we were unable to conduct a meta-analysis for all OSA (AHI ≥ 5) and severe OSA (AHI ≥ 30).

Fig. 11

Forest plot for Berlin questionnaire in surgical population for AHI ≥ 15 (generated using the software Review Manager 5.3, The Cochrane Collaboration)

The prevalence of moderate to severe OSA or AHI of ≥ 15 was 42%. The pooled sensitivity of the Berlin questionnaire to predict moderate to severe OSA (AHI ≥ 15) was 76% (95% CI: 66%, 84%), and the pooled specificity was 47% (95% CI: 32%, 62%). The DOR was 2.9 (95% CI: 0.2, 5.5) (Table 4).

Table 4

Pooled predictive parameters of Berlin and STOP-Bang questionnaires in surgical population for AHI ≥ 15

Parameter	Berlin, AHI ≥ 15	STOP-Bang, AHI ≥ 5	STOP-Bang, AHI ≥ 15	STOP-Bang, AHI ≥ 30
Studies, participants	2 studies, n = 258	4 studies, n = 1227	6 studies, n = 2098	3 studies, n = 1050
Prevalence (% and range)	41.86 (39.55–52.50)	71.56 (67.05–89.33)	32.79 (17.22–62.00)	21.24 (16.67–42.00)
Sensitivity (95% CI)	0.764 (0.661, 0.843)	0.846 (0.811, 0.876)	0.903 (0.871, 0.927)	0.96 (0.924, 0.979)
Specificity (95% CI)	0.468 (0.320, 0.623)	0.394 (0.298, 0.498)	0.269 (0.189, 0.367)	0.261 (0.232, 0.292)
False positive rate (95% CI)	0.532 (0.377, 0.680)	0.606 (0.502, 0.702)	0.731 (0.633, 0.811)	0.739 (0.708, 0.768)
Log LR + ve (95% CI)	1.437 (0.929, 1.945)	1.395 (1.187, 1.603)	1.235 (1.093, 1.377)	1.299 (1.236, 1.362)
Log LR − ve (95% CI)	0.504 (0.206, 0.802)	0.391 (0.304, 0.479)	0.361 (0.235, 0.487)	3.169 (2.502, 3.836)
DOR (95% CI)	2.849 (0.194, 5.505)	3.564 (2.311, 4.817)	3.421 (1.892, 4.949)	8.406 (2.65, 14.162)

CI confidence interval, LR likelihood ratio, DOR diagnostic odds ratio, + positive, − negative.
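The likelihood ratios and diagnostic odds ratio in Table 4 follow directly from sensitivity and specificity. As a minimal sketch, assuming the standard definitions LR+ = sensitivity / (1 − specificity), LR− = (1 − sensitivity) / specificity, and DOR = LR+ / LR−, the Python snippet below recomputes the Berlin column from its pooled point estimates. The function name `likelihood_ratios` is introduced here for illustration, and confidence intervals are not reproduced because they depend on the bivariate model.

```python
def likelihood_ratios(sensitivity, specificity):
    """Return (LR+, LR-, DOR) from a sensitivity/specificity pair."""
    lr_pos = sensitivity / (1 - specificity)
    lr_neg = (1 - sensitivity) / specificity
    dor = lr_pos / lr_neg
    return lr_pos, lr_neg, dor


# Pooled point estimates for the Berlin questionnaire, AHI >= 15 (Table 4).
lr_pos, lr_neg, dor = likelihood_ratios(sensitivity=0.764, specificity=0.468)

print(f"LR+ = {lr_pos:.3f}, LR- = {lr_neg:.3f}, DOR = {dor:.3f}")
```

The recomputed values agree with the tabulated 1.437, 0.504, and 2.849 to within rounding of the inputs, which suggests that the rows labelled "Log LR" hold the ratios themselves rather than their logarithms.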

Two studies were eligible for inclusion in the STOP questionnaire meta-analysis for moderate to severe OSA (AHI ≥ 15). However, due to insufficient studies and large heterogeneity around the specificity, the STOP questionnaire was excluded from the meta-analysis (Fig. 12).

Fig. 12

Forest plot for STOP questionnaire in surgical population for AHI ≥ 15 (generated using the software Review Manager 5.3, The Cochrane Collaboration)

We included 6 studies in the meta-analysis of the STOP-Bang questionnaire for moderate to severe OSA (AHI ≥ 15) (Fig. 13).

Fig. 13

Forest plots for STOP-Bang questionnaire in surgical population (generated using the software Review Manager 5.3, The Cochrane Collaboration)

The prevalence of AHI ≥ 5 (all OSA), AHI ≥ 15 (moderate to severe), and AHI ≥ 30 (severe) OSA was 72%, 33%, and 21%, respectively. The pooled sensitivity of the STOP-Bang questionnaire to predict all OSA, moderate–severe, and severe OSA was 85% (95% CI: 81%, 88%), 90% (95% CI: 87%, 93%), and 96% (95% CI: 92%, 98%) respectively. The pooled specificity was 40% (95% CI: 30%, 50%), 27% (95% CI: 19%, 37%), and 26% (95% CI: 21%, 46%). The corresponding DOR were 3.6 (95% CI: 2.3, 4.8), 3.4 (95% CI: 1.9, 4.9), and 8.4 (95% CI: 2.7, 14.2), respectively (Table 4). Compared to the Berlin and STOP questionnaires, individual study estimates of sensitivity appeared to be more homogeneous for the STOP-Bang questionnaire (Figs. 11, 12, and 13).

Predictive performance of the STOP-Bang questionnaire at various cut-off scores

In the surgical population, two of six studies reported data at multiple cut-off points for the STOP-Bang questionnaire for moderate-to-severe OSA (AHI ≥ 15) [62, 63]. Increasing the cut-off from ≥ 4 to ≥ 7 increased specificity from 31% (95% CI: 23%, 41%) to 96% (95% CI: 89%, 99%), with the highest specificity at cut-off values ≥ 6 and ≥ 7 (Table 5). However, this increase in specificity came at the expense of a reduction in sensitivity.

Table 5

STOP-Bang questionnaire at different questionnaire cut-offs for moderate–severe OSA (AHI ≥ 15) in the surgical population

Predictive parameters	SBQ ≥ 4, AHI ≥ 15	SBQ ≥ 5, AHI ≥ 15	SBQ ≥ 6, AHI ≥ 15	SBQ ≥ 7, AHI ≥ 15
Studies, participants	2 studies, n = 231
Prevalence (% and range)	56.71 (41.46–62)
Sensitivity (95% CI)	0.893 (0.828, 0.936)	0.640 (0.480, 0.774)	0.372 (0.211, 0.567)	0.120 (0.033, 0.353)
Specificity (95% CI)	0.310 (0.227, 0.407)	0.575 (0.458, 0.684)	0.807 (0.676, 0.894)	0.958 (0.889, 0.985)
False positive rate (95% CI)	0.690 (0.593, 0.773)	0.425 (0.316, 0.542)	0.193 (0.106, 0.324)	0.042 (0.015, 0.111)
Log LR + ve (95% CI)	1.294 (1.108, 1.481)	1.505 (1.052, 1.958)	1.930 (0.859, 3.000)	2.875 (− 2.580, 7.647)
Log LR − ve (95% CI)	0.345 (0.147, 0.543)	0.627 (0.372, 0.881)	2.481 (0.547, 4.414)	0.918 (0.756, 1.080)
DOR (95% CI)	3.755 (1.135, 6.374)	2.402 (0.763, 4.041)	8.406 (2.650, 14.162)	3.131 (− 2.580, 8.841)

CI confidence interval, LR likelihood ratio, DOR diagnostic odds ratio, + positive, − negative, OSA obstructive sleep apnoea, AHI apnoea-hypopnoea index.
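The trade-off in Table 5 can be made concrete by converting the pooled estimates into expected counts. The Python sketch below is a hypothetical illustration, not part of the review's analysis: it assumes a notional cohort of 1000 surgical patients at the pooled prevalence of 56.71% and uses the point estimates from Table 5. The function name `expected_counts` and the cohort size are assumptions made for the example.

```python
def expected_counts(sensitivity, specificity, prevalence, n=1000):
    """Expected missed cases and false positives in a cohort of n patients."""
    with_osa = n * prevalence
    without_osa = n - with_osa
    return {
        "missed_cases": with_osa * (1 - sensitivity),
        "false_positives": without_osa * (1 - specificity),
    }


PREVALENCE = 0.5671  # pooled prevalence of AHI >= 15 in Table 5

# STOP-Bang cut-off -> (sensitivity, specificity) point estimates from Table 5.
table5 = {
    4: (0.893, 0.310),
    5: (0.640, 0.575),
    6: (0.372, 0.807),
    7: (0.120, 0.958),
}

counts = {
    cutoff: expected_counts(sens, spec, PREVALENCE)
    for cutoff, (sens, spec) in table5.items()
}

for cutoff, c in counts.items():
    print(
        f"cut-off >= {cutoff}: "
        f"{c['missed_cases']:.0f} missed, {c['false_positives']:.0f} false positives"
    )
```

Under these assumptions, raising the cut-off from ≥ 4 to ≥ 7 reduces false positives from roughly 300 to fewer than 20 per 1000 patients, while the number of missed cases rises from about 60 to about 500.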

Resistant hypertension population

We included two studies (n = 517) in the meta-analysis of the Berlin questionnaire (cut-off ≥ 2) for all OSA (AHI ≥ 5) [65, 66]. Due to insufficient study data, we were unable to conduct a meta-analysis for moderate–severe (AHI ≥ 15) and severe OSA (AHI ≥ 30). The prevalence of all OSA (AHI ≥ 5) was 80%. The Berlin questionnaire’s pooled sensitivity to predict all OSA was 80% (95% CI: 60%, 92%), and the pooled specificity was 36% (95% CI: 21%, 55%). The DOR was 2.2 (95% CI: 0.7, 3.8).

Other cohorts

Asthma, community clinic, highway bus drivers, neurology clinic, primary care, respiratory and snoring clinic cohorts were identified but were excluded from the meta-analysis due to having only one study per cohort (Online Resource 7) [67-73].

Sensitivity analyses

Risk of bias

No studies were evaluated as high risk in the surgical and resistant hypertension populations; therefore, no sensitivity analyses were conducted. In the sleep clinic population, sensitivity analyses were conducted for the Berlin (Online Resource 8), STOP-Bang (Online Resource 9), and STOP (Online Resource 10) questionnaires for AHI ≥ 5, AHI ≥ 15, and AHI ≥ 30, excluding studies identified as high risk in any QUADAS-2 domain, unclear in four domains, or outliers. For the STOP questionnaire, we excluded one study each for AHI ≥ 5 [49], AHI ≥ 15 [44], and AHI ≥ 30 [44]. For the STOP-Bang questionnaire, we excluded five studies for AHI ≥ 5 [29–31, 34, 46] and four studies each for AHI ≥ 15 [30, 35, 38, 44] and AHI ≥ 30 [30, 35, 38, 44]. For the Berlin questionnaire, we excluded four studies each for AHI ≥ 5 [45, 46, 53, 55] and AHI ≥ 15 [45, 46, 53, 55], and three studies for AHI ≥ 30 [44, 45, 55]. Across all three questionnaires, exclusion of studies was associated with stable or slightly increased sensitivity but reduced specificity (Online Resources 8–10). The STOP-Bang questionnaire remained the most effective questionnaire, with the highest sensitivity compared to the Berlin and STOP questionnaires. Specificity among all three questionnaires remained low.

Desaturation and arousal criteria

Due to an insufficient number of studies, no sensitivity analysis was conducted in the resistant hypertension population. In the surgical population, the Berlin and STOP questionnaire studies utilised the ≥ 3% desaturation scoring criteria; therefore, no sensitivity analyses were conducted. For the STOP-Bang questionnaire, studies applied either ≥ 3% or ≥ 4% desaturation criteria. When we applied the ≥ 3% desaturation criteria to the STOP-Bang questionnaire, we excluded one study for AHI > 5 [60], two studies for AHI ≥ 15 [60, 64], and one study for AHI ≥ 30 [60]. In turn, when we applied the ≥ 4% desaturation criteria, we excluded four studies for AHI ≥ 15 [59, 61–63]. Across the three AHI thresholds, sensitivity remained stable, compared to a stable or slightly decreased sensitivity with application of the ≥ 3% desaturation criteria. For AHI ≥ 15, application of the ≥ 4% desaturation criterion was associated with a slight reduction in sensitivity and an increase in specificity (Online Resource 11). We conducted a sensitivity analysis in the sleep clinic population for the Berlin, STOP, and STOP-Bang questionnaires, applying both the ≥ 3% and ≥ 4% desaturation criteria respectively. Studies were excluded on the basis of high risk of bias, scoring criteria not specified, and desaturation criteria (≥ 3% or ≥ 4%) (Online Resource 12). Across all three questionnaires in the sleep clinic population, exclusion of studies was associated with stable sensitivity and reduced specificity, particularly when applying the ≥ 4% desaturation criterion (Online Resources 13, 14, 15). Overall, the STOP-Bang questionnaire remained the most effective questionnaire with the highest sensitivity compared to the Berlin and STOP questionnaires. Specificity among all three questionnaires remained low.

Discussion

This systematic review and meta-analysis investigated the accuracy and clinical utility of questionnaires as screening tools for OSA in adults in different clinical cohorts. Consistent with previous studies, our findings showed that the STOP-Bang questionnaire (score cut-off ≥ 3) had the highest sensitivity to detect OSA and the highest diagnostic odds ratio in both the sleep clinic and surgical populations [12, 18, 76]. However, the STOP-Bang questionnaire was limited by consistently low specificity across all AHI thresholds, resulting in high false positive rates. The Berlin questionnaire (score cut-off ≥ 2) appeared to be the least useful, demonstrating overall low sensitivity and low specificity across all three cohorts [12, 18, 77]. Although there was no comparison with other questionnaires in the resistant hypertension cohort, findings were comparable with the sleep clinic and surgical cohorts.

OSA screening questionnaires are intended to provide the information required to identify patients most likely to benefit from downstream management decisions, such as onward referral for objective sleep testing and possible treatment following a positive full diagnostic test. The potential utility of OSA screening questionnaires in risk stratification of patients has been demonstrated in several cohorts. Not only has OSA been associated with risk of peri-operative complications and consequent longer length of hospital stay, but it has also been linked to poor clinical outcomes including higher rates of post-CABG atrial fibrillation [78-80]. In the context of the ongoing coronavirus disease 2019 (Covid-19) pandemic, a recent study reported worse clinical outcomes in patients with Covid-19 classified by the Berlin questionnaire as high risk, compared to those at low risk, of OSA [81].
The study also highlighted the challenges with objective assessment of OSA with PSG during the Covid-19 pandemic, emphasising the need for alternative approaches beyond PSG, such as validated screening questionnaires. In this context, we would encourage the assessment and validation of OSA screening questionnaires, in particular STOP-Bang, as screening tools for risk stratification in appropriate clinical settings, with the aim of improving outcomes for patients.

Although sensitivity and specificity provide us with the necessary information to discern between the available screening questionnaires, the clinical value and application of the screening questionnaires are demonstrated by the positive and negative predictive values, which depend on the prevalence of the disease in the given clinical population. Although we were unable to pool the predictive values of individual questionnaires due to variation in prevalence across studies, the point estimates of positive predictive value (PPV) and negative predictive value (NPV) for the STOP-Bang questionnaire in both the sleep clinic and surgical populations (Online Resource 16) demonstrated an increase in NPV as OSA severity increased. The combination of high sensitivity and NPV of the STOP-Bang questionnaire is therefore useful to help clinicians exclude patients with low risk of clinically significant OSA. At the same time, the low specificity of the STOP-Bang questionnaire (and therefore its relative inability to correctly identify patients without OSA) leads to a high rate of false positive findings; this may have emotional and cognitive implications for individual patients with added consequences for clinical services, not least cost [80, 82].

This systematic review’s main strength lies in our comprehensive literature search with stringent eligibility criteria to identify all relevant studies reporting on the accuracy and clinical utility of existing OSA screening questionnaires that were validated against the gold standard PSG.
Our inclusion of the LILACS database expanded our search to include studies from Latin America and the Caribbean. Of previous reports, the review by Ramachandran [17] was limited to a search of two databases and English-language publications only, and omitted grey literature sources from its search strategy. Additionally, it was unclear whether Ross [16] and Abrishami [12] included any grey literature sources in their searches. Two independent reviewers completed data extraction, and we used the QUADAS-2 tool to rigorously assess all included studies for risk of bias. To evaluate the robustness of the meta-analysis, we conducted sensitivity analyses to investigate the potential influence on our findings from studies at high, or unclear, risk of bias.

Although our study did not explore source differences from an ethnicity or geographical perspective, we conducted a further sensitivity analysis to evaluate the impact of varying scoring criteria on our study findings. The utilisation of different AASM scoring criteria and desaturation (and arousal) thresholds across studies created a source of variability [83-85]. Although the definition of apnoeas has remained stable, there has been much controversy about the definition of hypopnoeas, specific to flow reduction, oxygen desaturation, and the presence or absence of arousal [86]. Varying definitions of hypopnoea not only affect prevalence estimates but are also likely to lead to underestimation of OSA in patients who may benefit from treatment [86]. A study by Guilleminault et al. (2009) showed that using the 30% flow reduction and 4% desaturation without arousal criteria would have missed 40% of patients who were identified using the criteria with arousal and who were responsive to CPAP therapy, with reduction in AHI and symptomatic improvement [87].

Against this background, our review is based on a larger number of studies than prior analyses [12, 16, 17].
Although the review by Chiu [18] encompassed a larger dataset, that report carried a greater risk of bias due to the inclusion of retrospective studies and studies that used PSG and portable monitoring as the reference standard. This review considered all existing OSA screening questionnaires for inclusion. In contrast, Chiu [18] pre-selected four questionnaires, including the ESS, which was developed not as a screening questionnaire but as a measure of daytime sleepiness. Similar to Abrishami [12] and Chiu [18], our review focused on questionnaires only, in contrast to Ross [16] and Ramachandran [17], who also included portable monitoring and clinical prediction tools, respectively.

There are a number of limitations to this work. Our findings are influenced by the limitations of the included studies. In several studies, the true risk of bias was unclear in a number of the QUADAS-2 domains due to underreporting in the index test, reference standard, and flow and timing domains. Similarly, it was often unclear if the results of the index test and the reference standard were interpreted independently. Very few studies provided adequate information to determine if the time interval between the index test and the reference standard was appropriate. Our decision to exclude seven additional clinical cohorts may be considered a limitation; however, in the context of unclear, and possibly substantial, differences among these studies in the patient spectrum and disease prevalence, we felt it appropriate not to include these in the meta-analysis. Because the accuracy of screening tools varies according to the spectrum of disease, this further reiterates the need for validation studies in similar clinical cohorts. There was a high degree of heterogeneity among included studies with the possibility of selection bias, especially in the sleep clinic population.
Consequently, reported sensitivity estimates will be higher than in lower-risk populations, making it difficult to extrapolate the true utility of the questionnaire in clinical practice.

In conclusion, our review investigated the accuracy and clinical utility of existing OSA screening questionnaires in different clinical cohorts. While the STOP-Bang questionnaire had a high sensitivity to detect OSA in both the sleep clinic and surgical cohorts, it lacked adequate specificity. This review highlights the issue of low specificity across OSA screening questionnaires. Research is required to explore reasons for low specificity and strategies for improvement, ideally without reducing sensitivity. The validation of screening questionnaires in sleep clinic populations is limited by possible selection and spectrum bias, reiterating the need for diagnostic validation studies in clinically similar cohorts. Additionally, further research is needed in resistant hypertension and other at-risk populations that we could not include in the meta-analysis. Improvement in the conduct and reporting of diagnostic validation studies must ensure quality and low risk of bias. Finally, to enable the extrapolation of the true accuracy and clinical utility of screening questionnaires, validation studies of high methodological quality in comparable, clinically relevant cohorts are required.
