
Identifying Novel Biomarkers Ready for Evaluation in Low-Prevalence Populations for the Early Detection of Lower Gastrointestinal Cancers: A Systematic Review and Meta-Analysis.

Paige Druce¹, Natalia Calanzani², Claudia Snudden², Kristi Milley³, Rachel Boscott², Dawnya Behiyat², Javiera Martinez-Gutierrez⁴, Smiji Saji², Jasmeen Oberoi³, Garth Funston², Mike Messenger⁵, Fiona M Walter³,², Jon Emery³,².

Abstract

INTRODUCTION: Lower gastrointestinal (GI) cancers are a major cause of cancer deaths worldwide. Prognosis improves with earlier diagnosis, and non-invasive biomarkers have the potential to aid with early detection. Substantial investment has been made into the development of biomarkers; however, studies are often carried out in specialist settings and few have been evaluated for low-prevalence populations.
METHODS: We aimed to identify novel biomarkers for the detection of lower GI cancers that have the potential to be evaluated for use in primary care. MEDLINE, Embase, Emcare and Web of Science were systematically searched for studies published in English from January 2000 to October 2019. Reference lists of included studies were also assessed. Studies had to report on measures of diagnostic performance for biomarkers (single or in panels) used to detect colorectal or anal cancers. We included all designs and excluded studies with fewer than 50 cases/controls. Data were extracted from published studies on types of biomarkers, populations and outcomes. Narrative synthesis was used, and measures of specificity and sensitivity were meta-analysed where possible.
RESULTS: We identified 142 studies reporting on biomarkers for lower GI cancers, for 24,844 cases and 45,374 controls. A total of 378 unique biomarkers were identified. Heterogeneity of study design, population type and sample source precluded meta-analysis for all markers except methylated septin 9 (mSEPT9) and pyruvate kinase type tumour M2 (TuM2-PK). The estimated sensitivity and specificity of mSEPT9 were 80.6% (95% CI 76.6-84.0%) and 88.0% (95% CI 79.1-93.4%), respectively; TuM2-PK had an estimated sensitivity of 81.6% (95% CI 75.2-86.6%) and specificity of 80.1% (95% CI 76.7-83.0%).
CONCLUSION: Two novel biomarkers (mSEPT9 and TuM2-PK) were identified from the literature with potential for use in lower-prevalence populations. Further research is needed to validate these biomarkers in primary care for screening and assessment of symptomatic patients.


Keywords: Biomarkers; Clinical practice; Colorectal cancer; Early detection; Lower gastrointestinal cancers; Primary care

Year: 2021 PMID: 33907946 PMCID: PMC8078393 DOI: 10.1007/s12325-021-01645-6

Source DB: PubMed Journal: Adv Ther ISSN: 0741-238X Impact factor: 3.845

Digital Features

This article is published with digital features, including a summary slide, to facilitate understanding of the article. To view digital features for this article go to 10.6084/m9.figshare.13664042.

Introduction

Gastrointestinal (GI) cancers account for over 25% of global cancer incidence and 35% of all cancer-related deaths [1]. Lower GI cancers, particularly colorectal cancer (CRC), contribute the most significant proportion with over 1.8 million new cases in 2018 [1]. CRC is the most commonly diagnosed GI cancer and constitutes 1 in 10 cancer cases and deaths [1]. Around 90% of patients with cancer first present with symptoms in primary care, highlighting a key role for primary care providers in the early detection of GI cancers [2, 3]. Diagnosis of GI cancers can prove challenging in the community setting: while gastrointestinal symptoms are commonly encountered, they are usually due to benign or self-limiting conditions and rarely to GI cancer [3]. Initial symptoms are often non-specific, and more specific symptoms usually represent more advanced disease [3]. Increased demand for diagnostic services for lower GI cancer and pressure on waiting times have been seen internationally in countries like Australia, the UK and Canada, where primary care plays a ‘gatekeeping’ role to specialist care. In many countries implementation of faecal occult blood tests (FOBTs) or faecal immunochemical tests (FITs) for CRC screening and diagnostic triage adds further pressure on colonoscopy services. In some healthcare systems, over-screening via colonoscopy is also an issue [4]. New diagnostic approaches are needed to help reduce the burden on specialist care, particularly in the current context of COVID-19 and associated delays in access to cancer diagnostic and treatment services [5]. There is considerable interest in the potential of biomarkers to detect GI cancers [3]. To date, carcinoembryonic antigen (CEA) and carbohydrate antigen 19-9 (CA19-9) have played an important role in clinical practice to detect recurrent disease, but their diagnostic performance is inadequate for the early detection of new disease [3, 6, 7]. 
Substantial investment has been made in developing new biomarkers for early detection, but most studies of these tests have occurred in specialist clinical settings [8] where cancer prevalence is higher than in the community settings where they would eventually be applied [9, 10]. The performance characteristics of a diagnostic test are strongly determined by the prevalence and severity of the target disease and of other diseases within the study population [9]. In populations where the prevalence of the target disease is low (e.g. GI cancer in primary care), the corresponding positive predictive values (PPV) are lower than in high-prevalence populations. Tests that are evaluated only in these high-prevalence populations tend to have lower sensitivity and higher specificity when translated to low-prevalence populations [10, 11]. This is known as the ‘spectrum effect’, and has crucial implications for comparing the performance of tests in different populations [9, 10]. In recognition of these issues, the CanTest Framework was developed (Fig. 1) [10]. This novel framework encompasses a translational pathway for diagnostic tests, from new test discovery to health system implementation in low-prevalence populations [11]. The framework highlights the importance of evaluating clinical performance, implementation, patient safety, quality of care, and cost-effectiveness in the intended setting. It is vital that these elements are investigated alongside test performance in order to ascertain clinical utility and improved outcomes for patients [8].
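The dependence of PPV on prevalence described above follows directly from Bayes' theorem, and a minimal sketch may make it concrete. The sensitivity, specificity and prevalence values below are hypothetical and chosen only for illustration; they are not taken from any included study. The sketch also holds sensitivity and specificity fixed across settings, which the spectrum effect suggests is an oversimplification.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Return (PPV, NPV) for a binary test via Bayes' theorem."""
    true_pos = sensitivity * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    false_neg = (1 - sensitivity) * prevalence
    true_neg = specificity * (1 - prevalence)
    ppv = true_pos / (true_pos + false_pos)
    npv = true_neg / (true_neg + false_neg)
    return ppv, npv


# Hypothetical test (80% sensitivity, 90% specificity) applied at
# illustrative prevalences, from a specialist-like to a primary-care-like setting.
for prevalence in (0.20, 0.05, 0.01):
    ppv, npv = predictive_values(0.80, 0.90, prevalence)
    print(f"prevalence={prevalence:.0%}  PPV={ppv:.1%}  NPV={npv:.1%}")
```

With the test characteristics unchanged, the PPV falls steeply as prevalence falls, while the NPV rises; this is why accuracy estimates from hospital cohorts cannot be carried over unadjusted to primary care.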

Fig. 1

The CanTest framework.

Source: Walter et al. 2019 [10]

This review aimed to systematically identify novel biomarkers for the early detection of lower GI cancers that have measures of diagnostic performance and show sufficient promise to warrant further evaluation in low-prevalence populations.

Methods

Search Strategy and Inclusion/Exclusion Criteria

These have been reported elsewhere [12]. The protocol for this review was registered on PROSPERO (registration ID CRD42020165005) and the Preferred Reporting Items for Systematic Reviews and Meta-Analysis of Diagnostic Test Accuracy Studies (PRISMA-DTA) statement was followed [13]. MEDLINE, Embase, Emcare and Web of Science were electronically searched for primary studies published in English between 1 January 2000 and 31 October 2019. The search strategy was developed with the assistance of a medical librarian (Appendix 1 in the supplementary material). Studies eligible for inclusion were situated within phase 2 (i.e. providing measures of diagnostic accuracy beyond discovery, even if carried out in high-prevalence settings) and phase 3 (i.e. examining diagnostic accuracy in intended low-prevalence settings, and providing measures of clinical utility, including feasibility and acceptability) of the CanTest framework [10] (Fig. 1). We included studies which reported measures of diagnostic performance in an independent population (i.e. beyond measures from the initial discovery phase). Studies were excluded if no references to previous research evaluating biomarker performance were available, and if the study provided only one set of performance measures (reflecting discovery phase only). As studies beyond the discovery phase require larger sample sizes [14, 15], we included those which reported data on at least 50 cancer cases and at least one group of homogeneous non-cancer controls (n ≥ 50) with similar clinical characteristics (e.g. healthy, or with non-malignant conditions), as in previous reviews [15, 16]. We included studies on non-invasive biomarkers feasible for use in the community setting: blood (serum and plasma), urine, faecal, salivary or breath samples. Both observational (cohort or case–control, cross-sectional or longitudinal, prospective or retrospective) and experimental designs were eligible for inclusion.
Studies undertaken in all recruitment settings were included. We included studies if they reported on at least one measure of diagnostic performance, namely sensitivity, specificity, PPV, negative predictive value (NPV), false positive, false negative or area under the curve (AUC) for biomarkers used to detect lower GI cancers, including colorectal (colon, rectum, caecum) or anal cancers, in adult populations (mean/median age 18 or older; studies including individuals aged less than 18 were accepted if these were outliers in large samples). Non-specified GI cancers, neuroendocrine cancers, and studies only reporting on familial populations at risk of hereditary cancers were excluded. Novel biomarkers were considered individually, in combination or as part of a panel. Studies reporting only on a single, established biomarker (CEA, CA19-9, or FIT or FOBT) were excluded [16, 17]. Studies providing measures of diagnostic performance for combinations of established and novel biomarkers were included. Covidence systematic review software [18] was used to facilitate article screening. Titles and abstracts were screened independently by two reviewers (any two of PD, NC, CS, KM, DB or RB). Full-text articles were also independently evaluated for inclusion by two reviewers (any two of the aforementioned). Reference lists of included studies were manually reviewed by one author to identify additional studies (NC). Full-text articles selected at this stage were also independently assessed by two reviewers (any two of PD, NC, RB or DB). Disagreements were resolved by consensus; when this could not be reached, a senior reviewer was consulted (JE or FMW).

Data Extraction and Analysis

Data extraction was piloted to ensure consistency and was performed independently (by SS, DB, RB, JMG, JO). Information on study characteristics, populations, biomarkers and measures of diagnostic performance were extracted. When studies reported on different phases of biomarker development, only data from the eligible phases were extracted. When studies had more than one eligible phase, data were extracted for all eligible phases. Extracted data were collated and checked for consistency and inaccuracies (PD). Biomarkers were categorised according to a modified version of the classifications reported by Uttley et al. [15]: microRNAs and other RNAs, autoantibodies and other immunological markers, other proteins (i.e. proteins that did not fit into other categories), metabolic markers, DNA-related markers (protein-coding genes, gene mutations), circulating tumour DNA, DNA methylation markers and other biomarkers. Controls were classified as normal/healthy, having non-malignant conditions or those with adenomas/polyps. Controls described as healthy were coded as such unless studies described underlying conditions. Full details of the control population classification are available in Appendix 2 in the supplementary material.

Quality Assessment and Risk of Bias

Considering the key issue of spectrum bias, studies were classified as either single-gate or two-gate designs. Single-gate studies recruit participants before disease status is known, with a single route of entry, and with the same inclusion criteria. Two-gate studies recruit participants through different routes and use different criteria for cases and controls. This can inflate measures of diagnostic performance, for example when individuals with advanced disease are over-represented among cases and are compared with the ‘fittest of the fit’ healthy controls [11]. One author (PD) classified all studies and another (NC) checked the classification. A full description of this classification and how it approaches issues covered by the QUADAS-2 critical appraisal tool [19] is available in Appendix 3 in the supplementary material. Studies included in the meta-analysis were assessed using the QUADAS-2 [19] tool by two reviewers (PD and NC).

Data Synthesis

As significant heterogeneity was anticipated, we used narrative synthesis to summarise the data [20]. An overview of the evidence was developed to describe the key characteristics of the included studies, their populations, biomarkers and outcome measures. Data were examined for similarities that would allow for subgroup analyses, namely the same biomarker, with similar study design and appropriate accuracy performance measures. For meta-analysis to occur, biomarkers had to be investigated in more than two studies, with individual outcome measures provided, similar populations included and a single-gate study design. We focused the analysis on single-gate studies, as this design reduces spectrum bias, and is more likely to provide results that translate for use in low-prevalence populations. Meta-analysis of diagnostic test accuracy was performed using MetaDTA (version 1.43) [21] and RevMan (5.3) [22] software. For meta-analysis, we used the random effects bivariate binomial model of Chu and Cole fitted as a generalised linear mixed effect model [21]. Sensitivity and specificity were jointly modelled and the estimates from each study were assumed to vary [21]. Hierarchical summary receiver operating characteristic (HSROC) parameters were estimated using the bivariate model parameters. Summary points of sensitivity and specificity were presented alongside forest plots and SROC curves. Heterogeneity and threshold effects were evaluated using the SROC plots and random effects correlation.
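For readers unfamiliar with the bivariate binomial model referred to above, its general structure can be sketched as follows. The notation is our own assumption rather than that of the cited software: for study i, TP_i and TN_i are the counts of true positives and true negatives, n_{1i} and n_{0i} are the numbers of cases and controls, and Se_i and Sp_i are the study-specific sensitivity and specificity.

```latex
\begin{align}
  TP_i \mid Se_i &\sim \mathrm{Binomial}(n_{1i},\, Se_i), \\
  TN_i \mid Sp_i &\sim \mathrm{Binomial}(n_{0i},\, Sp_i), \\
  \begin{pmatrix} \operatorname{logit}(Se_i) \\ \operatorname{logit}(Sp_i) \end{pmatrix}
  &\sim \mathcal{N}\!\left(
    \begin{pmatrix} \mu_{Se} \\ \mu_{Sp} \end{pmatrix},
    \begin{pmatrix}
      \sigma_{Se}^{2} & \rho\,\sigma_{Se}\sigma_{Sp} \\
      \rho\,\sigma_{Se}\sigma_{Sp} & \sigma_{Sp}^{2}
    \end{pmatrix}
  \right)
\end{align}
```

Under this formulation the summary sensitivity and specificity are the inverse-logit transforms of \mu_{Se} and \mu_{Sp}, and \rho is the random effects correlation; a strongly negative \rho is commonly read as a sign of a threshold effect, since studies trading sensitivity for specificity move in opposite directions on the two axes.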

Compliance with Ethics Guidelines

This article is based on previously conducted studies and does not contain any studies with human participants or animals performed by any of the authors.

Results

A total of 16,597 records were identified in database searches; 9172 were retained after removing duplicates (Fig. 2). During title and abstract screening 8179 records were excluded. After assessing the full text of the remaining 993 records, 731 of them were excluded. Of the remaining studies, 142 are included in this review. The characteristics of included studies are described further in Table 1, and measures of diagnostic performance are described in supplementary Table 1.

Fig. 2

Study selection

Table 1

Characteristics of included studies: country, setting and population

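Author (year)	Country (population)	Setting^a: Hosp	Setting^a: Other	Cases (N): CRC	Cases (N): A/P	Controls (N): All	Controls (N): HC	Controls (N): NM	Controls (N): A/P	Biomarkers (N): miRNA	Biomarkers (N): Autoab	Biomarkers (N): DNA	Biomarkers (N): Protein	Biomarkers (N): Metab	Biomarkers (N): ctDNA	Biomarkers (N): Other^b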
Ahlquist et al. (2012) [23]	USA, Canada, Denmark	–	X	82	0	960	0	0	0	–	–	1	1	–	–	4
Amiot et al. (2014) [24]	France	–	X	66	0	123	123	0	34	–	–	–	–	–	–	3
Bagaria et al. (2013) [25]	India	X	–	50	0	50	50	0	0	–	–	–	2	–	–	–
Broll et al. (2001) [26]	Germany	U	U	122	0	65	65	0	0	–	–	–	2	–	–	–
Bunger et al. (2012) [27]	Germany	–	X	83	60	50	50	0	0	–	1	–	2	–	–	–
Calistri et al. (2009) [28]	Italy	X	–	100	0	100	100	0	0	2	–	–	–	–	–	–
Cao et al. (2019a) [29]	China	X	U	62	0	59	59	0	0	–	–	–	–	1	–	–
Cao et al. (2019b) [30]	China	X	–	118	0	85	85	0	0	–	1	–	2	1	–	–
Chan et al. (2010) [31]	Taiwan, China	X	–	94	0	54	54	0	0	–	5	–	1	–	–	–
Chang et al. (2016) [32]	Taiwan, China	U	U	229	0	368	368	0	0	2	–	–	–	–	–	–
Chang et al. (2011) [33]	China	X	–	60	60	57	57	0	0	–	5	–	1	–	–	–
Chang et al. (2014) [34]	Taiwan, China	U	U	56	0	120	120	0	0	5	–	–	–	–	–	–
Chao et al. (2013) [35]	Canada, USA	X	X	316	0	328	328	0	0	7	–	–	–	–	–	–
Chen et al. (2019) [36]	China	X	–	111	0	114	114	0	0	–	–	–	–	–	–	3
Choi et al. (2018) [37]	Korea	U	U	69	0	74	74	0	0	–	–	–	–	–	–	11
Church et al. (2014) [38]	USA, Germany	–	X	53	0	1457	934	0	523	–	–	–	–	–	–	1
Ciarloni et al. (2016) [39]	Switzerland, South Korea	X	–	52	0	74	74	0	0	27	–	–	4	–	–	–
DeVos et al. (2009) [40]	Germany, USA	U	U	90	0	155	155	0	0	–	–	–	–	–	–	1
Duran-Sanchon et al. (2019) [41]	Spain	–	X	67	0	353	217	0	483	9	–	–	–	–	–	–
Duvillard et al. (2014) [42]	France	X	–	224	0	252	0	252	0	–	1	–	1	–	–	–
Fan et al. (2017) [43]	Taiwan, China	U	U	92	0	100	100	0	0	–	4	–	1	–	–	–
Fernandes et al. (2005) [44]	Brazil	X	–	169	169	100	100	0	0	–	1	–	1	–	–	–
Flamini et al. (2006) [45]	Italy	X	–	75	57	75	75	0	0	–	–	–	1	–	1	–
Fouad et al. (2017) [46]	Egypt	U	U	50	0	50	0	50	0	1	–	–	–	–	–	–
Fu et al. (2018) [47]	China	X	–	98	0	329	253	76	0	–	–	–	–	–	–	1
Fung et al. (2015) [48]	Australia	X	–	98	0	99	0	99	0	–	–	–	7	–	–	–
Gao et al. (2018) [49]	China	X	–	279	104	74	0	0	0	–	–	–	3	–	–	–
Groblewska et al. (2010) [50]	Poland	X	U	91	0	91	91	0	0	–	–	–	4	–	–	–
Grutzmann et al. (2008) [51]	Germany	U	U	127	0	369	184	149	0	–	–	–	–	–	–	1
Guo et al. (2018) [52]	China	X	–	107	0	218	120	0	98	5	–	–	–	–	–	–
Han et al. (2019) [53]	South Korea	X	–	245	199	245	245	245	0	–	–	–	–	–	–	1
Hao et al. (2018) [54]	China	U	U	186	0	97	97	0	0	2	–	–	–	–	–	–
Hata et al. (2017) [55]	Japan	X	X	225	0	916	916	0	0	–	–	–	–	1	–	–
Haug et al. (2007) [56]	Germany	–	X	65	0	917	917	0	0	–	–	–	1	–	–	–
He et al. (2010) [57]	China	X	–	182	0	170	170	0	0	–	–	–	–	–	–	3
Herreros-Villanueva et al. (2019) [58]	Spain	X	–	96	78	101	100	1	0	6	–	–	–	–	–	–
Imaoka et al. (2016) [59]	Japan	X	–	211	0	57	57	0	0	1	–	–	–	–	–	–
Imperiale et al. (2014) [60]	USA, Canada	X	–	65	0	9167	6274	0	2893	1	–	–	–	–	–	2
Jaberie et al. (2019) [61]	Iran	X	U	113	242	51	50	0	0	–	–	–	2	–	–	–
Jensen et al. (2018) [62]	Denmark	X	–	143	0	91	91	0	0	–	–	–	–	–	3	–
Jin et al. (2015) [63]	China	X	–	135	0	341	91	0	250	1	–	–	–	–	–	–
Johnson et al. (2014) [64]	USA	U	U	101	0	171	94	0	77	–	–	–	1	–	–	1
Jones et al. (2016) [65]	USA	U	U	68	0	68	68	0	0	–	1	–	12	–	–	–
Karam et al. (2018) [66]	Egypt	X	U	65	0	70	70	0	0	–	–	–	–	–	–	1
Karl et al. (2008) [67]	Germany	X	–	186	0	252	252	0	0	–	–	–	5	–	–	–
Kim et al. (2017) [68]	Korea	X	–	166	60	336	120	81	135	–	–	–	2	–	–	–
Kim et al. (2015) [69]	Korea	X	–	139	0	60	60	0	0	–	–	–	2	–	–	–
Koga et al. (2013) [70]	Japan	X	–	117	0	107	107	0	0	1	–	–	1	–	–	–
Lee et al. (2013a) [71]	Korea	X	–	101	0	96	96	0	0	–	–	–	–	–	–	1
Lee et al. (2013b) [72]	Korea	X	U	132	0	228	124	67	37	–	–	–	3	–	–	–
Li et al. (2019a) [73]	China	X	–	147	0	147	147	0	0	–	–	–	1	–	–	–
Li et al. (2019b) [74]	China	X	–	62	60	155	155	0	0	–	–	–	–	–	–	1
Li et al. (2012) [75]	China	X	–	70	0	141	141	0	0	–	1	–	–	–	–	–
Liu et al. (2013) [76]	China	X	U	200	0	80	80	0	0	2	–	–	–	–	–	–
Liu et al. (2016) [77]	China	X	–	148	0	320	80	80	160	1	–	–	1	–	–	–
Liu et al. (2018a) [78]	China	X	–	130	0	140	90	0	50	2	–	–	–	–	–	–
Liu et al. (2018b) [79]	China	X	–	50	0	100	50	0	50	2	–	–	–	–	–	–
Lumachi et al. (2012) [80]	Italy	U	U	102	0	99	0	99	0	–	1	–	4	–	–	–
Luo et al. (2019) [81]	China	X	–	57	0	192	192	0	0	12	–	–	1	–	–	–
Marcuello et al. (2019) [82]	Spain	–	X	59	0	80	80	0	0	6	–	–	–	–	–	–
Marshall et al. (2010) [83]	Canada, USA	X	–	202	0	208	0	0	0	7	–	–	–	–	–	–
Matsubara et al. (2011) [84]	Japan	X	–	101	0	109	109	0	0	–	1	–	1	–	–	–
Matsushita et al. (2005) [85]	Japan	X	–	116	169	83	83	0	0	–	–	–	–	–	–	1
Melotte et al. (2015) [86]	Germany	X	–	66	57	240	240	0	0	–	–	–	–	–	–	2
Meng et al. (2012) [87]	China	–	X	93	0	158	158	0	0	–	–	–	2	–	–	–
Min et al. (2019) [88]	China	X	–	58	0	76	0	69	7	4	–	–	–	–	–	–
Mizuno et al. (2003) [89]	Japan	X	–	100	0	100	100	0	0	–	1	–	–	–	–	–
Mroczko et al. (2010) [90]	Poland	X	–	75	104	70	70	0	0	–	–	–	4	–	–	–
Mroczko et al. (2006) [91]	Poland	X	–	76	0	65	65	0	0	–	–	–	3	–	–	–
Mulder et al. (2007) [92]	Netherlands	X	–	52	0	63	63	0	0	–	–	–	1	–	–	–
Murakoshi et al. (2011) [93]	Japan	X	–	115	0	230	230	0	0	–	1	–	1	–	–	–
Murata et al. (2012) [94]	Japan	X	–	252	199	81	81	0	0	9	–	–	–	–	–	–
Ng et al. (2017) [95]	Hong Kong	X	–	117	0	90	90	0	0	2	–	–	1	–	–	–
Nielsen et al. (2011) [96]	Denmark	X	–	294	0	4202	2173	1176	843	–	–	–	2	–	–	–
Ning et al. (2018) [97]	China	X	U	513	0	75	75	0	0	–	–	–	4	–	–	–
Nishiumi et al. (2012) [98]	Japan	X	–	59	0	63	63	0	0	–	–	–	2	27	–	–
Niu et al. (2012) [99]	China	X	–	119	78	397	300	97	0	–	–	–	–	1	–	–
Ørntoft et al. (2015) [100]	Denmark	–	X	128	0	150	150	0	0	–	–	–	–	–	–	1
Palmqvist et al. (2003) [101]	Sweden	–	X	124	0	243	0	0	0	–	–	–	2	–	–	–
Pedersen et al. (2013) [102]	England, Wales, Northern Ireland	–	X	97	242	94	0	0	0	–	18	–	–	–	–	–
Pedersen et al. (2015) [103]	Australia, Netherlands	X	–	129	0	1288	450	542	296	–	1	–	–	1	–	–
Peng et al. (2017) [104]	China	X	U	559	0	559	559	0	0	–	–	–	1	–	–	2
Pengjun et al. (2013) [105]	China	X	–	149	0	69	69	0	0	–	–	–	4	–	–	–
Qian et al. (2018) [106]	Germany	X	–	212	0	106	106	0	0	–	–	–	5	–	–	–
Qu et al. (2019) [107]	China	X	–	100	0	100	100	0	0	5	–	–	–	–	–	–
Ren et al. (2016) [108]	China	X	–	422	0	1019	747	272	0	–	1	–	1	–	–	–
Rho et al. (2018) [109]	Japan	X	–	514	60	386	168	59	159	–	–	–	5	–	–	–
Ritchie et al. (2010) [110]	USA, Japan	–	X	70	0	70	0	0	0	–	–	–	–	3	–	–
Ritchie et al. (2013) [111]	Canada	X	X	98	0	964	0	964	0	–	–	–	–	1	–	–
Ruffin et al. (2010) [112]	USA	U	U	69	137	93	70	23	0	–	–	–	1	1	–	–
Schneider et al. (2005) [113]	Germany	U	U	247	0	53	0	0	0	–	–	–	3	–	–	–
Shastri et al. (2008) [114]	Germany	U	U	55	69	516	498	18	0	–	–	–	1	–	–	–
Shastri et al. (2006) [115]	Germany	U	U	74	0	128	117	11	60	–	–	–	1	–	–	–
Shi et al. (2019) [116]	China	X	–	211	0	103	0	0	103	–	–	–	2	–	–	1
Sithambaram et al. (2015) [117]	Malaysia	U	U	100	0	200	200	0	0	–	–	–	1	–	–	–
Song et al. (2017a) [118]	China	X	–	85	0	324	324	0	0	–	–	–	–	–	–	1
Song et al. (2018a) [119]	Germany	X	–	465	0	882	610	0	272	–	–	–	–	–	–	1
Song et al. (2017b) [120]	China	X	–	388	0	837	590	0	247	–	–	–	–	–	–	1
Song et al. (2018b) [121]	China	X	–	783	0	794	331	0	463	–	–	–	2	–	–	1
Stojkovic Lalosevic et al. (2019) [122]	Serbia	–	X	300	0	300	300	0	0	–	–	–	–	–	–	3
Sun et al. (2019a) [123]	China	X	–	133	0	587	494	11	82	–	–	–	3	–	–	1
Sun et al. (2019b) [124]	China	U	U	105	0	102	102	0	0	–	–	1	1	–	–	3
Swellam et al. (2016) [125]	Egypt	U	U	162	0	57	57	0	0	–	–	–	5	–	–	–
Symonds et al. (2016) [126]	Australia	X	–	66	169	246	246	0	0	–	1	–	–	1	–	–
Tagore et al. (2003) [127]	USA	–	X	52	57	212	113	0	99	–	–	23	–	–	–	–
Taguchi et al. (2015) [128]	USA	U	U	60	0	60	60	0	0	–	–	–	10	–	–	–
Tang et al. (2011) [129]	China	X	–	169	0	139	30	0	109	–	–	–	–	–	–	1
Tibble et al. (2001) [130]	UK	U	U	62	0	233	0	233	0	–	–	–	1	–	–	–
Toiyama et al. (2013) [131]	Japan	X	–	200	104	53	53	0	0	2	–	–	–	–	–	–
Tomasevic et al. (2016) [132]	Serbia	X	–	181	0	191	0	191	0	–	–	–	2	–	–	–
Toth et al. (2012) [133]	Hungary	U	U	92	0	92	92	0	0	–	–	–	1	–	–	1
Uchiyama et al. (2018) [134]	Japan	X	–	56	0	60	60	0	0	–	–	–	5	–	–	–
Vychytilova-Faltejskova et al. (2016) [135]	Czech Republic	–	X	203	199	100	100	0	0	4	–	–	–	–	–	–
Vychytilova-Faltejskova et al. (2018) [136]	Czech Republic	–	X	179	0	100	100	0	0	4	–	–	–	–	–	–
Wang et al. (2018) [137]	China	U	U	96	0	60	60	0	0	1	–	–	–	–	–	–
Wang et al. (2017a) [138]	China	X	–	91	0	91	91	0	0	–	–	–	10	–	–	–
Wang et al. (2014) [139]	China	X	U	113	0	59	59	0	0	6	–	–	2	–	–	–
Wang et al. (2007) [140]	Taiwan	X	U	157	78	80	80	0	0	4	–	–	–	–	–	–
Wang et al. (2017b) [141]	China	X	–	60	0	50	50	0	0	1	–	–	2	–	–	–
Wang et al. (2016) [142]	China	X	–	120	0	120	120	0	0	4	–	–	1	–	–	–
Warren et al. (2011) [143]	USA and Russia	X	X	50	242	94	0	0	0	–	–	–	–	–	–	1
Wilhelmsen et al. (2017) [144]	Denmark	X	–	512	0	3320	1978	1342	0	–	1	–	6	1	–	–
Wu et al. (2014) [145]	Hong Kong, China	X	–	104	0	109	109	0	0	1	–	–	–	–	–	–
Wu et al. (2012) [146]	Hong Kong	X	–	88	0	101	101	0	0	2	–	–	–	–	–	–
Wu et al. (2017) [147]	China	X	–	135	0	140	140	0	0	–	–	–	1	–	–	–
Wu et al. (2015) [148]	China	–	X	100	0	100	100	0	0	–	–	2	–	–	–	–
Wu et al. (2016) [149]	China	X	–	291	0	295	295	0	0	–	–	–	2	–	–	1
Xie et al. (2017) [150]	China	X	–	132	60	107	0	0	0	–	–	–	2	–	–	–
Xie et al. (2018) [151]	China	X	–	123	0	125	19	106	0	–	–	–	3	–	–	1
Xu et al. (2013) [152]	China	X	X	87	0	73	73	0	0	2	1	14	1	–	–	–
Yang et al. (2018) [153]	China	U	U	50	0	50	0	0	0	–	–	–	2	1	–	–
Yau et al. (2016) [154]	Hong Kong	U	U	198	0	198	198	0	0	1	–	–	–	–	–	–
Yuan et al. (2016) [155]	China	X	–	187	0	109	109	0	0	–	–	–	5	–	–	1
Zhang et al. (2015a) [156]	China	X	–	138	60	111	0	46	65	–	–	–	2	–	–	–
Zhang et al. (2016) [157]	China	X	–	80	0	171	116	55	0	–	–	–	–	6	–	–
Zhang et al. (2015b) [158]	Japan	U	U	130	0	54	54	0	0	4	–	–	–	–	–	–
Zhao et al. (2019a) [159]	China	X	–	117	0	166	166	0	0	–	–	–	–	–	–	2
Zhao et al. (2019b) [160]	China	X	–	358	0	286	286	0	0	–	–	–	1	–	–	–
Zheng et al. (2014) [161]	China	X	–	117	0	175	102	0	73	4	–	–	1	–	–	–
Zhou et al. (2017) [162]	China	X	–	242	0	262	262	0	0	–	–	–	–	–	–	1
Zhu et al. (2013) [163]	China	X	U	269	0	110	110	0	0	–	–	–	5	–	–	–
Zhu et al. (2015) [164]	China	X	–	70	0	70	70	0	0	1	–	–	1	–	–	–

autoab autoantibodies and other immunological markers, ctDNA circulating tumour DNA, HC healthy control, Hosp hospital, Ind individual, NM non-malignant, A/P adenomas/polyps, U unclear, UK United Kingdom, USA United States of America

^a Due to wide variations in health systems across different countries, hospital setting is a broad definition that can encompass secondary and tertiary care. Other setting refers to biobanks, reference sets, databases or archived samples; general population cohorts or cohorts from population screening programmes; or cohorts from previous trials or observational studies

^b Other biomarker type refers to methylation markers, platelets, white blood cells, red blood cells and colonocytes


Characteristics of Included Studies

Most papers (n = 124) recruited patients from a single country. China was the most common country (n = 62), followed by Japan and Germany (both n = 14), and the USA (n = 13). Most studies recruited from single settings with few studies recruiting from at least two different settings (n = 11). The most common recruitment settings were hospitals and other secondary care settings (n = 106). Only one study recruited controls from a primary care setting [57]. All included studies reported on CRC, with six studies specifically referring to colon cancer, and one study specifying rectal and caecum cancer cases separately to colon cancer. Some studies (n = 22) also referred to adenomas or polyps as cases, and five studies also included data on upper GI cancers (e.g. gastric, oesophageal and pancreatic cancers).

Characteristics of Cases and Controls

Overall, the included studies reported on 24,844 cases; the majority were diagnosed with CRC (80.2%) and a minority with adenomas/polyps (19.8%). Most cases had their age reported (79%), either as a range, mean or median. The overall mean age for CRC cases was 61.3 years, and 60.7 years for adenoma/polyp cases. The minimum age for cases was 18 years, while the oldest was 97 years old. The majority (59%) of CRC and adenoma/polyp cases were male. Most studies provided data on tumour staging, mainly using the TNM system (n = 101), though some studies used Dukes’ classification (n = 22), with one study providing data for both. When combining TNM and Dukes’ staging data, over half of the cancers (54%) were diagnosed at early stages (I–II/A + B). Adenomas included as cases were most frequently defined by size, dysplasia, villous component and/or number of adenomas. The included studies reported on a total of 45,374 controls (31,352 normal/healthy, 6414 with non-malignant conditions and 7608 with adenomas or polyps). A number of studies (n = 37) investigated more than one type of control population. The control populations of most studies (n = 108) were tested to rule out CRC, mainly using colonoscopy (n = 65). The majority of studies (n = 17) with adenomas or polyps as controls included those that were low risk (hyperplastic, non-neoplastic polyps or non-advanced adenomas), though some were high risk (advanced adenomas, those with villous histology or high-grade dysplasia). Age data were extractable for 47.1% of controls. The minimum age for a control was 16 years (healthy control), while the oldest was 99 years old. The majority of both healthy (50.6%) and non-malignant (58%) controls were male.

Types of Biomarkers

Most studies investigated more than one biomarker (79.6%), and these often reported on measures of performance for individual and combinations or panels of biomarkers (45.8%). The commonest sample source was blood (82.4%); these analysed serum (n = 62), plasma (n = 41) or whole blood (n = 14). Faeces was also a common sample source (24.6%); two studies analysed urine, and 13 studies analysed more than one type of sample. A total of 378 unique biomarkers were identified across the 142 included studies (Appendix 4 in the supplementary material). The commonest biomarkers were microRNAs and other RNAs, followed by proteins, DNA markers, autoantibodies and other immunological markers, and metabolic markers. Proteins were further classified into subcategories, with the most common being novel proteins (Table 2).

Table 2

Classification of identified biomarkers

Identified biomarkers (142 studies)	N (%)
MicroRNAs and other RNAs	126 (33.3%)
DNA markers (protein coding genes, mutations)	45 (12.2%)
Proteins	86 (22.8%)
Adhesion and matrix proteins	11 (2.9%)
Classic tumour markers	6 (1.6%)
Coagulation and angiogenesis molecules	6 (1.6%)
Cytokines, chemokines and insulin-like growth factors	15 (4.0%)
Hormones	1 (0.3%)
Novel proteins	39 (10.3%)
Not otherwise specified	8 (2.1%)
Autoantibodies and other immunological markers	44 (11.6%)
Metabolic markers	42 (11.1%)
Circulating tumour DNA	4 (1.1%)
DNA methylation	15 (4.0%)
Other biomarkers^a	16 (4.2%)

^a Other biomarkers included platelets, white blood cells, red blood cells and colonocytes

A total of 54 biomarkers were reported in more than one study (Appendix 5 in the supplementary material). Three biomarkers were investigated by more than 10 studies: CA19-9, CEA and mSEPT9 (methylated septin 9). Additionally, six other biomarkers were investigated in five or more studies: tumour pyruvate kinase isoenzyme type M2 (TuM2-PK), microRNA-21 (miR-21), FIT, microRNA-92a (miR-92a), cancer antigen 72-4 (CA72-4) and TIMP metallopeptidase inhibitor 1 (TIMP-1) (see Appendix 5 for references).

Measures of Diagnostic Performance

Individual measures of diagnostic performance (i.e. measures outside of combinations or panels) were available for 35 biomarkers evaluated more than once (Appendix 5 in the supplementary material). Heterogeneity of study design and included populations precluded meta-analysis for the majority of these biomarkers; however, three had individual measures from multiple studies adopting a classic single-gate design: CEA (n = 7 studies), mSEPT9 (n = 4 studies) and TuM2-PK (n = 3 studies). Differences in the sample sources and diagnostic performance measures provided across the studies precluded meta-analysis for any accuracy measures available for CEA, which was included as a comparator to the novel markers. Meta-analysis was performed for the markers mSEPT9 and TuM2-PK. The estimated sensitivity and specificity of mSEPT9 were 80.6% (95% CI 76.6–84.0%) and 88.0% (95% CI 79.1–93.4%), respectively, and the diagnostic odds ratio was 30.3 (95% CI 17.8–51.4). TuM2-PK had an estimated sensitivity of 81.6% (95% CI 75.2–86.6%) and a specificity of 80.1% (95% CI 76.7–83.0%), and a diagnostic odds ratio of 17.8 (95% CI 11.6–27.2). Paired forest plots of the sensitivity and specificity for both mSEPT9 and TuM2-PK are shown in Figs. 3 and 4.
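As a rough consistency check on the summary estimates above, the diagnostic odds ratio can be approximated from a single pair of sensitivity and specificity values. The sketch below is illustrative only: the function name is our own, and the published odds ratios come from the fitted bivariate model, so a back-calculation from rounded point estimates is expected to differ slightly.

```python
def diagnostic_odds_ratio(sensitivity, specificity):
    """Odds of a positive test in cases divided by odds of a positive test in controls."""
    odds_positive_in_cases = sensitivity / (1 - sensitivity)
    odds_positive_in_controls = (1 - specificity) / specificity
    return odds_positive_in_cases / odds_positive_in_controls


# Pooled point estimates reported in this review (sensitivity, specificity).
pooled_estimates = {
    "mSEPT9": (0.806, 0.880),
    "TuM2-PK": (0.816, 0.801),
}

for marker, (sensitivity, specificity) in pooled_estimates.items():
    dor = diagnostic_odds_ratio(sensitivity, specificity)
    print(f"{marker}: approximate DOR = {dor:.2f}")
```

The back-calculated values are approximately 30.5 for mSEPT9 and 17.9 for TuM2-PK, close to the reported 30.3 and 17.8.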

Fig. 3

Forest plots of sensitivity and specificity for mSEPT9 in plasma

Fig. 4

Forest plots of sensitivity and specificity for TuM2-PK in stool

The random effects correlation for mSEPT9 was −1, indicating a significant threshold effect. Heterogeneity and threshold effect were harder to evaluate statistically for the meta-analysis of TuM2-PK as the low number of included studies impeded accurate fitting of the HSROC curve and generation of a random effects correlation. A cut-off value of 4 U/ml was used for the TuM2-PK assays across all studies. The studies included in the meta-analyses were at low risk of bias across most domains, except for the domains related to patient selection and the index test. Full appraisal data can be found in Appendix 6 in the supplementary material. Summary plots including risk of bias and applicability ratings from QUADAS-2 are shown in Figs. 5 and 6.

Fig. 5

HSROC curve for mSEPT9 (with risk of bias and applicability ratings)

Fig. 6

HSROC plot for TuM2-PK (with risk of bias and applicability ratings)

Discussion

This systematic review identified 142 studies reporting on 378 different biomarkers for CRC. The included papers were very heterogeneous, with differences in study design, control populations, sample sources, types of biomarkers, test thresholds and reported performance measures. Meta-analysis of diagnostic accuracy data was only possible for two novel markers: mSEPT9 and TuM2-PK. Both demonstrated high sensitivity, specificity and diagnostic odds ratios in hospital populations.

The most common biomarkers (both individually and in panels) were CEA, CA19-9, mSEPT9 and TuM2-PK. CEA and CA19-9 have a more established role in clinical practice for detecting recurrent disease [3, 7], so it is not surprising that these markers are prevalent throughout the literature. Most of the studies included CEA (42/53) and CA19-9 (20/21) in panels or used them as comparators for novel markers. Meta-analysis was not possible for these studies because of heterogeneity in sample sources and performance measures.

Twenty studies reported on the performance of mSEPT9 for CRC detection, mostly as a blood-based biomarker sampled from plasma. While most measures of diagnostic performance were for mSEPT9 as an individual marker, it was also included in panels or combinations across seven studies. Fewer studies reported on the performance of TuM2-PK (nine overall, of which three included TuM2-PK in panels or combinations). Unlike mSEPT9, TuM2-PK was predominantly sampled from stool, though some studies also reported it as a blood-based biomarker. The studies that evaluated mSEPT9 and TuM2-PK included a number of two-gate studies or hybrid designs, and multiple instances where the study design was unclear. The meta-analyses for mSEPT9 and TuM2-PK included only those studies with a clear, classic single-gate design [11] to reduce heterogeneity and spectrum bias; consequently, both meta-analyses included a low number of studies, resulting in wide confidence intervals for the diagnostic odds ratios.

The meta-analysis for mSEPT9 synthesised diagnostic performance data on 899 CRC cases. Cancer cases were mostly diagnosed in stages II or III, and diagnostic performance data were also provided for adenomas and polyps in most cases. This is important to note, as the diagnostic performance results for early stage cancers are more likely to translate for use in early detection, and the ability of biomarkers to also detect high-risk adenomas, polyps or dysplasia could provide additional clinical utility. Across all studies, the test sensitivity was higher when detecting advanced CRC cases. Conversely, test sensitivity decreased when used to detect either adenomas or degrees of dysplasia. As previously mentioned, several studies evaluated mSEPT9 within diagnostic panels or in combination with other markers. Three studies in particular [66, 151, 153] showed that the sensitivity of mSEPT9 to detect CRC increased when combined with more established markers such as FIT and CEA. The results from our review show a slightly higher sensitivity for mSEPT9 in comparison to a recent meta-analysis of 19 studies [165], though it should be noted that the analysis from that review included a mixture of study designs and focused on high-risk populations. Our results are comparable to previous analyses which estimated the sensitivity of mSEPT9 as up to 88% [165, 166].

The meta-analysis of diagnostic accuracy data for TuM2-PK as a stool marker included 183 CRC cases. Similarly to mSEPT9, the sensitivity of TuM2-PK was higher for more advanced cancers (Dukes’ stage C and D; stages III and IV) and lower when it was used to detect adenomas, polyps or dysplasia. All three studies included in the meta-analysis compared the diagnostic performance of TuM2-PK to the established stool marker FIT, and demonstrated that FIT was preferable to TuM2-PK as a faecal biomarker for screening populations [94, 116, 117]. Three studies [50, 67, 115] evaluated TuM2-PK as a blood-based biomarker in combination with other markers or in panels, and all found sensitivity to be higher for TuM2-PK in combination with other markers. TuM2-PK may therefore be more promising in blood-based diagnostic panels than as a stand-alone stool marker.

Two-gate and hybrid designs were used widely in the included studies. These types of study designs can lead to over-inflated measures of diagnostic performance due to an over-representation of individuals with advanced disease within the study population [11]. While many studies attempted single-gate designs and recruited participants through one route (usually screening populations where all participants attended for a colonoscopy), the low prevalence of CRC cases meant that extra cases were sourced from alternative routes. This study design issue highlights the importance of large-scale studies and trials that are adequately powered to evaluate diagnostic performance in truly low-prevalence populations.

Several other methodological limitations were identified across the studies. These included the parallel analysis of large numbers of biomarkers during discovery studies; limited external, independent validation of test performance; and selective reporting for validation, including alternative analyses and combinations or use of several cut-off points. Insufficient reporting regarding population characteristics and recruitment was also an issue in many studies, with information often provided as supplementary data and with little detail. As a result of the large amount of evidence on biomarker development and evaluation, we believe the field could benefit from a “living systematic review”; this refers to high-quality, up-to-date online summaries of evidence which can be constantly updated as new research becomes available [167].

Although our search was restricted to studies published in English, recent reviews indicate that this has minimal impact on review conclusions [168, 169]. Further limitations of this review include the exclusion of studies that evaluated biomarkers within risk assessment tools or risk prediction models. These studies have strong potential to be used in the community; however, we believe they should be investigated in a separate systematic review. The heterogeneity of the published literature meant we could only conduct meta-analyses on a limited subset of included studies. Nonetheless, we believe the narrative synthesis of additional studies provides a useful summary of the current state of the science in this area.

There were insufficient homogeneous data on biomarker panels to report summary estimates of their diagnostic performance. A study from Fung et al. [48] describes ColoSTAT, a novel blood-based diagnostic panel for CRC that includes TuM2-PK with two other biomarkers (IL-8 and DKK-3) and is currently being trialled in Australia. The ColoSTAT panel has reported sensitivity and specificity of 73% and 95%, respectively, for CRC, which is comparable to reported values for FIT (64–73% and 92–95%, respectively [170–173]) for the detection of CRC in screening populations. Previous trials using this panel have been conducted in high-prevalence settings, with two-gate designs. Once further data are available on ColoSTAT and its performance to detect early stage CRC, it may have applicability in low-prevalence settings as an alternative to FIT, either for screening or in symptomatic populations.
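
To illustrate why evaluation in low-prevalence settings matters, the sketch below applies Bayes' rule to the reported ColoSTAT sensitivity (73%) and specificity (95%) to obtain positive and negative predictive values at several prevalence levels. The prevalence values and the function name are hypothetical and chosen only for illustration; they are not taken from the review, and the calculation assumes that sensitivity and specificity stay constant across settings, which spectrum effects may violate.

```python
def predictive_values(sensitivity: float, specificity: float, prevalence: float):
    """Return (PPV, NPV) for a test applied at a given disease prevalence."""
    true_pos = sensitivity * prevalence
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    false_neg = (1.0 - sensitivity) * prevalence
    true_neg = specificity * (1.0 - prevalence)
    ppv = true_pos / (true_pos + false_pos)
    npv = true_neg / (true_neg + false_neg)
    return ppv, npv


# Reported ColoSTAT accuracy; the prevalence levels below are assumptions.
SENSITIVITY, SPECIFICITY = 0.73, 0.95
assumed_prevalences = [0.005, 0.01, 0.05, 0.30]

for prevalence in assumed_prevalences:
    ppv, npv = predictive_values(SENSITIVITY, SPECIFICITY, prevalence)
    print(f"prevalence {prevalence:.1%}: PPV {ppv:.1%}, NPV {npv:.1%}")
```

At an assumed prevalence of 1%, for example, the positive predictive value is roughly 13% even with 95% specificity, which is why accuracy estimates from high-prevalence two-gate studies cannot be carried over directly to primary care.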

Conclusion

There is a large body of evidence on novel biomarkers being developed to aid with the early detection of lower GI cancers. Few of these markers have yet demonstrated their validity or clinical utility, but two show promise for further evaluation, mSEPT9 and TuM2-PK, and could contribute towards the early detection of CRC as part of blood-based diagnostic panels. Further, large-scale studies in low-prevalence populations are required to evaluate their potential role to support diagnostic assessment in primary care and community settings. This review offers a comprehensive overview of the current state of evidence, situates it within a translational framework for diagnostic tests and makes recommendations in order to build the evidence base for the early detection of lower GI cancers in low-prevalence settings.

To our knowledge, this is the first systematic review to characterise the range of novel biomarkers being investigated for the early detection of lower GI cancers, with a focus on their readiness to progress to further evaluation in low-prevalence populations such as primary care.

We identified 378 unique biomarkers from the literature; a meta-analysis of diagnostic accuracy data indicated mSEPT9 and TuM2-PK have potential for further evaluation in low-prevalence populations.

We highlight the need for (1) further studies on mSEPT9 and TuM2-PK in low-prevalence populations; (2) better reporting to facilitate translation; (3) more consistency in the use of biomarkers. By doing so, we will be able to progress to a different step in the evaluation process of promising biomarkers, and ultimately ascertain clinical benefits for our intended population. This will require going beyond test performance, investigating implementation (including feasibility and acceptability), safety and cost-effectiveness.
