
Scoring systems for the management of oncological hepato-pancreato-biliary patients.

Alexander W Coombs¹, Chloe Jordan¹, Sabba A Hussain¹, Omar Ghandour¹.

Abstract

Oncological scoring systems in surgery are used as evidence-based decision aids to support management by assessing prognosis, treatment effectiveness and recurrence. Currently, the use of scoring systems in the hepato-pancreato-biliary (HPB) field is limited, as concerns over precision and applicability prevent their widespread clinical implementation. The aim of this review was to discuss clinically useful oncological scoring systems for the surgical management of HPB patients. A narrative review was conducted to appraise oncological HPB scoring systems. Original research articles describing established and novel scoring systems were searched using Google Scholar, PubMed, Cochrane, and Ovid Medline. The models included were selected by the authors. This review discusses nine scoring systems in cancers of the liver (CLIP, BCLC, ALBI Grade, RETREAT, Fong's score), pancreas (Genç's score, mGPS), and biliary tract (TMHSS, MEGNA). Eight models used exclusively objective measurements to compute their scores, while one used a mixture of subjective and objective inputs. Seven models evaluated their scoring performance in external populations, with reported discriminatory c-statistics ranging from 0.58 to 0.82. Model variables were most frequently selected using a combination of univariate and multivariate analysis. Calibration, another determinant of model accuracy, was poorly reported across the nine scoring systems. A diverse range of HPB surgical scoring systems may facilitate evidence-based decisions on patient management and treatment. Future scoring systems need to be developed using heterogeneous patient cohorts with improved stratification, and future trends are likely to integrate machine learning and genetics to improve outcome prediction.


Keywords: Decision support techniques; Models, statistical; Neoplasms

Year: 2022 PMID: 35220286 PMCID: PMC8901986 DOI: 10.14701/ahbps.21-113

Source DB: PubMed Journal: Ann Hepatobiliary Pancreat Surg ISSN: 2508-5859

INTRODUCTION

Hepato-pancreato-biliary (HPB) surgery comprises the treatment of benign and malignant diseases of the liver, pancreas and biliary tract [1]. Primary HPB cancers are relatively uncommon compared with other malignancies, although their incidence is increasing worldwide [2]. HPB surgeons also commonly provide oncological treatment for colorectal cancer (CRC) liver metastases, as approximately half of all CRC patients develop metastatic liver disease [3]. Despite improvements in cancer surveillance and advances in treatment, HPB malignancies are still associated with high mortality rates [4]. This is partly due to their atypical presentation, which means that patients are often diagnosed with advanced disease. The principal treatment for these patients is surgical resection, although such operations are complex and carry high complication rates [4]. Additionally, as HPB malignancies are most prevalent in elderly populations, surgery poses many further challenges. Careful patient selection is therefore essential [5]. Scoring systems that identify patients who are likely to benefit from surgical intervention, and those who should be treated more conservatively, may support an HPB surgeon’s clinical decision-making. Scoring systems have long been established in surgery to guide clinicians, and many are already used in routine clinical practice to predict surgical outcomes [6]. They can be applied for several purposes: to risk-stratify patients into treatment groups, or to estimate an individual’s risk of complications, prognosis, and recurrence [7,8]. Risk assessment is imperative in clinical management. Evidence-based stratification tools complement the shared decision-making process between patients and clinicians [9]. This assessment can occur at any stage (pre-, peri-, or post-operative), although most scoring systems have been designed to be applied before surgery [10]. 
An ideal scoring system should be accurate, simple and based on objective variables [11,12]. Although various tools are available in the literature, most fail to be implemented in routine clinical practice [13]. Low uptake can be ascribed to a lack of awareness of risk stratification models and to concerns regarding their applicability, complexity and precision [11,12]. Additionally, the validation quality of scoring systems is non-uniform, with many models failing to meet minimum study-quality requirements based on specialised quality metrics. This results in a substantial amount of research waste. In addition, there is often no clear evidence of the clinical and economic effects of a given model [14,15]. Despite this, published scoring systems still offer clinical value, and much effort is focused on increasing their predictive accuracy by developing novel methods and refining existing models [10]. The aim of this review is to provide an overview of various HPB oncological scoring systems, assess the development quality of these systems, and evaluate the performance of these models.

METHODOLOGY

We conducted a narrative review to provide an overview of potentially clinically valuable and novel scoring systems that might be considered for use in the management of HPB oncological patients. Original research articles were identified using PubMed, MEDLINE (Ovid), Cochrane, and Google Scholar (Supplementary Material 1). Scoring systems were chosen for review based on current clinical endorsement and validation in multiple populations, or alternatively if the score’s function was deemed novel in HPB oncological management. When possible, scoring systems that had been externally validated were preferred. The effectiveness of each scoring system was primarily assessed using the c-statistic in the original and external study cohorts. The c-statistic measures the ability of a scoring system to distinguish between those who do and do not experience an outcome of interest [16]; this measure of performance is also known as discrimination. In practice, c-statistics lie between 0.5 and 1: a c-statistic of 0.5 indicates no predictive ability (no better than random chance), whereas a c-statistic of 1 denotes perfect discrimination. In line with previous reviews, a c-statistic > 0.7 was deemed ‘acceptable’ and a c-statistic > 0.8 was deemed ‘good’ discriminatory power [17,18]. The authors also assessed and reported the calibration of scoring systems, another important measure of model performance. Calibration assesses the agreement between predicted risk and observed risk [19]. A scoring system may have good discrimination, yet if it is poorly calibrated, its estimates of absolute risk will be inaccurate. This is especially important when assessing scoring systems that will ultimately be used in decision-making. Scoring systems were also subjectively analysed by the authors for their applicability in clinical practice. 
Factors considered were the accessibility of the variables used, the number of variables needed and the use of objective measurements. A scoring system in this review was defined as a model or score that could risk-stratify patients into different categories to assist decision-making in patient management. At least two scoring systems were selected for each HPB organ to ensure variety. Articles not in the English language were excluded. Abbreviations of selected scoring systems are summarised in Table 1.
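The c-statistic described above can be illustrated for a binary outcome with a minimal Python sketch. It considers every pair of one patient with the outcome and one without, and counts how often the patient with the outcome has the higher score, with tied scores counted as half; ties are common for integer point scores such as those in Table 2. The thresholds in `describe_discrimination` are the ‘acceptable’ (> 0.7) and ‘good’ (> 0.8) cut-offs adopted in this review. The function names are illustrative, and the sketch assumes a binary outcome: the c-indices in Table 3 come from Cox regression models, where Harrell’s C additionally accounts for censored follow-up.

```python
from itertools import product


def c_statistic(scores, outcomes):
    """Concordance (c) statistic for a binary outcome (1 = event, 0 = no event)."""
    events = [s for s, y in zip(scores, outcomes) if y == 1]
    non_events = [s for s, y in zip(scores, outcomes) if y == 0]
    if not events or not non_events:
        raise ValueError("need at least one patient with and one without the outcome")
    concordant = 0.0
    for with_event, without_event in product(events, non_events):
        if with_event > without_event:
            concordant += 1.0  # higher score correctly assigned to the event
        elif with_event == without_event:
            concordant += 0.5  # tied scores count as half-concordant
    return concordant / (len(events) * len(non_events))


def describe_discrimination(c):
    """Label a c-statistic with the thresholds used in this review."""
    if c > 0.8:
        return "good"
    if c > 0.7:
        return "acceptable"
    return "below acceptable"


# Hypothetical scores for five patients; 1 marks those who had the outcome.
c = c_statistic([1, 2, 3, 4, 5], [0, 1, 0, 1, 1])
print(round(c, 3), describe_discrimination(c))  # 0.833 good
```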

Table 1

Abbreviations of scoring systems

Abbreviation	Expanded term
ALBI	Albumin-Bilirubin
BCLC	Barcelona-Clinic Liver Cancer
CLIP	Cancer of the Liver Italian Program
MEGNA	Multifocality, Extra-hepatic Extension, Grade, Node Positivity, Age (> 60 yr) prognostic score
mGPS	Modified Glasgow Prognostic Score
RETREAT	Risk Estimation of Tumour Recurrence after Transplant
TMHSS	Tata Memorial Hospital Scoring System

SCORING SYSTEMS

Liver

Liver cancer is the second leading cause of cancer-related death globally, with hepatocellular carcinoma (HCC) being the most common type (75%–85%) of primary liver cancer [2,20]. Robust and universal models that accurately assess prognosis, recurrence and liver function are valuable in supporting effective patient management.

Cancer of the Liver Italian Program (CLIP)

Mortality from HCC has progressively increased across Western Europe, making prognostic measures imperative for improving decision-making and outcomes [21]. CLIP is extensively used clinically as a prognostic tool for survival in HCC patients [22,23]. CLIP assigns points according to the severity of four factors to estimate survival (Table 2); higher scores indicate a poorer prognosis. CLIP is the current standard against which the effectiveness of new models is compared. It was an improvement on the Okuda system, one of the first combined scoring systems in HCC [24].
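As a minimal Python sketch, the CLIP total can be computed by summing the points for each variable in Table 2. Table 2 lists the categories but not their points, so the sketch assumes the standard CLIP allocation: 0, 1 or 2 points for Child-Pugh class A, B or C and for the three tumour morphology categories (in the order listed), and 1 point each for serum AFP ≥ 400 ng/mL and portal vein thrombosis, giving a total of 0–6. The dictionary keys and function name are hypothetical labels chosen for this example.

```python
# Assumed CLIP point allocation (not listed in Table 2); total range 0-6.
CHILD_PUGH_POINTS = {"A": 0, "B": 1, "C": 2}
MORPHOLOGY_POINTS = {
    "uninodular_le50": 0,    # uninodular and extension <= 50%
    "multinodular_le50": 1,  # multinodular and extension <= 50%
    "massive_gt50": 2,       # massive or extension > 50%
}


def clip_score(child_pugh, morphology, afp_ng_ml, portal_vein_thrombosis):
    """Return the CLIP score (0-6); higher scores indicate poorer prognosis."""
    score = CHILD_PUGH_POINTS[child_pugh]
    score += MORPHOLOGY_POINTS[morphology]
    score += 1 if afp_ng_ml >= 400 else 0
    score += 1 if portal_vein_thrombosis else 0
    return score


print(clip_score("B", "multinodular_le50", 520, False))  # 3
```

The morphology input is the one that depends on the assessor’s judgement of tumour extension, which is the source of the subjectivity discussed later in this section.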

Table 2

Variables used within each scoring system, their definition and outcome, and the statistical method of variable selection

Scoring system	Variable	Variable definition	Outcome measured by scoring system	Statistical method of variable selection
CLIP [21]	Child-Pugh score	A	Survival prognosis	Univariate and multivariate analysis
		B
		C
	Tumour morphology	Uninodular and extension ≤ 50%
		Multinodular and extension ≤ 50%
		Massive or extension > 50%
	Serum AFP	< 400 ng/mL
		≥ 400 ng/mL
	Portal vein thrombosis	Yes
		No
BCLC [38]	Tumour stage	1 HCC < 2 cm	Staging and treatment	Multivariate analysis and literature review
		1 HCC < 5 cm or 3 nodules < 3 cm
		Multinodular
		Portal invasion, N1 M1
		Any
	Child-Pugh stage	A
		B
		C
	ECOG performance status	0
		1–2
		> 2
ALBI [44]	Serum albumin	Nomogram	Liver function	Univariate and multivariate analysis
	Serum bilirubin	Nomogram
RETREAT [51]	Serum AFP at liver transplantation (ng/mL)	0–20	Recurrence	Univariate and multivariate analysis
		21–99
		100–999
		≥ 1,000
	Microvascular Invasion	Yes
		No
	Largest viable tumour diameter (cm) plus number of viable tumours	0
		1.1–4.9
		5.0–9.9
		≥ 10
Fong et al. [61]	Nodal invasion	Yes	Recurrence	Univariate and multivariate analysis
		No
	Length of disease-free interval (mon)	≥ 12
		< 12
	Number of tumours	> 1
		≤ 1
	Tumour size (cm)	> 5
		≤ 5
	CEA (ng/mL)	> 200
		≤ 200
Genç et al. [69]	Tumour grade	1	Recurrence	Multivariate analysis
		2
	Node positivity	Yes
		No
	Perineural invasion	Yes
		No
mGPS [72]	CRP (mg/L)	> 10	Survival	Univariate analysis
		≤ 10
	Serum albumin (g/L)	≥ 35
		< 35
TMHSS [82]	Serum Bilirubin (mg/dL)	< 3	Management strategy	Univariate analysis
		> 3
	Carbohydrate antigen 19-9 (U/mL)	0–30
		30–90
		90–450
		> 450
	Computed tomography scan	Normal
		Gallbladder mass
		Liver infiltration
		Medially placed mass/intrahepatic biliary radicle dilatation
		Metastatic disease
MEGNA [87]	Multifocality	Yes	Survival	Multivariate analysis
		No
	Extra-hepatic organ involvement	Yes
		No
	Tumour grade	Yes
		No
	Node positivity	Yes
		No
	Age > 60 yr	Yes
		No

CLIP, Cancer of the Liver Italian Program; BCLC, Barcelona Clinic Liver Cancer; ALBI, Albumin-Bilirubin Grade; RETREAT, Risk Estimation of Tumor Recurrence after Transplant Score; mGPS, Modified Glasgow Prognostic Score; TMHSS, Tata Memorial Hospital Scoring System; MEGNA, Multifocality, Extra-hepatic Extension, Grade, Node Positivity, Age (> 60 yr) prognostic score; AFP, alpha-fetoprotein; ECOG, Eastern Cooperative Oncology Group; CEA, carcinoembryonic antigen; CRP, C-reactive protein.

CLIP has been externally validated in numerous populations globally and is cited as one of the most accurate tools for estimating HCC prognosis [25-27]. In patients with advanced HCC, CLIP predicts prognosis better than other prominent systems; for example, it has a higher c-statistic (0.806) than the Japan Integrated Staging score (0.754) (Table 3) [26,28]. A further benefit is its simplicity, as it uses routinely available measures.

Table 3

Overview of the nine scoring systems explored in this review, including key components essential for their development and statistical analysis

Name	Year	Organ	Country	Type of study	Development cohort size	Development cohort sample method	Testing cohort (external validation) size	Testing cohort (external validation) sample method	Internal cohort (C-index)	External cohort (C-index)	Calibration metric score	Statistical methods
CLIP [21]	1998	Liver	Italy	RC	435	Split-sample validation	196	Same centre, different population	N/A	0.806	N/A	MCRA
BCLC [38]	1999	Liver	Spain	LR	N/A	N/A	766	Different centre	N/A	0.72	N/A	MCRA
ALBI [44]	2015	Liver	Japan	RC	1,313	Split-sample validation	5,097	Same centre and two additional centres, different population	N/A	0.61–0.68	N/A	MCRA
RETREAT [51]	2017	Liver	USA	RC	721	N/A	340	Different centre	0.77	0.82	N/A	MCRA
Fong et al. [61]	1999	Liver	USA	RC	1,001	N/A	N/A	N/A	N/A	0.62	N/A	MCRA
Genç et al. [69]	2018	Pancreas	Netherlands	RC	211	N/A	N/A	N/A	0.81	N/A	Hosmer–Lemeshow chi-square 11.25, p = 0.258	MCRA
mGPS [72]	2007	Pancreas	UK	RC	316	N/A	807	Different centre	N/A	N/A	N/A	MCRA
TMHSS [82]	2017	Gallbladder	India	RC	124	Split-sample validation	N/A	N/A	0.75	N/A	N/A	MCRA
MEGNA [87]	2008	Biliary tract	USA	RC	275	Split-sample validation	417	Different population	N/A	0.58	N/A	MCRA

CLIP, Cancer of the Liver Italian Program; BCLC, Barcelona Clinic Liver Cancer; ALBI, Albumin-Bilirubin Grade; RETREAT, Risk Estimation of Tumor Recurrence after Transplant Score; mGPS, Modified Glasgow Prognostic Score; TMHSS, Tata Memorial Hospital Scoring System; MEGNA, Multifocality, Extra-hepatic Extension, Grade, Node Positivity, Age (> 60 yr) prognostic score; RC, retrospective cohort; LR, literature review; MCRA, multivariable Cox regression analysis; N/A, not available.
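Calibration is reported for only one model in Table 3, as a Hosmer–Lemeshow chi-square statistic for Genç’s score. A minimal Python sketch of that statistic, assuming individual predicted risks are available, sorts patients by predicted risk, splits them into groups of roughly equal size, and compares observed with expected events in each group; large values suggest poor calibration. The grouping rule (equal-sized groups by rank) and the default of ten groups are common conventions rather than details taken from the studies reviewed, and implementations differ in how they handle tied risks. A p-value can be obtained from the chi-square distribution with `groups - 2` degrees of freedom, for example with SciPy’s `scipy.stats.chi2.sf(statistic, df)`.

```python
def hosmer_lemeshow(predicted, outcomes, groups=10):
    """Return the Hosmer-Lemeshow statistic and its degrees of freedom.

    predicted: predicted probabilities of the outcome (0-1)
    outcomes:  observed outcomes (1 = event, 0 = no event)
    """
    pairs = sorted(zip(predicted, outcomes), key=lambda pair: pair[0])
    n = len(pairs)
    statistic = 0.0
    for g in range(groups):
        # Equal-sized groups by rank of predicted risk.
        chunk = pairs[g * n // groups:(g + 1) * n // groups]
        if not chunk:
            continue
        size = len(chunk)
        observed = sum(y for _, y in chunk)
        expected = sum(p for p, _ in chunk)
        mean_risk = expected / size
        variance = size * mean_risk * (1 - mean_risk)
        if variance > 0:  # skip groups whose predicted risk is exactly 0 or 1
            statistic += (observed - expected) ** 2 / variance
    return statistic, groups - 2


# Hypothetical data: only the lowest-risk group has more events than expected.
risks = [0.1] * 5 + [0.2] * 5 + [0.4] * 5 + [0.6] * 5
events = [1, 0, 0, 0, 0] + [1, 0, 0, 0, 0] + [1, 1, 0, 0, 0] + [1, 1, 1, 0, 0]
stat, df = hosmer_lemeshow(risks, events, groups=4)
print(round(stat, 2), df)  # 0.56 2
```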

However, one variable, tumour morphology, is assessed subjectively, which reduces the reliability of the score between users. Selection bias existed in the initial cohort, as it consisted mainly of middle-aged men [22]. CLIP has since been validated in numerous countries, increasing its global applicability [28-30]. Its suitability for patients undergoing surgical resection has also been questioned, as most patients in the initial cohort did not receive this treatment [22]. Risk stratification is also limited, as 80% of patients have a score of 0–2; CLIP may therefore be most beneficial for patients with higher scores [23]. Modifications to CLIP have recently been proposed, reporting improved long-term prognostic prediction (c-statistic: modified CLIP-2 0.879; original CLIP 0.762) [31,32]. However, these modifications have yet to be clinically validated.

Barcelona Clinic Liver Cancer (BCLC)

The BCLC score is another prognostic tool, used pre-operatively to determine HCC stage and to risk-stratify patients into treatment groups [33]. Stages range from very early (0) to early (A), intermediate (B), advanced (C), and end-stage disease (D) (Table 2). BCLC provides both a staging classification and treatment recommendations, which increases its clinical utility. These recommendations are surgical resection, transplantation or ablation (0 and A); transarterial chemoembolisation (TACE) (B); medical treatment (C); and palliative care (D). BCLC is also the most extensively used HCC score in the western world owing to its practicality and validation in western populations [24,34]. Many organisations have endorsed this system in their guidelines, including the American Association for the Study of Liver Diseases (AASLD) and the European Association for the Study of the Liver (EASL) [21,35]. Validation studies also suggest that BCLC is the most suitable system for predicting prognosis in potential surgical candidates, an advantage over CLIP [36]. However, the validity of BCLC’s recommended treatment for stage B patients (TACE) is debated. The initial cohort for stage B was heterogeneous, with varied HCC features; hence, TACE may not always be clinically appropriate in stage B patients [37,38]. Furthermore, the score omits the cause of the underlying liver disease and comorbid conditions, which may influence staging and treatment, respectively [38]. To address this, refinements of BCLC with better prognostic prediction and treatment choices have been reported [24]. One example is an expanded BCLC tool that incorporates the Milan criteria (which assess suitability for liver transplantation) [39]. However, more extensive clinical and external validation is required before such expanded criteria can be widely implemented.
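The stage-to-treatment logic described above can be sketched in Python. This is a simplified reading of the categories in Table 2, not a full implementation of BCLC: it checks the worst features first (Child-Pugh C or ECOG > 2 for stage D; portal invasion, N1/M1 or ECOG 1–2 for stage C) and then assigns the tumour-burden stages. Requiring Child-Pugh A for stage 0 is an assumption taken from common descriptions of BCLC rather than from Table 2, later versions of BCLC changed some thresholds, and the function and parameter names are hypothetical.

```python
# Treatment recommendations per stage, as summarised in the text above.
BCLC_TREATMENT = {
    "0": "resection, transplantation or ablation",
    "A": "resection, transplantation or ablation",
    "B": "transarterial chemoembolisation (TACE)",
    "C": "medical treatment",
    "D": "palliative care",
}


def bclc_stage(n_nodules, largest_cm, portal_invasion_or_spread, child_pugh, ecog):
    """Simplified BCLC stage from the Table 2 categories (illustrative only)."""
    if child_pugh == "C" or ecog > 2:
        return "D"  # end-stage disease
    if portal_invasion_or_spread or ecog in (1, 2):
        return "C"  # advanced: portal invasion, N1, M1 or ECOG 1-2
    if n_nodules == 1 and largest_cm < 2 and child_pugh == "A":
        return "0"  # very early: single HCC < 2 cm (Child-Pugh A assumed)
    if (n_nodules == 1 and largest_cm < 5) or (n_nodules <= 3 and largest_cm < 3):
        return "A"  # early: single HCC < 5 cm or up to 3 nodules < 3 cm
    return "B"      # intermediate: multinodular


stage = bclc_stage(n_nodules=4, largest_cm=2.0, portal_invasion_or_spread=False,
                   child_pugh="A", ecog=0)
print(stage, "->", BCLC_TREATMENT[stage])  # B -> transarterial chemoembolisation (TACE)
```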

Albumin-Bilirubin Grade (ALBI)

Chronic liver disease (CLD) is a significant risk factor for the development of HCC [40]. It has been estimated that 70%–90% of all HCC patients have some degree of concurrent CLD [41]. As CLD is a competing cause of death in HCC patients, understanding its severity is useful for predicting liver function and survival [42]. Currently, clinicians use the Child-Pugh score to assess CLD in HCC patients. However, this score fails to distinguish between different severities of CLD, categorising the majority of patients into class B. Additionally, it includes two subjective measurements: ascites and hepatic encephalopathy [43]. Therefore, a new scoring system (the ALBI grade) was developed specifically to grade liver function in HCC patients [44]. ALBI uses a nomogram to categorise patients into three grades, based on two simple and objective measurements: serum albumin and bilirubin (Table 2). The ALBI grade has been validated for predicting overall survival in patients of each BCLC stage undergoing various treatments, and it has proved suitable for use in heterogeneous cohorts worldwide. Its applicability to all stages of HCC, along with its simplicity, makes ALBI clinically useful [45]. Additionally, supporting its use in practice, ALBI correlates strongly with indocyanine green clearance, a bedside test that estimates liver function [46]. This scoring system is expected to be useful for risk-stratifying HCC patients pre-operatively. Grade 1 patients may be the most suitable for curative-intent hepatic resection, whereas grade 2 or 3 patients are more suitable for liver transplantation or less invasive options such as ablation [46]. More recently, it has been suggested that a post-operative ALBI calculation predicts overall survival more accurately than the pre-operative version [47]. 
Its post-operative application may therefore provide an updated prognostic prediction [47]. One pitfall of ALBI is that patients assigned grade 2 show a wide range of hepatic function, which may allocate some patients to suboptimal treatment options. This issue has been addressed in a study proposing a modified version of the original model, mALBI, which further sub-categorises grade 2 into grades 2a and 2b [48]. Moreover, the ALBI grade has not yet been endorsed for clinical use; it remains under critical review despite the literature reporting its effectiveness as a predictor of liver function and outcome in HCC patients.
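Although the review describes ALBI as a nomogram, the grade is usually computed from a published linear predictor that combines log-transformed bilirubin and albumin. A minimal Python sketch, assuming the widely used form ALBI = 0.66 × log10(bilirubin, µmol/L) − 0.085 × albumin (g/L) with grade cut-offs of −2.60 and −1.39, and the mALBI sub-grade cut-off of −2.27 for grades 2a and 2b, is shown below. These coefficients and cut-offs are not given in this review and should be checked against the original publications [44,48]; the function names are illustrative.

```python
import math


def albi_score(albumin_g_l, bilirubin_umol_l):
    """ALBI linear predictor (albumin in g/L, bilirubin in micromol/L)."""
    return 0.66 * math.log10(bilirubin_umol_l) - 0.085 * albumin_g_l


def albi_grade(score):
    """Grade 1 (best liver function) to grade 3."""
    if score <= -2.60:
        return "1"
    if score <= -1.39:
        return "2"
    return "3"


def malbi_grade(score):
    """Modified ALBI: splits grade 2 into 2a and 2b at -2.27."""
    if score <= -2.60:
        return "1"
    if score <= -2.27:
        return "2a"
    if score <= -1.39:
        return "2b"
    return "3"


s = albi_score(albumin_g_l=38, bilirubin_umol_l=12)
print(round(s, 2), albi_grade(s), malbi_grade(s))  # -2.52 2 2a
```

Both gradings use the same underlying score; mALBI only adds one cut-off inside grade 2, which is how it addresses the spread of hepatic function within that grade.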

Risk Estimation of Tumor Recurrence after Transplant Score (RETREAT)

Recurrence remains a concern for HCC patients undergoing liver transplantation. Those experiencing recurrence (20%) have a poor median survival of 12 months [49]. Both the USA and UK currently incorporate the Milan criteria, which consider tumour size, number of tumours and tumour invasion, into their pre-operative guidance for selecting HCC patients for liver transplantation [49,50]. The issue with the Milan criteria is that well-established risk factors for recurrence are not considered [51,52]. RETREAT, a novel scoring system, was developed to identify which patients are at risk of recurrence and may therefore need closer follow-up [51]. This externally validated scoring system uses three objective and easily accessible variables to calculate a score that predicts the risk of recurrence within 1 and 5 years of treatment (Table 2). The authors of RETREAT suggest that patients with a score ≥ 4 would benefit from adjunct therapy [51]. Research suggests that more frequent follow-up can improve post-recurrence survival [53]; closer monitoring may therefore reduce mortality, as recurrent HCC can be detected and treated earlier. A large study including all patients who underwent liver transplantation in the USA between 2012 and 2014 demonstrated significant prognostic ability for RETREAT, thus validating its use [54]. Since then, the American Association for the Study of Liver Diseases has endorsed RETREAT in its report as a useful way to determine follow-up intervals [55]. The limitation of RETREAT as a prognostic tool is that AFP must be measured at the time of liver transplantation. Therefore, this scoring system cannot be used pre-operatively to decide whether to proceed with transplantation. However, many criteria and scoring tools are available for this purpose [50]. 
Other post-operative scoring systems similar to RETREAT include Agopian et al.’s [56] nomogram and post-MORAL [57]. Both have greater prognostic power than RETREAT, with c-statistics of 0.88 and 0.85, respectively. Despite these impressive c-statistics, the findings should be interpreted with caution, as these scoring systems have yet to be validated in external populations. Therefore, their use over RETREAT is not currently recommended. Moreover, in Asian populations, the SNAPP scoring system may predict recurrence better than RETREAT, which has been validated in western populations [58].
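A minimal Python sketch of the RETREAT calculation follows. Table 2 lists the categories but not their points, so the sketch assumes the allocation commonly reproduced from the original publication [51]: 0–3 points for AFP at transplantation (one step per category), 2 points for microvascular invasion, and 0–3 points for the sum of the largest viable tumour diameter (cm) and the number of viable tumours, giving a total of 0–8. The ≥ 4 flag mirrors the threshold for adjunct therapy mentioned above; the function names are hypothetical.

```python
def retreat_score(afp_ng_ml, microvascular_invasion, largest_viable_cm, n_viable):
    """RETREAT score (0-8) under the assumed point allocation."""
    if afp_ng_ml < 21:
        afp_points = 0
    elif afp_ng_ml < 100:
        afp_points = 1
    elif afp_ng_ml < 1000:
        afp_points = 2
    else:
        afp_points = 3

    mvi_points = 2 if microvascular_invasion else 0

    # Tumour burden: largest viable diameter (cm) plus number of viable tumours;
    # 0 when no viable tumour is present.
    burden = largest_viable_cm + n_viable if n_viable > 0 else 0.0
    if burden == 0:
        burden_points = 0
    elif burden < 5.0:
        burden_points = 1
    elif burden < 10.0:
        burden_points = 2
    else:
        burden_points = 3

    return afp_points + mvi_points + burden_points


def consider_adjunct(score):
    """Flag scores >= 4, the threshold suggested by the RETREAT authors."""
    return score >= 4


score = retreat_score(afp_ng_ml=150, microvascular_invasion=True,
                      largest_viable_cm=3.0, n_viable=2)
print(score, consider_adjunct(score))  # 6 True
```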

Fong’s score: clinical score for predicting recurrence after hepatic resection for metastatic colorectal cancer

Liver resection remains the key curative treatment option for colorectal cancer liver metastases (CRLM), with a cure rate of 20.6% [59]. Of patients who undergo hepatic resection for CRLM, 60%–85% experience cancer recurrence [60]. Therefore, as with liver transplantation for HCC, it is vital that suitable patients with CRLM are identified for curative-intent surgery and that those at higher risk are followed up more vigilantly. Fong’s scoring system stratifies CRLM patients into low (0–2) and high (3–5) risk of recurrence (Table 2) [61]. Patients at low risk (0–2) are advised to consider curative-intent resection. Patients with scores of 3–4 should consider adjunct therapy with resection. Patients with a score of 5 have a very poor prognosis, so the decision to resect requires particular consideration. Many prognostic scoring tools have been produced for curative liver resection of colorectal liver metastases in HPB oncology [62]. Currently, Fong’s score is the most well-known and understood model in the literature. However, others have tried to eclipse its prognostic ability. In a meta-analysis, Fong’s score was compared with other scores externally validated in at least two study populations [62]. Fong’s score was significant for predicting recurrence-free survival (RFS), but not for estimating overall survival. For predicting RFS, the Basingstoke score had greater prognostic power (c-statistic: 0.74) than Fong’s score, although its confidence interval (CI) was wide (CI: 0.52–0.88). For estimating overall survival, clinicians may opt to use the Valentini nomogram, which has significant prognostic power (c-statistic: 0.71). Given that a c-statistic > 0.7 is considered ‘acceptable’, Fong’s score (c-statistic 0.62 in Table 3) does not meet this threshold. Therefore, it may be less useful than first thought for identifying high-risk patients, despite its popularity in the literature. 
In light of this, future models may be favoured over Fong’s score for clinical endorsement. Such models may need to incorporate genetic markers and use machine learning (ML) or neural networks to improve prognostic power in predicting recurrence in CRLM patients [62,63]. Notwithstanding its disappointing c-statistic, Fong’s score may still be useful for estimating RFS in CRLM patients, especially as it has been externally validated in four different populations, more than any of its comparators.
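Because each of the five criteria in Table 2 contributes one point, Fong’s score reduces to counting the criteria met. A minimal Python sketch, with hypothetical function names, is shown below; it treats a total of 0–2 as low risk and 3–5 as high risk, so a patient meeting none of the criteria (score 0) falls in the low-risk group.

```python
def fong_score(node_positive_primary, disease_free_interval_months,
               n_tumours, largest_tumour_cm, cea_ng_ml):
    """Count the Fong clinical risk criteria met (0-5)."""
    criteria = [
        node_positive_primary,              # nodal invasion of the primary
        disease_free_interval_months < 12,  # disease-free interval < 12 months
        n_tumours > 1,                      # more than one liver tumour
        largest_tumour_cm > 5,              # largest tumour > 5 cm
        cea_ng_ml > 200,                    # CEA > 200 ng/mL
    ]
    return sum(criteria)  # True counts as 1, False as 0


def fong_risk_group(score):
    return "low" if score <= 2 else "high"


s = fong_score(True, 6, 2, 3.0, 50)
print(s, fong_risk_group(s))  # 3 high
```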

Pancreas

Pancreatic cancer (PC) is the seventh leading cause of cancer-related mortality worldwide, accounting for 4.5% of all cancer deaths in 2018 [2]. Although complete surgical resection is the only curative treatment for PC, only 10%–15% of patients are eligible because most present with advanced disease [64]. For this reason, tools that identify suitable patients for surgery may be helpful in PC management.

Genç’s score: scoring system to predict recurrent disease in grades 1 and 2 nonfunctional pancreatic neuroendocrine tumors

Although neuroendocrine tumours (NETs) are perceived to be rare, their incidence is increasing, with a 6.4-fold increase over a 39-year period [65]. Non-functional tumours account for 60%–90% of all pancreatic neuroendocrine tumours (P-NETs) and are often discovered incidentally on imaging [66,67]. For malignant and localised disease, surgical resection of the P-NET is required to prevent the development of distant metastases [66,68]. Overall, surgical treatment provides a good prognosis. However, for those who experience recurrence, the prognosis is poor. Therefore, clinicians need a scoring system that can identify patients at risk of recurrence. The scoring system by Genç et al. [69] was the first to be proposed for risk-stratifying patients with non-functional P-NETs. Genç’s score uses three easily accessible variables to calculate a total score out of 88 points. The score categorises patients into low (≤ 24) and high (> 24) risk of recurrence. For high-risk patients, the authors suggest that adjunct therapy be considered to improve prognosis. Low-risk and high-risk patients had a mean overall survival of 110.3 and 99.4 months, respectively [69]. Whilst this difference is statistically significant, it may be argued that a difference of approximately 11 months is not substantial, and high-risk patients just above the cut-off may have outcomes similar to those of low-risk patients. This is plausible given that a low-risk patient with a score of 24 has a 16% chance of recurrence, while a high-risk patient with a score of 40 has a 25% chance. Therefore, the decision to use an adjunct may be more difficult for patients with scores of 40 or 48 than for those with scores of 64 or 88. This scoring system may be most useful for deciding follow-up intervals after surgical treatment to monitor high-risk patients. 
Genç’s score has only been validated in the development cohort, despite its well-performing c-statistic of 0.81. The lack of simple guidance for predicting recurrence in patients with non-functional P-NETs makes this a novel and clinically useful tool. However, its use for deciding treatment options cannot be advocated until the scoring system has been externally validated in other populations. For predicting tumour recurrence in functional P-NETs or in Asian populations, a similar scoring system described by Zou et al. [70] (c-statistic: 0.81) may be a suitable alternative.
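The review gives only the range of Genç’s score (0–88) and several example totals (24, 40, 48, 64 and 88). Those totals are consistent with two variables worth 24 points each and one worth 40; the Python sketch below assumes, hypothetically, that grade 2 carries 40 points and that node positivity and perineural invasion carry 24 each. These weights are an illustration only and must be checked against Genç et al. [69] before any use. The cut-off follows the text: ≤ 24 is low risk and > 24 is high risk.

```python
# HYPOTHETICAL weights: chosen only because they reproduce the example totals
# quoted in the text (24, 40, 48, 64, 88); verify against the original paper.
HYPOTHETICAL_WEIGHTS = {"grade_2": 40, "node_positive": 24, "perineural_invasion": 24}


def genc_score(grade, node_positive, perineural_invasion, weights=HYPOTHETICAL_WEIGHTS):
    """Additive recurrence score for grade 1-2 non-functional P-NETs (0-88)."""
    score = 0
    if grade == 2:
        score += weights["grade_2"]
    if node_positive:
        score += weights["node_positive"]
    if perineural_invasion:
        score += weights["perineural_invasion"]
    return score


def genc_risk_group(score):
    return "low" if score <= 24 else "high"


for case in [(1, False, True), (2, False, False), (2, True, True)]:
    s = genc_score(*case)
    print(case, s, genc_risk_group(s))
```

Even with illustrative weights, the example shows the issue raised in the text: a single variable moves a patient from 24 (low risk) to 40 or 48 (high risk), although the reported recurrence risks at 24 and 40 points differ by only 9 percentage points.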

Modified Glasgow Prognostic Score (mGPS)

In addition to the tumour biological factors that make up the tumour-node-metastasis (TNM) staging system, there is mounting evidence that systemic inflammation is a critical and independent prognostic marker in malignancy [71]. As a result, inflammation-based prognostic scores have been proposed for prognostication and survival prediction in cancer patients [71]. The mGPS is a prognostic score that grades the systemic inflammatory response in PC by combining C-reactive protein (CRP) and serum albumin levels. The mGPS ranges from 0 to 2, with a higher score indicating a worse prognosis (Table 2). Its main advantage is that it uses two commonly recorded objective measures, making it easy to calculate in a clinical setting. In a meta-analysis of 20 studies, the mGPS demonstrated significant overall prognostic power, with a hazard ratio (HR) of 1.50 for patients categorised as high risk versus low risk [72]. However, most of these studies were conducted in eastern populations, and studies in western populations showed inconclusive findings (HR: 1.34, p = 0.268). Another significant pitfall of the mGPS is the possibility of inaccurate results due to elevated CRP or reduced albumin levels caused not by PC but by concurrent comorbidities or infection. Such inaccuracies may ultimately influence clinical decision-making, so good clinical judgement is needed when using the model in patients with comorbid disease. More recently, the CRP/albumin ratio has shown potential [73]. Using the same two simple markers, one study showed that the CRP/albumin ratio had better discriminatory ability (0.70) than the mGPS (0.63), although it should be validated with multicentre data before it can be recommended over the mGPS [73]. 
Additionally, a scoring system based on haemostatic markers has been developed for patients with advanced pancreatic cancer to estimate median survival time [74]. Its novel use of markers provides an interesting area for future research. However, like the CRP/albumin ratio, its prognostic ability should be tested in external populations. Overall, the mGPS provides valuable pre-operative prognostic information, although future scores with improved discrimination may eclipse its use in western populations.
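A minimal Python sketch of the mGPS and the CRP/albumin ratio is shown below, assuming the usual mGPS definition with CRP in mg/L and albumin in g/L: CRP ≤ 10 mg/L scores 0 regardless of albumin, and CRP > 10 mg/L scores 1, or 2 if albumin is also < 35 g/L. This differs from the original Glasgow Prognostic Score, in which low albumin alone also scores a point. No cut-off is assumed for the CRP/albumin ratio, since reported thresholds vary between studies; the function names are illustrative.

```python
def mgps(crp_mg_l, albumin_g_l):
    """Modified Glasgow Prognostic Score (0-2); higher means worse prognosis."""
    if crp_mg_l <= 10:
        return 0  # albumin is not considered when CRP is not raised
    return 1 if albumin_g_l >= 35 else 2


def crp_albumin_ratio(crp_mg_l, albumin_g_l):
    """Continuous alternative built from the same two markers."""
    return crp_mg_l / albumin_g_l


for crp, albumin in [(5, 30), (20, 40), (20, 30)]:
    print(crp, albumin, mgps(crp, albumin), round(crp_albumin_ratio(crp, albumin), 2))
```

The first case (CRP 5 mg/L, albumin 30 g/L) scores 0, illustrating why the score depends so heavily on CRP and is therefore sensitive to a CRP rise from infection or comorbidity.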

Biliary

Biliary tract cancers primarily comprise three entities: gallbladder cancer (GBC), intrahepatic cholangiocarcinoma (ICC) and extrahepatic cholangiocarcinoma (ECC) [75]. Although uncommon, these malignancies are highly fatal, with a poor prognosis. Scoring systems may therefore be important for risk-stratifying these patients.

Tata Memorial Hospital Scoring System (TMHSS)

Although rare, GBC is the most common malignancy of the biliary tract. It is often diagnosed at an advanced stage and remains difficult to treat, with a poor prognosis; surgery is the only curative option [75-77]. Currently, all available scoring systems for GBC are based on histopathological findings following cholecystectomy [78,79]. Thus, clinicians are unable to predict surgical outcomes pre-operatively to support their clinical management. A long-held concern in GBC treatment is whether to offer resection to jaundiced patients [80], because jaundice is strongly associated with advanced disease, high mortality and inoperability [80]. However, recent studies have demonstrated that a small subgroup of these patients can undergo curative-intent surgery and achieve long-term survival [81]. TMHSS, a new pre-operative scoring system, may help clinicians identify GBC patients most suitable for surgical intervention and avoid surgery in those for whom such invasive treatment offers no overall benefit. The score is based on radiological, biochemical and clinical features [82]. TMHSS aims to estimate prognosis and predict the resectability of GBC: Group A (score 0–3) is managed surgically, Group B (score 4–6) with neoadjuvant options or staging laparoscopy, and Group C (score ≥ 7) palliatively (Table 2). The statistical methods used to derive the scoring system were not reported [82]. Additionally, surgery was considered feasible for 58% of patients in the development cohort, compared with only 32.5% in the validation cohort. These discrepancies between cohorts highlight their heterogeneity and suggest that testing in a larger population including various ethnicities would be appropriate. 
Strengths of TMHSS include its ease of computation, since its variables, such as CA 19-9 and serum bilirubin, are routinely assessed in practice. The score has good discriminative ability in identifying patients who would benefit from radical resection. However, the scoring system only appears to be beneficial for patients at the two extremes of the disease spectrum (Groups A and C). Its benefit is less clear for Group B, in whom it fails to distinguish patients who require surgery from those who require palliative care; this area requires further exploration [83]. Nevertheless, TMHSS has the potential to reduce unnecessary surgical explorations and to guide patients to an appropriate management strategy, while also offering some prognostic information. TMHSS is a precise, simple and easily performed test with a high negative predictive value, which supports its use in predicting the resectability of GBC.
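Negative predictive value, like sensitivity and specificity, is derived from a 2x2 table of predicted versus observed outcomes. The sketch below assumes that "positive" means the score predicts the outcome of interest (for example, unresectable disease); the class and function names, and the counts in the usage note, are hypothetical and are not TMHSS data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """2x2 counts of a score's prediction against the observed outcome."""
    tp: int  # score positive, outcome present
    fp: int  # score positive, outcome absent
    fn: int  # score negative, outcome present
    tn: int  # score negative, outcome absent


def _ratio(num: int, den: int) -> float:
    # A metric is undefined when its denominator is zero.
    if den == 0:
        raise ValueError("denominator is zero; metric undefined")
    return num / den


def diagnostic_metrics(c: ConfusionCounts) -> dict[str, float]:
    """Return sensitivity, specificity, PPV and NPV as proportions."""
    return {
        "sensitivity": _ratio(c.tp, c.tp + c.fn),
        "specificity": _ratio(c.tn, c.tn + c.fp),
        "ppv": _ratio(c.tp, c.tp + c.fp),
        "npv": _ratio(c.tn, c.tn + c.fn),
    }
```

With hypothetical counts `ConfusionCounts(tp=40, fp=10, fn=5, tn=45)`, the NPV is 45/50 = 0.9: nine in ten patients the score classifies as negative truly lack the outcome.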

Multifocality, Extra-hepatic Extension, Grade, Node Positivity, Age (> 60) prognostic score (MEGNA)

ICC, a malignancy arising from the biliary epithelium, is the second most common primary liver tumour, accounting for 10%–15% of hepatic malignancies [84,85]. The decision to manage patients surgically often relies on the perceived oncological benefit. Although multiple prognostic nomograms have been developed to predict survival following surgical resection, they have not been integrated into clinical practice because of their complexity [86]. The variables these nomograms require can only be obtained post-operatively, and most have not been validated in a population-representative sample. The MEGNA prognostic score was developed to provide a more accurate pre-operative estimate of survival following resection of ICC [87]. The score (0–5) stratifies patients into four risk groups, with a higher score indicating a worse prognosis. MEGNA is based on five pre-operative variables, highlighting its simplicity (Table 2). Although it was developed using data from patients who underwent resection, MEGNA can inform decisions regarding the management of ICC patients, but additional studies are required to evaluate its pre-operative use. Strengths of this scoring system include external validation and superiority over existing models in predicting prognosis and survival. A comparison of the prognostic separation index between MEGNA and the staging systems of the American Joint Committee on Cancer (AJCC) has demonstrated the superiority of MEGNA in predicting patient survival following hepatectomy [88]. However, its drawback is that MEGNA is currently only potentially generalisable to the US population. Current literature highlights substantial geographical heterogeneity in the histopathological and genomic characteristics of ICC [87]. This heterogeneity is illustrated by a study by Hahn et al. [88], who directly compared the prognostic value of the MEGNA score with that of the AJCC staging system, currently the most widely used system.
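Because MEGNA awards one point for each of its five named risk factors, the score is a simple count, as in this sketch. The names are hypothetical, treating grade as a yes/no "high grade" flag is an assumption, and the mapping of the 0–5 score onto four groups (0, 1, 2 and ≥ 3 points) is an assumed cut-off that should be checked against the original derivation study [87] before any use.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MegnaInputs:
    multifocal: bool              # M: multifocal tumour
    extrahepatic_extension: bool  # E: extra-hepatic extension
    high_grade: bool              # G: ASSUMED yes/no high-grade flag
    node_positive: bool           # N: positive lymph nodes
    age: int                      # A: age in years; scores a point if > 60


def megna_score(p: MegnaInputs) -> int:
    """One point per risk factor present, giving a total of 0-5."""
    return sum([
        p.multifocal,
        p.extrahepatic_extension,
        p.high_grade,
        p.node_positive,
        p.age > 60,
    ])


def megna_group(score: int) -> str:
    """Collapse the 0-5 score into four risk groups.

    ASSUMPTION: groups taken as 0, 1, 2 and >= 3 points; verify against
    the derivation study before relying on these cut-offs.
    """
    if not 0 <= score <= 5:
        raise ValueError("MEGNA score must lie between 0 and 5")
    return ("I", "II", "III", "IV")[min(score, 3)]
```

Note that age contributes only when it exceeds 60, so a 60-year-old scores no point for age.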
Hahn et al. [88] found a C-index of 0.58 for MEGNA and 0.61 for AJCC, indicating that the ability of the MEGNA score to predict individual patient prognosis fell below the conventional 0.7 threshold for acceptable discrimination. The study therefore showed that MEGNA was not superior to the AJCC system in this population, and that neither AJCC nor MEGNA performed adequately to support clinical decisions within it. In contrast, a multi-centre validation study by Schnitzbauer et al. [89] concluded that risk groups defined by the MEGNA score achieved significant stratification of survival, making it a good discriminator for ICC. In summary, current scoring systems for ICC are sub-optimal, and further studies are needed to validate the utility of the MEGNA score across geographical populations [87].
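The C-index values quoted above measure how often, among comparable pairs of patients, the patient with the higher risk score has the worse outcome; 0.5 is no better than chance and 1.0 is perfect ranking. The sketch below implements Harrell's C for right-censored survival data in plain Python. It skips pairs with tied follow-up times, a simplification that published implementations may handle differently, and the function name is hypothetical.

```python
from itertools import combinations
from typing import Sequence


def harrell_c_index(
    times: Sequence[float], events: Sequence[int], risks: Sequence[float]
) -> float:
    """Harrell's concordance index for right-censored survival data.

    A pair is usable when the patient with the shorter follow-up had the
    event. It is concordant when that patient also has the higher risk
    score; tied risk scores count as half. Pairs with tied times are
    skipped (a simplification).
    """
    if not (len(times) == len(events) == len(risks)):
        raise ValueError("times, events and risks must have equal length")
    concordant = 0.0
    usable = 0
    for i, j in combinations(range(len(times)), 2):
        if times[i] == times[j]:
            continue
        # Order the pair so that `a` has the shorter follow-up.
        a, b = (i, j) if times[i] < times[j] else (j, i)
        if not events[a]:
            continue  # censored first: the true ordering is unknown
        usable += 1
        if risks[a] > risks[b]:
            concordant += 1.0
        elif risks[a] == risks[b]:
            concordant += 0.5
    if usable == 0:
        raise ValueError("no usable pairs")
    return concordant / usable


ACCEPTABLE_C = 0.7  # conventional threshold quoted in the text

for name, c in (("MEGNA", 0.58), ("AJCC", 0.61)):
    print(f"{name}: C = {c:.2f}, acceptable = {c >= ACCEPTABLE_C}")
```

Applied to the values reported by Hahn et al., both lines print `acceptable = False`, matching the conclusion that neither system reached the 0.7 threshold in that population.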

Future trends

Efforts to find the ideal oncological surgical scoring system will continue, with developers striving for an objective, accurate and economical tool. Future model development should therefore use robust statistical methods that produce precise estimates, and models should be externally validated in various heterogeneous populations [90]. A basic overview of how a model may be developed is presented in Fig. 1 [91-94]. Moreover, all future models should be developed and reported in accordance with the TRIPOD (transparent reporting of a multivariable prediction model for individual prognosis or diagnosis) guidelines to ensure methodological rigour and transparency [95].

Fig. 1

Flowchart illustrating the stepwise approach for developing a scoring system. TRIPOD, transparent reporting of a multi-variable prediction model for individual prognosis or diagnosis.
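One common way to turn a fitted multivariable regression model into an integer scoring system is to divide each coefficient by a reference value (often the smallest coefficient) and round the result. This is a general technique, not necessarily the method behind any score discussed in this review, and the coefficients and variable names below are hypothetical.

```python
from typing import Dict, Iterable, Optional


def coefficients_to_points(
    coefs: Dict[str, float], reference: Optional[float] = None
) -> Dict[str, int]:
    """Scale regression coefficients to integer points.

    Each coefficient is divided by `reference` (default: the smallest
    absolute non-zero coefficient) and rounded. Python's round() sends
    exact halves to the nearest even integer.
    """
    nonzero = [abs(b) for b in coefs.values() if b != 0]
    if not nonzero:
        raise ValueError("at least one non-zero coefficient is required")
    ref = reference if reference is not None else min(nonzero)
    if ref <= 0:
        raise ValueError("reference must be positive")
    return {name: round(b / ref) for name, b in coefs.items()}


def total_points(points: Dict[str, int], present: Iterable[str]) -> int:
    """Sum the points for the risk factors a patient has."""
    return sum(points[name] for name in present)


# Hypothetical log-odds coefficients from a fitted model.
points = coefficients_to_points({"var_a": 0.31, "var_b": 0.62, "var_c": 0.95})
```

Rounding trades some precision for bedside simplicity, which is why the resulting score should be re-evaluated for discrimination and calibration rather than assumed to match the underlying model.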

Advances in computing technology and the availability of large data sets have led to the development of ML as an alternative approach for predicting and stratifying the risk of patient outcomes [96]. ML has been posited to be advantageous over traditional regression-based scoring systems when there are many covariates, non-linear relationships or complex interactions. Although, to date, ML studies have not shown a performance benefit over traditional logistic regression models [97], data-driven technologies may still offer personalised care across diagnosis, treatment, prognosis and monitoring, with the aim of improving individual health outcomes. Concerns have been raised about bias in ML, such as the subjective selection of relevant variables based on clinical knowledge, which might influence results. Moreover, ML can only analyse the data it is given, and over-fitting to the development data may make models unsuitable for other populations. To address the lack of validation and transparency of ML algorithms, a specific version of TRIPOD, TRIPOD-ML, is being developed [98]. This checklist can provide methodological guidance on key items for researchers to consider and report when developing future ML-based scoring systems, encouraging rigorous and transparent research in the ML community. With the movement towards personalised medicine, newer scoring systems are being developed that incorporate genetics, such as prognosis-associated genes in PC [97]. MicroRNAs involved in gene regulation show promising prognostic value because of their specificity to particular cell or tissue types [99]. Including such genetic markers in scoring systems could improve outcome prediction and help target individualised therapy in the future.

Limitations

The main limitation of this review was the subjective way in which the scoring systems were selected, so that certain scoring systems of possible clinical use were not included. However, this subjectivity allowed the authors to consider literature across the HPB speciality. The selected scoring systems were economical to use and reputable, and some represent novel applications within the speciality. A further limitation was the use of the c-statistic to assess suitability for clinical application. Such models may be better assessed using additional statistics and tools, such as sensitivity and specificity [16,100]. Additionally, the reference ranges used to define an acceptable c-statistic were not fully justified, and c-statistics close to the 0.7 mark should be interpreted with caution [16]. Calibration was also poorly reported among the scoring systems reviewed. Therefore, even where discrimination was satisfactory, poor calibration could weaken the use of such scores in clinical practice.
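Calibration asks whether predicted risks match observed event rates, which is a separate question from whether a score ranks patients correctly. A minimal check, sketched here under the assumption of a binary outcome and predicted probabilities, sorts patients by predicted risk, splits them into groups and compares the mean prediction with the observed event rate in each group, alongside an overall observed-to-expected (O/E) ratio. Formal estimation of the calibration slope and intercept is not shown, and the function names and data in the usage note are hypothetical.

```python
from typing import List, Sequence, Tuple


def calibration_table(
    predicted: Sequence[float], observed: Sequence[int], n_groups: int = 4
) -> List[Tuple[int, int, float, float]]:
    """Return (group, n, mean predicted risk, observed event rate) per group.

    Patients are sorted by predicted risk and split into roughly
    equal-sized groups.
    """
    if len(predicted) != len(observed) or not predicted:
        raise ValueError("need equal-length, non-empty inputs")
    pairs = sorted(zip(predicted, observed))
    n = len(pairs)
    rows = []
    for g in range(n_groups):
        chunk = pairs[g * n // n_groups:(g + 1) * n // n_groups]
        if not chunk:
            continue  # more groups than patients
        mean_pred = sum(p for p, _ in chunk) / len(chunk)
        obs_rate = sum(o for _, o in chunk) / len(chunk)
        rows.append((g + 1, len(chunk), mean_pred, obs_rate))
    return rows


def observed_expected_ratio(
    predicted: Sequence[float], observed: Sequence[int]
) -> float:
    """Observed events divided by expected events (sum of predicted risks)."""
    expected = sum(predicted)
    if expected == 0:
        raise ValueError("expected event count is zero")
    return sum(observed) / expected
```

With eight hypothetical patients whose predicted risks are 0.1, 0.1, 0.2, 0.2, 0.6, 0.6, 0.8 and 0.8, and four observed events, the expected count is 3.4, giving an O/E ratio of about 1.18; a value above 1 suggests the model under-predicts risk in that population even if its c-statistic is acceptable.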

CONCLUSION

The HPB oncological scoring systems discussed in this review represent a diverse range of available models for predicting prognosis, recurrence and function. These scoring systems may be considered for use in clinical environments to assist in the management of HPB patients. This could facilitate evidence-based discussions with patients, enabling them to make an informed decision about their treatment that is right for them. Current weaknesses of scoring systems include the lack of validation across heterogeneous population groups, the failure to report calibration and the omission of other statistics needed to determine their clinical suitability. Future scoring systems should be designed and reported in accordance with TRIPOD. Personalised medicine and artificial intelligence are likely to shape the next generation of surgical scoring systems.

SUPPLEMENTARY DATA

Supplementary data related to this article can be found at https://doi.org/10.14701/ahbps.21-113.
