
APOLLO: An accurate and independently validated prediction model of lower-grade gliomas overall survival and a comparative study of model performance.

Jiajin Chen¹, Sipeng Shen², Yi Li³, Juanjuan Fan¹, Shiyu Xiong⁴, Jingtong Xu⁴, Chenxu Zhu¹, Lijuan Lin¹, Xuesi Dong⁵, Weiwei Duan⁶, Yang Zhao¹, Xu Qian⁷, Zhonghua Liu⁸, Yongyue Wei², David C Christiani⁹, Ruyang Zhang¹⁰, Feng Chen¹¹.

Abstract

BACKGROUND: Few accurate and robust prediction models of lower-grade glioma (LGG) survival exist that may aid physicians in making clinical decisions. We aimed to develop a prognostic prediction model of LGG by incorporating demographic, clinical and transcriptional biomarkers with either main effects or gene-gene interactions.
METHODS: Based on gene expression profiles of 1,420 LGG patients from six independent cohorts comprising both European and Asian populations, we proposed a 3-D analysis strategy to develop and validate an Accurate Prediction mOdel of Lower-grade gLiomas Overall survival (APOLLO). We further conducted decision curve analysis to assess the net benefit (NB) of identifying true positives and the net reduction (NR) of unnecessary interventions. Finally, we compared the performance of APOLLO and the existing prediction models by the first systematic review.
FINDINGS: APOLLO possessed an excellent discriminative ability to identify patients at high mortality risk. Compared to patients with an APOLLO risk score below the 20th percentile, those with a score above the 90th percentile had significantly worse overall survival (HR=54·18, 95% CI: 34·73-84·52, P=2·66 × 10⁻⁶⁹). Further, APOLLO can accurately predict both 36- and 60-month survival in six independent cohorts with a pooled AUC_36-month=0·901 (95% CI: 0·879-0·923), AUC_60-month=0·843 (95% CI: 0·815-0·871) and C-index=0·818 (95% CI: 0·800-0·835). Moreover, APOLLO offered an effective screening strategy for detecting LGG patients susceptible to death (NB_36-month=0·166, NR_36-month=40·1% and NB_60-month=0·258, NR_60-month=19·2%). The systematic comparisons revealed that APOLLO outperformed the existing models in accuracy and robustness.
INTERPRETATION: APOLLO has demonstrated feasibility and utility in predicting LGG survival (http://bigdata.njmu.edu.cn/APOLLO).
FUNDING: National Key Research and Development Program of China (2016YFE0204900); Natural Science Foundation of Jiangsu Province (BK20191354); National Natural Science Foundation of China (81973142 and 82103946); China Postdoctoral Science Foundation (2020M681671); National Institutes of Health (CA209414, CA249096, CA092824 and ES000002).


Keywords: Lower-grade gliomas; Nomogram; Online tool; Prognostic prediction; Survival; Systematic review

Substances: Biomarkers

Year: 2022 PMID: 35436725 PMCID: PMC9035655 DOI: 10.1016/j.ebiom.2022.104007

Source DB: PubMed Journal: EBioMedicine ISSN: 2352-3964 Impact factor: 11.205

Evidence before this study

We searched PubMed, Embase, MEDLINE, Web of Science, and Cochrane Library for articles about prognostic prediction models of LGG published before Aug 30, 2021, using the search term “((lower-grade glioma) OR (lgg)) AND ((progn*) OR (survival)) AND ((predict*) OR (auc) OR (area under the curve) OR (receiver operator characteristic curve) OR (c-index) OR (c statistic) OR (roc) OR (calibration))”. We found that the existing models had limited prediction accuracy and had undergone limited validation, as most of them either relied solely on training populations or retrained models in testing populations to assess model performance, which might therefore be overestimated due to overfitting. Additionally, most of these existing models have limited robustness and transportability to independent populations, impeding their wide application.

Added value of this study

In this study, we collected data on 1,420 LGG patients from six European and Asian populations and proposed an effective modeling strategy to develop and validate an Accurate Prediction mOdel of Lower-grade gLiomas Overall survival (APOLLO), which has demonstrated feasibility and utility in distinguishing LGG patients at high risk of mortality and predicting their survival. Our systematic review revealed that APOLLO outperformed the existing models in accuracy and robustness.

Implications of all the available evidence

APOLLO has clinical benefits in identifying LGG patients at high mortality risk and presents a higher net benefit of identifying true positives and a higher net reduction of unnecessary interventions. A convenient online tool to implement APOLLO was developed at http://bigdata.njmu.edu.cn/APOLLO.

Introduction

Gliomas, the most common malignant cancer in the brain and central nervous system, account for over 80% of malignant brain tumors. Lower-grade gliomas (LGG), consisting of diffuse low- and intermediate-grade gliomas, are classified as grades II and III by the World Health Organization (WHO). Compared to those diagnosed with glioblastoma (GBM; WHO grade IV), LGG patients tend to have a more favorable prognosis; however, 70% of them will progress to GBM within ten years. Thus, delaying tumor progression for LGG patients is critical. What is often overlooked is the wide heterogeneity of LGG prognosis, which is ubiquitous even among patients with similar clinical features and indicates possible molecular underpinnings of the disease progression process. As a crucial milestone, the WHO Classification of Tumors of the Central Nervous System synthesized molecular and histological information to reclassify gliomas by using well-recognized molecular biomarkers. Recent evidence suggests that gene expression may exert inducible and reversible effects on LGG prognosis via several channels, including immunity, stemness, and autophagy. Prognostic prediction utilizing biomarkers can aid physicians in making clinical decisions or guiding adjuvant therapy. Recently, much effort has shifted to LGG prognostic prediction [11, 12, 13]. However, existing prediction models have various technical bottlenecks, impeding their wide application.
Specifically, these models underwent limited validation, as most of them either relied solely on training populations or retrained models in testing populations to assess model performance, which might therefore be overestimated due to overfitting. As a result, most of these existing models have limited robustness and transportability to independent populations. Furthermore, almost all of the studies focused merely on predictors with main effects and neglected predictors exhibiting gene-gene (G×G) interactions, which may provide pivotal clues regarding the biologic mechanisms of complex diseases and enhance prediction accuracy, as evidenced by our own study of lung cancer. To address these challenges in LGG survival prediction, we developed an Accurate and independently validated Prediction mOdel of Lower-grade gLiomas Overall survival (APOLLO), which identifies and includes biomarkers with significant main effects or G×G interactions, based on six cohorts with both European and Asian populations. Additionally, we have developed a free online tool implementing APOLLO to facilitate prediction of LGG survival.

Materials and Methods

Data collection and study population

We curated the clinical and gene expression data of LGG patients from six glioma cohorts, namely, The Cancer Genome Atlas (TCGA), the Chinese Glioma Genome Atlas (CGGA1), CGGA2, Rembrandt (GSE108476), Weller (GSE61374) and Gravendeel (GSE16011) cohorts. Only newly diagnosed LGG patients with complete overall survival time and transcriptomics data were retained. With the focus on biological functions and clinical utility, we considered a total of 723 pan-cancer driver genes defined by COSMIC; of these, the 680 genes shared by all six cohorts were included in our study. All gene expression levels were log2-transformed and standardized before being passed into association analyses; see the Supplementary Methods for the details of sample quality control. Our subsequent analyses included a total of 1,420 LGG patients with 680 genes, whose demographic and clinical characteristics are summarized in Supplementary Table S1.

APOLLO construction and validation

Figure 1, depicting the study design and workflow, features a 3-D strategy (Double Types of Effects, Double Steps of Screening, and Double Steps of Modeling) for the development and validation of the APOLLO model.

Figure 1

Flowchart of development and validation of APOLLO and a systematic assessment.

The APOLLO model was developed by using a 3-D analysis strategy, encompassing Double Types of Effects, Double Steps of Screening, and Double Steps of Modeling. For biomarker screening for double types of effects, we tested both main effects and G×G interactions, followed by double steps of screening (biomarkers were first identified using the TCGA cohort and then validated in the CGGA1 cohort); the double steps of modeling meant that the model was first trained in the TCGA cohort and then tested in the CGGA1, CGGA2, Rembrandt, Weller and Gravendeel cohorts, respectively.

Double Types of Effects. To select important main effects and G×G interactions, we considered Cox Models 1 and 2, respectively, both of which adjusted for covariates including age, WHO grade, IDH mutation and 1p/19q status (Supplementary Table S2).

Double Steps of Screening. We scanned the pan-cancer-related genes to select candidate genes and interactions, and then validated them in an independent dataset. Specifically, in the TCGA cohort, we fitted Models 1 and 2 to each gene and each interaction, respectively, and selected important genes and interactions by controlling the false discovery rate at the 5% level (q-FDR≤5%). In the CGGA1 cohort, we validated these selected genes or interactions; only those with P≤0.05 and the same effect direction as in the discovery step were retained as candidate biomarkers and passed on to the modeling stage.

Double Steps of Modeling. In the TCGA cohort, using the candidate genes and interactions identified in the screening stage, we fitted Cox models (adjusted for demographic and clinical predictors) with forward stepwise regression, that is, using the likelihood ratio test with P_entry≤0.05 and P_removal>0.05, to identify a final multivariable Cox model and construct APOLLO. As validation, we assessed the discriminative performance of APOLLO via the area under the receiver operating characteristic curve (AUC) or concordance index (C-index) in one internal cohort (CGGA1) and four external cohorts, namely, CGGA2, Rembrandt, Weller and Gravendeel.
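The two screening steps described above can be sketched in code. The following is a minimal illustration rather than the authors' implementation: it assumes that per-biomarker P values and log hazard ratios from Models 1 and 2 have already been estimated in the discovery (TCGA) and validation (CGGA1) cohorts, and it uses the Benjamini-Hochberg procedure to obtain FDR q-values, which is an assumption because the text does not name the FDR method. The function names and all numbers are hypothetical.

```python
import numpy as np

def bh_qvalues(pvals):
    """Benjamini-Hochberg q-values (assumed FDR method; not named in the text)."""
    p = np.asarray(pvals, dtype=float)
    n = p.size
    order = np.argsort(p)
    scaled = p[order] * n / np.arange(1, n + 1)
    # Enforce monotonicity, working from the largest P value downwards.
    q_sorted = np.minimum.accumulate(scaled[::-1])[::-1]
    q = np.empty(n)
    q[order] = np.minimum(q_sorted, 1.0)
    return q

def two_step_screen(p_disc, beta_disc, p_val, beta_val, fdr=0.05, alpha=0.05):
    """Boolean mask of biomarkers passing both discovery and validation."""
    beta_disc = np.asarray(beta_disc, dtype=float)
    beta_val = np.asarray(beta_val, dtype=float)
    # Step 1 (discovery): control the FDR at the chosen level.
    discovered = bh_qvalues(p_disc) <= fdr
    # Step 2 (validation): nominal significance and same effect direction.
    same_direction = np.sign(beta_disc) == np.sign(beta_val)
    validated = (np.asarray(p_val, dtype=float) <= alpha) & same_direction
    return discovered & validated

# Hypothetical summary statistics for four biomarkers.
p_tcga = [0.0001, 0.0004, 0.03, 0.60]
b_tcga = [0.80, -0.50, 0.30, 0.05]
p_cgga1 = [0.001, 0.020, 0.010, 0.40]
b_cgga1 = [0.60, 0.40, 0.20, 0.02]
keep = two_step_screen(p_tcga, b_tcga, p_cgga1, b_cgga1)
```

In this toy input, `keep` flags the first and third biomarkers: the second is dropped because its effect direction flips between cohorts, and the fourth does not pass the discovery threshold.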

Bioinformatics analysis for transcriptional predictors

To understand the potential gene functions of the identified transcriptional predictors, we conducted a gene enrichment pathway analysis based on the Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) databases using the R package clusterProfiler. Estimation of Stromal and Immune Cells in Malignant Tumor Tissues Using Expression Data (ESTIMATE) was used to predict the presence of stromal and immune cells in tumor tissue, and CIBERSORT was used to determine the proportions of 22 immune cell types from bulk tumors based on gene expression. Finally, the gene network analysis of screened genes and immune checkpoint genes was performed using GeneMANIA, a plugin of the Cytoscape application.

A systematic review of LGG survival prediction models

Following the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines (Supplementary Table S3), we conducted a systematic literature search on prognostic prediction models of LGG using five major databases, namely, PubMed, Embase, MEDLINE, Web of Science, and Cochrane Library. The literature search and data extraction were done independently by two researchers (S.X. and J.X.), and discrepancies were arbitrated by a third researcher (J.C.). Details of the search strings, exclusion criteria and data extraction process are provided in the Supplementary Methods. We retrieved a total of 3,035 articles. After removing duplicates, 1,444 articles remained for further screening. Among them, 126 articles that met the criteria based on title or abstract were eligible for full-text review. Finally, 54 articles fully meeting our selection criteria were retained and used for data extraction.

APOLLO visualization and online software

We generated a nomogram for visualizing APOLLO using the R package rms, and developed an online calculator that can be accessed at http://bigdata.njmu.edu.cn/APOLLO. With input values of the predictors for an LGG patient, the online calculator immediately returns predicted survival rates and 95% confidence intervals (CIs) at any time point between 0 and 120 months, based on an interactive web-based Kaplan-Meier survival curve.

Statistical analysis

Continuous variables were summarized as mean ± standard deviation, and categorical variables were described by frequency (n) and proportion (%). Associations between characteristics and overall survival were evaluated by Cox models using the R package survival. Study centers were adjusted for when analyzing the combined samples. Kaplan-Meier survival curves illustrated the survival differences across different risk groups. The prediction accuracy was presented using a time-dependent receiver operating characteristic (ROC) curve and was assessed by the time-dependent AUC, which can be obtained from the R package timeROC. We used calibration plots to evaluate the consistency between nomogram-predicted and observed risks, and conducted decision curve analysis (with details given in the Supplementary Methods) to gauge the net benefit (NB) of identifying true high-risk patients that ought to have intervention and the net reduction (NR) of unnecessary interventions due to the use of APOLLO as a screening tool. Since these transcriptional predictors were validated in trans-ethnic populations, we assumed APOLLO had uniform and homogeneous performance across cohorts. Thus, we performed a meta-analysis to pool the prediction accuracy of APOLLO from six cohorts using the fixed-effect model, implemented by the R package meta. Stratified analyses were displayed by forest plots using the R package forestplot. Statistical analyses were performed using R (version 3·6·3). A two-sided P value less than or equal to 0.05 was considered statistically significant unless otherwise specified. The source code and data were deposited at https://github.com/JiajinChen/APOLLO.
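The fixed-effect pooling step can be illustrated with inverse-variance weighting, the usual fixed-effect estimator. The sketch below is written in Python for illustration only (the analysis itself used the R package meta); it assumes that each cohort's standard error can be approximated from a symmetric 95% CI, and the per-cohort AUCs and intervals are made-up values, not results from this study. The function names are hypothetical.

```python
import math

def se_from_ci(lower, upper, z=1.959964):
    """Approximate a standard error from a symmetric 95% confidence interval."""
    return (upper - lower) / (2.0 * z)

def fixed_effect_pool(estimates, std_errors, z=1.959964):
    """Inverse-variance fixed-effect pooled estimate with its 95% CI."""
    weights = [1.0 / se ** 2 for se in std_errors]
    total = sum(weights)
    pooled = sum(w * est for w, est in zip(weights, estimates)) / total
    pooled_se = math.sqrt(1.0 / total)
    return pooled, pooled - z * pooled_se, pooled + z * pooled_se

# Hypothetical per-cohort AUCs and 95% CIs (illustrative values only).
aucs = [0.90, 0.86, 0.88]
cis = [(0.86, 0.94), (0.80, 0.92), (0.83, 0.93)]
ses = [se_from_ci(lo, hi) for lo, hi in cis]
pooled, lo, hi = fixed_effect_pool(aucs, ses)
```

Cohorts with narrower intervals receive larger weights, so the pooled value is pulled toward the most precisely estimated cohorts.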

Ethics

The study was performed in accordance with Good Clinical Practice guidelines and the World Medical Association Declaration of Helsinki. All patients provided informed consent. All data used in this study were de-identified and no protected health data were needed.

Role of the funding source

The sponsors had no role in the study design, data collection, data analyses, interpretation, or writing of the study.

Results

Development and construction of APOLLO

First, 42 genes with main effects and 307 pairs of genes with G×G interactions were identified (q-FDR≤0.05) as possibly associated with overall survival in the TCGA cohort. Of them, 28 genes with main effects and 27 pairs of genes with G×G interactions were validated in the CGGA1 cohort as candidate transcriptional predictors (Supplementary Tables S4-S5). Then, from these candidate transcriptional predictors, we used forward stepwise regression in the TCGA training cohort to construct a final Cox model, which included 5 genes with main effects and 4 pairs of genes with G×G interactions (Supplementary Table S6). APOLLO, which integrates demographic, clinical and transcriptional predictors, was defined using the coefficient estimates from this final Cox model.

Transcriptional predictors of APOLLO and their immune relevance

KEGG enrichment analysis categorized gene probes into 30 pathways, including the glioma pathway, and GO annotation identified 279 biological process pathways, 24 molecular function pathways and 20 cellular component pathways, suggesting potential biological functions (Supplementary Table S7). We compared the proportions of 22 types of immune cells between high- and low-risk groups defined by the median transcriptional score (-0·2689), and found that they were significantly different between the two groups (Supplementary Figure S1a). Further, the transcriptional score was correlated with the stromal, immune and ESTIMATE scores (Supplementary Figure S1b). Additionally, we observed high connectivity and large correlations between transcriptional predictors and immune checkpoint genes (Supplementary Figure S1c and Supplementary Figure S2), indicating that the transcriptional predictors may play a role in immune responses. Numerous immunity-related drugs targeting these transcriptional predictors have been documented in the DrugBank database (Supplementary Table S8), and, thereby, APOLLO may have potential roles in guiding immunotherapy.

Discriminative ability of APOLLO

Patients in each of the six cohorts were categorized into low- and high-risk groups using the median APOLLO score (0·6945) obtained from the TCGA training set. The APOLLO score had an adequate discriminative ability in both training and testing sets. Compared to the low-risk group in the corresponding cohort, the high-risk group was associated with worse survival in the TCGA cohort (the training set) and CGGA1 (the internal testing cohort), exhibiting a large hazard ratio (HR) (HR_TCGA=8·51, 95% CI: 5·10-14·18, P=2·14 × 10⁻¹⁶; HR_CGGA1=4·86, 95% CI: 3·24-7·28, P=1·75 × 10⁻¹⁴) (Figure 2a-b), and in the 4 external testing sets (HR_CGGA2=6·26, 95% CI: 2·86-13·68, P=4·41 × 10⁻⁶; HR_Rembrandt=3·49, 95% CI: 2·06-5·91, P=3·32 × 10⁻⁶; HR_Weller=3·41, 95% CI: 1·73-6·72, P=3·99 × 10⁻⁴; HR_Gravendeel=2·19, 95% CI: 1·31-3·68, P=2·88 × 10⁻³) (Figure 2c-f). We further illustrated the discriminative ability of the APOLLO score by classifying patients into 6 groups defined by the quintiles and the 90th percentile of the score in the combined cohort. The median survival dropped dramatically from 192·6 months in the 1st group (below the 20th percentile) to 15·7 months in the 6th group (above the 90th percentile). There appeared to be a dose-response association: higher-percentile groups were associated with shorter survival and higher mortality risk (HR_6 vs 1=54·18, 95% CI: 34·73-84·52, P=2·66 × 10⁻⁶⁹; HR_5 vs 1=16·28, 95% CI: 10·57-25·07, P=1·07 × 10⁻³⁶; HR_4 vs 1=7·05, 95% CI: 4·66-10·69, P=3·03 × 10⁻²⁰; HR_3 vs 1=3·88, 95% CI: 2·51-6·00, P=9·78 × 10⁻¹⁰; HR_2 vs 1=2·63, 95% CI: 1·69-4·10, P=1·83 × 10⁻⁵); see Figure 2g-h.
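The six-group stratification can be sketched as a percentile-based binning of the risk score. This is an illustrative sketch under stated assumptions, not the authors' code: the scores are synthetic, the function name is hypothetical, and a score exactly equal to a cutoff is assigned to the higher group, a boundary convention the text does not specify.

```python
import numpy as np

def assign_risk_groups(scores, cut_percentiles=(20, 40, 60, 80, 90)):
    """Assign groups 1..6 using percentile cutoffs of the score distribution."""
    scores = np.asarray(scores, dtype=float)
    cutoffs = np.percentile(scores, cut_percentiles)
    # np.digitize returns 0 below the first cutoff, so add 1 to label groups 1..6.
    groups = np.digitize(scores, cutoffs) + 1
    return groups, cutoffs

# Synthetic risk scores standing in for APOLLO scores in a combined cohort.
rng = np.random.default_rng(0)
scores = rng.normal(size=1000)
groups, cutoffs = assign_risk_groups(scores)
group_sizes = np.bincount(groups, minlength=7)[1:]
```

Group 1 then serves as the reference level when hazard ratios are estimated for groups 2 to 6, as in Figure 2h.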

Figure 2

Kaplan-Meier survival curves of LGG patients stratified by APOLLO score.

Survival differences between high- and low-risk patients in (A) TCGA, (B) CGGA1, (C) CGGA2, (D) Rembrandt, (E) Weller and (F) Gravendeel cohorts. Patients in all six cohorts were categorized into two groups based on the same cutoff point: the median of APOLLO score defined in TCGA training set. (G) Discriminative ability of the APOLLO score by illustrating the 36- and 60-month survival rate, median survival month for six groups, defined by quantiles at 20%, 40%, 60%, 80% and 90% of APOLLO score as the cutoffs. (H) The hazard ratios (HRs) and P values for patients at different levels of APOLLO score (level 1 as reference), which were derived from a Cox proportional hazards model.


Predictive performance of APOLLO

APOLLO predicted the 36- and 60-month survival rates quite accurately in the TCGA training set and CGGA1 internal testing set (AUC_36-month=0·933 and 0·888; AUC_60-month=0·854 and 0·851) (Figure 3a-b) and exhibited an excellent predictive ability in the CGGA2, Rembrandt, Weller and Gravendeel external testing sets (AUC_36-month=0·898, 0·893, 0·844 and 0·861; AUC_60-month=0·896, 0·817, 0·806 and 0·790) (Figure 3c-f). In the meta-analysis, APOLLO presented excellent accuracy in the training sets (AUC_36-month=0·913, AUC_60-month=0·852), the testing sets (AUC_36-month=0·879, AUC_60-month=0·831) and the combined data (AUC_36-month=0·901, AUC_60-month=0·843). The calibration curve suggested good agreement between predicted and observed risks (Supplementary Figure S3). APOLLO significantly outperformed a basic model with the four aforementioned covariates (Supplementary Figure S4), improving AUC by 5·4% (P < 2 × 10⁻¹⁶) and 5·8% (P < 2 × 10⁻¹⁶) for the 36- and 60-month survival prediction, respectively (Supplementary Figures S5-S6). Additionally, APOLLO presented an excellent C-index in the TCGA training cohort (0·874), the CGGA1 internal testing cohort (0·804) and the four external testing cohorts, CGGA2 (0·807), Rembrandt (0·772), Weller (0·787) and Gravendeel (0·759), with a pooled C-index of 0·818 (95% CI: 0·800-0·835) (Figure 3i).
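The C-index summarizes how often, among comparable pairs of patients, the patient with the higher risk score dies earlier. A minimal sketch of Harrell's C-index follows; it is an assumption that this estimator matches the one used in the study, the toy data are hypothetical, and tied survival times are simply treated as non-comparable.

```python
def harrell_c_index(time, event, risk):
    """Harrell's concordance index for right-censored survival data.

    A pair (i, j) is comparable when patient i has an observed death
    and a strictly shorter follow-up time than patient j.
    """
    concordant = 0.0
    comparable = 0
    n = len(time)
    for i in range(n):
        if not event[i]:
            continue  # censored patients cannot anchor a comparable pair
        for j in range(n):
            if time[i] < time[j]:
                comparable += 1
                if risk[i] > risk[j]:
                    concordant += 1.0   # higher risk died earlier
                elif risk[i] == risk[j]:
                    concordant += 0.5   # tied risk scores count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable

# Toy data: survival months, death indicator (1 = died), and risk score.
months = [5, 10, 15, 20]
died = [1, 1, 0, 1]
score = [0.9, 0.3, 0.2, 0.4]
c_index = harrell_c_index(months, died, score)
```

A value of 0·5 corresponds to random ordering and 1·0 to perfect ordering of patients by risk.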

Figure 3

Time-dependent receiver operating characteristic curves of APOLLO for 36- and 60-month overall survival prediction.

The time-dependent ROC and AUC of APOLLO in (A) TCGA, (B) CGGA1, (C) CGGA2, (D) Rembrandt, (E) Weller and (F) Gravendeel cohorts, respectively. The pooled accuracy for (G) AUC36-month, (H) AUC60-month and (I) C-index of APOLLO across six independent cohorts.


Clinical net benefits with APOLLO

With 36-month survival as the endpoint, decision curve analysis (DCA) showed that APOLLO presented more clinical net benefit than several competing intervention strategies, namely, intervention for all, intervention for none, and intervention based on a basic model with only clinical and demographic indicators. Specifically, compared with the strategy of intervention for none and at a reasonable threshold probability (e.g., P=0·4), APOLLO presented a higher net benefit (NB) than the basic model (NB_APOLLO=0·130 vs NB_Basic=0·111). In other words, APOLLO identified 13·0 true positive patients per 100 patients that ought to have intervention, whereas the basic model identified only 11·1 (Figure 4a). On the other hand, compared to the strategy of intervention for all, APOLLO presented a higher net reduction (NR) than the basic model (NR_APOLLO=55·4% vs NR_Basic=52·5%). This means APOLLO can reduce the number of unnecessary clinical interventions by 55·4% without missing interventions for any patients truly at high mortality risk, compared with only 52·5% for the basic model (Figure 4b). As a sensitivity analysis, we varied the threshold probability from 0 to 0·5: the APOLLO decision curves were higher than those of the other strategies over this spectrum of threshold probabilities, and APOLLO had the best average NB and NR for both 36- and 60-month survival (NB_36-month=0·166, NR_36-month=40·1% and NB_60-month=0·258, NR_60-month=19·2%), indicating its uniform utility and suitability for clinical implementation (Figure 4a-d). For individualized prognostic prediction and screening of high-risk patients, a nomogram of APOLLO is presented in Figure 4e.
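The NB and NR quantities can be made concrete with the standard decision curve formulas. The sketch below is an illustration under simplifying assumptions, not the procedure in the Supplementary Methods: it treats the outcome at the chosen time point as a fully observed binary label (no censoring), and the function names and toy data are hypothetical. At threshold probability pt, NB = TP/n − (FP/n) × pt/(1 − pt), and NR is the NB gain over the intervention-for-all strategy divided by pt/(1 − pt).

```python
import numpy as np

def net_benefit(y_true, risk, threshold):
    """Net benefit of intervening on patients with predicted risk >= threshold."""
    y_true = np.asarray(y_true, dtype=bool)
    treat = np.asarray(risk, dtype=float) >= threshold
    n = y_true.size
    tp = np.sum(treat & y_true)     # intervened and truly high risk
    fp = np.sum(treat & ~y_true)    # intervened unnecessarily
    return tp / n - fp / n * threshold / (1.0 - threshold)

def net_benefit_treat_all(y_true, threshold):
    """Net benefit of the intervention-for-all strategy."""
    prevalence = np.mean(np.asarray(y_true, dtype=bool))
    return prevalence - (1.0 - prevalence) * threshold / (1.0 - threshold)

def net_reduction(y_true, risk, threshold):
    """Net reduction in interventions per patient versus intervention for all."""
    odds = threshold / (1.0 - threshold)
    gain = net_benefit(y_true, risk, threshold) - net_benefit_treat_all(y_true, threshold)
    return gain / odds

# Toy data: 10 patients, 3 deaths by the time point of interest.
died = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
pred = [0.9, 0.8, 0.3, 0.6, 0.2, 0.1, 0.1, 0.2, 0.3, 0.1]
nb = net_benefit(died, pred, threshold=0.4)
nr = net_reduction(died, pred, threshold=0.4)
```

Multiplying `nb` by 100 gives the net number of true positives per 100 patients, and multiplying `nr` by 100 gives the net percentage of interventions avoided, which is how the values in the paragraph above are expressed.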

Figure 4

Decision curve analysis and nomogram for clinical application of APOLLO.

The decision curve analysis for net benefit (NB) and net reduction (NR) of unnecessary interventions at both 36-month (A-B) and 60-month (C-D) survival, for APOLLO and the basic model composed of four common demographic and clinical predictors. (E) The nomogram for APOLLO. The value of each predictor can be converted into the corresponding points according to the axis at the top of the nomogram. The sum of the points for all predictors is located on the total points axis at the bottom of the nomogram and then used to estimate the patient's 36- and 60-month survival rates.


Sensitivity analysis of APOLLO prediction

To assess the robustness of APOLLO, we performed a series of subgroup analyses with subgroups defined by age, gender, WHO grade, IDH mutation, 1p/19q status, MGMT promoter, radiotherapy and chemotherapy. In all the subpopulations examined, APOLLO presented good discriminative ability; the HRs that compare high- and low-risk groups within the subpopulations ranged from 3·33 (95% CI: 2·45-4·52, P=1·59 × 10⁻¹⁴) to 8·77 (95% CI: 5·65-13·63, P=4·54 × 10⁻²²) (Supplementary Figure S7a). Moreover, APOLLO had reasonable AUCs in all of these subpopulations, ranging from 0·829 (95% CI: 0·784-0·873) to 0·907 (95% CI: 0·875-0·940) for 36-month survival and 0·757 (95% CI: 0·705-0·810) to 0·921 (95% CI: 0·881-0·961) for 60-month survival (Supplementary Figure S7b-c). In real-world applications, some gene expression values may be missing, in which case we recommend using mean imputation to fill the missing values of genes before applying APOLLO. Our simulations verified the feasibility of mean imputation (Supplementary Table S9).
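Mean imputation of missing gene values can be sketched as follows. This is a hedged illustration, not the authors' code: the function name is hypothetical, the matrix is toy data with patients in rows and genes in columns, and whether the means should come from the data at hand or from a reference (for example, training) cohort is an assumption left to the caller.

```python
import numpy as np

def mean_impute(expr, reference_means=None):
    """Fill missing gene expression values (NaN) with per-gene means.

    expr: 2-D array-like, rows are patients and columns are genes.
    reference_means: optional per-gene means from a reference cohort;
    if omitted, means are computed from the observed values in expr.
    """
    filled = np.array(expr, dtype=float)  # work on a copy
    if reference_means is None:
        reference_means = np.nanmean(filled, axis=0)
    reference_means = np.asarray(reference_means, dtype=float)
    missing = np.isnan(filled)
    # Look up the mean of the gene (column) for every missing entry.
    filled[missing] = np.take(reference_means, np.where(missing)[1])
    return filled

# Toy matrix: 3 patients, 2 genes, two missing values.
expr = [[1.0, np.nan],
        [3.0, 4.0],
        [np.nan, 6.0]]
imputed = mean_impute(expr)
```

Because expression levels were standardized before modeling, a reference mean of zero per gene would be a natural choice when only a single new patient is available, although the text does not state which mean was used.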

Comparison of APOLLO with existing models by a systematic review

Among the 54 screened articles (Figure 5), the prognostic models have various types of predictors: 31 (57·4%) models were developed based on gene expressions, 8 (14·8%) on lncRNA and 6 (11·1%) on radiomic features (Supplementary Table S10). A total of 30 (55·6%) models were constructed by integrating multi-level biomarkers, and 19 (35·2%) studies considered molecular mutations. Except for 4 models that were only applicable to LGG subgroups (2 for IDH-wild type LGG, 1 for Grade II LGG and 1 for LGG with epilepsy), all of the models were suitable for all LGG patients. While differing in biomarker selection methods, 52 (96·3%) models were derived using Cox models. Of the 35 models using clinical variables, age was the most common predictor (n=34), followed by grade (n=27), IDH mutation (n=17), gender (n=8) and 1p/19q status (n=7) (Supplementary Table S10).

Figure 5

Flowchart of the systematic review of literature search and selection using five databases.

A total of 3,035 relevant records were obtained from PubMed, Embase, MEDLINE, Web of Science and Cochrane Library as of Aug 30, 2021. After removal of duplicate records and irrelevant or ineligible records (based on title/abstract/text), 54 records that met the criteria were retained.

The prediction accuracy of these published LGG prognostic models was extracted from the original papers and is summarized in Table 1 and Supplementary Table S11. While 8 studies had a sample size>1,000, the rest had only small to modest sample sizes, which may not guarantee the reliability of the prediction model. The 24 (44·4%) models without any self-reported external validation should be used with caution. Although the other 30 (55·6%) models were externally validated, 7 of them were not completely externally validated, as they used the validation sets to screen the predictors. Further, only 4 models had multiple validations (Supplementary Table S11). Among the 22 models that were validated in completely external testing sets, prediction accuracy varied (C-index=0·753, Range: 0·620-0·830; AUC_3-year=0·789, Range: 0·635-0·836 and AUC_5-year=0·720, Range: 0·594-0·807) and was in general lower than that of APOLLO in its four external testing sets (C-index=0·780, Range: 0·759-0·807; AUC_3-year=0·877, Range: 0·844-0·898 and AUC_5-year=0·812, Range: 0·790-0·896).

Table 1

Comparison of prediction accuracy between APOLLO and 28 models of LGG with self-reported external validation.

No	PMID	Year	Method	ValidationType	Data source		Sample size			Performance in training set			Performance in testing set
No	PMID	Year	Method	ValidationType	Training	Testing	Training	Testing	Total	AUC_3-year	AUC_5-year	C-index	AUC_3-year	AUC_5-year	C-index
-	APOLLO	-	Cox	External	TCGACGGA1	CGGA2RembrandtGSE61374GSE16011	505408	143121137106	1420	0.9330.888	0.8540.851	0.8740.804	0.8980.8930.8440.861	0.8960.8170.8060.790	0.8070.7720.7870.759

1	33665000	2021	Cox	External	TCGA	CGGA a	522	623	1145	0.766	0.763	-	0.744	0.764	-

2	34123829	2021	Cox	External	TCGA	CGGA aGSE16011 a +Rembrandt a	476	407231	1114	-	-	0.878	-	-	0.7340.748

3	33951297	2021	Cox	External	TCGA	CGGA	506	592	1098	0.710	0.601	-	0.655	0.655	-

4	33400376	2021	Cox	External	TCGA	CGGA	506	592	1098	0.782	-	-	0.734	-	-

5	33594759	2021	Cox	External	TCGA	CGGA	495	590	1085	0.875	0.816	-	0.756	0.728	-

6	34395274	2021	Cox	External	TCGA	CGGA1CGGA2	474	407168	1049	0.872	0.815	-	0.6350.775	0.5940.807	-

7	33381460	2020	Cox	External	TCGA	CGGA	525	420	945	-	0.633	-	-	0.671	-

8	34015817	2021	NN	External	TCGA	CGGA	493	408	901	0.925	0.871	-	0.795	0.767	-

9	33363544	2020	Cox	External	TCGA	CGGA	459	362	821	-	-	0.878	-	-	0.68

10	32519365	2021	Cox	External	TCGA	CGGA1CGGA2	477	199139	815	0.848	0.750	-	0.8020.828	0.6740.755	-

11	31824866	2019	Cox	External	TCGA	CGGA a	511	172	683	0.89	0.78	0.839			0.811

12	31803233	2019	Cox	External	TCGA	CGGA a	511	172	683	0.831	0.711	-	0.909	0.892	-

13	34408772	2021	Cox	External	TCGA	CGGA	495	172	667	0.84	0.74	-	0.74	0.71	-

14	32793593	2020	Cox	External	TCGA	CGGA a	476	170	646	0.860	0.806	0.817	0.783	0.759	0.642

15	31533943	2019	Cox	External	CGGA	TCGA	172	451	623	0.890	0.912	-	0.782	0.696	-

16	31921517	2020	Cox	External	TCGA	CGGA	456	159	615	0.878	0.827	-	0.806	0.807	-

17	24049111	2013	Cox	External	EORTC	RTOG+NCCTG	338	235	573	-	-	0.67	-	-	0.62

18	29204839	2018	Cox	External	TCGA	CGGA	420	100	520	-	-	0.83	-	-	0.68

19	32162004	2020	RSF	External	Local	TCIA	205	91	296	-	-		-	-	0.709_iAUC

20	30362964	2018	Cox	External	TCIA	CGGA	85	148	233	-	-	0.92	-	-	0.7

21	33409797	2021	Cox	External	Local	Local	149	66	215	-	-	0.821	-	-	0.763

22	32060714	2020	Cox	External	Local	TCIA	112	46	158	-	-	0.773_iAUC	-	-	0.830_iAUC

23	32740813	2020	Cox	External	Local	TCIA	117	33	150	-	-	0.770_iAUC	-	-	0.787_iAUC

24	32229719	2020	Cox	Internal / External	TCGA	CGGA1 a / CGGA2 / GSE16011 / GSE61374	329 / 140	405 / 118 / 88 / 136	1216	-	-	0.873 / 0.881	-	-	0.781 / 0.765 / 0.721 / 0.753

25	31853837	2020	Cox	Internal / External	TCGA	CGGA a	329 / 140	405	874	-	-	0.877 / 0.878	-	-	0.812

26	32351547	2020	Cox	Internal / External	TCGA	CGGA	304 / 128	353	785	0.882 / 0.836	0.884 / 0.761	0.864 / 0.831	0.836	0.798	0.756

27	32431729	2020	Cox	Internal / External	TCGA	CGGA	297 / 124	353	774	0.905 / 0.915	0.837 / 0.828	0.870 / 0.847	0.798	0.740	0.753

28	33591634	2021	Cox	Internal / External	TCGA	CGGA	352 / 152	224	728	0.930 / 0.816	0.876 / 0.857	-	0.835	0.711	-

Abbreviations: NN: neural network; RSF: random survival forest; iAUC: integrated area under the time-dependent ROC curve; Internal: a model was cross validated by randomly splitting the original data. External: a model was externally validated by an independent external population. The performance for each model was extracted from the original paper.

a Datasets marked "a" were also used for biomarker screening, so validation in them was not completely external.

Comparison of prediction accuracy between APOLLO and 28 models of LGG with self-reported external validation.
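For readers comparing the C-index columns above, a minimal sketch of how a concordance index can be computed may help. This is a simplified version of Harrell's C-index written for illustration, not the code used for APOLLO or for any of the reviewed models; the function name `harrell_c_index` and the toy data are hypothetical, and pairs with tied survival times are simply skipped.

```python
from itertools import combinations


def harrell_c_index(times, events, risk_scores):
    """Simplified Harrell's C-index (illustrative sketch only).

    times       -- observed follow-up times
    events      -- 1 if death was observed, 0 if censored
    risk_scores -- model output; higher means higher predicted risk
    """
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(times)), 2):
        if times[i] == times[j]:
            continue  # simplification: pairs with tied times are ignored
        # a is the subject with the shorter follow-up time
        a, b = (i, j) if times[i] < times[j] else (j, i)
        if not events[a]:
            continue  # shorter time is censored, so the pair is not comparable
        comparable += 1
        if risk_scores[a] > risk_scores[b]:
            concordant += 1.0  # earlier death had the higher predicted risk
        elif risk_scores[a] == risk_scores[b]:
            concordant += 0.5  # tied predictions count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Toy data (made up for illustration, not from any cohort in the table).
toy_times = [5, 10, 15, 20]
toy_events = [1, 1, 0, 1]
toy_risk = [0.9, 0.6, 0.7, 0.1]
toy_c = harrell_c_index(toy_times, toy_events, toy_risk)
print(toy_c)
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking, which is the scale on which the training and testing C-index values in the table are reported.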

Discussion

Survival in LGG varies widely, ranging from 1 to over 10 years, and patients at high risk of mortality may warrant close imaging monitoring and radical postoperative adjuvant therapy. Hence, there is an urgent need to develop accurate and robust prognostic prediction models for data-aided clinical decisions. Leveraging available public LGG transcriptome data from six independent cohorts, we adopted a 3-D analysis strategy to screen biomarkers and developed APOLLO. Derived from a large LGG cohort (TCGA) and validated in five trans-ethnic cohorts with European and Asian populations, APOLLO exhibited excellent prediction accuracy in the training and testing sets. Further, it offered good clinical net benefits for screening patients at high risk of mortality. Our systematic review also confirmed that APOLLO outperformed existing prediction models.

The utility and transportability of prediction models can be affected by gaps between the training population and the target population to which the model is applied. We addressed this by proposing a 3-D analysis strategy, comprising Double types of effects, Double steps of screening and Double steps of modeling. The first ensured the accuracy of APOLLO by recognizing G×G interactions, which provide valuable insight into the biological mechanisms of complex diseases. The latter two guaranteed the robustness of APOLLO. For example, our screening procedure identified biomarkers using a European population (TCGA) and validated those biomarkers using an Asian population (CGGA1). This trans-ethnic validation demonstrated the robustness of the transcriptional predictors.
In the ensuing modeling procedures, APOLLO was trained using the TCGA cohort and was later applied to one internal and four external cohorts (CGGA1, CGGA2, Rembrandt, Weller and Gravendeel), where it retained excellent prediction accuracy regardless of stratification by age, gender, WHO grade, IDH mutation, 1p/19q status, MGMT promoter methylation level, and history of radiotherapy or chemotherapy.

According to the Global Burden of Disease study, there are over 1·71 million brain and nervous system cancer patients worldwide, of whom 427·5 thousand (25%) have LGG. Assuming that LGG patients with a probability of mortality ≥ 0·4 should receive clinical intervention (Figure 5b and 5d), APOLLO yielded NR36-month=55·4% and NR60-month=32·4%. Thus, compared with the most extreme strategy of offering interventions to every LGG patient, our model could help avoid 236·8 thousand (427·5 × 55·4%) and 138·5 thousand (427·5 × 32·4%) unnecessary interventions for short- and long-term survival outcomes, respectively. In the future, APOLLO may, through customized biochips, offer maximized benefits to patients and provide cost-effective precision medicine. As such, our manuscript may present a proof of concept.

We found that APOLLO outperformed the 54 models we reviewed in prediction accuracy and robustness. Further, we briefly summarize the biological functions of the transcriptional biomarkers in APOLLO. For the genes with significant main effects, genetic variants of CHIC2 are found in brain tumor tissues, and ITGAV is a prognostic factor of gliomas; PLCG1 and IGF2BP2 are related to SUMOylation and m6A methylation, and are involved in the immune responses, occurrence and development of gliomas [39, 40, 41]; MSN is an active biomarker for glioma immune regulation and a drug target. For the pairs of genes with significant interactions, PRF1 is strongly associated with anti-CTLA-4 or anti-PD-L1 immunotherapy, and is related to immune cell activities and survival of gliomas.
BCORL1 is a transcriptional corepressor that can fuse with ELF4 and repress the activation of PRF1. HMGA1 and TFG are regulated by NRF1, and can affect the prognosis of gliomas [45, 46, 47]. FAS and SMAD4 are important members of the TNF-receptor superfamily and the TGF-β signaling pathway, respectively; they play a major role in the tumor microenvironment and have antagonistic interactions. Though the biological function of the interaction between CTNND2 and GOLGA5 remains unclear, overexpressed CTNND2 is likely to increase tumor invasion of gliomas.

PRF1, HMGA1, BCORL1, FAS, and MSN are the top 5 transcriptional biomarkers most correlated with immune checkpoint genes. Specifically, PRF1 is viewed as critically important for immune cytolytic activity (CYT), reflecting the immune response of tumor cells, and is a well-established marker for cancer survival, including in gliomas. HMGA1 contributes to the immunosuppressive microenvironment in tumors, and silencing of HMGA1 can boost checkpoint blockade immunotherapy. BCORL1 is involved in the immune response pathway, impacting the response to immunochemotherapy. FAS receptor signaling plays many important roles in the immune system, as evidenced by the finding that tumoral FAS expression may predict the survival of CAR-T-treated patients. MSN, a known target for cancer immunotherapy, regulates the migration of effector T cells. Overall, the genes included in APOLLO are transcriptional predictors with immune relevance, which can be immunotherapeutic targets.

Our study has several strengths. First, we performed, to our knowledge, the first systematic review of prognostic prediction models for LGG and confirmed the good performance of APOLLO. Second, this is perhaps the largest molecular prognostic prediction study for LGG, and APOLLO was robustly validated, both overall and trans-ethnically, in several large LGG cohorts.
Our extensive subgroup analysis suggested the robustness and transportability of APOLLO to different populations. Third, we proposed an effective 3-D strategy for biomarker screening and model construction, focusing on biomarkers with important main effects or G×G interactions. The strategy struck a reasonable balance among statistical properties (false-positive control vs statistical power gain), model interpretations (main effects vs G×G interactions), and computational complexity (fast variable screening vs consistent model selection). Finally, we provided a web-based tool to facilitate the application of APOLLO.

We also acknowledge some limitations. First, heterogeneity existed across these cohorts, which used various sequencing or microarray platforms. To address this, we harmonized the data by performing standard normal transformation, which works to some degree. Second, some well-recognized prognostic factors (e.g., tumor size and extent of surgical resection) were missing in several cohorts. We envision that there is much room for improvement as more complete clinical factors become available. Third, APOLLO should be applied to other ethnic populations with caution, as it was trained and validated in Asian and European populations. Fourth, the improvement in accuracy was not uniform across all external validation datasets, possibly due to population heterogeneity or the limited sample size of individual datasets. Finally, more biological experiments are needed to confirm the gene functions of the transcriptional predictors used in APOLLO.

To conclude, we presented an Accurate and independently validated Prediction mOdel of Lower-grade gLiomas Overall survival (APOLLO), which a systematic review showed to have the best prediction accuracy and robustness, and which offers a cost-effective strategy for screening LGG patients at high risk of mortality. A free and user-friendly online tool is available at http://bigdata.njmu.edu.cn/APOLLO.
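The intervention arithmetic in the Discussion (427·5 thousand LGG patients multiplied by the net reduction) can be reproduced with a short sketch. The net benefit and net reduction functions below follow the standard decision curve analysis definitions; whether the NB and NR values of APOLLO were computed in exactly this form is an assumption, and all function names and the toy counts are hypothetical.

```python
def net_benefit(true_pos, false_pos, n, threshold):
    """Standard decision-curve net benefit at a given risk threshold."""
    odds = threshold / (1.0 - threshold)
    return true_pos / n - (false_pos / n) * odds


def net_reduction(nb_model, nb_treat_all, threshold):
    """Net reduction in interventions per patient relative to treating everyone."""
    return (nb_model - nb_treat_all) * (1.0 - threshold) / threshold


def avoided_interventions(patients_thousand, nr):
    """Scale a net reduction (as a fraction) to a patient population."""
    return patients_thousand * nr


# Figures quoted in the Discussion: 427.5 thousand LGG patients,
# NR of 55.4% (36-month) and 32.4% (60-month) at a threshold of 0.4.
LGG_PATIENTS_THOUSAND = 427.5
avoided_36 = round(avoided_interventions(LGG_PATIENTS_THOUSAND, 0.554), 1)
avoided_60 = round(avoided_interventions(LGG_PATIENTS_THOUSAND, 0.324), 1)
print(avoided_36, avoided_60)

# Toy counts (made up): 100 patients, 40 deaths, model flags 30 true
# positives and 20 false positives at threshold 0.4.
toy_nb_model = net_benefit(true_pos=30, false_pos=20, n=100, threshold=0.4)
toy_nb_all = net_benefit(true_pos=40, false_pos=60, n=100, threshold=0.4)
toy_nr = net_reduction(toy_nb_model, toy_nb_all, threshold=0.4)
print(toy_nb_model, toy_nb_all, toy_nr)
```

The first two printed values match the 236·8 and 138·5 thousand avoided interventions stated in the text; the toy example only illustrates how a net reduction could be derived from counts at a chosen threshold.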

Contributors

Study design: J.C., R.Z., Y.W., S.S. and F.C.; Data collection and quality control: J.C., J.F. and D.C.C.; Analyses and interpretation: J.C., J.F., Y.W., S.S., W.D., X.D., Y.Z., L.L., Y.L. and Z.L.; Online tool: J.C. and C.Z.; Systematic review: J.C., S.X. and J.X.; Manuscript draft: J.C., R.Z. and Y.L.; Manuscript revision: Y.L., Y.W., F.C., X.Q. and D.C.C. All authors read and approved the final manuscript.

Data Sharing Statement

TCGA Database: https://xenabrowser.net/
CGGA Database: http://www.cgga.org.cn/
GEO Database: https://www.ncbi.nlm.nih.gov/gds/
COSMIC: http://cancer.sanger.ac.uk/cosmic/

Declaration of interests

All authors declare that they have no conflicts of interest.

56 in total

1. Semiparametric estimation of time-dependent ROC curves for longitudinal marker data.

Authors: Yingye Zheng; Patrick J Heagerty
Journal: Biostatistics Date: 2004-10 Impact factor: 5.899

2. Decision curve analysis.

Authors: Mark Fitzgerald; Benjamin R Saville; Roger J Lewis
Journal: JAMA Date: 2015-01-27 Impact factor: 56.272

Review 3. Transmembrane TNF-alpha reverse signaling leading to TGF-beta production is selectively activated by TNF targeting molecules: Therapeutic implications.

Authors: Zsuzsa Szondy; Anna Pallai
Journal: Pharmacol Res Date: 2016-11-22 Impact factor: 7.658

4. FGFR1 Induces Glioblastoma Radioresistance through the PLCγ/Hif1α Pathway.

Authors: Valérie Gouazé-Andersson; Caroline Delmas; Marion Taurand; Judith Martinez-Gala; Solène Evrard; Sandrine Mazoyer; Christine Toulas; Elizabeth Cohen-Jonathan-Moyal
Journal: Cancer Res Date: 2016-02-19 Impact factor: 12.701

Review 5. The COSMIC Cancer Gene Census: describing genetic dysfunction across all human cancers.

Authors: Zbyslaw Sondka; Sally Bamford; Charlotte G Cole; Sari A Ward; Ian Dunham; Simon A Forbes
Journal: Nat Rev Cancer Date: 2018-11 Impact factor: 60.716

6. An independently validated survival nomogram for lower-grade glioma.

Authors: Haley Gittleman; Andrew E Sloan; Jill S Barnholtz-Sloan
Journal: Neuro Oncol Date: 2020-05-15 Impact factor: 12.300

7. Prognostic Value of a Stemness Index-Associated Signature in Primary Lower-Grade Glioma.

Authors: Mingwei Zhang; Xuezhen Wang; Xiaoping Chen; Feibao Guo; Jinsheng Hong
Journal: Front Genet Date: 2020-05-05 Impact factor: 4.599

8. Comprehensive transcriptomic characterization reveals core genes and module associated with immunological changes via 1619 samples of brain glioma.

Authors: Ying Zhang; Wenping Ma; Wenhua Fan; Changyuan Ren; Jianbao Xu; Fan Zeng; Zhaoshi Bao; Tao Jiang; Zheng Zhao
Journal: Cell Death Dis Date: 2021-12-08 Impact factor: 8.469

9. IDH/MGMT-driven molecular classification of low-grade glioma is a strong predictor for long-term survival.

Authors: Severina Leu; Stefanie von Felten; Stephan Frank; Erik Vassella; Istvan Vajtai; Elisabeth Taylor; Marianne Schulz; Gregor Hutter; Jürgen Hench; Philippe Schucht; Jean-Louis Boulay; Luigi Mariani
Journal: Neuro Oncol Date: 2013-02-13 Impact factor: 13.029

10. The REMBRANDT study, a large collection of genomic data from brain cancer patients.

Authors: Yuriy Gusev; Krithika Bhuvaneshwar; Lei Song; Jean-Claude Zenklusen; Howard Fine; Subha Madhavan
Journal: Sci Data Date: 2018-08-14 Impact factor: 6.444