
Predicting the Survival Time for Bladder Cancer Using an Additive Hazards Model in Microarray Data.

Leili Tapak1, Hossein Mahjub2, Majid Sadeghifar3, Massoud Saidijam4, Jalal Poorolajal5.   

Abstract

BACKGROUND: An important goal of microarray studies is to predict patients' survival from their gene expression profiles. Variable selection techniques are powerful tools for handling high dimensionality in the analysis of microarray data. However, these techniques have not been investigated in the competing risks setting. This study aimed to investigate the performance of four sparse variable selection methods in estimating survival time.
METHODS: The data included 1381 gene expression measurements and clinical information from 301 patients with bladder cancer who underwent surgery between 1987 and 2000 in hospitals in Denmark, Sweden, Spain, France, and England. Four methods, the least absolute shrinkage and selection operator (Lasso), smoothly clipped absolute deviation (SCAD), the smooth integration of counting and absolute deviation (SICA) and the elastic net, were used for simultaneous variable selection and estimation under an additive hazards model. The area under the ROC curve (AUC), Brier score and C-index were used to compare the methods.
RESULTS: The median follow-up time for all patients was 47 months. The elastic net approach outperformed the other methods. It had the lowest integrated Brier score (0.137±0.07) and the greatest median over-time AUC and C-index (0.803±0.06 and 0.779±0.13, respectively). Five of the 19 genes selected by the elastic net were significant (P<0.05) under an additive hazards model. The expression of RTN4, SON, IGF1R and CDC20 decreased the survival time, while the expression of SMARCAD1 increased it.
CONCLUSION: The elastic net had higher capability than the other methods for predicting survival time in patients with bladder cancer in the presence of competing risks, based on the additive hazards model.


Keywords:  Additive hazards model; Bladder cancer; Microarray data; Survival analysis; Variable selection

Year:  2016        PMID: 27114989      PMCID: PMC4841879     

Source DB:  PubMed          Journal:  Iran J Public Health        ISSN: 2251-6085            Impact factor:   1.429


Introduction

Urothelial carcinoma of the urinary bladder is the ninth most frequently diagnosed malignancy worldwide (1) and represents 3% of cancers diagnosed globally (2). Bladder cancer accounts for an estimated 386,000 new diagnoses and 150,000 related deaths annually (1). Early detection of bladder cancer remains one of the most urgent research issues (3). Although imaging and surgical techniques have improved, the overall mortality of patients with bladder cancer has remained unchanged (4) and outcomes for patients remain suboptimal (2). Currently, morphologic and pathologic criteria such as histology, stage, and grade are used for conventional diagnosis of bladder cancer and play an important role in determining treatment (5). However, although these clinical criteria provide essential prognostic information, they have inadequate power to predict patient outcome precisely (5), and there remains significant variability in the prognosis of patients with similar characteristics (2). Thus, there is a need to identify additional tumor characteristics that predict clinical behavior in patients with bladder cancer (2). Recently, there has been growing interest in the use of gene expression signatures for predicting disease outcome in patients with cancer. The major objective of these studies is to identify a small subset of genes whose expression levels are significantly correlated with clinical outcomes such as time to an event (6). Identification of influential genes may help to better characterize cancers and consequently optimize therapy decisions. Accordingly, the survival time of cancer patients can be estimated from their gene expression profiles (7). However, typical methods of analysis are no longer applicable, because the number of genomic variables, P, is much greater than the number of subjects, n (8).
When the outcome of interest is survival time, microarray data analysis is further complicated by censoring of a number of subjects, especially in the competing risks setting, where there is more than one cause of failure. Several variable selection methods based on the maximization of a penalized likelihood, originally developed for linear regression, have been adapted to survival models for high-dimension, low-sample-size time-to-event data under the Cox proportional hazards model (9–14). For example, the least absolute shrinkage and selection operator (Lasso) (15), smoothly clipped absolute deviation (SCAD) (10), the Dantzig selector (9), LARS (11), the smooth integration of counting and absolute deviation (SICA) penalty (12) and the elastic net (13) have been proposed. A well-known method for analyzing survival data is the additive hazards model, which assumes that the covariates have additive effects on the hazard (16, 17). Additive models have some remarkable features. In particular, they pertain to the risk difference or excess risk measure, which is especially relevant and informative in epidemiological and clinical studies (8). Regularization techniques for variable selection in high-dimensional survival data have also been extended to the additive hazards model, though the number of studies is limited (8, 18). The objective function of the additive hazards model has a least-squares form, which makes estimation computationally easier. This is especially important for high-dimensional studies, where computational cost is a serious concern (19). So far, these techniques have been used only for a single time-to-event endpoint, and their performance has not been investigated for gene selection in the presence of competing risks.
This study aimed to investigate the performance of four well-known variable selection methods (Lasso, SCAD, SICA and elastic net) for high-dimensional time-to-event data with competing risks, based on an additive hazards model, in predicting survival time in patients with bladder cancer. A further goal was to identify significant genes among those selected by the best-performing variable selection method and to determine their effects on the survival of bladder cancer patients according to the additive hazards model.

Methods

Data Source

This study used a publicly available bladder cancer data set analyzed by Dyrskjøt et al. (20). The dataset consists of gene expression measurements for 1381 genes and survival outcomes for 404 patients with bladder cancer who underwent surgery between 1987 and 2000 in hospitals in Denmark, Sweden, Spain, France, and England. The patients had pTa and pT1 tumors and no previous or synchronous muscle-invasive tumors (GEO series accession no. GSE5479). The analysis was limited to the n=301 patients with complete information. There were two competing events: progression or death from bladder cancer (the response of interest) and death from other or unknown causes (the competing event).

Regularization for Additive Hazards Model

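For a sample with $K$ competing risk types, let $T_k$ be the time to the $k$th type of failure, $T = \min(T_1, \ldots, T_K)$ be the failure time and $C$ be the censoring time. Denote the failure indicator by $\Delta = I(T \le C)$, where $I(\cdot)$ is the indicator function, so that $\Delta$ takes value 1 if the observed time is an event time and value 0 if censoring occurred. Let $Z$ be a $P$-dimensional vector of predictable covariate processes and assume that $T$ and $C$ are conditionally independent given $Z$. The observed data consist of $(T, \Delta\varepsilon, Z)$, where $\varepsilon \in \{1, \ldots, K\}$ indicates the (potentially unobserved) cause of failure (8, 21). The cause-specific hazard function associated with the $k$th risk is defined as:

$$\lambda_k(t \mid Z) = \lim_{h \to 0} \frac{1}{h} P\left(t \le T < t + h,\ \varepsilon = k \mid T \ge t,\ Z\right), \quad k = 1, \ldots, K.$$

Under the Lin and Ying additive hazards model (16), the hazard function of a failure time $T$ conditional on a $P$-vector of possibly time-dependent covariates $Z$ is specified as:

$$\lambda(t \mid Z) = \lambda_0(t) + \beta_0^{\top} Z(t),$$

where $\lambda_0(\cdot)$ is an unspecified baseline hazard function common to all subjects and $\beta_0$ is a $P$-vector of regression coefficients (8). The penalized estimator $\hat{\beta}$ of the regression coefficients in the additive hazards model is a solution to the regularization problem:

$$\hat{\beta} = \arg\min_{\beta \in \mathbb{R}^{P}} \left\{ L(\beta) + \sum_{j=1}^{P} p_{\lambda}(|\beta_j|) \right\},$$

where $L(\beta)$ is the least-squares-type loss function of the additive hazards model (8), and $p_{\lambda}(\theta)$, $\theta \ge 0$, is a penalty function that depends on the regularization parameter $\lambda \ge 0$ and is often rewritten as $p_{\lambda}(\cdot) = \lambda\rho(\cdot)$ (8). In this study, four commonly used sparse penalty functions were considered: Lasso, SCAD, SICA and elastic net (8).

The Lasso method uses the L1-penalty term $\rho(\theta) = \theta$, $\theta \ge 0$. The elastic net method combines the L1-penalty $\rho(\theta) = \theta$ and the L2-penalty $\rho(\theta) = \theta^2$, which yields a penalty of the form $\rho(\theta) = (1 - a)\theta + a\theta^2$ with mixing parameter $a$ between 0 and 1. The SCAD penalty (10) is defined through its derivative,

$$p_{\lambda}'(\theta) = \lambda \left\{ I(\theta \le \lambda) + \frac{(a\lambda - \theta)_{+}}{(a - 1)\lambda}\, I(\theta > \lambda) \right\}, \quad \theta \ge 0,$$

with $a > 2$ as a shape parameter, and the SICA penalty (12) takes the form

$$\rho_a(\theta) = \frac{(a + 1)\theta}{a + \theta}, \quad \theta \ge 0,$$

where $a > 0$ is a shape parameter. Estimation of $\hat{\beta}$ was accomplished through the coordinate descent algorithm (8).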
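The four penalties named above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names are hypothetical, the elastic net follows the form given in the text, the SCAD and SICA closed forms are the standard ones from the penalized-regression literature rather than anything taken from the authors' code, and the default `a = 3.7` for SCAD is a conventional choice, not a value reported in this study.

```python
def lasso(theta, lam):
    """L1 penalty: lam * theta, for theta >= 0."""
    return lam * theta


def elastic_net(theta, lam, a):
    """Elastic net in the form used in the text: lam * ((1 - a) * theta + a * theta^2)."""
    return lam * ((1.0 - a) * theta + a * theta ** 2)


def scad(theta, lam, a=3.7):
    """SCAD penalty (assumed standard form), shape parameter a > 2.

    Linear up to lam, quadratic between lam and a * lam, constant afterwards.
    """
    if theta <= lam:
        return lam * theta
    if theta <= a * lam:
        return (2.0 * a * lam * theta - theta ** 2 - lam ** 2) / (2.0 * (a - 1.0))
    return lam ** 2 * (a + 1.0) / 2.0


def sica(theta, lam, a):
    """SICA penalty (assumed standard form): lam * (a + 1) * theta / (a + theta), a > 0."""
    return lam * (a + 1.0) * theta / (a + theta)
```

Unlike the Lasso, SCAD stops growing once `theta` exceeds `a * lam`, and SICA is bounded by `lam * (a + 1)`, so large coefficients are penalized less heavily under these two penalties.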
After a solution path has been produced, the optimal regularization parameter $\lambda$ is selected by M-fold cross-validation. The cross-validation score is defined as

$$CV(\lambda) = \sum_{m=1}^{M} L^{(m)}\left(\hat{\beta}^{(-m)}(\lambda)\right),$$

where $L^{(m)}(\cdot)$ is the least-squares-type loss function computed from the $m$th part of the data, and $\hat{\beta}^{(-m)}(\lambda)$ is the estimate from the data with the $m$th part removed (8). The additional parameter $a$ in the elastic net, SCAD and SICA methods was also tuned according to the method used in (8).
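The fitting and tuning steps described above can be illustrated with a minimal sketch. It assumes the loss has the quadratic form 0.5 * b'Vb - c'b for a symmetric matrix `V` with positive diagonal and a vector `b`, both supplied by the caller; how `V` and `b` are computed from the survival data is not shown. All function names are hypothetical, only the Lasso penalty is handled, and this is not the R package used by the authors.

```python
def quad_loss(beta, V, b):
    """Least-squares-type loss 0.5 * beta' V beta - b' beta."""
    p = len(beta)
    quad = sum(beta[i] * V[i][j] * beta[j] for i in range(p) for j in range(p))
    return 0.5 * quad - sum(b[i] * beta[i] for i in range(p))


def soft_threshold(z, lam):
    """Shrink z towards zero by lam; values within [-lam, lam] become 0."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0


def lasso_cd(V, b, lam, n_iter=200):
    """Coordinate descent for quad_loss(beta) + lam * sum(|beta_j|)."""
    p = len(b)
    beta = [0.0] * p
    for _ in range(n_iter):
        for j in range(p):
            # Partial residual for coordinate j with all other coordinates fixed.
            r = b[j] - sum(V[j][k] * beta[k] for k in range(p) if k != j)
            beta[j] = soft_threshold(r, lam) / V[j][j]
    return beta


def cv_score(folds, lam):
    """Sum over folds of the held-out loss at the estimate fitted without that fold.

    Each fold is a tuple (V_train, b_train, V_test, b_test).
    """
    return sum(
        quad_loss(lasso_cd(V_tr, b_tr, lam), V_te, b_te)
        for V_tr, b_tr, V_te, b_te in folds
    )


def select_lambda(folds, grid):
    """Return the lambda on the grid with the smallest cross-validation score."""
    return min(grid, key=lambda lam: cv_score(folds, lam))
```

In a toy case with an identity `V`, training vector `[3.0, 0.5]` and held-out vector `[2.0, 0.0]`, the penalized fit with `lam = 1.0` gives `[2.0, 0.0]` and attains a lower held-out loss than the unpenalized fit, so `select_lambda` picks `1.0` from the grid `[0.0, 1.0]`.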

Performance Criteria

The performance of the four methods was assessed using several criteria. Predictive performance was analyzed using time-dependent receiver operating characteristic (ROC) curves (22) and bootstrap .632+ prediction error curves (21). The study also used the concordance probability (C-index), which measures and compares the discriminative power of risk prediction models, and the integrated Brier score (21, 23).
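Two of these criteria can be illustrated with a short sketch. The first function is Harrell's simple concordance estimate for right-censored data, and the second integrates a prediction error curve with the trapezoidal rule. Both are simplifications under stated assumptions: the function and argument names are hypothetical, no inverse-probability-of-censoring weights are applied, only the event of interest is coded as 1, and the R packages used in the study may compute these quantities differently.

```python
def c_index(times, events, risk):
    """Harrell's C: share of comparable pairs ordered correctly by the risk score.

    A pair (i, j) is comparable when subject i has an observed event
    (events[i] == 1) strictly before subject j's observed time.
    Ties in the risk score count as half-concordant.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            if events[i] == 1 and times[i] < times[j]:
                comparable += 1
                if risk[i] > risk[j]:
                    concordant += 1.0
                elif risk[i] == risk[j]:
                    concordant += 0.5
    return concordant / comparable


def integrated_score(grid, scores):
    """Trapezoidal average of a prediction error curve over the time grid."""
    area = sum(
        (grid[k + 1] - grid[k]) * (scores[k] + scores[k + 1]) / 2.0
        for k in range(len(grid) - 1)
    )
    return area / (grid[-1] - grid[0])
```

A C-index of 0.5 corresponds to random ordering and 1.0 to perfect ordering, while a lower integrated Brier score indicates smaller prediction error.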

Software

Analysis was performed using R (http://www.r-project.org) with a publicly available R package provided by Lin and Lv (2013) (http://162.105.204.96/teachers/linw/software.html). In addition, the “pec” and “survAUC” R packages were used to evaluate the performance of the methods.

Results

The median follow-up time for all patients was 47 months. Progression or death from bladder cancer was observed in 74 patients and the competing event in 33 patients. By the end of follow-up, 194 patients were censored. Additive hazards models were fitted to the microarray bladder cancer data using the Lasso, elastic net, SCAD and SICA penalization techniques for the ‘progression or death from bladder cancer’ event. Table 1 presents the genes selected by the four methods. The procedures were repeated 100 times, each time yielding a different set of genes. The frequency of selection of each gene and the means and standard errors of the coefficients over the 100 replicates are shown in Table 1. The number of selected genes varied between the methods. In addition, eight genes (SEQ265, SEQ279, SEQ1226, SEQ1262, SEQ1384, SEQ213, SEQ34 and SEQ377) were common to the four feature selection techniques.
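Table 1:

Selected microarray features by using four variable selection approaches for progression or death from bladder cancer event in Dyrskjøt data set. Values shown are frequency of selected genes, means of coefficients (standard errors) over 100 replicates

| Gene ID | Elastic net: Frequency | Elastic net: β (SE)* | LASSO: Frequency | LASSO: β (SE)* | SCAD: Frequency | SCAD: β (SE)* | SICA: Frequency | SICA: β (SE)* |
|---|---|---|---|---|---|---|---|---|
| SEQ1082 | 84 | 5.2 (0.3) | 88 | 4.9 (0.3) | 87 | 3.9 (0.3) | - | - |
| SEQ1197 | 90 | 4.5 (0.2) | 91 | 4.3 (0.2) | 95 | 3.6 (0.2) | - | - |
| SEQ1226 | 100 | −5.5 (0.2) | 100 | −5.1 (0.2) | 100 | −4.4 (0.2) | 100 | −19 (0.2) |
| SEQ1262 | 99 | −8.9 (0.3) | 99 | −9.0 (0.3) | 99 | −7.9 (0.3) | 100 | −19 (0.2) |
| SEQ1284 | - | - | - | - | - | - | 56 | 1.5 (0.2) |
| SEQ1295 | - | - | - | - | - | - | 83 | 2.3 (0.2) |
| SEQ1330 | 54 | 0.8 (0.1) | 43 | 0.6 (0.1) | 40 | 0.4 (0.1) | - | - |
| SEQ1384 | 76 | −5.1 (0.4) | 75 | −4.9 (0.4) | 74 | −3.4 (0.3) | 100 | −48 (0.3) |
| SEQ162 | 98 | −1.6 (0.1) | 96 | −1.6 (0.1) | 98 | −1.5 (0.1) | - | - |
| SEQ213 | 63 | 1.6 (0.2) | 43 | 1.2 (0.2) | 50 | 0.8 (0.1) | 83 | 3.8 (0.3) |
| SEQ240 | 32 | −0.1 (0.3) | - | - | - | - | - | - |
| SEQ265 | 100 | 9.4 (0.1) | 100 | 9.5 (0.1) | 100 | 9.4 (0.1) | 97 | 2.6 (0.1) |
| SEQ279 | 84 | −5.4 (0.3) | 88 | −5.2 (0.3) | 87 | −4.1 (0.3) | 83 | −1.7 (0.1) |
| SEQ287 | 76 | 1.0 (0.1) | 69 | 0.8 (0.1) | 74 | 0.6 (0.1) | - | - |
| SEQ34 | 100 | 23 (0.3) | 100 | 24 (0.3) | 100 | 23 (0.3) | 100 | 31 (0.1) |
| SEQ377 | 99 | 8.8 (0.4) | 97 | 8.1 (0.3) | 99 | 7.2 (0.3) | 100 | 8.2 (0.2) |
| SEQ408 | - | - | - | - | - | - | 13 | 0.4 (0.1) |
| SEQ410 | - | - | - | - | - | - | 100 | 6.9 (0.2) |
| SEQ542 | - | - | - | - | - | - | 56 | 0.9 (0.1) |
| SEQ820 | 100 | 11 (0.1) | 100 | 11 (0.1) | 100 | 12 (0.1) | - | - |
| SEQ833 | 100 | 7.1 (0.2) | 100 | 7.1 (0.2) | 100 | 6.6 (0.2) | - | - |
| SEQ843 | - | - | - | - | - | - | 97 | −3.3 (0.1) |
| SEQ940 | 90 | −5.2 (0.3) | 91 | −5.1 (0.2) | 95 | −4.2 (0.2) | - | - |
| SEQ948 | - | - | - | - | - | - | 13 | −0.3 (0.1) |

* Coefficients and standard errors (SE) must be multiplied by 10−4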
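The bookkeeping behind Table 1 (how often a gene is selected across replicates, and which genes all methods share) can be sketched as follows. The helper names are hypothetical, and the text does not say how the 100 replicates were generated, so the sketch simply takes one list of selected genes per replicate. The gene sets in the usage part are read off Table 1 as the genes with a non-empty entry for each method.

```python
from collections import Counter


def selection_frequency(replicates):
    """Percentage of replicates in which each gene was selected."""
    counts = Counter(gene for selected in replicates for gene in set(selected))
    n = len(replicates)
    return {gene: 100.0 * count / n for gene, count in counts.items()}


def common_genes(selected_by_method):
    """Genes selected by every method."""
    return set.intersection(*(set(genes) for genes in selected_by_method.values()))


# Genes with an entry in Table 1 for each method.
shared_convex = {
    "SEQ1082", "SEQ1197", "SEQ1226", "SEQ1262", "SEQ1330", "SEQ1384",
    "SEQ162", "SEQ213", "SEQ265", "SEQ279", "SEQ287", "SEQ34",
    "SEQ377", "SEQ820", "SEQ833", "SEQ940",
}
table1 = {
    "Elastic net": shared_convex | {"SEQ240"},
    "LASSO": shared_convex,
    "SCAD": shared_convex,
    "SICA": {
        "SEQ1226", "SEQ1262", "SEQ1284", "SEQ1295", "SEQ1384", "SEQ213",
        "SEQ265", "SEQ279", "SEQ34", "SEQ377", "SEQ408", "SEQ410",
        "SEQ542", "SEQ843", "SEQ948",
    },
}
overlap = common_genes(table1)
```

With these sets, `overlap` contains the eight genes that the text reports as common to the four techniques.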

Table 2 shows the mean and standard error of the integrated Brier score, the median over time of the area under the ROC curve (AUC), and the C-index for each method. On all three criteria, the elastic net penalty outperformed the other three methods. The mean integrated Brier score of the elastic net over 100 repetitions was the lowest (0.137±0.07). In addition, the means of the median over-time AUC and of the C-index were the greatest for the elastic net (0.803±0.06 and 0.779±0.13, respectively).
Table 2:

Results of various methods applied to the Bladder cancer microarray data

| Method | Integrated Brier score | AUC(t) | C-index |
|---|---|---|---|
| Elastic net | 0.137±0.07 | 0.803±0.06 | 0.779±0.13 |
| Lasso | 0.153±0.09 | 0.741±0.11 | 0.693±0.19 |
| SCAD | 0.144±0.06 | 0.763±0.07 | 0.722±0.16 |
| SICA | 0.145±0.07 | 0.761±0.07 | 0.717±0.12 |

AUC is the area under ROC curve

To evaluate the improvement in prediction performance from including the selected microarray features over a purely clinical model, bootstrap .632+ prediction error curves were computed from B=100 bootstrap samples drawn without replacement. Fig. 1 shows the prediction error estimates for the four variable selection methods as well as for the model with clinical covariates only. Including the selected microarray features in the models clearly improved prediction over the purely clinical model, indicating that the microarray data contain valuable information. In addition, based on this criterion, the elastic net outperformed the other three methods.
Fig. 1:

Bootstrap 0.632+ prediction error curve estimates for prediction of the conditional probability function from bladder cancer microarray data

In addition, five of the 19 genes selected by the elastic net (SEQ1082, SEQ1197, SEQ1262, SEQ833 and SEQ940) were significant under the additive hazards model (P=0.005, 0.016, 0.039, 0.009 and 0.025, respectively). Table 3 shows the coefficients, standard errors and P-values for these genes. The expression of gene SEQ940 increased the survival time, whereas the expression of the other significant genes (SEQ1082, SEQ1197, SEQ1262 and SEQ833) decreased it.
Table 3:

Influential genes on bladder cancer patient’s survival based on additive hazards model from selected genes by elastic net

| Gene ID | Gene number | Gene name | Coefficient (SE) | P-value |
|---|---|---|---|---|
| SEQ1082 | NM_207521.1 | Homo sapiens reticulon 4 (RTN4) | 0.0025 (0.0009) | 0.005 |
| SEQ1197 | NM_003103.5 | Homo sapiens (human) SON DNA binding protein (SON) | 0.0034 (0.0014) | 0.016 |
| SEQ1262 | NM_000875.2 | Homo sapiens insulin-like growth factor 1 receptor (IGF1R), mRNA | 0.0029 (0.0014) | 0.039 |
| SEQ833 | NM_001255.1 | Homo sapiens CDC20 cell division cycle 20 homolog (S. cerevisiae) (CDC20), mRNA | 0.0018 (0.0007) | 0.009 |
| SEQ940 | NM_020159.1 | Homo sapiens SWI/SNF-related, matrix-associated actin-dependent regulator of chromatin, subfamily a, containing DEAD/H box 1 (SMARCAD1), transcript variant 3, mRNA | −0.0023 (0.0010) | 0.025 |
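The link between the sign of a coefficient and the direction of its effect on survival can be made explicit with a short derivation. It assumes time-fixed covariates and treats the quantity below as the survival function implied by the cause-specific hazard alone, which in a competing risks setting is not the same as a marginal survival probability. The numerical example further assumes that time is measured in months and that the coefficient applies per unit of expression, neither of which is stated in the text.

```latex
% Additive hazard with time-fixed covariates Z and cumulative baseline
% hazard \Lambda_0(t) = \int_0^t \lambda_0(s)\,ds:
\[
  \Lambda(t \mid Z) = \int_0^t \bigl\{\lambda_0(s) + \beta_0^{\top} Z\bigr\}\,ds
                    = \Lambda_0(t) + t\,\beta_0^{\top} Z ,
\]
\[
  S(t \mid Z) = \exp\{-\Lambda(t \mid Z)\}
              = \exp\{-\Lambda_0(t)\}\,\exp\{-t\,\beta_0^{\top} Z\}.
\]
% A one-unit increase in covariate j multiplies S(t | Z) by exp(-t \beta_j):
% a positive \beta_j lowers survival, a negative \beta_j raises it.
% Illustration with the RTN4 coefficient from Table 3 and t = 47:
\[
  \exp(-47 \times 0.0025) = \exp(-0.1175) \approx 0.889 .
\]
```

Read this way, the positive coefficients of RTN4, SON, IGF1R and CDC20 correspond to an excess hazard and shorter survival, and the negative coefficient of SMARCAD1 to a reduced hazard.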

Discussion

The present study considered a cause-specific additive hazards approach for high-dimensional time-to-event data with competing risks and compared the performance of four penalized variable selection techniques: Lasso, elastic net, SCAD and SICA. The four techniques were applied to a real dataset containing microarray competing risks data from patients with bladder cancer. Although the genes selected by the different methods varied because of the low sample size and high dimensionality, there was some consistency across the methods. However, the elastic net method showed better performance. The results also showed that the microarray features selected by the four methods could improve the prediction of survival over a purely clinical model in bladder cancer patients. In addition, based on an additive hazards model, five of the 19 genes selected by the elastic net were identified as influential for bladder cancer survival. Consequently, these genes may help predict the survival time of patients with bladder cancer. The expression of genes RTN4, SON, IGF1R and CDC20 can decrease the survival time, while the expression of gene SMARCAD1 might increase it. The significant genes have been linked to several types of cancer. Nogo proteins, encoded by the reticulon-4 gene (RTN4), are myelin-associated endoplasmic reticulum proteins that recent studies suggest play an important role in apoptosis, especially in cancer cells such as those of bladder and lung cancer (24, 25). They are apoptosis-inducing proteins involved in apoptosis through some classical apoptotic signaling pathways (24, 25). The product of this gene is a potent neurite outgrowth inhibitor and is involved in blocking the regeneration of the central nervous system (25). The SON protein regulates alternative splicing of RNAs from genes involved in apoptosis and epigenetic modification (26).
In addition, SON-mediated splicing is essential for proper processing of selective transcripts related to the cell cycle, microtubules, centrosome maintenance, and genome stability (26). Because this gene is involved in the regulation of gene expression, its absence disrupts gene expression and affects the editing process. This gene has an important role in cancer and other types of human disease (27, 28). The insulin-like growth factor 1 receptor (IGF1R) signaling pathway plays important roles in regulating cellular proliferation and apoptosis, and changes in the expression of IGF1R may be a risk factor for cancer incidence (29). This gene functions as an anti-apoptotic agent by enhancing cell survival, and it is highly overexpressed in most malignant tissues (30). Several studies have confirmed that IGF1R is overexpressed in invasive bladder cancer tissues and promotes motility and invasion of urothelial carcinoma cells (31–35). The other gene, CDC20, acts as a regulatory protein that interacts with several other proteins at multiple points in the cell cycle. Overexpression of CDC20 is related to poor prognosis in urothelial carcinoma of the human bladder (36–38). SMARCAD1 binds the transcriptional start sites of many genes and is involved in transcriptional regulation and end resection. This gene encodes a member of the SNF subfamily of helicase proteins, which plays a critical role in the restoration of heterochromatin organization and the propagation of epigenetic patterns following DNA replication by mediating histone H3/H4 deacetylation (39). Heterochromatin maintenance and proper chromosome segregation require SMARCAD1, and the gene is related to several types of cancer, including bladder, breast, colorectal, and gastric cancers (40–42).
This data set was first analyzed by Dyrskjøt et al. (20), who identified 88 genes highly significantly correlated with progression-free survival using a univariate Cox regression strategy. Only four microarray features (SEQ213, SEQ833, SEQ820 and SEQ843) were common to their results and those of the present study. In addition, Binder et al. analyzed the same dataset using a Cox proportional hazards likelihood-based boosting method (21); five genes (SEQ34, SEQ162, SEQ265, SEQ820 and SEQ1384) were common to their study and the present one. A number of studies have evaluated the performance of variable selection methods based on both additive and proportional hazards approaches (9, 10, 18, 19). Lin and Lv (8) compared the performance of the elastic net, Lasso, SCAD and SICA in a simulation study for single-endpoint time-to-event data in the high-dimensional, low-sample-size setting. Their results showed that SICA outperformed the other methods, which is inconsistent with the results of the present study. In another study, Engler and Li (6) compared the elastic net and Lasso variable selection methods for non-competing-risks time-to-event data in the high-dimensional, low-sample-size setting based on the Cox proportional hazards model. The results of their simulation studies and real data analysis demonstrated that the elastic net outperformed the Lasso, in agreement with the present study. Based on the MSE criterion, the performance of the elastic net in linear regression was superior to that of the Lasso, which is also similar to the results of the present study (13). The elastic net and Lasso were also compared in a simulation study for single-endpoint time-to-event data in the high-dimensional, low-sample-size setting based on the additive hazards model (8). The results showed that the elastic net outperformed the Lasso, which is similar to the result of the present study for the competing risks setting. Ogutu et al. reported similar accuracies for the Lasso and the elastic net in linear regression (43).
The performance of variable selection methods and of the different models depends on the data used, with no method dominating the others (44, 45). The present study focused on evaluating the performance of four well-known variable selection methods (Lasso, elastic net, SCAD and SICA) under an additive hazards model for analyzing microarray competing risks data. The additive hazards model and the variable selection approaches used here provide a useful alternative to existing dimension reduction techniques based on Cox’s model for competing risks survival data with high-dimensional covariates. The present study also introduced a new set of microarray features influential for the survival of bladder cancer patients from an additive perspective, which differs from the proportional hazards point of view. According to the results of the present study, despite the small number of selected genes, the Lasso, elastic net, SCAD and SICA methods all showed reasonable performance under the additive model, and the selected genes improved prediction performance over a purely clinical model. The expression levels of influential genes play an important role in survival time as either risk factors or protective factors. Therefore, determining the expression levels of such genes might help in primary prevention programs (46).

Conclusion

The elastic net penalty had higher capability than the Lasso, SCAD and SICA for predicting survival time in patients with bladder cancer in the high-dimensional competing risks setting, based on the additive hazards model. In addition, combining appropriate statistical methods with gene expression data can help detect genes that influence survival time.

Ethical considerations

Ethical issues (including plagiarism, informed consent, misconduct, data fabrication and/or falsification, double publication and/or submission, redundancy, etc.) have been completely observed by the authors.