Classification of Bladder Cancer Patients via Penalized Linear Discriminant Analysis

Hadi Raeisi Shahraki, Peyman Bemani, Maryam Jalali.

Abstract

Objectives: To identify the genes that contribute most to bladder cancer, we proposed a sparse model that best discriminates bladder cancer patients from others.
Methods: In a cross-sectional study, 22 genes with a key role in most cancers were examined in 21 bladder cancer patients and 14 participants of the same age (± 3 years) without bladder cancer in Shiraz city, Southern Iran. Real-time PCR was carried out using SYBR Green, and for each of the 22 target genes 2^(-ΔCt) was reported as a quantitative index of gene expression. We determined the most effective genes for the discriminant vector by applying penalized linear discriminant analysis with LASSO penalties. All analyses were performed using SPSS version 18 and the penalized LDA package in R 3.1.3.
Results: Penalized linear discriminant analysis eliminated 13 less important genes. Considering the simultaneous effects of the 22 genes, which have an important influence on many cancers, the TGFβ, IL12A, Her2, MDM2, CTLA-4 and IL-23 genes contributed most to classifying bladder cancer patients with the penalized linear discriminant vector. The receiver operating characteristic (ROC) curve showed that the proposed vector performed well, with only 3 misclassifications. The area under the curve (AUC) of the proposed test was 96% (95% CI: 83%-100%), and sensitivity, specificity, positive predictive value and negative predictive value were 90.5%, 85.7%, 90.5% and 85.7%, respectively.
Conclusions: The penalized discriminant method can be considered appropriate for classifying bladder cancer cases and searching for important biomarkers.
Creative Commons Attribution License

Keywords:  Bladder cancer; classification; gene expression; discriminant analysis; penalized method

Year:  2017        PMID: 28612601      PMCID: PMC5555561          DOI: 10.22034/APJCP.2017.18.5.1453

Source DB:  PubMed          Journal:  Asian Pac J Cancer Prev        ISSN: 1513-7368


Introduction

Bladder cancer is one of the most frequently occurring tumors of the urinary system. Its incidence is growing, but no significant change has been observed in its mortality rate during the last three decades (Huang et al., 2011). Bladder cancer is the sixth most common malignancy in developed countries (Yao et al., 2007). Among the common cancers it is ranked fourth; it is also the ninth leading cause of cancer death in males and the eighth most common cancer in females (Andrew et al., 2009; Mohammad-Beigi et al., 2011; Shahraki et al., 2016). Treatment success for this cancer depends mostly on diagnosis at an early stage (Huang et al., 2011). In recent years, many genetic studies have been conducted with the aim of diagnosing cancers early. In this context, it is desirable to find genes whose expression levels best distinguish bladder cancer patients from others. Although recent progress has been noticeable, ambiguity in diagnosis remains (Dudoit et al., 2002). Here, we focus on the classification of bladder cancer patients using gene expression data. So far, many studies have examined the effect of one or a few genetic markers on this cancer, but genes are now known to be associated with one another, which makes single-marker analysis inefficient. Therefore, it is necessary to perform multi-marker analysis and to study all the effective and important genetic factors jointly. Recently, many statistical methods have been developed for genetic studies in which the joint effect of genetic factors is investigated. Linear discriminant analysis (LDA) is a very useful and common method for this purpose, but in the presence of many markers it usually suffers from high dimensionality and lack of interpretability (Ye et al., 2004). As a statistical rule of thumb in LDA, at least 10 samples are required for each variable, which increases the cost and time of gene expression studies (Van Belle, 2011).
Penalized linear discriminant analysis is an alternative statistical method used for classification and dimension reduction, even with a huge number of markers and a small sample size. By controlling multicollinearity in high-dimensional settings, penalized LDA can handle a large number of variables and is applicable even when the number of variables is larger than the sample size (Witten and Tibshirani, 2011; Raeisi Shahraki et al., 2016). Penalized discriminant methods have been applied in many recent studies. Huang and Zheng (2006) used a penalized discriminant method for tumor classification using gene expression data. To check the validity of this method, they applied it to classify four DNA microarray datasets. The results showed the efficiency and feasibility of the method (Huang and Zheng, 2006). In another study on finding and classifying biomarkers of breast cancer, a penalized mixed model was applied. By analyzing the experimental data, the authors found that the proposed method was able to classify the breast cancer type properly and to find important genes that have been verified in biochemical or biomedical studies (Shi et al., 2009). The aim of this study was to determine, among 22 genes that have a key role in most cancers, the most effective genes for predicting bladder cancer, and to classify people according to the expression of these genes by applying penalized linear discriminant analysis.

Materials and Methods

In a cross-sectional study, 25 bladder cancer patients and 25 clients of the same age (± 3 years) without bladder cancer who were referred to a nursing home located in Kholdebarin park, Shiraz city, Southern Iran, were enrolled. In the first group, cancer was confirmed by histopathologic examination, and those who had undergone surgery to remove cancerous tumors or had received chemotherapy or radiotherapy were excluded. Other exclusion criteria were having metabolic, immunological, genetic or infectious diseases at the time of sampling and receiving any treatment. Exclusion criteria for the latter group included having urinary problems or a history of cancer or autoimmune disease, even in first-degree relatives. In addition, those who had any type of disease in the two weeks before the sampling day were excluded. A 3 cc blood sample was taken from each participant, and cDNA synthesis for each of the 22 target genes was performed. Then, real-time PCR experiments were carried out using SYBR Green, and 2^(-ΔCt) was reported as a quantitative index of gene expression.
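As a small illustration of the expression index, the sketch below computes 2^(-ΔCt), taking ΔCt as the target gene's Ct minus a reference gene's Ct. The function name and the Ct values are hypothetical, and the source does not state which reference gene was used.

```python
def relative_expression(ct_target: float, ct_reference: float) -> float:
    """Return 2^(-dCt), where dCt = Ct(target) - Ct(reference)."""
    delta_ct = ct_target - ct_reference
    return 2.0 ** (-delta_ct)


# Hypothetical Ct values, for illustration only (not taken from the study).
print(relative_expression(ct_target=28.0, ct_reference=20.0))
```

A larger ΔCt means the target needed more cycles than the reference to reach the threshold, so the index falls by half for every additional cycle.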

Statistical analysis

1) Penalized linear discriminant analysis

The aim of linear discriminant analysis (LDA) is to separate a number of observations into known classes (Merchante et al., 2012). Fisher's linear discriminant analysis (FLDA) finds a linear combination of observed or measured markers that describes the separation between known groups of observations in the best possible way. Its basic objective is classification or prediction in problems where the dependent variable is qualitative. In Fisher's method, the multivariate observations are projected so that the groups have maximum separation (Merchante et al., 2012). It should be noted that not all the markers included in Fisher's LDA contain useful information. In other words, only some of the markers may be enough for classification, so it is wise to select a subset of the variables, particularly because the variables are likely to be correlated (Qiao et al., 2008). In cases with many variables, such as most genetic analyses, LDA is not suitable because its conditions are no longer met and the results are difficult to interpret (Merchante et al., 2012). The penalized LDA procedure is applied in high-dimensional (a very large number of markers), low-sample-size settings. The basis of penalized LDA is that when many markers are unnecessary for classification, selecting the important markers reduces the problem to a lower-dimensional one. In high-dimensional, low-sample-size (HDLSS) settings, the classical discriminant rule cannot be used (Qiao et al., 2008). Also, with HDLSS data, the classification rule obtained from LDA is difficult to interpret (Witten and Tibshirani, 2011).
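To make the idea of Fisher's discriminant direction concrete, here is a minimal two-class sketch in plain Python. It computes a vector proportional to S_w^{-1}(m_1 - m_0), where S_w is the pooled within-class scatter matrix and m_0, m_1 are the class means. The data values and all function names are illustrative assumptions, not the study's data or code.

```python
def mean_vector(rows):
    """Column-wise mean of a list of equal-length rows."""
    return [sum(col) / len(rows) for col in zip(*rows)]


def pooled_scatter(groups):
    """Pooled within-class scatter matrix S_w (summed over classes)."""
    p = len(groups[0][0])
    scatter = [[0.0] * p for _ in range(p)]
    for rows in groups:
        centre = mean_vector(rows)
        for row in rows:
            dev = [x - c for x, c in zip(row, centre)]
            for i in range(p):
                for j in range(p):
                    scatter[i][j] += dev[i] * dev[j]
    return scatter


def solve(matrix, rhs):
    """Solve matrix @ x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for k in range(col, n + 1):
                aug[r][k] -= factor * aug[col][k]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(aug[i][k] * x[k] for k in range(i + 1, n))
        x[i] = (aug[i][n] - tail) / aug[i][i]
    return x


def fisher_direction(class0, class1):
    """Direction proportional to S_w^{-1} (mean1 - mean0)."""
    diff = [b - a for a, b in zip(mean_vector(class0), mean_vector(class1))]
    return solve(pooled_scatter([class0, class1]), diff)


def score(weights, row):
    """Projection of one observation onto the discriminant direction."""
    return sum(w * x for w, x in zip(weights, row))


# Hypothetical two-marker data, for illustration only.
controls = [[1.0, 2.0], [2.0, 1.0], [1.5, 1.5], [2.0, 2.5]]
cases = [[4.0, 2.0], [5.0, 3.0], [4.5, 1.0], [5.5, 2.5]]
w = fisher_direction(controls, cases)
print([round(score(w, r), 2) for r in controls + cases])
```

With more markers than observations, S_w becomes singular and `solve` breaks down, which is the practical reason a penalized variant is needed in HDLSS settings.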

2) Formulation

Let X be an n×p matrix, where n is the number of observations and p is the number of markers. Following Witten and Tibshirani (2011), the kth penalized discriminant vector $\beta_k$ is obtained by solving:

$$\hat{\beta}_k = \arg\max_{\beta_k} \left\{ \beta_k^{T} \hat{\Sigma}_b^{k} \beta_k - \lambda P_k(\beta_k) \right\} \quad \text{subject to} \quad \beta_k^{T} \tilde{\Sigma}_w \beta_k \le 1$$

where $P_k$ is a convex penalty function on the kth discriminant vector, $\tilde{\Sigma}_w$ is a positive definite estimate of the within-class covariance matrix $\Sigma_w$, $\lambda$ is a positive constant that imposes a penalty on the biomarkers, and $\hat{\Sigma}_b^{k}$ is the estimate of the between-class covariance matrix $\Sigma_b$ (Witten and Tibshirani, 2011). In penalized LDA, the value of lambda is very important, because larger values correspond to a larger penalty and lead to the removal of a greater number of biomarkers from the model (Shahraki et al., 2015). To estimate the optimum value of lambda, we used 5-fold cross-validation. All statistical analyses were performed using SPSS version 18 and the penalized LDA package in R 3.1.3.
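The effect of the penalty can be sketched in plain Python under stated assumptions: two classes, a diagonal within-class covariance estimate, and an L1 (LASSO) penalty weighted by each marker's within-class standard deviation, solved by repeated soft-thresholding in the spirit of the iterative scheme described by Witten and Tibshirani (2011). This is not the code of the penalized LDA R package; the function names and the toy data are hypothetical.

```python
import math


def soft_threshold(a, t):
    """S(a, t) = sign(a) * max(|a| - t, 0)."""
    return math.copysign(max(abs(a) - t, 0.0), a)


def column_means(rows):
    return [sum(col) / len(rows) for col in zip(*rows)]


def penalized_lda_vector(X, y, lam, n_iter=50):
    """First L1-penalized discriminant vector (diagonal within-class estimate)."""
    n, p = len(X), len(X[0])
    overall = column_means(X)
    within_ss = [0.0] * p
    centred_means, weights = [], []
    for label in sorted(set(y)):
        rows = [x for x, lab in zip(X, y) if lab == label]
        mu = column_means(rows)
        centred_means.append([m - o for m, o in zip(mu, overall)])
        weights.append(len(rows) / n)
        for row in rows:
            for j in range(p):
                within_ss[j] += (row[j] - mu[j]) ** 2
    sd = [math.sqrt(s / n) for s in within_ss]  # within-class SD per marker

    def between_times(beta):
        """Multiply the between-class covariance estimate by beta."""
        out = [0.0] * p
        for w, m in zip(weights, centred_means):
            proj = sum(mj * bj for mj, bj in zip(m, beta))
            for j in range(p):
                out[j] += w * proj * m[j]
        return out

    def normalise(vec):
        """Rescale so that the within-class variance of the score is 1."""
        norm = math.sqrt(sum((sd[j] * vec[j]) ** 2 for j in range(p)))
        return None if norm == 0.0 else [v / norm for v in vec]

    # Start from the unpenalized diagonal-LDA direction.
    beta = normalise([centred_means[-1][j] / sd[j] ** 2 for j in range(p)])
    if beta is None:
        return [0.0] * p
    for _ in range(n_iter):
        a = between_times(beta)
        d = [soft_threshold(2.0 * a[j], lam * sd[j]) / sd[j] ** 2 for j in range(p)]
        beta = normalise(d)
        if beta is None:  # penalty removed every marker
            return [0.0] * p
    return beta


# Hypothetical data: marker 0 separates the groups, marker 1 only weakly.
X = [[1.0, 5.0, 3.0], [2.0, 4.0, 1.0], [1.5, 6.0, 2.0], [2.5, 5.0, 2.0],
     [5.0, 5.5, 2.0], [6.0, 4.5, 3.0], [5.5, 6.0, 1.0], [6.5, 5.0, 2.0]]
y = [0, 0, 0, 0, 1, 1, 1, 1]
for lam in (0.5, 3.0, 100.0):
    print(lam, [round(b, 3) for b in penalized_lda_vector(X, y, lam)])
```

In this toy example the weak marker keeps a small coefficient at a small lambda, drops to exactly zero at a moderate lambda, and every coefficient is zero once lambda is large enough, which mirrors how a larger penalty removes more biomarkers.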

Results

Because gene expression data are by nature unpredictable, we did not impute missing values; after eliminating observations with missing data, the sample size decreased to 35 (21 cases and 14 controls). Table 1 presents descriptive statistics for all 22 genes. Individual comparisons of gene expression between the two groups were performed using the Mann-Whitney test. The results revealed that the expression of 11 out of 22 genes was significantly different between the case and control groups. Among these, the median expression of OCT4, SDF1, BCL2, CTLA-4, Foxp3, IL-23 and IL-27 was higher in bladder cancer patients than in controls, whereas the reverse was true for Her2, IL12 A, MDM2 and TGF (Table 1).
Table 1

Descriptive Statistics and Univariate Test Results for Comparisons Between Expressions of Genes

| Gene | Control (n = 14) | Case (n = 21) | P-value |
|---|---|---|---|
| CXCR4 | 0.20 (0.62) | 0.38 (0.76) | 0.89 |
| Oct-04 | 0.002 (0.002) | 0.019 (0.119) | 0.008* |
| SDF1 | 3×10^-5 (3.8×10^-5) | 4.6×10^-5 (1.2×10^-4) | < 0.001* |
| BCL2 | 0.001 (0.005) | 0.01 (0.024) | 0.002* |
| P53 | 0.022 (0.049) | 0.034 (0.19) | 0.57 |
| Fas | 0.015 (0.052) | 0.024 (0.042) | 0.24 |
| CTLA-4 | 0.0007 (0.002) | 0.005 (0.005) | 0.003* |
| Foxp3 | 0.0005 (0.0009) | 0.002 (0.0048) | 0.002* |
| CXCR3 | 0.022 (0.257) | 0.01 (0.019) | 0.2 |
| E-Cadherin | 0.0004 (0.0017) | 7×10^-5 (0.0005) | 0.18 |
| Her2 | 0.017 (0.116) | 0.0029 (0.007) | 0.018* |
| IFN | 0.0005 (0.0035) | 0.0024 (0.0095) | 0.12 |
| IP10 | 0.0004 (0.015) | 0.0006 (0.007) | 0.57 |
| IL12 A | 0.0015 (0.0126) | 0.0003 (0.0008) | 0.004* |
| IL12 B | 0.0001 (0.0016) | 0.0002 (0.0009) | 0.3 |
| MDM2 | 0.006 (0.019) | 0.0019 (0.002) | 0.002* |
| Survivin | 0.0005 (0.0059) | 0.00016 (0.00036) | 0.24 |
| IL-23 | 0.00007 (0.0007) | 0.012 (0.068) | < 0.001* |
| IL-27 | 3×10^-6 (6×10^-6) | 0.00035 (0.001) | < 0.001* |
| IL-6 | 0.0029 (0.0315) | 0.00077 (0.0016) | 0.32 |
| TGF | 0.738 (0.99) | 0.00006 (0.0013) | 0.004* |
| IL-17 | 0.00027 (0.0017) | 0.00015 (0.00001) | 0.87 |
To choose the optimum lambda (tuning parameter) for penalized linear discriminant analysis, we used 5-fold cross-validation, which gave a lambda of 0.31. We then performed penalized linear discriminant analysis with LASSO penalties on the discriminant vector, which eliminated 13 less important genes. The coefficients of the genes in the discriminant vector are shown in the second column of Table 2. In the penalized LDA, all variables with non-zero coefficients were considered significant genes.
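The tuning step can be sketched generically: split the observations into 5 folds, fit on four folds, count misclassifications on the held-out fold, and keep the lambda with the lowest error. In the sketch below the classifier is a deliberately simple stand-in (soft-thresholded mean differences), not penalized LDA, and the data, the lambda grid and all names are hypothetical.

```python
import random


def k_fold_indices(n, k=5, seed=0):
    """Shuffle 0..n-1 and deal the indices into k folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def choose_lambda(X, y, lambdas, fit, predict, k=5, seed=0):
    """Return (lambda, error) with the lowest k-fold misclassification rate."""
    folds = k_fold_indices(len(X), k, seed)
    best_lam, best_err = None, float("inf")
    for lam in lambdas:
        wrong = 0
        for held_out in folds:
            held = set(held_out)
            train = [i for i in range(len(X)) if i not in held]
            model = fit([X[i] for i in train], [y[i] for i in train], lam)
            wrong += sum(predict(model, X[i]) != y[i] for i in held_out)
        err = wrong / len(X)
        if err < best_err:  # ties keep the earlier lambda in the grid
            best_lam, best_err = lam, err
    return best_lam, best_err


def fit(X, y, lam):
    """Stand-in classifier (NOT penalized LDA): thresholded mean differences."""
    def means(label):
        rows = [x for x, lab in zip(X, y) if lab == label]
        return [sum(col) / len(rows) for col in zip(*rows)]
    m0, m1 = means(0), means(1)
    w = [max(abs(b - a) - lam, 0.0) * (1 if b > a else -1) for a, b in zip(m0, m1)]
    mid = [(a + b) / 2 for a, b in zip(m0, m1)]
    return w, mid


def predict(model, x):
    """Label 1 if the score around the class midpoint is positive, else 0."""
    w, mid = model
    return int(sum(wi * (xi - mi) for wi, xi, mi in zip(w, x, mid)) > 0)


# Hypothetical data: the first marker separates the groups, the second is noise.
X = [[0.0, 0.1], [0.2, -0.1], [0.5, 0.0], [0.8, 0.1], [1.0, -0.1],
     [9.0, 0.1], [9.2, -0.1], [9.5, 0.0], [9.8, 0.1], [10.0, -0.1]]
y = [0] * 5 + [1] * 5
best_lam, best_err = choose_lambda(X, y, [0.0, 1.0, 100.0], fit, predict)
print(best_lam, best_err)
```

Passing `fit` and `predict` as arguments keeps the cross-validation loop independent of the model, so the same loop could wrap any penalized classifier.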
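Table 2

Coefficients and Standard Errors of Penalized Linear Discriminant Analysis

| Gene | Discriminant vector: Coefficient | Discriminant vector: SE |
|---|---|---|
| CXCR4 | 0 | 0.03 |
| Oct-04 | 0 | 0.07 |
| SDF1 | 0 | 0.1 |
| BCL2 | 0 | 0.05 |
| P53 | 0 | 0.02 |
| Fas | 0 | 0.04 |
| CTLA-4 | 0.28 | 0.29 |
| Foxp3 | 0.08 | 0.24 |
| CXCR3 | -0.01 | 0.09 |
| E-Cadherin | 0 | 0.07 |
| Her2 | -0.41 | 0.3 |
| IFN | 0 | 0.07 |
| IP10 | 0 | 0 |
| IL12 A | -0.44 | 0.3 |
| IL12 B | 0 | 0 |
| MDM2 | -0.32 | 0.28 |
| Survivin | -0.05 | 0.09 |
| IL-23 | 0.16 | 0.14 |
| IL-27 | 0 | 0.13 |
| IL-6 | 0 | 0.16 |
| TGF | -0.65 | 0.48 |
| IL-17 | 0 | 0.03 |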
A box plot of the coefficients estimated from 500 bootstrap replicates is shown in Figure 1.
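A bootstrap standard error of this kind can be sketched as follows: resample the observations with replacement, re-estimate the coefficients on each resample, and take the standard deviation of each coefficient across replicates. The source does not describe how its resampling was carried out, so the scheme, the toy estimator (column means standing in for a coefficient vector) and all names here are assumptions.

```python
import random
import statistics


def bootstrap_se(rows, estimator, n_boot=500, seed=0):
    """Standard error of each component of estimator(rows) via the bootstrap."""
    rng = random.Random(seed)
    n = len(rows)
    draws = []
    for _ in range(n_boot):
        # Resample n observations with replacement.
        sample = [rows[rng.randrange(n)] for _ in range(n)]
        draws.append(estimator(sample))
    # One standard deviation per estimated component.
    return [statistics.stdev(component) for component in zip(*draws)]


def column_means(rows):
    """Toy estimator standing in for a vector of fitted coefficients."""
    return [sum(col) / len(rows) for col in zip(*rows)]


# Hypothetical data: the first column varies, the second is constant.
rows = [[float(i), 3.0] for i in range(1, 11)]
se = bootstrap_se(rows, column_means)
print([round(s, 3) for s in se])
```

The list of replicate estimates collected in `draws` is also what a box plot of bootstrap coefficients would be drawn from.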
Figure 1

Box Plot of the Estimated Coefficients for Genes Using 500 Time Bootstrap

To evaluate the penalized method, we computed scores using the discriminant vector and treated this score as a new test for distinguishing bladder cancer patients from others. The receiver operating characteristic (ROC) curve showed that the proposed vector performed well, with only 3 misclassifications. The area under the curve (AUC) of the proposed test was 96% (95% CI: 83%-100%), and sensitivity, specificity, positive predictive value and negative predictive value were 90.5%, 85.7%, 90.5% and 85.7%, respectively.
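The evaluation measures can be illustrated with a short sketch. The AUC is computed here as the proportion of case-control pairs in which the case has the higher score (ties count as one half), and the four diagnostic measures follow from the counts of a 2×2 classification table. The scores and counts below are hypothetical and are not the study's data.

```python
def auc(case_scores, control_scores):
    """Proportion of case-control pairs ranked correctly (ties count 0.5)."""
    wins = 0.0
    for c in case_scores:
        for k in control_scores:
            if c > k:
                wins += 1.0
            elif c == k:
                wins += 0.5
    return wins / (len(case_scores) * len(control_scores))


def diagnostic_metrics(tp, fn, fp, tn):
    """Sensitivity, specificity, PPV and NPV from a 2x2 classification table."""
    return {
        "sensitivity": tp / (tp + fn),  # cases correctly flagged
        "specificity": tn / (tn + fp),  # controls correctly cleared
        "ppv": tp / (tp + fp),          # flagged subjects who are cases
        "npv": tn / (tn + fn),          # cleared subjects who are controls
    }


# Hypothetical discriminant scores and counts, for illustration only.
print(auc([0.9, 0.8, 0.4], [0.5, 0.3, 0.2]))
print(diagnostic_metrics(tp=8, fn=2, fp=1, tn=9))
```

Sensitivity and specificity depend on the cut-off applied to the score, whereas the AUC summarizes performance across all cut-offs.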

Discussion

Considering the simultaneous effect of 22 genes that play an important role in many cancers according to previous studies, the TGFβ, IL12A, Her2, MDM2, CTLA-4 and IL-23 genes, in that order, contributed most to the classification of bladder cancer patients in the penalized linear discriminant vector. The Survivin, Foxp3 and CXCR3 genes also had a low but significant effect on this multivariable vector. Penalized linear discriminant analysis, an efficient method in high-dimensional, low-sample-size settings (Witten and Tibshirani, 2011; Merchante et al., 2012), eliminated 13 redundant and unimportant genes and yielded a sparse model that makes the best discrimination between bladder cancer patients and others. Our result is consistent with those of other studies. Several studies have shown higher expression of TGFβ at both the mRNA and protein level in bladder cancer patients compared with healthy controls and even with low-grade bladder cancer patients. This finding is consistent with the results of Shaker (2013) and Shariat (2001), who found that expression of TGF-β1 and its receptor (TGF-βR1) can be used as biological markers of bladder carcinoma (Shariat et al., 2001; Shaker et al., 2013). The critical effect of TGF-β expression on the phenotype, aggressive properties, progression and eventual outcome of bladder cancer has also been demonstrated (Helmy et al., 2007; Shaker et al., 2013). Shariat (2001) showed that transforming growth factor-β1 levels are highest in patients with bladder carcinoma metastatic to lymph nodes and are a strong independent predictor of disease recurrence and disease-specific mortality (Shariat et al., 2001).
In agreement with the above, an in vitro study showed that production of TGF-β1 was significantly associated with the phenotype of the prostate cancer cell line and pointed to possible involvement of the TGF-β pathway in bladder cancer progression (Hung et al., 2008). In another study with results consistent with ours, high expression of TGFβ was related to the immune escape mechanism in bladder cancer, and TGF-β1 protein was proposed as an attractive target for anticancer therapy (Helmy et al., 2007). Other studies focused on genes involved in TGFβ pathway signaling and confirmed their role in promoting tumor invasion and metastasis; for example, in a study by Fan (2014), TGF-β-induced gene expression resulted in epithelial-mesenchymal transition (EMT), a process that endows cancer cells with aggressive properties, including metastasis (Fan et al., 2014). Eissa (2005) found that HER2/neu was significantly over-expressed in the malignant bladder cancer group compared with the benign and normal groups; although no significant correlation was found between HER2/neu and stage or grade, HER2/neu was significantly associated with the lymph node status of the tumor (Eissa et al., 2005). Similarly, Fleischmann (2011) found that Her2 amplification and over-expression were correlated with aggressive properties of bladder cancer. They showed that Her2 amplification was significantly more frequent in lymph node metastases from urothelial bladder cancer than in the primary tumors (Fleischmann et al., 2011). In a study by ElMoneim (2011), Her2/neu overexpression was evaluated in patients with urothelial carcinoma of the bladder. According to their report, overexpression was seen in about half of the patients, and Her2/neu overexpression was significantly more frequent in high-grade tumors than in low-grade ones (ElMoneim et al., 2011).
In another study by Shawky (2013), Her-2/Neu expression was observed in 62.5% of bladder carcinoma patients, and statistical analysis revealed a significant direct association between Her-2/Neu and both increasing grade of carcinoma and depth of tumor invasion. Her-2/Neu immunopositivity was observed in a considerable proportion of cases and was associated with adverse prognostic factors (Shawky et al., 2013). Hammam (2015) evaluated HER2 oncoprotein expression by both immunohistochemical (IHC) staining and fluorescence in situ hybridisation (FISH) in different malignant and benign bladder lesions. The results showed that expression of HER2 was significantly higher in patients with malignant lesions than in the other groups, and in high-stage tumors than in low-stage ones (Hammam et al., 2015). Consistent with other studies, HER2 expression was also statistically higher in high-grade tumors than in low-grade ones, and high-stage and high-grade bladder malignancies expressed HER2 much more than did benign lesions (Hammam et al., 2015). Cell-cycle regulatory proteins are important indicators of progression through the cell cycle and of progression to invasive cancer in patients presenting with superficial bladder cancer. One of these cell-cycle markers is MDM2. This protein is involved in an autoregulatory feedback loop with p53, thereby controlling its activity (Mitra et al., 2012). MDM2 was another effective gene for classifying bladder cancer patients in our study, and many studies have reported the same result. For example, Lianes (1994) reported high levels of MDM2 expression in human bladder tumors and demonstrated that aberrant MDM2 phenotypes might be important diagnostic and prognostic markers in patients with bladder cancer (Lianes et al., 1994).
Forkhead box P3 (FOXP3), a member of the forkhead/winged-helix family of transcriptional regulators, acts mainly in regulating the development and function of CD4+CD25+ regulatory T cells (Triulzi et al., 2013). Although this gene was initially found to be crucial for the generation of CD4+CD25+ regulatory T cells (Tregs), its expression has more recently been identified in epithelial cancers of the breast, prostate and bladder (Zhang et al., 2015). Winerdal (2011) conducted the first study examining FOXP3 expression in invasive urothelial urinary bladder cancer (UBC) and in its tumor-infiltrating lymphocytes (TILs). The aim of that study was to determine the possible impact of FOXP3 expression in T cells, as well as in tumor cells, on long-term survival in patients with muscle-invasive UBC. Their results showed that patients with FOXP3-positive tumor cells had decreased long-term survival compared with those with FOXP3-negative tumors. The study indicated that FOXP3 expression, in both lymphocytes and tumor cells, is an important prognostic factor in UBC (Winerdal et al., 2011). Another study by Zhang (2015) showed that expression of FOXP3Δ3 (the exon 3-deleted isoform of FOXP3) in bladder epithelial cells correlated inversely with survival following radical cystectomy and promoted resistance to chemotherapy (Zhang et al., 2015). The expression of this isoform also increased with tumor stage in patients with bladder cancer (Thoma, 2016). A study by Tuna (2003) evaluating MDM2 as a predictor of recurrence in superficial transitional cell carcinoma of the bladder showed that the percentage of MDM2-positive cells among the counted tumor cells was significantly related to tumor grade and recurrence, so MDM2 expression was a valuable parameter for predicting the recurrence of superficial bladder cancer (Tuna et al., 2003).
In conclusion, we found that the penalized discriminant method can classify bladder cancer cases properly and identify important genes that have been verified in previous studies. We therefore introduced a sparse model that makes the best discrimination between bladder cancer patients and others.

References (22 in total; first 10 listed)

1.  Bladder cancer determination via two urinary metabolites: a biomarker pattern approach.

Authors:  Zhenzhen Huang; Lin Lin; Yao Gao; Yongjing Chen; Xiaomei Yan; Jinchun Xing; Wei Hang
Journal:  Mol Cell Proteomics       Date:  2011-07-28       Impact factor: 5.911

2.  Bladder cancer: FOXP3Δ3 involved in chemotherapy resistance.

Authors:  Clemens Thoma
Journal:  Nat Rev Urol       Date:  2016-06-07       Impact factor: 14.432

3.  Survival Prognostic Factors of Male Breast Cancer in Southern Iran: a LASSO-Cox Regression Approach.

Authors:  Hadi Raeisi Shahraki; Alireza Salehi; Najaf Zare
Journal:  Asian Pac J Cancer Prev       Date:  2015

4.  Her2 amplification is significantly more frequent in lymph node metastases from urothelial bladder cancer than in the primary tumours.

Authors:  Achim Fleischmann; Diana Rotzer; Roland Seiler; Urs E Studer; George N Thalmann
Journal:  Eur Urol       Date:  2011-05-25       Impact factor: 20.096

5.  TGF-B1 pathway as biological marker of bladder carcinoma schistosomal and non-schistosomal.

Authors:  Olfat Shaker; Olfat Hammam; Mohamed Wishahi; Mamdouh Roshdi
Journal:  Urol Oncol       Date:  2011-03-23       Impact factor: 3.498

6.  Preoperative plasma levels of transforming growth factor beta(1) (TGF-beta(1)) strongly predict progression in patients undergoing radical prostatectomy.

Authors:  S F Shariat; M Shalev; A Menesses-Diaz; I Y Kim; M W Kattan; T M Wheeler; K M Slawin
Journal:  J Clin Oncol       Date:  2001-06-01       Impact factor: 44.544

Review 7.  Prognostic value of cell-cycle regulation biomarkers in bladder cancer.

Authors:  Anirban P Mitra; Donna E Hansel; Richard J Cote
Journal:  Semin Oncol       Date:  2012-10       Impact factor: 4.929

8.  Expression of S100 protein family members in the pathogenesis of bladder tumors.

Authors:  Ruisheng Yao; Antonio Lopez-Beltran; Gregory T Maclennan; Rodolfo Montironi; John N Eble; Liang Cheng
Journal:  Anticancer Res       Date:  2007 Sep-Oct       Impact factor: 2.480

9.  The role of TGF-beta-1 protein and TGF-beta-R-1 receptor in immune escape mechanism in bladder cancer.

Authors:  Amira Helmy; Olfat Ali Hammam; Tarek Ramzy El Lithy; Mohamed Mohi El Deen Wishahi
Journal:  MedGenMed       Date:  2007-11-13

10.  Altered patterns of MDM2 and TP53 expression in human bladder cancer.

Authors:  P Lianes; I Orlow; Z F Zhang; M R Oliva; A S Sarkis; V E Reuter; C Cordon-Cardo
Journal:  J Natl Cancer Inst       Date:  1994-09-07       Impact factor: 13.506
