Literature DB >> 30763332

Complex harmonic regularization with differential evolution in a memetic framework for biomarker selection.

Sai Wang1, Hai-Wei Shen1, Hua Chai1, Yong Liang1,2.   

Abstract

For studying cancer and genetic diseases, the issue of identifying high correlation genes from high-dimensional data is an important problem. It is a great challenge to select relevant biomarkers from gene expression data that contains some important correlation structures, and some of the genes can be divided into different groups with a common biological function, chromosomal location or regulation. In this paper, we propose a penalized accelerated failure time model CHR-DE using a non-convex regularization (local search) with differential evolution (global search) in a wrapper-embedded memetic framework. The complex harmonic regularization (CHR) can approximate to the combination [Formula: see text] and ℓq (1 ≤ q < 2) for selecting biomarkers in group. And differential evolution (DE) is utilized to globally optimize the CHR's hyperparameters, which make CHR-DE achieve strong capability of selecting groups of genes in high-dimensional biological data. We also developed an efficient path seeking algorithm to optimize this penalized model. The proposed method is evaluated on synthetic and three gene expression datasets: breast cancer, hepatocellular carcinoma and colorectal cancer. The experimental results demonstrate that CHR-DE is a more effective tool for feature selection and learning prediction.

Entities:  

Mesh:

Substances:

Year:  2019        PMID: 30763332      PMCID: PMC6375558          DOI: 10.1371/journal.pone.0210786

Source DB:  PubMed          Journal:  PLoS One        ISSN: 1932-6203            Impact factor:   3.240


1 Introduction

Feature selection is a great step forward for selecting biomarkers in biological data with high dimension and small sample. Among various kinds of feature selection methods, the regularization methods use different penalty functions embedded in the learning procedure into a single process and has lower risk to over-fitting. The well known penalty is the least absolute shrinkage and selection operator (Lasso, ℓ1-norm) [1], which is performing continuous shrinkage and feature selection at the same time. Other ℓ1-norm type regularization methods typically include smoothly clipped absolute deviation (SCAD) [2], group lasso [3], minimax concave penalty (MCP) [4], etc. Besides, Xu et al [5] has proved that when , there is no significant difference in the performance of ℓ-norm, but the computational complexity to solve the ℓ1/2 regularization is much lower than that of the ℓ0-norm; while , the solutions of the ℓ regularization is more sparse with the decline in p. Under this theory, Chu et al [6] proposed a naïve harmonic regularization that can approximate penalties. One limitation of these ℓ1-norm type regularizations is that when the data set contains strong correlations among the predictors, it tends to select only one feature from the group and does not even care which one is selected, but these groups may be gene pathways in gene expression data. In theory, a strictly convex penalty function provides a sufficient condition for grouping effect of variables and ℓ-norm (q > 1) penalty guarantees strict convexity [7]. Zou and Hastie [8] proposed the Elastic net that mixes the ℓ1 and ℓ2 penalties. After that, some regularization methods without prior knowledge that combined ℓ2-norm for selecting groups of variables are SCAD-ℓ2 [7], ℓ1/2 + ℓ2 [9], and so on. While, there are also some regularization methods with prior knowledge, such as group lasso [3] that has been used for multivariate analysis of variance model, where each factor may have several levels and can be expressed by a group of dummy variables. In this article, we employ a complex harmonic regularization (CHR) [10] that approximates to the combination and ℓ (1 ≤ q < 2) to select the key factors in group among all features. This approach avoided determining the value of p or q in advance, i.e., we would not need to assume the probability distribution of the data, before evaluating the grouping effect and spare by the existing regularization methods. However, the hyperparameters of CHR are sensitive to the resolution, and the hyperparameter tuning is typically done by expert analysis, evolutionary algorithms, bayesian optimization and grid search [11]. Jaderberg et al [12] efficiently set the hyperparameters of neural networks based on the genetic algorithm (GA). Liu et al [13] proposed a hybrid genetic algorithm which combines genetic algorithm with embedded ℓ1/2 + ℓ2 regularization together. Such evolutionary algorithms are suitable to deal with tuning hyperparameters of these multimodal penalty functions. GA [14] is the most widely used one in the literature. However, GA is much slower convergence to optimum for high dimensional problem. Consequently, it cannot handle the learning model with more hyperparameters. A popular swarm-intelligence-based algorithm is the particle swarm optimization (PSO) algorithm [15] which is well adapted to the optimization of nonlinear functions in multidimensional space. Differential evolution (DE) [16] has been particularly proposed for continuous search spaces and is very simple to implement. Vesterstrom and Thomsen [17] have evaluated the performance of GA, DE and PSO regarding their general applicability as numerical optimization techniques. Then, they concluded that DE is less sensitive to parameter changes than other metaheuristic algorithms. Therefore, the DE can rightfully be regarded as an excellent choice to hyperparameter optimization. Memetic algorithm [18] is now widely used as a synergy of evolutionary or any population-based approach with separate individual learning or local improvement procedures for problem search. Evolution strategy (ES) is the first and oldest evolutionary algorithm, and it is based on the adaptation and evolution. Covariance matrix adaptation evolution strategies (CMA-ES) [19] is one of the most recent and powerful versions of memetic algorithm that combined evolution strategies with local information. The gene-pool optimal mixing evolutionary algorithm (GOMEA) is made for local search applying a strong mathematical background on the generation of the solutions, but it is considered to be a EA for discrete optimization problems [20]. Recently, Bouter et al. [21] proposed the real-valued GOMEA (RV-GOMEA) to cover the real-valued search space. Besides, memetic framework [22] models memetic algorithms as a process involving feature selection and learning procedure. In this paper, we present a wrapper-embedded memetic framework that utilizes DE to globally optimize the hyperparameters of non-convex regularization CHR that is a local search to select biomarkers in group. The workflow of our proposed algorithm is shown in Fig 1. Microarray gene expression data for one certain cancer are collected, processed into a matrix file that contains the genes (rows) and tissue samples (columns). After setting the CHR’s hyperparameters in DE procedure, CHR starts the learning procedures, and then gives the fitness values feedback to update its hyperparameters. With a fully trained model, we can get some groups of genes with non-zero coefficients, which may be the valid biomarkers for this cancer.
Fig 1

The workflow of our proposed the complex harmonic regularization with differential evolution algorithm (CHR-DE) for selecting biomarkers.

Microarray gene expression data for one certain cancer are collected, processed into a matrix file that contains the genes (rows) and tissue samples (columns). In order to identify tumor subclasses that are both biologically meaningful and clinically relevant, we apply the differential evolution (DE) to fine tuning the hyperparameters of the complex harmonic regularization (CHR). After the operations of DE procedure, such as differential mutation, crossover, adaptive local search and selection, this CHR can be used in the learning procedures, and then give the fitness values feedback to update its hyperparameters. With a fully trained model, we can get some groups of genes with non-zero coefficients, which may be the valid biomarkers for this cancer.

The workflow of our proposed the complex harmonic regularization with differential evolution algorithm (CHR-DE) for selecting biomarkers.

Microarray gene expression data for one certain cancer are collected, processed into a matrix file that contains the genes (rows) and tissue samples (columns). In order to identify tumor subclasses that are both biologically meaningful and clinically relevant, we apply the differential evolution (DE) to fine tuning the hyperparameters of the complex harmonic regularization (CHR). After the operations of DE procedure, such as differential mutation, crossover, adaptive local search and selection, this CHR can be used in the learning procedures, and then give the fitness values feedback to update its hyperparameters. With a fully trained model, we can get some groups of genes with non-zero coefficients, which may be the valid biomarkers for this cancer. The remainder of this paper is organized as follows: the CHR method for survival data in accelerated failure time (AFT) model is presented in Section 2, the implement of tuning CHR’s hyperparameters is introduced in Section 3, the experimental results and discussions are illustrated in Section 4, a concluding remark is finally made in Section 5.

2 Complex harmonic penalized accelerated failure time model

2.1 Accelerated failure time model

Suppose X denotes the h × k data matrix whose rows are X = (x, x, …, x), 1 ≤ i ≤ h, T denotes the sample vector of a lifetime or time to certain event of interest (τ1, τ2, …, τ). Throughout this article we consider failure times (or survival times) that are right censored, survival time τ = min(t, c), where t is the true survival time, c is the time to the first censoring event (e.g., study conclusion, date of final follow up) for each subject i. Our survival data consist of independent observations for h individuals , where δ is the censoring indicator, if δ = 0, it represents the right censoring time and δ = 1 means the completed time. The accelerated failure time (AFT) model is treated as a linear regression between the survival time τ and the covariates X: G(τ) = β0 + x β + ε, i = 1, 2, …, h, where , β0 is the intercept, is the regression coefficient, and ε are h independent random errors with a normal distribution function. Because of the censoring time in the datasets, the standard least squares approach is not allowed to directly compute the regression parameters of the covariates in AFT model. In order to simplify the method, we use the mean imputation method [23] to estimate the right censored data in the least squares criterion. The estimated value G(τ) of the censoring survival time τ is given by: where t(⋅) are distinct censored lifetimes in an ascending sort order, r is the number of individuals at risk of failing just before time t(i), is the Kaplan-Meier estimator [24] of the survival function, and is the step of at time t(. Therefore, the least squares approach of AFT model is to minimize the loss function L(β) for the Gaussian family: where the first column of X is all ones, and each censored y is replaced with the imputed value G(τ).

2.2 Path seeking algorithm for complex harmonic regularization penalty

Regularization is a way to avoid over-fitting in AFT model and the common form of regularization for a control parameter λ (λ > 0) is: where are the estimated coefficients, L(β) is a loss function and P(β) represents the regularization term. In fact, the survival data have different probability distributions of grouping effect and sparse. In theory, a strictly convex penalty function, such as ℓ (1 < q < 2), provides a sufficient condition for the grouping effect. On the contrary, ℓ (0 < p < 1) penalty can provide different sparse evaluation with different p value. The limitation of the existing regularization methods is that a fixed p (0 < p < 1) value ℓ-norm with ℓ2-norm is used to evaluate the grouping effect and spares in variable selection, thus they often have assumptions about the probability distribution of the data. Upon our previous work naïve harmonic regularization that can approximate penalties [6], we designed the CHR penalty that can approximate the combination of the and ℓ (1 ≤ q < 2) penalties [10]. The CHR penalty can be normally expressed as: where 0 < a, b < 1; λ1, λ2 ≥ 0; Furthermore, comparing with the fixed p and q, the CHR penalty can suggest a proper value for p and q in given datasets, and the CHR penalty can be plotted as Fig 2. When a is close to 0, m(β) ≈ |β| (ℓ1-norm, see Fig 2(c)). When a is close to 1, (ℓ1/2-norm, see Fig 2(b)). When b is close to 0, n(β) ≈ |β|2 (ℓ2-norm, see Fig 2(e)). When b is close to 1, n(β) = |β| (see Fig 2(f)), that is same with a closing to 0.
Fig 2

The complex harmonic regularization.

(a) the curves represent m(⋅) at different parameter a values; (b) the solid curve represents m(⋅) at the parameter a = 0.99, and the dashed curve is the ℓ1/2 regularization; (c) the solid curve represents m(⋅) at the parameter a = 0.01, and the dashed curve is the ℓ1 regularization; (d) the curves represent n(⋅) at different parameter b values; (e) the solid curve represents n(⋅) at the parameter b = 0.01, and the dashed curve is the ℓ2 regularization; (f) the solid curve represents n(⋅) at the parameter b = 0.99, and the dashed curve is the ℓ1 regularization.

The complex harmonic regularization.

(a) the curves represent m(⋅) at different parameter a values; (b) the solid curve represents m(⋅) at the parameter a = 0.99, and the dashed curve is the ℓ1/2 regularization; (c) the solid curve represents m(⋅) at the parameter a = 0.01, and the dashed curve is the ℓ1 regularization; (d) the curves represent n(⋅) at different parameter b values; (e) the solid curve represents n(⋅) at the parameter b = 0.01, and the dashed curve is the ℓ2 regularization; (f) the solid curve represents n(⋅) at the parameter b = 0.99, and the dashed curve is the ℓ1 regularization. Theorem 1. m(⋅) and n(⋅) approximate to the combination of and ℓ (1 ≤ q < 2) regularizations with adjustable p and q to evaluate the grouping effect and sparse of data, i.e., Proof. There are the inductions of the first two equations. The inductions of other two equations are similar to these and need not be explained here. Let in Eq (4), then the common form of CHR penalty can be re-expressed as: Therefore, we can use the path seeking algorithm [25] in linear model to sequentially construct a path directly in parameter space that closely approximates that for CHR penalty, without having repeatedly solve numerical optimization problem. Let ν measure length along the path and Δν > 0 be a small increment. Here, we need to note that the size of the step Δν can be obtained by Define where λ(ν) is the ratio of these two gradients φ(ν) for loss function Eq (2) and ϕ(ν) for the penalty function with respect to |β|. This path seeking scheme can accelerate solving the CHR penalty. The details of the implementation of CHR penalty are outlined in Algorithm 1. Algorithm 1 Implementation of CHR penalty 1: Initialize: 2: repeat 3:  Compute 4: 5:  if S = empty then 6:   j* = arg max |λ(ν)| 7:  else 8:   j* = arg max |λ(ν)| 9:  end if 10: 11: 12:  ν ← ν + Δν 13: untill λ(ν) = 0 After initializing the path, the vector λ(ν) is computed via Eqs (7)–(9) at each step. Then, those non zero coefficients which have a sign opposite to that of their corresponding λ(ν) are identified. When the set S is empty, the coefficient corresponding to the largest component of λ(ν), in absolute value is selected at line 6. And when there are one or more elements in the set S, the coefficient with corresponding largest |λ(ν)| within this subset is instead selected. The selected coefficient is then incriminated by a small amount in the direction of the sign of its correspond λ(ν) with all other coefficient remaining unchanged, producing the solution for the next path point ν + Δν. Iterations continue until all components of λ(ν) are zero. Although the complex harmonic penalized AFT model can adapt for different data distributions, this model has three hyperparameters a, b, γ which are sensitive to the resolution. The more suitable way thereby is optimized by the evolutionary algorithms to make these regularized hyperparameters more precise and efficient.

3 Complex harmonic regularization in a memetic framework

3.1 A wrapper-embedded memetic framework

Memetic framework [22] models memetic algorithms (MAs) as a process involving feature selection and learning procedure. The term of MAs, which combine evolutionary algorithms (EAs) with local search (LS) [26], have recently received much attention from the feature selection problems. These methods are inspired by Darwin’s principles of natural evolution and Dawkins defined memes, which unlike genes, can adapt themselves [27]. In most memetic-based feature selection approaches, an EA is used for wapper feature selection and a LS algorithm is used for filter feature selection. Zhu et al [28] applied genetic algorithm for wrapper feature selection and used Markov blanket approach as a LS for filter feature selection. Noman and Iba [29] incorporated a crossover-based LS with adaptive length in DE resulted into a DE-variant, where the length of the LS algorithm can be adjusted adaptively using a hill climbing heuristic. However, such memetic-based approaches have the potential limitation that filter evaluation measures may eliminate potentially useful features regardless of their performance in the wrapper approaches. In addition, the wrapper approaches usually involve a large number of assessments, and each assessment usually takes a considerable amount of time, especially when the numbers of features and instances are large. The second limitation of the existing memetic-based feature selection methods is that they are primarily concerned with the relatively small numbers of features and instances. Focusing on these limitations above, regularization method can adapting relationships between data by designing different penalty functions with original, grouping effect or net effect. What’s more, regularization methods evaluate features and build model at one stage. Therefore, we embed CHR penalty into a DE-variant for improving the selection ability under the global optimization of the non-convex regularization.

3.2 Implementation of complex harmonic regularization with differential evolution (CHR-DE) algorithm

Our proposed wrapper-embedded feature selection approach (CHR-DE) in memetic framework includes population-initialized, differential mutation, crossover, adaptive local search and selection operations. The first step of the CHR-DE approach is that the DE population is randomly initialized with each chromosome encoding the penalized hyperparameters (intron) and the coefficients of each gene in the AFT model (exon). Subsequently, the CHR approach (local search) is performed on the exon part under the fixed intron part, to reach a local optimal solution or to improve the fitness of individuals in the search population. DE operations are performed on the intron parts of the chromosomes, and the selection operator generates the next population. This process repeats itself till the stopping conditions are satisfied. The details of this approach are outlined in Algorithm 2. Algorithm 2 The CHR-DE algorithm in memetic framework Input: Bounds of solution space h, l; Population size N; Individual size N; Fitness function f(⋅); //Embedded with CHR penalty Crossover rate cr; Scaling factor F; Output: Regression coefficient β*. 1: Generate initial population //Begin DE procedure 2: pop ← rand(N, N) × (h − l) + l 3: for i = 1: N do 4:  Calculate f(pop(i)) 5: end for 6: repeat 7:  Select pop, pop pop randomly in pop 8:  //Differential mutation 9:  for i = 1: N do 10:   child(i) ← pop + F × (pop + pop) 11:   //Crossover 12:   j = ⌊rand × N⌋ 13:   for j = 1: N do 14:    if rand < cr OR j == j then 15:     offspring(i)(j) ← child(i)(j) 16:    else 17:     offspring(i)(j) ← pop(i)(j) 18:    end if 19:   end for 20:   //Selection 21:   if f(offspring) ≥ f(pop) then 22:    pop ← offspring 23:   end if 24:  end for 25:  //Adaptive local search 26:  tmpPop ← mean(pop) + w(pop − mean(pop)) 27:  for i = 1: N do 28:   for j = 1: N − 1 do 29: 30:   end for 31:   C(1) ← 0 32:   for j = 2: N do 33:    C(j) ← r(j − 1)(tmpPop(i − 1) − tmpPop(i) + C(j − 1)) 34:   end for 35:   offspring ← tmpPop(N) + C(N) 36:   if offspring ∈ (h, l) AND f(offspring) ≥ f(pop(i)) then 37:    pop(i) ← offspring 38:   end if 39:  end for 40: untill stopping criterion is met

3.2.1 Chromosome representation: Intron and exon

The first step of the CHR-DE approach is that the population of N individuals initializing randomly with each chromosome which adopts the “intron + exon” encoding [13] to construct the penalized hyperparameters (intron) and the coefficients of each gene in the AFT model (exon), i.e., c = (a, b, γ, β1, β2, ⋯, β). In CHR scheme, there are three parameters in intron part which should cover this range by uniformly randomizing individuals with minimum and maximum bounds l, h in the search space. DE searches for a global optimum in intron part which is N dimensional real parameter space : where rand is a uniformly distributed random number lying between 0 and 1. Meanwhile, the CHR is performed on exon part for each introns in individuals, i.e., β to reach a local optimal solution and to gain the fitness of each individuals.

3.2.2 Fitness definition

The mean squared error (MSE) and the concordance index (CI) are two criteria used to design a fitness function. In statistics, the MSE measures the average of the squares of the errors, which is evaluated by Eq (11) for survival data. where the predicted value . In survival analysis, the CI is the standard performance measure for model assessment and quantifies the quality of rankings by Eq (12). We employ the weighted-sum method [30] to change this bi-objective problem into a single objective problem. Thus, the individual with low MSE and high CI produces a high fitness value by Eq (13). where w is the weight of MSE for the individual i in the population, w is the CI for this individual. These weight factors can be adjusted according to what people value as an important weight, e.g., if MSE is more important than CI, we set the weight factors w = 95%, w = 5%. Furthermore, the results with different values of w and w can be found in the S1 Appendix.

3.2.3 Differential mutation operation

After initialization, DE uses a differential mutation operator based on linear combination. The indices r, s, t are mutually exclusive integers randomly generated within the range [1, N]. These indices are randomly generated once for each mutant vector child. The scaling factor F ∈ [0, 1+[ is a positive value which cannot be much greater than 1 for scaling the difference vector [31].

3.2.4 Crossover operation

To enhance the potential diversity of the population, a crossover operation applied to each pair of the target vector pop and its corresponding mutant vector child to generate a trial vector offspring. We employ the binomial (uniform) crossover to create a single trial vector. This crossover is defined for each jth component of the ith parameter vector as follows: where j ∈ [1, 2, ⋯, N] is a randomly chosen index, which ensures that offspring gets at least one component from child.

3.2.5 Adaptive local search

Usually in EAs the solutions with better fitness values are generally for reproduction, thus we use adaptive simplex crossover local search strategy for exploring the neighborhood of the best individual of population. Firstly, we expand the population with simplex crossover: where w is the control parameter of this local search. Then, generating the offspring upon the expansion population in Eqs (17) and (18).

3.2.6 Selection operation

The solutions with better fitness values are generally preferred for reproduction, as they are more likely to be in the proximity of a basin of attraction. Therefore, we deterministically select the best individual of the population for exploring its neighborhood using the selection operation that is described as where f(⋅) is the fitness function in Eq (13) to be maximized. Therefore, if the new trial vector yields an equal or higher value of the fitness function, it replaces the corresponding target vector in the next generation; otherwise the target is retained in the population. Hence, the population either gets better or remains the same in fitness status, but never deteriorates.

4 Results and discussion

4.1 Synthetic datasets

To demonstrate the performance of our proposed regularization procedure, we assume that the graph modules with 200 key factors (KFs) and that each regulates 10 different genes for a total of 2200 variables. Among these models and genes, 4 KFs and their 10 regulated genes (44 variables in total) are associated with the response based on the following model: where the independent random noise ε ∼ N(0, 1), and the non-zero coefficients are specified as For each KF, the X value is simulated from a N(0, 1) distribution, and conditional on the value of KF, we simulate the expression levels of the genes that they regulated from a conditional normal distributions ϱ of 0.2, 0.5, 0.7, and 0.9, respectively. For example, if the x1 is KF of x, i = 2, 3, ⋯, 10, then we can define this group is x = ϱ × x1 + (1 − ϱ) × x. Therefore, we have a total of 2200 variables and 44 of them are relevant. All of penalties in our experiments are solved by the general path seeking method [25]. The original DE for feature subset selection was conducted by Khushaba et al. [32]. For each model, we use two-thirds of simulated data for training and remaining one-third for testing with 600 samples. A 10-fold cross validation (CV) is conducted on training set for tuning parameters of all approaches. In our experimentation, the scaling factor F = 0.9, cross rate cr = 0.9, and the weight factors w = 95%, w = 5%, w = 1 respectively. Because the population size should be small [29], we set N = 4, and the stoping criterion of 10,000. In addition, we also calculate both sensitivity and specificity for each procedure, where To further evaluate the performance of each penalties, we employ the prediction mean-squared errors (MSE) and the concordance index (CI) with standard errors. After repeating the each penalties 50 times, the averaged results are summarized in Table 1. Generally, our proposed CHR-DE approach gives lower MSE with higher CI than other approaches. The CHR-DE also results in much higher sensitivity with comparable specificity for identifying the relevant features. The Lasso and ℓ1/2 without ℓ2-norm have strong selectivity especially in high grouping effect data ϱ = 0.7, 0.9. With the correlation ϱ increasing among genes, these no grouping effect penalties select a few genes, e.g., the sensitivity of ℓ1/2 is from 0.790 down to 0.091 (only selecting these 4 non-zero coefficient KFs) with highest specificity 0.998. The wrapper methods DE and CMA-ES have weaker selectivity than other grouping effect penalties, e.g., Elastic net, ℓ1/2 + ℓ2 and CHR, especially in the data containing low correlation features ϱ = 0.2. Although other grouping effect penalties have lower specificity, they perform well and select more correct genes whose coefficients β is non-zero, no matter what the conditional normal distributions ϱ. Comparing with the CHR’s hyperparameters tuning by grid search (CHR-GS), the CHR-DE utilizes the evolutionary algorithm to skip redundant parameter settings or to add new ones and ultimately achieves better performance.
Table 1

Results of the synthetic data, sensitivity, specificity, mean-squared-error (MSE), concordance index (CI) are based on 50 simulations.

Standard errors are given in parentheses.

ϱPenaltySensitivitySpecificityMSECI
0.2Lasso0.863 (0.152)0.996 (0.013)21.911 (3.268)0.841 (0.018)
1/20.790 (0.082)0.998 (0.001)11.318 (2.131)0.849 (0.016)
DE0.809 (0.066)0.990 (0.012)20.451 (1.875)0.879 (0.017)
CMA-ES0.746 (0.063)0.986 (0.015)20.786 (2.666)0.861 (0.016)
Elastic net0.840 (0.164)0.936 (0.014)8.649 (1.918)0.883 (0.026)
1/2 + 20.922 (0.136)0.953 (0.016)6.777 (1.754)0.901 (0.016)
CHR-GS0.977 (0.066)0.956 (0.015)6.746 (1.713)0.912 (0.014)
CHR-DE0.988 (0.081)0.962 (0.012)6.461 (1.520)0.914 (0.012)
0.5Lasso0.795 (0.117)0.996 (0.013)21.615 (3.380)0.880 (0.023)
1/20.272 (0.052)0.998 (0.001)11.475 (2.462)0.929 (0.034)
DE0.871 (0.071)0.992 (0.013)18.518 (2.924)0.946 (0.029)
CMA-ES0.735 (0.067)0.986 (0.016)18.614 (2.132)0.949 (0.025)
Elastic net0.818 (0.183)0.928 (0.015)9.605 (2.764)0.961 (0.035)
1/2 + 20.886 (0.167)0.931 (0.014)9.391 (3.479)0.966 (0.028)
CHR-GS0.928 (0.035)0.947 (0.017)9.375 (2.466)0.969 (0.023)
CHR-DE0.931 (0.054)0.949 (0.013)8.031 (2.357)0.972 (0.027)
0.7Lasso0.681 (0.023)0.997 (0.014)29.822 (2.945)0.882 (0.022)
1/20.091 (0.003)0.998 (0.001)22.850 (2.397)0.945 (0.028)
DE0.720 (0.039)0.991 (0.012)15.727 (2.628)0.963 (0.030)
CMA-ES0.680 (0.028)0.987 (0.016)16.017 (1.922)0.966 (0.025)
Elastic net0.863 (0.192)0.853 (0.015)12.873 (2.521)0.977 (0.031)
1/2 + 20.841 (0.133)0.882 (0.010)13.351 (2.757)0.965 (0.028)
CHR-GS0.923 (0.032)0.903 (0.011)12.560 (1.711)0.978 (0.024)
CHR-DE0.946 (0.061)0.924 (0.008)12.452 (1.188)0.978 (0.023)
0.9Lasso0.409 (0.005)0.995 (0.013)34.439 (2.113)0.878 (0.027)
1/20.091 (0.003)0.998 (0.001)29.565 (1.798)0.935 (0.025)
DE0.697 (0.046)0.990 (0.012)15.151 (2.757)0.967 (0.028)
CMA-ES0.435 (0.011)0.986 (0.018)15.513 (2.077)0.965 (0.025)
Elastic net0.727 (0.153)0.824 (0.016)23.764 (2.463)0.941 (0.034)
1/2 + 20.795 (0.126)0.831 (0.012)15.478 (2.826)0.967 (0.030)
CHR-GS0.864 (0.082)0.844 (0.009)14.113 (1.523)0.976 (0.026)
CHR-DE0.909 (0.063)0.873 (0.006)13.351 (1.182)0.977 (0.024)

Results of the synthetic data, sensitivity, specificity, mean-squared-error (MSE), concordance index (CI) are based on 50 simulations.

Standard errors are given in parentheses.

4.2 Real datasets

We demonstrate the proposed methods by analyzing microarray expression data from NCBI’s gene expression omnibus (GEO) with the accession number, including breast cancer (GSE22210) [33], hepatocellular carcinoma (HCC, GSE10141) [34] and colorectal cancer (CRC, GSE103479). To evaluate our CHR-DE method, we divide these datasets at random two-thirds samples become training set and the remainders are test set. The details about these above datasets are shown in Table 2. Besides, the Figs 3–5 show the pathways of some selected genes by CHR-DE method in three different cancers rendered with cBioPortal [35]. The query genes are outlined with a thick border, and all other genes are automatically identified as altered in one cancer. Darker red indicates increased frequency of alteration (defined by mutation, copy number amplification, or homozygous deletion) in one cancer. The drugs that target genes are display with hexagons, and orange indicates FDA-approved.
Table 2

The real datasets.

Dataset# genes# samples (training / test)
GSE222101,452167 (117 / 50)
GSE101416,14480 (56 / 24)
GSE103479110,961155 (109 / 46)
Fig 3

The network views of IL1B, NFKB1, IGF1R, LAT and RASA1 in the breast cancer rendered with cBioPortal [35].

The selected genes by CHR-DE are outlined with a thick border, and all other genes are automatically identified as altered in one cancer. Darker red indicates increased frequency of alteration (defined by mutation, copy number amplification, or homozygous deletion) in one cancer. The drugs that target genes are display with hexagons, and orange indicates FDA-approved.

Fig 5

The network view of CDC42, SLC10A2, TNRC6B and MOV10 in the colorectal cancer rendered with cBioPortal [35].

The selected genes by CHR-DE are outlined with a thick border, and all other genes are automatically identified as altered in colorectal cancer. Darker red indicates increased frequency of alteration (defined by mutation, copy number amplification, or homozygous deletion) in one cancer.

The network views of IL1B, NFKB1, IGF1R, LAT and RASA1 in the breast cancer rendered with cBioPortal [35].

The selected genes by CHR-DE are outlined with a thick border, and all other genes are automatically identified as altered in one cancer. Darker red indicates increased frequency of alteration (defined by mutation, copy number amplification, or homozygous deletion) in one cancer. The drugs that target genes are display with hexagons, and orange indicates FDA-approved.

The network view of ADRB3 and MAPK3 in the hepatocellular carcinoma rendered with cBioPortal [35].

The selected genes by CHR-DE are outlined with a thick border, and all other genes are automatically identified as altered in one cancer. Darker red indicates increased frequency of alteration (defined by mutation, copy number amplification, or homozygous deletion) in one cancer. The drugs that target genes are display with hexagons, and orange indicates FDA-approved.

The network view of CDC42, SLC10A2, TNRC6B and MOV10 in the colorectal cancer rendered with cBioPortal [35].

The selected genes by CHR-DE are outlined with a thick border, and all other genes are automatically identified as altered in colorectal cancer. Darker red indicates increased frequency of alteration (defined by mutation, copy number amplification, or homozygous deletion) in one cancer.

4.2.1 Breast cancer

GSE22210 contains 167 breast tumor samples with 1,452 genes obtained using GEO Platform GPL9183 [33]. Table 3 shows that the CHR-DE performs best in predicting the patients’ survival time with selecting smaller number of genes than the Elastic net and CHR-GS.
Table 3

The results with standard errors in parentheses for GSE22210.

Penalty# selected genesMSECI
Lasso4621.023 (2.680)0.776 (0.017)
1/22325.271 (2.432)0.783 (0.019)
DE4421.421 (2.381)0.735 (0.024)
CMA-ES38.528.835 (2.619)0.695 (0.016)
Elastic net15933.331 (2.125)0.809 (0.025)
1/2 + 2104.33331.975 (1.992)0.805 (0.022)
CHR-GS122.66718.790 (1.987)0.813 (0.014)
CHR-DE117.66717.371 (1.871)0.815 (0.015)
As see from the Table 4, CHR-DE penalty selects some unique genes, such as HIC1 LIF which play an important role in the development of primary breast cancer [36, 37]. The XIST is selected by these 8 different methods and lack an X chromosome decorated by XIST RNA causes the basal-like subtype of invasive breast carcinoma [38]. Moreover, some relevant genes are selected by other regularization models such as IL1B, NFKB1, IGF1R and SERPINB2 which are also found by the CHR-DE. Especially, the IL1B, NFKB1 and IGF1R in a small group of network by CHR-DE method as shown in Fig 3, and they are also targeted by several cancer drugs. The IL1B leads to enhanced production of proinflammatory cytokines triggered by the treatment, with subsequent effects on persistent fatigue in the aftermath of breast cancer [39]. Wood et al [40] identified NFKB1 mutation in breast tumorigenesis. As one of related receptors in insulin-like growth factor (IGF) system, type I IGF receptor (IGF1R) can influence the activity of estrogen receptor-α (ER) that can be used in promoting breast tumor regression [41]. The the plasminogen activator inhibitor type 2 (PAI2, SERPINB2), is significantly associated with increased survival in patients with breast cancer [42, 43].
Table 4

The top 10 selected genes in the GSE22210.

Lasso1/2DECMA-ESElastic net1/2 + 2CHR-GSCHR-DE
1XISTIL1BIGSF4CASB4SERPINB2XISTXISTSERPINB2
2LATXISTAFF3KIAA1804XISTIL1BIL1BIMPACT
3IL1BHLA-DQA2BMP4CASP10IMPACTLATLATXIST
4DNASE1L1TGFAIGF2ASCDKN2AIL1BESR2NFKB1HIC1
5NFKB1CDKN1AXISTTERTLATKCNK4TGFAIGF1R
6HDAC9GNMTCD9BCAP31CCND1IGF1RCDKN1ALAT
7BCL2L2LATCDC25BGLI2NFKB1CD1ARASA1LIF
8ESR2BCL2L2MMP1XISTTGFAPTPRFHDAC9IL1B
9AFPHDAC9NFKB2ABCG2HLA-DQA2HLA-DQA2LAMC1NFKB1
10LAMC1CD44HFECCKBRRASGRF1TGFARASGRF1RASA1

4.2.2 Hepatocellular carcinoma

GSE10141 contains 6,144 genes for 80 hepatocellular carcinoma (HCC) patients. Table 5 also shows that the CHR-DE performed best in predicting the patients’ survival time with selecting smaller number of genes than the Elastic net and CHR-GS.
Table 5

The results with standard errors in parentheses for GSE10141.

Penalty# selected genesMSECI
Lasso2931.228 (3.165)0.764 (0.030)
1/21032.756 (2.203)0.772 (0.031)
DE4431.975 (2.701)0.756 (0.029)
CMA-ES34.7532.037 (2.982)0.736 (0.027)
Elastic net6028.721 (3.672)0.753 (0.022)
1/2 + 23630.333 (2.406)0.732 (0.026)
CHR-GS41.66727.460 (2.181)0.771 (0.023)
CHR-DE4127.161 (2.026)0.781 (0.018)
As see from the Table 6, CHR-DE penalty selects some unique genes, such as KRT14, NOLC1. Liver cytokeratin14 (KRT14), a marker of liver stem cells, is only positive in G0 phase of hepatocellular carcinoma cell line Huh7 [44]. NOLC1 is regulated by CREB-NOLC1 pathway that plays an important role in hepatocellular carcinoma progression by modulating tumor growth, angiogenesis and apoptosis [45, 46]. Furthermore, the ADRB3, MAPK3, MGAT1, TGFBI and DAD1 are selected by CHR-DE penalty and other methods such as Lasso, ℓ1/2, DE, CMA-ES and CHR-GS meanwhile. Especially, the ADRB3 and MAPK3 in a small group of network by CHR-DE method as shown in Fig 4, and they are also targeted by several cancer drugs. Zhao et al [47] identified two pathways, “calcium signaling pathway” and “neuroactive ligand-receptor interaction” containing ADRB3, which correlated with middle and late stages of HCC development. Okabe et al [48] suggested that activation of the MAPK pathway containing MAPK3, MAPK9 is a common feature of HCC. Guo et al [49] reported alterations of glycogene and N-glycan such as MGAT1 in human hepatocarcinoma cells correlate with tumor invasion, tumorigenicity and sensitivity to chemotherapeutic drug. As a tumor suppressor, arginylglycylaspartic acid (RGD) peptides released from βig-H3, also known as transforming growth factor-beta-induced protein (TGFBI) peptides mediate apoptosis of Hep3B hepatoma cells [50]. While, βig-H3 can promote the progression of hepatocellular carcinoma as well [51, 52]. Tanaka et al [53] has demonstrated that high expression of DAD1 in HCC cells can activate oligosaccharyltransferase (OST) and block apoptosis, thereby enhancing tumor cell survival.
Table 6

The top 10 selected genes in the GSE10141.

Lasso1/2DECMA-ESElastic net1/2 + 2CHR-GSCHR-DE
1PSG6CYP24A1KLRC3SLC29A2PSG6PSG6CYP24A1CYP24A1
2CYP24A1ADRB3IFI6HMGB2CYP2A7CYP24A1ADRB3KRT14
3ADRB3OLFM4IL32TTC35CYP24A1CYP2A7ATP6AP2ADRB3
4PPP2CAEFNA5NCBP2BTG3LBX1GPR3MGAT1ATP6AP2
5MGAT1MGAT1ITGA5ICAM2SYT5MPLSPTBN1MGAT1
6CCR9AADACLSRNFKBIBSLC10A2VIPAUHTGFBI
7DAD1SULT1E1SPTBN2MAPK3KRT81PRKCQIGFBP3NOLC1
8ATP6AP2TGFBIASPATAP1MPLSSTR3SULT1E1DAD1
9CAPZA1LSRMAPK9OSTF1EPYCSYT5GRM5MAPK3
10OLFM4HIST1H2BHRSC1A1EIF2B1HTR6KRT81ACTBGM2A
Fig 4

The network view of ADRB3 and MAPK3 in the hepatocellular carcinoma rendered with cBioPortal [35].

The selected genes by CHR-DE are outlined with a thick border, and all other genes are automatically identified as altered in one cancer. Darker red indicates increased frequency of alteration (defined by mutation, copy number amplification, or homozygous deletion) in one cancer. The drugs that target genes are display with hexagons, and orange indicates FDA-approved.

4.2.3 Colorectal cancer

GSE103479 contains 110,961 genes for 155 colorectal cancer (CRC) patients. Table 7 also shows that the CHR-DE performed best in predicting the patients’ survival time with selecting smaller number of genes than the Elastic net and CHR-GS.
Table 7

The results with standard errors in parentheses for GSE103479.

Penalty# selected genesMSECI
Lasso3963.909 (3.588)0.691 (0.033)
1/21862.245 (2.624)0.725 (0.049)
DE4462.374 (3.115)0.707 (0.034)
CMA-ES39.563.975 (4.415)0.682 (0.031)
Elastic net6661.201 (4.290)0.713 (0.036)
1/2 + 24359.832 (3.278)0.727 (0.028)
CHR-GS56.33356.202 (3.107)0.735 (0.036)
CHR-DE51.33353.999 (3.043)0.748 (0.035)
As see from the Table 8, the CDC42 is selected by CHR-DE penalty and other methods. It is one of the best characterized members of the Rho GTPase family, which was found to be up-regulated in several types of human tumors including CRC. Targeting CDC42 would potentially decrease CRC metastasis formation [54, 55, 56]. Furthermore, there are four selected genes CDC42, SLC10A2, TNRC6B and MOV10 in a small group of network by CHR-DE method as shown in Fig 5. This ileal sodium dependent bile acid transporter (ISBT; gene code: SLC10A2) has been associated with the risk for development of sporadic colorectal adenoma, a precursor lesion for CRC [57]. ATN1 may be promising biomarkers for the distinction between serrated and conventional CRC [58]. These two above genes SLC10A2 and ATN1 are selected by CHR-DE penalty and Lasso. The RPS11 is selected by these 6 different penalties at the same time. Kasai et al [59] demonstrated that RPS11 is highly expressed in CRC (especially in immature mucosal cells located in the crypt base) but can be detected hardly in the normal colorectal mucosa.
Table 8

The top 10 selected genes in the GSE103479.

Lasso1/2DECMA-ESElastic net1/2 + 2CHR-GSCHR-DE
1RPS11RPS11TMTC1CLDND2RPS11RPS11RPS11RPS11
2TNRC6BIKGALTABCD3FCGR3AIKLINC01315LINC01315
3LINC01315RP11-50B3.4CDC42KLRK1FAM24ARPL27AFCGR3ATNRC6B
4CDC42RHOARMND5BCDC42IKCDC42IKSLC10A2
5SLC10A2PIAS1SPACA1MYH4RPL27ATNRC6BRPL27ACDC42
6ATN1SERPINB12LILRB1ITGA7XKRXSERPINC1FAM24AATN1
7SERPINB12DLSTOR8B2AVENNNMTGABPASERPINB12MOV10
8LCE1BCDC42OPTCTBC1D32DEFB108BSLC10A2CDC42BPIFA3
9RNF215RPL27AGLTSCR2OR5P3SERPINB12XKRXXKRXSERPINB12
10WDR73ZDHHC20SYTL1A4GNTTREM1ZDHHC20TNRC6BGABPA

5 Conclusion

In this paper, we have proposed a penalized accelerated failure time model CHR-DE to recognize the biomarkers that are both biologically meaningful and clinically. This model is designed based on wrapper-embedded memetic framework that combines a non-convex regularization (local search) with differential evolution (global search). First, this new method inherits the robust power of regularization methods that integrate feature selection and learning procedure into a single process. Furthermore, our proposed method utilizes differential evolution (DE) to globally optimize the CHR’s hyperparameters, which make CHR-DE achieve strong capability of selecting groups of genes in high-dimensional biological data. We also developed an efficient path seeking algorithm to optimize this penalized model. The results in both synthetic and real datasets have indicated that the CHR-DE method is highly competitive against some existing feature selection approaches to select biomarkers in groups. Additionally, this CHR-DE scheme can be easily implemented in other high-dimensional and low-sample datasets.

The results with different values of MSE and CI weights.

We display the results with different weightings in synthetic datasets and breast cancer data (GSE22210). (PDF) Click here for additional data file.
  32 in total

1.  Novel Regularization Method for Biomarker Selection and Cancer Classification.

Authors:  Xiao-Ying Liu; Sai Wang; Hai Zhang; Hui Zhang; Zi-Yi Yang; Yong Liang
Journal:  IEEE/ACM Trans Comput Biol Bioinform       Date:  2019-02-04       Impact factor: 3.710

2.  Methylation of the HIC-1 candidate tumor suppressor gene in human breast cancer.

Authors:  H Fujii; M A Biel; W Zhou; S A Weitzman; S B Baylin; E Gabrielson
Journal:  Oncogene       Date:  1998-04-23       Impact factor: 9.867

3.  Differential expression of ribosomal proteins in human normal and neoplastic colorectum.

Authors:  Hide Kasai; Daita Nadano; Eiko Hidaka; Kayoko Higuchi; Masatomo Kawakubo; Taka-Aki Sato; Jun Nakayama
Journal:  J Histochem Cytochem       Date:  2003-05       Impact factor: 2.479

4.  BetaIg-h3 is involved in the HAb18G/CD147-mediated metastasis process in human hepatoma cells.

Authors:  Juan Tang; Hong-wei Zhou; Jian-li Jiang; Xiang-min Yang; Yu Li; Hong-xin Zhang; Zhi-nan Chen; Wei-ping Guo
Journal:  Exp Biol Med (Maywood)       Date:  2007-03

5.  Enhanced expression of mRNAs of antisecretory factor-1, gp96, DAD1 and CDC34 in human hepatocellular carcinomas.

Authors:  K Tanaka; N Kondoh; M Shuda; O Matsubara; N Imazeki; A Ryo; T Wakatsuki; A Hada; N Goseki; T Igari; K Hatsuse; T Aihara; S Horiuchi; N Yamamoto; M Yamamoto
Journal:  Biochim Biophys Acta       Date:  2001-04-30

6.  Molecular subtypes of breast cancer are associated with characteristic DNA methylation patterns.

Authors:  Karolina Holm; Cecilia Hegardt; Johan Staaf; Johan Vallon-Christersson; Göran Jönsson; Håkan Olsson; Ake Borg; Markus Ringnér
Journal:  Breast Cancer Res       Date:  2010-06-18       Impact factor: 6.466

7.  Cytokine gene polymorphisms and fatigue in breast cancer survivors: early findings.

Authors:  Alicia Collado-Hidalgo; Julienne E Bower; Patricia A Ganz; Michael R Irwin; Steve W Cole
Journal:  Brain Behav Immun       Date:  2008-07-09       Impact factor: 7.217

8.  Glycogenes mediate the invasive properties and chemosensitivity of human hepatocarcinoma cells.

Authors:  Rui Guo; Lei Cheng; Yongfu Zhao; Jianing Zhang; Chunqing Liu; Huimin Zhou; Li Jia
Journal:  Int J Biochem Cell Biol       Date:  2012-10-24       Impact factor: 5.085

9.  Novel harmonic regularization approach for variable selection in Cox's proportional hazards model.

Authors:  Ge-Jin Chu; Yong Liang; Jia-Xuan Wang
Journal:  Comput Math Methods Med       Date:  2014-11-24       Impact factor: 2.238

10.  Protein-protein interaction analysis of distinct molecular pathways in two subtypes of colorectal carcinoma.

Authors:  Hanzhang Chen; Yunzhen Fang; Hailong Zhu; Shuai Li; Tao Wang; Pan Gu; Xia Fang; Yunjin Wu; Jun Liang; Yu Zeng; Long Zhang; Weizhe Qiu; Lanjing Zhang; Xianghua Yi
Journal:  Mol Med Rep       Date:  2014-09-22       Impact factor: 2.952

View more

北京卡尤迪生物科技股份有限公司 © 2022-2023.