
Gene-based multiclass cancer diagnosis with class-selective rejections.

Nisrine Jrad, Edith Grall-Maës, Pierre Beauseroy.

Abstract

Supervised learning of microarray data has received much attention in recent years. Multiclass cancer diagnosis based on selected gene profiles is used as an adjunct to clinical diagnosis. However, supervised diagnosis may hinder patient care, add expense, or confound a result. To avoid these risks, a multiclass cancer diagnosis with class-selective rejection is proposed. It rejects some patients from one, some, or all classes in order to ensure higher reliability while reducing time and expense costs. Moreover, this classifier takes into account asymmetric penalties that depend on each class and on each wrong or partially correct decision. It is based on the ν-1-SVM coupled with its regularization path and minimizes a general loss function defined in the class-selective rejection scheme. State-of-the-art multiclass algorithms can be considered as a particular case of the proposed algorithm in which the decisions are given by the classes and the loss function is defined by the Bayesian risk. Two experiments are carried out, in the Bayesian and the class-selective rejection frameworks. Five gene-selected datasets are used to assess the performance of the proposed method. Results are discussed, and accuracies are compared with those computed by the Naive Bayes, Nearest Neighbor, Linear Perceptron, Multilayer Perceptron, and Support Vector Machine classifiers.


Year:  2009        PMID: 19584932      PMCID: PMC2703706          DOI: 10.1155/2009/608701

Source DB:  PubMed          Journal:  J Biomed Biotechnol        ISSN: 1110-7243


1. Introduction

Cancer diagnosis based on gene expression profiling has improved over the past 40 years. Many microarray technology studies have been developed to analyze gene expression, and the selected genes are later used to categorize cancer classes. Two different classification approaches can be used: class discovery and class prediction. The first is an unsupervised learning approach that separates samples into clusters based on similarities in gene expression, without prior knowledge of sample identity. The second is a supervised approach that predicts the category of an already defined sample using its gene expression profile. Since these classification problems are described by a large number of genes and a small number of samples, it is crucial to perform gene selection before the classification step. One way to identify informative genes, pointed out in [1], is to use test statistics. Research shows that the performance of supervised decisions based on selected gene expression can be comparable to that of clinical decisions. However, no classification strategy is absolutely accurate. First, many factors may effectively decrease the predictive power of a multiclass problem. For example, the findings of [2] imply that information useful for multiclass tumor classification is encoded in a complex gene expression pattern and cannot be given by a single gene. Second, it is not possible to find an optimal classification method for all kinds of multiclass problems. Thus, supervised diagnosis is always considered an important adjunct to traditional diagnostics and never a substitute for it. Unfortunately, supervised diagnosis can be misleading. It may hinder patient care (wrong decision on a sick patient), add expense (wrong decision on a healthy patient), or confound the results of cancer categories. To overcome these limitations, a multi-SVM [3] classifier with class-selective rejection [4-7] is proposed.
Class-selective rejection consists of rejecting some patients from one, some, or all classes in order to ensure higher reliability while reducing time and expense costs. Moreover, none of the existing multiclass algorithms [8-10] takes into consideration asymmetric penalties on wrong decisions. For example, in a binary cancer problem, a wrong decision on a sick patient must cost more than a wrong decision on a healthy patient. The proposed classifier handles this kind of problem. It minimizes a general loss function that takes into account asymmetric penalties dependent on each class and on each wrong or partially correct decision. The proposed method divides the multiclass problem into several unary classification problems and trains one ν-1-SVM [11-13], coupled with its regularization path [14, 15], for each class. The winning class or subset of classes is determined using a prediction function that takes the cost asymmetry into consideration. The parameters of all the ν-1-SVMs are optimized jointly in order to minimize a loss function. By taking advantage of the regularization path method, the entire parameter search space is considered. Since the search space is widely extended, the selected decision rule is more likely to be the optimal one. State-of-the-art multiclass algorithms [8-10] can be considered as a particular case of the proposed algorithm in which the decisions are given by the existing classes and the loss function is defined by the Bayesian risk. Two experiments are reported in order to assess the performance of the proposed approach. The first considers the proposed algorithm in the Bayesian framework and uses the selected microarray genes to make results comparable with existing ones. Performances are compared with those of the Naive Bayes, Nearest Neighbor, Linear Perceptron, Multilayer Perceptron, and Support Vector Machine classifiers invoked in [1].
The second shows the ability of the proposed algorithm to solve multiclass cancer diagnosis in the class-selective rejection scheme by minimizing an asymmetric loss function. Experimental results show that a cascade of classifiers with class-selective rejections can be considered an improved supervised diagnosis rule. This paper is outlined as follows. Section 2 presents a description of the model, starting with the gene selection task. It introduces the multiclass cancer diagnosis problem in the class-selective rejection scheme and proposes a supervised training algorithm based on the ν-1-SVM coupled with its regularization path. The two experiments are carried out in Section 3, where results are reported, compared, and discussed. Finally, a conclusion is presented in Section 4.

2. Models and Methods

This section describes multiclass cancer diagnosis based on microarray data. Feature selection is presented as the first step in gene-based cancer diagnosis, with test statistics as a possible way of identifying informative genes [1]. Once gene selection has been performed, a classification problem must be solved. The multiclass cancer diagnosis problem, formulated in the general framework of class-selective rejection, is introduced, and a solution based on the ν-1-SVM [11-13] is proposed. First, a brief description of the ν-1-SVM and the derivation of its regularization path [14, 15] is presented. Second, the proposed algorithm [3] is explained. It determines a multiclass cancer diagnosis rule that minimizes an asymmetric loss function in the class-selective rejection scheme.

2.1. Genes Selection Using Test Statistics

Gene profiles are successfully applied to supervised cancer diagnosis. Since cancer diagnosis problems are usually described by a small set of samples with a large number of genes, feature or gene selection is an important issue in analyzing multiclass microarray data. Given microarray data with N tumor classes, n tumor samples, and g genes per sample, one should identify a small subset of informative genes that contribute most to the prediction task. Various feature selection methods exist in the literature. One way, pointed out in [1], is to use test statistics for the equality of the class means. The authors of [1] first formulate the expression levels of a given gene by a one-way analysis of variance model. Second, the power of a gene in discriminating between tumor types is measured by a test statistic: the discrimination power is the value of the test evaluated at the expression levels of the gene. The higher the discrimination power, the more powerful the gene is in discriminating between tumor types. Thus, genes with higher discrimination power are considered informative genes. Let Yjp be the expression level of the gene in the pth sample of the jth class; the following general model is considered:

Yjp = μj + ϵjp,  j = 1,…, N;  p = 1,…, nj,

where μj represents the mean expression level of the gene in class wj, and the ϵjp are independent random variables with E(ϵjp) = 0 and V(ϵjp) = σj² < ∞. In the case of homogeneity of variances, the ANOVA F test [16] is the optimal test of the means equality hypothesis. With heterogeneity of variances, the task is more challenging. Moreover, it is known that, with a large number of genes present, usually in the thousands, no practical test is available to locate the best set of genes. Thus, the authors of [1] studied six different statistics. In what follows, Ȳj and sj² denote the sample mean and variance of class j, Ȳ denotes the overall sample mean, and ∑ indicates the sum taken over the index j.

ANOVA F test statistic, defined as

F = [∑ nj(Ȳj − Ȳ)²/(N − 1)] / [∑ (nj − 1)sj²/(n − N)].

Under the means equality hypothesis and assuming variance homogeneity, this statistic has an F distribution with (N − 1, n − N) degrees of freedom [16].

Brown-Forsythe test statistic [17], given by

B = ∑ nj(Ȳj − Ȳ)² / ∑ (1 − nj/n)sj².

Under the means equality hypothesis, B is distributed approximately as F with (N − 1, f) degrees of freedom, where 1/f = ∑ cj²/(nj − 1) and cj = (1 − nj/n)sj² / ∑ (1 − nj/n)sj².

Welch test statistic [18], defined as

W = [∑ ωj(Ȳj − Ỹ)²/(N − 1)] / [1 + (2(N − 2)/(N² − 1)) ∑ (1 − hj)²/(nj − 1)],

with ωj = nj/sj², hj = ωj/∑ωj, and Ỹ = ∑ hjȲj. Under the means equality hypothesis, W has an approximate F distribution with (N − 1, f) degrees of freedom, where 1/f = (3/(N² − 1)) ∑ (1 − hj)²/(nj − 1).

Adjusted Welch test statistic [19]. It is similar to the Welch statistic, with ωj replaced by ωj* = nj/(Φjsj²), where Φj is chosen such that 1 ≤ Φj ≤ (nj − 1)/(nj − 3), and hj* = ωj*/∑ωj*. Under the means equality hypothesis, W* has an approximate F distribution with degrees of freedom obtained as for W.

Cochran test statistic [20]. This test statistic is simply the quantity appearing in the numerator of the Welch test statistic W, that is,

C = ∑ ωj(Ȳj − Ỹ)².

Under the means equality hypothesis, C has an approximate χ² distribution with N − 1 degrees of freedom.

Kruskal-Wallis test statistic. This is the well-known nonparametric test given by

H = [12/(n(n + 1))] ∑ Rj²/nj − 3(n + 1),

where Rj is the rank sum for the jth class; the ranks assigned to the Yjp are those obtained from ranking the entire set of expression values. Assuming each nj ≥ 5, then under the means equality hypothesis, H has an approximate χ² distribution with N − 1 degrees of freedom [21].

The performances of these tests were evaluated and compared over different supervised learning methods applied to publicly available microarray datasets. Experimental results show that a model for gene expression values that does not assume equal variances is more appropriate than one that does. Moreover, under heterogeneity of variances, the Brown-Forsythe, Welch, adjusted Welch, and Cochran test statistics perform much better than the ANOVA F and Kruskal-Wallis test statistics.
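As an illustration of ranking genes by discrimination power, the sketch below implements two of the statistics above (ANOVA F and Welch W) in plain NumPy. This is our own minimal code, not the authors' implementation; the `rank_genes` top-k interface is an assumption for illustration.

```python
import numpy as np

def anova_f(groups):
    """ANOVA F statistic for equality of class means (homogeneous variances)."""
    N = len(groups)
    n = sum(len(g) for g in groups)
    grand = np.concatenate(groups).mean()
    between = sum(len(g) * (g.mean() - grand) ** 2 for g in groups) / (N - 1)
    within = sum((len(g) - 1) * g.var(ddof=1) for g in groups) / (n - N)
    return between / within

def welch_w(groups):
    """Welch test statistic, robust to variance heterogeneity."""
    N = len(groups)
    w = np.array([len(g) / g.var(ddof=1) for g in groups])
    h = w / w.sum()
    means = np.array([g.mean() for g in groups])
    ytilde = (h * means).sum()
    num = (w * (means - ytilde) ** 2).sum() / (N - 1)
    ns = np.array([len(g) for g in groups])
    den = 1 + 2 * (N - 2) / (N ** 2 - 1) * ((1 - h) ** 2 / (ns - 1)).sum()
    return num / den

def rank_genes(X, y, stat=anova_f, k=50):
    """Rank genes (columns of X) by discrimination power; return top-k indices."""
    scores = [stat([X[y == c, j] for c in np.unique(y)]) for j in range(X.shape[1])]
    return np.argsort(scores)[::-1][:k]
```

Genes with the largest statistic values are retained as the informative subset fed to the classifier.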

2.2. Multitumor Classes with Selective Rejection

Once gene selection has been performed, the classification problem must be solved. Let us define this diagnosis problem in the class-selective rejection scheme. Assume that the multiclass cancer problem deals with N tumor classes denoted w1,…, wN and that any patient or sample x belongs to one tumor class and is described by d informative genes. A decision rule consists of a partition Z of ℜ^d into I sets Zi corresponding to the different decision options. In the simple classification scheme, the options are defined by the N tumor classes. In the class-selective rejection scheme, the options are defined by the N tumor classes and the subsets of tumor classes (i.e., assigning patient x to the subset of tumor classes {w1, w3} means that x is assigned to cancer categories w1 and w3 with ambiguity). The problem consists of finding the decision rule Z* that minimizes a given loss function c(Z) defined by

c(Z) = ∑i ∑j cij Pj P(Di/wj),

where cij is the cost of assigning a patient x to the ith decision option when it belongs to the tumor class wj. Since the values of cij are relative (the aim being to minimize c(Z)), they can be defined in the interval [0, 1] without loss of generality. Pj is the a priori probability of tumor class wj, and P(Di/wj) is the probability that patients of the tumor class wj are assigned to the ith option.
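The loss c(Z) can be estimated directly from a confusion matrix of counts. The sketch below is a plain NumPy illustration under our own conventions (rows of `conf` indexed by true class, columns by decision option); it is not code from the paper.

```python
import numpy as np

def empirical_loss(cost, conf):
    """Estimate c(Z) = sum_i sum_j c_ij * P_j * P(D_i | w_j).

    cost: (I, N) array, cost[i, j] = cost of option i for a class-j patient
    conf: (N, I) array of counts, conf[j, i] = class-j patients given option i
    """
    cost = np.asarray(cost, float)
    conf = np.asarray(conf, float)
    n_j = conf.sum(axis=1)            # samples per class
    P_j = n_j / n_j.sum()             # empirical priors P_j
    P_D_w = conf / n_j[:, None]       # empirical P(D_i | w_j)
    return float((cost.T * P_D_w * P_j[:, None]).sum())
```

With 0/1 costs (zero on the diagonal, one elsewhere) this reduces to the ordinary error rate, which is the Bayesian special case used in the first experiment.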

2.3. ν-1-SVM

To solve the multiclass diagnosis problem, an approach based on the ν-1-SVM is proposed. Consider a set of m samples of a given tumor class, X = {x1, x2,…, xm}, drawn from an input space 𝒳. The ν-1-SVM computes a decision function f(·) and a real number b in order to determine the region ℛ in 𝒳 such that f(x) − b ≥ 0 if the sample x ∈ ℛ and f(x) − b < 0 otherwise. The decision function f(·) is parameterized by λ = νm (with 0 ≤ ν < 1) to control the number of outliers. It is designed by minimizing the volume of ℛ under the constraint that all the samples of X, except a fraction ν of outliers, must lie in ℛ. In order to determine ℛ, the space of possible functions f(·) is reduced to a Reproducing Kernel Hilbert Space (RKHS) ℋ with kernel function K(·, ·). Let Φ : 𝒳 → ℋ be the mapping defined over the input space 𝒳, and let 〈·, ·〉 be a dot product defined in ℋ. The kernel K(·, ·) over 𝒳 × 𝒳 is defined by K(x, x′) = 〈Φ(x), Φ(x′)〉. Without loss of generality, K(·, ·) is supposed normalized such that for any x ∈ 𝒳, K(x, x) = 1. Thus, all the mapped vectors Φ(xp), p = 1,…, m, lie on a hypersphere with radius one and center O. Provided K(·, ·) is always positive, Φ(𝒳) is a subset of the positive orthant of the hypersphere. A common choice of K(·, ·) is the Gaussian RBF kernel K(x, x′) = exp[−||x − x′||²/2σ²], with σ the parameter of the Gaussian RBF kernel. The ν-1-SVM consists of separating the mapped samples in ℋ from the center O with a hyperplane 𝒲. Finding the hyperplane 𝒲 is equivalent to finding the decision function f(·) such that f(x) − b = 〈w, Φ(x)〉 − b ≥ 0 for the (1 − ν)m retained mapped samples, while 𝒲 is the hyperplane with maximum margin b/||w||, w being the normal vector of 𝒲. This yields f(·) as the solution of the following convex quadratic optimization problem:

min over w, b, ξ of (1/2)||w||² − λb + ∑p ξp, subject to 〈w, Φ(xp)〉 ≥ b − ξp and ξp ≥ 0, p = 1,…, m,

where the ξp are slack variables. This optimization problem is solved by introducing Lagrange multipliers αp.
As a consequence of the Kuhn-Tucker conditions, w is given by w = ∑p αp Φ(xp), which results in f(x) = ∑p αp K(xp, x). The dual formulation of (13), obtained by introducing the Lagrange multipliers, is

min over α of (1/2) ∑p ∑q αp αq K(xp, xq), subject to 0 ≤ αp ≤ 1 and ∑p αp = λ.

A geometrical interpretation of the solution in the RKHS is given by Figure 1. f(·) and b define a hyperplane 𝒲 orthogonal to w. The hyperplane 𝒲 separates the Φ(xp)s from the sphere center while having b/||w|| maximum, which is equivalent to minimizing the portion 𝒮 of the hypersphere bounded by 𝒲 that contains the set {Φ(x) s.t. x ∈ ℛ}. Tuning ν, or equivalently λ, is a crucial point since it controls the margin error. Obviously, changing λ requires solving the optimization problem formulated in (16) again in order to find the new region ℛ. To obtain great computational savings and extend the search space of λ, we propose to use the ν-1-SVM regularization path [14, 15]. The regularization path was first introduced by Hastie et al. [14] for the binary SVM. Later, Rakotomamonjy and Davy [15] developed the entire regularization path for the ν-1-SVM. The basic idea of the ν-1-SVM regularization path is that the parameter vector of a ν-1-SVM is a piecewise-linear function of λ. Thus, the principle of the method is to start with a large λ (i.e., λ = m − ϵ) and decrease it towards zero, keeping track of the breaks that occur as λ varies. As λ decreases, ||w|| increases, and hence the distance between the sphere center and 𝒲 decreases. Samples move from being outside the portion 𝒮 (non-margin SVs with αp = 1 in Figure 1) to inside it (non-SVs with αp = 0). By continuity, samples must linger on the hyperplane 𝒲 (margin SVs with 0 < αp < 1) while their αp decrease from 1 to 0. The αp are piecewise-linear in λ; break points occur when a point moves from one position to another. Since the αp are piecewise-linear in λ, f(·) and b are also piecewise-linear in λ. Thus, after initializing the regularization path (computing α by solving (16) for λ = m − ϵ), almost all the αp values are computed by solving linear systems. Only for a few integer values of λ smaller than m are the αp computed by solving (16), according to [15]. Using simple linear interpolation, this algorithm enables the ν-1-SVM corresponding to any value of λ to be determined very rapidly.
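The paper solves (16) through the regularization path of [15]. As a self-contained illustration of the dual problem itself (minimize (1/2)αᵀKα subject to 0 ≤ αp ≤ 1 and ∑p αp = λ), the sketch below solves it for a single fixed λ by projected gradient descent. This generic solver, including its projection routine, is our own construction for illustration and is not the path algorithm.

```python
import numpy as np

def project(v, lam):
    # Euclidean projection onto {a : 0 <= a_p <= 1, sum(a) = lam},
    # found by bisection on the shift t in clip(v - t, 0, 1).
    lo, hi = v.min() - 1.0, v.max()
    for _ in range(100):
        t = 0.5 * (lo + hi)
        if np.clip(v - t, 0.0, 1.0).sum() > lam:
            lo = t
        else:
            hi = t
    return np.clip(v - 0.5 * (lo + hi), 0.0, 1.0)

def nu1svm_dual(K, lam, n_iter=2000):
    # Minimize (1/2) a^T K a subject to 0 <= a_p <= 1, sum(a) = lam.
    m = K.shape[0]
    step = 1.0 / np.linalg.eigvalsh(K).max()   # safe gradient step 1/L
    a = project(np.full(m, lam / m), lam)
    for _ in range(n_iter):
        a = project(a - step * (K @ a), lam)
    return a   # decision function: f(x) = sum_p a_p K(x_p, x)
```

Non-margin support vectors come out with αp ≈ 1 and non-SVs with αp ≈ 0, matching the geometric picture of Figure 1.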

2.4. Multiclass SVM Based on ν-1-SVM

Given N classes and N trained ν-1-SVMs, one should design a supervised decision rule Z, moving from unary to multiclass classification by assigning samples to a decision option. To determine the decision rule, a prediction function should first decide the winning option. A distance measure between x and the training set of class wj, using the ν-1-SVM parameterized by λj, is defined as follows:

dλj(x) = 〈wj, Φ(x)〉 / (||wj|| cos(θj)),

where θj is the angle delimited by wj and the support vectors, as shown in Figure 1. cos(θj) is a normalizing factor which makes all the dλj(x) comparable.
Figure 1

Training data mapped into the feature space on a portion 𝒮 of a hypersphere.

Using ||Φ(x)|| = 1 in (17) leads to the following kernel expression:

dλj(x) = ∑p αp K(xp, x) / (cos(θj) [∑p ∑q αp αq K(xp, xq)]^{1/2}).

Since the αp are obtained by the regularization path for any value of λj, computing dλj(x) is an easy and fast task. The distance measure dλj(x) is inspired from [22]. When data are distributed in a unimodal form, dλj(x) is a decreasing function of the distance between a sample x and the data mean. The probability density function is also a decreasing function of the distance from the mean. Thus, dλj(x) preserves the distribution's order relations. In such a case, and under optimality of the ν-1-SVM classifier, the use of dλj(x) should reach the same performance as that obtained using the distribution itself. In the simplest case of multiclass problems, where the loss function is defined as the error probability, a patient x is assigned to the tumor class maximizing dλj(x). To extend the multiclass prediction process to the class-selective scheme, a weighted form of the distance measure is proposed. A weight βj is associated with dλj; it reflects an adjusted value of the distance dλj according to the penalty associated with the tumor class wj. Thus, introducing weights leads to treating each tumor class differently and helps in solving problems with different costs cij on the classification decisions. Finally, in the general case, where the loss function is considered in the class-selective rejection scheme, the prediction process assigns a blinded sample x to the decision option selected by a rule based on the weighted distances βj dλj(x). Thus, in contrast to previous multiclass SVMs, which construct the maximum margin between classes and locate the decision hyperplane in the middle of the margin, the proposed approach is closer to a robust Bayesian classifier: the distribution of each tumor class is taken into account, and the optimal decision is slightly deviated toward the class with the smaller variance. The proposed decision rule thus depends on the vectors of σj, νj, and βj for j = 1,…, N.
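In the Bayesian special case, the rule reduces to arg max over j of βj dλj(x). One simple way to add a rejection option on top of it is to withhold the decision when the two best weighted scores are nearly tied; the margin-based rejection below is our own illustrative simplification, not the paper's loss-minimizing rule over class subsets.

```python
import numpy as np

def predict_weighted(d, beta, reject_margin=None):
    """d: (n, N) scores d_j(x); beta: (N,) per-class cost weights.

    Returns the winning class index per sample, or -1 (reject) when the
    top two weighted scores differ by less than reject_margin."""
    s = np.asarray(d, float) * np.asarray(beta, float)
    pred = s.argmax(axis=1)
    if reject_margin is not None:
        top2 = np.sort(s, axis=1)[:, -2:]
        pred = np.where(top2[:, 1] - top2[:, 0] < reject_margin, -1, pred)
    return pred
```

Raising a class's weight βj biases decisions toward that class, which is how asymmetric costs can tilt the rule toward the more expensive classes.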
Tuning νj is the most time-expensive task, since changing νj requires solving the optimization problem formulated in (16). Moreover, tuning νj is a crucial point, since it controls the margin error. In fact, it was shown in [11] that this regularization parameter is an upper bound on the fraction of outliers and a lower bound on the fraction of SVs. In [9, 23], a grid search was supplied in order to choose the optimal values of the νj, and the N values νj were chosen equal to reduce computational costs. However, this assumption also reduces the parameter search space. To avoid this restriction, the proposed approach optimizes all the νj, j = 1,…, N, corresponding to the N ν-1-SVMs using the regularization path, and consequently explores the entire parameter space. Thus, the tuned νj are more likely to be the optimal ones. The kernel parameters are set equal, σ1 = σ2 = ⋯ = σN = σ. The optimal vector of σ, λj, and βj, j = 1,…, N, is the one that minimizes an estimator of c(Z) on a validation set. Since the problem is described by a sample set, the following estimator of c(Z), given by (11), is used:

ĉ(Z) = ∑i ∑j cij P̂j P̂(Di/wj),

where P̂j and P̂(Di/wj) are the empirical estimators of Pj and P(Di/wj), respectively. The optimal rule is obtained by tuning λj, βj, and σ so that the estimated loss computed on a validation set is minimum. This is accomplished by employing a global search for λj and βj and an iterative search over the kernel parameter. For each given value σ of the kernel parameter, the ν-1-SVMs are trained using the regularization path method on a training set. Then the minimization of ĉ(Z) over a validation set is sought by solving an alternating optimization problem over λj and βj, which is easy since all ν-1-SVM solutions are easily interpolated from the regularization path. σ is chosen from a previously defined set of real numbers [σ0,…, σs] with s ∈ ℕ. Algorithm 1 summarizes the proposed approach.
Algorithm 1

Multiclass SVM minimizing an asymmetric loss function.

3. Experimental Results

In this section, two experiments are reported in order to assess the performance of the proposed approach. First, the cancer diagnosis problem is considered in the traditional Bayesian framework. Five gene expression datasets and five supervised algorithms are considered. Each gene dataset was selected using the six test statistics of [1]. The decisions are given by the possible set of tumor classes, and the loss function is defined as the probability of error, to make results comparable with those of [1]. Second, in order to show the advantages of considering multiclass cancer diagnosis in the class-selective rejection scheme, one gene dataset is considered and studied with an asymmetric loss function. A cascade of classifiers with rejection options is used to ensure a reliable diagnosis. For both experiments, the loss function was minimized by determining the optimal parameters βj and λj for j = 1,…, N for a given kernel parameter, and by testing different values of σ in the set {2^−3, 2^−2, 2^−1, 2^0, 2^1, 2^2}. Finally, the decision rule that minimizes the loss function is selected.

3.1. Bayesian Framework

Five multiclass gene expression datasets, leukemia72 [24], ovarian [25], NCI [26, 27], lung cancer [28], and lymphoma [29], were considered. Table 1 describes the five gene datasets. For each dataset, the six test statistics F, B, W, W*, C, and H were used to select informative genes.
Table 1

Multiclass gene expression datasets.

Dataset         Leukemia72   Ovarian   NCI    Lung cancer   Lymphoma
No. of genes    6817         7129      9703   918           4026
No. of samples  72           39        60     73            96
No. of classes  3            3         9      7             9
The cancer diagnosis problem was considered in the traditional Bayesian framework. Decisions were given by the set of possible classes, and the loss function was defined by the error risk. This means that in (20) the costs cij are defined according to Table 2. The performance of the proposed method was measured by evaluating its accuracy rate, and it was compared with the results obtained by the five predictors evoked in [1]: Naive Bayes, Nearest Neighbor, Linear Perceptron, Multilayer Perceptron neural network with five nodes in the middle layer, and Support Vector Machine with a second-order polynomial kernel.
Table 2

Loss function cost matrix in the Bayesian framework.

                          Patient class
                     1    2    …    N
              1      0    1    …    1
Prediction    2      1    0    …    1
              …      …    …    …    …
              N      1    1    …    0
To compute the generalization accuracy of the proposed classifier, the Leave One Out (LOO) resampling method is used to divide a gene dataset of n patients into two sets: a learning set of n − 1 patients and a test set of 1 blinded patient. This method involves n separate runs. For each run, the set of n − 1 samples is divided using 5-fold cross-validation (5-CV) into a training set and a validation set. N ν-1-SVMs are trained using the training set for all values of ν. The decision rule is obtained by tuning the parameters βj, λj, and σ for j = 1,…, N so that the loss function computed on the validation set is minimum. The optimal parameters are then used to build the decision rule using all n − 1 samples, and the blinded test sample is classified according to this rule. The overall prediction error is the total number of patients misclassified over all n runs. Table 3 reports the errors of the proposed algorithm together with the average and median values of the 5 classifiers' prediction errors reported in [1] when 50 informative genes are used. Table 4 reports the corresponding values when 100 informative genes are used. F, B, W, W*, C, and H denote the six test statistics.
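The resampling protocol above reduces to index bookkeeping. A minimal generator of the nested splits, under our own naming, is sketched below.

```python
import numpy as np

def loo_with_inner_cv(n, k_inner=5, seed=0):
    """Yield (train, val, test) index arrays: leave-one-out outer loop,
    k-fold split of the remaining n - 1 samples for parameter tuning."""
    rng = np.random.default_rng(seed)
    for test in range(n):
        rest = np.array([i for i in range(n) if i != test])
        rng.shuffle(rest)
        folds = np.array_split(rest, k_inner)
        for f in range(k_inner):
            val = folds[f]
            train = np.concatenate([folds[g] for g in range(k_inner) if g != f])
            yield train, val, np.array([test])
```

Each outer run tunes (βj, λj, σ) on the inner folds, refits on all n − 1 samples, and scores the single held-out patient.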
Table 3

Prediction errors of the proposed classifier, with the mean and median values of the 5 classifiers' prediction errors according to [1], using 50 selected informative genes.

F B W W* C H
Leukemia Proposed algorithm 4 3 5 5 3 2
Mean 3.4 2.4 2.8 2.8 3.2 3.0
Median 3 2 3 3 3 3
Ovarian Proposed algorithm 0 0 0 0 0 0
Mean 0.2 0.0 0.0 0.0 0.0 0.0
Median 0 0 0 0 0 0
NCI Proposed algorithm 31 26 27 27 27 33
Mean 36.0 32.0 27.4 26.0 27.0 35.4
Median 35 29 27 27 27 35
Lung cancer Proposed algorithm 14 16 16 16 16 15
Mean 17.6 17.0 17.6 17.6 18.0 18.0
Median 17 17 18 18 18 18
Lymphoma Proposed algorithm 18 16 9 10 9 15
Mean 23.8 19.8 14.0 14.0 12.8 22.0
Median 23 19 12 12 13 20
Table 4

Prediction errors of the proposed classifier, with the mean and median values of the 5 classifiers' prediction errors according to [1], using 100 selected informative genes.

F B W W* C H
Leukemia Proposed algorithm 5 2 3 3 4 6
Mean 3.4 3.0 3.0 3.0 3.2 3.0
Median 3 3 4 3 3 3
Ovarian Proposed algorithm 0 0 0 0 0 0
Mean 0.2 0.0 0.0 0.0 0.0 0.0
Median 0 0 0 0 0 0
NCI Proposed algorithm 33 21 26 25 26 36
Mean 33.0 22.6 23.8 25.2 25.2 31.6
Median 33 22 25 26 26 31
Lung cancer Proposed algorithm 11 10 11 11 11 13
Mean 12.2 12.2 11.4 12.2 12.2 15.8
Median 12 12 11 11 11 14
Lymphoma Proposed algorithm 16 16 11 10 11 17
Mean 21.8 19.2 13.0 13.8 14.4 18.2
Median 17 16 12 12 12 18
Experimental results show that, for the ovarian, NCI, lung cancer, and lymphoma multiclass gene problems, the proposed approach achieves competitive performance compared with the 5 classifiers reported in [1]. For these datasets, the prediction errors of the proposed approach are lower than the mean and median values of the 5 classifiers' prediction errors reported in [1]. However, for leukemia72, the performance of the proposed algorithm is roughly in the same range as that of the 5 classifiers reported in [1]: its prediction error is equal to, or in the worst case slightly higher than, the mean and median errors. Moreover, focusing on the test statistics comparison, the experimental results confirm those of [1]: B, W, and W* are the best-performing tests under the variance heterogeneity assumption.

3.2. Class-Selective Rejection Framework

In the following, we present the study of the lung cancer problem in the class-selective rejection scheme. The lung cancer diagnosis problem is determined by the gene expression profiles of 67 lung tumors and 6 normal lung specimens from patients whose clinical course was followed for up to 5 years. The tumors comprised 41 adenocarcinomas (ACs), 16 squamous cell carcinomas (SCCs), 5 large cell lung cancers (LCLCs), and 5 small cell lung cancers (SCLCs). The ACs are subdivided into three subgroups: 21 AC group 1 tumors, 7 AC group 2 tumors, and 13 AC group 3 tumors. Thus, the multiclass cancer diagnosis consists of 7 classes. The authors of [28] observed that AC group 3 tumors share strong gene expression with LCLC and SCC tumors; thus, poorly differentiated AC is difficult to distinguish from LCLC or SCC. The confusion matrices (Tables 5 and 6), computed in the Bayesian framework with the 50 W*-selected and 50 H-selected genes, support these claims: 8 of the 16 misclassified 50 W* patients and 8 of the 15 misclassified 50 H patients correspond to confusions among these three subcategories. Therefore, one may define a new decision option as the subset of these three classes in order to reduce the error.
Table 5

Confusion matrix of the 50 W* lung cancer dataset. The total number of misclassified patients is 16.

Patient class
Normal SCLC LCLC SCC AC2 AC3 AC1

Predicted decision Normal 6 0 0 0 0 0 0
SCLC 0 4 0 0 0 1 0
LCLC 0 0 3 0 0 4 1
SCC 0 0 0 16 0 3 0
AC2 0 0 0 0 4 0 0
AC3 0 1 1 0 1 4 0
AC1 0 0 1 0 2 1 20
Table 6

Confusion matrix of the 50 H lung cancer dataset. The total number of misclassified patients is 15.

Patient class
Normal SCLC LCLC SCC AC2 AC3 AC1

Predicted decision Normal 5 0 0 0 0 0 0
SCLC 0 4 0 0 0 0 0
LCLC 0 0 1 1 0 2 2
SCC 0 0 2 14 0 1 0
AC2 0 0 0 0 7 0 0
AC3 0 0 2 1 0 8 0
AC1 1 1 0 0 0 2 19
Moreover, the same studies affirm that the distinction between patients with non-small cell lung tumors (SCC, AC, and LCLC) and those with small cell tumors (SCLC) is extremely important, since they are treated very differently. Thus, a confusion or wrong decision among patients with non-small cell lung tumors should cost less than a confusion between non-small and small cell lung tumors. Furthermore, one may provide an extra decision option that includes all the subcategories of tumors to avoid this kind of confusion. Finally, another natural decision option is the set of all classes, which means that the classifier totally withholds a decision. Given all this information, the loss function can be defined empirically according to the asymmetric cost matrix given in Table 7. Solving the 50 W* lung cancer problem in this scheme leads to the confusion matrix presented in Table 8. Compared with Table 5, one may note that the number of misclassified patients decreases from 16 to 10, with 8 withheld decisions or rejected patients. This partial rejection helps to avoid confusion between non-small and small cell lung tumors and reduces errors due to the indistinctness among LCLC, SCC, and AC3. Besides, in the example under study, no patient is totally rejected. This is an expected result, since initially (Table 5) there was no confusion between normal and tumor samples.
Table 7

Asymmetric cost matrix of the loss function.

Patient class
Normal SCLC LCLC SCC AC2 AC3 AC1

Normal 0 1 1 1 1 1 1
SCLC 1 0 1 1 1 1 1
LCLC 1 1 0 0.9 0.9 1 1
SCC 1 1 0.9 0 0.9 1 0.9
AC2 1 1 0.9 0.9 0 0.9 0.9
Predicted decision AC3 1 1 0.9 0.9 0.9 0 0.9
AC1 1 1 0.9 0.9 0.9 0.9 0
{LCLC, SCC, AC3} 1 1 0.6 0.6 0.9 0.2 0.9
All tumors 1 0.2 0.6 0.6 0.2 0.2 0.5
All classes 0.6 0.2 0.6 0.6 0.2 0.6 0.6
Table 8

Confusion matrix of the 50 W* lung cancer problem with class-selective rejection, using the cost matrix defined in Table 7. The total number of misclassified patients is 10; the total number of partially and totally rejected samples is 8.

Patient class
Normal SCLC LCLC SCC AC2 AC3 AC1
Normal 6 0 0 0 0 0 0

SCLC 0 3 0 0 0 0 0
LCLC 0 0 3 0 0 4 0
SCC 0 0 0 16 0 2 0
Predicted decision AC2 0 0 0 0 4 0 0
AC3 0 0 0 0 1 3 0
AC1 0 0 1 0 1 1 20
{LCLC, SCC, AC3} 0 0 1 0 0 2 0
All tumors 0 2 0 0 1 1 1
All classes 0 0 0 0 0 0 0
To take a decision concerning the rejected patients, one may refer to clinical analysis. It is worth noting that, for partially rejected patients, clinical analyses will be less expensive in terms of time and money than those on completely blinded patients. Moreover, a supervised solution can also be proposed: genes selected with another test statistic can be used to assign the rejected patients to one of the possible classes. According to Tables 3 and 4, prediction errors computed on the same patients using genes selected by different test statistics may decrease, since the errors of two different test statistics do not occur on the same patients. Thus, we chose the 50 H lung cancer dataset to reclassify the 8 rejected patients of Table 8. Five of them were correctly classified, while three remained misclassified. Results are reported in Table 9. The number of misclassified patients decreases to 13, which is lower than all the prediction errors obtained with 50 informative genes (lung cancer prediction errors in Table 3). Many factors play an important role in the cascade classifier system, such as the asymmetric cost matrix, which has been chosen empirically, the choice of test statistics, and the number of classifiers in the cascade. Such concerns are under study.
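The cascade described here amounts to routing rejected samples to a second-stage classifier trained on genes selected with a different test statistic. A schematic combination step, with our own encoding of rejection codes, is:

```python
import numpy as np

def cascade_predict(pred_stage1, pred_stage2, reject_codes):
    """Combine two classifier outputs: keep stage-1 decisions, but replace
    any rejected (partial or total) decision with the stage-2 prediction."""
    pred_stage1 = np.asarray(pred_stage1)
    rejected = np.isin(pred_stage1, list(reject_codes))
    return np.where(rejected, np.asarray(pred_stage2), pred_stage1)
```

Only the rejected samples reach stage 2, so the second classifier's errors affect a small subset rather than the whole test set.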
Table 9

Confusion matrix of the cascade classifier (50 W* with rejection followed by the 50 H classifier). The total number of misclassified patients is 13.

Predicted decision (rows) vs. patient class (columns):

                   Normal  SCLC  LCLC  SCC  AC2  AC3  AC1
Normal                6      0     0    0    0    0    0
SCLC                  0      4     0    0    0    0    0
LCLC                  0      0     3    0    0    4    1
SCC                   0      0     0   16    0    2    0
AC2                   0      0     0    0    5    0    0
AC3                   0      1     1    0    1    6    0
AC1                   0      0     1    0    1    1   20

4. Conclusion

Cancer diagnosis using genes involves a gene selection task and a supervised classification procedure. This paper tackles the classification step. It considers the problem of gene-based multiclass cancer diagnosis in the general framework of class-selective rejection, gives a general formulation of the problem, and proposes a solution based on ν-1-SVM coupled with its regularization path. The proposed classifier minimizes an arbitrary asymmetric loss function. Experimental results show that, in the particular case where the decisions are given by the possible classes and the loss function is set equal to the error rate, the proposed algorithm is competitive with state-of-the-art multiclass algorithms. In the class-selective rejection framework, the proposed classifier ensures higher reliability and reduces time and expense costs by introducing partial and total rejection. Furthermore, the results show that a cascade of classifiers with class-selective rejections is a good way to improve supervised diagnosis. To obtain the most reliable diagnosis, the cost matrix defining the loss function should be carefully chosen. Finding the optimal loss function subject to performance constraints is a promising approach [30] that is currently under investigation.
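The class-selective rejection rule underlying the paper can be illustrated with a minimal expected-cost sketch: decisions are nonempty subsets of classes, each decision d incurs a cost c(d, y) when the true class is y, and the rule picks the subset minimizing the posterior expected cost. The cost structure below (`c_err`, `c_reject`) is an illustrative assumption, not the asymmetric cost matrix used in the experiments.

```python
from itertools import chain, combinations

def all_subsets(n_classes):
    """All nonempty subsets of {0, ..., n_classes - 1}: the possible decisions."""
    classes = range(n_classes)
    return list(chain.from_iterable(
        combinations(classes, k) for k in range(1, n_classes + 1)))

def decide(posterior, c_err=1.0, c_reject=0.3):
    """Pick the subset of classes minimizing the expected cost.
    Illustrative costs: full error cost if the true class is excluded,
    otherwise a partial-rejection cost growing with the subset size."""
    def risk(d):
        return sum(p * (c_err if y not in d else c_reject * (len(d) - 1))
                   for y, p in enumerate(posterior))
    return min(all_subsets(len(posterior)), key=risk)
```

With these costs, a confident posterior yields a firm single-class decision, while a posterior split between two classes yields a partial rejection to that pair of classes, exactly the behavior of the {LCLC, SCC, AC3} row of Table 8.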
References (11 in total)

1.  Estimating the support of a high-dimensional distribution.

Authors:  B Schölkopf; J C Platt; J Shawe-Taylor; A J Smola; R C Williamson
Journal:  Neural Comput       Date:  2001-07       Impact factor: 2.026

2.  Systematic variation in gene expression patterns in human cancer cell lines.

Authors:  D T Ross; U Scherf; M B Eisen; C M Perou; C Rees; P Spellman; V Iyer; S S Jeffrey; M Van de Rijn; M Waltham; A Pergamenschikov; J C Lee; D Lashkari; D Shalon; T G Myers; J N Weinstein; D Botstein; P O Brown
Journal:  Nat Genet       Date:  2000-03       Impact factor: 38.330

3.  A gene expression database for the molecular pharmacology of cancer.

Authors:  U Scherf; D T Ross; M Waltham; L H Smith; J K Lee; L Tanabe; K W Kohn; W C Reinhold; T G Myers; D T Andrews; D A Scudiero; M B Eisen; E A Sausville; Y Pommier; D Botstein; P O Brown; J N Weinstein
Journal:  Nat Genet       Date:  2000-03       Impact factor: 38.330

4.  A comparison of methods for multiclass support vector machines.

Authors:  Chih-Wei Hsu; Chih-Jen Lin
Journal:  IEEE Trans Neural Netw       Date:  2002

5.  Optimal decision rule with class-selective rejection and performance constraints.

Authors:  Edith Grall-Maës; Pierre Beauseroy
Journal:  IEEE Trans Pattern Anal Mach Intell       Date:  2009-11       Impact factor: 6.226

6.  Analysis of gene expression profiles in normal and neoplastic ovarian tissue samples identifies candidate molecular markers of epithelial ovarian cancer.

Authors:  J B Welsh; P P Zarrinkar; L M Sapinoso; S G Kern; C A Behling; B J Monk; D J Lockhart; R A Burger; G M Hampton
Journal:  Proc Natl Acad Sci U S A       Date:  2001-01-30       Impact factor: 11.205

7.  Multiclass cancer diagnosis using tumor gene expression signatures.

Authors:  S Ramaswamy; P Tamayo; R Rifkin; S Mukherjee; C H Yeang; M Angelo; C Ladd; M Reich; E Latulippe; J P Mesirov; T Poggio; W Gerald; M Loda; E S Lander; T R Golub
Journal:  Proc Natl Acad Sci U S A       Date:  2001-12-11       Impact factor: 11.205

8.  Distinct types of diffuse large B-cell lymphoma identified by gene expression profiling.

Authors:  A A Alizadeh; M B Eisen; R E Davis; C Ma; I S Lossos; A Rosenwald; J C Boldrick; H Sabet; T Tran; X Yu; J I Powell; L Yang; G E Marti; T Moore; J Hudson; L Lu; D B Lewis; R Tibshirani; G Sherlock; W C Chan; T C Greiner; D D Weisenburger; J O Armitage; R Warnke; R Levy; W Wilson; M R Grever; J C Byrd; D Botstein; P O Brown; L M Staudt
Journal:  Nature       Date:  2000-02-03       Impact factor: 49.962

9.  Diversity of gene expression in adenocarcinoma of the lung.

Authors:  M E Garber; O G Troyanskaya; K Schluens; S Petersen; Z Thaesler; M Pacyna-Gengelbach; M van de Rijn; G D Rosen; C M Perou; R I Whyte; R B Altman; P O Brown; D Botstein; I Petersen
Journal:  Proc Natl Acad Sci U S A       Date:  2001-11-13       Impact factor: 11.205

10.  Selecting genes by test statistics.

Authors:  Dechang Chen; Zhenqiu Liu; Xiaobin Ma; Dong Hua
Journal:  J Biomed Biotechnol       Date:  2005-06-30
Cited by (1 in total)

1.  Family-based genetic risk prediction of multifactorial disease.

Authors:  Douglas M Ruderfer; Joshua Korn; Shaun M Purcell
Journal:  Genome Med       Date:  2010-01-15       Impact factor: 11.117

