Literature DB >> 27512621

Weighted K-means support vector machine for cancer prediction.

Abstract

To date, the support vector machine (SVM) has been widely applied to diverse bio-medical fields to address disease subtype identification and pathogenicity of genetic variants. In this paper, I propose the weighted K-means support vector machine (wKM-SVM) and weighted support vector machine (wSVM), for which I allow the SVM to impose weights to the loss term. Besides, I demonstrate the numerical relations between the objective function of the SVM and weights. Motivated by general ensemble techniques, which are known to improve accuracy, I directly adopt the boosting algorithm to the newly proposed weighted KM-SVM (and wSVM). For predictive performance, a range of simulation studies demonstrate that the weighted KM-SVM (and wSVM) with boosting outperforms the standard KM-SVM (and SVM) including but not limited to many popular classification rules. I applied the proposed methods to simulated data and two large-scale real applications in the TCGA pan-cancer methylation data of breast and kidney cancer. In conclusion, the weighted KM-SVM (and wSVM) increases accuracy of the classification model, and will facilitate disease diagnosis and clinical treatment decisions to benefit patients. A software package (wSVM) is publicly available at the R-project webpage (https://www.r-project.org).

Entities: Chemical Disease Gene Species

Keywords: K-means clustering; Support vector machine; TCGA; Weighted SVM

Year: 2016 PMID： 27512621 PMCID： PMC4960100 DOI： 10.1186/s40064-016-2677-4

Source DB: PubMed Journal: Springerplus ISSN： 2193-1801

Introduction

Cutting-edge microarray and sequencing techniques for transcriptome and DNA methylome have received increasing attentions to decipher biological processes and to predict the multi-causes of complex diseases [e.g., cancer diagnosis (Ramaswamy et al. 2001), prognosis (Vijver et al. 2002), and therapeutic outcomes (Ma et al. 2004)]. To this end, the supervised machine learning has considerably contributed to developing tools towards the translational and clinical application. For example, diverse biomarker panels on the basis of transcriptional expressions have been released [e.g. MammaPrint (van ’t Veer 2002), Oncotype DX (Paik et al. 2004), Breast Cancer Index BCI (Zhang et al. 2013) and PAM50 (Parker et al. 2009)] for survival, recurrence, drug response and disease subtypes. It is evident that effective prediction tasks advance clinical diagnosis tools that build on translating models from transcriptomic studies. In this standpoint, rapid and precise classification rules are imperative to support exploring disease-related biomarkers, diagnosis and sub-types identification, and to deliver meaningful information for tailored treatment and precision medicine. The support vector machine (SVM) was originally introduced by Cortes and Vapnik (1995). Over the decades, the SVM has been applied to a range of study fields, including pattern recognition (Kikuchia and Abeb 2005), disease subtype identification (Gould et al. 2014), pathogenicity of genetic variants (Kircher et al. 2014) and so on. In theory, the forte of the SVM is attributed to its flexibility and outstanding classification accuracy. However, the SVM relies on the quadratic programming (QP), whose computational complexity is commonly costly and subject to size of data. Some methods to circumvent this drawback (Wang and Wu 2005; Lee et al. 2007) were proposed to speed up its computation with minimizing loss of accuracy. Interestingly, Wang and Wu (2005) applied the SVM to centers of K-means clustering alone (KM-SVM). Due to small cluster size K, this method dramatically diminishes the number of observations, and hence can reduces the high-computational cost. The KM-SVM assumes that cluster centers adequately account for original data. This KM-SVM is also called the Global KM-SVM (Lee et al. 2007) in short. Similarly, Lee et al. (2007) also proposed so-called the By-class KM-SVM, where class labels separate samples into two groups at the outset, to which I apply K-means clustering respectively, while the Global KM-SVM, in contrast, employs a majority voting to determine class labels of respective centers. Not surprisingly, it is commonplace that the KM-SVM performs worsen than the standard SVM in most cases. In other words, the KM-SVM pursues computational efficiency at the expense of prediction accuracy. Yang et al. (2007) and Bang and Jhun (2014) proposed the weighted support vector machine and the weighted KM-SVM to improve accuracy in the context of the outlier sensitivity problem (i.e., WSVM-outlier). The primary idea is to assign weights to each data sample, which manipulates relative importance. It is proved that WSVM-outlier reduces the effect of outliers, and yields higher classification rates. Yet I notice that the WSVM-outlier solely adopts outlier-sensitive algorithms (e.g., a robust fuzzy clustering, kernel-based possibilistic c-means), that are only well-suited to adjusting outlier effects, but not always guarantees to perform best in general cases. It is, therefore, interesting to add other weight schemes applicable to general scenarios. Boosting is a machine learning ensemble algorithm, making it possible to reduce bias and variance, and to boost predictive power. More specifically, most boosting algorithms (Schapire 1990; Breiman 1998; Freund and Schapire 1997) iteratively glean weak classifiers, and incorporate them to a strong classifier. At each iteration, weak classifiers gain weights in some reasonable ways, and thereby subsequent weak learners focus more on samples that preceding weak learners mis-classified. Over the decades, many have introduced diverse boosting algorithms: Schapire (1990) originally proposed (a recursive majority gate formulation), and Mason et al. (2000) developed boost by majority. Interestingly, Freund and Schapire (1997) then developed AdaBoost.M1, an adaptive algorithm known to be superior to the previous ones. Taking all things into consideration, I proposed a new algorithm, the weighted KM-SVM (wKM-SVM) and weighted support vector machine (wSVM) to improve the KM-SVM (and SVM) via weights, together with the boosting algorithm. In this paper, I utilize AdaBoost.M1 (Freund and Schapire 1997) in place of the outlier-sensitive algorithms used in WSVM-outlier (Yang et al. 2007). The wKM-SVM (wSVM) adds weights to the hinge loss term, making it straightforward to derive the quadratic programming (QP) objective function, while the WSVM-outlier, to the contrary, directly maneuvers the penalization constant corresponding to each sample. Yang et al. (2007) hardly enables to grasp how each weight is implemented in optimization, whereas my proposed wKM-SVM (wSVM) can demonstrate the numerical relationship between the objective function and weights. The weighted KM-SVM (wKM-SVM) is universally applicable to many different data analysis scenarios, for which comprehensive experiments assess accuracy and provide comparisons with other methods. In this paper, I applied the proposed method to pan-cancer methylation data (https://tcga-data.nci.nih.gov/tcga/) including breast cancer (breast invasive carcinoma) and kidney cancer (kidney renal clear cell carcinoma). From simulations and real applications, the proposed wKM-SVM (wSVM) is shown to be more efficient in predictive power, as compared to the standard SVM and KM-SVM, including but not limited to many popular classification rules (e.g., decision trees and k-NN and so on). In conclusion, the wKM-SVM (and wSVM) increases accuracy of the classification model that will ultimately improve disease understanding and clinical treatment decisions to benefit patients. This paper is outlined as follows. In “Backgrounds” section, I review background studies in terms of the SVM and ensemble methods. In “Proposed methods” section, the weighted SVM algorithm is proposed. In “Numerical studies” section, I compare performance of my proposed methods with other methods, and claim biological implications from analysis of the TCGA pan-cancer data. In “Conclusion and discussion” section, conclusions and further studies are discussed.

Backgrounds

Support vector machine

Consider the data of , with and for , where denotes an input space. Let and be an input and class label of the nth sample. and denote the inner product and norm in . Define hyperplane by . A classification rule that builds on f(x) isCommonly, w and b are called the weight vector and bias. The optimal vector and bias can be obtained by solving the following quadratic optimization problem,subject to , for , where are slack variables and C is the regularization parameter. Note that (1) can be reformulated with the Wolfe dual form by introducing the Lagrange multipliers.where is the Lagrange multiplier with respect to for . is then the solution of (2). From the derivatives of the Lagrange equations, I see that the solution of f(x) as below:Importantly, () is a non-zero solution and its properties are induced by the Karush–Kuhn–Tucker conditions including boundary constraints. Taken together, the decision rule can be formed asFor nonlinear decision rules, a kernel method can be applicable with the inner product replaced by a nonlinear kernel, . For more details, see Cortes and Vapnik (1995).

K-means SVM

The support vector machine using the K-means clustering (KM-SVM) is the SVM algorithm sequentially combined with the K-means clustering. Importantly, it is believed that the K-means clustering is one of the most popular clustering methods. The following describes how to implement KM-SVM. I first divide samples of train data into several clusters by applying the K-means clustering. Given pre-defined K, the K-means clustering produces clusters . Class labels (i.e., or 1) of are assigned via majority voting (). Second, I build up a SVM classifier over derived cluster centers. It is interesting to note that the KM-SVM greatly cut down the number of data and support vectors used to estimate solutions, and so has the forte of computational efficiency. Wang and Wu (2005) originally introduced the prototype KM-SVM (Global KM-SVM). Due to its practical utilities, diverse KM-SVM-type classification rules have been proposed afterward (Gu and Han 2013; Lee et al. 2007). In this paper, I mainly focus on the KM-SVM methods proposed by Lee et al. (2007). Wang and Wu (2005) applies the K-means clustering to whole input data, while Lee et al. (2007) uses the K-means clustering to two sample groups independently separated by each class label (By-class KM-SVM). It is known that the By-class KM-SVM improves error rates, and efficiently circumvents the problem of imbalanced class labels.

Proposed methods

Weighted support vector machine

In this section, I newly introduce the weighted SVM that can accommodate some weights. The previous weighted SVMs (Yang et al. 2007; Bang and Jhun 2014) directly maneuver the penalization constant corresponding to each sample. With these strategies, I hardly grasp how each weight plays a role in optimization, leading to challenges to verify the numerical relationship between the objective function and weights. To the contrary, my proposed method adds weights to the hinge loss term, making it tractable to derive the quadratic programming (QP) objective function, and to impose weights to the hinge loss. In short, I call this the weighted KM-SVM (wKM-SVM) henceforth. In what follows, I formulate the SVM objective function with the penalization form:where is a weight of the nth sample. This penalization form with is the same assubject to the constraints . Consider the soft margin SVM. Letandwhere and . Equivalence between (4) and (5) is proved in Lemmas 1 and 2.

Lemma 1

Letforand. Then, I getsubject to and for. The details of the proof are presented in Additional file1. The weighted KM-SVM (or SVM) with the boosting algorithm

Lemma 2

Letbe the minimizer ofsubjectto (3.5). I obtainHence, (4) is derived by optimizing (5) with respect to. See the details of the proof in Additional file1.

Solutions for weighted SVM

In this section, I derive the solution of the weighted SVM. I adopt the quadratic programming (QP) to solve for some ,subject to the constraints and for . Consider the LagrangianWith a little of algebra, I can build the Wolfe dual form to estimate the weight term w and b, and it is enough to solve the dual problem as below:subject to and for . See the details of the proof in Additional file 1.

Weighted KM-SVM with an ensemble technique

Generally it is known that the KM-SVM boosts computational efficiency at the expense of prediction accuracy. Such low accuracy of KM-SVM can be overcome with importing ensemble methods (e.g., boosting Schapire 1990; Breiman 1998), and these ensemble methods can be applicable to the standard SVM as well. In this paper, I make use of AdaBoost.M1 introduced by Freund and Schapire (1997). In principle, AdaBoost.M1 increases weights to mis-classified samples. At each boosting iteration, weighted weak classifiers are stacked by samples, and produces integrated classification rules by majority voting. Simply put, the weighted KM-SVM (and wSVM) is more of applying boosting to weights in order to add an artificial impact to mis-classified samples. The following is the weighted KM-SVM (and wSVM) objective function (3):The weight is updated via where and . Table 1 summarizes the algorithm of the weighted KM-SVM (and SVM) with the boosting method. At each iteration , I fit a KM-SVM weak classifier together with the weighted term as in Step 2-(1). The weighted error rate () is then calculated in Step 2-(2). In Step 2-(3), I calculate the weight constant given . It is worthwhile to note that weights of clustering centers misclassified by increases by exp(). In other words, serves to adjust relative importance of misclassified samples. In Step 2-(4), I finalize the classifier G(x) by integrating all weak classifiers via majority voting.

Table 1

The weighted KM-SVM (or SVM) with the boosting algorithm

1. Initialize the weight \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$c_n$$\end{document}cn with \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\frac{1}{N}$$\end{document}1N.
2. For \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$m=1$$\end{document}m=1 to M:
(1) Fit a KM-SVM (or SVM) \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$G_m(x)$$\end{document}Gm(x) with weights \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$c_n$$\end{document}cn to clustering centers of train data.
(2) Compute
\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$err_m = \frac{\sum _{n=1}^N c_n I \big (y_n \ne G_m(x_n) \big ) }{\sum _{n=1}^N w_n }$$\end{document}errm=∑n=1NcnI(yn≠Gm(xn))∑n=1Nwn
(3) Compute \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\alpha _m = \text {log} \big ( \frac{1-err_m}{ err_m } \big )$$\end{document}αm=log(1-errmerrm)
(4) Set \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$c_n \leftarrow c_n \cdot \text {exp} \big [ \alpha _m \cdot I (y_n \ne G_m(x_n) ) \big ]$$\end{document}cn←cn·exp[αm·I(yn≠Gm(xn))]
3. Output \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$G(x) = \text {Sign}\Big [ \sum _{m=1}^M \alpha _m G_m(x) \Big ]$$\end{document}G(x)=Sign[∑m=1MαmGm(x)].

Numerical studies

Simulated data

In this section, I examine predictive performance of the weighted KM-SVM (and SVM) with boosting. Below I briefly illustrate how I generate simulated data. Let be the binary variable of the sample (; for ), and be a matrix of two predictor variables randomly generated from the bivariate normal distribution, where and for , and . With the simulation scheme above, I generated samples for train data and samples for test data. The regularization parameter C was chosen by 5-fold cross-validation over from train data, and the radial based kernel was applied with (a.k.a. free parameter). The number of clusters (K) is defined by half size of train data. Making use of the weighted KM-SVM (and SVM) fitted by the optimal parameter, I calculated error rates of test data. The experiment to generate test error rates (= error rates of test data) was repeated 1000 times and average values are presented in Fig. 1a, b. The test error rates were benchmarked to compare with other classification rules. In Fig. 1a, I first observe that the SVM (= 0.265) performs better than the KM-SVM (= 0.291) in accuracy. This is consistent with previous experimental knowledge (Lee et al. 2007). In addition, I notice that the weighted KM-SVM (= 0.278) (and SVM = 0.265) considerably improves the non-weighted KM-SVM (= 0.291) (and SVM = 0.26). Generally, the SVM is believed to be superior to many popular prediction rules. In this simulation, I consider CART (Breiman et al. 1984), kNN (Altman et al. 1992) and Random forest (Ho 1998) for comparison with the family of SVM classifiers. In Fig. 1a, the weighted SVM performs best among all of classification rules. Moreover, it is remarkable to see that the weighted KM-SVM performs better than CART and Random forest despite its data reduction. Figure 2a, b illustrate how the proposed methods reduce error rates as iterated. The test error rates dramatically drop after the first few iterations, and hence boosting evidently helps increasing accuracy. In Fig. 1b, the declining pattern of test error rates are presented as r (i.e., a parameter for for ) increases in size ranging from 0.6 to 1.4. It is clear to say that the weighted KM-SVM (and wSVM) is consistently better than the KM-SVM (and SVM).

Fig. 1

Fig. 2

a Test error rates of the weighted SVM as the boosting increases in iteration. b Test error rates of the weighted KM-SVM as the boosting increases in iteration

Performance comparisons across different classification rules. Each dot represents the averaged values of repeated simulations, and the bars overlaid with dots represent standard errors. a Prediction errors of six different classification rules, b decreasing patterns of test error rates as r (coordinates of centers) increases in value a Test error rates of the weighted SVM as the boosting increases in iteration. b Test error rates of the weighted KM-SVM as the boosting increases in iteration Shown are the brief descriptions of the nineteen microarray datasets of disease-related binary phenotypes (e.g., case and control). All datasets are publicly available

Application to genomic data

Below I demonstrate applications to two real methylation expression profiles for breast and kedney cancer. TCGA cancers data (Level 3 DNA methylation of beta values targeting on methylated and the unmethylated probes) from the TCGA database (https://tcga-data.nci.nih.gov/tcga/), where I retrieved methylation data of two cancer types (Breast carcinoma (BRCA), Kidney renal clear cell carcinoma (KIRC). I matched up features across all studies and filtered out probes by the rank sum of mean and standard deviation (Wang et al. 2012) (mean <0.7, SD <0.7), which leaves 910 probes. Table 2 describes details of TCGA data. In this application, I pose a hypothetical question if the proposed methods (wKM-SVM and wSVM) can improve accuracy for cancer prediction. To this end, I first randomly split the whole data set into two parts with approximately same size, which I denote as train and test data. The number of clusters (K) is defined by half size of train data. I examined by the test set weighted KM-SVM’s (and wSVM’s) performance using each SVM constructed by the train set. Similar to simulation studies, I observe that the weighted KM-SVM (and wSVM) outperforms the standard KM-SVM (and SVM) in prediction accuracy. It is also notable that the weighted KM-SVM (and wSVM) better performs than CART, kNN and Random Forest, as shown in Fig. 3a, b. Therefore, I conclude that the proposed weighted SVM can facilitate cancer prediction with enhanced accuracy.

Table 2

Shown are the brief descriptions of the nineteen microarray datasets of disease-related binary phenotypes (e.g., case and control). All datasets are publicly available

Name	Study	Type	# of samples	Control	Case	# of matched genes	Reference
BRCA	Breast cancer	Methylation	343	27	316	10,121	The Cancer Genome Atlas (TCGA)
KIRC	Kidney cancer	Methylation	418	199	219	10,121	The Cancer Genome Atlas (TCGA)

Fig. 3

a Performance comparisons of breast cancer data across different classification rules, b performance comparisons of kidney cancer data across different classification rules

Conclusion and discussion

In this paper, I propose the new algorithm for the weighted KM-SVM to improve prediction accuracy. Typically, the KM-SVM has higher error rate than that it appears in the SVM, due to data reduction. To circumvent this issue, I suggest the weighted KM-SVM (and SVM) and evaluated performance of each of classifiers through various experimental scenarios. Putting together, I conclude that the proposed weighted KM-SVM (and SVM) is effective to diminish its error rates. In particular, I applied the weighted KM-SVM (and SVM) to TCGA cancer methylation data, and found its improved performance for disease prediction. Due to high accuracy, the weighted KM-SVM (and wSVM) can be widely used to facilitate predicting the complex diseases and therapeutic outcomes. Looking beyond this scope, this precise classification rule advances the upcoming horizon in pursuit of precision medicine, as it is urgently required in the bio-medical field to identify relations between bio-molecular units and clinical phenotype patterns (e.g., candidate biomarker detection, disease subtypes identification and associated biological pathways). The KM-SVM, however, does not involve size of clusters (i.e., the number of samples that belong to a cluster), and so clustering centers may not suitably represent original data structures. This weakness point may potentially results in poor prediction. For future work, I may suggest a new weighting scheme in proportion to size of clusters to improve more in accuracy. I leave this idea to next study.

10 in total

1. A multigene assay to predict recurrence of tamoxifen-treated, node-negative breast cancer.

Authors: Soonmyung Paik; Steven Shak; Gong Tang; Chungyeul Kim; Joffre Baker; Maureen Cronin; Frederick L Baehner; Michael G Walker; Drew Watson; Taesung Park; William Hiller; Edwin R Fisher; D Lawrence Wickerham; John Bryant; Norman Wolmark
Journal: N Engl J Med Date: 2004-12-10 Impact factor: 91.245

2. Gene expression profiling predicts clinical outcome of breast cancer.

Authors: Laura J van 't Veer; Hongyue Dai; Marc J van de Vijver; Yudong D He; Augustinus A M Hart; Mao Mao; Hans L Peterse; Karin van der Kooy; Matthew J Marton; Anke T Witteveen; George J Schreiber; Ron M Kerkhoven; Chris Roberts; Peter S Linsley; René Bernards; Stephen H Friend
Journal: Nature Date: 2002-01-31 Impact factor: 49.962

3. Multiclass cancer diagnosis using tumor gene expression signatures.

Authors: S Ramaswamy; P Tamayo; R Rifkin; S Mukherjee; C H Yeang; M Angelo; C Ladd; M Reich; E Latulippe; J P Mesirov; T Poggio; W Gerald; M Loda; E S Lander; T R Golub
Journal: Proc Natl Acad Sci U S A Date: 2001-12-11 Impact factor: 11.205

4. A two-gene expression ratio predicts clinical outcome in breast cancer patients treated with tamoxifen.

Authors: Xiao-Jun Ma; Zuncai Wang; Paula D Ryan; Steven J Isakoff; Anne Barmettler; Andrew Fuller; Beth Muir; Gayatry Mohapatra; Ranelle Salunga; J Todd Tuggle; Yen Tran; Diem Tran; Ana Tassin; Paul Amon; Wilson Wang; Wei Wang; Edward Enright; Kimberly Stecker; Eden Estepa-Sabal; Barbara Smith; Jerry Younger; Ulysses Balis; James Michaelson; Atul Bhan; Karleen Habin; Thomas M Baer; Joan Brugge; Daniel A Haber; Mark G Erlander; Dennis C Sgroi
Journal: Cancer Cell Date: 2004-06 Impact factor: 31.743

5. A gene-expression signature as a predictor of survival in breast cancer.

Authors: Marc J van de Vijver; Yudong D He; Laura J van't Veer; Hongyue Dai; Augustinus A M Hart; Dorien W Voskuil; George J Schreiber; Johannes L Peterse; Chris Roberts; Matthew J Marton; Mark Parrish; Douwe Atsma; Anke Witteveen; Annuska Glas; Leonie Delahaye; Tony van der Velde; Harry Bartelink; Sjoerd Rodenhuis; Emiel T Rutgers; Stephen H Friend; René Bernards
Journal: N Engl J Med Date: 2002-12-19 Impact factor: 91.245

6. Supervised risk predictor of breast cancer based on intrinsic subtypes.

Authors: Joel S Parker; Michael Mullins; Maggie C U Cheang; Samuel Leung; David Voduc; Tammi Vickery; Sherri Davies; Christiane Fauron; Xiaping He; Zhiyuan Hu; John F Quackenbush; Inge J Stijleman; Juan Palazzo; J S Marron; Andrew B Nobel; Elaine Mardis; Torsten O Nielsen; Matthew J Ellis; Charles M Perou; Philip S Bernard
Journal: J Clin Oncol Date: 2009-02-09 Impact factor: 44.544

7. Breast cancer index identifies early-stage estrogen receptor-positive breast cancer patients at risk for early- and late-distant recurrence.

Authors: Yi Zhang; Catherine A Schnabel; Brock E Schroeder; Piiha-Lotta Jerevall; Rachel C Jankowitz; Tommy Fornander; Olle Stål; Adam M Brufsky; Dennis Sgroi; Mark G Erlander
Journal: Clin Cancer Res Date: 2013-06-11 Impact factor: 12.531

8. Detecting disease-associated genes with confounding variable adjustment and the impact on genomic meta-analysis: with application to major depressive disorder.

Authors: Xingbin Wang; Yan Lin; Chi Song; Etienne Sibille; George C Tseng
Journal: BMC Bioinformatics Date: 2012-03-29 Impact factor: 3.169

9. A general framework for estimating the relative pathogenicity of human genetic variants.

Authors: Martin Kircher; Daniela M Witten; Preti Jain; Brian J O'Roak; Gregory M Cooper; Jay Shendure
Journal: Nat Genet Date: 2014-02-02 Impact factor: 38.330

10. Multivariate neuroanatomical classification of cognitive subtypes in schizophrenia: a support vector machine learning approach.

Authors: Ian C Gould; Alana M Shepherd; Kristin R Laurens; Murray J Cairns; Vaughan J Carr; Melissa J Green
Journal: Neuroimage Clin Date: 2014-09-18 Impact factor: 4.881

10 in total

4 in total

Review 1. Applications of Support Vector Machine (SVM) Learning in Cancer Genomics.

Authors: Shujun Huang; Nianguang Cai; Pedro Penzuti Pacheco; Shavira Narrandes; Yang Wang; Wayne Xu
Journal: Cancer Genomics Proteomics Date: 2018 Jan-Feb Impact factor: 4.069

2. Genomic Analysis Reveals the Prognostic and Immunotherapeutic Response Characteristics of Ferroptosis in Lung Squamous Cell Carcinoma.

Authors: Yinhe Feng; Xingyu Xiong; Yubin Wang; Ding Han; Chunfang Zeng; Hui Mao
Journal: Lung Date: 2022-05-05 Impact factor: 2.584

3. Meta-analytic support vector machine for integrating multiple omics data.

Authors: SungHwan Kim; Jae-Hwan Jhong; JungJun Lee; Ja-Yong Koo
Journal: BioData Min Date: 2017-01-26 Impact factor: 2.522

4. PDAC-ANN: an artificial neural network to predict pancreatic ductal adenocarcinoma based on gene expression.

Authors: Palloma Porto Almeida; Cristina Padre Cardoso; Leandro Martins de Freitas
Journal: BMC Cancer Date: 2020-01-31 Impact factor: 4.430

4 in total