
Classification and selection of biomarkers in genomic data using LASSO.

Debashis Ghosh, Arul M Chinnaiyan.

Abstract

High-throughput gene expression technologies such as microarrays have been utilized in a variety of scientific applications. Most of the work has been done on assessing univariate associations between gene expression profiles and clinical outcome (variable selection) or on developing classification procedures with gene expression data (supervised learning). We consider a hybrid variable selection/classification approach that is based on linear combinations of the gene expression profiles that maximize an accuracy measure summarized using the receiver operating characteristic curve. Under a specific probability model, this leads to the consideration of linear discriminant functions. We incorporate an automated variable selection approach using LASSO. An equivalence between LASSO estimation and support vector machines allows for model fitting using standard software. We apply the proposed method to simulated data as well as data from a recently published prostate cancer study.


Year:  2005        PMID: 16046820      PMCID: PMC1184048          DOI: 10.1155/JBB.2005.147

Source DB:  PubMed          Journal:  J Biomed Biotechnol        ISSN: 1110-7243


INTRODUCTION

DNA microarrays simultaneously gauge the expression of thousands of genes in clinical samples. In this paper, we focus on cancer studies, where gene expression technologies have been applied extensively (Alizadeh et al [1]; Khan et al [2]; Dhanasekaran et al [3]). Obtaining large-scale gene expression profiles of tumors should theoretically allow for the identification of subsets of genes that function as prognostic disease markers or biologic predictors of therapeutic response. Because the data are highly multivariate and complex, it is important to develop automated statistical methods to detect systematic signals in gene expression patterns. In cancer studies, analyses have typically focused on one of three problems. First, investigators have looked for genes that discriminate neoplastic from benign tissue. Statistically, this is the problem of assessing differential expression of genes and has been studied by several authors; see, for example, Efron et al [4]. A second problem is clustering the samples to find subtypes of disease using algorithms such as those in [5]. The final class of problems is classification or supervised learning, which involves using the profile to predict some clinical outcome, such as the stage of disease. Suppose that in this instance, we treat the gene expression profile as the independent variables and tissue type as the response. A particular feature of microarray experiments is that the dimension of the predictor space (number of genes) is typically larger than the number of samples. This is known as the “large p, small n” paradigm (West [6]), so classification methods must take this into account. One method to do this is to apply prefiltering criteria so that the candidate number of genes for building a classifier is smaller than the number of samples. For example, Dudoit et al [7] performed a systematic comparison of several discrimination methods for the classification of tumors based on microarray experiments.
However, they must perform an initial reduction in the number of predictors before building the classifier. We wish to consider the joint effects of genes in determining classification rules for discriminating tumors. There are two assumptions that drive our proposed methodology. First, we assume that the joint effects of multiple genes must be considered in discriminating classes of disease. Recently, much attention has been given to the finding that a 70-gene signature can predict breast cancer survival (van't Veer et al [8]; van de Vijver et al [9]). However, most such gene signatures have been constructed using univariate methods. It seems reasonable to consider joint models, as genes are correlated because of their mutual involvement in disease pathways. The second assumption is that there are individual genes that can discriminate classes. This is different from the latent factor and partial least squares proposals put forth by other authors (West [6]; Nguyen and Rocke [10]), where linear combinations of all available genes are used to predict the outcome. We seek to develop interpretable models for classification; for this purpose, using individual genes for predictors rather than linear combinations of genes seems reasonable. In this paper, we develop classification rules based on the consideration of measures of diagnostic accuracy. In particular, we are interested in finding gene expression profiles that can discriminate between two populations. A unique challenge is posed because of the large p, small n problem. Our solution is to combine the problems of variable selection and classification. We suggest an approach for classification using the LASSO approach (Tibshirani [11]). An advantage of this approach is that some of the effects of the variables in these models are estimated to be exactly zero. 
These will represent genes that have no discriminatory power between the two classes, while those with nonzero coefficients will represent genes that can separate classes of tumors successfully. Thus, a by-product of the approach is the generation of a gene list. We exploit an equivalence between LASSO and support vector machines (SVMs) in order to fit the proposed classifier. The structure of the paper is as follows. In “materials and methods,” we provide background on the data structures observed and the motivation based on biomarker combinations, which leads to the use of linear discriminant functions. We also provide a review of LASSO estimation (Tibshirani [11]) in this section. The latter two techniques are then involved in the proposed estimation procedure, described in “results and discussion.” There, we also describe how to implement the proposed method using software for SVMs. Issues of model selection are also discussed. We describe the application of the proposed methodologies to simulated data and data from a recent cancer profiling study (Dhanasekaran et al [3]) in “prostate cancer gene expression data.” Finally, some concluding remarks are made in “conclusion.”

MATERIALS AND METHODS

Let a' denote the transpose of the vector a. For the ith sample (i = 1, . . ., n), we let X_i = [X_i1, . . ., X_ip]' denote the p × 1 gene expression profile vector (ie, X_ij is the gene expression measurement of the jth gene, j = 1, . . ., p). We suppose that the data have already been preprocessed and normalized. In addition, it is assumed that the gene expression data are standardized so that for each gene, the mean is zero and the standard deviation is one. Let g_i denote the tumor class for the ith sample (i = 1, . . ., n); we assume that there are two classes so that g_i takes values in {0, 1}. Here and in the sequel, we will refer to g = 1 as the diseased class and g = 0 as the healthy class; however, the methods proposed here are applicable to any two-class setting. In “LASSO estimation,” we assume the existence of a continuous response variable Y_i for the ith sample (i = 1, . . ., n).
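The standardization assumed here (each gene centered to mean 0 and scaled to standard deviation 1) is routine; a minimal numpy sketch, with the function name and toy array shapes my own rather than the paper's:

```python
import numpy as np

def standardize_genes(X):
    """Center and scale each gene (column of the n x p expression matrix)
    to mean 0 and standard deviation 1, as assumed in the text."""
    return (X - X.mean(axis=0)) / X.std(axis=0)

rng = np.random.default_rng(0)
X = rng.normal(loc=5.0, scale=2.0, size=(6, 3))  # 6 samples, 3 genes
Z = standardize_genes(X)
```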

ROC curves and optimal biomarker combinations

Our approach is to consider each measurement from a microarray for a single gene as a diagnostic test. Thus, for each subject, we have a high-dimensional vector of diagnostic test results. We then want to utilize this information in a way that separates the two populations of patients. The issue of finding combinations of biomarkers that accurately classify patients has been considered by Su and Liu [12], Baker [13], and Pepe and Thompson [14] in the statistical literature. To combine information across the high-dimensional vector of gene expression profiles, we consider linear combinations of the form β'X_i, i = 1, . . ., n. Without loss of generality, we will also assume that larger values of this linear combination correspond to an increasing likelihood of having g = 1. While the method can be easily extended to incorporate interactions between gene expression measurements, we focus on consideration of the main effects for purposes of exposition. Suppose X_D represents the gene expression profile for a typical cancer specimen (ie, g = 1), and X_H is the corresponding profile for a randomly chosen benign specimen. Note that in our situation, the diagnostic test is the linear combination β'X. One relevant quantity is the false positive rate based on a cutoff c, defined to be FP(c) = Pr(β'X_H ≥ c). Similarly, the true positive rate is TP(c) = Pr(β'X_D ≥ c). The true and false positive rates can be summarized by the receiver operating characteristic (ROC) curve, which is a graphical presentation of {FP(c), TP(c) : −∞ < c < ∞}. The ROC curve shows the tradeoff between increasing true and false positive rates. Tests with {FP(c), TP(c)} values close to (0, 1) are perfect discriminators, while those with {FP(c), TP(c)} values close to the 45° line in the (0, 1) × (0, 1) plane are unable to discriminate between the diseased and healthy populations. Examples of ideal and noninformative ROC curves are given in Figures 1a and 1b.
Figure 1

Receiver operating characteristic (ROC) curves for (a) ideal and (b) noninformative tests.
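The empirical false and true positive rates behind such curves can be computed directly from the scores of the two groups; a small numpy sketch (the function name and toy scores are my own illustration):

```python
import numpy as np

def roc_points(scores_diseased, scores_healthy, cutoffs):
    """Empirical {FP(c), TP(c)} pairs for a scalar diagnostic score:
    FP(c) = fraction of healthy scores >= c,
    TP(c) = fraction of diseased scores >= c."""
    fp = np.array([(scores_healthy >= c).mean() for c in cutoffs])
    tp = np.array([(scores_diseased >= c).mean() for c in cutoffs])
    return fp, tp

diseased = np.array([2.0, 3.0, 4.0, 5.0])
healthy = np.array([0.0, 1.0, 2.0, 3.0])
fp, tp = roc_points(diseased, healthy, cutoffs=[1.5, 3.5])
```

Sweeping the cutoff from +∞ down to −∞ traces the full curve from (0, 0) to (1, 1).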

While the specificity and sensitivity of a diagnostic test depend on the cutoff value chosen, a useful summary measure to consider is the area under the ROC curve. It can be shown mathematically that the area under the curve is Pr(β'X_D > β'X_H), where X_D and X_H denote the profiles of randomly chosen diseased and healthy specimens (Bamber [15]). Under a binormal probability model, Su and Liu [12] showed that this quantity is optimized by the linear discriminant function. This motivates our consideration of linear discriminant functions. We next present an algorithm for their estimation.
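Bamber's identity has a simple empirical counterpart: the area under the empirical ROC curve equals the proportion of diseased/healthy pairs in which the diseased score is larger, counting ties as one half. A small numpy check (names mine):

```python
import numpy as np

def empirical_auc(scores_diseased, scores_healthy):
    """Bamber's identity: AUC = Pr(S_D > S_H) + 0.5 * Pr(S_D = S_H),
    estimated over all diseased/healthy pairs."""
    d = scores_diseased[:, None]
    h = scores_healthy[None, :]
    return (d > h).mean() + 0.5 * (d == h).mean()

auc = empirical_auc(np.array([2.0, 3.0, 4.0]), np.array([1.0, 2.0, 3.0]))
# 9 pairs: 6 strict wins and 2 ties, so AUC = (6 + 0.5 * 2) / 9 = 7/9
```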

Linear discriminant functions by optimal scoring

While linear discriminant analysis (LDA) functions are typically calculated using matrix algebra techniques, an alternative method of calculating them is through the use of optimal scoring (Hastie et al [16, 17]). In this method, the problem of classification into two groups is reexpressed as a regression problem based on quantities known as optimal scores. The point of optimal scoring is to turn the categorical class labels into quantitative variables. Let θ(g) = [θ(g_1), . . ., θ(g_n)]' be the n × 1 vector of quantitative scores assigned to the class labels. The optimal scoring problem involves finding the coefficient vector η ≡ (η_1, η_2, . . ., η_p)' and the scoring map θ : {0, 1} → R that minimize the following average squared residual:

ASR(θ, η) = n^(-1) Σ_{i=1}^n [θ(g_i) − η'X_i]^2.    (1)

Let Z be an n × 2 matrix with the ith row equal to (1, 0) if g_i = 1 and (0, 1) if g_i = 0 (i = 1, . . ., n). The optimal scores are assumed to be mutually orthogonal and normalized with respect to an inner product. Thus, the minimization of (1) is subject to the constraint n^(-1)‖ZΘ‖^2 = 1, where Θ = [θ(1), θ(0)]' is the 2 × 1 vector of the optimal scores. Hastie et al [16] state that the minimization of this constrained optimization problem leads to estimates of η that are proportional to the discriminant variables (ie, the discriminant function) in LDA. In particular, they propose the following algorithm for the estimation of the LDA functions:

(1) Choose an initial score matrix Θ_0 satisfying Θ_0'DΘ_0 = I, where D = Z'Z/n, and let θ_0 = ZΘ_0.
(2) Let X be the n × p matrix with ith row X_i'. Fit a linear regression model of θ_0 on X, yielding fitted values Ŷ = Xη̂. Let η̂(x) = η̂'x be the fitted regression function.
(3) Obtain the eigenvector matrix Φ of Θ_0'Z'Ŷ/n; the optimal scores are then Θ* = Θ_0Φ.
(4) Define η(x) = Φ'η̂(x).

As mentioned before, a problem with attempting to apply standard linear discriminant function methods to the data here is that there is not a numerically unique solution because p is larger than n. Thus, some type of regularization is needed. Our approach is based on the LASSO, which is described in the next section.
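For the two-class case, the optimal-scoring route to LDA can be sketched numerically: with scores chosen to satisfy the normalization constraint, an ordinary least-squares regression of the scored labels on X gives coefficients proportional to the LDA discriminant direction. A minimal numpy illustration (the simulated data, seed, and score construction are my own):

```python
import numpy as np

rng = np.random.default_rng(1)
n0 = n1 = 20
X = np.vstack([rng.normal(0.0, 1.0, (n0, 3)),
               rng.normal([1.5, 0.0, 0.0], 1.0, (n1, 3))])  # gene 1 separates
g = np.r_[np.zeros(n0), np.ones(n1)]
n = n0 + n1

# Scores meeting sum theta(g_i) = 0 and n^(-1) * sum theta(g_i)^2 = 1
theta = np.where(g == 1, np.sqrt(n0 / n1), -np.sqrt(n1 / n0))

# OLS regression of the scores on (1, X); the slope part is proportional
# to the LDA discriminant direction
coef, *_ = np.linalg.lstsq(np.c_[np.ones(n), X], theta, rcond=None)
```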

LASSO estimation

We suppose that our data are (Y_i, X_i), where Y_i (i = 1, . . ., n) is a continuous variable. The LASSO solution is given by the optimization problem of minimizing

Σ_{i=1}^n (Y_i − β'X_i)^2 + λ Σ_{j=1}^p |β_j|,    (2)

where β = (β_1, . . ., β_p)' and λ ≥ 0 is a penalty term. Thus, the constraint that is utilized is an L1 constraint. An alternative way of formulating (2) is to minimize Σ_{i=1}^n (Y_i − β'X_i)^2, subject to the constraint Σ_{j=1}^p |β_j| ≤ t. Note that in the absence of the constraint, the solution is given by the ordinary least squares (OLS) estimator. If the usual OLS estimator satisfies the constraint, then the LASSO and OLS estimates of β coincide. However, for smaller values of t, some of the components of β are estimated to be exactly zero. In the linear regression setting, LASSO estimation has been considered by Tibshirani [11]. For a given value of t, minimization of the residual sum of squares subject to an L1 constraint on the components of β is a quadratic programming problem with 2^p linear inequality constraints. A sequential algorithm is given by Tibshirani [11] to solve the optimization problem. While Tibshirani [11] considered estimating coefficients in regression models using LASSO, our interest is in using gene expression data to classify tumors. In particular, we seek to extend the LDA approach advocated by Dudoit et al [7] to handle the case where p is larger than n. We outline the proposed method in the next section.
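The exact-zero behavior is easiest to see in the orthonormal-design special case, where the LASSO estimate is a soft-thresholded OLS estimate; this is a standard fact, and the sketch below is my own illustration, not the paper's algorithm:

```python
import numpy as np

def soft_threshold(b, lam):
    """Orthonormal-design LASSO: shrink each OLS coefficient b_j toward
    zero by lam, setting it exactly to zero when |b_j| <= lam."""
    return np.sign(b) * np.maximum(np.abs(b) - lam, 0.0)

est = soft_threshold(np.array([2.0, 0.3, -1.2]), lam=0.5)
# the small middle coefficient is set exactly to zero
```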

Estimation methods

We propose to use an optimal scoring procedure for classification, where LASSO estimation is incorporated. In the notation of the previous section, we wish to solve the following optimization problem: minimize

Σ_{i=1}^n [θ(g_i) − η'X_i]^2 + λ Σ_{j=1}^p |η_j|,    (3)

subject to the constraint n^(-1)‖ZΘ‖^2 = 1. Here is the outline of our procedure:

(1) Choose an initial score matrix Θ_0 satisfying Θ_0'DΘ_0 = I, and let θ_0 = ZΘ_0.
(2) Fit a linear regression model of θ_0 on X subject to an L1 constraint on the parameters. Define the fitted values Ŷ = Xη̂, and let η̂(x) be the fitted regression function.
(3) Obtain the eigenvector matrix Φ of Θ_0'Z'Ŷ/n; the optimal scores are Θ* = Θ_0Φ.
(4) Define η(x) = Φ'η̂(x).

Note that we are incorporating the LASSO estimation procedure in step (2) of the algorithm. We cannot use the algorithm of Tibshirani [11] because it is too computationally intensive for large p (number of genes). However, it turns out that the algorithm can be fit using standard software for SVMs, which we will now describe.
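As a rough stand-in for step (2), the L1-penalized regression of the scores on X can be fit by coordinate descent; the point is that most gene coefficients come out exactly zero. A minimal numpy sketch under my own naming and simulated data (the paper instead fits this step through an SVM equivalence):

```python
import numpy as np

def l1_scoring_fit(X, g, lam, n_sweeps=200):
    """Coordinate descent for (1/2)*||theta - X eta||^2 + lam * sum_j |eta_j|,
    where theta holds two-class optimal scores for the labels g."""
    n, p = X.shape
    n1 = int(g.sum()); n0 = n - n1
    theta = np.where(g == 1, np.sqrt(n0 / n1), -np.sqrt(n1 / n0))
    eta = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0)
    for _ in range(n_sweeps):
        for j in range(p):
            r = theta - X @ eta + X[:, j] * eta[j]   # partial residual
            z = X[:, j] @ r
            eta[j] = np.sign(z) * max(abs(z) - lam, 0.0) / col_sq[j]
    return eta

rng = np.random.default_rng(2)
n0 = n1 = 25
X = np.vstack([rng.normal(0.0, 1.0, (n0, 20)),
               rng.normal(0.0, 1.0, (n1, 20))])
X[n0:, 0] += 3.0                          # only gene 1 separates the classes
X = (X - X.mean(axis=0)) / X.std(axis=0)  # standardize genes
g = np.r_[np.zeros(n0), np.ones(n1)]
eta = l1_scoring_fit(X, g, lam=10.0)
```

Only the informative gene should retain a clearly nonzero coefficient; most of the 19 noise genes are estimated to be exactly zero, which is the by-product gene list described in the text.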

Support vector machines

An excellent description of SVMs for classification can be found in [18]. We provide an overview of the method here. We assume that the data are {x_i, y_i} (i = 1, . . ., n), where x_i is a d-dimensional vector and y_i ∈ {−1, +1} is the class label. The goal of SVMs is to find an optimal separating hyperplane between the observations with y = −1 and those with y = 1. This problem can be expressed as minimizing ‖w‖^2 subject to the following constraints:

y_i(w'x_i + b) ≥ 1,    i = 1, . . ., n.

Details on how to solve the optimization problem can be found in [18, chapter 7]. In the unregularized case, fitting the LASSO model is equivalent to fitting an SVM classifier with the following 2p + 1 n-dimensional vectors as the inputs: g, Y_k, and −Y_k (k = 1, . . ., p), defined to be the sample labels, the gene expression values, and their negated values for the kth gene across the n samples. The label is the vector y_0, defined to be −1 for the first entry and 1 for the other entries. The proof of the equivalence is given in the “appendix.” We have created a macro in R (R Foundation) that implements the proposed method; it can be obtained from the first author. As mentioned earlier, an advantage of this approach is that most of the gene effects are estimated to be exactly zero. The method can also identify the genes associated with each of the two classes: genes with negative estimated coefficients are associated with the class coded −1, while those with positive estimated coefficients are associated with the class coded 1. As is evident in the algorithm from the previous section or in (3), the parameter λ needs to be estimated. We use fivefold cross-validation for this.
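The fivefold cross-validation used to choose λ needs only a random disjoint partition of the samples; a minimal sketch (the paper does not specify its splitting scheme, so the details here are my own):

```python
import numpy as np

def kfold_indices(n, k=5, seed=0):
    """Randomly partition n sample indices into k roughly equal, disjoint
    folds; each fold serves once as the held-out set when estimating the
    error rate for a candidate penalty lambda."""
    idx = np.random.default_rng(seed).permutation(n)
    return np.array_split(idx, k)

folds = kfold_indices(101, k=5)  # eg, the 101 profiled prostate samples
```

The λ with the smallest average held-out misclassification rate across the five folds is then selected.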

RESULTS AND DISCUSSION

Simulated data

We first performed a set of simulations to determine how well the proposed method classifies. We generated p = 1000 dimensional vectors for two populations. We considered the following sample size combinations: (n_0, n_1) = (15, 15), (20, 10), (50, 50), and (70, 30), where n_k is the number of samples in the group with g = k (k = 0, 1). All the genes were assumed to be independent with a normal distribution and variance 1. We assumed a model in which a fraction π of the genes was differentially expressed between the two classes; π = 0.05 and π = 0.5 were considered. We examined two scenarios. In the first scenario, there was a large change in the differentially expressed genes, a shift of 5 units in the mean. In the second scenario, the shift was only 1.5 units in the mean. For each simulation setting, 100 datasets were generated, and the classification error rates were estimated using three-fold cross-validation. No optimization was performed; we set λ = 10. The results are summarized in Table 1. Based on the table, we find that for larger sample sizes and larger effect sizes, as well as larger numbers of differentially expressed genes, the error rates are smaller.
Table 1

Classification error rates (× 100) from the simulation study. Numbers in parentheses are standard errors of the misclassification rates.

Sample size            π = 0.05                     π = 0.5
                       small        large           small        large
(n0, n1) = (15, 15)    17.3 (1.65)  15.8 (1.63)     12.3 (1.21)  11.9 (1.30)
(n0, n1) = (20, 10)    20.7 (1.51)  19.3 (1.45)     13.3 (1.35)  12.7 (1.38)
(n0, n1) = (50, 50)    14.2 (1.15)  13.9 (1.24)      9.8 (1.02)   8.6 (1.11)
(n0, n1) = (70, 30)    18.3 (1.17)  17.6 (1.29)     10.2 (1.08)   9.9 (1.06)
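The simulation design above can be reproduced in a few lines (scaled down to p = 100 genes to keep it light; the function name and seed are mine):

```python
import numpy as np

def simulate(n0, n1, p=100, pi=0.05, shift=5.0, seed=0):
    """Two classes of independent N(0, 1) genes; a fraction pi of the genes
    is differentially expressed, with mean shifted by `shift` in class 1."""
    rng = np.random.default_rng(seed)
    m = int(pi * p)                      # number of differential genes
    X0 = rng.normal(0.0, 1.0, (n0, p))
    X1 = rng.normal(0.0, 1.0, (n1, p))
    X1[:, :m] += shift
    return np.vstack([X0, X1]), np.r_[np.zeros(n0), np.ones(n1)]

X, g = simulate(15, 15)  # the (15, 15) setting with large effects
```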
However, in our simulations (data not shown), we found that the method had difficulty in selecting the correct variables when p is larger than n. This attests to the fact that variable selection in the situation of large p and small n is quite difficult. We discuss this situation in the “conclusion.”

Prostate cancer gene expression data

The example we consider is from a prostate cancer study; a subset of the samples was considered by Dhanasekaran et al [3]. We focus here on noncancer versus cancer tissues. The samples are profiled using spotted cDNA (ie, red/green) microarrays; initially, there are 101 samples profiled using 10 K chips (9984 genes). We have taken the following preprocessing steps: (1) remove genes that are reported as missing in more than 10% of the samples; (2) remove genes that have a variance less than 0.05 across the samples; (3) impute measurements for the remaining missing values using the gene-wise median. This leaves a total of 4880 genes for analysis. We first performed an estimation of the error rate using fivefold cross-validation. This generally gave an error rate between 15% and 20% for various choices of λ, suggesting that the classifier is not sensitive to the choice of the smoothing parameter. One of the by-products of the procedure is a list of genes that are estimated to have nonzero effects. We present the gene lists for λ = 1 in Tables 2 and 3. Out of the 4880 genes, only 21 are estimated to have nonzero effects. Of the genes that are overexpressed in prostate cancer relative to benign prostate tissue, early growth response (Hs.326035/301865), the feline sarcoma viral oncogene homolog (Hs.81665), the T-cell receptor gamma locus (Hs.112259), and fatty acid synthase (Hs.83190) have been seen by other investigators to be upregulated in prostate cancer, as in Table 3. The other genes on the list could represent false positives or genes whose joint effect is predictive of cancer status.
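The three preprocessing steps can be sketched with numpy, using NaN to mark missing measurements and the thresholds stated in the text (the function name and toy data are my own):

```python
import numpy as np

def preprocess(X):
    """(1) Drop genes missing in more than 10% of samples, (2) drop genes
    with variance below 0.05, (3) median-impute remaining missing values."""
    n = X.shape[0]
    X = X[:, np.isnan(X).sum(axis=0) <= 0.10 * n]   # step 1
    X = X[:, np.nanvar(X, axis=0) >= 0.05]          # step 2
    med = np.nanmedian(X, axis=0)                   # step 3: gene-wise median
    rows, cols = np.where(np.isnan(X))
    X[rows, cols] = med[cols]
    return X

good = [1, 2, 3, 4, np.nan, 6, 7, 8, 9, 10]      # 10% missing -> kept, imputed
mostly_missing = [np.nan] * 5 + [1, 2, 3, 4, 5]  # 50% missing -> dropped
constant = [1.0] * 10                            # zero variance -> dropped
X = np.column_stack([good, mostly_missing, constant])
out = preprocess(X)
```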
Table 2

List of genes underexpressed in prostate cancer relative to benign prostate tissue

Clone ID     Gene name
Hs.288965    Homo sapiens cDNA: FLJ22300 fis, clone HRC04759
Hs.76307     Neuroblastoma, suppression of tumorigenicity 1
Hs.9615      Myosin, light polypeptide 9, regulatory
Hs.226795    Glutathione S-transferase pi
Hs.171731    Solute carrier family 14 (urea transporter), member 1 (Kidd blood group)
Table 3

List of genes overexpressed in prostate cancer relative to benign prostate tissue.

Clone ID            Gene name
Hs.326035/301865    Early growth response 1 -OR- dopachrome tautomerase (dopachrome delta-isomerase, tyrosine-related protein 2)
Hs.299221           Pyruvate dehydrogenase kinase, isoenzyme 4
Hs.81665            v-kit Hardy-Zuckerman 4 feline sarcoma viral oncogene homolog
Hs.74267            Ribosomal protein L15
Hs.75431            Fibrinogen, gamma polypeptide
Hs.335797           ESTs, moderately similar to hypothetical protein FLJ20097 (Homo sapiens)
Hs.82129            Carbonic anhydrase III, muscle specific
Hs.112259           T-cell receptor gamma locus
Hs.151258           Hypothetical protein FLJ21062
Hs.22394            Sec3-like
Hs.84190            Solute carrier family 19 (folate transporter), member 1
Hs.119597           Stearoyl-CoA desaturase (delta-9-desaturase)
Hs.131740           Homo sapiens cDNA FLJ30428 fis, clone BRACE2008941
Hs.50727            N-acetylglucosaminidase, alpha- (Sanfilippo disease IIIB)
Hs.83190            Fatty acid synthase
Hs.82961            Homo sapiens, clone MGC:22588 IMAGE:4696566, mRNA, complete cds

CONCLUSION

In this paper, we have introduced a new approach to the joint problems of classification and variable selection in the analysis of microarray data. These problems have been treated as separate problems in the previous literature. Our approach combines the two problems by use of the LASSO. This work has opened the way for several future avenues of research that we are currently investigating. First, a popular alternative to LDA in classification problems is logistic regression, which has recently been motivated by ROC considerations (McIntosh and Pepe [19]). While it is possible to formulate a LASSO estimation for logistic regression models, adapting the LASSO-SVM equivalence to this situation requires new algorithms. It will also be important to compare the performance of the two L1-regularized procedures (LDA and logistic regression) on real and simulated microarray datasets. In this paper, we focused on the two-class problem. While LDA and logistic regression can be extended to accommodate multicategorical responses, the ROC arguments that motivated the method here only exist for two populations. We are currently exploring theoretical frameworks for generalizing ROC ideas for multiple disease states. The estimation procedure described in this paper allows the joint estimation of multivariate gene effects on the response (class label). The approach described here could be generalized by fitting nonlinear gene effects in the estimation algorithm or by including higher-order interactions between genes. Another generalization is to perform a clustering of the genes and to enter the cluster averages as covariates in the model. Such an approach was taken by Hastie et al [20] and Tibshirani et al [21]. It is also of current interest to incorporate biological knowledge into microarray data analyses. In many instances, scientists are interested in the effects of a particular gene or pathway on genetic expression.
In this context, approaches have been suggested by Zien et al [22] and Pavlidis et al [23] in which biological knowledge, as represented by pathway scores or functional annotation status, is correlated with gene expression. However, their approaches were univariate. There would be potential gains in efficiency by considering joint models for pathways. We are currently studying the applicability of the joint estimation procedure described here to that setting. Finally, a by-product of the method proposed here is that individual genes can be estimated to have exactly zero effect on the response. The genes with estimated nonzero effects then comprise a gene list on which investigators can do further validation work. However, in our simulations (data not shown), we found that the method had difficulty in selecting the correct variables. This attests to the fact that variable selection in the situation of large p and small n is quite difficult. An alternative to the method proposed here is Bayesian variable selection methods (Lee et al [24]). We are currently exploring an adaptation of the algorithm described here to a Bayesian approach.
REFERENCES (14 in total)

1.  'Gene shaving' as a method for identifying distinct sets of genes with similar expression patterns.

Authors:  T Hastie; R Tibshirani; M B Eisen; A Alizadeh; R Levy; L Staudt; W C Chan; D Botstein; P Brown
Journal:  Genome Biol       Date:  2000-08-04       Impact factor: 13.583

2.  Analysis of gene expression data with pathway scores.

Authors:  A Zien; R Küffner; R Zimmer; T Lengauer
Journal:  Proc Int Conf Intell Syst Mol Biol       Date:  2000

3.  Gene selection: a Bayesian variable selection approach.

Authors:  Kyeong Eun Lee; Naijun Sha; Edward R Dougherty; Marina Vannucci; Bani K Mallick
Journal:  Bioinformatics       Date:  2003-01       Impact factor: 6.937

4.  Combining several screening tests: optimality of the risk score.

Authors:  Martin W McIntosh; Margaret Sullivan Pepe
Journal:  Biometrics       Date:  2002-09       Impact factor: 2.571

5.  Gene expression profiling predicts clinical outcome of breast cancer.

Authors:  Laura J van 't Veer; Hongyue Dai; Marc J van de Vijver; Yudong D He; Augustinus A M Hart; Mao Mao; Hans L Peterse; Karin van der Kooy; Matthew J Marton; Anke T Witteveen; George J Schreiber; Ron M Kerkhoven; Chris Roberts; Peter S Linsley; René Bernards; Stephen H Friend
Journal:  Nature       Date:  2002-01-31       Impact factor: 49.962

6.  Delineation of prognostic biomarkers in prostate cancer.

Authors:  S M Dhanasekaran; T R Barrette; D Ghosh; R Shah; S Varambally; K Kurachi; K J Pienta; M A Rubin; A M Chinnaiyan
Journal:  Nature       Date:  2001-08-23       Impact factor: 49.962

7.  Classification and diagnostic prediction of cancers using gene expression profiling and artificial neural networks.

Authors:  J Khan; J S Wei; M Ringnér; L H Saal; M Ladanyi; F Westermann; F Berthold; M Schwab; C R Antonescu; C Peterson; P S Meltzer
Journal:  Nat Med       Date:  2001-06       Impact factor: 53.440

8.  Identifying combinations of cancer markers for further study as triggers of early intervention.

Authors:  S G Baker
Journal:  Biometrics       Date:  2000-12       Impact factor: 2.571

9.  Tumor classification by partial least squares using microarray gene expression data.

Authors:  Danh V Nguyen; David M Rocke
Journal:  Bioinformatics       Date:  2002-01       Impact factor: 6.937

10.  Cluster analysis and display of genome-wide expression patterns.

Authors:  M B Eisen; P T Spellman; P O Brown; D Botstein
Journal:  Proc Natl Acad Sci U S A       Date:  1998-12-08       Impact factor: 11.205

