Christèle Robert-Granié, Kim-Anh Lê Cao, Magali Sancristobal.
Abstract
BACKGROUND: The aim of this work was to study the performance of two predictive statistical tools on a data set that was given to all participants of the Eadgene-SABRE Post Analyses Working Group, namely the Pig data set of Hazard et al. (2008). The data consisted of 3686 gene expressions measured on 24 animals partitioned into 2 genotypes and 2 treatments. The objective was to find, among the whole set of genes, biomarkers that characterized the genotypes and the treatments.
Year: 2009 PMID: 19615113 PMCID: PMC2712743 DOI: 10.1186/1753-6561-3-S4-S13
Source DB: PubMed Journal: BMC Proc ISSN: 1753-6561
6] proposed a sparse version of the PLS that combines variable selection and modelling in a one-step procedure for such problems. The sparse PLS (denoted sPLS) applies a Lasso penalty [7] to the loading vectors, with the PLS itself solved through a Singular Value Decomposition [8].
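To make the idea of penalizing loading vectors concrete, the following is a minimal sketch of a rank-one sparse decomposition of the cross-product matrix X'Y, in which soft-thresholding (the Lasso penalty) is applied to the X loading vector at each iteration. It is an illustration under stated assumptions, not the implementation used in the paper: the function names, the penalty `lam`, the choice to penalize only the X loadings, and the fixed number of iterations are all hypothetical, and only the first dimension is computed (no deflation).

```python
import math

def soft_threshold(values, lam):
    # Lasso-type shrinkage: move each value towards zero by lam, clipping at zero.
    return [math.copysign(max(abs(v) - lam, 0.0), v) for v in values]

def normalize(vec):
    # Scale to unit Euclidean norm; an all-zero vector is returned unchanged.
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm > 0 else vec

def sparse_loading(X, Y, lam, n_iter=100):
    """First pair of loading vectors (u for X, v for Y), with u made sparse.

    X is an n x p list of rows (e.g. animals x genes), Y an n x q list of rows.
    lam is a hypothetical penalty: larger values set more entries of u to zero.
    """
    n, p, q = len(X), len(X[0]), len(Y[0])
    # Cross-product matrix M = X'Y (p x q), whose leading singular vectors
    # give the unpenalized PLS loadings.
    M = [[sum(X[i][j] * Y[i][k] for i in range(n)) for k in range(q)]
         for j in range(p)]
    v = normalize([1.0] * q)
    u = [0.0] * p
    for _ in range(n_iter):
        # Penalized update of the X loadings: u <- soft_threshold(M v), normalized.
        Mv = [sum(M[j][k] * v[k] for k in range(q)) for j in range(p)]
        u = normalize(soft_threshold(Mv, lam))
        # Unpenalized update of the Y loadings: v <- M'u, normalized.
        v = normalize([sum(M[j][k] * u[j] for j in range(p)) for k in range(q)])
    return u, v

# Toy data (invented for illustration): 4 samples, 3 genes, 1 response.
X_toy = [[1.0, 0.1, 0.0], [-1.0, -0.1, 0.1], [1.0, 0.0, -0.1], [-1.0, 0.1, 0.0]]
Y_toy = [[1.0], [-1.0], [1.0], [-1.0]]
u_toy, v_toy = sparse_loading(X_toy, Y_toy, lam=0.5)
selected = [j for j, w in enumerate(u_toy) if w != 0.0]
```

In this sketch the genes whose entry in `u_toy` remains non-zero are the selected variables, which is how selection and modelling happen in one step; setting `lam=0` keeps every gene and reduces to the ordinary first loading vector.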
Figure 1. Comparison of significance level (-log10 of the p-value in the differential analysis) with the importance measure of Random Forest. The genes above the horizontal line are differentially expressed genes (t test), whereas the genes on the right-hand side of the vertical line are declared as most important and highly predictive by Random Forest.
Figure 2. Heat map displays of the hierarchical clustering results. The light (dark) colour represents over-expressed (under-expressed) genes. The clusterings were performed with the Ward method and Euclidean distance on the 50 genes selected with Random Forest. Genes are displayed in rows and individuals in columns.
Figure 3. Graphical representation of individuals with the two latent variables associated with the X data set. The first axis (first latent variable) separates the two genotypes, while the second opposes the treatments.
Figure 4. Graphical representation of genes selected with sPLS and their correlation. Genes clustered together are highly correlated with one another. This figure can be combined with the interpretation of Figure 3: the genes in dark colour are predictive of the genotype effect (first axis) and the genes in red are linked with the treatment (second axis).