Eun Jeong Min (1), Sandra E Safo (2), Qi Long (1)
(1) Department of Biostatistics, Epidemiology and Informatics, University of Pennsylvania, Philadelphia, PA, USA.
(2) Division of Biostatistics, University of Minnesota, Minneapolis, MN, USA.
Abstract
MOTIVATION: Co-inertia analysis (CIA) is a multivariate statistical method that assesses relationships and trends between two sets of data. Recently, CIA has been used for the integrative analysis of multiple high-dimensional omics data sets. However, in classical CIA all elements of the loading vectors are nonzero, which makes interpretation difficult when analyzing omics data. For other multivariate statistical methods, such as canonical correlation analysis (CCA) and partial least squares (PLS), various approaches have been proposed to produce sparse loading vectors via l1-penalization/constraint. We propose a novel CIA method that uses l1-penalization to induce sparsity in the estimated loading vectors, so that model fitting and variable selection are carried out simultaneously. We also propose a second CIA method that, in addition to the sparsity penalty, incorporates structure/network information, such as that from functional genomics, so that the results are biologically meaningful and interpretable.

RESULTS: Extensive simulations demonstrate that our proposed penalized CIA methods achieve the best or close to the best performance compared to the existing CIA method in terms of feature selection and recovery of the true loading vectors. We also apply our methods to the integrative analysis of gene expression data and protein abundance data from the NCI-60 cancer cell lines. This analysis identifies variables relevant to cancer and yields biologically meaningful results that are consistent with previous studies.

AVAILABILITY AND IMPLEMENTATION: Our algorithms are implemented as an R package, which is freely available at https://www.med.upenn.edu/long-lab/.

SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
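To make the idea of l1-penalized loading vectors concrete, the following is a minimal sketch and not the authors' algorithm or their R package. It assumes a simplified setting (uniform row weights, identity column metrics, first component only), in which the leading pair of loading vectors can be taken from the cross-covariance matrix of the two column-centred data sets. Sparsity is induced by alternating soft-thresholding, a generic technique from the sparse CCA/PLS literature. All function names, the toy data, and the penalty values are hypothetical and chosen by hand for illustration.

```python
import math


def soft_threshold(v, lam):
    """Shrink each entry toward zero by lam; entries within [-lam, lam] become 0."""
    return [x - lam if x > lam else x + lam if x < -lam else 0.0 for x in v]


def normalize(v):
    """Scale v to unit Euclidean norm (an all-zero vector is returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def cross_covariance(X, Y):
    """Column-centre X (n x p) and Y (n x q) and return X^T Y / n as a p x q list."""
    n = len(X)

    def centre(M):
        means = [sum(col) / n for col in zip(*M)]
        return [[x - m for x, m in zip(row, means)] for row in M]

    Xc, Yc = centre(X), centre(Y)
    p, q = len(X[0]), len(Y[0])
    return [[sum(Xc[i][j] * Yc[i][k] for i in range(n)) / n for k in range(q)]
            for j in range(p)]


def sparse_loadings(C, lam_a, lam_b, n_iter=100):
    """Alternate soft-thresholded power iterations on C (assumed sketch).

    With lam_a = lam_b = 0 this is plain power iteration for the leading
    singular vectors of C; larger penalties set small loadings exactly to zero.
    """
    p, q = len(C), len(C[0])
    a = [0.0] * p
    b = normalize([1.0] * q)  # simple deterministic starting point
    for _ in range(n_iter):
        Cb = [sum(C[j][k] * b[k] for k in range(q)) for j in range(p)]
        a = normalize(soft_threshold(Cb, lam_a))
        Cta = [sum(C[j][k] * a[j] for j in range(p)) for k in range(q)]
        b = normalize(soft_threshold(Cta, lam_b))
    return a, b


# Toy data: one shared signal column plus two small noise columns per data set.
signal = [-3.0, -2.0, -1.0, 1.0, 2.0, 3.0]
noise1 = [0.1, -0.1, 0.1, -0.1, 0.1, -0.1]
noise2 = [0.1, 0.1, -0.2, -0.2, 0.1, 0.1]
X = [list(row) for row in zip(signal, noise1, noise2)]
Y = [list(row) for row in zip(signal, noise2, noise1)]

C = cross_covariance(X, Y)
a, b = sparse_loadings(C, lam_a=0.5, lam_b=0.5)
print("sparse loadings for X:", a)
print("sparse loadings for Y:", b)
```

On this toy example the penalty removes the noise columns from both loading vectors, whereas setting both penalties to zero leaves small nonzero loadings on noise. In practice the penalty parameters would need to be tuned (for example by cross-validation), and the full CIA formulation with general row weights and column metrics is not covered by this sketch.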