Statistical integration of two omics datasets using GO2PLS.

Zhujie Gu, Said El Bouhaddani, Jiayi Pei, Jeanine Houwing-Duistermaat, Hae-Won Uh.

Abstract

BACKGROUND: Multiple omics datasets are now commonly measured on the same samples, on the premise that each represents a different aspect of the underlying biological system. Integrating these datasets facilitates understanding of that system. Various methods have been proposed for this purpose, such as Partial Least Squares (PLS), which decomposes two datasets into joint and residual subspaces. Because omics data are heterogeneous, the joint components in PLS will also contain variation specific to each dataset. To account for this, Two-way Orthogonal Partial Least Squares (O2PLS) captures the heterogeneity by introducing orthogonal subspaces, which yields better estimates of the joint subspaces. However, the latent components spanning the joint subspaces in O2PLS are linear combinations of all variables, whereas it is often of interest to identify a small subset of variables relevant to the research question. To obtain sparsity, we extend O2PLS to Group Sparse O2PLS (GO2PLS), which utilizes biological information on group structures among variables and performs group selection in the joint subspace.
RESULTS: The simulation study showed that introducing sparsity improved the feature selection performance. Furthermore, incorporating group structures increased robustness of the feature selection procedure. GO2PLS performed optimally in terms of accuracy of joint score estimation, joint loading estimation, and feature selection. We applied GO2PLS to datasets from two studies: TwinsUK (a population study) and CVON-DOSIS (a small case-control study). In the first, we incorporated biological information on the group structures of the methylation CpG sites when integrating the methylation dataset with the IgG glycomics data. The targeted genes of the selected methylation groups turned out to be relevant to the immune system, in which the IgG glycans play important roles. In the second, we selected regulatory regions and transcripts that explained the covariance between regulomics and transcriptomics data. The corresponding genes of the selected features appeared to be relevant to heart muscle disease.
CONCLUSIONS: GO2PLS integrates two omics datasets to help understand the underlying system that involves both omics levels. It incorporates external group information and performs group selection, resulting in a small subset of features that best explain the relationship between two omics datasets for better interpretability.
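
To make the idea of group selection in the joint subspace more concrete, the following is a minimal sketch, not the authors' implementation. It assumes a single joint component, centered data matrices X and Y, and a group-lasso style soft-thresholding step applied only to the X-loadings inside an alternating power iteration on X^T Y; the orthogonal (data-specific) parts that O2PLS and GO2PLS also model are left out. The function names, the penalty parameter `lam`, and the simulated data are illustrative assumptions.

```python
import numpy as np


def group_soft_threshold(w, groups, lam):
    """Shrink each group's sub-vector by lam in Euclidean norm.

    Groups whose norm is at most lam are set to zero as a whole,
    which is what produces selection at the group level.
    """
    w = np.asarray(w, dtype=float)
    groups = np.asarray(groups)
    out = np.zeros_like(w)
    for g in np.unique(groups):
        idx = groups == g
        norm = np.linalg.norm(w[idx])
        if norm > lam:
            out[idx] = (1.0 - lam / norm) * w[idx]
    return out


def sparse_joint_component(X, Y, groups_x, lam, n_iter=100, tol=1e-8):
    """Estimate one pair of joint loading vectors (w for X, c for Y).

    Alternates between updating w (with group thresholding) and c,
    starting from the leading singular vectors of X^T Y.
    Hypothetical helper for illustration only.
    """
    S = X.T @ Y
    U, _, Vt = np.linalg.svd(S, full_matrices=False)
    w, c = U[:, 0], Vt[0]
    for _ in range(n_iter):
        w_new = group_soft_threshold(S @ c, groups_x, lam)
        w_norm = np.linalg.norm(w_new)
        if w_norm == 0:
            raise ValueError("lam is too large: every group was removed")
        w_new = w_new / w_norm
        c_new = S.T @ w_new
        c_new = c_new / np.linalg.norm(c_new)
        converged = np.linalg.norm(w_new - w) < tol
        w, c = w_new, c_new
        if converged:
            break
    return w, c


if __name__ == "__main__":
    # Simulated example: 15 X-variables in 3 groups of 5; only group 0
    # shares a latent component with the 4 Y-variables.
    rng = np.random.default_rng(0)
    n = 200
    t = rng.normal(size=n)
    w_true = np.r_[np.ones(5), np.zeros(10)]
    X = np.outer(t, w_true) + 0.5 * rng.normal(size=(n, 15))
    Y = np.outer(t, np.ones(4)) + 0.5 * rng.normal(size=(n, 4))
    groups = np.repeat([0, 1, 2], 5)

    w, c = sparse_joint_component(X, Y, groups, lam=200.0)
    selected = np.unique(groups[w != 0])
    print("selected groups:", selected)
```

In this sketch the penalty `lam` is on the scale of X^T Y, so it would need to be tuned (for example by cross-validation) for real data; the joint scores would then be computed as `X @ w` and `Y @ c`.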

Keywords:  Dimension reduction; Feature selection; Group structure; Integration of Omics data; O2PLS

Year:  2021        PMID: 33736604      PMCID: PMC7977326          DOI: 10.1186/s12859-021-03958-3

Source DB:  PubMed          Journal:  BMC Bioinformatics        ISSN: 1471-2105            Impact factor:   3.169
