Said El Bouhaddani, Hae-Won Uh, Geurt Jongbloed, Caroline Hayward, Lucija Klarić, Szymon M Kiełbasa, Jeanine Houwing-Duistermaat.
Abstract
BACKGROUND: With the exponential growth in available biomedical data, there is a need for data integration methods that can extract information about relationships between data sets. However, these data sets might have very different characteristics. For interpretable results, data-specific variation needs to be quantified. For this task, Two-way Orthogonal Partial Least Squares (O2PLS) has been proposed. To facilitate application and development of the methodology, free and open-source software is required. However, such software has not been available for O2PLS.
Keywords: Data-specific variation; Joint principal components; O2PLS; Omics data integration; R package
Year: 2018 PMID: 30309317 PMCID: PMC6182835 DOI: 10.1186/s12859-018-2371-3
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Fig. 1. Workflow of the OmicsPLS package. First, each data set is pre-processed. Second, O2PLS is used to decompose each data set into joint, specific and residual parts. Finally, the output is visualized and interpreted
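For orientation, the decomposition into joint, specific and residual parts is commonly written as below. The notation is an assumption based on the standard O2PLS model, not something stated on this page: T and U are taken to be joint scores, W and C joint loadings, the terms with a ⊥ subscript the data-specific parts, and E and F the residuals.

```latex
\begin{aligned}
X &= \underbrace{T W^\top}_{\text{joint}}
   + \underbrace{T_\perp P_\perp^\top}_{\text{specific}}
   + \underbrace{E}_{\text{residual}}, \\
Y &= \underbrace{U C^\top}_{\text{joint}}
   + \underbrace{U_\perp Q_\perp^\top}_{\text{specific}}
   + \underbrace{F}_{\text{residual}}.
\end{aligned}
```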
Fig. 2. Eigenvalues of the covariance matrices of the genetic and glycan data. The relative contribution of each eigenvalue towards the sum of all eigenvalues is shown for the Genetic PCs (panel a) and IgG1 glycan data (panel b), and their covariance (panel c), respectively
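The relative contributions described in this caption can be sketched as follows. This is a minimal illustration using NumPy, not the authors' code: the function names are hypothetical, and using the singular values of the cross-covariance matrix for the "covariance" panel is an assumption about how that panel was computed.

```python
import numpy as np


def relative_eigenvalues(data):
    """Eigenvalues of the sample covariance of the columns of `data`,
    sorted in descending order and divided by their sum."""
    centered = data - data.mean(axis=0)
    cov = centered.T @ centered / (data.shape[0] - 1)
    eigvals = np.linalg.eigvalsh(cov)[::-1]   # eigvalsh returns ascending order
    eigvals = np.clip(eigvals, 0.0, None)     # remove tiny negative round-off
    return eigvals / eigvals.sum()


def relative_cross_covariance_spectrum(x, y):
    """Singular values of the cross-covariance of x and y, divided by their sum.

    Assumption: singular values play the role of 'eigenvalues' for the
    non-symmetric matrix X^T Y.
    """
    xc = x - x.mean(axis=0)
    yc = y - y.mean(axis=0)
    cross_cov = xc.T @ yc / (x.shape[0] - 1)
    sing_vals = np.linalg.svd(cross_cov, compute_uv=False)  # descending order
    return sing_vals / sing_vals.sum()


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.normal(size=(50, 4))  # toy stand-in for the genetic PCs
    y = rng.normal(size=(50, 3))  # toy stand-in for the glycan data
    print(relative_eigenvalues(x))
    print(relative_cross_covariance_spectrum(x, y))
```

Plotting these fractions against the component index gives a scree-type plot of the kind the caption describes.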
Fig. 3. Genetic-Glycan joint principal components obtained with the OmicsPLS R package. Loading values of each IgG1 glycan variable are depicted per component (panels a-e). The colors and shapes represent the biological grouping of the glycans. In the last row and column, a graphical representation of the structure of a particular glycan is shown (panel f)
Top 5 genes and loading values of the Genetic-Glycan joint principal components
| Component 1: ‘average’ glycan | | Component 2: ‘fucosylation’ | | Component 3: ‘galactosylation’ | |
|---|---|---|---|---|---|
| Gene symbol | Loading value | Gene symbol | Loading value | Gene symbol | Loading value |
| DNAJC10 | -0.0929 | FUT8 | -0.0844 | MTO1 | 0.0875 |
| ARID3B | -0.0880 | LGALS8 | -0.0781 | AKAP9 | -0.0627 |
| ZNF502 | 0.0756 | LDB3 | 0.0766 | MRPL33 | -0.0622 |
| TBC1D13 | 0.0611 | ARID3B | -0.0701 | MYLPF | 0.0562 |
| ZC2HC1C | 0.0601 | LCE2D | -0.0677 | POLR2F | 0.0554 |
The results are displayed per component. Only the first three components are shown
Simulation results for OmicsPLS and r.jive: inner products
| | OmicsPLS | r.jive |
|---|---|---|
| X joint | 0.88 (0.09) | 0.88 (0.09) |
| X specific | 0.79 (0.08) | 0.78 (0.09) |
| Y joint | 0.85 (0.08) | 0.85 (0.08) |
| Y specific | 0.93 (0.013) | 0.92 (0.014) |
These results are for p=q=100, based on one thousand generated replicates. Values are the median (MAD) of the absolute inner products between true and estimated loading vectors for O2PLS and JIVE. Higher values indicate better agreement with the true loadings. The results are very similar for high-dimensional data (p=q=10^4)
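The summary statistic in this footnote can be sketched in a few lines of standard-library Python. This is an illustration under stated assumptions, not the simulation code behind the table: the function names are hypothetical, loading vectors are normalised to unit length before comparison, the MAD is the unscaled median absolute deviation (some software applies a consistency constant by default), and the toy data generator is invented purely for demonstration.

```python
import math
import random
import statistics


def unit(v):
    """Scale a vector to unit Euclidean length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def abs_inner_product(true_loading, est_loading):
    """Absolute inner product of two loading vectors after normalisation.

    The absolute value removes the sign ambiguity of loading vectors:
    1 means the same direction (up to sign), 0 means orthogonal.
    """
    a, b = unit(true_loading), unit(est_loading)
    return abs(sum(x * y for x, y in zip(a, b)))


def median_and_mad(values):
    """Median and unscaled median absolute deviation of a list of numbers."""
    values = list(values)
    med = statistics.median(values)
    mad = statistics.median([abs(v - med) for v in values])
    return med, mad


def toy_simulation(n_replicates=200, p=20, noise=0.3, seed=1):
    """Hypothetical stand-in for a simulation study: each 'estimate' is the
    true loading plus Gaussian noise, with a random sign flip."""
    rng = random.Random(seed)
    true_loading = unit([rng.gauss(0, 1) for _ in range(p)])
    scores = []
    for _ in range(n_replicates):
        sign = rng.choice([-1, 1])
        estimate = [sign * (w + rng.gauss(0, noise / math.sqrt(p)))
                    for w in true_loading]
        scores.append(abs_inner_product(true_loading, estimate))
    return median_and_mad(scores)


if __name__ == "__main__":
    med, mad = toy_simulation()
    print(f"median (MAD) absolute inner product: {med:.2f} ({mad:.3f})")
```

Taking the absolute value matters because a loading vector and its negative describe the same component, so an estimate with a flipped sign should still count as full agreement.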
Performance comparison of OmicsPLS and r.jive: median (MAD) total elapsed time in seconds across 1000 replicates, and convergence across 100 runs
| | CPU time (sec) | | Convergence (%) | |
|---|---|---|---|---|
| Dimensions | OmicsPLS | r.jive | OmicsPLS | r.jive |
| Low ( | 0.04 (0.007) | 14 (2.8) | 100 | 9 |
| High ( | 18 (4.1) | 132 (16) | 100 | 8 |
For the convergence, the heterogeneity scenario U=10T was used