Said El Bouhaddani, Jeanine Houwing-Duistermaat, Perttu Salo, Markus Perola, Geurt Jongbloed, Hae-Won Uh.
Abstract
BACKGROUND: Rapid computational and technological developments have made large amounts of omics data available at different biological levels. It is becoming clear that simultaneous data analysis methods are needed for better interpretation and understanding of the underlying systems biology. Different methods have been proposed for this task, among them Partial Least Squares (PLS) related methods. To also deal with orthogonal variation, that is, systematic variation in one data set that is unrelated to the other, we consider Two-way Orthogonal PLS (O2PLS): an integrative data analysis method that is capable of modeling systematic variation while providing more parsimonious models that aid interpretation.
Year: 2016 PMID: 26822911 PMCID: PMC4959391 DOI: 10.1186/s12859-015-0854-z
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Simulation parameter choices. The loading value for variable i is the density value of a normal distribution with mean μ and standard deviation σ, denoted as N(i;μ,σ). The noise terms were drawn from a normal distribution with zero mean. The scores were drawn from a standard normal distribution. The variances of the noise terms are such that the expected sum of squares of the noise accounts for 100α % (equal to 5 or 50 %) of the total sum of squares.
| Parameter | ‘Low’-dimensional case | ‘Higher’-dimensional case |
|---|---|---|
| Number of samples | 500 | 500 |
| Number of variables [X, Y] | [100,50] | [500,250] |
| | [ | [ |
| | [ | [ |
| | [ | [ |
| | [ | [ |
| | 2 | 2 |
| | [1,1,1] | [1,1,1] |
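The noise calibration described in the table caption can be sketched as follows. This is a simplified, single-component illustration and not the authors' simulation code: the function names and the values of `mu` and `sigma` are assumptions chosen for the example, and the joint and orthogonal parts of the actual study are not modeled here.

```python
import numpy as np

def normal_density(i, mu, sigma):
    """Density of a normal distribution with mean mu and sd sigma, evaluated at i."""
    return np.exp(-0.5 * ((i - mu) / sigma) ** 2) / (sigma * np.sqrt(2.0 * np.pi))

def simulate_block(n_samples, n_vars, mu, sigma, alpha, rng):
    """Simulate one data block: a single component plus zero-mean Gaussian noise.

    The noise variance is chosen so that the expected noise sum of squares is a
    fraction alpha of the expected total sum of squares.
    """
    index = np.arange(1, n_vars + 1)
    loading = normal_density(index, mu, sigma)   # loading profile over variables
    scores = rng.standard_normal(n_samples)      # scores ~ N(0, 1)
    signal = np.outer(scores, loading)           # n_samples x n_vars

    # E[signal SS] = n * sum(loading^2), because E[score^2] = 1.
    # E[noise SS]  = n * p * noise_var.
    # Solving noise / (signal + noise) = alpha for noise_var:
    expected_signal_ss = n_samples * np.sum(loading ** 2)
    noise_var = alpha / (1.0 - alpha) * expected_signal_ss / (n_samples * n_vars)
    noise = rng.normal(0.0, np.sqrt(noise_var), size=(n_samples, n_vars))
    return signal + noise, signal, noise

# Hypothetical settings: 500 samples, 100 variables, 50 % noise.
rng = np.random.default_rng(1)
X, signal, noise = simulate_block(500, 100, mu=50.0, sigma=10.0, alpha=0.5, rng=rng)
noise_fraction = np.sum(noise ** 2) / (np.sum(signal ** 2) + np.sum(noise ** 2))
print(f"realized noise fraction: {noise_fraction:.3f}")
```

The realized fraction fluctuates around α from one draw to the next, because the calibration is stated in terms of expected sums of squares.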
Gene composition of the LL module identified by Inouye et al.
| Gene annotation | Illumina ID |
|---|---|
| C1ORF186 | ILMN_1690209 |
| CPA3 | ILMN_1766551 |
| ENPP3 | ILMN_1749131 |
| FCER1A | ILMN_1688423 |
| GATA2 | ILMN_2102670 |
| HDC | ILMN_1792323 |
| HS.132563 | ILMN_1899034 |
| MS4A2 | ILMN_1806721 |
| SLC45A3 | ILMN_1726114 |
| SPRYD5 | ILMN_1753648 |
| CACNG6 | ILMN_1779043 |
Fig. 1 Simulation: low dimensions, little noise. Boxplots of 1000 simulations in which X (upper row) contains 500 samples and 100 variables, and Y (lower row) contains 500 samples and 50 variables. Noise accounted for 5 % of the total variation. The first column corresponds to the joint part, the second column depicts the orthogonal part. The red line denotes the true loading profile
Fig. 2 Simulation: high dimensions, little noise. Boxplots of 1000 simulations in which X (upper row) contains 500 samples and 500 variables, and Y (lower row) contains 500 samples and 250 variables. Noise accounted for 5 % of the total variation. The first column corresponds to the joint part, the second column depicts the orthogonal part. The red line denotes the true loading profile
Fig. 3 Simulation: low dimensions, high noise. Boxplots of 1000 simulations in which X contains 500 samples and 100 variables, and Y contains 500 samples and 50 variables. Noise accounted for 50 % of the total variation. The first column corresponds to the joint part, the second column depicts the orthogonal part. The red line denotes the true loading profile
Fig. 4 Simulation: high dimensions, high noise. Boxplots of 1000 simulations in which X contains 500 samples and 500 variables, and Y contains 500 samples and 250 variables. Noise accounted for 50 % of the total variation. The first column corresponds to the joint part, the second column depicts the orthogonal part. The red line denotes the true loading profile
Fig. 5 Pearson correlation heatmap of metabolites. Red indicates high positive correlation, green indicates little correlation and blue indicates high negative correlation. The variables are in the original order. A histogram of correlations is added in the top left corner
Absolute and relative variations in O2PLS
| Joint components | | | | | | | | |
|---|---|---|---|---|---|---|---|---|
| 1 | 57.97 | 50.81 | 46.31 | 1.37 | 26.74 | 0.80 | 57.74 | 58.55 |
| 2 | 67.94 | 53.40 | 60.80 | 4.24 | 29.52 | 1.45 | 48.55 | 34.25 |
| 3 | 74.08 | 54.79 | 68.99 | 7.35 | 26.70 | 2.00 | 38.69 | 27.23 |
| 4 | 78.06 | 55.62 | 72.94 | 9.63 | 29.23 | 2.40 | 40.07 | 24.87 |
| 5 | 80.93 | 56.69 | 76.51 | 11.30 | 29.81 | 3.32 | 38.97 | 29.43 |
The amount of variation per model statistic with respect to the total amount of variation, from an O2PLS fit using Metabolomics (X) and Transcriptomics (Y). The R² values (definition given in last row) are shown in percentages (with respect to the total variation in X and Y respectively) for each model statistic. The numbers of orthogonal components are 1 for X and 8 for Y. The number of joint components varies from 1 to 5. The first row was found best according to the proposed alternative cross-validation (as in Section “Methods”)
Absolute and relative variations of the scores and noise in O2PLS
| | X | X | X | Y | Y | Y | U |
|---|---|---|---|---|---|---|---|
| Absolute | 2551 | 642 | 2316 | 3852 | 138502 | 137837 | 2061 |
| Relative | 46.3 % | 11.7 % | 42.0 % | 1.4 % | 49.4 % | 49.2 % | 53.5 % |
The sum of squares per model part in an O2PLS fit using Metabolomics (X) and Transcriptomics (Y). Absolute quantities as well as percentages with respect to the total variation in X (first three), Y (second three) and U (last one) are shown
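As a quick consistency check, the "Relative" row can be recomputed from the "Absolute" row. The sketch below uses only the numbers in the table; the helper name `relative_shares` is hypothetical. The choice of denominator for the U column (the first Y entry, 3852) is an assumption that happens to reproduce the reported 53.5 %, not something the caption states.

```python
# Absolute sums of squares copied from the table (first three: X, second three: Y).
absolute = {
    "X": [2551, 642, 2316],
    "Y": [3852, 138502, 137837],
}

def relative_shares(parts):
    """Percentage of each part with respect to the sum of all parts, to one decimal."""
    total = sum(parts)
    return [round(100.0 * p / total, 1) for p in parts]

x_shares = relative_shares(absolute["X"])
y_shares = relative_shares(absolute["Y"])

# Assumption: the last column (2061) is expressed relative to the first Y entry (3852).
u_share = round(100.0 * 2061 / absolute["Y"][0], 1)

print(x_shares)  # [46.3, 11.7, 42.0]
print(y_shares)  # [1.4, 49.4, 49.2]
print(u_share)   # 53.5
```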
Fig. 6 Scatterplot of joint score vectors. The first joint score vectors (T, U) obtained from an O2PLS fit using Metabolomics (represented by T) and Transcriptomics (represented by U) are plotted against each other. The slope of the fitted line is 0.84; the intercept is zero due to the mean centering of the data. The coefficient of determination R² was 0.47
Fig. 7 Labeled joint metabolomic loading plot. Four groups of interest are labeled: very-low-density-lipoproteins, high-density-lipoproteins, mobile lipids and amino acids
Fig. 8 O2PLS transcriptomic joint loadings. Joint part O2PLS loadings per gene expression. The top ten gene expressions are in black and green. The LL module gene expressions are in red and green. Four of the eleven gene expressions in the LL module are in the top ten, indicated in green. The loadings for five other gene expressions in the top ten and the loadings for the LL module gene expressions have opposite sign
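The remark that the intercept is zero because of mean centering can be illustrated on synthetic data. This is a minimal sketch with made-up scores, not the study data: the generating slope of 0.8, the sample size and the function name `fit_centered_line` are all assumptions for the example.

```python
import numpy as np

def fit_centered_line(t, u):
    """Least-squares line of u on t after mean centering both vectors.

    Returns (slope, intercept, r_squared).
    """
    t = t - t.mean()
    u = u - u.mean()
    slope, intercept = np.polyfit(t, u, 1)   # coefficients, highest degree first
    r_squared = np.corrcoef(t, u)[0, 1] ** 2
    return slope, intercept, r_squared

# Synthetic scores: u depends linearly on t, plus noise (hypothetical values).
rng = np.random.default_rng(0)
t = rng.standard_normal(200) + 3.0           # deliberately not centered
u = 0.8 * t + rng.standard_normal(200)

slope, intercept, r_squared = fit_centered_line(t, u)
print(f"slope={slope:.2f}, intercept={intercept:.1e}, R^2={r_squared:.2f}")
```

With an intercept term in the model, the fitted line always passes through the point of means, so centering both vectors places that point at the origin and the intercept vanishes up to floating-point error.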
LL module and top 10 gene expressions. Identified gene expressions in the top 10 most important variables for the joint variation in the transcriptome. The corresponding genes are shown. Four gene expressions fall into the previously identified Lipid-Leukocyte (LL) module
| Gene annotation | Illumina ID | Module |
|---|---|---|
| CPA3 | ILMN_1766551 | LL and top 10 |
| FCER1A | ILMN_1688423 | LL and top 10 |
| GATA2 | ILMN_2102670 | LL and top 10 |
| HDC | ILMN_1792323 | LL and top 10 |
| DEFA1B | ILMN_1725661 | top 10 |
| DEFA1B | ILMN_1679357 | top 10 |
| DEFA1B | ILMN_2102721 | top 10 |
| SNORD13 | ILMN_1892403 | top 10 |
| DEFA3 | ILMN_2165289 | top 10 |
| IFIT1 | ILMN_1707695 | top 10 |
Fig. 9 O2PLS metabolomic orthogonal loadings. Orthogonal part loadings obtained from an O2PLS fit with Metabolomics and Transcriptomics. One orthogonal component in metabolomics was identified
Fig. 10 O2PLS transcriptomic orthogonal loadings. Orthogonal part O2PLS loadings per gene expression. There were eight orthogonal components identified. The ratio of the first part sum of squares to the last part sum of squares is approximately eleven