Agnieszka Smolinska, Lionel Blanchet, Leon Coulier, Kirsten A M Ampt, Theo Luider, Rogier Q Hintzen, Sybren S Wijmenga, Lutgarde M C Buydens.
Abstract
BACKGROUND: In the last decade, data fusion has become widespread in the field of metabolomics. Linear data fusion is performed most commonly. However, many datasets display non-linear parameter dependences, and linear methods are bound to fail in such situations. We used proton Nuclear Magnetic Resonance (NMR) and Gas Chromatography-Mass Spectrometry (GC-MS), two well-established techniques, to generate metabolic profiles of cerebrospinal fluid from individuals with Multiple Sclerosis (MScl). These datasets represent non-linearly separable groups. Thus, to extract the relevant information and to combine the datasets, a special framework for data fusion is required.
Year: 2012 PMID: 22715376 PMCID: PMC3371049 DOI: 10.1371/journal.pone.0038163
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
The number of samples included in the training set and the independent test set.
| Group | NMR: Training | NMR: Test | NMR: Total | GC-MS: Training | GC-MS: Test | GC-MS: Total | Overlap NMR and GC-MS: Training | Overlap NMR and GC-MS: Test | Overlap NMR and GC-MS: Total |
| MScl | 19 | 7 | 26 | 18 | 6 | 24 | 7 | 5 | 12 |
| CIS | 15 | 5 | 20 | 10 | 4 | 14 | 7 | 3 | 10 |
Figure 1. Conceptual flowchart of kernel-based data fusion.
X1 and X2 are two blocks of data. *Note that all optimized parameters, i.e. the number of variables, σ for the RBF kernel, the coefficients µ and the number of latent variables (LVs), are kept fixed when the model is rebuilt using all available samples. The individual steps are described in the data analysis sections.
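The fusion step in the flowchart can be illustrated with a minimal sketch. It assumes a Gaussian RBF kernel of the form exp(-||x - y||² / (2σ²)) and a convex combination µ·K1 + (1 - µ)·K2 of the two block kernels; both forms are assumptions and may differ from the parameterization used in the study. The function names, the random stand-in data and the value µ = 0.5 are hypothetical.

```python
import numpy as np

def rbf_kernel(A, B, sigma):
    """Gaussian (RBF) kernel between the rows of A and the rows of B.
    Assumed form: exp(-||a - b||^2 / (2 * sigma^2))."""
    sq_dist = (
        np.sum(A**2, axis=1)[:, None]
        + np.sum(B**2, axis=1)[None, :]
        - 2.0 * A @ B.T
    )
    sq_dist = np.maximum(sq_dist, 0.0)  # guard against round-off below zero
    return np.exp(-sq_dist / (2.0 * sigma**2))

def fuse_kernels(K1, K2, mu):
    """Weighted sum of two kernel matrices (assumed convex combination);
    mu in [0, 1] is the weight given to K1."""
    return mu * K1 + (1.0 - mu) * K2

# Random stand-ins for the two data blocks; rows are the same samples.
rng = np.random.default_rng(0)
X1 = rng.normal(size=(6, 4))  # stand-in for block X1
X2 = rng.normal(size=(6, 3))  # stand-in for block X2

K1 = rbf_kernel(X1, X1, sigma=0.3)
K2 = rbf_kernel(X2, X2, sigma=0.3)
K = fuse_kernels(K1, K2, mu=0.5)  # mu = 0.5 is a hypothetical weight
```

Because each block is reduced to a samples-by-samples kernel matrix before fusion, blocks with different numbers of variables can be combined without concatenating the raw data.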
Figure 2. Representations of (a) the kernel mapping of data matrix X into kernel space; (b) the pseudo-sample principle in K-PLS-DA.
k indicates the range of pseudo-sample values (uniformly distributed); *Note that there are “p” pseudo-sample matrices and “p” kernel pseudo-sample matrices. **The ŷ-values can be projected into latent variable space. #Note that for kernel pseudo samples the loadings and the b vector of the K-PLS-DA model are used. ***These ŷ-values can be represented as “regression coefficients”, shown later in Figure 4, or as a loading plot, shown in Figure 5.
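The construction of the “p” pseudo-sample matrices and their kernel counterparts can be sketched as follows. This sketch assumes mean-centered data, pseudo samples that are zero in every variable except the one being probed, and the same assumed RBF form exp(-||x - y||² / (2σ²)); kernel centering and the projection through the K-PLS-DA loadings and b vector are omitted. All names and values are hypothetical.

```python
import numpy as np

def pseudo_sample_matrix(X, j, k=10):
    """k pseudo samples for variable j: uniformly spaced values over the
    observed range of column j, all other variables set to zero
    (zero is the mean when X is mean-centered)."""
    P = np.zeros((k, X.shape[1]))
    P[:, j] = np.linspace(X[:, j].min(), X[:, j].max(), k)
    return P

def rbf_kernel(A, B, sigma):
    """Assumed RBF form: exp(-||a - b||^2 / (2 * sigma^2))."""
    diff = A[:, None, :] - B[None, :, :]
    return np.exp(-np.sum(diff**2, axis=-1) / (2.0 * sigma**2))

# Random stand-in for a data block with n = 8 samples and p = 3 variables.
rng = np.random.default_rng(1)
X = rng.normal(size=(8, 3))
X = X - X.mean(axis=0)  # mean-center the columns

# One pseudo-sample matrix per variable, each mapped against the training samples.
pseudo = [pseudo_sample_matrix(X, j, k=10) for j in range(X.shape[1])]
kernel_pseudo = [rbf_kernel(P, X, sigma=0.5) for P in pseudo]
```

Each kernel pseudo-sample matrix has one row per pseudo sample and one column per training sample, so it can be passed through a fitted kernel model in the same way as a kernel matrix of new samples.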
Figure 4. The maximum absolute value of “regression coefficients” of original variables.
Figure 5. Loading plot of pseudo-sample trajectories for selected variables.
Numbers in the brackets correspond to variable numbers in Figure 4.
Figure 3. Schematic example of: (a) “regression coefficient” trajectories of the original variables plotted against their range; (b) the maximum absolute value of each “regression coefficient” trajectory shown in (a).
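The reduction from panel (a) to panel (b) amounts to taking, for each variable, the largest absolute value along its trajectory. A minimal sketch, using made-up trajectory values that are purely illustrative and not taken from the study:

```python
import numpy as np

# Rows: original variables; columns: positions along each variable's range.
# The numbers are hypothetical and serve only to illustrate the computation.
trajectories = np.array([
    [-0.10, 0.00, 0.20, 0.10],
    [0.50, 0.10, -0.70, -0.20],
    [0.05, 0.02, -0.01, 0.03],
])

# One summary value per variable: the maximum absolute trajectory value.
importance = np.max(np.abs(trajectories), axis=1)

# Variable indices ordered from most to least influential.
ranking = np.argsort(importance)[::-1]
```

Taking the absolute value before the maximum means that a variable with a strongly negative trajectory is ranked as highly as one with an equally strong positive trajectory.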
Summary of the σ parameter for the RBF kernel function.
| σ parameter at: | NMR | GC-MS |
| Step 1 (variable selection) | 0.5 | 0.55 |
| Step 3 (kernel fusion) | 0.3 | 0.3 |
An overview of prediction accuracy for the validation set using linear methods, non-linear methods and MKL.
| Correct classification rate | NMR | GC-MS | Fusion (NMR + GC-MS) |
| Linear method | 61% | 63% | 65% |
| Non-linear method | 93% | 85% | 89% |
| MKL | | | 100% |