S Deepaisarn, P D Tar, N A Thacker, A Seepujak, A W McMahon.
Abstract
Motivation: Matrix-assisted laser desorption/ionisation time-of-flight mass spectrometry (MALDI) facilitates the analysis of large organic molecules. However, the complexity of biological samples and of MALDI data acquisition leads to high levels of variation, making reliable quantification of samples difficult. We present a new analysis approach that we believe is well-suited to the properties of MALDI mass spectra, based upon an Independent Component Analysis derived for Poisson sampled data. Simple analyses have been limited to studying small numbers of mass peaks, via peak ratios, which is known to be inefficient. Conventional PCA and ICA methods have also been applied; these extract correlations between any number of peaks, but we argue that they make inappropriate assumptions regarding data noise, i.e. that it is uniform and Gaussian.
Year: 2018 PMID: 29091994 PMCID: PMC5860625 DOI: 10.1093/bioinformatics/btx630
Source DB: PubMed Journal: Bioinformatics ISSN: 1367-4803 Impact factor: 6.937
Fig. 1. Monte Carlo generated Bland-Altman plots showing behaviour of uniform independent Gaussian noise (left) and independent Poisson noise (right)
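The contrast drawn in Fig. 1 can be reproduced numerically. The following is a minimal sketch, not the authors' simulation: it draws paired repeat measurements at a few hypothetical mean intensities and reports the spread of their differences, which is the quantity a Bland-Altman plot displays on its y-axis. The intensity values, the Gaussian `sigma`, the seed and all function names are assumptions made for illustration.

```python
import math
import random
import statistics


def sample_poisson(lam, rng):
    """Draw one Poisson variate with mean lam (Knuth's multiplication method)."""
    threshold = math.exp(-lam)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


def sample_gaussian(mean, rng, sigma=5.0):
    """Draw one value with uniform (intensity-independent) Gaussian noise."""
    return rng.gauss(mean, sigma)


def difference_spread(draw, true_value, n_pairs, rng):
    """Standard deviation of differences between paired repeat measurements."""
    diffs = [draw(true_value, rng) - draw(true_value, rng) for _ in range(n_pairs)]
    return statistics.stdev(diffs)


rng = random.Random(0)  # fixed seed so the sketch is repeatable
intensities = [10, 40, 160]  # hypothetical mean peak intensities (counts)
gauss_spread = [difference_spread(sample_gaussian, m, 4000, rng) for m in intensities]
poisson_spread = [difference_spread(sample_poisson, m, 4000, rng) for m in intensities]

for m, g, p in zip(intensities, gauss_spread, poisson_spread):
    print(f"mean={m:4d}  Gaussian spread={g:6.2f}  Poisson spread={p:6.2f}")
```

Under these assumptions the Gaussian spread stays roughly constant across intensities, whereas the Poisson spread grows approximately as the square root of the mean, which is the behaviour that motivates a Poisson-specific noise model.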
Modelling options, with statistical and signal assumptions, available for varied data properties
| Model | Noise | Signal: orthogonal | Signal: coefficients |
|---|---|---|---|
| PCA | iid Gaussian | Yes | +/- |
| PCA with Anscombe | Poisson | Yes | +/- |
| ICA | iid Gaussian | No | +/- |
| ICA with Anscombe | Poisson | No | +/- |
| Non neg ICA | iid Gaussian | No | + only |
| Poisson ICA* | Poisson | No | + only |
| *MALDI data* | *Poisson* | *No* | *+ only* |
Note: Our selected method (*) is based upon assumptions matching the properties of MALDI mass spectra. PCA (Jolliffe, 1986), ICA (Comon, 1994), Non neg ICA (Plumbley and Oja, 2004; Plumbley, 2003), Poisson ICA* (Tar and Thacker, 2014). Italics emphasize that MALDI data have the properties listed in the row above, i.e. those assumed by the method we propose to use.
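The "with Anscombe" rows refer to applying a variance-stabilising transform before a Gaussian-noise method. As a minimal sketch of that idea, assuming the standard Anscombe form 2·sqrt(x + 3/8), the code below shows that Poisson counts with very different raw variances have close to unit variance after the transform. The means, sample size, seed and function names are illustrative assumptions and are not taken from the paper.

```python
import math
import random
import statistics


def sample_poisson(lam, rng):
    """Draw one Poisson variate with mean lam (Knuth's multiplication method)."""
    threshold = math.exp(-lam)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


def anscombe(count):
    """Anscombe transform: maps Poisson counts to roughly unit-variance values."""
    return 2.0 * math.sqrt(count + 3.0 / 8.0)


rng = random.Random(1)  # fixed seed so the sketch is repeatable
raw_var, stabilised_var = {}, {}
for lam in (20, 80):  # hypothetical mean counts
    counts = [sample_poisson(lam, rng) for _ in range(5000)]
    raw_var[lam] = statistics.variance(counts)
    stabilised_var[lam] = statistics.variance([anscombe(c) for c in counts])
    print(f"mean={lam}: raw variance={raw_var[lam]:.1f}, "
          f"after Anscombe={stabilised_var[lam]:.2f}")
```

The transform addresses the noise assumption only; it does not by itself restrict the coefficients to be positive, which is the other property the table lists for MALDI data.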
Fig. 2. Processing work-flow diagram
Fig. 3. Correlation with ground-truth via peak ratio analysis: milk mixtures using m/z 706.2 peak normalized to m/z 760.5 peak; lamb tissue mixtures using m/z 786.5 peak normalized to m/z 760.5 peak; white: grey matter using m/z 734.5. The x-axis shows the ground-truth mixing proportions. Each cross is a peak ratio estimate from a different spectrum, with repeatability data at each 10% increment. Deviations from the fitted line (least squares) show typical measurement accuracy
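The peak ratio analysis in Fig. 3 amounts to dividing one peak intensity by a reference peak intensity and regressing the result on the known mixing proportion. A minimal sketch follows, using the lamb tissue m/z values from the caption. The intensities are synthetic and noiseless, and the dictionary representation of a spectrum and the function names are assumptions made for illustration.

```python
def peak_ratio(spectrum, target_mz, reference_mz):
    """Intensity of the target peak normalized to the reference peak."""
    return spectrum[target_mz] / spectrum[reference_mz]


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Ground-truth mixing proportions at 10% increments.
proportions = [i / 10 for i in range(11)]

# Synthetic spectra (made-up intensities): the m/z 786.5 peak grows linearly
# with the proportion while the m/z 760.5 reference peak stays fixed.
spectra = [{786.5: 100.0 + 400.0 * p, 760.5: 1000.0} for p in proportions]

ratios = [peak_ratio(s, 786.5, 760.5) for s in spectra]
slope, intercept = fit_line(proportions, ratios)
print(f"slope={slope:.3f}, intercept={intercept:.3f}")
```

With real spectra each ratio carries noise from both peaks, and the scatter of the points about the fitted line is what the caption reads as typical measurement accuracy.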
Fig. 7. Left: Predictive ability of LPM error theory, as measured using pull distributions. A pull distribution should have a mean of zero and standard deviation of unity if predicted errors match observed errors. This is achieved in all variants of our experiments. Right: Measurement precision of peak ratio analysis versus LP-ICA analysis. Values are 1 standard deviation relative errors, expressed as percentage of quantity measurements. LP-ICA method is more precise in all experiments
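A pull is the difference between a measured and a true value divided by the predicted error on the measurement. The sketch below illustrates the check described in the caption on simulated Gaussian measurements rather than on LPM output: correctly predicted errors give pulls with a standard deviation near one, whereas errors underestimated by a factor of two give a standard deviation near two. All values and names are hypothetical.

```python
import random
import statistics


def pulls(measured, truth, predicted_errors):
    """(measured - truth) / predicted error, one value per measurement."""
    return [(m - t) / e for m, t, e in zip(measured, truth, predicted_errors)]


rng = random.Random(2)  # fixed seed so the sketch is repeatable
true_sigma = 3.0  # actual measurement noise (hypothetical)
truth = [50.0] * 5000  # hypothetical true quantity, repeated
measured = [rng.gauss(t, true_sigma) for t in truth]

# Case 1: the error model predicts the correct standard deviation.
good_pulls = pulls(measured, truth, [true_sigma] * len(truth))
# Case 2: the error model underestimates the noise by a factor of two.
bad_pulls = pulls(measured, truth, [true_sigma / 2.0] * len(truth))

good_mean = statistics.mean(good_pulls)
good_sd = statistics.stdev(good_pulls)
bad_sd = statistics.stdev(bad_pulls)
print(f"correct errors: mean={good_mean:.3f}, sd={good_sd:.3f}")
print(f"underestimated errors: sd={bad_sd:.3f}")
```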
Fig. 4. Bland-Altman plot showing behaviour of model residuals (y-axis) as a function of peak intensity (x-axis). Each point represents a residual between an LP-ICA modelled spectrum bin and the actual spectrum. The fitted curves (power law of Eq. 9) show ±1 standard deviation error as a function of peak intensity consistent with Poisson statistics
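Eq. 9 is not reproduced in this record, so the following sketch assumes a generic power law, sigma = a · I^b, and fits it by least squares in log-log space. For ideal Poisson statistics the residual spread equals the square root of the intensity, so the fit should return an exponent of 0.5. The intensity values and function name are illustrative assumptions.

```python
import math


def fit_power_law(intensities, sigmas):
    """Least-squares fit of log(sigma) = log(a) + b * log(I); returns (a, b)."""
    xs = [math.log(i) for i in intensities]
    ys = [math.log(s) for s in sigmas]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    exponent = sxy / sxx
    scale = math.exp(mean_y - exponent * mean_x)
    return scale, exponent


# Hypothetical peak intensities with the spread expected for ideal Poisson noise.
intensities = [10.0, 40.0, 160.0, 640.0]
sigmas = [math.sqrt(i) for i in intensities]

scale, exponent = fit_power_law(intensities, sigmas)
print(f"sigma = {scale:.3f} * I^{exponent:.3f}")
```

In practice the per-intensity spreads would be estimated from binned residuals, and an exponent that departs clearly from 0.5 would indicate noise that is not purely Poisson.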
Fig. 5. Determination of model order for LPMs. This curve shows the goodness-of-fit (Eq. 3 of Supplementary Appendix S1) of LP-ICA models as a function of the number of model components, where each component represents a sub-spectrum that is a mode of correlated spectral variation
Fig. 6. Composition of spectra in terms of weighted contributions of extracted LP-ICA components. Each 10% increment is shown as a step, where each step contains repeatability data for independent spectra with the same mixing proportions. The dots show the best-fitting trend. Each bar shows the relative proportion of each LP-ICA component present within a spectrum. The error bars are the Linear Poisson Model predicted errors. The components ‘comp 1’ etc. are listed in the keys from top to bottom in the same order as they appear in the figure