Kirill Veselkov, Jonathan Sleeman, Emmanuelle Claude, Johannes P C Vissers, Dieter Galea, Anna Mroz, Ivan Laponogov, Mark Towers, Robert Tonge, Reza Mirnezami, Zoltan Takats, Jeremy K Nicholson, James I Langridge.
Abstract
Mass Spectrometry Imaging (MSI) holds significant promise for augmenting digital histopathologic analysis by generating highly robust big data about the metabolic, lipidomic and proteomic molecular content of samples. In the process, it produces a vast quantity of unrefined data, which can amount to several hundred gigabytes per tissue section. Managing, analysing and interpreting this data is a significant challenge and a major barrier to the translational application of MSI. Existing data analysis solutions for MSI rely on a set of heterogeneous bioinformatics packages that do not scale to the reproducible processing of large biological sample sets (hundreds to thousands of samples). Here, we present a computational platform (pyBASIS) capable of optimized and scalable processing of MSI data for improved information recovery and comparative analysis across tissue specimens using machine learning and related pattern recognition approaches. The proposed solution also provides a means of seamlessly integrating experimental laboratory data with downstream bioinformatics interpretation and analysis, resulting in a truly integrated system for translational MSI.
Year: 2018 PMID: 29511258 PMCID: PMC5840264 DOI: 10.1038/s41598-018-22499-z
Source DB: PubMed Journal: Sci Rep ISSN: 2045-2322 Impact factor: 4.379
Figure 1. The translational data analytics pipeline for large-scale MS imaging data in clinical research settings. Incorporating large-scale MSI-derived data into conventional patient phenotyping approaches will require upstream handling and assimilation of multi-source, heterogeneous inputs and subsequent downstream generation of clinically relevant biological information. Linking these two steps requires a reproducible and robust bioinformatics pipeline that can seamlessly pre-process and analyse large-scale MSI datasets. A fundamental facet of this pipeline will be its transparency and computational consistency: all pre-processing workflows and related metadata will be registered and stored with open access. Here we introduce the pyBASIS (Bioinformatics for mAss Spectrometry Imaging in augmented Systems pathology) computational package, which aims to address these requirements. The module icons displayed in this diagram were obtained from Flaticon under a free license.
Figure 2. Linear O(N) performance of each developed pipeline module. The plots show processing time as a function of the number of MALDI/DESI-MSI sample datasets acquired on SYNAPT G2-Si platforms (0.5 GB of peak-picked data, ~2000 peaks, per sample; ~100 GB of data for 200 samples). All processing was done on a single core/thread of a standalone workstation PC (8-core Intel® Xeon® E5-2630 v3 @ 2.4 GHz, 64 GB RAM, 3 TB HDD). Per-module processing times exclude data import and export; these are included in the Total plot.
Figure 3. Integrated bioinformatics pipeline (pyBASIS) operating within a Symphony™ environment for optimised processing and analysis of large-scale MSI datasets. Inter-sample normalisation is illustrated by the creation of individual sample profiles (A), followed by derivation of sample-specific normalisation factors (B) and scaling of all spectral intensities using the derived normalisation factors (C). The individual pipeline steps (1–6) are described in detail in the main text.
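The three normalisation steps (A), (B) and (C) can be sketched as follows. This is a minimal illustration under stated assumptions, not the pyBASIS implementation: the sample profile is taken as the mean spectrum over all pixels of a sample, and the normalisation factor is the median fold change of that profile against a reference profile (the peak-wise median across samples), in the style of probabilistic quotient normalisation. The function names are hypothetical.

```python
import numpy as np


def sample_profile(intensities: np.ndarray) -> np.ndarray:
    """(A) Summarise one sample (pixels x peaks) as a single spectral profile.

    Assumption: the profile is the mean intensity of each peak over all pixels.
    """
    return intensities.mean(axis=0)


def normalisation_factors(profiles: np.ndarray) -> np.ndarray:
    """(B) Derive one factor per sample from a (samples x peaks) profile matrix.

    Assumption: factor = median fold change of the sample profile relative to
    a reference profile, here the peak-wise median across all samples.
    """
    reference = np.median(profiles, axis=0)
    factors = []
    for profile in profiles:
        # Only peaks detected in both the sample and the reference contribute.
        mask = (profile > 0) & (reference > 0)
        factors.append(np.median(profile[mask] / reference[mask]))
    return np.asarray(factors)


def normalise(samples: list, factors: np.ndarray) -> list:
    """(C) Scale every spectral intensity in each sample by its own factor."""
    return [s / f for s, f in zip(samples, factors)]
```

Using the median of peak-wise ratios, rather than a total-ion-count sum, keeps each factor robust to a minority of peaks that genuinely differ between samples.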
Figure 4. Impact of variance-stabilizing transformation on information recovery via unsupervised PCA-based analysis.
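To illustrate why a variance-stabilising step matters before PCA, the sketch below compares PCA on raw intensities with PCA after a log transform. The log(x + offset) transform and its offset are assumptions chosen for illustration; the caption does not state which transformation pyBASIS applies, and `log_vst` and `pca` are hypothetical helper names. Without stabilisation, the most intense peaks carry almost all of the variance and dominate the first principal component.

```python
import numpy as np


def log_vst(x: np.ndarray, offset: float = 1.0) -> np.ndarray:
    """Assumed simple variance-stabilising transform: log(x + offset).

    Compresses high-intensity peaks so that their variance no longer
    swamps that of low-intensity peaks.
    """
    return np.log(x + offset)


def pca(x: np.ndarray, n_components: int = 2):
    """PCA via SVD of the mean-centred matrix (rows = spectra, cols = peaks).

    Returns component scores, loadings and explained-variance ratios.
    """
    centred = x - x.mean(axis=0)
    u, s, vt = np.linalg.svd(centred, full_matrices=False)
    scores = u[:, :n_components] * s[:n_components]
    explained = s**2 / np.sum(s**2)
    return scores, vt[:n_components], explained[:n_components]
```

On simulated spectra where one peak is about 1000 times more intense than the others, the first component of the raw data is essentially that single peak, whereas after the log transform the explained variance is spread across several peaks.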
Figure 5. Unsupervised analysis of positive ionisation mode MALDI-MSI imaging datasets from mouse models of breast cancer, acquired on a Waters Synapt G2-Si mass spectrometer. The upper row shows four control samples taken from healthy animals, with the highlighted regions indicating healthy tissue; the lower row shows solid tumour tissue with minimal (if any) stromal tissue. (A) PCA-driven unsupervised analysis of the MALDI-MSI data, following the optimized pre-processing strategy, separates stromal tissue (yellow/red) from cancerous tissue (white) in mouse mammary cancer. (B) Representative spectral profiles from mammary gland control and tumour specimens. Inset: tentative example identifications.
Figure 6. The PCA-driven unsupervised analysis of large-scale DESI-MSI data following the optimized pre-processing strategy separates stromal tissue (yellow) from cancerous tissue (white/grey) in colorectal cancer.
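The PCA score maps in Figures 5 and 6 rest on projecting each pixel spectrum onto principal components and folding the scores back into the image grid. A minimal sketch, assuming a dense, already pre-processed (rows, cols, peaks) intensity cube, and assuming the first three component scores are min-max scaled to [0, 1] for use as colour channels. The colour mapping used in the published figures may differ, and `pca_score_image` is a hypothetical name.

```python
import numpy as np


def pca_score_image(cube: np.ndarray, n_components: int = 3) -> np.ndarray:
    """Map per-pixel PCA scores of an MSI cube back onto the image grid.

    cube: (rows, cols, peaks) intensities; requires peaks >= n_components.
    Returns a (rows, cols, n_components) array scaled to [0, 1].
    Assumption: each component is min-max scaled independently so that
    the first three can be displayed as RGB channels.
    """
    rows, cols, peaks = cube.shape
    # Unfold the image: one row per pixel spectrum.
    x = cube.reshape(rows * cols, peaks)
    centred = x - x.mean(axis=0)
    u, s, _ = np.linalg.svd(centred, full_matrices=False)
    scores = u[:, :n_components] * s[:n_components]
    # Min-max scale each component; guard against constant components.
    lo = scores.min(axis=0)
    span = scores.max(axis=0) - lo
    span[span == 0] = 1.0
    scaled = (scores - lo) / span
    # Fold the pixel scores back into the spatial layout.
    return scaled.reshape(rows, cols, n_components)
```

For a synthetic section whose left and right halves carry two distinct spectra, the first channel of the returned image is close to 0 on one half and close to 1 on the other, mirroring the tissue-type separation shown in the score maps.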