Interactive Visual Exploration of 3D Mass Spectrometry Imaging Data Using Hierarchical Stochastic Neighbor Embedding Reveals Spatiomolecular Structures at Full Data Resolution.

Walid M Abdelmoula1,2, Nicola Pezzotti3, Thomas Hölt3, Jouke Dijkstra1, Anna Vilanova3, Liam A McDonnell4, Boudewijn P F Lelieveldt1,3.   

Abstract

Technological advances in mass spectrometry imaging (MSI) have contributed to growing interest in 3D MSI. However, the large size of 3D MSI data sets makes it computationally challenging to analyze and visualize them efficiently and to identify informative molecular patterns. Hierarchical stochastic neighbor embedding (HSNE) is a recent nonlinear dimensionality reduction technique that aims at finding hierarchical and multiscale representations of large data sets. It enables the analysis of millions of data points with manageable time and memory complexities. We demonstrate that HSNE can be used to analyze large 3D MSI data sets at full mass spectral and spatial resolution. To benchmark the technique and demonstrate its broad applicability, we analyzed a number of publicly available 3D MSI data sets recorded from various biological systems with different mass spectrometry ionization techniques. We demonstrate that HSNE is able to rapidly identify regions of interest within these large, high-dimensional data sets and to aid the identification of the molecular ions that characterize these regions of interest. Furthermore, because it clearly separates measurement artifacts, the HSNE analysis exhibits a degree of robustness to measurement batch effects, spatially correlated noise, and mass spectral misalignment.

Keywords:  3D MSI; HSNE; data analysis; nonlinear dimensionality reduction; proteomics; segmentation; t-SNE

Year:  2018        PMID: 29430923      PMCID: PMC5838640          DOI: 10.1021/acs.jproteome.7b00725

Source DB:  PubMed          Journal:  J Proteome Res        ISSN: 1535-3893            Impact factor:   4.466


Introduction

Mass spectrometry imaging (MSI) is a promising technology for many life science and biomedical applications.[1−3] MSI can provide the spatial distribution of hundreds of biomolecules directly from tissue. Typically, a thin tissue section is analyzed pixel-by-pixel in a predefined 2D raster. Matrix-assisted laser desorption ionization (MALDI),[4,5] secondary ion mass spectrometry (SIMS),[6] and desorption electrospray ionization (DESI)[7] are among the most common ionization methods. MALDI can be used to analyze a diverse range of molecular classes simply by changing the tissue preparation method. DESI is able to provide molecular information about lipids without any tissue preparation. SIMS provides very high spatial resolution, also without any tissue preparation. MSI may also be performed in three dimensions, most often by 2D MSI analysis of sequential tissue sections followed by their coregistration into a 3D volume.[8−13] It has been shown that 3D MSI data can be integrated with in vivo imaging modalities such as magnetic resonance imaging (MRI),[8,14,15] fluorescence microscopy,[16] μ-CT,[17] and positron emission tomography (PET).[18] This integration is useful from the biological viewpoint.[19,20] It is also important for the coregistration process that is required to construct the 3D MSI data sets.[21] In vivo imaging modalities preserve the geometrical integrity of the tissue volume and thus provide a reference that may be used to construct and visualize the 3D molecular maps. For MALDI- and DESI-based experiments, 3D MSI is essentially the merging of the 2D MSI data sets from a stack of serial tissue sections.[11] Recent technological advances allow 3D MSI to be acquired in a reasonable time frame.[22,23] Each voxel's mass spectrum is indexed by three spatial coordinates (x, y, z), and a 3D MSI data set can contain millions of voxels and mass spectra. 
These high-dimensional 3D MSI data provide rich molecular information with high chemical specificity across the entire tissue volume. However, they pose computational challenges for efficient analysis, visualization, and identification of informative patterns.[24] There is currently a need for, and ongoing interest in, computational methods that tackle these challenges.[10,21,25,26] Dimensionality reduction is a well-established component for handling and analyzing high-dimensional data.[27−29] It seeks to represent the high-dimensional data in a lower-dimensional space to facilitate efficient visualization, classification, and clustering.[27] Common linear dimensionality reduction algorithms such as principal component analysis (PCA)[30] and non-negative matrix factorization (NMF)[31] have been widely used for analyzing 2D MSI data sets[32,33]. They have also been applied to 3D MSI data sets.[16,34,35] Nevertheless, because these methods are inherently linear, the analyses will be dominated by the major differences in the data sets, for example, between different cell types within the tissue volume.[36] State-of-the-art nonlinear dimensionality reduction methods form a family of algorithms derived from stochastic neighbor embedding (SNE).[37] The hallmark of these algorithms is their ability to preserve the local structure of high-dimensional data in a low-dimensional map. t-Distributed stochastic neighbor embedding (t-SNE) alleviates the crowding problem of SNE and is thus able to visualize high-dimensional nonlinear data in a single map.[36,38−40] Fonville et al.[41] and Abdelmoula et al.[42] have highlighted the superiority of t-SNE for analyzing 2D MSI data sets. 
Nevertheless, the quadratic computational complexity of t-SNE has limited its practical applicability to data sets of up to a few thousand data points.[43] Barnes–Hut SNE (BH-SNE), an accelerated version of t-SNE, was subsequently shown to handle larger data sets of up to a few hundred thousand data points, with a computational complexity of O(N log N), where N is the number of data points.[40,43−46] As data sets grow to millions of data points, BH-SNE also becomes impractical.[47,48] It is computationally intractable at this scale. In addition, the final embedding shows millions of data points in a single 2D or 3D scatter plot, so it is crowded and nontrivial to interpret.[48] Pezzotti and coworkers have made recent progress with hierarchical stochastic neighbor embedding (HSNE). HSNE creates a hierarchical representation of the nonlinear data, which enables scalable exploration of the high-dimensional space through 2D embeddings that each contain only a few hundred data points.[47] The HSNE algorithm visualizes meaningful landmarks that represent sets of high-dimensional data points. It has been shown to preserve rare but potentially disease-related clusters.[48] The HSNE technique is based on the concept "Overview-First, Details-on-Demand".[49] On the higher, coarser hierarchical scale, the resulting embedding shows the dominant data structures (i.e., an overview). More detailed information can then be visualized by selecting landmarks of the dominant structures at the higher scale and computing a new embedding at the next finer hierarchical scale, and so on. Ultimately, this interactive hierarchical scheme helps the user to iteratively refine the visualized information and find informative structures on different scales, while keeping both memory and computational complexities manageable. 
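To make the scaling argument concrete, consider the mouse kidney data set in Table 1, with $N = 1\,362\,830$ voxels. The arithmetic below counts pairwise entries only and ignores the constant factors hidden by the big-O notation. The figure of 4 bytes per entry assumes single-precision storage of a dense affinity matrix. This is an illustrative assumption, not a description of how any particular t-SNE implementation stores its data.

```latex
\begin{aligned}
N^{2} &= \left(1.36283\times 10^{6}\right)^{2} \approx 1.86\times 10^{12}\ \text{pairs}
  \quad\Rightarrow\quad 4\ \text{B}\times 1.86\times 10^{12} \approx 7.4\ \text{TB},\\
N\log_{2} N &\approx 1.36283\times 10^{6}\times 20.38 \approx 2.78\times 10^{7},\\
\frac{N^{2}}{N\log_{2} N} &= \frac{N}{\log_{2} N} \approx 6.7\times 10^{4}.
\end{aligned}
```

Even at the O(N log N) cost of BH-SNE, a single scatter plot with more than a million points would still remain to be interpreted. The landmark hierarchy of HSNE is designed to address this problem.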
This is because only the finer-scale landmarks associated with the selection made at the previous, precomputed, coarser scale are embedded. For more detailed information about the HSNE algorithm, we refer to Pezzotti et al.[47] Recently, Oetjen et al. published a set of benchmark 3D MSI data sets.[10] These were acquired using different ionization techniques and collected from different biological systems, namely, murine kidney, murine pancreas, human colorectal adenocarcinoma, and human oral squamous cell carcinoma. These data are publicly available and can be downloaded from the GigaScience GigaDB repository.[10] Patterson and coworkers have also recently published a 3D MSI data set of lipids in human carotid atherosclerotic plaque.[50] 3D MSI data sets may easily consist of millions of voxels, with thousands of spectral features per voxel. Until now, processing such data sets with t-SNE-type approaches was not computationally feasible. We investigate whether HSNE can be deployed to analyze complete 3D MSI data sets at full spectral and spatial resolution and to reveal tissue-specific spectral signatures. To this end, we present a framework that consists of (a) dimensionality reduction and data visualization using HSNE, (b) a method to derive 3D maps from selected structures in the HSNE embeddings, and (c) a method to identify tissue-specific m/z features using the 3D spatial correlations between the 3D HSNE maps and the original 3D MSI data. We validate the proposed approach on a variety of previously published 3D MSI data sets from different biological systems.

Materials and Methods

Experimental Data Sets

Four of the 3D MSI data sets used in this study (colorectal carcinoma, mouse kidney, mouse pancreas, and OSCC) come from a previous study by Oetjen et al.[10] They are publicly available for download from the GigaScience GigaDB repository. That benchmark collection was acquired with different MSI ionization methods and covers five biological systems, namely, mouse kidney, mouse pancreas, human colorectal adenocarcinoma, cultured interacting microbial colonies, and human oral squamous cell carcinoma. The fifth data set analyzed here, of human carotid atherosclerotic plaques, was published by Patterson and coworkers.[50] A brief description of each data set is given in Table 1.
Table 1

Summary of the 3D MSI Data Sets and Their Computational Processing Time Using HSNE

data set | preservation | mass range (kDa) | no. tissue sections (tissue thickness, μm) | spatial resolution (μm) | data set size (no. voxels × no. m/z features) | HSNE running time (min)
3D DESI-MSI colorectal carcinoma | fresh frozen | 0.2–1.05 | 26 (10) | 100 | 148 044 × 8073 | ∼10
3D MALDI-MSI mouse kidney | PAXgene | 2–20 | 73 (3.5) | 50 | 1 362 830 × 7680 | ∼43
3D MALDI-MSI mouse pancreas | PAXgene | 1.6–15 | 29 (5) | 60 | 497 255 × 13 312 | >25
3D MALDI-MSI OSCC | fresh frozen | 2–20 | 58 (10) | 60 | 825 558 × 7680 | ∼30
3D MALDI-MSI atherosclerotic plaques | fresh frozen | <1 | 5 (10) | 100 | 10 185 × 20 | ∼5

HSNE on 3D MSI Data

Each 3D MSI data set was organized as a matrix $M \in \mathbb{R}^{n \times f}$, in which $n$ is the number of spectra (i.e., the number of voxels) and $f$ is the number of m/z features in each spectrum. The HSNE algorithm was applied to $M$ to find a hierarchical, multiscale representation $\{L^{s}\}$. Here, $L^{s}$ denotes the set of landmarks that represent the data set on scale $s$. The first scale, $L^{1}$, contains the original data points of $M$. The landmarks of each higher scale are a subset of those of the previous scale ($L^{s} \subset L^{s-1}$) and are selected automatically to represent a set of data points. The HSNE algorithm starts at $L^{1}$ by defining a finite Markov chain (FMC) whose transition matrix $P^{1}$ serves as the similarity matrix for the data points. This step has linear memory complexity and computational complexity $O(n \log n)$. Landmarks are selected by computing the stationary distribution of the FMC and retaining the data points whose stationary value exceeds a given threshold; this step has computational complexity $O(|L^{1}|)$. Next, the "area of influence" of the landmarks in $L^{2}$ on the points in $L^{1}$ is computed. This probability function encodes how strongly each landmark in $L^{2}$ is related to the data in $L^{1}$. Computing the area of influence has computational complexity $O(|L^{1}|)$ and a memory complexity that grows linearly with $|L^{2}|$. Finally, the similarity matrix $P^{2}$ between the landmarks in $L^{2}$ is computed as the pairwise overlap of their areas of influence. To construct a low-dimensional (2D) representation of the landmarks in $L^{2}$, t-SNE is applied with $P^{2}$ as input instead of the Euclidean distances between the original data points in $L^{1}$. The strength of HSNE is that this process can be iterated: $P^{2}$ is used as the FMC for the landmarks in $L^{2}$ to compute the next hierarchical scale, $L^{3}$, and so on. 
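The construction of one level of the hierarchy can be sketched in a few lines of NumPy. This is a minimal toy sketch that assumes dense matrices and small data, and it simplifies several details of HSNE:

- The scale-1 FMC uses a fixed Gaussian bandwidth on a k-nearest-neighbor graph, with no perplexity calibration or approximate neighbor search.
- The stationary distribution is estimated by counting random-walk endpoints.
- The landmark similarity $P^{2}$ is taken as the row-normalized overlap $I^{\top} I$ of the area-of-influence matrix $I$.

All function names and parameter values (`k`, `n_walks`, `walk_len`, `threshold`, `max_len`) are hypothetical. The released HSNE source code[48] remains the reference implementation, and the final t-SNE embedding of $P^{2}$ is omitted here.

```python
import numpy as np


def knn_transition_matrix(X, k=8):
    """Row-stochastic kNN transition matrix (toy stand-in for the scale-1 FMC).

    Simplification: Gaussian weights with a per-point bandwidth set to the
    median squared kNN distance, not a perplexity-calibrated bandwidth.
    """
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)  # dense, toy only
    n = X.shape[0]
    P = np.zeros((n, n))
    for i in range(n):
        nbrs = np.argsort(d2[i])[1:k + 1]          # skip the point itself
        w = np.exp(-d2[i, nbrs] / (np.median(d2[i, nbrs]) + 1e-12))
        P[i, nbrs] = w / w.sum()
    return P


def select_landmarks(P, n_walks=10, walk_len=10, threshold=1.5, rng=None):
    """Monte Carlo estimate of the stationary distribution: points whose
    endpoint count exceeds `threshold` times the mean become landmarks."""
    rng = np.random.default_rng(rng)
    n = P.shape[0]
    hits = np.zeros(n)
    for start in range(n):
        for _ in range(n_walks):
            node = start
            for _ in range(walk_len):
                node = rng.choice(n, p=P[node])
            hits[node] += 1
    score = hits / hits.mean()
    landmarks = np.flatnonzero(score > threshold)
    if landmarks.size == 0:                          # guard for degenerate input
        landmarks = np.array([int(np.argmax(score))])
    return landmarks


def area_of_influence(P, landmarks, n_walks=10, max_len=30, rng=None):
    """I[i, j]: fraction of walks from point i that first reach landmark j.
    Walks that hit no landmark within `max_len` steps are discarded."""
    rng = np.random.default_rng(rng)
    n = P.shape[0]
    column = {int(lm): j for j, lm in enumerate(landmarks)}
    I = np.zeros((n, len(landmarks)))
    for start in range(n):
        for _ in range(n_walks):
            node = start
            for _ in range(max_len + 1):
                if node in column:
                    I[start, column[node]] += 1
                    break
                node = rng.choice(n, p=P[node])
    totals = I.sum(axis=1, keepdims=True)
    return np.divide(I, totals, out=np.zeros_like(I), where=totals > 0)


def landmark_transition_matrix(I):
    """Landmark similarity as overlap of areas of influence, row-normalized
    so it can serve as the FMC of the next scale."""
    W = I.T @ I
    np.fill_diagonal(W, 0.0)
    totals = W.sum(axis=1, keepdims=True)
    return np.divide(W, totals, out=np.zeros_like(W), where=totals > 0)


# Usage on toy "spectra": two well-separated clusters in a 5D feature space.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 1.0, (30, 5)), rng.normal(6.0, 1.0, (30, 5))])
P1 = knn_transition_matrix(X, k=8)
L2 = select_landmarks(P1, rng=rng)
I12 = area_of_influence(P1, L2, rng=rng)
P2 = landmark_transition_matrix(I12)
```

Repeating the last three calls with `P2` in place of `P1` would give the next scale. This mirrors the iteration described above under the same simplifying assumptions.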
Applying t-SNE to the landmarks at scale $L^{s}$, with the similarity matrix $P^{s}$ as input, reveals clusters of landmarks. Because the landmarks are hierarchical, these clusters represent larger structures in the high-dimensional data. The hierarchical level determines the scale of the data structures revealed by the t-SNE analysis. The steps described above yield a hierarchical representation of the data. In this representation, landmarks have been automatically detected as data points that are representative of a group of neighbors in the data space. The t-SNE maps of the landmarks at any level of the hierarchy can be explored interactively by manually annotating a cluster in a t-SNE map and drilling into the data underlying its landmarks. Heterogeneity within a larger-scale structure can be revealed by first selecting the data within the cluster and then creating an embedding at a lower hierarchical level. The data within the cluster are given by the area of influence of each landmark it contains. In this manner, HSNE enables hierarchical exploration of very high-dimensional data. Note the distinction between the two selection steps. During generation of the hierarchy, landmarks are selected automatically from the data. During exploration, subsets of landmarks are selected manually, here by drawing around clusters of landmarks, before drilling into the level below. For more details of HSNE and t-SNE, we refer the interested reader to the original papers.[38,47] In addition, the source code of the HSNE algorithm has recently been released and is publicly available.[48]
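The drill-in step can be sketched as follows. The sketch assumes an area-of-influence matrix between the points of scale $s-1$ (rows) and the landmarks of scale $s$ (columns). It keeps a lower-scale point when at least `min_mass` of its influence falls on the user-selected landmarks, and it then row-renormalizes the transition matrix restricted to the kept points. Both the threshold rule and the hypothetical function names are illustrative choices for this sketch; the released HSNE code may implement the selection differently.

```python
import numpy as np


def drill_in(influence, selected, min_mass=0.5):
    """Indices of scale s-1 points that are mostly represented by the
    selected scale-s landmarks (the 'details on demand' subset)."""
    mass = influence[:, selected].sum(axis=1)
    return np.flatnonzero(mass >= min_mass)


def restrict_transition(P, keep):
    """Sub-FMC on the kept points, row-renormalized, ready to be embedded."""
    sub = P[np.ix_(keep, keep)]
    totals = sub.sum(axis=1, keepdims=True)
    return np.divide(sub, totals, out=np.zeros_like(sub), where=totals > 0)


# Usage: 4 lower-scale points, 2 landmarks; the user selects landmark 0.
I = np.array([[1.0, 0.0], [0.7, 0.3], [0.2, 0.8], [0.0, 1.0]])
keep = drill_in(I, [0])                      # -> array([0, 1])
P_lower = np.full((4, 4), 0.25)
P_sub = restrict_transition(P_lower, keep)   # each row -> [0.5, 0.5]
```

Only the kept points enter the new embedding, which is why the detailed views stay small even for very large data sets.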

HSNE Spatial Segmentation Maps

Every landmark in the HSNE embedding holds probability values that represent the likelihood that each of the original high-dimensional data points belongs to that landmark. The landmarks are positioned in the HSNE embedding according to their mass spectral similarities. Spectrally similar landmarks therefore cluster together, whereas dissimilar landmarks are located further apart, frequently with clear boundaries between clusters. Here we selected clusters manually; this step could also be automated using density-based partitioning.[36] Once a cluster of landmarks has been selected, a spatially resolved HSNE segmentation map can be constructed. The HSNE segmentation map is a 3D gray-scale image with intensity values in the range [0, 1]. These values reflect the probability that each voxel belongs to the selected landmarks. Voxels with high probability values have a mass spectrum similar to that of one of the selected landmarks. Voxels with low probability values are not represented by that particular selection of landmarks; their similarities are encoded by other landmarks in the HSNE embedding. The HSNE spatial segmentation maps reveal multiscale spatial structures. Their spatial scale depends on the hierarchical level of the HSNE embedding from which the structures were reconstructed. Finer spatial structures are therefore typically obtained from landmarks in an embedding at a finer hierarchical scale. An HSNE spatial segmentation map thus depicts a region of interest that shares similar mass spectral characteristics. Unlike hard clustering techniques such as k-means,[51] the HSNE spatial segmentation map can be considered a fuzzy-like cluster[52] in which each data point in the entire data set holds a probability of belonging to the cluster.
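The map construction described above can be sketched as follows. The sketch assumes that the area-of-influence matrices of the hierarchy are available as row-stochastic arrays. Multiplying them together gives each voxel's probability of being represented by each top-scale landmark. Summing the columns of the selected landmarks then gives the voxel's value in the map. The function names, the (z, y, x) coordinate convention, and the chaining by matrix products are assumptions of this sketch, not details taken from the HSNE implementation.

```python
import numpy as np


def chain_influence(*influences):
    """Compose area-of-influence matrices across scales.

    influences[0]: (n_voxels, |L^2|), influences[1]: (|L^2|, |L^3|), ...
    The product is the influence of the top-scale landmarks on each voxel.
    """
    out = influences[0]
    for step in influences[1:]:
        out = out @ step
    return out


def segmentation_map(influence, selected, coords, shape):
    """3D gray-scale map in [0, 1]: each voxel's probability of being
    represented by the selected landmarks (columns of `influence`)."""
    prob = np.clip(influence[:, selected].sum(axis=1), 0.0, 1.0)
    volume = np.zeros(shape)
    z, y, x = coords.T                    # one (z, y, x) row per voxel
    volume[z, y, x] = prob
    return volume


# Usage with toy numbers: 4 voxels, 3 scale-2 landmarks, 2 scale-3 landmarks.
I12 = np.array([[1.0, 0.0, 0.0],
                [0.5, 0.5, 0.0],
                [0.0, 0.2, 0.8],
                [0.0, 0.0, 1.0]])
I23 = np.array([[1.0, 0.0],
                [0.5, 0.5],
                [0.0, 1.0]])
I13 = chain_influence(I12, I23)           # rows: [1, 0], [.75, .25], [.1, .9], [0, 1]
coords = np.array([[0, 0, 0], [0, 0, 1], [0, 1, 0], [1, 1, 1]])
seg = segmentation_map(I13, [0], coords, shape=(2, 2, 2))
```

Because every voxel keeps a graded value instead of a single label, the result behaves like the fuzzy-like cluster described above rather than a hard k-means partition.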

Spatial Correlations and Corresponding m/z Colocalization

The HSNE segmentation map reflects a specific structure in the 3D MSI data and can be used to identify the molecular ions that exhibit similar spatial distributions. A colocalized m/z feature has high intensity in the structure highlighted by the HSNE segmentation map and low intensity elsewhere. Colocalized m/z features are identified in two steps. First, the Pearson correlation between each m/z image and the HSNE segmentation map is calculated. Then, the features with a significant correlation (p < 0.05) are selected. More than one colocalized m/z feature can be identified; however, for simplicity of presentation, this manuscript visualizes only the most highly colocalized features.
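The correlation screen can be written in vectorized form. This minimal sketch assumes that the m/z images are stored as the columns of $M$ (one row per voxel) and that the segmentation map is flattened in the same voxel order. For the p-values, it uses a normal approximation to the t distribution of the Pearson statistic. That approximation is an assumption, but it is adequate for the 10^4 to 10^6 voxels in Table 1; `scipy.stats.pearsonr` gives exact p-values for a single feature. Only positive correlations are kept, because a colocalized feature should be high inside the highlighted structure. The function name is hypothetical.

```python
import math

import numpy as np


def colocalized_features(M, seg, alpha=0.05):
    """Pearson r between every m/z image (column of M) and a segmentation map.

    M: (n_voxels, n_features) intensities; seg: (n_voxels,) map values.
    Returns r, two-sided p-values, and the indices of significant positive
    correlations, sorted from most to least colocalized.
    """
    n = M.shape[0]
    Mc = M - M.mean(axis=0)
    sc = seg - seg.mean()
    denom = np.sqrt((Mc ** 2).sum(axis=0) * (sc ** 2).sum())
    r = np.divide(Mc.T @ sc, denom, out=np.zeros(M.shape[1]), where=denom > 0)
    # t statistic of r; two-sided p from a normal approximation (large n).
    t = r * np.sqrt((n - 2) / np.maximum(1.0 - r ** 2, 1e-12))
    p = np.array([math.erfc(abs(v) / math.sqrt(2.0)) for v in t])
    order = np.argsort(-r)
    keep = [int(j) for j in order if r[j] > 0 and p[j] < alpha]
    return r, p, keep


# Usage with synthetic data: feature 0 follows the map, feature 2 is its inverse.
rng = np.random.default_rng(2)
seg = (np.arange(500) < 250).astype(float)
M = np.column_stack([
    seg + rng.normal(0.0, 0.1, 500),
    rng.normal(0.0, 1.0, 500),
    1.0 - seg + rng.normal(0.0, 0.1, 500),
])
r, p, keep = colocalized_features(M, seg)
```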

Results

3D DESI-MSI of Colorectal Carcinoma

Figure 1 shows the low-dimensional representation generated by HSNE for the 3D DESI-MSI data set of colorectal carcinoma. The HSNE scatter plots show landmarks that were projected, at different hierarchical levels, according to their similarities in the high-dimensional space. Figure 1 visualizes the hierarchical representation at three embedding levels, ranging from overview to detailed visualization. Level 3 is the overview embedding; it visualizes the more global patterns in the data set and separates the tissue foreground from the background. Two clusters representing the background were detected, which presumably reflects the heterogeneous nature of the background noise in the original high-dimensional data. To drill into more detailed structures, the tissue foreground cluster was selected and a new embedding was constructed at the next level. The level 2 embedding of the tissue foreground revealed two new structures, representing colorectal cancer and connective tissue. This is in agreement with Oetjen et al., who reported two main tissue types (tumor and connective tissue) based on histopathological examination of the tissues.[10] Supplementary Figure S1 shows that the demarcation of tumor from connective tissue in the histological images closely matches that in the level 2 HSNE segmentation maps. The cancer and connective tissues were then separately subjected to HSNE at the finest hierarchical level, level 1. This revealed new structural features in the HSNE space and in the associated 3D data volume (Figure 1). Figure 1 also shows, for cancer and connective tissue respectively, the distribution of Pearson correlations between the level 2 HSNE segmentation maps and all m/z images. It further shows the distributions of the ions with the highest correlation.
Figure 1

Hierarchical analysis of the 3D DESI-MSI colorectal carcinoma data set using HSNE reveals structural patterns at different hierarchical scales. The overview embedding represents the coarsest level, in which generic dominant structures are revealed, namely, background and foreground tissue. A detailed embedding of the tissue foreground reveals two major structures that represent colorectal cancer and connective tissue. At the finest embedding level, more structures are uncovered within each of the colorectal cancer and muscle tissues. The distribution of Pearson correlations between the level 2 HSNE segmentation maps and all of the spectra is presented for cancer and muscle tissue, together with the most colocalized m/z feature for each tissue class.

The HSNE algorithm automatically constructed the three hierarchical levels in 10 min on a PC workstation with a 3.5 GHz Intel Xeon processor and 128 GB memory, resulting in the overview embedding. The subsequent, more detailed embeddings required 2 min or less to be visualized based on landmark selection at the previous embedding level.

3D MALDI-MSI of Mouse Kidney

The 40 GB 3D MALDI-MSI data set of the mouse kidney was analyzed using the HSNE pipeline, and the resulting structural patterns are shown in Figure 2. From this large data set, the HSNE algorithm automatically constructed four hierarchical levels in ∼43 min on the same PC referred to above. For ease of visualization, the structures at hierarchical embedding level 2 were selected. Figure 2a presents them in the HSNE space. Figure 2b shows the associated 3D HSNE segmentation images, which display each voxel's probability of belonging to the selected cluster of landmarks. Trede et al.[9] previously processed this mouse kidney data set at reduced size. In agreement with their work, four main anatomical structures were identified in the mouse kidney. In this instance, however, the calculation was performed on the full data set and revealed finer spatial detail. Figure 2c,d shows the four regions as false-color 3D volumes: the renal cortex (red), renal medulla (green), renal pelvis (blue), and the surrounding of the renal pelvis (yellow). Of note, the landmarks not selected in the level 2 embedding represent noise-related structures; see Supplementary Figure S2.
Figure 2

Analysis of 3D MALDI-MSI data of a mouse kidney using the HSNE: (a) HSNE scatter plot showing the spectral similarities as landmarks in a low-dimensional representation and (b) HSNE spatial structures based on the landmarks selection in panel a. The identified four anatomical structures with distinct spectral signatures were merged into a single 3D image (c,d) representing: renal cortex (red), renal medulla (green), renal pelvis (blue), and surrounding of renal pelvis (yellow). The multiorthoslice view in panel d allows in-depth visualization of the identified features.

The 3D structures corresponding to the tissue clusters identified by HSNE were then used to identify which molecular ions exhibited highly correlated colocalization. The Pearson correlation between the 3D HSNE spatial clusters and the spectral images was calculated (see Supplementary Figure S3). The colocalized m/z features with the highest correlations were identified and are shown in Figure 3. Supplementary Figure S4 shows the 3D projections of these colocalized ion features.
Figure 3

Visualization of the most colocalized 3D m/z features with respect to the associated HSNE spatial segmentation maps of the 3D MALDI-MSI mouse kidney data set.

3D MALDI-MSI of Mouse Pancreas

The 3D MALDI-MSI data set of the mouse pancreas was analyzed using the HSNE pipeline, and the resulting structural patterns are shown in Figure 4. Three hierarchical embedding levels were automatically constructed; the HSNE running time is reported in Table 1. The coarser embedding at level 3 differentiated between noise and two tissue structures, termed structure 1 and structure 2 (Figure 4a). No additional structural information was revealed within structure 1 at subsequent embedding levels. The correlation analysis was therefore performed at this level, and it revealed spatially correlated mass spectral noise (see Supplementary Figure S5). For tissue structure 2 (HSNE level 3), a more detailed embedding at the next level revealed a highly structured data space (Figure 4b). Close examination of the HSNE map showed that the data structures distinguished three kinds of region:

- highly localized regions characterized by distinct molecular profiles (red cluster)
- outlier tissue sections (purple cluster)
- spatially correlated mass spectral noise (blue and green clusters)

Each of these structures is defined by distinct mass spectral profiles and 3D spatial distributions (Figure 4c,d, respectively). The protein ion that displayed the greatest colocalization with the red cluster, m/z 5805.54, was reported as insulin by Oetjen et al.[10] in the original benchmark 3D MSI data set paper. Insulin is produced by the beta cells of the islets of Langerhans. These highly localized endocrine structures in the pancreas are known to exhibit very distinct spatial and molecular profiles. HSNE thus enabled highly localized features to be rapidly identified in a large 3D MSI data set, even though that data set contained outlier tissue sections and significant spatially correlated noise.
Figure 4

Analysis of the 3D MALDI-MSI mouse pancreas data set using HSNE reveals structural patterns at different hierarchical scales. The detailed embedding at level 2 reveals three spectrally distinct clusters, shown in panel b and colored red, green, and blue. The spatial correlation between each of the clusters identified in panel b and the spectral information was computed (c), and the most colocalized m/z features were identified (d). The m/z value of 5805.54, which is colocalized with the red cluster in panel b, was previously identified as insulin.

When drilling into the subsequent finer hierarchical level (level 1), no new structures were identified; we therefore based our results on the two embedding levels presented in Figure 4a,b.

3D MALDI-MSI of Human Oral Squamous Cell Carcinoma (OSCC)

The 3D MALDI-MSI data set of OSCC was analyzed using the HSNE pipeline, and the resulting structural patterns are shown in Figure 5. Three hierarchical embedding levels were automatically constructed in less than half an hour on the same PC referred to above (Table 1). The coarser embedding at level 3 distinguished two dominant patterns, namely, noise and tissue structure (Figure 5a). A more detailed embedding of the tissue foreground was constructed at hierarchical level 2 (Figure 5b) and revealed three structures. Figure 5c shows the correlation distribution between the 3D HSNE cluster maps and the 3D MSI data. The molecular ions with the highest colocalization metrics were identified, and their 3D distributions are shown in Figure 5d. The peptide ions at m/z 3486, 3443, and 3372 were strongly colocalized with the yellow HSNE cluster. Oetjen et al.[10] previously reported them as the defensins HNP1–3, peptides produced by neutrophils (HNP stands for human neutrophil peptide). The mass spectra associated with the red and blue clusters were similar: they contained the same peptide and protein ions, but with different relative intensities. Close examination of the 3D distributions revealed that the red cluster was characterized by a batch effect. In a number of tissue sections (section numbers 31, 32, and 33), the thymosin β4 signals were very intense, which can be observed as white banding in Figure 5d. For the thymosin β4 signals, Supplementary Figure S6a compares the average mass spectrum of tissue section number 1 with that of tissue section number 31, one of the sections exhibiting a strong batch effect. Close examination of the spectra also revealed small mass shifts between them. Because the HSNE algorithm does not include a mass spectral alignment step, such misaligned spectra can be interpreted as different molecular signatures and separated into different clusters. 
Supplementary Figure S6b shows that the batch-affected tissue sections are localized to specific regions of the 3D MSI data set. Nevertheless, as with the 3D MSI data set of pancreas, HSNE enabled meaningful conclusions to be rapidly extracted from a large 3D MSI data set, even though it contained batch effects (intense mass spectral peaks and mass spectral misalignment).
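A toy calculation shows why uncorrected mass shifts can split otherwise identical spectra. HSNE compares spectra through distances in the full feature space. For spectra normalized to unit Euclidean norm, the squared Euclidean distance equals $2 - 2\cos\theta$, so cosine similarity is a convenient proxy. For a Gaussian peak of width $\sigma$ and a copy shifted by $\Delta$, the cosine similarity is $\exp(-\Delta^{2}/4\sigma^{2})$, which falls to about 0.37 at $\Delta = 2\sigma$. The peak shape, width, and m/z grid in this sketch are assumptions chosen for illustration, not properties of the OSCC data.

```python
import numpy as np


def gaussian_peak(mz, center, sigma):
    """Toy single-peak spectrum sampled on an m/z grid."""
    return np.exp(-0.5 * ((mz - center) / sigma) ** 2)


def cosine_similarity(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


mz = np.linspace(4900.0, 5100.0, 20001)   # assumed grid, 0.01 m/z spacing
sigma = 2.0                                 # assumed peak width (m/z units)
reference = gaussian_peak(mz, 5000.0, sigma)

results = {}
for shift in (0.0, 1.0, 2.0, 4.0):
    shifted = gaussian_peak(mz, 5000.0 + shift, sigma)
    analytic = float(np.exp(-shift ** 2 / (4 * sigma ** 2)))
    results[shift] = (cosine_similarity(reference, shifted), analytic)
    print(f"shift {shift:3.1f} m/z: cosine {results[shift][0]:.3f}, "
          f"analytic {analytic:.3f}")
```

A shift of only a few peak widths is enough to make a misaligned spectrum look like a different molecular signature. This is consistent with the separation of misaligned sections into their own clusters.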
Figure 5

Analysis of 3D MALDI-MSI of human oral squamous cell carcinoma data set using the HSNE reveals structural patterns on different hierarchical scales. The correlation analysis (c) allows us to identify the most colocalized m/z features (d) with the HSNE spatial structures (b).

3D MALDI-MSI of Human Atherosclerotic Plaques

The 3D MALDI-MSI data set of human atherosclerotic plaques was analyzed with the HSNE pipeline, and the resulting structural patterns are shown in Supplementary Figure S7. On the basis of the data distribution, two hierarchical embedding levels were automatically constructed. The coarser embedding at level 2 revealed two dominant mass spectral patterns. These distinguished the inner plaque (yellow cluster) from the rest of the tissue (orange cluster), as depicted in Supplementary Figure S7. A more detailed embedding was then constructed at hierarchical level 1. It revealed informative structures for the plaque core and outer plaque (red and green clusters, respectively), as well as another structure within the inner plaque (blue cluster). These results agree with those previously reported by Patterson et al.[50] Specifically, we identified distinct molecular patterns in three main regions, namely, (1) the fibrous cap (inner plaque), (2) the plaque core, and (3) the outer plaque (connective tissue). However, our results depict more heterogeneity within the inner and middle plaque regions. This may reflect the ability of HSNE to preserve the local structure of the high-dimensional spectra and thus the original nonlinear manifold in the lower-dimensional space.

Discussion

To the best of our knowledge, the proposed methodology is the first to handle the computational challenges of 3D MSI data analysis at full spatial and mass spectral resolution in a reasonable time frame while maintaining high accuracy. We have shown the efficiency of this pipeline in analyzing 3D MSI data sets collected from four different biological systems and acquired by different mass spectrometers. The backbone of this methodology is HSNE. HSNE first constructs a hierarchical representation of the high-dimensional data using landmarks and then interactively constructs a hierarchy of t-SNE embeddings. The hierarchical representation ensures high speed because it uses only a representative subset (i.e., landmarks) of the full data.[47] This reduces computational overhead while maintaining the nonlinear structure of the data. It thus enables the analysis of the millions of high-dimensional voxels encountered in 3D MSI. Interactively selecting clusters throughout the HSNE hierarchy allows the spatial structure of the 3D MSI data to be readily investigated. We correlated these 3D HSNE segmentation maps with the original 3D MSI m/z features at full spatial and mass spectral resolution. We demonstrate that this correlation identifies the individual tissue-specific features. The presented computational pipeline has proven to be highly efficient for the spatio-chemical segmentation of 3D MSI data and the identification of associated colocalized molecular features. The segmentation maps obtained using HSNE represent regions of interest that capture and summarize molecular patterns in the high-dimensional, spatially resolved molecular data. For the 3D MSI mouse kidney data, we achieved much finer spatial segmentation than the coarser results previously reported.[9] This in turn allowed better colocalization of ion features. In the previous analysis, the 3D data set was reduced to the molecular features retained by peak-picking the MALDI MSI data. 
The linear MALDI-TOF mass spectrometer used for these measurements has low mass resolution, often producing broad peaks that may not be reliably peak-picked.[53] McDonnell et al. have reported mass spectral representations, to which peak-picking algorithms designed for linear MALDI-TOF measurements have been applied, to increase peak-picking efficiency.[54] Nevertheless, peak picking of linear MALDI-TOF data often leads to information loss due to inefficient peak detection. Here, HSNE enabled the analysis of the complete data matrix of full spectra from all voxels, without peak picking. For the 3D MSI data set of colorectal carcinoma, the HSNE spatial segmentation maps distinguished between the tumor and connective tissues and were in close agreement with the histological images (Supplementary Figure S1). For the 3D MSI data set of mouse pancreas, the HSNE analysis revealed structures consistent with the known anatomy of the pancreas; for example, a structure characterized by insulin (m/z 5805.54) and other peptides demarcated the islets of Langerhans.[10] Similarly, for the 3D MSI OSCC data set, the HSNE analysis identified several molecularly distinct 3D structures, one of which was characterized by colocalized defensins, small proteins produced by neutrophils infiltrating the tumor, as reported previously.[10,55] Furthermore, HSNE enables these insights to be readily attained even in data sets compromised by batch effects, spatially correlated noise, and mass spectral misalignment. HSNE can process 3D MSI data at full spectral and full spatial resolution. It constructs scatter plots showing the distribution of the landmarks based on the similarity of their mass spectral profiles in the full high-dimensional space. To construct spatially mapped HSNE structures, however, a set of landmarks must first be selected.
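The landmark-based workflow described above can be illustrated with a toy sketch. Landmarks summarize the data, and a selection of landmarks is then mapped back to voxels. The sketch loosely follows the random-walk idea behind HSNE landmarks but simplifies it heavily:

- it takes uniform steps on an exact kNN graph;
- it defines landmarks as points that end more walks than average;
- it estimates each voxel's "area of influence" from which landmark its walks reach first.

All function names, parameters, and thresholds are hypothetical. This is not the published HSNE implementation.

```python
import numpy as np

def knn_indices(X, k):
    """Exact k nearest neighbours of each row of X (self excluded), brute force."""
    d = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
    np.fill_diagonal(d, np.inf)
    return np.argsort(d, axis=1)[:, :k]

def select_landmarks(nn, n_walks=10, walk_len=15, min_visits_factor=1.0, seed=0):
    """Flag points that end more than min_visits_factor * n_walks random walks."""
    rng = np.random.default_rng(seed)
    n, k = nn.shape
    visits = np.zeros(n, dtype=int)
    for start in range(n):
        for _ in range(n_walks):
            cur = start
            for _ in range(walk_len):
                cur = nn[cur, rng.integers(k)]   # uniform step to one of k neighbours
            visits[cur] += 1                     # count where the walk ends
    return visits > min_visits_factor * n_walks

def area_of_influence(nn, is_landmark, n_walks=20, max_len=50, seed=1):
    """Fraction of each point's walks whose first landmark hit is landmark j."""
    rng = np.random.default_rng(seed)
    n, k = nn.shape
    landmarks = np.flatnonzero(is_landmark)
    column = {int(l): j for j, l in enumerate(landmarks)}
    hits = np.zeros((n, len(landmarks)))
    for start in range(n):
        for _ in range(n_walks):
            cur = start
            for _ in range(max_len):
                if is_landmark[cur]:             # stop at the first landmark reached
                    hits[start, column[int(cur)]] += 1
                    break
                cur = nn[cur, rng.integers(k)]
    totals = hits.sum(axis=1, keepdims=True)
    influence = np.divide(hits, totals, out=np.zeros_like(hits), where=totals > 0)
    return influence, landmarks

# Toy data: two well-separated groups of 5-channel "spectra".
rng = np.random.default_rng(42)
labels = np.repeat([0, 1], 100)
X = rng.normal(size=(200, 5)) + 20.0 * labels[:, None]
nn = knn_indices(X, k=10)
is_landmark = select_landmarks(nn)
influence, landmarks = area_of_influence(nn, is_landmark)
# Stand-in for a user's selection in the embedding: the landmarks of group 0.
picked = np.isin(landmarks, np.flatnonzero(labels == 0))
mask = influence[:, picked].sum(axis=1) > 0.5    # back-projected voxel mask
print(is_landmark.sum(), "landmarks;", mask.sum(), "voxels selected")
```

Here the known group labels stand in for a lasso selection. In practice, the selected landmarks would come from a cluster picked in the HSNE scatter plot. The resulting voxel mask is what would be displayed as a 3D segmentation map.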
Here, clusters of closely spaced landmarks were selected manually, but this step could be automated by using, for example, density partitioning algorithms such as ACCENSE.[36] By default, HSNE does not consider the spatial origin of each voxel's mass spectrum when analyzing the 3D MSI data. Therefore, it is not strictly required to register the sequential tissue sections into a 3D volume for the HSNE analysis. However, image registration is highly valuable for the visualization and assessment of the 3D HSNE segmentation maps. Recent technical developments could allow the registration to be performed automatically using, for example, the t-SNE-based registration pipeline presented by Abdelmoula et al.[42] In that work, t-SNE was used to create a segmentation map that summarized the spatial correspondences in a tissue section's MSI data set, and this segmentation map was then used to register the MSI data to a histological image of the same tissue section. For 3D MSI, a similar approach can be used to coregister the MSI data sets from sequential tissue sections: the global registration parameters (e.g., rotation and translation) are corrected using each tissue section's individual t-SNE segmentation map. Supplementary Figure S8 shows a 3D image of the protein ion at m/z 6257.9 in the mouse kidney 3D MALDI MSI data set, which is localized to the renal cortex. The 3D volume from the original publication[10] (Figure S8a) contains several discontinuities caused by errors during registration of the sequential tissue sections. These discontinuities could be removed by automatic t-SNE-based registration using only an Euler transform (rotation and translation)[56] (Figure S8b).
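In 2D, an Euler (rigid) transform consists of one rotation angle and a translation vector. The sketch below estimates such a transform by least squares, using the Kabsch/Procrustes method, from corresponding points on two consecutive sections. Centroids of matching segments are one example of such points. This point-based estimator is a swapped-in illustration, not the t-SNE-based registration pipeline cited above. The correspondences and the function name `estimate_rigid_2d` are hypothetical.

```python
import numpy as np

def estimate_rigid_2d(src, dst):
    """Least-squares rotation R and translation t such that dst ~ src @ R.T + t."""
    src_c, dst_c = src.mean(axis=0), dst.mean(axis=0)
    H = (src - src_c).T @ (dst - dst_c)        # 2x2 cross-covariance matrix
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))     # -1 would indicate a reflection
    R = Vt.T @ np.diag([1.0, d]) @ U.T         # proper rotation (det = +1)
    t = dst_c - R @ src_c
    return R, t

# Hypothetical corresponding points (e.g. centroids of matching segments)
# on section k (src) and section k+1 (dst), related by a known Euler transform.
theta = np.deg2rad(12.0)
R_true = np.array([[np.cos(theta), -np.sin(theta)],
                   [np.sin(theta),  np.cos(theta)]])
t_true = np.array([3.0, -1.5])
src = np.random.default_rng(0).uniform(0.0, 100.0, size=(6, 2))
dst = src @ R_true.T + t_true

R, t = estimate_rigid_2d(src, dst)
angle_deg = np.degrees(np.arctan2(R[1, 0], R[0, 0]))
print(round(angle_deg, 6), np.round(t, 6))
```

Restricting the model to rotation and translation keeps each section's shape intact. Correcting nonlinear tissue deformations would require the additional constraints discussed next.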
One of the challenges facing the automated creation of 3D MSI volumes concerns the deformations that may arise during tissue processing. The nonlinear registrations needed to correct such deformations will require geometrical constraints to preserve the original tissue shape, which could be provided by a reference such as a block-face image or an in vivo image (e.g., MRI) of the tissue volume acquired before sectioning. Other recent algorithms have also focused on alleviating the scalability issues of t-SNE, such as LargeVis[57] and approximated t-SNE[58] (A-tSNE). Both algorithms primarily accelerate the construction of the k-nearest-neighbor (kNN) graph, a computationally very intensive step of the original t-SNE algorithm, but they lack the multiscale representation of HSNE. This distinction is important: HSNE is inherently more scalable in terms of computational and memory complexity, and it avoids the crowded maps that result from embedding millions of data points at once, which would otherwise hinder the identification of clusters.[48] The ability of HSNE to handle large volumes of high-dimensional data with reasonable computational and memory complexity makes it promising for other biological application areas that face similar computational challenges, particularly neurology and cancer research. For example, HSNE holds potential for the analysis of spatially resolved omics data,[59] especially at subcellular spatial resolution, such as those produced by array tomography,[60] spatial transcriptomics,[61] and imaging mass cytometry.[62,63]
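To show the step that LargeVis and A-tSNE accelerate, the sketch below builds an exact kNN graph by brute force. Its run time grows quadratically with the number of points, which is why approximate methods are used at the scale of millions of voxels. Processing in chunks only bounds memory. This is a generic NumPy illustration with a hypothetical function name, not code from any of the cited tools.

```python
import numpy as np

def exact_knn(X, k, chunk=1024):
    """Exact k-nearest-neighbour graph by brute force, processed in row chunks.

    Work grows as O(N^2 * D) for N points in D dimensions; chunking only keeps
    memory at O(chunk * N). Returns neighbour indices and Euclidean distances,
    nearest first, with each point excluded from its own neighbour list.
    """
    X = np.asarray(X, dtype=float)
    n = len(X)
    if not 0 < k < n:
        raise ValueError("k must satisfy 0 < k < number of points")
    sq = (X ** 2).sum(axis=1)
    idx = np.empty((n, k), dtype=int)
    dist = np.empty((n, k))
    for s in range(0, n, chunk):
        e = min(s + chunk, n)
        # Squared distances ||a||^2 + ||b||^2 - 2 a.b between rows s:e and all rows.
        d = sq[s:e, None] + sq[None, :] - 2.0 * X[s:e] @ X.T
        d[np.arange(e - s), np.arange(s, e)] = np.inf    # exclude self-matches
        part = np.argpartition(d, k - 1, axis=1)[:, :k]  # k smallest, unordered
        part_d = np.take_along_axis(d, part, axis=1)
        order = np.argsort(part_d, axis=1)                # sort those k by distance
        idx[s:e] = np.take_along_axis(part, order, axis=1)
        dist[s:e] = np.sqrt(np.maximum(np.take_along_axis(part_d, order, axis=1), 0.0))
    return idx, dist

# Ten points on a line; small chunks exercise the chunked loop.
X = np.arange(10.0)[:, None]
idx, dist = exact_knn(X, k=2, chunk=3)
print(idx[0], dist[0])
```

For a 3D MSI volume with millions of voxels and thousands of m/z channels, the quadratic number of distance evaluations dominates. Approximate kNN search and HSNE's landmark hierarchy are two complementary ways to avoid that cost.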

Concluding Remarks

We presented a computational pipeline to analyze 3D MSI volumes with reasonable computational and memory complexity while maintaining accuracy at full spatial and spectral resolution. This could broaden the application areas of 3D MSI, because the pipeline can reveal, relatively quickly and in an interactive, data-driven manner, multiscale molecular structures that may be of biological interest. Such structures are otherwise computationally very difficult to identify using alternative pipelines.
References (49 in total; 10 listed)

1. Learning the parts of objects by non-negative matrix factorization.
Authors: D D Lee; H S Seung
Journal: Nature; Date: 1999-10-21

2. Imaging mass spectrometry data reduction: automated feature identification and extraction.
Authors: Liam A McDonnell; Alexandra van Remoortere; Nico de Velde; René J M van Zeijl; André M Deelder
Journal: J Am Soc Mass Spectrom; Date: 2010-08-21

3. Molecular imaging by mass spectrometry--looking beyond classical histology. (Review)
Authors: Kristina Schwamborn; Richard M Caprioli
Journal: Nat Rev Cancer; Date: 2010-08-19

4. Mass spectrometry imaging as a tool for surgical decision-making.
Authors: David Calligaris; Isaiah Norton; Daniel R Feldman; Jennifer L Ide; Ian F Dunn; Livia S Eberlin; R Graham Cooks; Ferenc A Jolesz; Alexandra J Golby; Sandro Santagata; Nathalie Y Agar
Journal: J Mass Spectrom; Date: 2013-11

5. Spatially resolved transcriptomics and beyond. (Review)
Authors: Nicola Crosetto; Magda Bienko; Alexander van Oudenaarden
Journal: Nat Rev Genet; Date: 2014-12-02

6. Visualization and analysis of gene expression in tissue sections by spatial transcriptomics.
Authors: Patrik L Ståhl; Fredrik Salmén; Sanja Vickovic; Anna Lundmark; José Fernández Navarro; Jens Magnusson; Stefania Giacomello; Michaela Asp; Jakub O Westholm; Mikael Huss; Annelie Mollbrink; Sten Linnarsson; Simone Codeluppi; Åke Borg; Fredrik Pontén; Paul Igor Costea; Pelin Sahlén; Jan Mulder; Olaf Bergmann; Joakim Lundeberg; Jonas Frisén
Journal: Science; Date: 2016-07-01

7. Chemo-informatic strategy for imaging mass spectrometry-based hyperspectral profiling of lipid signatures in colorectal cancer.
Authors: Kirill A Veselkov; Reza Mirnezami; Nicole Strittmatter; Robert D Goldin; James Kinross; Abigail V M Speller; Tigran Abramov; Emrys A Jones; Ara Darzi; Elaine Holmes; Jeremy K Nicholson; Zoltan Takats
Journal: Proc Natl Acad Sci U S A; Date: 2014-01-07

8. Visualizing the spatial gene expression organization in the brain through non-linear similarity embeddings.
Authors: Ahmed Mahfouz; Martijn van de Giessen; Laurens van der Maaten; Sjoerd Huisman; Marcel Reinders; Michael J Hawrylycz; Boudewijn P F Lelieveldt
Journal: Methods; Date: 2014-10-16

9. Human neutrophil peptides 1-3 as gastric cancer tissue markers measured by MALDI-imaging mass spectrometry: implications for infiltrated neutrophils as a tumor target.
Authors: Chun-Chia Cheng; Jungshan Chang; Ling-Yun Chen; Ai-Sheng Ho; Ker-Jer Huang; Shui-Cheng Lee; Fu-Der Mai; Chun-Chao Chang
Journal: Dis Markers; Date: 2012

10. Visual analysis of mass cytometry data by hierarchical stochastic neighbour embedding reveals rare cell types.
Authors: Vincent van Unen; Thomas Höllt; Nicola Pezzotti; Na Li; Marcel J T Reinders; Elmar Eisemann; Frits Koning; Anna Vilanova; Boudewijn P F Lelieveldt
Journal: Nat Commun; Date: 2017-11-23