Literature DB >> 35704658

SPF: A spatial and functional data analytic approach to cell imaging data.

Thao Vu¹, Julia Wrobel¹, Benjamin G Bitler^2,3, Erin L Schenk⁴, Kimberly R Jordan⁵, Debashis Ghosh¹.

Abstract

The tumor microenvironment (TME), which characterizes the tumor and its surroundings, plays a critical role in understanding cancer development and progression. Recent advances in imaging techniques enable researchers to study spatial structure of the TME at a single-cell level. Investigating spatial patterns and interactions of cell subtypes within the TME provides useful insights into how cells with different biological purposes behave, which may consequentially impact a subject's clinical outcomes. We utilize a class of well-known spatial summary statistics, the K-function and its variants, to explore inter-cell dependence as a function of distances between cells. Using techniques from functional data analysis, we introduce an approach to model the association between these summary spatial functions and subject-level outcomes, while controlling for other clinical scalar predictors such as age and disease stage. In particular, we leverage the additive functional Cox regression model (AFCM) to study the nonlinear impact of spatial interaction between tumor and stromal cells on overall survival in patients with non-small cell lung cancer, using multiplex immunohistochemistry (mIHC) data. The applicability of our approach is further validated using a publicly available multiplexed ion beam imaging (MIBI) triple-negative breast cancer dataset.

Entities: Chemical

Mesh：

Year: 2022 PMID： 35704658 PMCID： PMC9239468 DOI： 10.1371/journal.pcbi.1009486

Source DB: PubMed Journal: PLoS Comput Biol ISSN： 1553-734X Impact factor: 4.779

This is a PLOS Computational Biology Methods paper.

1 Introduction

The tumor microenvironment (TME), which consists of tumor cells, stromal cells, immune cells and the extracellular matrix, plays a critical role in understanding cancer development and progression [1, 2]. The concept of the TME has been around for several centuries; it was first documented by Virchow in 1863, characterizing the relationship between inflammation and tumor pathology [3, 4]. Later, the emergence of Paget’s “seed and soil” principle in 1889 further emphasized the relationship between primary tumors and their microenvironment in influencing tumor evolution [5]. However, many TME studies were not widely recognized in the field of cancer research until the late 1970s and 1980s [4]. Then, there were a number of novel discoveries indicating the influences of TME-induced signals on cancer cells and their progression [6-9]. In particular, an experimental model proposed by Tarin et al. highlighted the role of microenvironments in metastasis potential of primary tumors in mice [10]. Additionally, in 1989, Ferrara’s research group discovered vascular endothelial growth factor (VEGF) and its ability to induce angiogenesis, which then became a driving force for anti-angiogenic cancer research [11]. The TME is known to be complex and heterogeneous due to the continuous cellular and molecular adaptations in primary tumor cells that allow for tumor growth and proliferation. Accurately characterizing such heterogeneity is essential in gaining a better understanding of cancer and developing more effective treatment strategies. With advancements in technology, great progress has been made in high parameter imaging of tissues in situ to allow simultaneous quantification and visualization of individual cells in tissue sections. More specifically, multiplex tissue imaging (MTI) [12] methods such as cyclic immunoflourescence (CyCIF) [13], CO-Dectection by indEXing (CODEX) [14], multiplex immunohistochemistry (mIHC) [15], imaging mass cytometry (IMC) [16], and multiplex ion beam imaging (MIBI) [17] are capable of measuring the expression of tens of markers at single-cell resolution while preserving the spatial distribution of cells. As an example, multiplex immunohistochemistry (mIHC) detects and visualizes specific antigens in cells of a tissue section by utilizing antibody-antigen reactions coupled to a flourescent dye or an enzyme [18, 19]. Another instance includes multiplexed ion beam imaging (MIBI) [17], which utilizes secondary ion mass spectrometry to image metal-conjugated antibodies. As such, MIBI enables single-cell analysis of up to 100 parameters without spectral overlap between channels. Altogether, imaging provides an additional dimension of spatial resolution to the single cell signature profiles, which in turn allows researchers not only to study cellular composition but also to make inferences about specific cell-cell interactions. As individual cells within the TME are genetically and epigenetically varied, they are competing with each other for space and resources (i.e., oxygen, nutrients, etc.) [20]. This is analogous to diverse species in their natural habitats as seen in ecology [21]. Typical ecological studies often involves examining spatial structures of species in a given habitat. Thus, leveraging analysis tools developed in ecology may be beneficial in studying spatial cell-cell interactions in the tumor ecosystem. For instance, Alfarouk et al. [22] highlighted regional variations in the distribution of cancer cells in relative proximity to blood vessels. This has an immediate analogy to vegetation around waterways in a riparian ecosystem. Statistics characterizing the distribution of spatial distances are typically used to investigate spatial patterns of cells in the TME [23]. For example, some quantitative measures exist, such as the Morisita-Horn index [24], which has been employed by Maley et al. [25] to capture similarity between two local communities, i.e., tumor and immune cells in breast cancer. In addition, the intratumor lymphocyte ratio is another measure used to quantify the degree of infiltration of immune cells into the tumor [26], which has been shown to have prognostic potential. In spatial statistics terminology, a collection of cells in the TME can be considered as a point pattern generated from a point process. If the cells are a realization of a spatial process assuming complete spatial randomness (CSR), then cells do not have preference for any spatial location. In other words, cells are randomly scattered in a given region of study, in this case the TME. Under this assumption, any deviations in spatial patterns from the null model of CSR could potentially provide some useful insights into how genetically heterogeneous cells behave. In addition to spatial locations, each point in a pattern can be associated with attributes as referred to as a “mark”, which can be numerical (e.g., expression intensity for a given protein, cell size, etc.) and/or categorical (e.g., cell types: immune cells, tumor cells, macrophages, etc.). Such a point pattern is known as a marked point process. The K-function, a popular summary statistic proposed by Ripley [23], has been used to capture interpoint dependence with regard to distances between points in a point pattern. Several transformations of the K-function have also been introduced to explore not only spatial association of points but also the variation in the corresponding mark values. Patrick et al. recently utilizes the K-function as an exploratory analysis tool to identify and summarize complex spatial localization of multiple cell types within the TME, with applications on multiplexed imaging cytometry data [27]. Similarly, Canete et al. introduced the R package spicyR to relate changes in spatial localization of different cell types across subjects with disease progression [28], utilizing a localization score. The score, which is calculated as a integrated difference between the observed and expected L-function, i.e., the variance stabilized K-function, summarizes the spatial attraction or avoidance between pairs of different cell types. Following a different approach, Barua et al. considers the G-cross function to quantitatively differentiate the colocalization between tumor cells and infiltrative immune cells versus tumor cells and noninfiltrative cells as a function of cell distance [29]. Area under G-cross curves (AUC) is then incorporated with clinical factors including patient age, smoking history, and disease stage to find association with patient overall survival through univariate Cox proportional hazard regression model [30]. However, summarizing each curve into a single value might lose some information regarding the progression patterns as the inter-cell distance increases. Herein, we introduce an approach, which leverages both spatial statistic summaries as well as a recently published model, the Additive Functional Cox Model (AFCM) [31], to incorporate spatial heterogeneity of cells available in histological images as functional covariates in a regression framework. Such an approach allows for the addition of other clinical variables to study their impact on patient survival. More precisely, we employ a derivative of the K-function, the mark connection function (mcf) [32], to qualitatively investigate the spatial patterns of the “seed” and “soil” factors in the TME such as primary tumor versus stromal cells, primary tumor versus immune cells, etc. In addition, the correlation in the expression level of each marker between neighboring cell types can be quantified using Moran’s I statistic [33]. Depending on the research problem, the proposed framework provides the flexibility in modelling qualitative and/or quantitative spatial information. The remainder of the article is organized as follows. Section 2 describes in detail summary spatial functions and how to incorporate them into the Cox model. Section 3 presents the results on real datasets, and Section 4 describes the simulation framework. Finally, we provide concluding remarks and discussion in Section 5.

2 Materials and methods

The main goal of our proposed approach is to model the spatial heterogeneity of cells in histological images and patient overall survival in a functional data analysis framework, in addition to other scalar clinical variables. As such, summary functions capturing the spatial architecture of cells in each image are necessary inputs of our model. We utilize mark connection function (mcf) [32] and Moran’s I statistic [33] to represent qualitative and quantitative spatial information, respectively. We then model the available spatial input functions using the Additive Functional Cox Model. Details are given below.

2.1 Spatial summary functions

Summary functions for qualitatively marked point patterns

Qualitatively marked patterns consist of different types of points. An example includes cells in a given image with labels showing different cell types or tissue categories. Some numerical measures are typically employed to summarize any important feature of a given point pattern such as nearest-neighbor distance, which measures the average distance from a cell of one type to its closest cell of other type in the same pattern. Furthermore, some spatial functions are also used to further detect and quantify patterns using the density of points that are r units apart. In particular, the mark connection function (mcf) [32, 34] is used to reveal interesting relationships between points belonging to two different types i and j, referred to as cross-type points. Let g(r) be the pair correlation function between types i and j and g(r) be the pair correlation function for an unmarked process which disregards any label associated with points in the pattern. This pair correlation function g(r) is used as an alternative to the K-function and describes the distribution of interpoint distances equal to r and, for the unmarked process, is defined by where K′(r) is the derivative of the K-function with respect to distance r. Similarly, with being the derivative of K(r). Note that is the cross-type K-function capturing the expected number of points of type j lying within r unit distance of a typical point of type i. Then, we define the mcf as where λ and λ are intensity functions of types i and j respectively; λ• is the intensity function of unmarked point process. U and V are two separate subsets of the pattern, referred to as subpatterns, which contain points of types i and j, respectively. Here, λ and λ are empirically estimated as the ratio of the number of points of U and V and the image area W. In other words, λ = n(u)/|W| and λ = n(v)/|W| with n(u) and n(v) the number of observed points in u and v respectively, and |W| as the area of an observed image. Similarly, the unmarked intensity is estimated as the sum of individual marked intensity functions in a point process, as λ• = λ + λ. Higher values of the mark-connection function denote more cooccurrences of points of types i and j, while the reverse holds for smaller values. Also note that the mark connection makes an assumption of isotropy, i.e., the function value between two points only depends on the distance between the two points and not on the location. Panels (A) and (B) of Fig 1 illustrate the distributions of tumor and stromal cells in two representative subjects, while Fig 1C and 1D display the mcf curves that capture the spatial interactions of the tumor and stroma subpatterns with respect to inter-cell distance. At short distances (small r values), cells of each type tend to cluster, leading to mcf values below the complete randomness indicating line at 1. However, as r increases, stromal and tumor cells start mixing in (Fig 1A), which leads to more cross-type cell interactions. As a result, the corresponding mcf values become slightly larger than 1 toward the end of the curve (Fig 1C).

Fig 1

Examples of mark connection functions (mcf).

Tumor (turquoise) and stromal (red) cell distributions of two patients’ images: (A) 77 and (B) 94, respectively. Corresponding mcf curves are shown in (C) and (D), respectively. Dashed horizontal lines representing mcf under complete randomness. Mcf values below 1 indicate strong clustering of cells of same type, while values above 1 suggest higher level of mixing in of the two cell types.

Examples of mark connection functions (mcf).

Summary functions for numerical marks of cross-type pairs of points

While the aforementioned mcf provides a measure of spatial proximity of cross-type points, Moran’s I correlation [33] incorporates additional information by taking into account a continuous value associated with each subpattern. For example, if the two subpatterns of interest are tumor and stromal cells, Moran’s I allows quantification of subpattern interaction plus additional incorporation of a protein functional marker such as MHCII. The Moran’s I correlation of the mth mark between points of the two subpatterns i and j, which are r units apart, denoted as is calculated as follows where is the sth element of the mth mark in i subpattern, is the tth element of the mth mark in j subpattern. Note that and are the mean marker intensities of type i and j points, respectively, while is the overall mean intensity of both types. Fig 2 overlays Moran’s I curves across all subjects, which depicts correlation in MHCII marker intensity between tumor and stromal cells as a function of distance r for two groups of patients in the lung cancer dataset, (A) MHCII and (B) MHCII [35]. Similar patterns can be observed across the two panels. Specifically, tumor cells and their immediate neighboring stromal cells are negatively correlated in MHCII expression when they are less than 10 μm apart. As the cross-type cell distance increases, the correlation rises at different rates, and eventually drops close to 0. A more detailed discussion is included in Section 3.1.

Fig 2

Moran’s I correlation between tumor and stromal cells using MHCII in NSCLC.

(A) MHCII group. (B) MHCII group. Patients with more than 5% tumor cells positively expressed with MHCII were classified into MHCII group. Negative Moran’s I values indicate an inverse relationship in MHCII expression between tumor and stromal cells. Moran’s I values above 0 suggest a direct relationship in MHCII expression between tumor and stromal cells. Dashed red vertical line in each panel corresponds to distance r = 10 μm.

Moran’s I correlation between tumor and stromal cells using MHCII in NSCLC.

2.2 Model

The aforementioned functions can be considered as summarized features capturing spatial interaction between individual cells for each subject. In order to investigate the association between cell-level spatial effect and patient survival, we leverage a previously published model, the Additive Functional Cox Model (AFCM) [31]. This model allows us to incorporate each subject’s spatial summary function as functional covariates in addition to other scalar clinical variables such as age, sex, and disease stage. In particular, we model the log hazard function for each ith subject i = 1, …, N with T and C respectively denoting the corresponding survival time and censoring time, which are assumed to be conditionally independent given covariates in the model. Note that due to right-censoring, we only observe Y = min(T, C). For i = 1, …, n, let δ = I(T ≤ C) serve as a censoring indicator. For each ith subject, we observe p scalar clinical variables as Z = {Z, Z, …, Z} and functional covariates which is a continuous and integrable curve defined on a compact interval. We assume that X is observed on a grid of points. In our setting, X will be a spatial summary for subject i, such as a mark connection function or Moran’s I function. Following Cui et al. [31], we jointly model the effect of cell-level spatial interactions with clinical variables on a subject’s risk of mortality as follows where log λ[t; Z, X] is the log hazard at time t, given scalar covariates Z and functional covariates X(s), i = 1, …, n. In Eq (3), log λ0(t) is the log baseline hazard function, and the parameter vector represents a multiplicative change in log hazard ratios for a one-unit increase in Z. We also have F, an unspecified smooth function to be estimated. As described in McLean et al. [36], F(.) can be modelled as a tensor product of two penalized spline bases: where B(s) and B(x) are splines defined on functional domain s and functional covariate domain x, respectively. θ for j = 1, …, K; k = 1, …, K are spline coefficients. Following the same approach as Cui et al. [31], we apply cubic regression splines for both domains s and x. Combining Eqs (3) and (4), the model can be rewritten as: where, = (, ), with is a vector of entries θ; , with is a vector of entries , j = 1, 2, …, K, k = 1, 2, …, K. The parameters are estimated by maximizing the penalized partial log-likelihood with the smoothing parameter imposed on . The penalized partial log-likelihood is defined as follows where the penalty term can be expressed as a quadratic term with is a symmetric, non-negative definite penalty matrix. For a given smoothing parameter λ, the regression coefficients are estimated as: using Newton-Raphson procedure, provided the gradient vector and Hessian matrix . Following Wood et al. [37], the selection of the smoothing parameter λ is done by optimizing the log Laplace approximate marginal likelihood, which can be expressed as where, λ = λ and |λ|+ is the product of positive eigenvalues of λ. M is the number of zero eigenvalues of λ. The process involves optimizing with respect to log λ. More details can be found in [37].

2.3 Datasets

We used non-small cell lung cancer (NSCLC) and triple-negative breast cancer (TNBC) datasets collected using multiplex immunohistochemistry (mIHC) and multiplexed ion beam imaging (MIBI) platforms, respectively, to evaluate the applicability of our proposed model.

Non-small cell lung cancer (NSCLC)

Tissue slides collected from 153 patients with non-small cell lung cancer were sequentially stained with antibodies specific for CD19, CD8, CD3, CD14, major histocompatibility complex II (MHCII), cytokeratin, and DAPI; then the slides were imaged on the Vectra 3.0 microscope (Akoya Biosystems). The acquired images were then processed using Akoya’s inForm tissue analysis software to obtain a data matrix with rows corresponding to individual cells and columns corresponding to x- and y-coordinates of each cell on the image, individual marker expression, and cell phenotypes. Each individual had three to five images corresponding to small regions of the tissue sample. Due to the sparsity issue of some images, we decided to select the image with the maximum number of cells to represent each subject. More details can be found in Johnson et al. [35].

Triple-negative breast cancer (TNBC)

TNBC biopsies were compiled into a tissue microarray (TMA) slides and stained with 36 antibodies targeting regulators of immune activation such as PD1, PD-L1, etc. The slides were imaged using the multiplexed ion beam imaging (MIBI) mass spectrometer. Cell segmentation was performed by adapting a convolutional neural network (CNN) approach, i.e., DeepCell [38] for nuclear segmentation to MIBI data. Details of the method were described in Keren et al. [39]. Images from 41 patients were processed using the R package raster to extract pixel coordinates. Panels (A) and (B) of S2 Fig show two representative pixel-level images for two randomly chosen patients. Since each cell consists of multiple pixels, the average pixel-level x- and y-coordinates were used to represent cell locations on each image. Panels (C) and (D) of S2 Fig illustrate cell-level images for patients 1 and 2, respectively, with dot size proportional to cell size and color-coded for each cell type (e.g., immune, endothelial, mesenchymal-like, tumor, etc.). With the cell information in the tumor and stroma regions of the TME not available, we focused on capturing the distribution of tumor and immune cells across images using mcf curves in this dataset. Additionally, two patients in the cohort did not have clinical information available regarding survival outcomes; and one patient’s imaging data was corrupted with a high level of noise. As a result, only data of 38 patients were included in the model.

3 Application results

3.1 NSCLC

Fig 3A illustrates the locations of stromal and tumor cells in XY-coordinates for a representative image. A summary mcf using Eq (1) was calculated to characterize the spatial relationship between primary tumor cells and tumor-associated stromal cells as a function of distance between cells. Since cell distance r depended on the density of observed cells in a given image, we chose the image with the maximum number of cells to compute a reference range of distances. Fig 3B shows mcf curves for all subjects as a function of the distance between cells, measured in microns (μm). These mcf curves were used as functional covariates along with scalar clinical predictors such as total number of cells, disease stage, and age in model (5) to obtain the estimated hazard of mortality for each subject. Fig 4A highlights 30 mcf curves corresponding to 15 patients with shortest survival time (red) and 15 patients with longest observed censored time (black) to aid the interpretation of the fitted functional model. Fig 4C depicts the estimated functional surfaces with values decreased from red to blue, corresponding to a decrease in hazard of mortality, while holding the remaining scalar variables constant. In particular, if there was strong clusterings of tumor cells in the neighborhood of 25—75 μm, the overall survival improved, based on the negative estimate of the log hazard ratio. However, at distances beyond 75 μm, the more tumor cells clustered, the higher the risk of mortality, while the increased infiltration level of tumor-associated stromal cells reduced the hazard of death. Even though the mcf curves were calculated for cells in tumor and stroma tissue regions, more than 98% of cells in stroma region were immune cells including CD4+, CD14+, CD19+, and CD8+ T cells. Interestingly, this was in agreement with the conclusion from Johnson et al. that the increased spatial proximity of cancer cells to immune cells linked to better subject survival [35]. Section S.1.1 of S1 Text reported the summary output of the corresponding model. The p-value of 0.44 indicated no significant nonlinear association between the spatial interactions of tumor and stromal cells with overall survival. However, note that given a relatively large number of parameters to be estimated, our sample size might not provide enough power to detect such association. As a result, the qualitative interpretation of the model using the surface plots is still worthwhile.

Fig 3

NSCLC dataset.

(A) Representative image illustrating locations of stromal (red) and tumor cells (turquoise). (B) Mark connection functions (mcf) for the tumor—stromal cell distributions for all subjects. Dashed horizontal line represents mcf under complete randomness. Mcf values below 1 indicate strong clustering of cells of same type, while values above 1 suggest higher level of mixing in of the two cell types.

Fig 4

Estimated surface from AFCM using NSCLC dataset.

NSCLC dataset.

Estimated surface from AFCM using NSCLC dataset.

(A) Mcf curves corresponding to “low” survival (i.e., shortest survival times) and “high” survival (i.e., longest observed censored times) patients in red and black, respectively. Note that mcf values below 1 indicate strong clustering of cells of same type, while values above 1 suggest higher level of mixing in of the two cell types. (B) Moran’s I correlation using MHCII expression corresponding to “low” survival (i.e., shortest survival time) and “high” survival (i.e., longest observed censored time) patients in red and black, respectively. Moran’s I values above 0 indicate a direct relationship in MHCII expression between tumor and stromal cells. Negative Moran’s I values suggest an inverse association in MHCII expression between tumor and stromal cells. (C) Estimated surface from AFCM using mcf curves as functional covariates with values decreasing from positive (red) to negative (blue). (D) Estimated surface from AFCM using Moran’s I correlation in MHCII as functional covariates with values decreased from positive (red) to negative (blue). Positive corresponds to increased risk of mortality while negative associates with reduced hazard of death. Johnson et al. dichotomized lung cancer samples into high and low MHCII. In particular, specimens with more than 5% of lung cancer cells positive for MHCII were grouped into MHCII while the remaining samples were classified into MHCII group. The authors stated that the levels of immune infiltration increased in the MHCII TME, leading to a significantly improved overall survival. Motivated by this discovery, we explored the distributions of MHCII in cells from tumor and stroma regions separately. As illustrated in panel (A) of S1 Fig, we observed that in MHCII samples, the distribution of MHCII intensity across cells in stroma region was right-skewed while that marker expression of tumor cells were more symmetrically distributed and centered at a higher intensity. On the other hand, two distributions of MHCII in tumor and stromal cells had similar right-skewed shape for MHCII samples (panel (B) of S1 Fig). This inspired us to explore the correlation in MHCII between the two cell types as a function of cross-type cell distances via Moran’s I metric according to Eq (2). Fig 2A and 2B illustrate the estimated Moran’s I correlation curves for patients in MHCII and MHCII groups, respectively, as a function of distance between cross-type cells. There was slightly more variability between correlation curves of the subjects in the MHCII group as compared to that in MHCII group. The two groups shared a common trend of negative association in MHCII expression between tumor cells and their immediate neighboring stromal cells (r ≤ 10 μm). Note that in the 2D images captured by slicing 3D tissue samples, biological overlapping could potentially occur, however our 2D samples were sliced as thin as possible (≈ 4 μm per section) to minimize this potential overlap. Additionally, by investigating the diameters of tumor and stromal cells in the dataset, we noticed that the median diameter was about 7.2 μm. Altogether, it was possible to observe pairs of cross-type cells in the neighborhood less than 10 μm. Moreover, the tumor and stromal cells in such small radius span tended to have an inverse relationship in MHCII expression, i.e., tumor cells with low MHCII were likely to be adjacent to high MHCII stromal cells, leading to negative Moran’s I correlation (Fig 2). As the neighborhood radius expanded, the correlation increased at various rates. Instead of categorizing subjects into MHCII and MHCII groups before fitting the Cox proportional model, we explored the direct impact of correlation of MHCII expression across tumor and stromal cells as a function of cross-type cell distance on survival outcome. Specifically, each Moran’s I correlation curve served as a functional covariate in model (5). Fig 4B highlights 30 correlation curves corresponding to 15 patients with shortest survival times (red) and 15 patients with longest observed censored times (black) to aid the interpretation of the fitted functional model. Fig 4D depicts the estimated surface with values decreased from red to blue, in correspondent with a decrease in hazard of mortality, holding the remaining scalar variables fixed. If stromal and tumor cells in the neighborhood radius between 25 and 75 μm were positively correlated in MHCII expression, the risk of mortality increased. However, as the positive relationship in MHCII continued past 100 μm, the estimated hazard of mortality decreased, linking to a better survival outcome.

3.2 TNBC

Keren et al. [39] classified subjects into “compartmentalized” and “mixed” using a mixing score, which was defined as a ratio of immune-tumor interactions to the total number of immune interactions. Fig 5A and 5B depict two representative images of the two categories. Taking a different approach, we utilized the mcf between tumor and immune cell distributions within each image to capture the tumor-immune interactions as a function of cell distance, measured in microns (μm). Fig 5C illustrates the progression of such interactions as the distance between cells increased, colored in black and red corresponding to “compartmentalized” and “mixed” categories, respectively. Particularly, when tumor and immune cells mixed in (e.g., Fig 5A), more interactions between tumor and immune cells were observed as the neighborhood radius expanded. As a result, the corresponding mcf curves increased at a faster rate, as compared to the ones associated with the compartmentalized cell distribution.

Fig 5

“Mixed” vs. “Compartmentalized” cell distributions in TNBC dataset.

“Mixed” vs. “Compartmentalized” cell distributions in TNBC dataset.

(A) Cell-level image of patient 11 with “mixed” cell distribution. (B) Cell-level image of patient 5 with “compartmentalized” cell distribution. (C) Mark connection functions (mcf) in correspondence with “mixed” and “compartmentalized” tumor—immune cell distributions, colored by red and black, respectively. Dashed horizontal line represents mcf under complete randomness. Mcf values below 1 indicate strong clustering of cells of same type, while values above 1 suggest higher level of mixing in of the two cell types. Due to the small sample size, we only included patient age as a scalar predictor, in addition to the functional covariates provided by mcf values (Fig 5C). The estimated functional surface with values decreased from red to blue, corresponding to a drop from high to low hazard of mortality, while holding the scalar variable constant was shown in Fig 6C. In particular, if there were more immune cells surrounding tumor cells across all distances, which resulted in the mcf curves approaching the dashed horizontal line of 1 at a faster rate, the estimated log hazard ratio increased. On the contrary, when tumor and immune cells separated in their own compartments, corresponding to the mcf values less than 1, the hazard of death decreased.

Fig 6

TNBC dataset.

TNBC dataset.

(A) Mark connection functions (mcf) for the tumor—immune cell distributions for all subjects in TNBC dataset. Dashed horizontal line represents mcf under complete randomness. Mcf values below 1 indicate strong clustering of cells of same type, while values above 1 suggest higher level of mixing in of the two cell types. (B) Moran’s I correlation between tumor and immune cells across subjects using P53 marker expression. Moran’s I values above 0 indicate a direct relationship in P53 expression between tumor and immune cells. Negative Moran’s I values suggest an inverse association in P53 expression between tumor and immune cells. (C) Estimated surface from AFCM using mcf curves as functional covariates, with values decreasing from positive (red) to negative (blue). Positive corresponds to increased risk of mortality while negative associates with reduced hazard of death. The more immune cells surrounding tumor cells across all distances (i.e., mcf values > 1), the higher risk of mortality (i.e., ). (D) Estimated surface from AFCM using Moran’s correlation in P53 expression between tumor and immune cells, with values of decreasing from positive (red) to negative (blue). Weak positive correlation in P53 expression between tumor and immune cells across all distances associates with a declined in risk of mortality with . Pan et al. [40] investigated the prognostic role of P53 in TNBC by applying Kaplan-Meier analyses on P53 positive and negative groups of patients. In particular, a sample was defined as P53 positive if any cancer cells was positively expressed. The authors concluded that P53 positivity associated with negative prognostic significance in breast cancer patients. Inspired by the findings, we investigated the distributions of P53 expression in tumor and immune cells separately for each patient. We observed that in P53 positive patients, the distribution of P53 intensity in immune cells was extremely right-skewed while that marker expression of tumor cells were distributed more symmetrically and centered at a much higher intensity, as shown in panel (A) of S3 Fig. On the other hand, the P53 expression distributions were right-skewed, similarly between tumor and immune cells in P53 negative patients (panel (B) of S3 Fig). In other words, tumor and immune cells in P53 negative samples were more likely to be correlated while it might not be the case for P53 positive samples. This motivated us to explore the correlation in P53 between the two cell types as a function of cross-type cell distance (Fig 6B) without necessarily dichotomizing samples into “P53 positive” vs “P53 negative” groups, via Moran’s I metric (Eq 2). In a few patients, immune and tumor cells lying within 100 μm neighborhood radius, were positively correlated. As cell distance became larger, the correlation dropped close to 0. Though the correlation was rather weak, it was still worth studying the impact of the relationship between tumor and immune cells with respect to the P53 expression on patient survival using the proposed approach. In particular, the correlation curves served as functional covariates in the AFCM model in (5) in conjunction with age as a scalar predictor. The estimated functional surface shown in Fig 6D had the values decreased from red to blue, in correspondence with a decrease in hazard of mortality, while holding the scalar variable fixed. Interestingly, positive correlation in P53 expression between immune and tumor cells across distances associated with a decline in estimated risk of mortality. Section S.3 of S1 Text includes an additional analysis using a Vectra dataset of 114 ovarian cancer patients [41]. We followed the same approach to investigate the impact of spatial distribution of cells in tumor and stroma regions of the TME on patient overall survival. Furthermore, the correlation in Ki67 marker expression between tumor and stromal cells in close proximity was included in the model in (5) to study the prognostic value of Ki67 in ovarian cancer patients.

4 Simulation study

4.1 Setup

We performed simulation studies to evaluate the finite-sample properties of the proposed methodology. We considered a scenario with one functional covariate X and one scalar covariate Z for simplicity. Model (3) can be rewritten as: Both scalar and functional terms were simulated directly from the lung cancer dataset to mimic real-world parameter settings. More specifically, the scalar predictor was simulated from the normal distribution with mean and standard deviation obtained empirically from the distribution of age. Additionally, the functional covariates X(s) were simulated empirically by applying FPCA [42] to the estimated mcf curves. Following Cui et al. [31], we simulated the functional covariates as with M being the number of principal components. In the definition of , e were random Gaussian noise with mean 0 and variance one. The mean function , eigenvalues , and eigenfunctions were computed by applying FPCA on the estimated mcf curves using the package refund in R [43]. By specifying the scalar coefficient β = 1 and the functional form F(s, x) = x3s, the simulated linear predictor was generated as . With the prespecified functional form F, we scaled the functional domain , which represented the distance between cells, to [0, 1]. The functional covariate values were kept in the range [0, 1.2]. Doing so prevented the simulated linear predictor from being dominated by either term. From the fitted model in Section 3.1, we obtained the estimated cumulative baseline hazard , which was then used to generate a survival function for each individual, such that . The estimated survival times were generated from the survival function; and the censoring times were simulated based on the empirical distribution of the observed censoring times.

4.2 Predictive performance

Four datasets of different sizes (N = 1000, 500, 200, and 100, respectively) were simulated following the procedure in Section 4.1. Each dataset was partitioned into training (75%) and testing (25%) sets. Three models were fit using the training set: (1) AFCM model with both scalar and functional terms, (2) AFCM model with only functional term, and (3) regular Cox proportional hazard model with only scalar term. The predicted linear predictor , u = 1, 2, 3 was obtained from the testing set for the uth model. At each sample size, mean squared errors MSE( was computed as the average of squared differences between the predicted and the “true” linear predictor such that , with N denoting the number of subjects in the testing set. We repeated the simulation for 100 iterations and recorded the average MSE for each of the three models across four sample sizes N = 1000, 500, 200, 100 in Table 1. Fig 7 displays the distribution of MSEs for the three models at each sample size, respectively. As expected, the average MSEs and corresponding standard deviations increased as the sample size decreased from 1000 to 100. More precisely, as shown in Fig 7, the upper quartile MSE of the model (1) was less than the 25 percentile of MSEs of the remaining two models (2) and (3), across all four sample sizes. The discrepancy between these quantiles became more apparent as the sample size increased. However, regardless of the sample size, model (1), which included both functional and scalar terms, performed substantially better than the other two models (2) and (3).

Table 1

Mean squared errors (MSE) across three different models.

Different sample sizes are 1000, 500, 200, and 100, respectively. Corresponding standard deviations are recorded in parentheses.

Model	N = 1000	N = 500	N = 200	N = 100
(1) Both	0.64 (0.49)	0.90 (1.09)	0.73 (0.80)	1.00 (1.13)
(2) Functional covariate	1.86 (0.66)	2.11 (1.21)	1.88 (0.95)	2.16 (1.48)
(3) Scalar covariate	2.99 (1.29)	3.28 (2.02)	2.73 (1.88)	2.79 (2.26)

Fig 7

Mean squared errors (MSE) across three models.

Distribution of the MSEs for each of the three models: AFCM model with both scalar and functional terms (red), AFCM model with only functional term (green), regular Cox model with only scalar term (blue) at different sample sizes (A) N = 1000, (B) N = 500, (C) N = 200, (D) N = 100. The lower the MSE, the better the predictive performance.

Mean squared errors (MSE) across three models.

Mean squared errors (MSE) across three different models.

Different sample sizes are 1000, 500, 200, and 100, respectively. Corresponding standard deviations are recorded in parentheses.

5 Conclusions and discussion

Interactions between cells within the TME play a crucial role in understanding cancer development and progression. With advances in imaging technology, the additional dimension of spatial resolution is achievable in addition to the single cell profiles. Conventional approaches summarize the spatial information between cells into a single quantitative value (e.g., Morisita-Horn index, intratumor lymphocyte ratio, etc.) before fitting a Cox proportional hazard model to quantify the association between cell profiles and survival outcome. Though easy to implement, these scalar summaries are unable to embed cell interactions as a function of spatial proximity in models that test associations with clinical outcomes. Alternatively, we consider cells with corresponding cell locations and features within the TME as marked point patterns. Then, the relative spatial organization of primary tumor cells and a variety of tumor-associated stromal cells as well as immune cells can be described through spatial summary functions such as the mark connection function (for qualitative marks) or Moran’s I correlation (for quantitative marks). We propose an approach to incorporate the resulting spatial functions as functional covariates into an additive function Cox model to investigate the impact of spatial structure of cells in the TME on overall survival in addition to some clinical predictors (e.g., age, disease stage, total cell counts, etc.). We demonstrate the applicability of the proposed method by analyzing multiplex imaging datasets from three separate cancer applications collected under two different (Vectra and MIBI) imaging platforms: NSCLC, ovarian cancer, and TNBC. In particular, we can flexibly use mark connection functions (mcf) as a means to capture the level of stromal infiltration in the lung cancer dataset or immune infiltration in the TNBC dataset with respect to cell distance. In addition to the mcfs, continuous marker expression across cell types (e.g., tumor, immune, stromal, etc.) can be calculated using Moran’s I statistics. Thus, qualitative and/or quantitative spatial summary functions can be included in the AFCM, depending on the research question of interest. Furthermore, the advantage of integrating the spatial structure of cells in the TME and patient-level clinical information is demonstrated in the simulation study at different sample sizes. Though the additive model framework provides us flexibility to the model nonlinear relationship between spatial cell distribution and overall survival, there exist a few limitations. Such complex model requires a relatively large number of parameters to be estimated, which compromises our power to detect significant results provided our limited sample size, as we discuss in Sections 3.1 and 3.2. Recall that the study by Cui et al. [31] demonstrating the application of AFCM in quantifying the association between physical activity and survival data included 2816 participants. Additionally, as with other nonlinear models, the interpretation of results might not always be easy for all applications. We try to draw the connection between our conclusions and Johnson et al.’s that increased immune—tumor cell interactions beyond 100 μm reduces risk of mortality (Section 3.1). For inter-cell distances in the span of 25–75 μm radius, on the other hand, our results indicate a different implication that the more tumor cells cluster, the lower the hazard of death. Note that we have specifically chosen mark connection functions (mcf) and Moran’s I correlation as representative metrics to summarize qualitative and quantitative spatial information, respectively, in this paper. Other summary functions could be employed to capture various levels of spatial information, depending on the applications. For instance, with the interest in the level of immune infiltration, Barua et al. [29] utilized the G-cross function to quantify the distribution of the nearest immune cells relative to tumor cells within any given inter-cell distance. In such instance, the observed G-cross function could then be used in place of the mcf as functional covariates in the proposed approach. Motivated by our available datasets, we specifically focus on studying the TME and how spatial heterogeneity of cells within the TME impacts patient overall survival in this article. However, the proposed framework could certainly be extended to other analyses in two directions. First, the spatial architecture of cells being investigated is not necessarily restricted to the TME. Second, other types of response variables other than survival outcomes could be modelled, with necessary modifications made to the model parameterization and estimation procedure indeed. We take the study of human endocrine pancreas and immune system in type 1 diabetes (T1D) using images of pancreatic tissue sections collected from imaging mass cytometry (IMC) platform by Wang et al. [44] as an example. With abundance and localization of proteins at single-cell resolution available, the spatial interaction of immune and pancreatic epithelial cells in pancreatic islets could be captured by the mark connection functions (mcf). Furthermore, with the donors in the dataset being categorized into “controls” (i.e., no history of diabetes) and “T1D” groups, one could use a logistic regression model with the resulting mcfs between immune and pancreatic epithelial cells served as functional covariates, to make inferences on how the islet spatial architecture differs between the two groups and to gain insights into the progression of T1D. By switching to such outcomes from the exponential family, our proposed model would essentially become a functional generalized additive model, which McLean et al. [36] and Muller et al. [45] have discussed in detail. We have presented promising results associating image summary spatial statistics with subjects’ survival, however, our approach relies on the isotropic assumption, which only takes into account the relative distance between any two cells while disregarding their relative coordinations in a given image. Additionally, due to limited sample size, interactions between markers have not yet been considered in the model. Further modifications are needed to overcome such limitations. Based on locations and associated marker expression intensities of all the cells within an image, multivariate kernel density function may be used to capture the distribution of marker interactions jointly [46]. More precisely, by specifying a search radius bandwidth (e.g., 30 μm), a joint density representing marker interaction at a given location in an image is estimated using only cells within the radius such that closer cells are weighted more than distant cells. Simultaneously, the number of cells per unit area defined by the bandwidth serves as a multiplier of the cell density values. Such density functions are used as a means to reflect spatially varying marker interactions across images. Due to the nature of density functions, which are non-negative and integrate to one, they cannot be treated as unconstrained functional predictors as in Section 2.2. Intermediate steps to transform the kernel density functions are necessary. A recent approach introduced by Petersen et al. [47] could be used to map regular density functions into a new space through functional transformations including log quantile density and log hazard transformations. Finally, the transformed density functions are incorporated in the model as functional covariates in addition to clinical predictors as before to investigate the association between spatial heterogeneity in marker interactions and subject outcomes. This work is currently under investigation. After this paper had been submitted, we discovered a study by Wilson et al. [48], which discussed issues with imaging data collected from tissue microarrays (TMA). In particular, the authors highlighted the plausibility of the tissue slides being folded or torn during the slicing process, leading to some sections of the image with no cells present(e.g., panel (C) of S4 Fig). To overcome such challenge, Wilson et al. [48] introduced a framework (with the accompanying R package SpatialTIME [49]) that utilized the K-function in a permutation-based setting to account for such drawbacks. Inspired by the study, it would be interesting to extend our proposed framework to include the permutation-based summary function. More specifically, we would randomly permutate the cell labels to obtain the empirical distribution of the mcfs. Then, the mean mcf instead of the observed counterpart would then be used as functional covariate in the additive functional Cox model (AFCM) to investigate the association between spatial heterogeneity of cells and patient overall survival. Similar to other spatial analyses, our proposed approach relies on the cell-segmented data. In other words, if cells are not segmented correctly from raw images, the resulting spatial distribution of cells might not accurately reflect the underlying tissue architecture. Accordingly, results of any downstream analyses associating spatial summary information with a clinical outcome of interest would certainly be impacted. As we briefly discuss in Section 2.3, Johnson et al. [35] utilized the state-of-the-art commercial software, inForm, for cell and nucleus segmentation for the NSCLC dataset, while Keren et al. [39] adapted a convolutional neural network approach, DeepCell [38], to extract segmented data in the TNBC dataset. While it is possible that errors could potentially occur with either segmentation approach, this has not thoroughly been investigated and is beyond the scope of our current analysis.

A complete analysis for the ovarian cancer dataset.

(PDF) Click here for additional data file.

MHCII expression in NSCLC.

Distribution of MHCII expression in tumor (turquoise) vs. stromal (red) cells in (A) A representative MHCII sample. (B) A representative MHCII sample. (TIF) Click here for additional data file.

Representative TNBC tissue samples.

Top Row: Pixel-level images of (A) Patient 1 and (B) Patient 2; color-coded from cell segmentation process to be associated with cell-level data. Bottom Row: Corresponding cell-level images for (C) Patient 1 and (D) Patient 2. Each color represents a cell classification group. Dot size is proportional to cell size. (TIF) Click here for additional data file.

P53 expression in TNBC.

Distribution of P53 expression in tumor (turquoise) vs. immune cells (red) in: (A) A representative P53 positive sample. (B) A representative P53 negative sample. (C) Corresponding Moran’s I correlation using P53 expression in P53 positive (red) and P53 negative (black) samples. Moran’s I values above 0 indicate a direct relationship in P53 expression between tumor and immune cells. Negative Moran’s I values suggest an inverse association in P53 expression between tumor and immune cells. (TIF) Click here for additional data file.

Representative images in the ovarian cancer dataset.

Example images of four representative patients (A) Patient 1, (B) Patient 51, (C) Patient 75, and (D) Patient 103, with red and turquoise points denoting stromal and tumor cells, respectively. (TIF) Click here for additional data file.

Ovarian cancer dataset.

(A) Mcf curves for all patients. Note that mcf values below 1 indicate strong clustering of cells of same type, while values above 1 suggest higher level of mixing in of the two cell types. (B) Moran’s I correlation between tumor and stromal cells across subjects using Ki67 marker expression. Moran’s I values above 0 indicate a direct relationship in Ki67 expression between tumor and stromal cells. Negative Moran’s I values suggest an inverse association in Ki67 expression between tumor and stromal cells. (TIF) Click here for additional data file.

Estimated surfaces from AFCM using ovarian cancer dataset.

(A) Estimated surface from AFCM using mcf curves as functional covariates, with values of decreasing from positive (red) to negative (blue). (B) Estimated surface from AFCM using Moran’s I correlation in Ki67 expression between tumor and stromal cells, with values of decreasing from positive (red) to negative (blue). Positive corresponds to increased risk of mortality while negative associates with reduced hazard of death. (TIF) Click here for additional data file. 2 Dec 2021 Dear Dr. Vu, Thank you very much for submitting your manuscript "SPF: A Spatial and Functional Data Analytic Approach to cell Imaging data" for consideration at PLOS Computational Biology. As with all papers reviewed by the journal, your manuscript was reviewed by members of the editorial board and by several independent reviewers. In light of the reviews (below this email), we would like to invite the resubmission of a significantly-revised version that takes into account the reviewers' comments. We cannot make any decision about publication until we have seen the revised manuscript and your response to the reviewers' comments. Your revised manuscript is also likely to be sent to reviewers for further evaluation. When you are ready to resubmit, please upload the following: [1] A letter containing a detailed list of your responses to the review comments and a description of the changes you have made in the manuscript. Please note while forming your response, if your article is accepted, you may have the opportunity to make the peer review history publicly available. The record will include editor decision letters (with reviews) and your responses to reviewer comments. If eligible, we will contact you to opt in or out. [2] Two versions of the revised manuscript: one with either highlights or tracked changes denoting where the text has been changed; the other a clean version (uploaded as the manuscript file). Important additional instructions are given below your reviewer comments. Please prepare and submit your revised manuscript within 60 days. If you anticipate any delay, please let us know the expected resubmission date by replying to this email. Please note that revised manuscripts received after the 60-day due date may require evaluation and peer review similar to newly submitted manuscripts. Thank you again for your submission. We hope that our editorial process has been constructive so far, and we welcome your feedback at any time. Please don't hesitate to contact us if you have any questions or comments. Sincerely, Martin Meier-Schellersheim Associate Editor PLOS Computational Biology Jason Haugh Deputy Editor PLOS Computational Biology *********************** Reviewer's Responses to Questions Comments to the Authors: Please note here if the review is uploaded as an attachment. Reviewer #1: The authors have proposed a novel approach to modelling the spatial interactions between cells in the tumour microenvironment (TME) to identify patterns associated with patient survival. The effectiveness of this approach has been demonstrated on publicly available data. While the focus of the manuscript is on the TME, this approach is clearly generalisable to other contexts and, because of this, I will likely be integrating components of their analysis strategy into any future analysis I perform. ## “Estimates from AFCM plots” -- Could you center all of the colours on zero? Ie, red positive and blue negative. ## Line 122. “Specifically, tumor cells and their immediate neighboring stromal cells are negatively correlated in MHCII expression when they are less than 10 microns apart. As the cross-type cell distance increases, the correlation rises at different rates, and eventually drops close to 0.” -- Given the average diameter of a cell, what does MHC11 becoming negatively correlated when cells are less than 10 microns mean? Are there worries about poor segmentation? -- It would be helpful if the figure legends were more verbose, informative and/or repeated some of the information in the text. For example “Fig 4. Estimated surface from AFCM using lung cancer dataset. (a) Mcf curves and (b) Moran's I correlation using MHCII expression”. It would be very helpful for a reader (me) if you fleshed out how to interpret each axis, even though it is mentioned in the text. For example, what does a large mcf or small mcf mean? What does a positive (red) or negative (blue) F() mean? ## Line 181 “In particular, if tumor cells clustered together in the neighborhood of 25 { 75_m, the overall 182 survival improved, based on the negative estimate of the log hazard ratio. However, at distances beyond 183 75 _m, the more tumor cells clustered together, the higher the risk of mortality, while the increased 184 infiltration level of tumor-associated stromal cells reduced the hazard of death.” -- A few points below: 1) This conclusion would be easier to make if the F() in Figure 4 was centered on white. 2) I am not a fan of the terminology “clustered together”. My preference would be something like, “if strong clustering of tumor cells is observed in the neighborhood of 25 - 75_m”. Or clustering more than expected maybe? 3) The difference in F() appears much larger for r>100 than 254) Can you elaborate any more on the complete flipping of interpretation/association that happens around r = 90 in 4a and 4b? It is a hard concept to wrap my head around. 5) As 4a and 4b are coloured by F(), would it also be useful for interpretation for 3b to be moved to Figure 4? Can the lines in Figure 3b also be coloured? Does that make any sense at all? Perhaps you could leave 3b in Figure 3, but then include it in Figure 4 with the top 10% “best survival” patients highlighted in red and bottom 10% “worst survival” patients highlighted in blue? Or some other highlighting that makes sense. ## Figure 5. -- Why are 5c and 5d two plots? Wouldn’t it make more sense for them to be one plot coloured by mixed/not mixed? ## On line 246 “The authors concluded that P53 positivity associated with negative prognostic significance in breast cancer patients.” “In particular, a sample was defined as P53 positive if any cancer cells was positively expressed. “ -- Pan et al's conclusions appear to be about P53 positive cancer cells. Can you explain what it means for P53 to be correlated between cancer and immune cells? Is this a problem with cell segmentation? Or does P53 have the same function in immune cells as it does cancer cells? -- The introduction could potentially benefit from some edits to make it easier and quicker to digest. For instance, in the paragraph starting on line 14, I would suggest spending more time talking about multiplexed imaging techniques and avoid detouring to mention bulk/sc sequencing without context. -- This approach clearly extends beyond studying the TME, I’m not sure how or if it is worth emphasizing this? -- The authors don’t appear to refer to p-values / statistical significance at any point. Are the “Estimates from AFCM plots” purely for qualitative interpretation? If I were to use your method for my own analysis, is there a single value I can report to provide evidence of a relationship? How would someone use these results? -- The authors make reference to multiplex data which can quantify many cell types simultaneously however their examples only compare two types of cells (cancer vs immune). The authors briefly mention expanding the tests to include multiple or correlated objects in the conclusion. However, have the authors considered how they would simply apply this approach to multiple pairwise cell type comparisons? Or testing many markers for these comparisons? Is there a sensible way to rank pairwise comparisons? Or would I need to look at all pairwise plots and make a qualitative judgement? -- Overall, I found the manuscript easy to read. However, I will leave it to the editors to comment on the structure as it does not follow the “manuscript organisation” outlined on PLOS Computational Biology’s website. Some components of the text could be moved from the results into the methods section and the discussion (conclusion) section. Reviewer #2: This study aim to illustrate the use of established spatial statistics in studying cellular interactions. They motivate their work by discussing the role of tumour microenvironment in cancer evolution. They explore several spatial statistics in different datasets. In general, the concepts presented in the paper are interesting. However, the presented results show that these metrics are not really very useful in reflecting the biology. For example: “if tumor cells clustered together in the neighborhood of 25 – 75μm, the overall 182 survival improved, based on the negative estimate of the log hazard ratio. However, at distances beyond 183 75 μm, the more tumor cells clustered together, the higher the risk of mortality”. This does not make sense as increased cancer cell density should reflect a worse outcome. Similarly, other statements regard the other analyses is again difficult to interpret. This applies to other statements in the paper. Major concerns: - except for the introduction, the paper is very hard to follow. It reads as disparate pieces of experiments that are put together and no clear link between the sections. - Showing few survival values independent of any other information (e.g Fig. 3C) does not add any information. - It is not clear how surface plots can be useful. - It was not clear why simulation was used at the end. - It was not clear why only the centres of the cells were shown and no original cell images. ********** Have the authors made all data and (if applicable) computational code underlying the findings in their manuscript fully available? The PLOS Data policy requires authors to make all data and code underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data and code should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data or code —e.g. participant privacy or use of data from a third party—those must be specified. Reviewer #1: Yes Reviewer #2: No: In my experience data is available upon request = “not available” . Code is available ********** PLOS authors have the option to publish the peer review history of their article (what does this mean?). If published, this will include your full peer review and any attached files. If you choose “no”, your identity will remain anonymous but your review may still be made public. Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy. Reviewer #1: Yes: Ellis Patrick Reviewer #2: No Figure Files: While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool, . PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Then, login and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email us at . Data Requirements: Please note that, as a condition of publication, PLOS' data policy requires that you make available all data used to draw the conclusions outlined in your manuscript. Data must be deposited in an appropriate repository, included within the body of the manuscript, or uploaded as supporting information. This includes all numerical values that were used to generate graphs, histograms etc.. For an example in PLOS Biology see here: http://www.plosbiology.org/article/info%3Adoi%2F10.1371%2Fjournal.pbio.1001908#s5. Reproducibility: To enhance the reproducibility of your results, we recommend that you deposit your laboratory protocols in protocols.io, where a protocol can be assigned its own identifier (DOI) such that it can be cited independently in the future. Additionally, PLOS ONE offers an option to publish peer-reviewed clinical study protocols. Read more information on sharing protocols at https://plos.org/protocols?utm_medium=editorial-email&utm_source=authorletters&utm_campaign=protocols 28 Feb 2022 Submitted filename: ResponseLetter.pdf Click here for additional data file. 25 Apr 2022 Dear Dr. Vu, Thank you very much for submitting your manuscript "SPF: A Spatial and Functional Data Analytic Approach to cell Imaging data" for consideration at PLOS Computational Biology. As with all papers reviewed by the journal, your manuscript was reviewed by members of the editorial board and by several independent reviewers. The reviewers appreciated the attention to an important topic. Based on the reviews, we are likely to accept this manuscript for publication, providing that you modify the manuscript according to the review recommendations. Please prepare and submit your revised manuscript within 30 days. If you anticipate any delay, please let us know the expected resubmission date by replying to this email. When you are ready to resubmit, please upload the following: [1] A letter containing a detailed list of your responses to all review comments, and a description of the changes you have made in the manuscript. Please note while forming your response, if your article is accepted, you may have the opportunity to make the peer review history publicly available. The record will include editor decision letters (with reviews) and your responses to reviewer comments. If eligible, we will contact you to opt in or out [2] Two versions of the revised manuscript: one with either highlights or tracked changes denoting where the text has been changed; the other a clean version (uploaded as the manuscript file). Important additional instructions are given below your reviewer comments. Thank you again for your submission to our journal. We hope that our editorial process has been constructive so far, and we welcome your feedback at any time. Please don't hesitate to contact us if you have any questions or comments. Sincerely, Martin Meier-Schellersheim Associate Editor PLOS Computational Biology Jason Haugh Deputy Editor PLOS Computational Biology *********************** A link appears below if there are any accompanying review attachments. If you believe any reviews to be missing, please contact ploscompbiol@plos.org immediately: [LINK] Reviewer's Responses to Questions Comments to the Authors: Please note here if the review is uploaded as an attachment. Reviewer #1: Thank you again for the opportunity to review this manuscript. The authors have addressed all of my comments and in doing so I believe the manuscript is more accessible. Reviewer #2: The authors addressed most of the reviewers’ comments which improved the manuscript. Comments: - The proposed metrics can reveal interesting insights, however these are not always easy to interpret. It will be useful to discuss the limitations in the discussion. - The authors did not report on the methods they used for image analysis which only appeared in their response to reviewer 1. Multiplexed images can be extremely challenging and reasonable segmentation accuracy is needed for reliable conclusions. Details of the image analysis and results (raw images and segmented images) should be included. I appreciate the paper is presenting a method for studying the resulting spatial single cell distribution. However, making it clear in the discussion of potential errors and challenges in data segmentation should be taken into consideration when interpreting the results. - Following from that point, the red data points that in Figure 1 for Patient 131 are expected to be wrongly segmented cells as TMAs are usually cut as a round biopsy. These should be excluded from the data through QC. Could the author confirm this from the raw images or replace with another example. - That AFCM does not result in any statistically significant signatures when using this metric, raises a concern regarding their utility of these metrics which make the study merely exploratory. Could the authors elaborate on the potential reason in the discussion? - Fig. 2: The text is difficult to read The authors refer to r=10 in the text, it will be helpful to add a vertical line to indicate this on the figure - There are few mistakes in referring to the supplementary material: e.g. Line 185 should refer to Fig S2 not S1 and line 283 should refer to Fig. S3 not S1. Line 298 should refer to Section 3 not 2. - I am not sure why the ovarian dataset and the associated analysis was moved to the supplementary. It should be fit in the appropriate sections of the study. ********** Have the authors made all data and (if applicable) computational code underlying the findings in their manuscript fully available? The PLOS Data policy requires authors to make all data and code underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data and code should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data or code —e.g. participant privacy or use of data from a third party—those must be specified. Reviewer #1: Yes Reviewer #2: Yes ********** PLOS authors have the option to publish the peer review history of their article (what does this mean?). If published, this will include your full peer review and any attached files. If you choose “no”, your identity will remain anonymous but your review may still be made public. Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy. Reviewer #1: Yes: Ellis Patrick Reviewer #2: No Figure Files: While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool, https://pacev2.apexcovantage.com. PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Then, login and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email us at figures@plos.org. Data Requirements: Please note that, as a condition of publication, PLOS' data policy requires that you make available all data used to draw the conclusions outlined in your manuscript. Data must be deposited in an appropriate repository, included within the body of the manuscript, or uploaded as supporting information. This includes all numerical values that were used to generate graphs, histograms etc.. For an example in PLOS Biology see here: http://www.plosbiology.org/article/info%3Adoi%2F10.1371%2Fjournal.pbio.1001908#s5. Reproducibility: To enhance the reproducibility of your results, we recommend that you deposit your laboratory protocols in protocols.io, where a protocol can be assigned its own identifier (DOI) such that it can be cited independently in the future. Additionally, PLOS ONE offers an option to publish peer-reviewed clinical study protocols. Read more information on sharing protocols at https://plos.org/protocols?utm_medium=editorial-email&utm_source=authorletters&utm_campaign=protocols References: Review your reference list to ensure that it is complete and correct. If you have cited papers that have been retracted, please include the rationale for doing so in the manuscript text, or remove these references and replace them with relevant current references. Any changes to the reference list should be mentioned in the rebuttal letter that accompanies your revised manuscript. If you need to cite a retracted article, indicate the article’s retracted status in the References list and also include a citation and full reference for the retraction notice. 4 May 2022 Submitted filename: R2_Letter.pdf Click here for additional data file. 16 May 2022 Dear Dr. Vu, We are pleased to inform you that your manuscript 'SPF: A Spatial and Functional Data Analytic Approach to cell Imaging data' has been provisionally accepted for publication in PLOS Computational Biology. Before your manuscript can be formally accepted you will need to complete some formatting changes, which you will receive in a follow up email. A member of our team will be in touch with a set of requests. Please note that your manuscript will not be scheduled for publication until you have made the required changes, so a swift response is appreciated. IMPORTANT: The editorial review process is now complete. PLOS will only permit corrections to spelling, formatting or significant scientific errors from this point onwards. Requests for major changes, or any which affect the scientific understanding of your work, will cause delays to the publication date of your manuscript. Should you, your institution's press office or the journal office choose to press release your paper, you will automatically be opted out of early publication. We ask that you notify us now if you or your institution is planning to press release the article. All press must be co-ordinated with PLOS. Thank you again for supporting Open Access publishing; we are looking forward to publishing your work in PLOS Computational Biology. Best regards, Martin Meier-Schellersheim Associate Editor PLOS Computational Biology Jason Haugh Deputy Editor PLOS Computational Biology *********************************************************** Reviewer's Responses to Questions Comments to the Authors: Please note here if the review is uploaded as an attachment. Reviewer #2: The authors addressed most of our comments. ********** Have the authors made all data and (if applicable) computational code underlying the findings in their manuscript fully available? The PLOS Data policy requires authors to make all data and code underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data and code should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data or code —e.g. participant privacy or use of data from a third party—those must be specified. Reviewer #2: Yes ********** PLOS authors have the option to publish the peer review history of their article (what does this mean?). If published, this will include your full peer review and any attached files. If you choose “no”, your identity will remain anonymous but your review may still be made public. Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy. Reviewer #2: No 9 Jun 2022 PCOMPBIOL-D-21-01681R2 SPF: A Spatial and Functional Data Analytic Approach to cell Imaging data Dear Dr Ghosh, I am pleased to inform you that your manuscript has been formally accepted for publication in PLOS Computational Biology. Your manuscript is now with our production department and you will be notified of the publication date in due course. The corresponding author will soon be receiving a typeset proof for review, to ensure errors have not been introduced during production. Please review the PDF proof of your manuscript carefully, as this is the last chance to correct any errors. Please note that major changes, or those which affect the scientific understanding of the work, will likely cause delays to the publication date of your manuscript. Soon after your final files are uploaded, unless you have opted out, the early version of your manuscript will be published online. The date of the early version will be your article's publication date. The final article will be published to the same URL, and all versions of the paper will be accessible to readers. Thank you again for supporting PLOS Computational Biology and open-access publishing. We are looking forward to publishing your work! With kind regards, Zsofia Freund PLOS Computational Biology | Carlyle House, Carlyle Road, Cambridge CB4 3DN | United Kingdom ploscompbiol@plos.org | Phone +44 (0) 1223-442824 | ploscompbiol.org | @PLOSCompBiol

34 in total

1. Multiplexed In Situ Imaging Mass Cytometry Analysis of the Human Endocrine Pancreas and Immune System in Type 1 Diabetes.

Authors: Yue J Wang; Daniel Traum; Jonathan Schug; Long Gao; Chengyang Liu; Mark A Atkinson; Alvin C Powers; Michael D Feldman; Ali Naji; Kyong-Mi Chang; Klaus H Kaestner
Journal: Cell Metab Date: 2019-01-31 Impact factor: 27.287

2. Melanoma growth stimulatory activity: isolation from human melanoma tumors and characterization of tissue distribution.

Authors: A Richmond; H G Thomas
Journal: J Cell Biochem Date: 1988-02 Impact factor: 4.429

3. Toward reproducible, scalable, and robust data analysis across multiplex tissue imaging platforms.

Authors: Erik A Burlingame; Jennifer Eng; Guillaume Thibault; Koei Chin; Joe W Gray; Young Hwan Chang
Journal: Cell Rep Methods Date: 2021-07-23

4. Multiplexed ion beam imaging of human breast tumors.

Authors: Michael Angelo; Sean C Bendall; Rachel Finck; Matthew B Hale; Chuck Hitzman; Alexander D Borowsky; Richard M Levenson; John B Lowe; Scot D Liu; Shuchun Zhao; Yasodha Natkunam; Garry P Nolan
Journal: Nat Med Date: 2014-03-02 Impact factor: 53.440

5. Functional Generalized Additive Models.

Authors: Mathew W McLean; Giles Hooker; Ana-Maria Staicu; Fabian Scheipl; David Ruppert
Journal: J Comput Graph Stat Date: 2014 Impact factor: 2.302

6. Spatial interaction of tumor cells and regulatory T cells correlates with survival in non-small cell lung cancer.

Authors: Souptik Barua; Penny Fang; Amrish Sharma; Junya Fujimoto; Ignacio Wistuba; Arvind U K Rao; Steven H Lin
Journal: Lung Cancer Date: 2018-02-04 Impact factor: 5.705

7. spatialTIME and iTIME: R package and Shiny application for visualization and analysis of immunofluorescence data.

Authors: Jordan H Creed; Christopher M Wilson; Alex C Soupir; Christelle M Colin-Leitzinger; Gregory J Kimmel; Oscar E Ospina; Nicholas H Chakiryan; Joseph Markowitz; Lauren C Peres; Anna Coghill; Brooke L Fridley
Journal: Bioinformatics Date: 2021-11-04 Impact factor: 6.931

8. The Capacity of the Ovarian Cancer Tumor Microenvironment to Integrate Inflammation Signaling Conveys a Shorter Disease-free Interval.

Authors: Kimberly R Jordan; Matthew J Sikora; Jill E Slansky; Angela Minic; Jennifer K Richer; Marisa R Moroney; Junxiao Hu; Rebecca J Wolsky; Zachary L Watson; Tomomi M Yamamoto; James C Costello; Aaron Clauset; Kian Behbakht; T Rajendra Kumar; Benjamin G Bitler
Journal: Clin Cancer Res Date: 2020-09-14 Impact factor: 12.531

9. Cancer Cell-Specific Major Histocompatibility Complex II Expression as a Determinant of the Immune Infiltrate Organization and Function in the NSCLC Tumor Microenvironment.

Authors: Amber M Johnson; Jennifer M Boland; Julia Wrobel; Emily K Klezcko; Mary Weiser-Evans; Katharina Hopp; Lynn Heasley; Eric T Clambey; Kimberly Jordan; Raphael A Nemenoff; Erin L Schenk
Journal: J Thorac Oncol Date: 2021-05-25 Impact factor: 20.121

10. Deep Profiling of Mouse Splenic Architecture with CODEX Multiplexed Imaging.

Authors: Yury Goltsev; Nikolay Samusik; Julia Kennedy-Darling; Salil Bhate; Matthew Hale; Gustavo Vazquez; Sarah Black; Garry P Nolan
Journal: Cell Date: 2018-08-02 Impact factor: 41.582