Christopher M Wilson, Oscar E Ospina, Mary K Townsend, Jonathan Nguyen, Carlos Moran Segura, Joellen M Schildkraut, Shelley S Tworoger, Lauren C Peres, Brooke L Fridley.
Abstract
Immune modulation is considered a hallmark of cancer initiation and progression. The recent development of immunotherapies has ushered in a new era of cancer treatment. These therapeutics have led to revolutionary breakthroughs; however, the efficacy of immunotherapy has been modest and is often restricted to a subset of patients. Hence, identifying which cancer patients will benefit from immunotherapy is essential. Multiplex immunofluorescence (mIF) microscopy allows for the assessment and visualization of the tumor immune microenvironment (TIME). The data output following image and machine learning analyses for cell segmentation and phenotyping consists of the following information for each tumor sample: the number of cells positive for each marker and phenotype of interest, the total number of cells, the percent of cells positive for each marker, and the spatial locations of all measured cells. There are many challenges in the analysis of mIF data, including many tissue samples with zero positive cells ("zero-inflated" data), repeated measurements from multiple tissue microarray (TMA) cores or tissue slides per subject, and spatial analyses to determine the level of clustering and co-localization between the cell types in the TIME. In this review paper, we discuss the challenges in the statistical analysis of mIF data and opportunities for further research.
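As a minimal sketch of the per-sample data output described above, the Python code below stores hypothetical per-core counts and computes the percent of positive cells, the fraction of cores with zero positive cells for a marker (a quick check for zero inflation), and a naive per-subject pooling of repeated cores. The names `CoreSummary`, `zero_fraction`, and `pool_by_subject` and the example counts are illustrative assumptions, not part of the reviewed paper.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class CoreSummary:
    """Hypothetical per-core output of segmentation and phenotyping."""
    subject_id: str
    core_id: str
    total_cells: int
    positive_cells: dict = field(default_factory=dict)  # marker -> count

    def percent_positive(self, marker: str) -> float:
        """Percent of cells in this core that are positive for `marker`."""
        if self.total_cells == 0:
            return 0.0
        return 100.0 * self.positive_cells.get(marker, 0) / self.total_cells


def zero_fraction(cores, marker):
    """Fraction of cores with zero cells positive for `marker`."""
    zeros = sum(1 for c in cores if c.positive_cells.get(marker, 0) == 0)
    return zeros / len(cores)


def pool_by_subject(cores, marker):
    """Naive pooling of repeated cores: summed positives / summed totals, as a percent."""
    positives, totals = defaultdict(int), defaultdict(int)
    for c in cores:
        positives[c.subject_id] += c.positive_cells.get(marker, 0)
        totals[c.subject_id] += c.total_cells
    return {s: (100.0 * positives[s] / totals[s] if totals[s] else 0.0) for s in totals}


# Hypothetical data: subject P1 has two cores, subject P2 has one.
cores = [
    CoreSummary("P1", "core1", 1000, {"CD8": 50}),
    CoreSummary("P1", "core2", 500, {"CD8": 0}),
    CoreSummary("P2", "core1", 800, {"CD8": 0}),
]
print(cores[0].percent_positive("CD8"))
print(zero_fraction(cores, "CD8"))
print(pool_by_subject(cores, "CD8"))
```

Summing counts across cores is only one simple choice; it ignores between-core variability, which is why models that account for repeated measurements per subject are generally preferable.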
Keywords: cancer; data science; digital pathology; tumor immune microenvironment
Year: 2021 PMID: 34204319 PMCID: PMC8233801 DOI: 10.3390/cancers13123031
Source DB: PubMed Journal: Cancers (Basel) ISSN: 2072-6694 Impact factor: 6.575
Figure 1. (A) Data are generated from biopsied tissue that is formalin-fixed and paraffin-embedded (FFPE); tissue slices are then placed on a tissue microarray (TMA). (B) The slide is stained with primary and secondary antibodies, which bind to the target antigens. (C) Each location of the specimen is illuminated with a range of wavelengths, and the emitted light goes through a spectral unmixing step (D), which deconvolves the observed intensity into cyan fluorescent protein (CFP), yellow fluorescent protein (YFP), background, and Raman components. To phenotype each cell (E–G), the tissue is first segmented into tumor and stroma compartments using staining (E); marker intensities and information on cell shape are then used to derive the final phenotype via machine learning (random forest is a popular technique), followed by cell phenotyping (G).
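Spectral unmixing is commonly modelled as a linear mixture: the observed multichannel intensity at a pixel is a weighted sum of reference spectra, one per component. The sketch below assumes that linear model and estimates the component abundances with ordinary least squares (`numpy.linalg.lstsq`), clipping negative values to zero. The five-channel reference spectra and the abundances are hypothetical, and commercial imaging software may use a different unmixing algorithm.

```python
import numpy as np


def unmix(observed, reference_spectra):
    """Linear spectral unmixing by ordinary least squares.

    observed:          (n_channels,) intensities measured at one pixel
    reference_spectra: (n_channels, n_components) spectrum of each pure component
    returns:           (n_components,) estimated abundance of each component
    """
    abundances, *_ = np.linalg.lstsq(reference_spectra, observed, rcond=None)
    # Negative abundances are not physical; clip them to zero
    # (non-negative least squares would enforce this during fitting instead).
    return np.clip(abundances, 0.0, None)


# Hypothetical 5-channel reference spectra; columns: CFP, YFP, background, Raman.
reference = np.array([
    [0.9, 0.1, 0.2, 0.0],
    [0.5, 0.3, 0.2, 0.1],
    [0.1, 0.8, 0.2, 0.3],
    [0.0, 0.6, 0.2, 0.7],
    [0.0, 0.1, 0.2, 0.4],
])
true_abundances = np.array([2.0, 1.0, 0.5, 0.3])
observed = reference @ true_abundances  # noise-free mixture for illustration
estimated = unmix(observed, reference)
print(np.round(estimated, 3))
```

With noise-free data and linearly independent reference spectra, least squares recovers the mixing weights exactly; with real, noisy pixels the estimates are approximate.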
Figure 2. (A) Square-root transformed CD8 (Opal 520) and FOXP3 (Opal 570) fluorescence intensities of a tissue microarray (TMA) core from an epithelial ovarian cancer tumor. Cell classifiers used in immunofluorescence studies can yield equivocal CD8+FOXP3+ assignments. The CD8 threshold clearly separates CD8+ cells; however, the FOXP3 intensity threshold yields a mixture of unassigned and FOXP3+ cells. (B) Square-root transformed percent of CD8+ cells detected in 1312 epithelial ovarian cancer tumor slices from 445 participants. The tumor slices come from 6 different TMAs, with initial tissue collection starting at different times since the 1970s. The three horizontal lines represent the 1st, 2nd, and 3rd quartiles, and the width of each violin plot represents the number of slices showing a given percentage. As shown by the narrower violin bases, TMAs generated from the 1990s onward show fewer zeros in CD8+ cell counts than TMAs generated in earlier years. (C) mIF images from the same core of an ovarian cancer TMA. Both slices were stained with pan-cytokeratin (PCK), but two different mIF panels were applied to detect B and T cells (top). The cells detected after image processing are shown. Differences are observed between the two slices, including the presence of “holes”, which makes comparative spatial analysis of two slices from the same TMA core difficult. The white arrows mark a region that is similar across the different sections of the same core, while the green arrows mark regions that are dramatically different. Plots generated from the mIF data capture these features and maintain the cell locations (bottom).
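Panel (A) reflects threshold-based gating on two marker intensities. The sketch below assumes one hypothetical threshold per marker, applied on the square-root intensity scale, labels each cell as CD8+, FOXP3+, CD8+FOXP3+, or unassigned, and reports the square-root transformed percent of a phenotype as plotted in panel (B). The function names, intensities, and thresholds are made up for illustration; real classifiers typically use more features than two intensities.

```python
import numpy as np


def gate_cells(cd8, foxp3, cd8_threshold, foxp3_threshold):
    """Label each cell from two marker intensities using fixed thresholds.

    Thresholds are compared with square-root transformed intensities.
    """
    cd8_pos = np.sqrt(cd8) >= cd8_threshold
    foxp3_pos = np.sqrt(foxp3) >= foxp3_threshold
    labels = np.full(cd8.shape, "unassigned", dtype="U10")
    labels[cd8_pos & ~foxp3_pos] = "CD8+"
    labels[~cd8_pos & foxp3_pos] = "FOXP3+"
    labels[cd8_pos & foxp3_pos] = "CD8+FOXP3+"  # the equivocal double positives
    return labels


def sqrt_percent(labels, phenotype):
    """Square-root transformed percent of cells carrying `phenotype`."""
    return float(np.sqrt(100.0 * np.mean(labels == phenotype)))


# Hypothetical raw intensities for four cells.
cd8 = np.array([400.0, 4.0, 900.0, 1.0])
foxp3 = np.array([1.0, 225.0, 256.0, 4.0])
labels = gate_cells(cd8, foxp3, cd8_threshold=10.0, foxp3_threshold=10.0)
print(labels)
print(sqrt_percent(labels, "CD8+"))
```

A single hard cut-off is what makes the FOXP3 assignments in panel (A) sensitive to where the threshold falls within a continuous intensity distribution.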
Figure 3. (A) Illustration of the possible differences in immune activity within TMA cores. (B) Histograms with empirical and theoretical probability density functions (top) and empirical and theoretical cumulative distribution functions (bottom) to guide the selection of modelling assumptions for increasingly rare markers (from left to right). The Poisson and binomial distributions account for neither overdispersion nor zero inflation; the negative binomial and beta-binomial distributions account only for overdispersion; and the zero-inflated Poisson and zero-inflated binomial distributions account only for zero inflation. The negative binomial and beta-binomial distributions are suitable for cell types for which fewer than 50% of the cores have zero cells of that type (CD3+, CD138+), whereas zero-inflated models are best suited to cell types with excess zeros (CD19+).
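One way to read panel (B) is to compare the observed fraction of zero counts with the fraction predicted by each candidate distribution. The sketch below fits a Poisson model and a method-of-moments negative binomial to hypothetical per-core counts and reports the predicted probability of a zero; when the observed zero fraction exceeds even the negative binomial prediction, a zero-inflated model is worth considering. This is a quick diagnostic under stated assumptions, not the model-fitting procedure used in the paper; an analogous check for the binomial and beta-binomial models would work with (positive, total) pairs instead of counts.

```python
import math


def observed_zero_fraction(counts):
    """Fraction of cores with a count of zero."""
    return sum(1 for c in counts if c == 0) / len(counts)


def poisson_zero_fraction(counts):
    """P(Y = 0) under a Poisson model whose rate is the sample mean."""
    mu = sum(counts) / len(counts)
    return math.exp(-mu)


def negbin_zero_fraction(counts):
    """P(Y = 0) under a negative binomial fitted by the method of moments.

    Uses Var(Y) = mu + mu**2 / k; with no overdispersion (var <= mu)
    it falls back to the Poisson value.
    """
    n = len(counts)
    mu = sum(counts) / n
    var = sum((c - mu) ** 2 for c in counts) / (n - 1)
    if var <= mu:
        return math.exp(-mu)
    k = mu ** 2 / (var - mu)  # size (dispersion) parameter
    return (k / (k + mu)) ** k


# Hypothetical counts of a rare cell type (e.g., CD19+) in ten cores.
counts = [0, 0, 0, 0, 0, 0, 0, 3, 12, 40]
obs = observed_zero_fraction(counts)
pois = poisson_zero_fraction(counts)
nb = negbin_zero_fraction(counts)
print(f"observed={obs:.2f} poisson={pois:.3f} negbin={nb:.2f}")
```

With these made-up counts, the observed zero fraction exceeds both model predictions, the excess-zeros pattern for which the caption recommends zero-inflated models.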
Summary of the spatial measures outlined in Section 4. Spatial point process methods are split into two groups to highlight the duality between the distance to the nearest neighbor and the locations of events. CSR denotes complete spatial randomness.
| Type of Analysis | Name | Empirical Formula | Theoretical Value under CSR | Comments |
|---|---|---|---|---|
| Pixel/Area Based | Morisita-Horn Index […] | | | Robust to settings involving a small number of cells […] |
| Pixel/Area Based | Duncan Segregation Index […] | | | Does not work well for rare cell populations; checkerboard problem […] |
| Nearest Neighbor | Euclidean Distance | | | |
| Nearest Neighbor | Nearest Neighbor | | | |
| Spatial Point Processes | | | | |
Quantities used in the table: the number of pixels; the proportion of the population of a cell type in an area or pixel; the proportion of the population of a cell type across the entire image; the total number of cells; the edge correction for a pair of cells; a randomly selected location; a kernel function; and a hazard function. Blue text corresponds to spatial point process methods based on the locations of cells; these methods are also referred to as second-order methods. Red text corresponds to spatial point process methods that focus on the distance to the nearest neighbor.
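For the pixel/area-based and nearest-neighbor rows of the table, the sketch below implements the commonly used textbook forms of the Morisita-Horn index, 2 * sum(p_i * q_i) / (sum(p_i^2) + sum(q_i^2)), and the Duncan segregation index, 0.5 * sum(|p_i - q_i|), where p_i and q_i are the proportions of each cell type's population that fall in area i, plus the Euclidean distance from each cell of one type to the closest cell of another type. These are assumed standard definitions and may differ in notation from the table; the counts and coordinates are hypothetical.

```python
import numpy as np


def _proportions(counts):
    """Share of a cell type's total population found in each area/pixel."""
    counts = np.asarray(counts, dtype=float)
    return counts / counts.sum()


def morisita_horn(counts_a, counts_b):
    """Morisita-Horn overlap: 1 = identical distribution over areas, 0 = no co-occurrence."""
    p, q = _proportions(counts_a), _proportions(counts_b)
    return 2.0 * np.sum(p * q) / (np.sum(p ** 2) + np.sum(q ** 2))


def duncan_segregation(counts_a, counts_b):
    """Duncan segregation index: 0 = same distribution, 1 = complete segregation."""
    p, q = _proportions(counts_a), _proportions(counts_b)
    return 0.5 * np.sum(np.abs(p - q))


def nearest_neighbor_distances(xy_from, xy_to):
    """Euclidean distance from each cell in xy_from to its nearest cell in xy_to."""
    diffs = xy_from[:, None, :] - xy_to[None, :, :]
    dists = np.sqrt(np.sum(diffs ** 2, axis=-1))
    return dists.min(axis=1)


# Hypothetical counts of two cell types in four areas of one image.
tumor_counts = [10, 0, 5, 5]
tcell_counts = [0, 10, 5, 5]
print(morisita_horn(tumor_counts, tcell_counts))
print(duncan_segregation(tumor_counts, tcell_counts))

# Hypothetical cell coordinates (e.g., in microns).
tumor_xy = np.array([[0.0, 0.0], [3.0, 4.0]])
tcell_xy = np.array([[0.0, 1.0], [10.0, 10.0]])
print(nearest_neighbor_distances(tumor_xy, tcell_xy))
```

Both indices are unchanged if the areas are shuffled, so they cannot distinguish a checkerboard of segregated pixels from two large segregated regions; this is the checkerboard problem noted for the Duncan index.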
Figure 4. (A) Example images showing cores with little to significant damage. A point process generated from simulated data illustrates different approaches for spatial analysis: (B) distance- or nearest neighbor-based methods; (C) neighborhood methods such as […]; and (D) distance-to-neighbor measures such as […]. (E) Example of an original (left) and a permuted (middle) point process, with the resulting histogram (right) of permutation-based estimates of K, showing the difference between the theoretical and permutation-based estimates under CSR; here the theoretical value underestimated the value of K under CSR.
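Panel (E) contrasts the theoretical CSR value of Ripley's K, pi * r^2 for a homogeneous Poisson process, with a permutation-based null. The sketch below assumes a naive K estimator without edge correction and builds the null by randomly relabelling which cells belong to the cell type of interest while keeping all cell locations fixed. The simulated core with a damaged (empty) right half, the radius, and the number of permutations are illustrative assumptions, and `ripley_k` and `permutation_k` are hypothetical helper names.

```python
import numpy as np


def ripley_k(xy, r, area):
    """Naive Ripley's K at radius r (no edge correction).

    K(r) = area / (n * (n - 1)) * #{ordered pairs i != j with d_ij <= r}
    """
    n = len(xy)
    diffs = xy[:, None, :] - xy[None, :, :]
    d = np.sqrt(np.sum(diffs ** 2, axis=-1))
    np.fill_diagonal(d, np.inf)  # exclude self-pairs
    return area * np.sum(d <= r) / (n * (n - 1))


def permutation_k(xy, is_type, r, area, n_perm=200, seed=0):
    """Observed K for one cell type and its null distribution from relabelling cells.

    Permuting labels keeps the observed cell locations (including holes and
    damage), so the null reflects the tissue layout rather than theoretical CSR.
    """
    rng = np.random.default_rng(seed)
    observed = ripley_k(xy[is_type], r, area)
    null = np.empty(n_perm)
    for b in range(n_perm):
        shuffled = rng.permutation(is_type)
        null[b] = ripley_k(xy[shuffled], r, area)
    return observed, null


# Hypothetical core: cells only in the left half of a 100 x 100 window.
rng = np.random.default_rng(1)
n_cells = 400
xy = np.column_stack([rng.uniform(0, 50, n_cells), rng.uniform(0, 100, n_cells)])
is_type = rng.random(n_cells) < 0.3  # random labels, i.e., no true clustering
r = 5.0
obs_k, null_k = permutation_k(xy, is_type, r, area=100.0 * 100.0)
theoretical = np.pi * r ** 2
print(obs_k, null_k.mean(), theoretical)
```

Because the cells occupy only half of the window, the permutation null sits well above pi * r^2, similar to panel (E), where the theoretical value underestimates K under CSR; comparing the observed K with the permutation null avoids that bias.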