Advances in cellular imaging instrumentation, together with readily available optogenetic and fluorescence sensors, have created a pressing need for fast, accurate, and standardized analysis. Deep-learning architectures have revolutionized the field of biomedical image analysis and have achieved state-of-the-art accuracy. Despite these advances, deep-learning architectures for the segmentation of subcellular fluorescence signals are lacking. Dynamic cellular fluorescence signals can be plotted and visualized using spatiotemporal maps (STMaps), but their segmentation and quantification are currently hindered by slow workflows and limited accuracy, especially for large datasets. In this study, we provide a software tool that uses a deep-learning methodology to overcome these signal segmentation challenges. The software framework delivers accurate calcium signal segmentation and a fast analysis pipeline that accommodates different signal patterns across multiple cell types. The software supports data access, quantification, and graphical visualization, and enables high-throughput analysis of large datasets.
In recent years there has been a surge in the use of automated computer-aided detection for biomedical image processing and analysis (Cai et al., 2016; De Vos et al., 2016; Leigh et al., 2020; Roth et al., 2015; Teramoto et al., 2016). Automated biomedical image analysis incorporates advanced machine learning methods combined with computer vision and image processing techniques (Becker et al., 2013). Deep learning methods have shown encouraging results, outperforming experts in the field of medical imaging (Ganin and Lempitsky, 2014). Deep learning has also improved the interpretation of various data modalities because of its computational and automated feature extraction abilities. Most biomedical images are of higher dimensions, e.g., magnetic resonance imaging, computed tomography, and Ca2+ imaging, and can be challenging and time-consuming to annotate manually. In addition, acquisition devices may differ in the quality of their output images, which makes those images less suitable for training neural networks, a task that requires a large amount of consistent data for feature extraction and generalization. These limitations prompted the need for effective new systems that automate biomedical image tasks.
Generative adversarial networks (GANs) are a recent machine-learning-based approach widely used in image-related applications such as image translation (Chen and Hays, 2018; Sangkloy et al., 2017), editing (Dekel et al., 2018; Zhu et al., 2016), and image style transfer (Wang et al., 2018; Xian et al., 2018). GANs consist of two architectures, called generators and discriminators, which are pitted against each other to improve their learning. The purpose of the discriminator is to classify the input image as real or fake, whereas the generator tries to synthesize images realistic enough to fool the discriminator, as in a typical min-max game.
GANs can potentially extract and learn fine and coarse information from images by combining multiple architectures with multiscale resolutions (Brown and Lowe, 2003; Burt and Adelson, 1987). Such examples are widespread in both conditional (Denton et al., 2015; Huang et al., 2017) and unconditional GAN settings (Chen and Koltun, 2017; Zhang et al., 2017). By incorporating multiple high-resolution architectures, they can learn distinct domain-specific features with high precision and robustness. Recent advancements in deep learning architectures, including pix2pixHD (Wang et al., 2018), SPADE (Park et al., 2019), and StarGAN-v2 (Choi et al., 2020), made way for the synthesis of high-fidelity and vivid adversarial images.
In addition, cross-domain image translation is another technique that generates images from incomplete source information. It has been employed in medical image inpainting for standard statistical analysis (Dalca et al., 2018; Eilertsen et al., 2008; Van Tulder and de Bruijne, 2015), to improve standard steps of examination such as image registration (Iglesias et al., 2013; Wang et al., 2020a), to fuse information (Du et al., 2016; He et al., 2020; Wang et al., 2020b), in image segmentation (Chartsias et al., 2018; Dekel et al., 2018; Li et al., 2020; Roy et al., 2011), in image construction (Commowick et al., 2009; Cordier et al., 2016), and in disease diagnosis (Li et al., 2014; Zhou et al., 2020). Each of these techniques converts images of one modality to another interchangeably.
Some examples include magnetic resonance imaging (MRI), optical coherence tomography (OCT), spectral-domain OCT, positron emission tomography (PET), and ultrasound imaging (Dalca et al., 2018; Dekel et al., 2018; Eilertsen et al., 2008; Iglesias et al., 2013; Li et al., 2020; Van Tulder and de Bruijne, 2015; Wang et al., 2020a, 2020b).
The ideal system would have a dynamic end-to-end architecture that could both synthesize biomedical images from one modality to another and extract and learn representations of the manifold features. In this paper we propose a new subcellular segmentation and analysis tool for dynamic fluorescent signal maps that uses a core generative architecture to segment dynamic fluorescent events (e.g., Ca2+ transient signals).
Dynamic fluorescence imaging is a widely used tool in medical research and provides valuable information on cellular function and regulation. Dynamic fluorescent signals can be monitored at a cellular level to visualize changes in ions, pH, protein trafficking, or voltage in many pathologies. These subcellular signals are complex; for example, voltage and Ca2+ signaling patterns differ depending on cell type, cellular compartmentalization, and target tissue (Roome and Kuhn, 2018). Ca2+ signaling can range from intercellular and intracellular waves that spread for long distances, over several microns or more, to local Ca2+ release events that spread on the nanometer to micrometer scale (Baker et al., 2013; Berridge and Dupont, 1994; Cheng et al., 1996; Drumm et al., 2015; Hennig et al., 2010; Straub et al., 2000). Monitoring Ca2+ signaling can provide an in-depth approach to understanding the complexities of essential cellular activities. Extracting important information from cellular Ca2+ signals, such as Ca2+ spread, duration, and initiation sites, depends almost entirely on the segmentation of Ca2+ signals during analysis.
Ca2+ STMaps are widely used to analyze and quantify voltage and Ca2+ dynamics because they retain Ca2+ event information as a function of the space occupied over time (Cheng et al., 1996; Colman et al., 2017; Hennig et al., 1999; Lee et al., 2009; Lentle and Hulls, 2018; Roome and Kuhn, 2018; Waadt et al., 2017). As a result, STMaps can provide a platform to effectively extract quantifiable cellular Ca2+ data (Baker et al., 2021a, 2021b; Drumm et al., 2014; Fedigan et al., 2017; Sancho et al., 2017; Sergeant et al., 2006). Extracting dynamic fluorescent data can be challenging, as the information is most often defined manually using a single-line pixel measurement or a region of interest (ROI). Because of this, discrepancies between users and individual user error are high, which can translate into inaccuracies during STMap event quantification. The current user-dependent process is highly variable, time-consuming, and labor-intensive; therefore, we propose new software with an efficient end-to-end architecture that precisely segments and quantifies dynamic fluorescent cellular signals in a fast, automated fashion. The tool will allow researchers to swiftly analyze large-volume datasets with high accuracy.
Results
Subcellular signal segmentation using generative adversarial networks architecture
We used a state-of-the-art generative adversarial networks architecture to establish the proposed Spatiotemporal Subcellular Signal Segmentation Model “4SM”. The model incorporates multiple novel components to provide a powerful and consistent subcellular signal segmentation tool, as discussed further below.
Subcellular signal datasets
Datasets of subcellular Ca2+ signals imaged from gastrointestinal pacemaker-type cells called interstitial cells of Cajal (ICC) from the colon, stomach, and small intestine of Kit-Cre-GCaMP6f mice were used to train the 4SM model (Figure 1A). Ca2+ signals provide highly variable patterns, ranging from stochastic to rhythmic subcellular signals, that can serve as the basis for segmenting subcellular fluorescent signals in our model. Dynamic Ca2+ signals can then be plotted and visualized as STMaps. The 2D map is created by spatiotemporal reslicing of an X-Y-Z section in a movie or image stack (Figure 1B), and STMaps can be used for fluorescent signal analysis in a variety of cell types including neurons, cardiac myocytes, and smooth muscle cells. In our dataset, Ca2+ signals in ICC were recorded using spinning-disk confocal microscopy (Figures 1A and 1C). One cell type within the colon, subserosal ICC (ICC-SS), exhibited variable and complex patterns of Ca2+ signals (Figure 1). STMaps are a valuable tool because they allow a more complete representation of individual cell Ca2+ signals by plotting Ca2+ events in cell space (x-axis) and time (y-axis) (Figure 1D). Therefore, Ca2+ signal features and quantification from STMaps can effectively describe cellular Ca2+ dynamics and behaviors in many cell types. Accurate and successful analysis of Ca2+ imaging data and other dynamic fluorescent signals from STMaps depends heavily on defining and segmenting diverse signal events. Therefore, we created a novel deep-learning method that employs generative adversarial network architectures to segment signals with high accuracy.
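The reslicing described above can be illustrated with a minimal sketch. It assumes the movie is held as a nested list (frames × image rows × image columns) and that each frame is collapsed into one STMap row by averaging every column over the image height; the function names `build_stmap` and `normalize_f_over_f0` and the scalar baseline `f0` are hypothetical choices for illustration, not the authors' implementation.

```python
def build_stmap(stack):
    """Collapse a movie (frames x height x width) into a 2D map (time x cell space).

    Each frame contributes one row: every image column is averaged over the
    image height, so each horizontal unit is an average of pixels.
    """
    stmap = []
    for frame in stack:
        height = len(frame)
        width = len(frame[0])
        row = [sum(frame[r][c] for r in range(height)) / height for c in range(width)]
        stmap.append(row)  # sequential rows are placed under each other
    return stmap


def normalize_f_over_f0(stmap, f0):
    """Express intensities as F/F0 for an assumed scalar baseline f0 > 0."""
    return [[value / f0 for value in row] for row in stmap]


# Tiny example: 3 frames of 2 x 4 pixels with one bright event in frame 2.
stack = [
    [[1, 1, 1, 1], [1, 1, 1, 1]],
    [[1, 3, 1, 1], [1, 5, 1, 1]],
    [[1, 1, 1, 1], [1, 1, 1, 1]],
]
stmap = build_stmap(stack)
```

In this toy stack the event appears as a single raised value in the second row of `stmap`, which is the kind of region a segmentation model would be asked to outline.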
Figure 1
Cellular Ca2+ spatiotemporal maps (STMaps)
(A) Representative image of subserosal ICC (ICC-SS) from the proximal colon of a Kit-Cre-GCaMP6f mouse in situ. The calcium sensor GCaMP6f is expressed in ICC cells. Scale bar in A is 10 μm.
(B) Image sequences of the fluorescent signal (GCaMP6f) are acquired and presented as stacks or a movie. The stacks are composed of three dimensions: x and y for each frame, and z, where z is time. A two-dimensional spatiotemporal map is constructed by spatiotemporal reslicing of an X-Y-Z section in a movie or image stack to generate fluorescence STMaps.
(C) A single ICC-SS within the FOV (outlined in orange) defines the cellular ROI for the Ca2+ signal. Colored arrows from the cell correspond to the locations of the Ca2+ sites within the cell, which are plotted in the STMap in D.
(D) Representative two-dimensional color-coded intensity STMap of Ca2+ activity in ICC-SS. Each horizontal unit (cell space) in the STMap represents a pixel or an average of pixels in the original image. The coded pixels from each frame of the movie were used to construct a single row, and sequential rows were placed under each other to produce an STMap. Fluorescence intensity is plotted as (F/F0). Scale bars in D are x = 10 μm and y = 2.5 s.
Proposed generators of 4SM
4SM is composed of two generators, each consisting of several block components that together carry out the fine-grained segmentation process. Combining two generators for fine and coarse feature distinction is highly effective and has shown notable visual results for image translation tasks. We used two generators in our proposed method, Gfine and Gcoarse (Figure 2). Gfine learns local information about each event's spatial spread and duration, whereas Gcoarse learns the boundaries between multiple events in the generated Ca2+ STMaps. These generators are made of multiple residually connected blocks, downsampling and upsampling layers, and a skip connection that passes features between the coarse and fine generators. A manifold of inherent features can be extracted from the STMaps, as shown in Figures 1 and 6. To extract those features and retain the visual information, we incorporated a spatial feature aggregation block, as illustrated in Figures 2 and 3. Gfine has an input resolution of 64 × 64 × 1 and outputs images with the same resolution. Likewise, Gcoarse takes an image with half the resolution (32 × 32 × 1) and produces an image with the same resolution. Furthermore, Gcoarse outputs a feature map of size 32 × 32 × 64 that is combined with one of the intermediate layers of Gfine through a skip connection module between the generators. In the next sections, we discuss each of these blocks in detail.
Figure 2
Overall architecture design for 4SM
The proposed generative adversarial network, 4SM, consists of two generators, Gcoarse (Gc) and Gfine (Gf), and two discriminators, Dcoarse (Dc) and Dfine (Df). Step 1: the coarse generator takes smaller Ca2+ STMaps as input and outputs coarsely segmented STMaps. Step 2: the fine generator takes larger Ca2+ STMaps as input and outputs finely segmented STMaps. Step 3: the fine and coarse discriminators take both original Ca2+ STMaps and segmented Ca2+ STMaps as input and output a probability map, which dictates whether the pairs are real or fake. The architecture also contains spatial-feature-aggregation (SFA) blocks in both generators for extracting manifold features and synthesizing realistic images, and two-dimensional convolution (Conv2d). The discriminators use auto-encoders to generate pixel-wise output that dictates whether each pixel is real or fake.
Figure 6
Quantification of segmented Ca2+ events
(A) Representative image of ICC-MY cells from the gastric antrum of Kit-Cre-GCaMP6f mouse; the bar in A represents 20 μm.
(B) Corresponding color-coded STMap with fluorescent intensity indicated, plotted as (F/F0). Scale bars in B are x = 10 μm and y = 5 s.
(C) Representative segmented STMap using 4SM. The quantified Ca2+ event parameters are plotted in (D–G). (D) Ca2+ event area, (E) Ca2+ event duration, (F) Ca2+ event spatial spread, and (G) Ca2+ event frequency. Data were taken from various 4SM STMaps from all ICC; n = 17. The red lines in D–G represent the mean values.
Figure 3
Building blocks for generators and discriminators
The individual blocks of 4SM architecture consist of (A) downsampling block, (B) upsampling block, (C) spatial feature aggregation block, (D) residual generator block, and (E) residual discriminator block, where K stands for kernel size, S is for stride, and D is for dilation rate. The “+” indicates element-wise summation of features in the depth axis.
Upsampling and downsampling blocks
Convolution-based upsampling and downsampling may result in information loss and an inability to retain specific spatial information. To address this, we created two optimized blocks for these operations (Figures 3A and 3B). These upsampling and downsampling blocks retain intrinsic characteristics and mitigate the information loss caused by spatial feature compression. Both the generators and the discriminators of the 4SM model incorporate the downsampling block, whereas only the generators use the upsampling block to generate the feature maps and output. The downsampling block comprises a convolution layer followed by a Batch-Normalization layer (Ioffe and Szegedy, 2015) and a Leaky-ReLU activation layer (Figure 3A). The upsampling block contains a transposed convolution layer, followed by Batch-Normalization (Ioffe and Szegedy, 2015) and a Leaky-ReLU activation layer, as illustrated in Figure 3B. In Gcoarse, the downsampling block is used twice after consecutive residual blocks, and the upsampling block is then used twice to restore the original spatial dimensions of the feature output. In Gfine, the downsampling block is used once, and after multiple repetitions of residual blocks, a single upsampling block restores the original spatial dimensions. For both the convolution and transposed convolution, a kernel size of K = 3 and a stride of S = 2 were used.
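The effect of these stride-2 blocks on spatial size can be traced with a short sketch. It assumes "same"-style padding, so that a stride-2 convolution halves each spatial dimension and a stride-2 transposed convolution doubles it; the padding scheme is not stated in the text, and the helper names are hypothetical.

```python
import math


def down_size(size, stride=2):
    """Spatial size after a strided convolution, assuming 'same'-style padding."""
    return math.ceil(size / stride)


def up_size(size, stride=2):
    """Spatial size after a transposed convolution, assuming 'same'-style padding."""
    return size * stride


def trace(size, n_down, n_up):
    """List the spatial size after each downsampling and upsampling block."""
    sizes = [size]
    for _ in range(n_down):
        sizes.append(down_size(sizes[-1]))
    for _ in range(n_up):
        sizes.append(up_size(sizes[-1]))
    return sizes


coarse_sizes = trace(32, n_down=2, n_up=2)  # Gcoarse: two down, two up
fine_sizes = trace(64, n_down=1, n_up=1)    # Gfine: one down, one up
```

Under this assumption both generators return to their input resolution, and the intermediate map of Gfine after its single downsampling step is 32 × 32, which matches the spatial size of the 32 × 32 × 64 features passed over from Gcoarse.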
Residual block
The basic building unit of our 4SM model is a residual unit with two successive separable convolution layers and a skip connection that adds the input feature tensor to the output. Regular convolution layers are computationally inefficient compared with separable convolution (Chollet, 2017). The main difference between regular convolution and separable convolution is that the latter performs a depth-wise convolution followed by a point-wise convolution. By using separable convolution, the depth and spatial information of the data is retained. Recent findings reported that combining separable convolutional layers with dilation allows for more robust feature acquisition (Kamran et al., 2020). Building on this idea, we designed the proposed residual block to retain both depth and spatial information by using separable convolution followed by Batch-Normalization and Leaky-ReLU as a postactivation mechanism, which decreases the number of parameters and ensures effective memory utilization (Figure 3D). Reflection padding was also incorporated in the residual block before each separable convolution operation to pad plausible data values by reusing values present along the borders of the input. After numerous experiments, we found that a wider receptive field optimally captured neighboring information. To account for this, the residual block was further modified to contain two branches of separable convolution layers with distinct dilation rates, as illustrated in Figure 3D: one branch with a dilation rate of D = 1 and the other with a dilation rate of D = 2. A kernel size of K = 3 and a stride of S = 1 were used for all separable convolution layers, where each separable convolution is preceded by a reflection padding layer and followed by Batch-Normalization and a Leaky-ReLU activation layer. Finally, the skip connection from the input and the outputs of the two branches were summed to generate the final output.
In addition, a similar technique was used to create a distinct residual block for the discriminators, as shown in Figure 3E. The residual block for the discriminator is made of a separable convolution layer, followed by Batch-Normalization and Leaky-ReLU activation, where the separable convolution has a kernel size of K = 3 and a stride of S = 1.
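To make the efficiency and receptive-field arguments above concrete, the following sketch counts the weights of a regular and a separable convolution and computes the effective kernel extent under dilation. These are the standard formulas (bias terms ignored); the channel count of 64 in the example is an arbitrary illustrative value, not a figure taken from the 4SM architecture.

```python
def regular_conv_params(k, c_in, c_out):
    """Weights in a regular k x k convolution (bias ignored)."""
    return k * k * c_in * c_out


def separable_conv_params(k, c_in, c_out):
    """Weights in a depth-wise k x k convolution plus a 1 x 1 point-wise convolution."""
    depthwise = k * k * c_in
    pointwise = c_in * c_out
    return depthwise + pointwise


def effective_kernel(k, dilation):
    """Extent covered by a k-tap kernel when its taps are spaced by `dilation`."""
    return k + (k - 1) * (dilation - 1)


# Illustrative channel count (hypothetical): 64 input and 64 output channels.
regular = regular_conv_params(3, 64, 64)
separable = separable_conv_params(3, 64, 64)

# The two branches of the residual block use D = 1 and D = 2 with K = 3.
extent_d1 = effective_kernel(3, 1)
extent_d2 = effective_kernel(3, 2)
```

With K = 3, the D = 2 branch covers a 5 × 5 neighborhood using the same nine taps as the D = 1 branch, which is how the block widens its receptive field without adding weights.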
Spatial feature aggregation block
The proposed spatial feature aggregation (SFA) block (Figure 3C) was implemented to compensate for the potential loss of spatial information from the manifold feature space due to consecutive upsampling and downsampling (Chen et al., 2018; Zhang et al., 2019). To retain intrinsic features, skip connections combined with the SFA block were used to fuse feature information from the bottom layers of the network with the top layers. The block comprises two successive residual units with convolution, Batch-Normalization, and Leaky-ReLU layers. Batch-Normalization is used for faster and more stable training by normalizing the inputs through re-centering and re-scaling (Ioffe and Szegedy, 2015). Leaky-ReLU is used to address the vanishing gradient problem (Maas et al., 2013). Here, the kernel size is K = 3 and the stride is S = 1. There are two skip connections: (1) one coming from the input and added to the first residual unit's output and (2) one coming from the input and element-wise summed with the last residual unit's output. Gcoarse contains two SFA blocks that come out of the two encoders and are successively added to the two decoders. In contrast, Gfine has only one SFA block between the encoder and decoder.
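The wiring of the two skip connections can be sketched independently of the layers inside each residual unit. In the sketch below the two units are passed in as plain functions acting on a flat list of feature values; these toy stand-ins replace the convolution, Batch-Normalization, and Leaky-ReLU layers and are purely illustrative assumptions, as are the names `elementwise_add` and `sfa_block`.

```python
def elementwise_add(a, b):
    """Element-wise sum of two equally sized feature lists."""
    return [x + y for x, y in zip(a, b)]


def sfa_block(x, unit1, unit2):
    """Apply two successive units with both skip connections taken from the input."""
    h = elementwise_add(unit1(x), x)     # skip (1): input added to first unit's output
    return elementwise_add(unit2(h), x)  # skip (2): input added to last unit's output


# Toy stand-ins for the residual units (hypothetical, for wiring only).
def double(values):
    return [2 * v for v in values]


def plus_one(values):
    return [v + 1 for v in values]


features = [1, 2]
output = sfa_block(features, double, plus_one)
```

Because both skips originate at the block input, the input features reach the output by a direct additive path even if the two units compress or distort them.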
Autoencoders as discriminators
In the proposed subcellular signal segmenting model, we used two discriminators, one for Gcoarse and one for Gfine, termed Dcoarse and Dfine, respectively. Decoders were incorporated to convert each discriminator into an auto-encoder, unlike the patchGAN discriminators seen in other architectures (Choi et al., 2020; Park et al., 2019; Wang et al., 2018). The auto-encoder dictates whether each pixel, rather than each patch, of the generated image is real or fake. This in turn helps with the pixel-level segmentation task, as more discriminative features are retained throughout the architecture, resulting in highly accurate segmentation of signals in STMaps. Dfine interprets STMaps with an input size of 64 × 64 × 1 and Dcoarse interprets STMaps with an input size of 32 × 32 × 1, and they output feature maps of 64 × 64 × 1 and 32 × 32 × 1, respectively. Each point in these feature maps takes a value between −1 and +1. Here, −1 represents a real pixel and +1 represents a synthesized or adversarial pixel.
Adversarial objective function
To promote adversarial training, a multihinge loss for GANs (Lim and Ye, 2017; Zhang et al., 2019) was used in our architecture, as given in Equations 1 and 2. All the Ca2+ STMaps and their corresponding masks were normalized to [−1, 1] to widen the gap between the pixel intensities of the real and synthesized segmentation maps. In Equation 3, the two adversarial terms were added, with a weight multiplier applied to one of them.
In Equation 1, the discriminators were first trained on real STMaps, x, and real segmentation maps, y, which is signified by D(x, y), and then trained on the real STMaps, x, and the synthesized segmentation maps, G(x), which is signified by D(x, G(x)). E denotes the expected value over multiple samples given the inputs x and y, and min takes the smaller of the two values inside the parentheses. First, the discriminators Dfine and Dcoarse went through batch-wise training for several iterations on randomly sampled data. Following that, Gcoarse was trained while the weights of Dfine and Dcoarse were kept isolated (i.e., not updated). In the same manner, Gfine was trained on a batch of random STMaps while the weights of both Dfine and Dcoarse were kept isolated. The generators incorporated the reconstruction loss shown in Equation 4. With these losses, the generated segmentation maps gave a more realistic representation of actual Ca2+ events. We also incorporated a feature-matching loss component (Wang et al., 2018) with Dfine and Dcoarse, given in Equation 5.
In Equation 4, the reconstruction loss is computed for a real segmentation map, y, given a generated segmentation map, G(x), and ||⋅|| denotes the absolute value. This loss was applied to both Gfine and Gcoarse so that our model generates high-quality segmentation maps at two distinct scales. This technique has previously been used by combining the GAN cost function with a mean squared error (MSE) loss, a commonly used loss function for regression (Pathak et al., 2016).
Finally, Equation 5 was estimated by taking the features from the intermediate layers of the discriminators, inserting first the real and then the synthesized segmentation map. Here, N stands for the number of feature layers extracted from the discriminators, D(x, y) is the intermediate feature from the discriminator with the real image as input, and D(x, G(x)) is the intermediate feature from the discriminator with the synthesized image as input. By joining Equations 3, 4, and 5, the final objective function for 4SM was formulated as shown in Equation 6.
In Equation 6, the loss weights are multiplied with their corresponding losses and dictate which networks to focus on during training. For the proposed architecture, more weight is given to two of the loss terms, and thus larger values are selected for them.
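The feature-matching term and the weighted combination of losses described above can be sketched in a few lines. The sketch assumes that each intermediate discriminator feature has been flattened to a list of numbers, that the per-layer distance is a mean absolute difference, and that the final objective is a plain weighted sum; the function names, the weight values, and the loss values in the example are hypothetical and are not the paper's settings.

```python
def mean_abs_diff(a, b):
    """Mean absolute difference between two flattened feature maps."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def feature_matching_loss(real_feats, fake_feats):
    """Average, over the N extracted layers, of the distance between the
    discriminator features for (x, y) and for (x, G(x))."""
    n_layers = len(real_feats)
    total = sum(mean_abs_diff(r, f) for r, f in zip(real_feats, fake_feats))
    return total / n_layers


def weighted_total(adv, rec, fm, w_adv, w_rec, w_fm):
    """Weighted sum of the adversarial, reconstruction, and feature-matching losses."""
    return w_adv * adv + w_rec * rec + w_fm * fm


# Two hypothetical layers of flattened features for the real and synthesized pairs.
real_feats = [[1.0, 2.0], [0.0, 4.0]]
fake_feats = [[1.0, 0.0], [2.0, 4.0]]
fm = feature_matching_loss(real_feats, fake_feats)

# Illustrative loss values and weights only.
total = weighted_total(adv=0.5, rec=0.25, fm=fm, w_adv=1.0, w_rec=10.0, w_fm=10.0)
```

Giving a term a larger weight, as in this example for the reconstruction and feature-matching terms, makes its gradient dominate the update, which is the sense in which the weighting directs the focus of training.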
Segmenting model experimentation
Dataset validation
We trained our 4SM model on STMaps generated from 64 Ca2+ imaging movies of varied resolution, taking 64 × 64 overlapping crops with a stride of 8. We followed the same procedure to test on STMaps generated from 17 additional movies. The images were collected in gray-scale format, and the segmentation was performed in binary format. We separated the dataset into 5 folds and used leave-one-out cross-validation (LOO-XVE) to train the architecture: each model was trained on 4 folds and validated on the remaining fold. We tested all 5 versions of the model on the test dataset and selected the most accurate as the primary model (Table 1).
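The fold rotation described above can be sketched as follows. The sketch assumes that folds are formed over item indices (for example, one index per movie) with an interleaved assignment; the text does not state how items were assigned to folds, so that choice and the name `five_fold_splits` are illustrative assumptions.

```python
def five_fold_splits(n_items, n_folds=5):
    """Return (train_indices, validation_indices) for each held-out fold."""
    indices = list(range(n_items))
    # Interleaved assignment: item i goes to fold i % n_folds (an assumption).
    folds = [indices[i::n_folds] for i in range(n_folds)]
    splits = []
    for held_out in range(n_folds):
        validation = folds[held_out]
        train = sorted(i for f, fold in enumerate(folds) if f != held_out for i in fold)
        splits.append((train, validation))
    return splits


# 64 items, matching the number of training movies mentioned in the text.
splits = five_fold_splits(64)
```

Each of the five resulting models sees a different validation fold, and the held-out test set is then used to pick among them, as summarized in Table 1.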
Table 1
Five-fold cross-validation of 4SM segmentation masks
Fold       Mean-IoU   SSIM    Dice-coeff
1          92.98      89.55   96.31
2          92.96      89.60   96.30
3 (Best)   93.06      89.52   96.36
4          92.92      89.58   96.28
5          92.81      89.44   96.17
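For readers unfamiliar with two of the overlap metrics reported in Table 1, the sketch below computes intersection-over-union (IoU) and the Dice coefficient on flattened binary masks using their standard definitions. Averaging IoU over the foreground and background classes is one common reading of "Mean-IoU" and is an assumption here, since the averaging used for the table is not specified in this section; SSIM is omitted.

```python
def _overlap_counts(pred, truth):
    """Count pixels in the intersection and union of two binary masks."""
    inter = sum(1 for p, t in zip(pred, truth) if p and t)
    union = sum(1 for p, t in zip(pred, truth) if p or t)
    return inter, union


def iou(pred, truth):
    """Intersection over union of the foreground pixels."""
    inter, union = _overlap_counts(pred, truth)
    return inter / union if union else 1.0


def dice(pred, truth):
    """Dice coefficient: twice the intersection over the sum of both areas."""
    inter, _ = _overlap_counts(pred, truth)
    area_sum = sum(1 for p in pred if p) + sum(1 for t in truth if t)
    return 2 * inter / area_sum if area_sum else 1.0


def mean_iou(pred, truth):
    """Average of foreground and background IoU (assumed two-class averaging)."""
    background = iou([1 - p for p in pred], [1 - t for t in truth])
    return (iou(pred, truth) + background) / 2


# Toy flattened masks: 1 marks an event pixel, 0 marks background.
predicted = [1, 1, 0, 0]
ground_truth = [1, 0, 0, 0]
scores = (iou(predicted, ground_truth), dice(predicted, ground_truth),
          mean_iou(predicted, ground_truth))
```

These functions return fractions between 0 and 1; multiplying by 100 gives values on the percentage scale used in the table.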
Stride and inference speed
STMaps can vary in image size, so we divided STMap images into smaller images by cropping with a fixed window of 64 × 64 resolution for robust segmentation of Ca2+ events. The cropping window moves from left to right and then from top to bottom, and successive windows can overlap. The step between successive windows is called the stride or step size. For example, given an image size of 128 × 128, a stride of 64 yields 4 cropped images, whereas a stride of 8 yields 81. The number of crops along one dimension can be calculated as follows:
$$O = \frac{I - C}{S} + 1 \qquad (7)$$
In Equation 7, I = image height or width, C = crop size, S = stride, and O = number of crops along the height or width; the total number of crops per image is the product of O for the height and O for the width. The equation shows that a larger stride creates a smaller number of crops from each test image, and a smaller stride yields a larger number of cropped images. Consequently, larger strides speed up the model's prediction but result in poorer visual quality, whereas smaller strides slow down the model's prediction and yield a higher-quality visual result. One of the features of our model is that the user can control this stride parameter while testing their images.
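The crop-count arithmetic above can be checked with a short sketch. The helper names are hypothetical, and the use of floor division for image sizes where I − C is not an exact multiple of S is an assumption, since the text does not say how such remainders are handled.

```python
def crops_per_axis(image_size, crop_size, stride):
    """Number of crop positions along one axis: (I - C) / S + 1."""
    return (image_size - crop_size) // stride + 1


def crop_origins(image_size, crop_size, stride):
    """Start offsets of every crop window along one axis."""
    return list(range(0, image_size - crop_size + 1, stride))


def total_crops(height, width, crop_size=64, stride=8):
    """Total crops per image: positions along the height times positions along the width."""
    return crops_per_axis(height, crop_size, stride) * crops_per_axis(width, crop_size, stride)


# The two cases quoted in the text for a 128 x 128 image.
non_overlapping = total_crops(128, 128, crop_size=64, stride=64)
overlapping = total_crops(128, 128, crop_size=64, stride=8)
```

Moving from a stride of 64 to a stride of 8 multiplies the number of forward passes per image by roughly twenty in this example, which is the cost behind the speed and quality trade-off described above.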
Hyper-parameter selection
For adversarial training, hinge loss was used for our segmentation model (Lim and Ye, 2017; Zhang et al., 2019). The loss weights in Equation 6 were selected as hyper-parameters. For the optimizer, Adam (Kingma and Ba, 2014) was used with a set learning rate and two momentum parameters. The model was trained in batches for 100 epochs, which took approximately 12 h on an NVIDIA P100 GPU.
Training
Training of our segmentation model proceeded in multiple steps: (1) all the hyper-parameters were initialized; (2) a batch of real STMaps, x, and segmentation maps, y, was sampled; (3) Dfine and Dcoarse were trained on the real STMaps paired with the real segmentation maps, x, y; (4) Gfine and Gcoarse were used to synthesize fake segmentation maps, and the real STMaps paired with the fake segmentation maps, x, G(x), were used to train the discriminators Dfine and Dcoarse; (5) the adversarial loss was calculated and the weights were updated while keeping the weights of the discriminators isolated; and (6) the generators were trained. We calculated the reconstruction loss and updated both generators' weights. Subsequently, both discriminators' weights were reintegrated, the feature-matching loss was calculated, and the discriminator weights were updated. In the final stage, both discriminators' weights were isolated, and the discriminators and generators were adjusted together. Lastly, the total loss was calculated by summing the individual losses multiplied by their relative weights in the model, and the weights were recorded.
Evaluation and result
We refer to the newly proposed Spatiotemporal Subcellular Signal Segmenting Model as 4SM; part of its novelty lies in its ability to detect and discriminate subtle differences in Ca2+ events in a variety of cell types. Original STMaps for both rhythmic and stochastic Ca2+ events were produced from two different subsets of ICC, as shown in Figure 4. The original STMaps (Figures 4Ai and 4Bi) were segmented by our model (Figures 4Aii and 4Bii), and the overlays are visualized in Figures 4Aiii and 4Biii. To evaluate the segmentation efficiency of our architecture, we color-coded the original STMaps in red and the 4SM segmented maps in green, with their overlap appearing in yellow, to allow better visualization of Ca2+ events (Figures 4C and 4D). The area overlap percentages were 81.5% ± 2.1% (n = 5) for rhythmic Ca2+ signals and 86% ± 3.9% (n = 5) for stochastic Ca2+ signals. We also calculated the Pearson's correlation coefficient (PCC), which shows the close correlation of overlap between the original image and the 4SM segmented image. PCC values were 0.94 for rhythmic signals and 0.86 for stochastic signals (Figures 4E and 4F; n = 5).
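The two agreement measures reported above can be illustrated on toy data. In this sketch the area overlap is taken as the fraction of event pixels in the original map that are also marked in the segmented map, which is an assumed definition because the text does not give the formula, and the Pearson correlation uses its standard definition on flattened pixel values. The function names and the example masks are hypothetical.

```python
import math


def overlap_percent(original_mask, predicted_mask):
    """Percentage of original event pixels also present in the prediction (assumed definition)."""
    original_area = sum(1 for o in original_mask if o)
    shared = sum(1 for o, p in zip(original_mask, predicted_mask) if o and p)
    return 100.0 * shared / original_area if original_area else 0.0


def pearson(a, b):
    """Pearson correlation coefficient between two equally sized value lists."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


# Toy flattened maps: three event pixels in the original, two recovered.
original = [1, 1, 1, 0, 0, 0]
segmented = [1, 1, 0, 0, 0, 0]
area_overlap = overlap_percent(original, segmented)
pcc = pearson(original, segmented)
```

The two measures respond differently to errors: the overlap percentage only penalizes missed event pixels under this definition, whereas the PCC also drops when background pixels are wrongly marked as events.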
Figure 4
Predicted 4SM segmentation for Ca2+ STMaps
(A) Representative STMap of Ca2+ signals in colonic ICC-MY. (Ai) Original Ca2+ STMap; (Aii) predicted segmented map generated by 4SM. White pixels represent the Ca2+ events and black pixels are the background; (Aiii) semitransparent mask (red) from Aii overlayed on original STMap in Ai, which shows the accuracy and extent of the predicted mask. Scale bars in Ai are x = 10 μm and y = 5 s.
(B) Representative STMap of Ca2+ signals in small intestine ICC-DMP. (Bi) Original Ca2+ STMap; (Bii) 4SM segmented map, (Biii) semitransparent mask (red) from Bii overlayed on STMap Bi. Scale bars in Bi are x = 10 μm and y = 5 s.
(C) Combined map of two merged STMaps from rhythmic ICC-MY (red) and 4SM segmented map (green), and the overlay is representative of the complete pixel to pixel overlap of both maps (yellow).
(D) Two overlayed STMaps from the stochastic Ca2+ signals from ICC-DMP; both original (red) and 4SM segmented map (green).
(E and F) Summary data of percentage of Ca2+ signals area overlaps between the intracellular Ca2+ transients in the raw STMaps and 4SM masks, where E is for rhythmic and F is for stochastic events. Pearson’s correlation coefficient (PCC) showing the linear relationship of overlap between the original STMaps and the 4SM maps (E & F; n = 5).
The proposed 4SM model was compared with current state-of-the-art models for Ca2+ STMap segmentation: a Weka-based segmentation model (Leigh et al., 2020) and a deep-learning-based auto-encoder model called U-Net (Ronneberger et al., 2015). The Weka model uses a fast random forest with hand-selected features to generate a binary segmentation of the Ca2+ transient events, whereas U-Net uses an encoder and a decoder with convolution filters to learn spatial information from images, successively down-sampling and up-sampling to extract features at different depths. U-Net has been extensively used in various biomedical imaging modalities (Alom et al., 2018, 2019; Zhuang, 2018). A side-by-side comparison of the 4SM, Weka, and U-Net predictions is given in Figures 5A–5E. The 4SM architecture produced more accurate segmentation maps that were 94% similar to the ground truth (GT). In contrast, Weka produced broken patches of Ca2+ events, and the boundaries of the events were not uniformly spread and were therefore less accurate. U-Net did not produce well-defined boundaries for the Ca2+ events and misclassified background noise as events. This makes 4SM the most accurate of the three models. In addition, the spatial spread and duration of the events can be measured accurately from the predicted output using the connected component algorithm, as illustrated in Figure 6.
Figure 5
Comparative predicted segmentation for 4SM
(A) Representative raw STMap of Ca2+ signals in colonic ICC-IM.
(B) Ground-truth segmentation mask annotated by an expert.
(C) Predicted segmentation mask generated by 4SM, (D) predicted segmentation mask generated by Weka, and (E) predicted segmentation mask generated by U-Net. The red rectangle boxes on STMaps are plotted every second row from the top and highlight the expanded scale of STMap Ca2+ events.
(F–H) STMap segmentation efficiency, measured as percentage efficiency, using SSIM (F), mean-IoU (G), and dice-coefficient (H) between Weka, 4SM, and U-Net architectures. Scale bars in A are x = 2 s and y = 10 μm.
Figure 6
Quantification of segmented Ca2+ events
(A) Representative image of ICC-MY cells from the gastric antrum of a Kit-Cre-GCaMP6f mouse; the bar in A represents 20 μm.
(B) Corresponding color-coded STMap with fluorescence intensity indicated, plotted as F/F0. Scale bars in B are x = 10 μm and y = 5 s.
(C) Representative segmented STMap using 4SM.
(D–G) Quantification of Ca2+ event parameters: (D) Ca2+ event area, (E) Ca2+ event duration, (F) Ca2+ event spatial spread, and (G) Ca2+ event frequency. Data were taken from various 4SM STMaps from all ICC; n = 17. The red lines in D–G represent the mean values.
We used three metrics for evaluating the segmented Ca2+ STMaps: (1) mean intersection-over-union (mean-IoU), (2) structural similarity index measure (SSIM), and (3) dice-coefficient. Mean-IoU measures the percentage overlap between the target mask and the prediction mask by dividing the number of pixels that overlap (the intersection) by the total number of pixels covered by either mask (the union). The dice-coefficient is calculated as 2 × TP/(2 × TP + FP + FN), where TP, FP, and FN denote true positives, false positives, and false negatives, respectively. Mean-IoU and dice-coefficient are the current gold standards for measuring segmentation results in many semantic segmentation challenges such as Pascal-VOC2012 (Everingham et al., 2015), MS-COCO (Lin et al., 2014), and Cityscapes (Cordts et al., 2016). In contrast, SSIM is a standard metric for evaluating GANs on image-to-image translation tasks by measuring image quality degradation. More simply, SSIM estimates the perceptual difference and similarity between two similar images. We provide the 5-fold cross-validated results for our model in terms of mean-IoU, SSIM, and dice-coefficient in Table 1. The best performing model is Fold 3. For testing, we compared the trained Weka model, the 4SM model (Fold 3), and U-Net. 4SM outperforms both U-Net and Weka segmentation in terms of mean-IoU, SSIM, and dice-coefficient, the three main metrics for this task, as shown in Figures 5F–5H and Table 2.
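As an illustration of the two overlap metrics, the following is a minimal plain-Python sketch for binary masks. It is not the evaluation code used in the study. The function names are hypothetical, and averaging the IoU over the two classes (event and background) is an assumption about how mean-IoU is formed. SSIM is omitted because it is considerably longer to write out; library implementations exist, for example `structural_similarity` in scikit-image.

```python
def confusion_counts(target, prediction):
    """Count TP, FP, FN, and TN pixels between two binary masks of equal size."""
    tp = fp = fn = tn = 0
    for target_row, predicted_row in zip(target, prediction):
        for t, p in zip(target_row, predicted_row):
            if t and p:
                tp += 1
            elif p:
                fp += 1
            elif t:
                fn += 1
            else:
                tn += 1
    return tp, fp, fn, tn


def iou(tp, fp, fn):
    """Intersection over union for one class; defined as 1.0 when the class is absent."""
    union = tp + fp + fn
    return tp / union if union else 1.0


def mean_iou(target, prediction):
    """Average of the event-class IoU and the background-class IoU (assumed definition)."""
    tp, fp, fn, tn = confusion_counts(target, prediction)
    event_iou = iou(tp, fp, fn)
    # For the background class the roles swap: TN pixels are its true positives.
    background_iou = iou(tn, fn, fp)
    return (event_iou + background_iou) / 2


def dice_coefficient(target, prediction):
    """Dice coefficient: 2 * TP / (2 * TP + FP + FN)."""
    tp, fp, fn, _ = confusion_counts(target, prediction)
    denominator = 2 * tp + fp + fn
    return 2 * tp / denominator if denominator else 1.0


# Toy masks (1 = Ca2+ event, 0 = background); illustrative data only.
ground_truth = [[0, 1, 1, 0],
                [0, 1, 1, 0]]
predicted = [[0, 1, 1, 0],
             [0, 1, 0, 1]]

print(mean_iou(ground_truth, predicted))          # 0.6
print(dice_coefficient(ground_truth, predicted))  # 0.75
```

For the same prediction, the dice-coefficient is never lower than the event-class IoU, which is consistent with the dice values in Table 2 being higher than the mean-IoU values for every method.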
Table 2
Comparison between 4SM, Weka, and U-Net segmentation models
| Method | Mean-IoU (%) | SSIM (%) | Dice-coeff (%) |
| --- | --- | --- | --- |
| Weka | 82.22 | 79.16 | 86.24 |
| 4SM | 93.85 | 90.63 | 96.79 |
| U-Net | 90.62 | 84.83 | 91.28 |
Quantification of signal event parameters
4SM incorporates quantification of subcellular event parameters. We achieved this by applying a connected component algorithm (He et al., 2009) to the 4SM segmented STMaps. The connected component algorithm, imported from the OpenCV library (Bradski, 2000), counts the Ca2+ events to obtain their frequency and defines a rectangular bounding box around each event, from which the event’s duration, spatial spread, and area are measured in pixels (Figures 6A–6G). We also calculated the interval between consecutive Ca2+ events by subtracting the lower-most pixel of the first event from the upper-most pixel of the second event, and repeated the process for the remaining events. Moreover, the tool integrates an image calibration option that scales pixel measurements to physical units (millimeter, centimeter, etc.), which allows the user to convert pixel values to their native scale. Implementing the quantification of several Ca2+ parameters within 4SM supports data interpretation and speeds up the analysis pipeline.
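The quantification step can be sketched as follows. This is a plain-Python stand-in for a connected component labeling routine, written without OpenCV so that it runs with the standard library alone; the tool itself imports the algorithm from OpenCV, and `cv2.connectedComponentsWithStats` is one OpenCV function that returns comparable bounding-box statistics, although which call 4SM uses is not stated here. The function names, the dictionary keys, and the axis convention (rows are time, columns are space) are assumptions made for illustration.

```python
from collections import deque


def label_events(mask, connectivity=8):
    """Find connected groups of event pixels and return bounding-box statistics in pixels."""
    rows, cols = len(mask), len(mask[0])
    if connectivity == 8:
        offsets = [(dr, dc) for dr in (-1, 0, 1) for dc in (-1, 0, 1) if (dr, dc) != (0, 0)]
    else:
        offsets = [(-1, 0), (1, 0), (0, -1), (0, 1)]
    seen = [[False] * cols for _ in range(rows)]
    events = []
    for r in range(rows):
        for c in range(cols):
            if not mask[r][c] or seen[r][c]:
                continue
            # Breadth-first search over all pixels connected to this seed pixel.
            queue = deque([(r, c)])
            seen[r][c] = True
            top = bottom = r
            left = right = c
            area = 0
            while queue:
                y, x = queue.popleft()
                area += 1
                top, bottom = min(top, y), max(bottom, y)
                left, right = min(left, x), max(right, x)
                for dy, dx in offsets:
                    ny, nx = y + dy, x + dx
                    if 0 <= ny < rows and 0 <= nx < cols and mask[ny][nx] and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            events.append({
                "top": top,
                "bottom": bottom,
                "left": left,
                "duration_px": bottom - top + 1,  # bounding-box height (time axis)
                "spread_px": right - left + 1,    # bounding-box width (space axis)
                "area_px": area,                  # number of event pixels
            })
    return events


def intervals_between(events):
    """Interval between consecutive events: upper-most row of the next minus lower-most row of the previous."""
    ordered = sorted(events, key=lambda event: event["top"])
    return [nxt["top"] - prev["bottom"] for prev, nxt in zip(ordered, ordered[1:])]


# Toy segmented STMap (1 = event pixel); illustrative data only.
segmented_map = [
    [0, 1, 1, 0, 0],
    [0, 1, 1, 0, 0],
    [0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0],
    [0, 0, 1, 1, 1],
    [0, 0, 0, 1, 0],
]

events = label_events(segmented_map)
print(len(events))                # number of events in the map
print(events)                     # per-event duration, spread, and area in pixels
print(intervals_between(events))  # gaps between consecutive events in rows
```

The `connectivity` argument mirrors the connectivity setting that the interface exposes to the user: with 4-connectivity, pixels that touch only diagonally are counted as separate events.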
Implementation of 4SM into a web-based tool
We employed a web-based solution to implement our Python code (4SM) and provide a software tool capable of analyzing subcellular fluorescent signals. The software tool includes novel algorithms to analyze fluorescent signal STMaps and can be run through a user-friendly graphical interface, as shown in Figure 7. The deep learning algorithms were implemented using Python, and the graphical interface was implemented using Streamlit, an open-source Python library. A key advantage of Streamlit is that the user interface can be built without requiring additional frameworks to be installed. In addition, the software can run on any operating system that has Python installed and can be accessed through the web.
Figure 7
4SM interface features
(A) Representative image of the 4SM interface showing the user-controlled segmentation and calibration settings (green box). The STMap segmentation results are visualized within the tool (blue box). Quantification results (red box) and graphical plot controls (purple box) are easily accessible through drop-down menus.
(B) The quantifications of Ca2+ signal parameters are displayed in numerical format and (C) plotted as graphs to aid interpretation of the data. An example analysis compares datasets from various 4SM STMaps that are divided into rhythmic (blue circles) and stochastic (orange circles) groups. n = 17.
4SM is capable of batch processing datasets to allow fast data analysis throughput. The graphical user interface enables the user to specify one or more input images, which can be uploaded by drag and drop. The tool displays the output of the algorithm in real time in a single view and allows user control of certain input parameters, such as threshold, connectivity, pixel calibration, and stride, to enhance the degree of event segmentation (Figure 7A). The graphical user interface enables visualization of the original, segmented, and predicted maps and of overlay images for each input map image. The software provides multiple quantification parameters, including the frequency, area, duration, and spatial spread of signal events, and enables the user to select, view, and download the output quantification parameters of current or previous runs (Figure 7B). Within the tool, quantified data can be visualized as graphical plots (Figure 7C) to facilitate data interpretation on the fly. Each output image and dataset quantification can be visualized and exported within the software.
Discussion
We provide a new software tool, “4SM,” a spatiotemporal subcellular signal segmentation and analysis tool that provides a seamless workflow and incorporates state-of-the-art machine learning segmentation. The software demonstrates fast segmentation, quantification of events, visualization, and graphical output of data within one tool. Overall, 4SM provides an automated end-to-end solution for subcellular spatiotemporal signal analysis that is both effective and accurate. At present, only a handful of groups have developed methods that use machine learning to segment Ca2+ signals (Denis et al., 2020; Giovannucci et al., 2019; Soltanian-Zadeh et al., 2019). Most of these methods do not permit high throughput of large datasets and are limited to analyzing specific subsets of cells. They also rely on waveform analysis methods for data retrieval, which typically do not provide a readout of spatial information within a given cell. A major advantage of using STMaps for dynamic fluorescent signal analysis is that they provide a platform to analyze diverse types of cells independently of their shape or origin and can be used to segment and analyze a variety of subcellular dynamic fluorescent signals that monitor ions, pH, protein trafficking, or changes in cellular voltage. Thus, by combining spatiotemporal maps of fluorescent signals with a deep learning methodology for segmentation and analysis, we created a universal tool that can be utilized across multiple cell types and can interpret data from a variety of dynamic subcellular fluorescent probes. We validated this point by training our model with two morphologically and functionally distinct types of interstitial cells; the 4SM model was effective in both cases.
Very recently, machine learning and deep learning approaches were adopted by our group for the analysis of Ca2+ imaging STMaps. Using Waikato Environment for Knowledge Analysis (Weka) segmentation, we incorporated a fast implementation of fast random forest and selective feature learning for Ca2+ event segmentation (Leigh et al., 2020). However, this technique requires hand-picked features for learning and segmentation, and user error diminishes its precision. The 4SM model overcomes these limitations by automating this process with an end-to-end deep learning architecture, providing more robust standardization and high-throughput analysis of cellular Ca2+ dynamics.
Our 4SM architecture incorporates multiple attention-based skip connections in the generators and comprises novel residual blocks for both generators and discriminators. In addition, it employs reconstruction, feature-matching, and perceptual losses along with adversarial training to learn shared features across domains. Because of these features, the 4SM model can distinguish domain-specific features with high precision and robustness for image segmentation analysis.
Cross-domain image translation is widely employed in medical image inpainting for standard statistical analysis (Dalca et al., 2018; Eilertsen et al., 2008; Van Tulder and de Bruijne, 2015) and is used to convert images of one modality to another interchangeably. However, no technique has been incorporated to learn manifold features with coarse-to-fine generators or multiscale discriminators for coarse and fine feature learning. Our 4SM model learns manifold features using the coarse and fine generators to achieve accurate segmentation.
Most image-to-image translation models focus either on domain-level transformation or on combining the styles and textures of two images. For instance, some models use attention modules to generate high-resolution images and extract local feature information without utilizing perceptual loss (Choi et al., 2020; Kim et al., 2019), whereas other models place more emphasis on incorporating perceptual loss with different styles of target images (Park et al., 2019; Wang et al., 2018). To combine these strengths, our 4SM architecture uses perceptual loss and a multiscale discriminator to retain global information such as the spatial spread and duration of the Ca2+ transient events. In addition, our model utilizes a feature-matching loss and introduces new multiattention modules to retain local features such as the co-localization and boundaries of signal events. The visual representation and quantitative results indicate that our segmentation technique outperforms the state-of-the-art architectures tested.
In conclusion, our new subcellular signal segmenting analysis tool, 4SM, provides an automated, accurate, and fast workflow that can achieve high throughput of large datasets and can be utilized across various fluorescence signaling pattern applications.
Limitations of the study
This study provides a novel solution for the fast and accurate segmentation and analysis of dynamic fluorescence cellular signals. However, the proposed software requires newer GPUs to function properly and run at the intended speed, and many newer GPUs are costly, which might limit the software’s accessibility. This limitation is not specific to 4SM, as most deep learning architectures rely on GPU-based computation. The GPU requirement also limits the use of the software within commonly used image analysis tools such as ImageJ. Despite these limitations, the 4SM software can provide fast and accurate segmentation of cellular fluorescent signals to facilitate seamless signal analysis.
STAR★Methods
Key resources table
Resource availability
Lead contact
Further information and requests should be directed to and will be fulfilled by the lead contact, Dr. Sal Baker (sabubaker@med.unr.edu).
Materials availability
Software code generated in this study has been deposited to GitHub: https://github.com/SharifAmit/4SM/tree/main and https://doi.org/10.2139/ssrn.4016761. All data are available in the main text.
Method details
Tissue preparation
Colonic, antral and small intestinal tissues were removed from animals and small segments (2 cm in length) were incubated in Krebs-Ringer bicarbonate solution (KRB) as previously described (Baker et al., 2016). Tissues were cut along the mesenteric region and contents were removed. The mucosa and submucosal layers were removed and the remaining tunica muscularis was placed into a 60 mm Sylgard coated dish and pinned flat.
Ca2+ imaging
Muscle sheets isolated from the colon, stomach, and jejunum (∼5.0 × 10.0 mm) were pinned down and perfused with 37°C KRB solution. Tissues were equilibrated for a period of 1 h. As previously described (Baker et al., 2018), we used a spinning-disk confocal microscope (CSU-W1 spinning disk; Yokogawa Electric Corporation) for all Ca2+ imaging experiments. The confocal head was connected to a Nikon Eclipse FN1 microscope equipped with 20 × 0.5 NA, 40 × 0.8 NA, and 60 × 1.0 NA CFI Fluor lenses (Nikon Instruments Inc., NY, USA). Laser light at 488 nm was directed using a Borealis system (ANDOR Technology, Belfast, UK). An EMCCD camera (Andor iXon Ultra; ANDOR Technology, Belfast, UK) was used to capture the GCaMP6f emission. Images were acquired at 33 frames per second using MetaMorph software (Molecular Devices Inc., CA, USA). Nicardipine (100 nM) was used during the imaging experiments to minimize contractile artifacts.
Ca2+ event analysis
Analysis and quantification of Ca2+ activity in ICC were performed as previously described (Baker et al., 2016, 2021b). Briefly, movies of Ca2+ activity in ICC (30 s long) were converted to a stack of TIFF (tagged image file format) images and imported into Fiji (version 2.0.0-rc-69/1.52, National Institutes of Health, MD, USA, https://fiji.sc/) for analysis. Whole-cell ROIs were used to generate spatiotemporal maps (STMaps) of Ca2+ activity in individual ICC. STMaps allow a more complete representation of the Ca2+ signals of an individual cell through a 2D image that describes both the space of a cell (x axis) and time (y axis), or vice versa. STMaps result from spatiotemporal reslicing of an X-Z section in a movie or image stack, where Z is time (Figure 1B). The space of a cell (x axis) is generated in the STMap from a pixel line or via line-scan average analysis, and the pixel values are plotted in time. Each horizontal unit (cell space) in the STMap represents a pixel or an average of pixels in the original image. The coded pixels from each frame of the movie were used to construct a single row, and sequential rows were placed under each other to produce an STMap. STMaps presented in the results were generated by rotating image stacks so that ICC were oriented either horizontally or vertically; Ca2+-induced fluorescence changes were averaged across the diameter of the cell, and the STMaps were constructed using the reslice and z project functions. Area overlap percentages of the Ca2+ maps were calculated using Fiji, and Pearson’s coefficients between original STMaps and segmented 4SM STMaps were obtained using the Just Another Colocalization Plugin (JACoP; https://imagej.nih.gov/ij/plugins/track/jacop2.html).
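The row-by-row construction of an STMap described above can be sketched in a few lines. The study generated its STMaps in Fiji with the reslice and z project functions; the code below is only a plain-Python illustration of the same idea, assuming that each frame has already been cropped to a whole-cell ROI and rotated so that the cell lies horizontally. The function name and the toy data are hypothetical.

```python
def build_stmap(frames):
    """Build an STMap from a stack of frames.

    Each frame is a 2D list indexed as frame[y][x], with the cell oriented
    horizontally. For every frame, pixel values are averaged across the cell
    diameter (the y direction), giving one row of the STMap. Rows are stacked
    in frame order, so the x axis is cell space and the y axis is time.
    """
    stmap = []
    for frame in frames:
        n_rows = len(frame)
        # zip(*frame) iterates over columns, i.e. over positions along the cell.
        row = [sum(column) / n_rows for column in zip(*frame)]
        stmap.append(row)
    return stmap


# Two toy frames, each 2 pixels across the cell and 3 pixels along it.
movie = [
    [[0, 2, 4],
     [2, 4, 6]],
    [[1, 1, 1],
     [3, 3, 3]],
]

for row in build_stmap(movie):
    print(row)
# [1.0, 3.0, 5.0]
# [2.0, 2.0, 2.0]
```

A 30 s movie acquired at 33 frames per second would give an STMap of roughly 990 rows under this construction, one per frame.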
Experimental model and subject details
Animals
GCaMP6f-floxed mice (B6;129S-Gt(ROSA)26Sor/J) were acquired from Jackson Laboratories (Bar Harbor, ME, USA) and crossed with Kit-Cre mice (c-Kit+/Cre−ERT2), provided by Dr. Dieter Saur (Technical University Munich, Munich, Germany). Kit-Cre-GCaMP6f mice were injected with tamoxifen at 6–8 weeks of age (2 mg for three consecutive days), as previously described (Baker et al., 2021a, 2021b). Fifteen days after tamoxifen injection, Kit-Cre-GCaMP6f mice were anaesthetized by isoflurane inhalation (Baxter, Deerfield, IL, USA) and sacrificed by cervical dislocation. All procedures were approved by the Institutional Animal Use and Care Committee at the University of Nevada, Reno. All animals used and the protocols carried out in this study were in accordance with the National Institutes of Health Guide for the Care and Use of Laboratory Animals.
Quantification and statistical analysis
Quantification and statistics of Ca2+ activity were reported using 4SM. Ca2+ event frequency in ICC was expressed as the number of events fired per cell per second (sec−1). The duration of Ca2+ events was expressed as full duration (ms), and Ca2+ event spatial spread was expressed as μm of cell propagated per Ca2+ event. Unless otherwise stated, data are presented as mean ± standard error of the mean (S.E.M.). Statistical analysis was performed using either a Student’s t-test or an ANOVA with a Dunnett post hoc test where appropriate. When describing data throughout the text, n refers to the number of STMaps used in that dataset.
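To make these units concrete, the following sketch converts pixel measurements from a segmented STMap into the units listed above. It is illustrative only: the function names are hypothetical, the acquisition rate of 33 frames per second is taken from the Ca2+ imaging section, and the spatial calibration of 0.5 μm per pixel is an assumed placeholder rather than a value reported in the study.

```python
def event_duration_ms(duration_px, frames_per_second):
    """Convert an event duration from STMap rows (one row per frame) to milliseconds."""
    return duration_px / frames_per_second * 1000.0


def event_spread_um(spread_px, microns_per_pixel):
    """Convert an event spatial spread from pixels to micrometers."""
    return spread_px * microns_per_pixel


def event_frequency_per_s(event_count, recording_seconds):
    """Number of events fired per cell per second."""
    return event_count / recording_seconds


FRAMES_PER_SECOND = 33    # acquisition rate stated in the Ca2+ imaging section
MICRONS_PER_PIXEL = 0.5   # assumed calibration, for illustration only

print(event_duration_ms(33, FRAMES_PER_SECOND))  # 1000.0
print(event_spread_um(20, MICRONS_PER_PIXEL))    # 10.0
print(event_frequency_per_s(15, 30))             # 0.5
```

In this example, an event spanning 33 rows lasts 1 s, and 15 events in a 30 s recording correspond to 0.5 events per second.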
4SM algorithms and implementation
The 4SM tool was developed using Python, and the GAN-based algorithm was implemented using the Keras (https://keras.io/) and TensorFlow (https://www.tensorflow.org/) libraries. For image pre-processing, visualization, and building utilities for the training pipelines, we used the NumPy (https://numpy.org/), Matplotlib (https://matplotlib.org/), and Pillow (https://pillow.readthedocs.io/en/stable/) packages. Finally, the Streamlit library (https://streamlit.io/) was used to build the user interface for 4SM.
4SM testing
A dataset of 81 STMaps from ICC Ca2+ movies was professionally annotated. The dataset was then divided into two groups: group 1, consisting of 64 STMaps, was used for training and cross-validation of 4SM, and group 2, consisting of 17 unseen STMaps, was used for testing the analysis tool. Coding of 4SM was performed using a desktop PC equipped with dual RTX 2080 graphics cards (Dell 2018; Dell Inc., USA).
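One way to organize the 64 training STMaps for 5-fold cross-validation is sketched below. How the folds were actually drawn in the study is not described, so the shuffling, the seed, and the function name are assumptions made purely for illustration.

```python
import random


def k_fold_splits(n_items, k=5, seed=0):
    """Return k (training_indices, validation_indices) pairs covering n_items items."""
    indices = list(range(n_items))
    random.Random(seed).shuffle(indices)  # reproducible shuffle (seed is arbitrary)
    # Deal the shuffled indices into k folds of nearly equal size.
    folds = [indices[i::k] for i in range(k)]
    splits = []
    for i in range(k):
        validation = folds[i]
        training = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        splits.append((training, validation))
    return splits


# 64 STMaps for training and cross-validation; the 17 test STMaps are held out separately.
splits = k_fold_splits(64, k=5)
for fold_number, (training, validation) in enumerate(splits, start=1):
    print(fold_number, len(training), len(validation))
```

Because 64 is not divisible by 5, four folds hold 13 validation STMaps and one holds 12; every STMap is used for validation exactly once.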
Download and installation
The open-source tool “4SM” can be downloaded from GitHub: https://github.com/SharifAmit/4SM/tree/main. A detailed description of downloading and installing the software is provided in the repository (README file) and documentation. A video with captions is also included to provide a step-by-step demonstration of the software installation process (https://www.youtube.com/watch?v=t2LsQkyAGQc). For seamless operation, additional components need to be installed; these are provided in the repository. The software currently supports 8-bit JPG, PNG, and TIF gray-scale images. The minimum hardware requirements for running the software are: RAM, 16 GB; GPU, NVIDIA GPU with 3 GB or more of memory and CUDA 10.0 support; and CPU cores, 4 or higher.
Authors: Juan Eugenio Iglesias; Ender Konukoglu; Darko Zikic; Ben Glocker; Koen Van Leemput; Bruce Fischl. Journal: Med Image Comput Comput Assist Interv. Date: 2013.
Authors: Somayyeh Soltanian-Zadeh; Kaan Sahingur; Sarah Blau; Yiyang Gong; Sina Farsiu. Journal: Proc Natl Acad Sci U S A. Date: 2019.
Authors: G W Hennig; N J Spencer; S Jokela-Willis; P O Bayguinov; H-T Lee; L A Ritchie; S M Ward; T K Smith; K M Sanders. Journal: Neurogastroenterol Motil. Date: 2010.
Authors: Salah A Baker; Grant W Hennig; Anna K Salter; Masaki Kurahashi; Sean M Ward; Kenton M Sanders. Journal: J Physiol. Date: 2013.