Benjamin Chidester1, Tianming Zhou1, Minh N Do2, Jian Ma1. 1. Computational Biology Department, School of Computer Science, Carnegie Mellon University, Pittsburgh, PA, USA. 2. Department of Electrical and Computer Engineering, University of Illinois, Urbana, IL, USA.
Abstract
MOTIVATION: Neural networks have been widely used to analyze high-throughput microscopy images. However, the performance of neural networks can be significantly improved by encoding known invariance for particular tasks. Highly relevant to the goal of automated cell phenotyping from microscopy image data is rotation invariance. Here we consider the application of two schemes for encoding rotation equivariance and invariance in a convolutional neural network, namely, the group-equivariant CNN (G-CNN), and a new architecture with simple, efficient conic convolution, for classifying microscopy images. We additionally integrate the 2D-discrete-Fourier transform (2D-DFT) as an effective means for encoding global rotational invariance. We call our new method the Conic Convolution and DFT Network (CFNet). RESULTS: We evaluated the efficacy of CFNet and G-CNN as compared to a standard CNN for several different image classification tasks, including simulated and real microscopy images of subcellular protein localization, and demonstrated improved performance. We believe CFNet has the potential to improve many high-throughput microscopy image analysis applications. AVAILABILITY AND IMPLEMENTATION: Source code of CFNet is available at: https://github.com/bchidest/CFNet. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
1 Introduction
Though the appeal of neural networks is their versatility for arbitrary classification tasks, there is still much benefit in designing them for particular problem settings. In particular, the effectiveness of neural networks can be greatly increased by encoding invariance to uninformative augmentations of the data (LeCun et al.). A key invariance inherent to many imaging contexts, including microscopy data, is rotation (Boland and Murphy, 2001). For biological imaging, since data are often scarce and difficult or expensive to acquire, improving the effectiveness and reliability of models by encoding such invariance is highly significant.
Recently, convolutional neural networks (CNNs) have been applied to the highly relevant problem of cell phenotyping based on microscopy image analysis and have demonstrated much improved performance (Kraus et al., 2017). Formerly, crafted features that inherently exhibit such invariance, such as Zernike moments and Haralick texture features, were extracted and used for subsequent analysis (Boland and Murphy, 2001), whereas CNNs are able to learn relevant features directly. This has significant applications to spatial proteomics, which has enabled the systematic probing of changes of subcellular protein localizations, which are key to protein functions (Lundberg and Borner, 2019), as a response to various perturbations (Chong et al.; Kraus et al.). However, the encoding of rotation equivariance and invariance into CNNs to learn meaningful features for cell phenotyping is yet to be considered.
Several approaches have been proposed recently for improving the performance of CNNs by encoding rotation equivariance. The most promising and popular of such methods is the group-equivariant CNN (G-CNN) (Cohen and Welling, 2016), which applies convolution over groups, such as rotation, translation and flips, thereby maintaining equivariance throughout the convolutional layers. Notably, G-CNNs have recently been applied to several biological imaging tasks, including cell boundary segmentation (Bekkers et al.; Weiler et al.), annotation of cancerous regions of tumors (Veeling et al.) and dermoscopy image segmentation (Li et al.).
Here we consider the integration of rotation equivariance and invariance to analyze the localization of proteins in fluorescence images, which, to the best of our knowledge, is the first such work. Additionally, we propose a new simple and efficient rotation-equivariant convolutional scheme, called conic convolution, as an effective alternative to group convolution, with advantages of computational and memory savings, interpretability of learned feature maps and improved performance. Rather than convolving each filter across the entire image, as in standard or group convolution, rotated filters are convolved only over corresponding conic regions of the input feature map that emanate from the origin, thereby intuitively transforming rotations in the input directly to rotations in the output. A comparison of conic convolution with other proposed convolution schemes is shown in Figure 1.
Fig. 1. Comparison of convolution schemes. The domain of filter ‘F’ in the input and its corresponding outputs in the feature map are colored red. That of the rotation of ‘F’ by 225 degrees is colored blue. The local support on the domain for the convolution at a few points for each scheme is shown in gray. Conic convolution, with rotations of 45 degrees in this example, encodes rotation equivariance without introducing distortion to the support of the filter in the original domain (unlike the log-polar transform) and without requiring additional storage for feature maps (unlike group convolution). The example shown for group convolution is the first layer of a G-CNN, mapping from $\mathbb{Z}^2$ to the roto-translation group.
To encode rotation invariance, we propose the integration of the magnitude response of the 2D-discrete-Fourier transform (2D-DFT) into a transition layer between convolutional and fully-connected layers. The 2D-DFT is able to integrate mutual orientation information between different filter responses, yielding more informative features for subsequent layers than most previous approaches. Though the insight of using the DFT to encode rotational invariance has been employed for texture classification using wavelets (Charalampidis and Kasparis, 2002; Do and Vetterli, 2002; Jafari-Khouzani and Soltanian-Zadeh, 2005; Ojala et al.) and for general image classification (Schmidt and Roth, 2012), its application to CNNs has so far been relatively overlooked. As in these prior works, rotations of the input are transformed to circular shifts in the transformed space, to which the magnitude response of the 2D-DFT is invariant.
We call our new method the Conic Convolution and DFT Network (CFNet). We demonstrate the effectiveness of the two novel contributions in CFNet, namely conic convolution and integration of the DFT, based on evaluations from both synthetic and real microscopy images for localizing proteins in budding yeast cells. We show that CFNet generally improves classification accuracy over the standard raster convolution formulation and over the equivariant method of G-CNN across these settings. We also show that the 2D-DFT clearly improves performance across these diverse datasets, not only for the proposed conic convolution but also for group convolution.
1.1 Related work
To encode rotation equivariance for general image classification, a variety of methods exist. One straightforward strategy is to transform the image to an alternative domain, such as the log-polar domain (Henriques and Vedaldi, 2017; Schmidt and Roth, 2012), in which rotation becomes some other transformation that is easier to manage. However, this approach can be unstable to translations, and the warping introduces distortion, as pixels near the center of the image are sampled more densely than pixels near the perimeter. Our proposed conic convolution also encodes global rotation equivariance about the origin, but without introducing such distortion, which greatly helps mitigate its susceptibility to translation. The recently developed spatial transformer layer (Jaderberg et al.) and deformable convolutional layer (Dai et al.) allow the network to learn non-regular sampling patterns and can potentially help in learning rotation invariance, though invariance is not explicitly enforced, which would most likely be a challenge for tasks with small training sets.
An alternative, simple means for achieving rotation equivariance and invariance was proposed by Dieleman et al., in which feature maps of standard CNNs are made equivariant or invariant to rotation by combinations of cyclic slicing, stacking, rolling and pooling. RotEqNet (Marcos et al.) improved upon this idea by storing, for each feature map for a corresponding filter, only the maximal response across rotations and the value of the corresponding rotation, to preserve pose information, yielding improved results and considerable storage savings. Our proposed conic convolution is most similar to these methods and further decreases storage and computation requirements. The recently developed capsule network (Sabour et al.) is able to auto-encode affine transformations, including rotation, by the routing-by-agreement process. However, our CFNet developed in this paper works well even without augmentation because equivariance and invariance are encoded. Another related work extended G-CNN using steerable filters (Weiler et al.), as proposed in H-Net (Worrall et al.), to provide equivariance for finer angles. This can be considered a parallel contribution to our work, which could also use a steerable filter design. In summary, CFNet improves upon previous methods by reducing computation and storage requirements and improving interpretability and performance.
2 Materials and methods
We consider CFNet and G-CNN within the context of microscopy image analysis to classify cell features. Each network takes, as input, an image of a cell and predicts a label of interest, such as the localization of fluorescence-tagged proteins. The overall architecture of CFNet is illustrated in Figure 2. We first give a brief description of group-equivariant convolution and then describe our proposed conic convolution in CFNet, which uses similar notation from group theory. Next, we discuss the preservation of rotation equivariance through non-linear operations within a neural network as well as the efficiency of conic convolution. We then describe the integration of the 2D-DFT in CFNet as a transition layer between group or conic convolutional layers and subsequent fully-connected layers in a CNN.
Fig. 2. The overall architecture of CFNet. (a) Filtering the image by various filters at rotations in corresponding conic regions preserves rotation-equivariance. (b) Subsequent convolutional feature maps are filtered similarly. Rotation-invariance is encoded by the transition from convolutional to fully-connected layers, which consists of (c) element-wise multiplication and sum with rotated weight tensors, transforming rotation to circular shift, and (d) application of the magnitude response of the 2D-DFT to encode invariance to such shifts. (e) This output is reshaped and passed through the final, fully-connected layers.
2.1 Group-equivariant convolution
For convenience, as in Cohen and Welling (2016), we represent feature maps, of dimension K, and filters of a standard CNN as functions over $\mathbb{Z}^2$, the 2D space of integers, or pixel locations in the case of images. The expression for convolution of a filter over a feature map in a standard CNN is given by:
The success of CNNs can be attributed largely to the fact that standard convolution is equivariant to translations and many image classification tasks are invariant to small local translations. However, standard convolution does not in general exhibit equivariance to other important transformations, such as rotations, unless certain constraints on the filters are met. The insight of Cohen and Welling (2016) was to generalize convolution to operate on functions on groups, thereby achieving equivariance for other types of transformations. A group is a mathematical term referring to a particular set paired with a binary operation, which together meet certain criteria. The set of indices with the operation of translation is a particular instance of a group.
A more relevant group for microscopy image data is the p4 group, or roto-translation group, which consists of both translations and rotations by multiples of 90 degrees about the origin of $\mathbb{Z}^2$. A function on this group is indexed not just by translation, such as the position x in the feature maps of standard CNNs, but also by rotation. In this way, rotation information is preserved throughout the network and equivariance can thereby be maintained.
We denote this group by G, each element of which combines a rotation about the origin with a translation. The first group convolutional layer of a G-CNN operates on functions on $\mathbb{Z}^2$, over which the input image is defined, and is given by:
Whereas in standard convolution the filter is translated over the image and the inner product is computed at each translation, in group convolution the filter is transformed by each element g of G. The output of group convolution is then a function on the group G. Subsequent layers of the network must therefore operate on such functions, and group convolution for these layers is defined as:
As shown in Cohen and Welling (2016), standard operations used in neural networks, including pooling, batch normalization and activations, can be defined on the feature maps of group convolution to preserve the equivariance property, and a full G-CNN can be defined by the composition of such operations. We refer the reader to Cohen and Welling (2016) for more details.
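For reference, the three operations described in this section can be written out in the correlation form used by Cohen and Welling (2016). The filter symbol $\psi$ and the index names $y$, $h$ and $k$ are notation assumed here for illustration and may differ from the original typesetting of this article.

```latex
% Standard (planar) correlation on Z^2 with K input channels
[f \star \psi](x) = \sum_{y \in \mathbb{Z}^2} \sum_{k=1}^{K} f_k(y)\, \psi_k(y - x)

% First G-CNN layer: the input lives on Z^2, the output on the group G
[f \star \psi](g) = \sum_{y \in \mathbb{Z}^2} \sum_{k=1}^{K} f_k(y)\, \psi_k(g^{-1} y)

% Subsequent G-CNN layers: input and output both live on G
[f \star \psi](g) = \sum_{h \in G} \sum_{k=1}^{K} f_k(h)\, \psi_k(g^{-1} h)
```

In each case the filter is moved by a group element and an inner product with the feature map is taken; only the set being summed over and the set of admissible transformations change.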
Rather than operating on functions on groups, conic convolution is simpler in that it maintains rotation equivariance while still operating on functions on the spatial domain $\mathbb{Z}^2$, as in standard convolution. We begin the formulation with a simpler, special case of conic convolution, which we call quadrant convolution. Its difference from standard convolution is that the filter being convolved is rotated by a multiple of 90 degrees, depending upon the corresponding quadrant of the domain. We show that for quadrant convolution, rotations of the input by multiples of 90 degrees are straightforwardly associated with rotations of the output feature map, which is a special form of equivariance called same-equivariance (as coined by Dieleman et al.).
Relevant to our formulation is the group of two-dimensional rotation matrices for rotations by multiples of 90 degrees, which we denote by G1, which can be easily parameterized by g(r), and which acts on points in $\mathbb{Z}^2$ by matrix multiplication. Let T denote the transformation of a function f by a rotation g in G1, which applies the inverse of g to each element of the domain of f. For an operation on the set of K-dimensional functions f on $\mathbb{Z}^2$ (which represent feature maps) to exhibit same-equivariance, applying the rotation either before or after the operation must yield the same result.
Quadrant convolution can be interpreted as weighting the convolution for each rotation with a function that simply ‘selects’ the appropriate quadrant of the domain. The weighting function ω for the first quadrant has value one over that quadrant. Since the origin does not strictly belong to a particular quadrant, it is handled by averaging the response of the filter at all four rotations. Boundary values are averaged over the responses of the neighboring regions. The appropriate weighting function for the other quadrants is just a rotation of ω by the appropriate angle. The output of the layer is then the sum, over the four rotations, of the response of each rotated filter weighted by the correspondingly rotated ω. In our notation, parentheses convey the parameter of a function, whereas square brackets merely clarify the order of operations. Example convolutional regions with appropriate filter rotations are shown in Figure 1.
Note that the equivariance property is established (see our detailed proof in the Supplementary Material) independent of the definition of ω, yet its definition will greatly influence the performance of the network. For example, if ω is simply the constant 1/4, it is equivalent to merely averaging the filter responses.
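Quadrant convolution as described here can be sketched in NumPy. This is a minimal single-channel illustration, not the CFNet implementation: the helper names (`correlate_same`, `quadrant_weight`, `quadrant_conv`), the zero padding and the use of correlation rather than flipped-kernel convolution are all assumptions made for the example. Boundary pixels are split evenly between the two neighboring quadrants and the origin among all four, following the description in the text.

```python
import numpy as np

def correlate_same(f, psi):
    """Zero-padded 'same' correlation of a square image f with an odd square filter psi."""
    k = psi.shape[0]
    p = k // 2
    fp = np.pad(f, p)  # assumption: zero padding on all sides
    out = np.zeros(f.shape, dtype=float)
    for a in range(k):
        for b in range(k):
            out += psi[a, b] * fp[a:a + f.shape[0], b:b + f.shape[1]]
    return out

def quadrant_weight(n):
    """Weighting function for the first quadrant on an n x n grid (n odd, origin at the center)."""
    m = n // 2
    rows, cols = np.mgrid[0:n, 0:n]
    x, y = cols - m, m - rows          # Cartesian coordinates about the origin
    w = np.zeros((n, n))
    w[(x > 0) & (y > 0)] = 1.0         # interior of the first quadrant
    w[(x > 0) & (y == 0)] = 0.5        # boundary shared with the fourth quadrant
    w[(x == 0) & (y > 0)] = 0.5        # boundary shared with the second quadrant
    w[m, m] = 0.25                     # origin: averaged over all four rotations
    return w

def quadrant_conv(f, psi):
    """Sum over four rotations of the rotated filter's response times the rotated mask."""
    w0 = quadrant_weight(f.shape[0])
    return sum(np.rot90(w0, r) * correlate_same(f, np.rot90(psi, r)) for r in range(4))

rng = np.random.default_rng(0)
f = rng.normal(size=(9, 9))            # toy input, values made up for illustration
psi = rng.normal(size=(3, 3))          # toy filter
out = quadrant_conv(f, psi)
out_of_rotated = quadrant_conv(np.rot90(f), psi)
# Same-equivariance check: rotating the input by 90 degrees rotates the output by 90 degrees.
print(np.allclose(out_of_rotated, np.rot90(out)))
```

Replacing the mask returned by `quadrant_weight` with the constant 1/4 would leave the equivariance check intact, since the property does not depend on the choice of ω, but would reduce the layer to an average of the four filter responses.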
2.3 Generalization to conic convolutional layers
The above formulation can be generalized to conic convolution, in which the rotation angle is reduced to 90/R degrees, for some positive integer R, instead of being fixed to 90 degrees. Rather than considering quadrants of the domain, we can consider conic regions emanating from the origin, together with their boundaries. The weighting function is changed to have value one only over such a conic region, of which the quadrant weighting function is a special case.
If we consider feature maps to be functions over the continuous domain $\mathbb{R}^2$ instead of $\mathbb{Z}^2$ and define the group G of rotations by multiples of 90/R degrees, it is easy to show, similarly as above, that conic convolution is equivariant to G.
However, due to subsampling artifacts when discretizing $\mathbb{R}^2$ to $\mathbb{Z}^2$, as in an image, rotation equivariance for arbitrary values of R cannot be guaranteed and can only be approximated. In particular, the filters will have to be interpolated for rotations that are not a multiple of 90 degrees. In our experiments when applying CFNet, we chose nearest neighbor interpolation, which preserves the energy of the filter under rotations. This defect notwithstanding, it can be shown that conic convolution maintains equivariance to rotations of 90 degrees, and as we found in our experiments, the approximation of finer angles of rotation can still improve performance. Additionally, we note that R need not be the same for each layer, and it may be advantageous to use a finer discretization of rotations for early layers, when the feature maps are larger, and gradually decrease R.
A note must be made about subsequent nonlinear operations for a convolutional layer. It is typical in convolutional networks to perform subsampling, either by striding the convolution or by spatial pooling, to reduce the dimensionality of subsequent layers. Again, due to downsampling artifacts, rotational equivariance to rotations smaller than 90 degrees is not guaranteed. However, given that the indices of the plane of the feature map are in $\mathbb{Z}^2$ and are therefore centered about the origin, downsampling can be applied while maintaining rotational equivariance for rotations of 90 degrees, regardless of the choice of R. After subsampling, the result is passed through a non-linear activation function, such as ReLU, with an added offset.
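The conic weighting functions for a general R can be sketched in NumPy as well. The construction below is an illustration under stated assumptions rather than the authors' code: it takes the cone angle to be 90/R degrees (consistent with R = 2 giving 45 degrees), bins pixels by polar angle, splits pixels lying exactly on a boundary ray evenly between the two adjacent cones, and shares the origin equally among all 4R cones. The function name `conic_weights` is hypothetical.

```python
import numpy as np

def conic_weights(n, R):
    """Return an array of shape (4R, n, n); slice r is the weighting function of cone r.

    Assumptions: n is odd with the origin at the center pixel, and cone r spans
    polar angles from r * 90/R to (r + 1) * 90/R degrees.
    """
    m = n // 2
    n_cones = 4 * R
    rows, cols = np.mgrid[0:n, 0:n]
    x, y = cols - m, m - rows
    theta = np.mod(np.arctan2(y, x), 2 * np.pi)    # polar angle in [0, 2*pi)
    pos = theta / (np.pi / (2 * R))                # fractional cone index in [0, 4R)
    nearest = np.round(pos)
    on_ray = np.isclose(pos, nearest)              # pixel lies on a boundary ray
    ray_idx = nearest.astype(int) % n_cones
    cone_idx = np.floor(pos).astype(int) % n_cones
    w = np.zeros((n_cones, n, n))
    for r in range(n_cones):
        w[r][~on_ray & (cone_idx == r)] = 1.0      # interior of cone r
        ray = on_ray & (ray_idx == r)              # ray separating cones r-1 and r
        w[r][ray] += 0.5
        w[(r - 1) % n_cones][ray] += 0.5
    w[:, m, m] = 1.0 / n_cones                     # origin shared by all cones
    return w

w = conic_weights(9, R=2)
print(w.shape)                                     # (8, 9, 9)
# The weighting functions sum to one at every pixel.
print(np.allclose(w.sum(axis=0), 1.0))
# Rotating cone r by 90 degrees yields cone r + R.
print(all(np.allclose(w[(r + 2) % 8], np.rot90(w[r])) for r in range(8)))
```

With R = 1 the same function yields quadrant masks, which is one way to see quadrant convolution as the special case of conic convolution.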
2.4 Computational efficiency of conic convolution
In CFNet, the response for each rotation in conic convolution is only needed over its corresponding conic region. However, since GPUs are more efficient operating on rectangular inputs, it is faster to compute the convolution over each quadrant in which the conic region resides. The output of conic convolution can be obtained by convolving over the corresponding quadrant, multiplying by the weighting function, summing the responses in each quadrant together, and then concatenating the responses of the quadrants. For the special case of quadrant convolution, this process incurs negligible additional computation beyond standard convolution. Additionally, conic convolution produces only one feature map per filter, as in standard convolution, and therefore incurs no additional storage costs. This is in contrast to G-CNN and cyclic slicing, which both produce one map per rotation (Cohen and Welling, 2016; Dieleman et al.), and to RotEqNet, which produces two, one for the filter response and one for the orientation (Marcos et al.).
2.5 Rotation-invariant transition using the magnitude of the 2D-DFT
After the final convolutional layer of a CNN, some number of fully-connected layers will be applied to combine information from the various filter responses. In general, fully-connected layers will not maintain rotation equivariance or invariance properties. Commonly, convolution and downsampling are applied until the spatial dimensions are eliminated and the resulting feature map of the final convolutional layer is merely a vector, with dimension equal to the number of filters.
Rather than encoding invariance for each filter separately, as in most other recent works (Cohen and Welling, 2016; Weiler et al.), in CFNet we instead transform the collective filter responses to a space in which rotation becomes circular shift, so that the 2D-DFT can be applied to encode invariance. The primary advantage of the 2D-DFT as an invariant transform is that each output node is a function of every input node, and not just the nodes of a particular filter response, thereby capturing mutual information across responses.
Since the formulation of this transition involves the DFT, which is defined only for finite-length signals, we switch to representing feature maps as tensors, rather than functions. We denote the feature map generated by the penultimate convolutional layer by f.
At the transition to fully-connected layers, the input f is passed through N fully-connected filters. The operation of this layer can be interpreted as the inner product of the function and each filter. If we again consider rotations of the filter from the group G, this is equivalent to the first layer of a G-CNN, mapping from the spatial domain to G (though this group does not include the translation group since the convolution is only applied at the origin), and rotations of the final convolutional layer f will correspond to permutations of G, which are just circular shifts along the second dimension of the resulting matrix of responses.
The magnitude response of the 2D-DFT is applied to this matrix to transform these circular shifts to an invariant space. This process of encoding rotation invariance corresponds to the ‘Convolutional-to-Full Transition’ in Figure 2. The result is then vectorized and passed into fully-connected layers that precede the final output layer, as in a standard CNN.
In addition, the 2D-DFT, as a rotation invariant transform, can also be integrated into other rotation-equivariant networks, such as G-CNN. At the final layer of a fully-convolutional G-CNN, since the spatial dimension has been eliminated through successive convolutions and spatial downsampling, rotation is encoded along contiguous stacks of feature maps of each filter at four rotations. In this way, rotations similarly correspond to circular shifts in the final dimension. This representation is then passed through the 2D-DFT, as in Eqn. 14.
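The transition described above can be illustrated with a small NumPy sketch. It is restricted to rotations of 90 degrees so that `np.rot90` is exact, and the names `transition` and `dft_magnitude`, the tensor shapes and the random data are assumptions made for illustration. Per-filter max pooling is used here only as a stand-in for schemes that encode invariance for each filter separately.

```python
import numpy as np

def transition(f, weights, n_rot=4):
    """z[n, r] = inner product of f with the n-th weight tensor rotated by r * 90 degrees."""
    return np.array([[np.sum(f * np.rot90(w, r, axes=(0, 1))) for r in range(n_rot)]
                     for w in weights])

def dft_magnitude(z):
    """Magnitude of the 2D-DFT, which is invariant to circular shifts of z."""
    return np.abs(np.fft.fft2(z))

rng = np.random.default_rng(0)
f = rng.normal(size=(5, 5, 3))            # toy feature map: height x width x channels
weights = rng.normal(size=(6, 5, 5, 3))   # N = 6 hypothetical fully-connected filters

z = transition(f, weights)
z_rot = transition(np.rot90(f, axes=(0, 1)), weights)

# Rotating the input by 90 degrees circularly shifts z along its rotation axis ...
print(np.allclose(z_rot, np.roll(z, 1, axis=1)))
# ... and the magnitude of the 2D-DFT removes that shift.
print(np.allclose(dft_magnitude(z_rot), dft_magnitude(z)))

# Shifting only one filter's responses changes the relative orientation between filters.
z_mixed = z.copy()
z_mixed[0] = np.roll(z[0], 1)
print(np.allclose(z_mixed.max(axis=1), z.max(axis=1)))        # per-filter pooling is unchanged
print(np.allclose(dft_magnitude(z_mixed), dft_magnitude(z)))  # the 2D-DFT magnitude is not
```

The last two lines illustrate the point about mutual orientation information: pooling each filter over rotations discards the relative shift between filters, whereas the 2D-DFT magnitude is sensitive to it while remaining invariant to a common shift.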
3 Results
3.1 Application to rotated MNIST
We first used the rotated MNIST dataset (Larochelle et al.), which has been utilized as a benchmark for previous works on rotation invariance, to place CFNet against results previously reported for G-CNN. The model was trained on 10 000 images, using training augmentation of rotations of arbitrary angles as in Cohen and Welling (2016), and the best model parameters were selected based on scores on a validation set of 5000 images. [Though Cohen and Welling (2016) did not state the use of training augmentation, code posted by the authors at https://github.com/tscohen/gconv_experiments indicates that rotations of arbitrary angles were used.] Our best CFNet architecture consisted of six conic convolution layers, with R = 2 for the first three and R = 1 for the next three, followed by the DFT transition and an output softmax layer of 10 nodes. Filters were three pixels in size, with 15 filters per layer, and spatial max-pooling was applied after the second layer. This architecture was similar in number of layers and filters per layer to that of the G-CNN of Cohen and Welling (2016). As shown in Table 1, on a held-out set of 50 000 test images, CFNet achieved a 25% reduction in test error over G-CNN. To evaluate the G-CNN with the DFT, the only change we made from the reported architecture for G-CNN was to reduce the number of filters for each layer to 7, to offset the addition of the 2D-DFT, which was applied to the output of the final convolutional layer. Incorporating the DFT transition into G-CNN further reduces the test error by 13%. These results demonstrate, in a standard setting, the value of incorporating mutual rotational information between filters through the DFT when encoding invariance, as well as the added value of conic convolution.
Table 1. Test error on the rotated MNIST dataset
Algorithm | Test error (%)
Cohen and Welling (2016) (CNN) | 5.03
Schmidt and Roth (2012) | 3.98
Cohen and Welling (2016) (G-CNN) | 2.28
G-CNN + DFT | 2.00
CFNet | 1.75
3.2 Application to synthetic biomarker images
To precisely evaluate the advantage of encoding rotation equivariance, we created a set of synthetic microscopy images in which we could explicitly control the manifestation of rotations and intra- and inter-class variation. We utilized Gaussian-mixture models (GMMs), which have been used previously to emulate real-world fluorescence microscopy images of biological signals (Zhao and Murphy, 2007). Examples of synthetic images from across and within classes are shown in Figure 3a and b. Specifically, we defined 50 distribution patterns and generated 50 and 100 examples per class for training and 200 examples per class for testing. Each image consists of points sampled from several Gaussians, which have mean and variance defined by their particular class. Some intensity fluctuation, exponential noise and jitter are incorporated into the generating model to add variation. The image size was 50 pixels. A batch size of 50 examples was used during training, together with a weight decay penalty. We used the Adam optimizer and decreased the learning rate by 0.95 every few epochs. To help all methods, we augmented the training data by rotations and random jitter of up to three pixels, as was done during image generation. A more detailed description of the approach for generating the synthetic images is provided in the Supplementary Material.
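A generator of this kind might look like the following NumPy sketch. It is only an illustration of the ingredients named in the text (points sampled from class-specific Gaussians, intensity fluctuation, exponential noise and jitter); the function name `synth_image`, every default value and the example class definition are assumptions, and the actual procedure is the one described in the Supplementary Material.

```python
import numpy as np

def synth_image(means, stds, rng, size=50, n_points=200, max_jitter=3, noise_scale=0.05):
    """Render one synthetic image from a class-specific Gaussian mixture.

    All default values are illustrative assumptions, not the settings used in the paper.
    """
    jitter = rng.integers(-max_jitter, max_jitter + 1, size=2)   # random shift of the whole pattern
    img = np.zeros((size, size))
    for mu, sd in zip(means, stds):
        n = int(n_points * rng.uniform(0.8, 1.2))                # intensity fluctuation per component
        pts = rng.normal(loc=np.asarray(mu) + jitter, scale=sd, size=(n, 2))
        hist, _, _ = np.histogram2d(pts[:, 0], pts[:, 1], bins=size,
                                    range=[[0, size], [0, size]])
        img += hist                                              # points outside the frame are dropped
    img += rng.exponential(noise_scale, size=img.shape)          # additive exponential noise
    return img / img.max()

rng = np.random.default_rng(0)
# A hypothetical class defined by three Gaussians (means and standard deviations in pixels).
class_means = [(15, 20), (30, 30), (35, 12)]
class_stds = [2.0, 4.0, 1.5]
img = synth_image(class_means, class_stds, rng)
print(img.shape, img.min() >= 0.0)
```

Because each class is fully specified by its component means and standard deviations, inter-class variation can be controlled through those parameters and intra-class variation through the noise and jitter settings.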
Fig. 3. Comparison of the results of CFNet, CNet (network with conic convolution but without the DFT), G-CNN, G-CNN+DFT and a standard CNN on the synthetic biomarker images. (a, b) Example images, shown as heat maps for detail, showing inter- and intra-class variation. The results with varying numbers N of training examples per class are in (c, d).
Classification accuracies on the test dataset over training steps for various numbers of training samples, denoted by N, for several methods are shown in Figure 3c and d. A variety of configurations were trained for each network, and each configuration was trained three times. The darkest line shows the accuracy of the configuration that achieved the highest moving average, with a window size of 100 steps, for each method. The spread of each method, which is the area between the point-wise maximum and minimum of the error, is shaded with a light color, and the region within three standard deviations of the mean is shaded darker.
We observed a consistent trend of CFNet outperforming G-CNN, which in turn outperforms the CNN, both in overall accuracy and in terms of the number of steps required to attain that accuracy (Fig. 3c and d). Additionally, the spread of CFNet is mostly above even the best performing models of G-CNN and the CNN, suggesting that an instance of CFNet will outperform other methods even if the best set of hyperparameters has not been chosen. We also included a network consisting of conic convolutional layers, but without the DFT, noted as ‘CNet’ (Fig. 3), to show the relative advantage of the DFT. CNet performs comparably to the standard CNN while requiring significantly fewer parameters to attain the same performance, though the true advantage of conic convolution is shown when integrated with the DFT to achieve global rotation invariance. In comparison, including the 2D-DFT increases the performance of G-CNN to a level comparable with CFNet, though it does not train as quickly.
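The summary statistics used for these plots (a moving average with a window of 100 steps, the point-wise spread and a three-standard-deviation band) can be computed along the following lines. This is a sketch with hypothetical function names and made-up toy curves standing in for real training logs, not the plotting code used for Figure 3.

```python
import numpy as np

def moving_average(curve, window=100):
    """Moving average of a 1-D accuracy curve over `window` training steps."""
    return np.convolve(curve, np.ones(window) / window, mode="valid")

def summarize(curves, window=100):
    """curves: array of shape (n_runs, n_steps) holding test accuracy per training step."""
    smoothed = np.array([moving_average(c, window) for c in curves])
    best_run = int(np.argmax(smoothed.max(axis=1)))    # run with the highest moving average
    spread = (curves.min(axis=0), curves.max(axis=0))  # point-wise minimum and maximum
    mean, std = curves.mean(axis=0), curves.std(axis=0)
    band = (mean - 3 * std, mean + 3 * std)            # three standard deviations about the mean
    return best_run, spread, band

# Toy curves (assumption: saturating accuracy plus small noise; values are made up).
rng = np.random.default_rng(0)
steps = np.arange(1000)
curves = np.stack([a * (1 - np.exp(-steps / 200)) + rng.normal(0, 0.01, steps.size)
                   for a in (0.80, 0.85, 0.90)])
best_run, (lo, hi), band = summarize(curves)
print(best_run)
```

Selecting the best configuration by its smoothed curve rather than by a single step makes the choice less sensitive to step-to-step noise in the test accuracy.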
3.3 Application to subcellular protein localization images in budding yeast cells
To further demonstrate the advantage of rotation equivariant architectures and CFNet, we evaluated the models on real microscopy images of budding yeast cells from Kraus et al., which were collected as a follow-up analysis of the data from Chong et al. and are more challenging, since they include more subclasses. In this dataset, cells were first modified by homologous recombination and SGA protocol to express fluorescent markers and GFP fusion query proteins. The cells were then transferred into 384-well plates and ten images (1338 × 1003 pixels) were taken per plate per channel. As shown in Figure 4, each image consists of a single cell or a few cells and three stains, where blue shows the cytoplasmic region, pink the nuclear region and green the protein of interest. The classification for each image is the subcellular compartment in which the protein is localized and expressed, such as the cell periphery, mitochondria, or eisosomes, some of which exhibit very subtle differences. Our goal therefore is to predict the protein localization for a given image.
Fig. 4. Evaluation results based on subcellular protein localization images from Chong et al. (a) Example images. (b–c) Comparison results of CFNet, G-CNN and a standard CNN with varying numbers N of training examples per class. (d) Confusion matrix for the results for CFNet (X-axis) as compared to the true labels (Y-axis).
We compared the performance of CFNet with G-CNN and a standard CNN. Figure 4b and c shows the results of each method for classifying the protein localization for each image. To compare with DeepLoc (Kraus et al.), we used the same reported architecture and hyperparameters for the CNN. For CFNet and G-CNN, we removed the last convolutional layer and reduced the number of filters per layer by roughly half to offset the encoding of equivariance and invariance. The same training parameters and data augmentation were used as for the synthetic data, except that a dropout probability of 0.8 was applied at the final layer and the maximum jitter was increased to five pixels, since many examples were not well-centered. For each method, several iterations were run, and the spread and the best performing model are shown. We found that CFNet consistently outperforms G-CNN and the standard CNN representing DeepLoc when the number of training examples per class is either 50 or 100 (see Fig. 4b and c), demonstrating that the gains of the 2D-DFT and conic convolution translate to real-world microscopy data. We note that the best reported algorithm that did not use deep learning, called ensLOC (Chong et al.; Koh et al.), was only able to achieve an average precision of 0.49 for a less challenging set of yeast phenotypes and with ∼20 000 samples, whereas all runs of CFNet achieved an average precision of between 0.60 and 0.67 with ∼10% of the data used for training.
We further analyzed the variation of performance for different protein localization labels (Fig. 4d). CFNet outperforms CNN on almost all classes. 
For instance, CFNet improves the accuracies on ‘nuclear periphery’, ‘nucleolus’, ‘nucleus’ and ‘punctate nuclear’ by 10, 14, 7 and 14%, respectively. Nucleolus and punctate nuclear are both structures inside the nucleus and their only difference is that punctate nuclear is generally smaller and rounder, which is rather subtle and CNN misassigns 13% of proteins that are in punctate nuclear with label ‘nucleolus’. In contrast, CFNet decreases this misassignment to less than 5%. However, we also observed a few classes in particular for which CFNet could be further improved. For example, we found that CFNet tends to confuse the class ‘bud’ and ‘budding periphery’, likely because many proteins are present in both locations. Nevertheless, the application of CFNet to the subcellular protein localization data demonstrates the effectiveness of the method.One of the most significant advantages of CFNet, especially for biological knowledge discovery, is its interpretability. Figure 5 shows the activations of two particular filters from both CFNet and CNN at their third layer for example images of ‘nucleus’ and ‘nuclear periphery’ localizations, two classes that are challenging to differentiate. Since rotations of the input correspond directly to rotations of the output of conic convolution, as seen, the activations of the learned features do not change, except for rotating, thereby eliminating rotation as a confounding source of variation. It is important to note that even for rotations of 45 degrees, which conic convolution with R = 2 approximates, the activations are noticeably similar. Conversely, the activations for the standard CNN significantly change based upon the orientation of the image. This is especially apparent for the activation of filter 1 for the nucleus sample, which has a high response at the nucleus that splits in half under 90 degree rotation. 
We also observe that the activation of the CNN’s filter 2 for the nuclear periphery sample only outlines the upper right boundary of the nucleus, since it is applied only at a specific orientation, whereas filter 1 of CFNet outlines the entire nucleus. The property of equivariance of conic convolution drastically enhances the ability to distinguish biological meaning of the learned representation from uninformative rotation.
Fig. 5.
Visualization of learned features of CFNet and CNN. Example images with protein localized (a) in the nucleus and (b) at the nuclear periphery. (c) Activations of two particular filters from the third layer in CFNet (top row) and CNN (bottom row) for each input rotated by 0, 45 and 90 degrees
4 Discussion
In this work, we explored the application of rotation equivariant and invariant neural networks to the analysis of cellular images. We have demonstrated the effectiveness of enforcing rotation equivariance and invariance in CNNs by means of the proposed conic convolutional layer and the 2D-DFT, even for group convolution. In addition, by applying our methods to a dataset of subcellular protein localizations, we showed that rotation equivariant models outperform the standard CNN and, in particular, that CFNet, with both the conic convolutional layer and the 2D-DFT, performs the best in our evaluations.
There are a few directions in which our models can be further improved. For example, CFNet could potentially be improved by incorporating steerable filters (Freeman and Adelson, 1991; Liu et al.) for convolution, as was done by Weiler et al. to enhance group-equivariant convolution and by Worrall et al.; such filters allow for finer sampling of filter rotations without inducing artifacts. Further evaluations would be needed to thoroughly assess these new approaches. Additionally, in the future, we intend to apply CFNet to full micrograph screens in a multiple-instance learning setting, as was done for CNNs by Kraus et al., since this is the setting with potentially more microscopy data and applications.
We believe that the proposed enhancements to the standard CNN will have much utility for future applications in many problem settings, in particular for high-throughput molecular and cellular imaging data, where training data is usually sparse, especially for rare cellular events. One of the most exciting frontiers in current biomedical research is to understand different cellular identities at single-cell resolution, along with their functions and their compositions in different contexts, including various human tissues. With the datasets from large-scale projects such as the ongoing Human Cell Atlas (Rozenblatt-Rosen et al.) and the Human BioMolecular Atlas Program (HuBMAP) becoming available, our methods have the potential to complement existing approaches to more effectively analyze high-throughput cellular images.
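The use of the 2D-DFT to encode rotation invariance, as discussed above, rests on a general property of the Fourier transform: the magnitude of the DFT does not change under cyclic shifts of its input. The sketch below illustrates only this property. It assumes a hypothetical feature array whose first axis indexes orientation and further assumes that rotating the input by one orientation step cyclically shifts that axis; the array shape and the helper name `dft_magnitude` are illustrative and not taken from CFNet.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical feature array: 4 orientation slots x 16 feature channels.
features = rng.normal(size=(4, 16))


def dft_magnitude(f):
    """Magnitude of the 2D-DFT of a 2D feature array."""
    return np.abs(np.fft.fft2(f))


# Assumption: a rotation of the input appears as a cyclic shift along axis 0.
rotated = np.roll(features, shift=1, axis=0)

# The raw features differ after the shift, but the DFT magnitudes agree.
features_changed = not np.allclose(features, rotated)
magnitude_invariant = np.allclose(dft_magnitude(features), dft_magnitude(rotated))
print("features changed by shift:", features_changed)
print("DFT magnitude unchanged:", magnitude_invariant)
```

A cyclic shift multiplies each DFT coefficient by a unit-modulus phase factor, so discarding the phase removes the dependence on the shift while keeping more information than, for example, pooling over the orientation axis.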
Funding
This work was supported in part by the National Science Foundation grant 1717205 (J.M.).
Conflict of Interest: none declared.
Authors: Yolanda T Chong; Judice L Y Koh; Helena Friesen; Supipi Kaluarachchi Duffy; Michael J Cox; Alan Moses; Jason Moffat; Charles Boone; Brenda J Andrews Journal: Cell Date: 2015-06-04 Impact factor: 41.582
Authors: Judice L Y Koh; Yolanda T Chong; Helena Friesen; Alan Moses; Charles Boone; Brenda J Andrews; Jason Moffat Journal: G3 (Bethesda) Date: 2015-04-15 Impact factor: 3.154
Authors: Oren Z Kraus; Ben T Grys; Jimmy Ba; Yolanda Chong; Brendan J Frey; Charles Boone; Brenda J Andrews Journal: Mol Syst Biol Date: 2017-04-18 Impact factor: 11.429