Daniel Franco-Barranco, Arrate Muñoz-Barrutia, Ignacio Arganda-Carreras.
Abstract
Electron microscopy (EM) allows the identification of intracellular organelles such as mitochondria, providing insights for clinical and scientific studies. In recent years, a number of novel deep learning architectures have been published reporting superior performance, or even human-level accuracy, compared to previous approaches on public mitochondria segmentation datasets. Unfortunately, many of these publications make neither the code nor the full training details public, leading to reproducibility issues and dubious model comparisons. Thus, following a recent code of best practices in the field, we present an extensive study of the state-of-the-art architectures and compare them to different variations of U-Net-like models for this task. To unveil the impact of architectural novelties, a common set of pre- and post-processing operations has been implemented and tested with each approach. Moreover, an exhaustive sweep of hyperparameters has been performed, running each configuration multiple times to measure their stability. Using this methodology, we found very stable architectures and training configurations that consistently obtain state-of-the-art results in the well-known EPFL Hippocampus mitochondria segmentation dataset and outperform all previous works on two other available datasets: Lucchi++ and Kasthuri++. The code and its documentation are publicly available at https://github.com/danifranco/EM_Image_Segmentation.
Keywords: Bioimage analysis; Deep learning; Electron microscopy; Mitochondria; Semantic segmentation
Year: 2021 PMID: 34855126 PMCID: PMC9546980 DOI: 10.1007/s12021-021-09556-1
Source DB: PubMed Journal: Neuroinformatics ISSN: 1539-2791
Fig. 1 Graphical representation of the proposed network architectures. Depending on the model of choice, the processing blocks can be either simply convolutional or residual blocks, while the feature merge operations may involve a simple concatenation or an additional attention gate
Fig. 2 Types of processing blocks. Convolutional blocks (a) are used in the U-Net and Attention U-Net architectures, and residual blocks (b) are used in the Residual U-Net
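As a rough illustration of the two block types, the following single-channel NumPy sketch (toy kernels, no learned weights, batch normalization, or channel dimension — all simplifications of ours, not the paper's implementation) shows how a plain convolutional block and a residual block differ:

```python
import numpy as np

def conv3x3(x, k):
    """Naive 'same' 3x3 convolution of a 2D array with zero padding."""
    h, w = x.shape
    p = np.pad(x, 1)
    out = np.zeros((h, w))
    for i in range(h):
        for j in range(w):
            out[i, j] = np.sum(p[i:i + 3, j:j + 3] * k)
    return out

def relu(x):
    return np.maximum(x, 0.0)

def conv_block(x, k1, k2):
    """Convolutional block (Fig. 2a): conv -> ReLU -> conv -> ReLU."""
    return relu(conv3x3(relu(conv3x3(x, k1)), k2))

def residual_block(x, k1, k2):
    """Residual block (Fig. 2b): the same two convolutions plus an
    identity shortcut added before the final activation."""
    return relu(conv3x3(relu(conv3x3(x, k1)), k2) + x)
```

The only structural difference is the identity shortcut added before the last activation, which lets the block fall back to (near-)identity and eases gradient flow through deep encoders.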
Fig. 3 Proposed 2D Attention U-Net architecture. Example with three downsampling levels and a detailed description of the attention gates used in the skip connections
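The additive attention gate used in those skip connections can be sketched as follows. This toy version collapses the gate's 1x1 convolutions into the scalar weights `w_x`, `w_g`, and `w_psi` (hypothetical names of ours), keeping only the structure: combine the skip feature with the decoder's gating signal, apply a ReLU, and squash to a per-pixel coefficient in (0, 1) that rescales the skip feature:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def attention_gate(x, g, w_x=1.0, w_g=1.0, w_psi=1.0):
    """Toy additive attention gate on a skip connection.

    x : encoder feature map carried by the skip connection
    g : gating signal coming from the coarser decoder level
    The 1x1 convolutions of the real gate are reduced here to the
    scalar weights w_x, w_g and w_psi.
    """
    q = np.maximum(w_x * x + w_g * g, 0.0)  # combine and ReLU
    alpha = sigmoid(w_psi * q)              # attention coefficients in (0, 1)
    return x * alpha, alpha
```

Because the coefficients stay below 1, the gate can only attenuate skip features, letting the decoder suppress irrelevant encoder activations.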
Fig. 4 Border effect in output image reconstruction. From left to right: output image reconstructed from patches with visible jagged predictions; and output image reconstructed using both the blending and ensemble techniques. Blue and red boxes show zoomed areas on both images
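The overlapped-reconstruction idea behind that smoothing can be sketched as below; `model` stands in for any patch-wise predictor, and the 50% default overlap mirrors the setting used in the result tables (illustrative code of ours, not the repository's implementation):

```python
import numpy as np

def reconstruct_with_overlap(img, model, patch=64, overlap=0.5):
    """Rebuild a full-image prediction from overlapping patches,
    averaging predictions wherever patches overlap so that the seams
    visible with 0% overlap (Fig. 4, left) are smoothed out."""
    step = int(patch * (1.0 - overlap))
    h, w = img.shape
    acc = np.zeros((h, w))   # summed patch predictions
    cnt = np.zeros((h, w))   # how many patches covered each pixel
    for y in range(0, h - patch + 1, step):
        for x in range(0, w - patch + 1, step):
            acc[y:y + patch, x:x + patch] += model(img[y:y + patch, x:x + patch])
            cnt[y:y + patch, x:x + patch] += 1
    return acc / np.maximum(cnt, 1)  # average the overlapping predictions
```

Interior pixels are covered by several patches, so border artifacts from any single patch are averaged away rather than appearing as hard seams.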
Fig. 5 Sample images from public mitochondria datasets. From left to right: Lucchi and Kasthuri++ data samples with their corresponding binary masks. Blue and red boxes show zoomed areas on both images
Results obtained on the Lucchi++ and Kasthuri++ datasets. All our model scores correspond to the optimal architectures found on Lucchi

| Dataset | Method | Source | Foreground IoU (best) | Foreground IoU (mean±std) | Overall IoU (best) | Overall IoU (mean±std) |
|---|---|---|---|---|---|---|
| Lucchi++ | 2D U-Net | Casser et al. | 0.888 | - | 0.940 | - |
| Lucchi++ | 2D U-Net + Z-filtering | Casser et al. | 0.900 | - | 0.946 | - |
| Lucchi++ | 2D Residual U-Net (*) | Ours | 0.908 | 0.904±0.004 | 0.943 | 0.948±0.002 |
| Lucchi++ | 2D U-Net (*) | Ours | 0.916 | 0.911±0.006 | 0.955 | 0.952±0.003 |
| Lucchi++ | 2D Attention U-Net (*) | Ours | 0.919 | 0.914±0.003 | 0.956 | 0.954±0.001 |
| Lucchi++ | 3D U-Net (a) | Ours | 0.923 | 0.915±0.007 | 0.958 | 0.954±0.004 |
| Lucchi++ | 3D Attention U-Net (a) | Ours | 0.923 | 0.912±0.008 | 0.959 | 0.953±0.004 |
| Lucchi++ | 3D Residual U-Net (a) | Ours | | 0.919±0.005 | | 0.957±0.003 |
| Kasthuri++ | 2D U-Net | Casser et al. | 0.845 | - | 0.920 | - |
| Kasthuri++ | 2D U-Net + Z-filtering | Casser et al. | 0.846 | - | 0.920 | - |
| Kasthuri++ | 2D Residual U-Net (a) | Ours | 0.908 | 0.906±0.001 | 0.953 | 0.950±0.001 |
| Kasthuri++ | 2D Attention U-Net (a) | Ours | 0.915 | 0.913±0.001 | 0.956 | 0.954±0.001 |
| Kasthuri++ | 2D U-Net (a) | Ours | 0.916 | 0.913±0.002 | 0.955 | 0.954±0.001 |
| Kasthuri++ | 3D U-Net (a) | Ours | 0.934 | 0.932±0.001 | 0.965 | 0.965±0.001 |
| Kasthuri++ | 3D Residual U-Net (a) | Ours | 0.934 | 0.933±0.001 | 0.966 | 0.966±0.000 |
| Kasthuri++ | 3D Attention U-Net (a) | Ours | | 0.934±0.001 | | 0.966±0.001 |

(*) 0% overlap output reconstruction, blended ensemble and Z-filtering post-processing
(a) 50% overlap output reconstruction and ensemble post-processing
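The Z-filtering post-processing listed in footnote (*) exploits the continuity of mitochondria across consecutive sections: it can be approximated by a running median along the z axis of the per-slice predictions. A minimal NumPy sketch (the window width here is our assumption, not the paper's value):

```python
import numpy as np

def z_filter(preds, width=5):
    """Z-filtering sketch: slide a median filter along the z axis of a
    (z, y, x) stack of per-slice predictions, suppressing isolated
    per-slice errors that neighbouring sections disagree with."""
    nz = preds.shape[0]
    half = width // 2
    out = np.empty_like(preds, dtype=float)
    for k in range(nz):
        lo, hi = max(0, k - half), min(nz, k + half + 1)  # clipped window
        out[k] = np.median(preds[lo:hi], axis=0)
    return out
```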
Foreground IoU (mean±standard deviation) of reproduced state-of-the-art works on the Lucchi dataset. Original refers to the exact configuration reported by the authors, while Modified corresponds to the best configuration found by us. The different output reconstruction and post-processing methods adopted are indicated. More details are available in Table S3.1
| Method | Config | Param. | Per patch | Full image | +Test-time aug. | +Test-time aug. +Z-filt. | Blending +Test-time aug. | Blending +Test-time aug. +Z-filt. | 50% overlap | +Test-time aug. | +Test-time aug. +Z-filt. |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Cheng et al. (2D) (0.6M; reported 0.865) | Original | 0.59M | 0.503±0.233 | 0.517±0.240 | 0.517±0.239 | 0.521±0.243 | 0.541±0.250 | 0.548±0.254 | 0.526±0.244 | 0.537±0.244 | 0.543±0.252 |
| | Modified | 0.59M | 0.848±0.012 | 0.851±0.011 | 0.863±0.010 | 0.868±0.010 | 0.865±0.008 | 0.871±0.008 | 0.853±0.011 | 0.865±0.009 | 0.871±0.008 |
| | Maximum | - | 0.864 | 0.865 | 0.877 | 0.881 | 0.878 | 0.883 | 0.865 | 0.878 | 0.881 |
| Casser et al. (1.96M; reported 0.890) | Original | 1.96M | 0.824±0.014 | 0.815±0.016 | 0.825±0.013 | 0.831±0.013 | 0.831±0.011 | 0.838±0.011 | 0.820±0.016 | 0.833±0.011 | 0.839±0.012 |
| | Modified | 1.96M | 0.844±0.014 | 0.837±0.008 | 0.846±0.016 | 0.850±0.017 | 0.850±0.016 | 0.855±0.017 | 0.842±0.006 | 0.853±0.015 | 0.858±0.015 |
| | Maximum | - | 0.846 | 0.846 | 0.861 | 0.865 | 0.862 | 0.867 | 0.848 | 0.865 | 0.870 |
| Oztel et al. (0.14M; reported 0.907) | Original | 0.14M | - | - | - | - | - | - | 0.425±0.080 | 0.457±0.060 | 0.466±0.061 |
| | Modified | 0.07M | - | - | - | - | - | - | 0.451±0.042 | 0.476±0.049 | 0.487±0.053 |
| | Maximum | - | - | - | - | - | - | - | 0.500 | 0.531 | 0.544 |
| Cheng et al. (3D) (0.63M; reported 0.889) | Original | 0.79M | 0.053±0.000 | 0.053±0.000 | 0.053±0.000 | 0.053±0.000 | - | - | - | - | - |
| | Modified | 0.79M | 0.623±0.039 | 0.714±0.040 | 0.053±0.034 | 0.053±0.034 | - | - | - | - | - |
| | Maximum | - | 0.694 | 0.787 | 0.799 | 0.800 | - | - | - | - | - |
| Xiao et al. (1.1M; reported 0.900) | Original | 1.08M | 0.874±0.003 | 0.863±0.004 | 0.866±0.004 | 0.867±0.004 | - | - | - | - | - |
| | Modified | 1.08M | 0.882±0.002 | 0.872±0.003 | 0.874±0.003 | 0.874±0.003 | - | - | - | - | - |
| | Maximum | - | 0.885 | 0.880 | 0.880 | 0.880 | - | - | - | - | - |
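All of the tables above report foreground IoU, which for binary masks is simply the Jaccard index of the mitochondria class. A minimal sketch of that metric:

```python
import numpy as np

def foreground_iou(pred, gt):
    """Foreground IoU (Jaccard index of the mitochondria class):
    |pred AND gt| / |pred OR gt| over the two binary masks."""
    pred = np.asarray(pred, dtype=bool)
    gt = np.asarray(gt, dtype=bool)
    union = np.logical_or(pred, gt).sum()
    if union == 0:          # both masks empty: perfect agreement
        return 1.0
    return np.logical_and(pred, gt).sum() / union
```

Unlike pixel accuracy, this score ignores the (dominant) background class, which is why it is the preferred metric when the foreground occupies only a small fraction of each EM section.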
Foreground IoU results obtained by the original and modified configurations of (Oztel et al., 2017) using their consecutive post-processing methods, i.e., Spurious Detection is applied over Full Images, then they are passed through Watershed, and finally through Z-filtering

| Config | Full images | + Spurious detection | + Watershed | + Z-filtering |
|---|---|---|---|---|
| Original | 0.425±0.080 | 0.426±0.091 | 0.540±0.100 | 0.573±0.106 |
| Modified | 0.451±0.042 | 0.449±0.067 | 0.562±0.057 | 0.599±0.067 |
| Maximum | 0.500 | 0.539 | 0.619 | 0.683 |
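The spurious-detection step removes small false-positive blobs from the prediction. A simple stand-in for it (not Oztel et al.'s exact method; the connectivity and size threshold are our assumptions) is connected-component filtering:

```python
import numpy as np
from collections import deque

def remove_small_objects(mask, min_size):
    """Drop 4-connected foreground components smaller than min_size
    pixels — a simple 'spurious detection'-style clean-up."""
    mask = np.asarray(mask, dtype=bool)
    out = np.zeros_like(mask)
    seen = np.zeros_like(mask)
    h, w = mask.shape
    for i in range(h):
        for j in range(w):
            if mask[i, j] and not seen[i, j]:
                comp, queue = [], deque([(i, j)])   # BFS over one component
                seen[i, j] = True
                while queue:
                    y, x = queue.popleft()
                    comp.append((y, x))
                    for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                        ny, nx = y + dy, x + dx
                        if 0 <= ny < h and 0 <= nx < w and mask[ny, nx] and not seen[ny, nx]:
                            seen[ny, nx] = True
                            queue.append((ny, nx))
                if len(comp) >= min_size:           # keep only large blobs
                    for y, x in comp:
                        out[y, x] = True
    return out
```

In practice a library routine such as `skimage.morphology.remove_small_objects` does the same job; the explicit BFS above is only meant to make the operation transparent.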
Performance of proposed and state-of-the-art networks for semantic segmentation in the Lucchi dataset (foreground IoU, mean±standard deviation). Scores are shown using the different post-processing and output reconstruction methods adopted. 3D patches required a minimum overlap so they are marked with *. Best results of each column and type of network (2D or 3D) are shown in bold. More details in Table S3.2
| Network | Param. number | Per patch | Full image | *Test-time aug. | *Test-time aug. *Z-filtering | *Blending *Test-time aug. | *Blending *Test-time aug. *Z-filtering | 50% overlap | *Test-time aug. | *Test-time aug. *Z-filtering |
|---|---|---|---|---|---|---|---|---|---|---|
| FCN 32 (Dai et al.) | 50.38M | 0.040±0.000 | 0.677±0.005 | 0.679±0.006 | 0.680±0.006 | 0.659±0.004 | 0.661±0.004 | 0.657±0.003 | 0.659±0.003 | 0.660±0.003 |
| MultiResUNet (Ibtehaz & Rahman) | 7.26M | 0.815±0.000 | 0.814±0.014 | 0.820±0.010 | 0.824±0.010 | 0.834±0.010 | 0.840±0.009 | 0.828±0.016 | 0.833±0.010 | 0.839±0.010 |
| Tiramisu (Jégou et al.) | 9.4M | 0.810±0.028 | 0.833±0.027 | 0.851±0.018 | 0.857±0.017 | 0.850±0.016 | 0.855±0.016 | 0.830±0.029 | 0.846±0.019 | 0.851±0.018 |
| MNet (Fu et al.) | 8.54M | 0.851±0.011 | 0.865±0.008 | 0.870±0.007 | 0.874±0.007 | 0.874±0.006 | 0.878±0.006 | 0.867±0.008 | 0.872±0.006 | 0.876±0.008 |
| U-Net++ (Zhou et al.) | 37.7M | 0.734±0.012 | 0.872±0.005 | 0.877±0.004 | 0.881±0.004 | 0.884±0.003 | | | | |
| 2D SE U-Net (ours) | 1.95M | 0.863±0.002 | 0.873±0.003 | 0.878±0.003 | 0.882±0.003 | 0.880±0.003 | 0.883±0.003 | 0.875±0.002 | 0.881±0.002 | 0.881±0.002 |
| 2D Residual U-Net (ours) | 2.03M | 0.867±0.005 | 0.873±0.005 | 0.877±0.004 | 0.880±0.004 | 0.878±0.003 | 0.882±0.003 | 0.875±0.004 | 0.877±0.003 | 0.880±0.004 |
| FCN 8 (Dai et al.) | 50.38M | 0.860±0.005 | 0.880±0.003 | 0.884±0.002 | 0.888±0.002 | 0.891±0.002 | 0.881±0.003 | 0.886±0.002 | | |
| nnU-Net (Isensee et al.) | 52.1M | 0.867±0.004 | 0.876±0.004 | 0.881±0.003 | 0.884±0.003 | 0.882±0.003 | 0.886±0.003 | 0.861±0.007 | 0.864±0.009 | 0.868±0.008 |
| 2D U-Net (ours) | 0.874±0.003 | 0.881±0.002 | 0.884±0.002 | 0.888±0.002 | 0.884±0.000 | 0.889±0.002 | 0.882±0.003 | 0.884±0.002 | 0.887±0.003 | |
| 2D Attention U-Net (ours) | 1.99M | 0.886±0.001 | 0.890±0.002 | |||||||
| 3D Vanilla U-Net (Çiçek et al.) | 19.07M | 0.402±0.005(*) | 0.851±0.004 | 0.857±0.006 | 0.857±0.006 | - | - | - | - | - |
| 3D SE U-Net (ours) | 0.79M | 0.387±0.007(*) | 0.867±0.009 | 0.873±0.007 | 0.874±0.007 | - | - | - | - | - |
| 3D Attention U-Net (ours) | 0.79M | 0.389±0.005(*) | 0.870±0.003 | 0.876±0.003 | 0.876±0.003 | - | - | - | - | - |
| 3D U-Net (ours) | 0.394±0.005(*) | 0.871±0.006 | 0.878±0.004 | 0.878±0.004 | - | - | - | - | - | |
| 3D Residual U-Net (ours) | 1.50M | - | - | - | - | - | ||||
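The test-time augmentation columns above ensemble each model's predictions over the eight flips/rotations of the input, undoing every transform before averaging. A NumPy sketch of that ensembling (illustrative, not the repository's code; `model` is any patch predictor):

```python
import numpy as np

def tta_predict(img, model):
    """Test-time augmentation ensemble: average the model's outputs over
    the 8 rotations/flips of a square input, inverting each transform
    on the prediction before averaging."""
    preds = []
    for flipped in (False, True):
        base = img[:, ::-1] if flipped else img
        for k in range(4):
            p = model(np.rot90(base, k))            # predict on transformed input
            p = np.rot90(p, -k)                     # undo the rotation
            preds.append(p[:, ::-1] if flipped else p)  # undo the flip
    return np.mean(preds, axis=0)
```

Averaging the eight aligned predictions reduces orientation-dependent errors, at the cost of eight forward passes per patch.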
Reported vs. reproduced scores in the Lucchi dataset. The Reported values correspond to the scores claimed by authors of each publication or the maximum score obtained by us. The Reproduced values refer to the maximum, mean and standard deviation obtained while reproducing each corresponding method. Best scores of each column are presented in bold
| Network | Source | Foreground IoU (reported) | Foreground IoU (reproduced) | Overall IoU (reported) | Overall IoU (reproduced) |
|---|---|---|---|---|---|
| FCN 32 | Ours, using (Dai et al.) | 0.688 | 0.688 (0.680±0.006) | 0.835 | 0.835 (0.831±0.003) |
| MultiResUNet | Ours, using (Ibtehaz & Rahman) | 0.847 | 0.847 (0.824±0.010) | 0.919 | 0.919 (0.902±0.007) |
| 2D CNN | (Cheng & Varshney) | 0.865 | 0.883 (0.871±0.008) | - | 0.938 (0.932±0.004) |
| 3D Vanilla U-Net | Ours, using (Çiçek et al.) | 0.866 | 0.866 (0.857±0.006) | 0.929 | 0.929 (0.924±0.003) |
| Tiramisu | Ours, using (Jégou et al.) | 0.872 | 0.872 (0.857±0.017) | 0.932 | 0.932 (0.924±0.009) |
| 2D U-Net | (Casser et al.) | 0.878 | 0.865 (0.853±0.015) | 0.935 | 0.930 (0.922±0.007) |
| 3D SE U-Net | Ours | 0.879 | 0.879 (0.874±0.007) | 0.936 | 0.936 (0.933±0.004) |
| 3D Attention U-Net | Ours | 0.880 | 0.880 (0.876±0.003) | 0.936 | 0.936 (0.934±0.002) |
| nnU-Net framework | (Isensee et al.) | 0.882 | - | 0.938 | - |
| MNet | Ours, using (Fu et al.) | 0.883 | 0.883 (0.874±0.007) | 0.938 | 0.938 (0.929±0.004) |
| 2D Residual U-Net | Ours | 0.885 | 0.885 (0.880±0.004) | 0.939 | 0.939 (0.937±0.002) |
| 3D U-Net | Ours | 0.885 | 0.885 (0.878±0.004) | 0.939 | 0.939 (0.935±0.002) |
| nnU-Net | Ours, using (Isensee et al.) | 0.888 | 0.888 (0.881±0.005) | 0.941 | 0.941 (0.937±0.003) |
| 3D Residual U-Net | Ours | 0.888 | 0.888 (0.883±0.002) | 0.941 | 0.941 (0.938±0.001) |
| 2D SE U-Net | Ours | 0.888 | 0.888 (0.882±0.003) | 0.941 | 0.941 (0.937±0.002) |
| U-Net++ | Ours, using (Zhou et al.) | 0.888 | 0.888 (0.884±0.003) | 0.941 | 0.941 (0.938±0.001) |
| 3D CNN | (Cheng & Varshney) | 0.889 | 0.800 (0.738±0.034) | - | 0.894 (0.860±0.018) |
| 2D U-Net+Z-filtering | (Casser et al.) | 0.890 | 0.870 (0.858±0.015) | 0.942 | 0.931 (0.925±0.007) |
| FCN 8 | Ours, using (Dai et al.) | 0.893 | | | |
| 2D U-Net | Ours | 0.893 | | 0.942 | 0.942 (0.941±0.001) |
| 2D Attention U-Net | Ours | 0.893 | | | |
| 3D U-Net | (Xiao et al.) | 0.900 | 0.881 (0.875±0.003) | - | 0.937 (0.934±0.002) |
| CNN+3 Post-proc. | (Oztel et al.) | 0.907 | 0.683 (0.599±0.067) | - | 0.800 (0.757±0.106) |
Ablation study of our full 2D model. From top to bottom, each row applies an incremental modification on top of the previous configuration, except batch normalization, which was discarded as it decreased performance
| Baseline - 2D U-Net | 0.725±0.020 | 0.748±0.027 | 0.739±0.002 |
| + DA | 0.859±0.007 | 0.872±0.003 | 0.871±0.004 |
| (+ Batch norm.) | 0.856±0.005 | 0.864±0.004 | 0.869±0.002 |
| + Dropout | 0.870±0.003 | 0.880±0.002 | 0.881±0.002 |
| + ELU activation | 0.873±0.003 | 0.880±0.001 | 0.881±0.002 |
| + He initializer | 0.873±0.003 | 0.880±0.002 | 0.881±0.003 |
| + Attention Gates | |||