Chaoxin Wang, Doina Caragea, Nisarga Kodadinne Narayana, Nathan T Hein, Raju Bheemanahalli, Impa M Somayanda, S V Krishna Jagadish.
Abstract
BACKGROUND: Rice is a major staple food crop for more than half the world's population. As the global population is expected to reach 9.7 billion by 2050, production of high-quality rice must increase to meet the anticipated demand. However, global environmental changes, especially rising temperatures, can affect both grain yield and quality. Heat stress is one of the major causes of an increased proportion of chalkiness in rice, which compromises quality and reduces market value. Researchers have identified 140 quantitative trait loci linked to chalkiness, mapped across the 12 chromosomes of the rice genome. However, the genetic information acquired through these advances has not been adequately exploited, due to the lack of a reliable, rapid and high-throughput phenotyping tool to capture chalkiness. To derive extensive benefit from the genetic progress achieved, tools that facilitate high-throughput phenotyping of rice chalkiness are needed.
Keywords: Convolutional neural networks; Gradient-weighted class activation mapping; High night temperature; Image segmentation; Rice grain chalkiness detection
Year: 2022 PMID: 35065667 PMCID: PMC8783510 DOI: 10.1186/s13007-022-00839-5
Source DB: PubMed Journal: Plant Methods ISSN: 1746-4811 Impact factor: 4.993
Fig. 1Model Architecture. a A backbone CNN (e.g., ResNet-101) is trained to classify (resized) input grain images as chalky or non-chalky. ResNet-101 has four main groups of convolution layers, shown as Layer1, Layer2, Layer3, and Layer4, consisting of 3, 4, 23 and 3 bottleneck blocks, respectively. b Each bottleneck block starts and ends with a 1×1 convolution layer, with a 3×3 convolution layer in the middle. The number of filters in each layer is shown after the kernel dimension. c Grad-CAM uses the gradients of the chalky category to compute a weight for each feature map in a convolution layer. The weighted average of the feature maps, transformed using the ReLU activation, is used as the heatmap for the current image at inference time
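As an illustrative sketch (not code from the paper), the combination step in panel c — feature maps weighted by their pooled gradients, then passed through ReLU — can be written in plain NumPy:

```python
import numpy as np

def grad_cam_heatmap(feature_maps, gradients):
    """Combine convolution feature maps into a Grad-CAM heatmap.

    feature_maps: (K, H, W) activations of the chosen convolution layer.
    gradients:    (K, H, W) gradients of the chalky-class score w.r.t. those maps.
    """
    # Each feature map's weight is the global average of its gradients.
    weights = gradients.mean(axis=(1, 2))              # shape (K,)
    # Weighted average of the feature maps over the channel dimension.
    cam = np.tensordot(weights, feature_maps, axes=1)  # shape (H, W)
    # ReLU keeps only regions with positive evidence for the chalky class.
    return np.maximum(cam, 0.0)
```

In a full pipeline the activations and gradients would come from forward/backward hooks on the trained backbone; here they are passed in as arrays to keep the combination step isolated.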
Fig. 2Image preprocessing. Steps used to crop individual rice seeds from the original scanned images, each with approximately 25–30 seeds. Five steps (i. to v.) are depicted below each image that illustrate the action achieved in each respective step
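A rough stand-in for the five cropping steps in Fig. 2 (binarize, label connected components, take bounding boxes, discard specks, crop) can be sketched with SciPy; the threshold and minimum-area values below are illustrative assumptions, not the paper's settings:

```python
import numpy as np
from scipy import ndimage

def crop_seeds(gray, thresh=60, min_area=50):
    """Crop individual seeds from a grayscale scan (bright seeds, dark background)."""
    mask = gray > thresh                          # i.   binarize the scan
    labels, _ = ndimage.label(mask)               # ii.  label connected components
    crops = []
    for box in ndimage.find_objects(labels):      # iii. bounding box per component
        if mask[box].sum() >= min_area:           # iv.  skip dust/noise specks
            crops.append(gray[box])               # v.   crop the seed region
    return crops
```

Applied to a scan with 25–30 seeds, this returns one small image per seed, ready for image-level labeling.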
Fig. 3Manual annotations. a Image-level annotation: each seed is labeled as chalky or non-chalky (technically, the label was created by dragging each rice seed image into the chalky or non-chalky folder, respectively). b Specific chalkiness annotation: the chalkiness area is marked with polygons using the VGG Image Annotator (each red dot in the image represents a click). The opaque, dull-white region in panel “a” is the chalky portion, while the non-chalky region is translucent
Statistics on manual image annotation: the number of images labeled as chalky and non-chalky, and the number of chalky images additionally annotated with the chalky area, for polished images and for unpolished images from 12 chalky combinations, respectively
| Set of seeds | Chalky | Non-chalky | Total | Chalky area |
|---|---|---|---|---|
| Polished | 660 | 995 | 1645 | 660 |
| Unpolished (12) | 3934 | 1299 | 5233 | 240 |
Distribution over Training/Development/Test subsets
| Set of seeds | Training (Chalky) | Training (Non-chalky) | Development (Chalky) | Development (Non-chalky) | Test (Chalky) | Test (Non-chalky) | Total |
|---|---|---|---|---|---|---|---|
| Polished | 326 | 497 | 168 | 243 | 166 | 245 | 1645 |
| Unpolished (12) | 1856 | 830 | 483 | 229 | 1595 | 240 | 5233 |
Fig. 4Calculating the IoU between binarized ground truth and prediction: a chalky seed; b corresponding ground truth chalkiness area; c binarized ground truth area; d predicted chalkiness area; e corresponding predicted binarized area; f intersection between the binarized ground truth (c) and prediction (e): the number of white pixels in the intersection is 5167; g union between the binarized ground truth (c) and prediction (e): the number of white pixels in the union is 6370; h Calculation of IoU
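The IoU computation illustrated in Fig. 4 reduces to counting white pixels; a minimal NumPy version (using the pixel counts from the figure only as an example) could look like:

```python
import numpy as np

def iou(pred_mask, gt_mask):
    """Intersection over Union between two binary masks."""
    inter = np.logical_and(pred_mask, gt_mask).sum()  # white pixels in both masks
    union = np.logical_or(pred_mask, gt_mask).sum()   # white pixels in either mask
    return inter / union

# With the counts from Fig. 4: intersection 5167, union 6370, IoU = 5167/6370
```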
Classification results on polished rice with various networks as backbone in the weakly supervised Grad-CAM approach
| Model | Acc. (%) | Chalky Pre. (%) | Chalky Rec. (%) | Chalky F1 (%) | Non-chalky Pre. (%) | Non-chalky Rec. (%) | Non-chalky F1 (%) |
|---|---|---|---|---|---|---|---|
| DenseNet-121 | 94.58 | 96.31 | | | | | |
| DenseNet-161 | 95.12 | 92.44 | 94.08 | 94.67 | 95.85 | ||
| DenseNet-169 | 94.63 | 92.86 | 93.98 | 93.41 | 95.87 | 95.08 | 95.47 |
| ResNet-18 | 94.63 | 94.44 | 92.17 | 93.29 | 94.76 | 96.31 | 95.53 |
| ResNet-34 | 94.15 | 93.29 | 92.17 | 92.73 | 94.72 | 95.49 | 95.10 |
| ResNet-50 | 94.88 | 92.17 | 93.58 | 94.78 | 95.74 | ||
| ResNet-101 | 93.45 | 95.49 | | | | | |
| ResNet-152 | 94.88 | 93.94 | 93.37 | 93.66 | 95.51 | 95.90 | 95.71 |
| SqueezeNet-1.0 | 94.58 | 96.28 | | | | | |
| SqueezeNet-1.1 | 94.39 | 91.33 | 93.22 | 93.85 | 95.22 | ||
| VGG-11 | 94.88 | 93.37 | 93.66 | 95.51 | 95.71 | ||
| VGG-13 | 94.39 | 92.31 | 93.98 | 93.13 | 95.85 | 94.67 | 95.26 |
| VGG-16 | 92.94 | 95.18 | 96.67 | 95.08 | | | |
| VGG-19 | 94.15 | 90.34 | 92.98 | 93.03 | 94.98 | ||
| EfficientNetB0 | 95.13 | 93.98 | 93.98 | 93.98 | 95.92 | 95.92 | 95.92 |
| EfficientNetB1 | 95.13 | 94.51 | 93.37 | 93.94 | 95.55 | 96.33 | 95.93 |
| EfficientNetB2 | 93.67 | 90.23 | 92.35 | 93.06 | 94.61 | ||
| EfficientNetB3 | 95.13 | 95.06 | 92.77 | 93.90 | 95.18 | 96.73 | 95.95 |
| EfficientNetB4 | 91.57 | 94.49 | | | | | |
| EfficientNetB5 | 93.67 | 91.67 | 92.77 | 92.22 | 95.06 | 94.29 | 94.67 |
| EfficientNetB6 | 94.16 | 92.77 | 92.77 | 92.77 | 95.10 | 95.10 | 95.10 |
The number following a network’s name denotes the number of layers in the network (as in DenseNet-121 or ResNet-101) or the version of the network (as in SqueezeNet-1.0 or EfficientNetB0). Performance is reported in terms of Accuracy (Acc.), Precision (Pre.), Recall (Rec.) and F1 measure (F1). Precision, Recall and F1 measure values are reported separately for the Chalky and Non-Chalky classes. All models are trained/tuned/evaluated on the same training/development/test splits. The results reported are obtained on the test set. The best performance for each type of model for each metric is highlighted using bold font
Classification networks: training time and model size
| Model | Training time (s) | Number of parameters | Size (MB) | Acc. (%) |
|---|---|---|---|---|
| DenseNet-121 | 1522.88 | 6,955,906 | 28.4 | |
| DenseNet-161 | 2157.04 | 26,476,418 | 107.1 | 95.12 |
| DenseNet-169 | 1306.20 | 12,487,810 | 50.9 | 94.63 |
| ResNet-18 | 546.77 | 11,177,536 | 44.8 | 94.63 |
| ResNet-34 | 719.41 | 21,285,696 | 85.3 | 94.15 |
| ResNet-50 | 1011.85 | 23,512,128 | 94.4 | 94.88 |
| ResNet-101 | 1668.41 | 42,504,256 | 170.6 | |
| ResNet-152 | 2172.97 | 58,147,904 | 233.4 | 94.88 |
| SqueezeNet-1.0 | 533.15 | 736,450 | 3.0 | |
| SqueezeNet-1.1 | 481.53 | 723,522 | 2.9 | 94.39 |
| VGG-11 | 2382.44 | 128,774,530 | 515.1 | 94.88 |
| VGG-13 | 2641.00 | 128,959,042 | 515.9 | 94.39 |
| VGG-16 | 2745.00 | 134,268,738 | 537.1 | |
| VGG-19 | 3079.89 | 139,578,434 | 558.4 | 94.15 |
| EfficientNetB0 | 1198.53 | 4,052,126 | 33.0 | 95.13 |
| EfficientNetB1 | 2243.48 | 6,577,794 | 53.4 | 95.13 |
| EfficientNetB2 | 1882.26 | 7,771,380 | 62.9 | 93.67 |
| EfficientNetB3 | 2696.21 | 10,786,602 | 87.1 | 95.13 |
| EfficientNetB4 | 3476.74 | 17,677,402 | 142.3 | |
| EfficientNetB5 | 3584.68 | 28,517,618 | 229.1 | 93.67 |
| EfficientNetB6 | 4946.95 | 40,964,746 | 328.3 | 94.16 |
| Mask R-CNN | 14863.00 | 42,504,256 | 255.9 | N/A |
The number following a network’s name denotes the number of layers in the network (as in DenseNet-121 or ResNet-101) or the version of the network (as in SqueezeNet-1.0 or EfficientNetB0). All models are trained on AWS p3.2xlarge instances. The training time it took to train each model for 200 epochs is reported in seconds (s). Model complexity is reported as the number of trainable parameters of the model, as well as the size of the model in MB. The accuracy of each model is also shown, and the best accuracy (Acc.) obtained for each type of model is highlighted in bold font
Variation of the Average IoU (%) with the layer and threshold used for ResNet-101
| Layer | T = 20% | T = 30% | T = 40% | T = 50% | T = 60% | T = 70% | T = 80% |
|---|---|---|---|---|---|---|---|
| layer1_2_conv2 | 0.20 | 9.90 | 18.41 | 26.08 | 37.53 | 18.55 | 18.55 |
| layer2_0_conv2 | 3.81 | 19.86 | 31.53 | 44.90 | 68.11 | 18.55 | 18.55 |
| layer3_1_conv2 | 1.77 | 9.59 | 18.92 | 28.22 | 41.59 | 18.55 | 18.55 |
| layer4_1_conv3 | 0.15 | 10.26 | 15.43 | 21.10 | 29.68 | 18.55 | 18.55 |
The layer is used to generate the heatmaps and the threshold T is used to binarize the heatmaps (e.g., T = 60% means that only pixels with values at least 60% of the max pixel value in the image are included in the binary mask). The layers were sampled to include a low-level layer (layer1_2_conv2), a high-level layer (layer4_1_conv3) and two intermediate layers (layer2_0_conv2 and layer3_1_conv2) that showed good results based on a qualitative inspection of the maps. The threshold T is varied from 20% to 80% in increments of 10%. The best result and the corresponding layer and threshold are highlighted in bold font
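Under the thresholding rule described above (pixels at or above T% of the heatmap maximum are kept), binarizing a heatmap is a one-liner; this is an illustrative sketch rather than the paper's implementation:

```python
import numpy as np

def binarize_heatmap(heatmap, t_percent):
    """Keep pixels whose value is at least t_percent% of the heatmap maximum."""
    return heatmap >= (t_percent / 100.0) * heatmap.max()
```

The resulting boolean mask is what enters the IoU computation against the ground-truth chalkiness mask.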
Fig. 5Examples of Grad-CAM (ResNet-101) heatmaps for 10 sample chalky seed images (5 on the left side and 5 on the right side). For each seed, heatmaps corresponding to the following four layers are shown: (1) ResNet101 layer1_2_conv2; (2) ResNet101 layer2_0_conv2; (3) ResNet101 layer3_1_conv2; (4) ResNet101 layer4_1_conv3. The IoU values obtained for three thresholds T (20%, 40% and 60%, respectively) are shown under each heatmap
Fig. 6Examples of Grad-CAM heatmaps and corresponding binarized chalkiness masks. (a) Five sample chalky seed images; (b1) SqueezeNet-1.0 Heatmaps; (b2) SqueezeNet-1.0 Masks; (c1) DenseNet-121 Heatmaps; (c2) DenseNet-121 Masks; (d1) ResNet-101 Heatmaps; (d2) ResNet-101 Masks; (e1) VGG-19 Heatmaps; (e2) VGG-19 Masks; (f1) EfficientNetB4 Heatmaps; (f2) EfficientNetB4 Masks; (g1) Mask R-CNN Original Masks; (g2) Mask R-CNN Binary Masks
Chalkiness Segmentation: results of the weakly supervised Grad-CAM approach with the best performing classification models as backbone
| Model | GT-known Loc. Acc. (%) | Loc. Acc. (%) | Avg. IoU (%) | Layer | T (%) |
|---|---|---|---|---|---|
| Grad-CAM (DenseNet-121) | 51.20 (85/166) | 51.20 (85/166) | 47.44 | features_denseblock2_denselayer7_conv2 | 60 |
| Grad-CAM (ResNet-101) | 84.34 (140/166) | 83.13 (138/166) | 68.11 | layer2_0_conv2 | 60 |
| Grad-CAM (SqueezeNet-1.0) | 15.06 (25/166) | 0 (0/166) | 31.01 | features_12_expand1x1 | 60 |
| Grad-CAM (VGG-16) | 7.23 (12/166) | 7.23 (12/166) | 24.92 | features_module_5 | 60 |
| Grad-CAM (EfficientNetB4) | 28.92 (48/166) | 28.92 (48/166) | 35.40 | stem_conv | 50 |
| Mask R-CNN (ResNet-101) | 18.67 (31/166) | N/A | 29.63 | N/A | N/A |
The results of Mask R-CNN with ResNet-101 as backbone are also shown. Only the 166 chalky seed images in the test set were used for chalkiness segmentation evaluation. Performance is reported using the following metrics (as applicable): Ground-Truth Localization Accuracy (GT-known Loc. Acc.), which represents the fraction of ground-truth chalky seed images with IoU ≥ 50%; Localization Accuracy (Loc. Acc.), which represents the fraction of ground-truth chalky images with IoU ≥ 50% that are also correctly predicted as chalky by the model; Average IoU (Avg. IoU), which represents the average IoU over the set of chalky seed images. To calculate the IoU, the mask of the predicted chalkiness is obtained using a threshold T% of the maximum pixel intensity. The last two columns show the layer that was used for generating the heatmap and the threshold used to binarize the heatmap when calculating IoU, respectively
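The three evaluation metrics can be computed from per-seed IoU values and classifier decisions; in this sketch the 0.5 IoU cutoff is the conventional choice in weakly supervised localization, assumed here rather than taken from the paper:

```python
def localization_metrics(ious, predicted_chalky, iou_cutoff=0.5):
    """Localization metrics over the ground-truth chalky seeds.

    ious: per-seed IoU between predicted and ground-truth chalkiness masks.
    predicted_chalky: per-seed classifier output (True if predicted chalky).
    NOTE: the 0.5 cutoff is an assumption, not confirmed by the paper.
    """
    n = len(ious)
    # GT-known Loc. Acc.: localization alone, ignoring the classifier's decision.
    gt_known = sum(i >= iou_cutoff for i in ious) / n
    # Loc. Acc.: localization AND correct chalky classification.
    loc_acc = sum((i >= iou_cutoff) and p for i, p in zip(ious, predicted_chalky)) / n
    avg_iou = sum(ious) / n
    return gt_known, loc_acc, avg_iou
```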
Chalkiness segmentation results of the weakly supervised Grad-CAM approach with ResNet-101 as backbone on unpolished rice
| Grad-CAM (ResNet-101) | GT-known Loc. Acc. (%) | Loc. Acc. (%) | Avg. IoU (%) | Layer | T (%) |
|---|---|---|---|---|---|
| polished model | 7.92 (19/240) | 7.92 (19/240) | 26.79 | layer2_0_conv2 | 60 |
| unpolished model | 63.75 (153/240) | 63.75 (153/240) | 51.76 | layer2_0_conv2 | 60 |
| mixed model | 20.42 (49/240) | 20.42 (49/240) | 29.91 | layer2_3_conv2 | 60 |
Only the 240 chalky seed images in the Unpolished (12) test set were used for chalkiness segmentation evaluation. Performance is reported using the following metrics: Ground-Truth Localization Accuracy (GT-known Loc. Acc.), which represents the fraction of ground-truth chalky seed images with IoU ≥ 50%; Localization Accuracy (Loc. Acc.), which represents the fraction of ground-truth chalky images with IoU ≥ 50% that are also correctly predicted as chalky by the model; Average IoU (Avg. IoU), which represents the average IoU over the set of chalky seed images. To calculate the IoU, the mask of the predicted chalkiness is obtained using a threshold T% of the maximum pixel intensity. The last two columns show the layer that was used for generating the heatmap and the threshold used to binarize the heatmap when calculating IoU, respectively
Comparison between the chalkiness segmentation results of the weakly supervised approaches Grad-CAM, Grad-CAM++ and Score CAM with ResNet-101 as backbone on polished rice
| Approach | GT-known Loc. Acc. (%) | Loc. Acc. (%) | Avg. IoU (%) | Layer | T (%) |
|---|---|---|---|---|---|
| Grad-CAM | 84.34 | 83.13 | 68.11 | layer2_0_conv2 | 60 |
| Grad-CAM++ | 48.19 | 48.19 | 43.98 | layer2_0_conv2 | 60 |
| Score-CAM | 68.07 | 66.87 | 55.02 | layer2_2_conv3 | 60 |
Only the 166 chalky seed images in the polished test set were used for chalkiness segmentation evaluation. Performance is reported using the following metrics: Ground-Truth Localization Accuracy (GT-known Loc. Acc.), which represents the fraction of ground-truth chalky seed images with IoU ≥ 50%; Localization Accuracy (Loc. Acc.), which represents the fraction of ground-truth chalky images with IoU ≥ 50% that are also correctly predicted as chalky by the model; Average IoU (Avg. IoU), which represents the average IoU over the set of chalky seed images. To calculate the IoU, the mask of the predicted chalkiness is obtained using a threshold T% of the maximum pixel intensity. The last two columns show the layer that was used for generating the heatmap and the threshold used to binarize the heatmap when calculating IoU, respectively
Classification results on unpolished rice when ResNet-101 is used as backbone in the weakly supervised Grad-CAM approach
| ResNet-101 | Acc. (%) | Chalky Pre. (%) | Chalky Rec. (%) | Chalky F1 (%) | Non-chalky Pre. (%) | Non-chalky Rec. (%) | Non-chalky F1 (%) |
|---|---|---|---|---|---|---|---|
| Polished | 63.01 | 0.00 | 0.00 | 0.00 | |||
| Unpolished | 83.43 | 82.19 | 89.61 | 43.65 | 91.67 | 59.14 | |
| Mixed | 98.08 | 44.77 | 89.17 | 59.61 | |||
Three models are evaluated: 1) polished model trained on polished rice images; 2) unpolished model trained on Unpolished (12); 3) mixed model, obtained by further training the polished model using the Unpolished (12) images. Performance is reported in terms of Accuracy (Acc.), Precision (Pre.), Recall (Rec.) and F1 measure (F1). Precision, Recall and F1 measure values are reported separately for the Chalky and Non-Chalky classes. All three models are evaluated on the test subset corresponding to the Unpolished (12) rice images. The best performance for each type of model for each metric is highlighted using bold font
Fig. 7Examples of chalkiness binary masks for four unpolished rice grains. The binary masks obtained from the Grad-CAM heatmaps (with ResNet-101 as backbone) using a threshold T = 60% are shown for the polished, unpolished and mixed models, respectively, in comparison with the ground truth binary mask
Fig. 8Sources of errors for the Grad-CAM models. Images (a–d) correspond to polished rice, while image (e) corresponds to unpolished rice. The sources of error can be summarized as: a Inconsistencies in the way chalkiness is manually annotated, due to the white-gradient nature of chalkiness; b Scratches or marks (referred to as noise) on the chalkiness area can be interpreted as non-chalkiness; c Irregular chalkiness shape makes it hard to annotate chalkiness very precisely; d Abrasion stains can be recognized as chalkiness (white dots on the right in the figure); e Irregular shape and fuzzy boundaries affect the ground truth annotations and the predictions in unpolished rice as well
Percentage chalkiness area and chalkiness score obtained for individual seeds randomly selected across treatments and genotypes
| Chalkiness (% area) | Main panicle (CNT) | Main panicle (HNT) | Primary panicle (CNT) | Primary panicle (HNT) | Other panicle (CNT) | Other panicle (HNT) |
|---|---|---|---|---|---|---|
| CO-39 | 7.54 | 8.17 | 6.95 | 7.73 | 8.00 | 7.19 |
| IR1561 | 13.35 | 16.22 | 8.21 | 16.37 | 8.52 | 12.35 |
| IR-22 | 8.02 | 6.33 | 9.36 | 5.27 | 5.89 | 5.30 |
| Kati | 7.39 | 7.44 | 13.56 | 10.34 | 14.32 | 10.70 |
| Oryzica | 10.61 | 10.75 | 5.32 | 5.64 | 5.05 | 4.83 |
| WAS-174 | 7.25 | 5.76 | 5.91 | 9.39 | 4.44 | 12.81 |
A three-way analysis of variance for these traits (chalkiness area (%) and score) was performed under a completely randomized design (CRD) using the PROC GLM procedure in SAS. Means were separated using the HSD (Tukey's Studentized Range) test at p = 0.05. The table includes the mean ± SEM for the three-way comparison. Chalkiness area (%) was significantly affected by genotype (G), treatment (T) × G, G × panicle type (P), and T × G × P interaction effects. Chalkiness score was significantly affected by G, T × G, G × P, and T × G × P interaction effects