Mario Valerio Giuffrida, Peter Doerner, Sotirios A. Tsaftaris.
Abstract
Direct observation of morphological plant traits is tedious and a bottleneck for high-throughput phenotyping. Hence, interest in image-based analysis is increasing, with the requirement for software that can reliably extract plant traits, such as leaf count, preferably across a variety of species and growth conditions. However, current leaf counting methods do not work across species or conditions and therefore may lack broad utility. In this paper, we present Pheno-Deep Counter, a single deep network that can predict leaf count in two-dimensional (2D) plant images of different species with a rosette-shaped appearance. We demonstrate that our architecture can count leaves from multi-modal 2D images, such as visible light, fluorescence and near-infrared. Our network design is flexible, allowing for inputs to be added or removed to accommodate new modalities. Furthermore, our architecture can be used as is without requiring dataset-specific customization of the internal structure of the network, opening its use to new scenarios. Pheno-Deep Counter is able to produce accurate predictions in many plant species and, once trained, can count leaves in a few seconds. Through our universal and open source approach to deep counting we aim to broaden utilization of machine learning-based approaches to leaf counting. Our implementation can be downloaded at https://bitbucket.org/tuttoweb/pheno-deep-counter.
Keywords: deep learning; image-based plant phenotyping; leaf counting; machine learning; multimodal; night images
Year: 2018 PMID: 30101442 PMCID: PMC6282617 DOI: 10.1111/tpj.14064
Source DB: PubMed Journal: Plant J ISSN: 0960-7412 Impact factor: 6.417
Figure 1. Schematic of the proposed deep architecture.
(a) A modality branch, consisting of ResNet50 (He et al., 2016), extracts modality‐dependent plant features as a feature vector of 1024 neurons (RGB, visible light; NIR, near infrared; FMP, fluorescence).
(b) The fusion part combines those features to retain the most useful information from each modality.
(c) The regression part relates the fused information to the leaf count via non-linear regression. (This figure is best viewed in color online.) [Colour figure can be viewed at http://wileyonlinelibrary.com].
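The three-part design above (per-modality branches, fusion, regression) can be sketched as follows. This is a structural illustration only, not the authors' implementation: the `branch` function is a stand-in for the trained ResNet50 extractor, and all weights and layer sizes other than the 1024-neuron per-branch feature vector are invented for the example.

```python
import numpy as np

FEAT = 1024  # each modality branch outputs a 1024-neuron feature vector

def branch(image, rng):
    """Stand-in for a ResNet50 modality branch: 2D image -> 1024-d features.
    Here a fixed random projection with ReLU; the real branch is a trained CNN."""
    w = rng.standard_normal((image.size, FEAT)) / np.sqrt(image.size)
    return np.maximum(image.ravel() @ w, 0.0)

def fuse(features):
    """Fusion part: combine per-modality feature vectors (concatenation here)."""
    return np.concatenate(features)

def regress(fused, w, b):
    """Regression part: non-linear map from fused features to a scalar count."""
    return float(np.maximum(fused @ w, 0.0).mean() + b)

rng = np.random.default_rng(0)
rgb, nir, fmp = (rng.random((64, 64)) for _ in range(3))  # toy 2D modalities
feats = [branch(m, rng) for m in (rgb, nir, fmp)]
fused = fuse(feats)
w = rng.standard_normal((fused.size, 16)) / np.sqrt(fused.size)
count = regress(fused, w, 0.0)
print(f"fused feature length: {fused.size}, predicted count: {count:.2f}")
```

Because fusion operates on a list of branch outputs, adding or removing a modality amounts to adding or removing one branch before fusion, which mirrors the flexibility claimed in the abstract.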
Testing set results for PhenoDC trained on visible light (RGB) images from the CVPPP 2017 dataset (Scharr et al., 2014; Bell and Dee, 2016; Minervini et al., 2016). Difference in count (DiC) and absolute DiC (|DiC|) are given as mean and standard deviation (in parentheses), with lower values being better. For the mean squared error (MSE) a lower value is better, while for percentage agreement (%) a higher value is better.
| | A1 | A2 | A3 | A4 | A5 | All |
|---|---|---|---|---|---|---|
| **DiC** | | | | | | |
| Ours (PhenoDC) | | | | | | |
| Giuffrida et al. | −0.79 (1.54) | −2.44 (2.88) | −0.04 (1.93) | – | – | – |
| Romera-Paredes and Torr (2016) | 0.20 (1.40) | – | – | – | – | – |
| Aich and Stavness (2017) | −0.33 (1.38) | −0.22 (1.86) | 2.71 (4.58) | 0.23 (1.44) | 0.80 (2.77) | 0.73 (2.72) |
| **\|DiC\|** | | | | | | |
| Ours (PhenoDC) | | 1.44 (1.01) | | | | |
| Giuffrida et al. | 1.27 (1.15) | 2.44 (2.88) | 1.36 (1.37) | – | – | – |
| Romera-Paredes and Torr (2016) | 1.10 (0.90) | – | – | – | – | – |
| Ren and Zemel (2017) | 0.80 (1.10) | – | – | – | – | – |
| Aich and Stavness (2017) | 1.00 (1.00) | | 3.46 (4.04) | 1.08 (0.97) | 1.66 (2.36) | 1.62 (2.30) |
| **MSE** | | | | | | |
| Ours (PhenoDC) | | | | | | |
| Giuffrida et al. | 2.91 | 13.33 | 3.68 | – | – | – |
| Aich and Stavness (2017) | 1.97 | 3.11 | 28.00 | 2.11 | 8.28 | 7.90 |
| **%** | | | | | | |
| Ours (PhenoDC) | | 11.1 | | | | |
| Giuffrida et al. | 27.3 | | 19.6 | – | – | – |
| Aich and Stavness (2017) | 30.3 | 11.1 | 7.1 | 29.2 | 23.8 | 24.0 |
A paired t-test between our method and Aich and Stavness (2017) (the only two approaches from the CVPPP 2017 Workshop) shows statistically significant differences (P < 0.0001).
Trained on A1 only.
Training and inference are performed using per-leaf segmentations rather than the total leaf count used by the other methods.
Best values are those closer to 0.
Best values are those closer to 1 (i.e. 100% in the case of Percentage Agreement).
Entries in bold represent the best performance.
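The four metrics reported in the table can be computed from predicted and ground-truth counts as sketched below. The definitions follow the usual CVPPP convention; the sign convention for DiC (prediction minus ground truth) and the rounding of real-valued network outputs to integer counts are assumptions of this sketch, and the example numbers are invented.

```python
import numpy as np

def count_metrics(pred, truth):
    """Leaf-counting metrics: DiC, |DiC|, MSE and percentage agreement.
    `pred` holds real-valued network outputs; they are rounded to integer counts."""
    pred = np.rint(np.asarray(pred, dtype=float))
    truth = np.asarray(truth, dtype=float)
    diff = pred - truth
    return {
        "DiC": (diff.mean(), diff.std()),              # mean (std) signed error
        "|DiC|": (np.abs(diff).mean(), np.abs(diff).std()),
        "MSE": (diff ** 2).mean(),                     # mean squared error
        "%": 100.0 * (diff == 0).mean(),               # exact-agreement rate
    }

m = count_metrics([5.2, 7.9, 6.0, 9.1], [5, 8, 7, 9])
print(m)  # one miscounted plant out of four
```

With these toy values the rounded predictions are [5, 8, 6, 9] against [5, 8, 7, 9], giving MSE 0.25 and 75% exact agreement.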
Figure 2. Leaf count prediction in the CVPPP dataset (all images together).
(a) Ground truth versus prediction, shown as a scatter plot. Because counts are integers many points coincide, so color indicates how many points overlap. Dashed parallel lines show the ±1 leaf error range. Note that our approach agrees closely with the real leaf count.
(b) Error distribution. Observe that there is an 83% chance that the error falls within ±1 leaf of 0 (highlighted area), close to the agreement among human observers (about 90%; Giuffrida et al., 2018). (This figure is best viewed in color online.)
Testing the performance of PhenoDC on the multi-modal dataset (Cruz et al., 2016). We report results when the network is trained using only a single modality and when using all three modalities together.
| Training on | DiC | |DiC| | MSE | % |
|---|---|---|---|---|
| RGB only | 0.02 (0.75) | 0.48 (0.57) | 0.56 | 55.7 |
| FMP only | −0.06 (0.72) | 0.45 (0.56) | 0.52 | 58.7 |
| NIR only | 0.13 (0.61) | 0.33 (0.53) | 0.39 | 69.6 |
| All (RGB, FMP, NIR) | 0.11 (0.40) | 0.13 (0.39) | 0.17 | 88.5 |
DiC, difference in count; |DiC|, absolute DiC; MSE, mean squared error; %, percentage agreement; RGB, visible light; FMP, fluorescence; NIR, near infrared.
Best values are those closer to 0.
Best values are those closer to 1 (i.e. 100%).
Adapting (fine-tuning) the parameters of the proposed architecture to work on tobacco images [A3 dataset (Minervini et al., 2016)] after pre-training on Arabidopsis plants [A1, A2 and A4 (Bell and Dee, 2016; Minervini et al., 2016)]. We progressively increase the number of training images to find how many are required to obtain a model that counts tobacco leaves reliably. The table reports results on the held-out testing set.
| No. of training images | DiC | |DiC| | MSE | % |
|---|---|---|---|---|
| 7 | −0.39 (1.65) | 1.32 (1.07) | 2.83 | 23.2 |
| 14 | 0.00 (1.32) | 0.96 (0.90) | 1.75 | 32.1 |
| 21 | 0.27 (1.36) | 0.87 (1.07) | 1.91 | 41.1 |
| 27 | 0.25 (1.20) | 0.86 (0.87) | 1.50 | 37.5 |
DiC, difference in count; |DiC|, absolute DiC; MSE, mean squared error; %, percentage agreement.
Best values are those closer to 0.
Best values are those closer to 1 (i.e. 100%).
Figure 3. Error distribution of our network fine-tuned using tobacco plants in the A3 dataset (Minervini et al., 2016).
We report the distribution of the testing-set error after refining the network parameters with 7 (a), 14 (b), 21 (c) and 27 (d) tobacco plants. When we train with more images (≥21), the highlighted area (error up to ±1 leaf, cf. Figure 2) contains more than 80% of the cases. (This figure is best viewed in color online.)
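The "error up to ±1 leaf" statistic highlighted in Figures 2 and 3 is simply the fraction of test images whose rounded predicted count differs from the ground truth by at most one leaf. A small sketch (the example counts are invented):

```python
import numpy as np

def within_one_leaf(pred, truth):
    """Fraction of images whose rounded count is within +/-1 leaf of the truth."""
    err = np.rint(np.asarray(pred, dtype=float)) - np.asarray(truth, dtype=float)
    return float((np.abs(err) <= 1).mean())

# Four of these five predictions are within one leaf of the ground truth.
rate = within_one_leaf([6, 7, 9, 12, 4], [6, 8, 9, 14, 5])
print(rate)  # 0.8
```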
The same process as described in Table 3, repeated for Komatsuna leaf counting using the dataset of Uchiyama et al. (2017). The model was pre-trained on Arabidopsis as described in Table 3. Results refer to the testing set.
| No. of training images per plant | Hours of the day | DiC | |DiC| | MSE | % |
|---|---|---|---|---|---|
| 10 | 3 p.m. | −0.74 (1.08) | 0.96 (0.89) | 1.71 | 35.0 |
| 20 | 3 p.m., 11 a.m. | −0.54 (0.95) | 0.86 (0.65) | 1.19 | 25.0 |
| 30 | 3 p.m., 3 a.m. | 0.18 (0.92) | 0.67 (0.66) | 0.88 | 44.2 |
| 40 | 3 p.m., 3 a.m. | 0.24 (0.84) | 0.59 (0.64) | 0.76 | 49.1 |
DiC, difference in count; |DiC|, absolute DiC; MSE, mean squared error; %, percentage agreement.
Best values are those closer to 0.
Best values are those closer to 1 (i.e. 100%).
Images taken on the following day.