Jordan Ubbens, Mikolaj Cieslak, Przemyslaw Prusinkiewicz, Ian Stavness.
Abstract
Deep learning presents many opportunities for image-based plant phenotyping. Here we consider the capability of deep convolutional neural networks to perform the leaf counting task. Deep learning techniques typically require large and diverse datasets to learn generalizable models without providing a priori an engineered algorithm for performing the task. This requirement is challenging, however, for applications in the plant phenotyping field, where available datasets are often small and the costs associated with generating new data are high. In this work we propose a new method for augmenting plant phenotyping datasets using rendered images of synthetic plants. We demonstrate that the use of high-quality 3D synthetic plants to augment a dataset can improve performance on the leaf counting task. We also show that the ability of the model to generate an arbitrary distribution of phenotypes mitigates the problem of dataset shift when training and testing on different datasets. Finally, we show that real and synthetic plants are significantly interchangeable when training a neural network on the leaf counting task.
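The augmentation strategy the abstract describes, mixing rendered synthetic rosettes into a small real training set, can be sketched as follows. This is a minimal illustration only; the function name and the `ratio` knob are hypothetical and not taken from the paper:

```python
import random

def augment_with_synthetic(real, synthetic, ratio=1.0, seed=0):
    """Mix synthetic (image, leaf_count) pairs into a real training set.

    `ratio` is the number of synthetic samples added per real sample
    (a hypothetical knob for this sketch, not a parameter from the paper).
    All real samples are kept; synthetic samples are subsampled.
    """
    rng = random.Random(seed)
    n_syn = min(len(synthetic), int(ratio * len(real)))
    mixed = list(real) + rng.sample(list(synthetic), n_syn)
    rng.shuffle(mixed)  # interleave real and synthetic examples
    return mixed
```

For example, mixing a 10-image real set with a 1000-image synthetic pool at `ratio=2.0` yields a 30-image training set that still contains every real sample.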
Keywords: 3D plant modeling; Deep learning; L-system; Machine learning; Phenotyping
Year: 2018 PMID: 29375647 PMCID: PMC5773030 DOI: 10.1186/s13007-018-0273-z
Source DB: PubMed Journal: Plant Methods ISSN: 1746-4811 Impact factor: 4.993
Real and synthetic training datasets
| Dataset | Number of images | Range of leaf counts | Accessions | Image size | Scale | Background |
|---|---|---|---|---|---|---|
| Ara2012 | 120 | 12–20 | Col-0 | Varied | 1:1 | Soil/tray |
| Ara2013-Canon | 165 | 5–13 | Col-0/mutants | Varied | 1:1 | Soil |
| S1 | 1000 | 12–20 | N/A | – | 1:1–1:2 | Soil |
| S2 | 1000 | 5–13 | N/A | – | 1:1–1:2 | Soil |
| S12 | 1000 | 5–20 | N/A | – | 1:1–1:2 | Varied |
Scale denotes the ratio of the plant diameter to the image size
Performance when training and testing on different datasets

| Training data | Testing data | AbsCountDiff | CountDiff | MSE | R² | Agreement (%) |
|---|---|---|---|---|---|---|
| Ara2013-Canon | Ara2012 | 5.45 (2.04) | – | 33.9 | 0 | – |
| Ara2012 | Ara2013-Canon | 5.39 (1.99) | 5.39 (1.99) | 33.13 | 0 | – |
| S12 | Ara2012 | 1.38 (1.03) | – | 2.97 | 0.42 | 22 |
| S12 | Ara2013-Canon | 1.82 (1.38) | 0.46 (2.24) | 5.25 | – | 20 |
Training on a single dataset of synthetic rosettes performs significantly better than training on a dataset of real rosettes with a different distribution of phenotypes
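The metrics reported in these tables follow standard leaf-counting conventions: CountDiff is the mean signed difference between predicted and true counts, AbsCountDiff its absolute counterpart, MSE the mean squared difference, and agreement the percentage of exact predictions. A minimal sketch of how they are computed, assuming real-valued network outputs are rounded to integer counts:

```python
import numpy as np

def leaf_count_metrics(y_true, y_pred):
    """Leaf-counting metrics: mean signed count difference (CountDiff),
    mean absolute count difference (AbsCountDiff), mean squared error,
    and percent agreement (exact-count accuracy)."""
    y_true = np.asarray(y_true, dtype=float)
    # Round real-valued predictions to the nearest integer leaf count.
    y_pred = np.rint(np.asarray(y_pred, dtype=float))
    diff = y_pred - y_true
    return {
        "CountDiff": diff.mean(),
        "AbsCountDiff": np.abs(diff).mean(),
        "MSE": (diff ** 2).mean(),
        "Agreement": 100.0 * (diff == 0).mean(),
    }
```

For instance, `leaf_count_metrics([5, 6, 7, 8], [5, 6, 8, 8])` gives a CountDiff and AbsCountDiff of 0.25, an MSE of 0.25, and 75% agreement.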
Interoperability between real and synthetic rosettes
| Training data | Testing data | AbsCountDiff | CountDiff | MSE | R² | Agreement (%) |
|---|---|---|---|---|---|---|
| S2 | Ara2013-Canon | 1.29 (1.01) | – | 2.7 | 0.26 | 24 |
| Ara2013-Canon | S2 | 0.81 (0.54) | 0.28 (0.93) | 0.95 | 0.82 | 34 |
| S1 | Ara2012 | 1.70 (1.21) | 0.67 (1.98) | 4.39 | 0.27 | 25 |
Fig. 1 Leaf growth and shape functions used in the L-system model
Fig. 2 Synthetic rosettes (left) generated by the L-system and real rosettes (right) from the public dataset [32]
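Fig. 1 refers to the L-system used to generate the synthetic rosettes. The paper's model is far richer (parametric growth and shape functions driving 3D rendering), but the underlying parallel-rewriting mechanism of an L-system can be illustrated with a toy deterministic example; the symbols and rule below are illustrative only, not the authors' model:

```python
def expand(axiom, rules, n):
    """Apply the parallel rewriting rules of a deterministic,
    context-free L-system (D0L-system) n times to the axiom.
    Symbols without a rule are copied unchanged."""
    s = axiom
    for _ in range(n):
        s = "".join(rules.get(c, c) for c in s)
    return s

# Toy rosette-like schedule: A = apex, [L] = a leaf branch.
# Each derivation step the apex produces one new leaf, so the
# leaf count grows linearly with plant age.
rules = {"A": "[L]A"}
print(expand("A", rules, 3))  # → [L][L][L]A
```

Varying the number of derivation steps (and, in the real model, the growth-function parameters) is what lets the generator produce an arbitrary distribution of leaf counts.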
Augmentation results, Ara2013-Canon dataset
| Model | AbsCountDiff | CountDiff | MSE | R² | Agreement (%) |
|---|---|---|---|---|---|
| Ubbens and Stavness | 0.61 (0.52) | – | – | – | – |
| Synthetically augmented (S2) | 0.48 (0.58) | 0.15 (0.82) | 0.73 | 0.92 | 80 |
Fig. 3 Distributions of relative count difference in the generalization experiment. Training on one dataset and testing on another exhibits severe dataset shift (top), while training on synthetic data significantly reduces this error by encompassing a comprehensive range of leaf counts (bottom)
Fig. 4 Scatter plots of actual and predicted leaf counts in the interoperability experiments. Training on synthetic and testing on real (left), and training on real and testing on synthetic (right)
Fig. 5 Comparison of training and testing loss on real (red) and synthetic (blue) rosettes. Real plants show significantly higher generalization error, while the synthetic dataset is relatively easy to fit
Fig. 6 Test performance on purely synthetic data when using increasing sizes for the training set. As with datasets of natural images, generalization performance improves with larger training sets