
Unlocking the potential of plant phenotyping data through integration and data-driven approaches.

Frederik Coppens^1,2, Nathalie Wuyts^1,2, Dirk Inzé^1,2, Stijn Dhondt^1,2.

Abstract

Plant phenotyping has emerged as a comprehensive field of research as a result of significant advances in the application of imaging sensors for high-throughput data collection. The flip side is the risk of drowning in the massive amounts of data generated by automated phenotyping systems. Currently, the major challenge lies in data management, at the level of data annotation and proper metadata collection, and in progressing towards synergism across data collection and analyses. Progress in data analyses includes efforts towards integrating phenotypic and -omics data resources to bridge the phenotype-genotype gap and obtain in-depth insights into fundamental plant processes.


Keywords: Data integration; Data management; Data-driven analysis; Plant phenotyping

Year: 2017 PMID: 32923745 PMCID: PMC7477990 DOI: 10.1016/j.coisb.2017.07.002

Source DB: PubMed Journal: Curr Opin Syst Biol ISSN: 2452-3100

Introduction

During the past decade, plant phenomics has evolved from an emerging niche into a thriving research field, both in academia and in industry. This can be largely attributed to the use of imaging for the non-invasive analysis of structural, physiological and performance-related plant traits [1]. Automated image analysis procedures allow substantial increases in the throughput of trait measurements, thereby relieving the so-called phenotyping bottleneck, in which phenotypic measurement is the rate-limiting step in the functional analysis of specific genotypes or in the assessment of genotype performance in plant breeding [2]. Improvements in plant imaging have been accompanied by technological advances in plant handling and camera positioning to keep up with the speed of image acquisition. Plant-to-sensor systems, which use conveyors and grippers to present the plant to the camera, and sensor-to-plant systems, which move the camera to the plants, have been developed in growth cabinets, chambers and greenhouses [3]. Under field conditions, where the vast majority of phenotyping is still done manually, automated image acquisition always occurs in a sensor-to-plant fashion, assisted by manual or engine-driven ‘phenomobiles’, ground-based gantry systems, or unmanned aerial vehicles (UAVs) [4]. Undoubtedly, it is the development of digital image sensors that underlies this remarkable evolution in plant phenotyping. The sensitivity of a sensor to a specific part of the electromagnetic spectrum, in combination with appropriate filters, defines which traits can be extracted. Typical red-green-blue (RGB) color sensors are sensitive to wavelengths from 400 to 1000 nm. Most color cameras are fitted with an infrared (IR) cut-off filter to image specifically in the visible spectrum; without this filter, they allow near-IR imaging and, as such, image acquisition of plants in the dark [5, 6].
Indium gallium arsenide (InGaAs) sensors have a spectral response from approximately 900 to 1700 nm. These sensors are used in short-wave infrared (SWIR) cameras, which can be used to measure water content in plants [7]. Long-wave infrared (LWIR) sensors, with a spectral range of 3–14 μm, are used for thermal imaging of shoots as a proxy for stomatal conductance or water use behavior in general [8]. Advanced imaging systems have drastically increased the volume of data per measurement, from a few bytes (e.g. a manually scored trait in a spreadsheet) to several megabytes (MB), or sometimes more than 100 MB (e.g. for hyperspectral imaging or scene characterization by video capture). Data are also stored in a myriad of formats on diverse types of media, ranging from a researcher's hard drive to local server stations or “the cloud”. Proper annotation of data to ensure their continued relevance after acquisition is thus essential. Furthermore, because a plant's phenotype results from a strong interaction between its genotype and the environment in which it grows (G × E) [9], plant phenotyping efforts should include the logging of environmental conditions, which in turn requires the collection of metadata on the environmental sensors in use. Because of the tremendous amount and diversity of data produced in plant phenotyping research, data management, storage and analysis are currently considered the major challenges. At the same time, large datasets may also create opportunities for data modeling and machine learning in “Big Data” analyses.
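To make the jump in data volume described above concrete, the Python sketch below computes the uncompressed size of a single image from its dimensions. The resolutions, band count and bit depths are illustrative assumptions, not values from the article, and stored files may be smaller if they are compressed.

```python
def raw_image_bytes(width: int, height: int, channels: int, bytes_per_sample: int) -> int:
    """Uncompressed size in bytes: every pixel stores one sample per channel (or band)."""
    return width * height * channels * bytes_per_sample


MB = 10**6  # decimal megabyte, as commonly used for file sizes

# Hypothetical acquisition settings: (width, height, channels/bands, bytes per sample)
configs = {
    "manually scored trait (one 64-bit float)": (1, 1, 1, 8),
    "RGB image, 2048 x 1536 px, 8-bit": (2048, 1536, 3, 1),
    "hyperspectral cube, 640 x 512 px, 200 bands, 16-bit": (640, 512, 200, 2),
}

for name, (w, h, c, b) in configs.items():
    size = raw_image_bytes(w, h, c, b)
    print(f"{name}: {size:,} bytes ({size / MB:.1f} MB)")
```

With these assumed settings, one RGB frame is about 9.4 MB and one hyperspectral cube about 131 MB, consistent with the "more than 100 MB" order of magnitude; imaging every plant repeatedly over an experiment multiplies these figures accordingly.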

Data management to enable data integration

The current technologies and methods used in plant phenotyping generate huge amounts of complex, unstructured “Big Data”, raising the concern that much of the phenotype data can no longer be retrieved [10]. First and foremost, phenotypic data management requires ontology terms for the unique and repeatable annotation of data, so that the data remain traceable and reusable, for instance in data sharing and meta-analyses. The use of ontologies therefore promotes synergism. Moreover, in contrast to sequencing data, for which repositories such as the European Nucleotide Archive (ENA) [11] and the Sequence Read Archive (SRA) [12] exist, there is currently no central, structured repository for phenotyping data or metadata. Although data can be uploaded to general-purpose repositories such as Zenodo (https://zenodo.org/), FigShare (https://figshare.com) and Dryad (http://datadryad.org), these do not provide services that facilitate the description of, access to and integration of data. Without a central repository, advanced data mining and discovery depend on error-prone scavenging of the scientific literature. In response, a plethora of resources has been developed by individual research groups and consortia, ranging from resources dedicated to one species or one type of phenotyping system to more generic platforms that integrate several data types. AraPheno provides a central repository of population-scale phenotypes for Arabidopsis accessions [13], whereas the Plant Genomics and Phenomics (PGP) research data repository is an infrastructure for comprehensively publishing plant research data, including cross-domain datasets [14]. The Phenomics Ontology Driven Data (PODD) repository was developed to handle and distribute phenotyping data and metadata from Australian facilities [15]. ClearedLeavesDB is an online database of cleared plant leaf images [16].
Phenopsis DB is an information system for sharing data generated by the PHENOPSIS plant phenotyping platform [17], and PhenoFront is a web-server front end to the LemnaTec Phenotyper platform [18]. Whereas BreeDB hosts datasets of tomato and potato populations (https://www.eu-sol.wur.nl), the Genoplante Information System (GnpIS) is a multispecies integrative information system dedicated to plants and fungal pests, bridging genetic and genomic data [19]. This non-exhaustive list illustrates the variety of available resources, some of which provide the data for download and further analysis. Many of these resources were built to organize large amounts of already collected phenotypic data. In the context of high-throughput phenotyping, however, data need to be managed at the moment they are generated (Figure 1). Besides experimental data, provisions are made for metadata on the environmental sensors in use and on the imaging sensors themselves, including the sensor type, the camera system and its optical properties. The imaging metadata are required for image analysis, whereas the complete set ensures traceability and quality assurance. These functionalities are built into PIPPA, the PSB (Plant Systems Biology) Interface for Plant Phenotype Analysis (https://pippa.psb.ugent.be), a web-based framework for the analysis, visualization and management of phenotypic data, which enables biologists to perform dedicated image processing and (statistical) analyses of data generated by Weighing, Imaging and Watering Machine (WIWAM) phenotyping platforms or of externally imported data. Frameworks with comparable functionality include the Integrated Analysis Platform (IAP) [20] and Plant Computer Vision (PlantCV) [18].
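As an illustration of capturing metadata at the moment data are generated, the Python sketch below bundles an image with its sensor, optical and environmental metadata and an ontology term for the measured trait. The field names, camera values, file path and the term ID `TRAIT:0000001` are hypothetical and do not follow MIAPPE or any specific ontology. The `mm_per_pixel` helper assumes a simple pinhole-camera model in which the plant is far from the lens relative to the focal length.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class SensorMetadata:
    sensor_type: str            # e.g. "RGB", "InGaAs (SWIR)", "LWIR (thermal)"
    camera_model: str           # hypothetical model name in the example below
    focal_length_mm: float
    pixel_pitch_um: float       # physical size of one sensor pixel
    working_distance_mm: float  # camera-to-plant distance

    def mm_per_pixel(self) -> float:
        """Object-side size of one pixel, pinhole approximation (distance >> focal length)."""
        return (self.pixel_pitch_um / 1000.0) * self.working_distance_mm / self.focal_length_mm


@dataclass
class EnvironmentReading:
    timestamp: str              # ISO 8601
    air_temperature_c: float
    relative_humidity_pct: float


@dataclass
class ImageRecord:
    image_uri: str
    plant_id: str
    genotype: str
    trait_term_id: str          # ontology term for the measured trait (hypothetical ID below)
    sensor: SensorMetadata
    environment: list[EnvironmentReading] = field(default_factory=list)

    def to_json(self) -> str:
        # asdict() recurses into nested dataclasses and lists of them
        return json.dumps(asdict(self), sort_keys=True)


record = ImageRecord(
    image_uri="file:///data/exp01/plant042_day07.png",
    plant_id="plant042",
    genotype="Col-0",
    trait_term_id="TRAIT:0000001",
    sensor=SensorMetadata("RGB", "example-camera", 25.0, 3.45, 1000.0),
    environment=[EnvironmentReading("2017-03-01T09:00:00Z", 21.0, 60.0)],
)
print(record.to_json())
print(f"{record.sensor.mm_per_pixel():.3f} mm per pixel")
```

Storing the optical parameters with each image is what later allows pixel counts to be converted into physical units: with the assumed 3.45 µm pixels, 25 mm lens and 1 m working distance, one pixel corresponds to about 0.14 mm on the plant.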

Figure 1

A systems biology approach in phenotypic data management. A scientific hypothesis leads to new experiments, including image-based plant phenotyping or other -omics approaches. Active vision systems can feed back directly into image acquisition. Image acquisition settings such as spatial and temporal resolution can also be optimized after data analysis. Sanity checks on the generated data help to quickly validate the image analysis. The analyzed data and images are saved, along with the metadata and the experimental design, in a dedicated data repository. Additional value is created by integrating -omics data from private or public data resources, after which new hypotheses are generated through data-driven approaches such as modeling, machine learning and meta-analysis.

Image data extraction

Advances in imaging for plant phenotyping enable multi-dimensional, high-throughput monitoring of plants at an increasing pace. Although numerous image analysis software tools are available for extracting biologically meaningful phenotypic or physiological parameters from these images [21, 22], they focus mainly on the analysis and are often disconnected from data management. To address this, dedicated analysis platforms have been developed: IAP [20], PlantCV [18], InfraPhenoGrid [23], OMERO [24], BisQue on CyVerse [25], and PIPPA (https://pippa.psb.ugent.be). These systems offer a user-friendly interface to a grid compute cluster that enables researchers without a computer science background to run image analysis pipelines. They also cater to bioinformaticians, as they are inherently flexible and allow custom analysis pipelines through extensions or application programming interfaces (APIs). By recording provenance in metadata, these platforms play an important role in data management. Data visualization is also important, both for reporting and interpretation and for quality control of the input data (Figure 1). For example, PIPPA deploys several ‘sanity check’ algorithms to flag outliers for further inspection. As our capacity to extract information from images increases, so do the size and complexity of the derived data and downstream analyses, and the computing infrastructure needs to keep pace. Graphics processing units (GPUs) have the potential to dramatically increase the efficiency of image analysis algorithms, but programming GPUs is notoriously hard. Libraries such as OpenCV (http://opencv.org) or the QUASAR programming language [26] encapsulate GPU usage. Nevertheless, tools that make analyses easier to optimize will be important for efficiently processing the vast amounts of data generated.
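The idea of a 'sanity check' that flags outliers for inspection can be sketched with a robust modified z-score based on the median absolute deviation (MAD). This is a generic technique chosen for illustration, not the algorithm PIPPA uses; the 3.5 cut-off is a common rule of thumb, and the plant IDs and rosette areas are made up.

```python
from statistics import median


def flag_outliers(values: dict[str, float], threshold: float = 3.5) -> list[str]:
    """Return the IDs whose modified z-score exceeds `threshold`.

    The score uses the median and the median absolute deviation (MAD), which,
    unlike the mean and standard deviation, are barely affected by the very
    outliers we want to detect.
    """
    xs = list(values.values())
    med = median(xs)
    mad = median(abs(x - med) for x in xs)
    if mad == 0:
        return []  # MAD is zero (e.g. most values identical): score undefined, flag nothing
    # 0.6745 rescales the MAD so the score is comparable to a z-score for normal data
    return [k for k, x in values.items() if abs(0.6745 * (x - med) / mad) > threshold]


# Made-up projected rosette areas (mm^2) from one imaging round
areas = {"p01": 152.0, "p02": 148.5, "p03": 160.2, "p04": 155.1, "p05": 12.3, "p06": 149.8}
print(flag_outliers(areas))  # → ['p05']
```

Only `p05`, whose area is an order of magnitude below the others (as might happen after a failed segmentation), is returned, while normal plant-to-plant variation is left alone. Flagged values are meant for manual review rather than automatic deletion.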

Value creation through integration

International projects such as transPLANT (Trans-national Infrastructure for Plant Genomic Science, http://transplantdb.eu) and EPPN (European Plant Phenotyping Network, http://plant-phenotyping-network.eu) recognize the need to improve and align metadata [27]. They propose the Minimal Information About a Plant Phenotyping Experiment (MIAPPE, http://www.miappe.org) as the emerging standard for describing a phenotyping experiment. Besides source material and experimental design, MIAPPE also allows a detailed description of the environmental conditions, which has been shown to be crucial for comparing and interpreting experiments [28, 29]. During the development of the standard, it became clear that the available metadata, the usage and interpretation of ontologies, and the method of access differ between resources. Further development through community engagement, and implementation of MIAPPE as a standard, will be instrumental for integrating phenotypic data from different providers and promoting synergism. On the systems biology side, the next step is to integrate image-derived data with various -omics datasets (Figure 1). In particular, the combined analysis of datasets that were never intended to be integrated is a promising target for value creation. This requires rigorous curation of the input data and, more importantly, harmonization of metadata. Different measurement methodologies that are not inherently interoperable make this challenging, but efforts to map them onto each other will increase alignment in the future. The BioSamples database serves as a central metadata hub that links these different data types and provides a query interface as well as programmatic access through an API [30]. The Breeding API (BrAPI, http://docs.brapi.apiary.io) specifies such an interface for phenotype and genotype databases and is emerging as the standard in the field.
Community-wide adoption of these technologies is essential for identifying relevant data and integrating them efficiently. Currently, the number of publicly available datasets that can be readily integrated is limited. This adoption constitutes the crucial next step and challenge for the plant community to make all data Findable, Accessible, Interoperable and Re-usable (FAIR) [31], both by humans and by computer systems. ELIXIR, an infrastructure aimed at coordinating and integrating bioinformatics resources, has recognized this challenge and made it one of the use cases of the H2020 ELIXIR-EXCELERATE project (https://www.elixir-europe.org/excelerate/plants).
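A minimal sketch of metadata harmonization: each provider's local variable names are mapped to a shared ontology term and converted into a common unit before the records are pooled. The provider names, variable names and term ID are hypothetical; in practice the mapping would come from ontology annotations and standards such as MIAPPE, and the records would be retrieved through an interface such as BrAPI rather than from in-memory lists.

```python
# Hypothetical mapping: provider -> local variable name -> (shared term ID, factor to mm^2)
TERM_MAP: dict[str, dict[str, tuple[str, float]]] = {
    "provider_a": {"rosette_area_mm2": ("TRAIT:0000001", 1.0)},
    "provider_b": {"projected_leaf_area_cm2": ("TRAIT:0000001", 100.0)},  # 1 cm^2 = 100 mm^2
}


def harmonize(provider: str, rows: list[dict]) -> list[dict]:
    """Rewrite one provider's records as (genotype, shared term, value in canonical unit)."""
    mapping = TERM_MAP[provider]
    out = []
    for row in rows:
        for local_name, value in row.items():
            if local_name in mapping:  # columns without a mapping are skipped
                term, factor = mapping[local_name]
                out.append({
                    "provider": provider,
                    "genotype": row["genotype"],
                    "term": term,
                    "value": value * factor,
                })
    return out


a_rows = [{"genotype": "Col-0", "rosette_area_mm2": 152.0}]
b_rows = [{"genotype": "Col-0", "projected_leaf_area_cm2": 1.47}]
merged = harmonize("provider_a", a_rows) + harmonize("provider_b", b_rows)
for rec in merged:
    print(rec)
```

After harmonization, both measurements share one term and one unit and can be compared directly. Unmapped variables are silently skipped here; a real pipeline would report them so that curators can extend the mapping.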

Data-driven approaches aid in hypothesis generation

Data generation in plant phenotyping has become so fast that one may ask whether data-driven approaches can replace traditional hypothesis-driven analyses. The vast amount of data may indeed provide new insights, for example through machine learning approaches in data analysis [32]. Machine learning allows the development of algorithms that learn from a dedicated training set and make decisions on newly presented data. Data associations can then be uncovered, which may lead to new insights and further developments in fields such as marker-assisted breeding. Machine learning has been applied to the identification, classification, quantification and prediction of plant stress, with each level building on the previous one. For example, disease symptoms of three Alternaria species have been classified in oilseed rape based on thermal and hyperspectral imaging [33], and the severity of Verticillium wilt in olive has been quantified using the same imaging technologies [34]. However, the resulting procedure will only be as good as the training dataset used, and there are currently too few publicly available datasets for this approach to be widely applicable. In particular, advancing from associations to causal relations requires specifically designed experiments, e.g. detailed time series. Machine learning also plays a role in image analysis itself [35]. Deep learning approaches can automatically determine useful features for image classification, for instance to decide whether an image patch contains a specific plant part, such as a root tip or a wheat ear [36]. Such algorithms can help to localize these plant parts in entire images. Machine learning can also aid in segmenting plants from their background, as exemplified for maize shoots [37].
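To illustrate learning from a training set and deciding on new data, the sketch below trains a nearest-centroid classifier that labels pixels as plant or background from their RGB values. This is a deliberately simple stand-in, not the method used in the studies cited above; the normalized RGB training values are invented, and, as noted above, the result can only be as good as these labelled examples.

```python
from math import dist

Features = tuple[float, ...]


def train_centroids(samples: list[tuple[Features, str]]) -> dict[str, Features]:
    """Average the feature vectors of each class in the labelled training set."""
    sums: dict[str, list[float]] = {}
    counts: dict[str, int] = {}
    for features, label in samples:
        acc = sums.setdefault(label, [0.0] * len(features))
        for i, v in enumerate(features):
            acc[i] += v
        counts[label] = counts.get(label, 0) + 1
    return {label: tuple(s / counts[label] for s in acc) for label, acc in sums.items()}


def predict(centroids: dict[str, Features], features: Features) -> str:
    """Assign new data to the class whose centroid is nearest (Euclidean distance)."""
    return min(centroids, key=lambda label: dist(centroids[label], features))


# Invented training pixels: normalized (R, G, B) values with manual labels
training = [
    ((0.20, 0.55, 0.15), "plant"),
    ((0.25, 0.60, 0.20), "plant"),
    ((0.18, 0.50, 0.12), "plant"),
    ((0.45, 0.35, 0.25), "background"),
    ((0.50, 0.40, 0.30), "background"),
    ((0.40, 0.32, 0.22), "background"),
]
centroids = train_centroids(training)
print(predict(centroids, (0.22, 0.58, 0.18)))  # → plant
print(predict(centroids, (0.48, 0.37, 0.27)))  # → background
```

Applying `predict` to every pixel of an image yields a plant/background mask. Deep learning methods replace the hand-chosen RGB features with features learned from the data, but the dependence on representative labelled training data remains.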
With further advances and greater data availability, machine learning will undoubtedly prove to be a valuable resource for generating new hypotheses (Figure 1). Images and plant sensors provide “Big Data” on the structure and function of whole plants and plant organs throughout development. These data form the basis of functional-structural plant models (FSPMs), which describe the development and physiology of growing plants over time. In addition, transcriptomic, metabolomic, proteomic and possibly other -omics data continue to reveal potential control mechanisms in regulatory nodes of plant growth by providing insights into the molecular basis of major events during plant development. Integrating molecular networks into whole-plant models allows environmental and genetic perturbations to be simulated [38], enabling a data-driven systems biology approach to advance our insights into plant growth and development (Figure 1). From a more applied point of view, such integrated, multi-scale FSPMs will need to be validated across genotypes and field environments and could ultimately form the basis of what we define as ‘prescription agriculture’. Plant image analysis, sensors and possible biomarkers could alert the farmer that crops are experiencing suboptimal conditions, and the FSPM would provide a decision tool to predict the potential yield gain from applying extra resources, such as irrigation and nitrogen fertilization.

Future challenges and perspectives

Deep integration of image analysis in high-throughput phenotyping will allow on-the-fly feedback and decision-making. Image analysis can thus help optimize information generation during an experiment, rather than weeks or even months after it has finished. Active vision systems reposition the camera or the object so that the maximum additional information can be extracted from an image taken at the new position (Figure 1) [39]. Such technologies have the potential to reduce the amount of data captured and the requirements for data storage and analysis, while ensuring and increasing the relevance of what is generated. Hence, image analysis technologies can pave the way for an agile systems biology approach that guides researchers in creating value. For example, the combination of feature selection and growth modeling has supported the biological interpretation of plant growth and stress tolerance in barley [40]. New, highly repeatable traits related to these complex agronomic phenotypes, such as maximum growth rate and stress elasticity, have permitted the identification of stable QTLs controlling their expression. Integration of available datasets holds much potential to further deepen our knowledge. However, the amount of data readily available for such a meta-analysis within one resource is often insufficient to draw strong conclusions or to provide solid evidence for a specific hypothesis. Therefore, linking several data resources across phenotyping platforms to enable large-scale meta-analyses would be a major step forward in data integration. This is envisioned within the ELIXIR-EXCELERATE project, which aims to annotate datasets and make phenotypic databases discoverable and interoperable through the use of ontologies and a standardized API (https://www.elixir-europe.org/excelerate/plants).
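Traits such as the maximum growth rate mentioned above can be derived from image-based time series. The Python sketch below estimates it model-free with finite differences and checks the estimate on synthetic data from a logistic curve, size(t) = K / (1 + exp(-r (t - t_mid))), whose growth rate r * size * (1 - size/K) peaks at r * K / 4 when size = K/2. The logistic model and its parameters are assumptions for illustration; this is not claimed to be the procedure used in [40].

```python
from math import exp


def max_growth_rate(times: list[float], sizes: list[float]) -> tuple[float, float]:
    """Largest finite-difference growth rate and the midpoint of the interval where it occurs."""
    if len(times) != len(sizes) or len(times) < 2:
        raise ValueError("need two equally long series with at least two points")
    best_rate, best_t = float("-inf"), float("nan")
    for t0, t1, s0, s1 in zip(times, times[1:], sizes, sizes[1:]):
        rate = (s1 - s0) / (t1 - t0)  # average rate over [t0, t1]
        if rate > best_rate:
            best_rate, best_t = rate, (t0 + t1) / 2
    return best_rate, best_t


# Synthetic logistic growth (illustrative parameters): size(t) = K / (1 + exp(-r (t - t_mid)))
K, r, t_mid = 400.0, 0.5, 14.0       # final size, rate constant (1/day), inflection day
times = [i * 0.5 for i in range(57)]  # days 0 to 28, sampled every 12 h
sizes = [K / (1 + exp(-r * (t - t_mid))) for t in times]

rate, when = max_growth_rate(times, sizes)
print(f"estimated max rate {rate:.2f} at day {when:.2f}; analytic r*K/4 = {r * K / 4:.2f} at day {t_mid}")
```

With half-daily sampling, the finite-difference estimate (about 49.7 area units per day) slightly underestimates the analytic 50, because each difference averages the rate over a 12-hour interval; denser sampling or fitting a growth model reduces this bias.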
The main challenge is to engage the broad plant phenotyping community, across academia and industry, to converge on these standards for describing and accessing the vast amount of currently distributed data. The future of plant phenotyping lies in synergism, as the comprehensive integration and analysis of this “Big Data” will make it possible to unravel the biological processes governing plant growth and development, and to advance the breeding of much-needed climate-resilient and high-yielding crops.

References (32 in total; 10 listed here)

1. Rana Munns; Richard A James; Xavier R R Sirault; Robert T Furbank; Hamlyn G Jones. New phenotyping methods for screening wheat and barley for beneficial responses to water deficit. J Exp Bot, 2010-07-06. (Review)

2. Stijn Dhondt; Nathalie Wuyts; Dirk Inzé. Cell to whole-plant phenotyping: the best is yet to come. Trends Plant Sci, 2013-05-23. (Review)

3. Sotirios A Tsaftaris; Massimo Minervini; Hanno Scharr. Machine Learning for Plant Phenotyping Needs Image Processing. Trends Plant Sci, 2016-10-31.

4. Stijn Dhondt; Nathalie Gonzalez; Jonas Blomme; Liesbeth De Milde; Twiggy Van Daele; Dirk Van Akoleyen; Veronique Storme; Frederik Coppens; Gerrit T S Beemster; Dirk Inzé. High-resolution time-resolved imaging of in vitro Arabidopsis rosette growth. Plant J, 2014-08-25.

5. Noah Fahlgren; Maximilian Feldman; Malia A Gehan; Melinda S Wilson; Christine Shyu; Douglas W Bryant; Steven T Hill; Colton J McEntee; Sankalpi N Warnasooriya; Indrajit Kumar; Tracy Ficor; Stephanie Turnipseed; Kerrigan B Gilbert; Thomas P Brutnell; James C Carrington; Todd C Mockler; Ivan Baxter. A Versatile Phenotyping System and Analytics Platform Reveals Diverse Temporal Responses to Water Availability in Setaria. Mol Plant, 2015-06-20.

6. Dani Zamir. Where have all the crop phenotypes gone? PLoS Biol, 2013-06-25. (Review)

7. Rasko Leinonen; Hideaki Sugawara; Martin Shumway. The sequence read archive. Nucleic Acids Res, 2010-11-09.

8. Abhiram Das; Alexander Bucksch; Charles A Price; Joshua S Weitz. ClearedLeavesDB: an online database of cleared plant leaf images. Plant Methods, 2014-03-28.

9. Daniel Arend; Astrid Junker; Uwe Scholz; Danuta Schüler; Juliane Wylie; Matthias Lange. PGP repository: a plant phenomics and genomics data publication infrastructure. Database (Oxford), 2016-04-17.

10. Ümit Seren; Dominik Grimm; Joffrey Fitz; Detlef Weigel; Magnus Nordborg; Karsten Borgwardt; Arthur Korte. AraPheno: a public database for Arabidopsis thaliana phenotypes. Nucleic Acids Res, 2016-10-24.
