Soumyashree Kar1, Vincent Garin2, Jana Kholová2, Vincent Vadez2,3, Surya S Durbha1, Ryokei Tanaka4, Hiroyoshi Iwata4, Milan O Urban5, J Adinarayana1.
Abstract
The rapid development of phenotyping technologies in recent years has made it possible to study plant development over time. Processing the massive amount of data collected by high-throughput phenotyping (HTP) platforms remains, however, a major challenge for the plant science community. A central issue is to accurately estimate, over time, the genotypic component of the plant phenotype. In outdoor and field-based HTP platforms, phenotype measurements can be substantially affected by data-generation inaccuracies or failures, leading to erroneous or missing data. To address this problem, we developed an analytical pipeline composed of three modules: detection of outliers, imputation of missing values, and computation of genotype adjusted means with a mixed model that includes spatial adjustment. The pipeline was tested on three traits (3D leaf area, projected leaf area, and plant height) in two crops (chickpea and sorghum) measured during two seasons. Using real-data analyses and simulations, we showed that the sequential application of the three pipeline steps was particularly useful for estimating smooth genotype growth curves from raw data containing a large amount of noise, a situation that is potentially frequent in data generated on outdoor HTP platforms. The proposed procedure can handle up to 50% missing values. It is also robust to data contamination rates of 20 to 30%. The pipeline was further extended to model the genotype time series data. A change-point analysis allowed the determination of growth phases and of the optimal timing at which genotypic differences were the largest. The estimated genotypic values were used to cluster the genotypes during the optimal growth phase. A two-way analysis of variance (ANOVA) showed that the clusters were consistently defined throughout the growth duration.
We therefore showed, across a wide range of scenarios, that the pipeline facilitates efficient extraction of useful information from outdoor HTP platform data. High-quality plant growth time series data are also provided to support breeding decisions. The R code of the pipeline is available at https://github.com/ICRISAT-GEMS/SpaTemHTP.
Keywords: HTP-pipeline; SpATS; change point analysis; cross-validation; high-throughput phenotyping; simulation
Year: 2020 PMID: 33329623 PMCID: PMC7714717 DOI: 10.3389/fpls.2020.552509
Source DB: PubMed Journal: Front Plant Sci ISSN: 1664-462X Impact factor: 5.753
Dataset description.
| Chickpea E1 (CPE1) | 25 Nov 2014–17 Dec 2014 | 13.15 | 29.02 | 89.73 | 0.000 | 52.080 | 5.797 |
| Chickpea E2 (CPE2) | 01 Dec 2015–07 Jan 2016 | 13.97 | 31.22 | 89.78 | 1.040 | 56.250 | 17.434 |
| Sorghum E1 (SGE1) | 15 Mar 2015–06 Apr 2015 | 18.42 | 40.67 | 45.84 | 0.000 | 2.210 | 0.293 |
| Sorghum E2 (SGE2) | 22 Oct 2015–12 Nov 2015 | 14.02 | 30.51 | 89.67 | 0.000 | 26.560 | 1.456 |
FIGURE 1. Block diagram of the three stages of the SpaTemHTP pipeline, shown in the sequence of steps followed for HTP data analysis.
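The first two stages of the pipeline (outlier detection, then missing value imputation) can be illustrated with a minimal Python sketch. The methods used here are stand-ins chosen for brevity, not the ones implemented in SpaTemHTP: outliers are flagged with Tukey's boxplot fences, and gaps are filled by linear interpolation along time. The function names `flag_outliers` and `interpolate_missing` and all data values are hypothetical. The third stage (mixed model with spatial adjustment) is not sketched.

```python
from statistics import quantiles


def flag_outliers(values, k=1.5):
    """Replace values outside Tukey's fences with None.

    Assumption: a simple boxplot rule stands in for the pipeline's
    outlier-detection module. Missing values (None) stay missing.
    """
    observed = [v for v in values if v is not None]
    q1, _, q3 = quantiles(observed, n=4)
    low = q1 - k * (q3 - q1)
    high = q3 + k * (q3 - q1)
    return [v if v is not None and low <= v <= high else None for v in values]


def interpolate_missing(series):
    """Fill None gaps in a time series by linear interpolation.

    Assumption: linear interpolation stands in for the pipeline's
    imputation module. Leading and trailing gaps copy the nearest
    observed value.
    """
    known = [i for i, v in enumerate(series) if v is not None]
    if not known:
        raise ValueError("series has no observed values")
    filled = list(series)
    for i, v in enumerate(series):
        if v is not None:
            continue
        left = max((j for j in known if j < i), default=None)
        right = min((j for j in known if j > i), default=None)
        if left is None:
            filled[i] = series[right]
        elif right is None:
            filled[i] = series[left]
        else:
            weight = (i - left) / (right - left)
            filled[i] = series[left] + weight * (series[right] - series[left])
    return filled


# Hypothetical plant heights measured on several plots on the same day.
same_day = [9.5, 10.0, 10.2, 10.5, None, 10.8, 11.0, 11.2, 11.5, 55.0]
cleaned = flag_outliers(same_day)

# Hypothetical time series of one plot with missing days.
plot_series = [2.0, None, 4.0, None, None, 7.0]
completed = interpolate_missing(plot_series)
```

Running detection before imputation mirrors the order of the stages in the block diagram: values flagged as outliers become missing and are then filled together with the originally missing ones.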
List of strategies to process data from the raw data to the G-BLUE computation.
| Strategy | Outlier detection | Missing value imputation | Spatial adjustment |
| S1 | ✗ | ✗ | ✗ |
| S2 | ✓ | ✗ | ✗ |
| S3 | ✗ | ✓ | ✗ |
| S4 | ✓ | ✓ | ✗ |
| S5 | ✗ | ✗ | ✓ |
| S6 | ✓ | ✗ | ✓ |
| S7 | ✗ | ✓ | ✓ |
| S8 | ✓ | ✓ | ✓ |
| S9 | Single-step mixed model | | |
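Strategies S1 to S8 enumerate every on/off combination of the three data treatments. The short Python sketch below reproduces that pattern, assuming the three columns of the table are, in order, outlier detection, missing value imputation, and spatial adjustment (an order inferred from the captions, since the column headers are not preserved here). The helper `strategy_flags` is hypothetical, and S9 (the single-step mixed model) is not covered.

```python
# Assumed column order of the strategy table.
STEPS = ("outlier_detection", "missing_value_imputation", "spatial_adjustment")


def strategy_flags(number):
    """Return which treatments are applied in strategy S<number> (1 to 8).

    The table follows binary counting on (number - 1), with the first
    column as the least significant bit.
    """
    if not 1 <= number <= 8:
        raise ValueError("only S1 to S8 are on/off combinations")
    bits = number - 1
    return {step: bool((bits >> i) & 1) for i, step in enumerate(STEPS)}


# Print the full grid of strategies.
for n in range(1, 9):
    flags = strategy_flags(n)
    marks = " ".join("✓" if flags[step] else "✗" for step in STEPS)
    print(f"S{n}: {marks}")
```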
Average predictive ability (ρ) and heritability (h²) for the LA3D and PH traits in chickpea (CP) and sorghum (SG) experiments 1 and 2 (E1, E2), obtained with (denoted as “yes”) and without (denoted as “no”) each data treatment (outlier detection, missing value imputation, and spatial adjustment).
| Outlier detection | No | 0.75 | 0.58 | 0.89 | 0.67 | 0.67 | 0.60 | 0.75 | 0.56 |
| | Yes | 0.75 | 0.57 | 0.88 | 0.69 | 0.67 | 0.56 | 0.75 | 0.56 |
| | Difference | 0.00 | 0.01 | 0.00 | −0.03 | 0.00 | 0.04 | 0.00 | −0.01 |
| Missing value imputation | No | 0.75 | 0.57 | 0.88 | 0.68 | 0.67 | 0.56 | 0.75 | 0.55 |
| | Yes | 0.75 | 0.59 | 0.88 | 0.69 | 0.67 | 0.60 | 0.75 | 0.56 |
| | Difference | 0.00 | 0.02 | 0.00 | 0.01 | 0.00 | 0.04 | 0.00 | 0.01 |
| Spatial adjustment | No | 0.70 | 0.54 | 0.88 | 0.68 | 0.65 | 0.50 | 0.71 | 0.52 |
| | Yes | 0.80 | 0.60 | 0.89 | 0.68 | 0.69 | 0.66 | 0.79 | 0.60 |
| | Difference | 0.10*** | 0.06** | 0.00 | 0.00 | 0.04 | 0.16*** | 0.08** | 0.08** |
| | S8 | 0.81 | 0.64 | 0.89 | 0.70 | 0.69 | 0.67 | 0.78 | 0.60 |
| | S9 | 0.81 | 0.63 | 0.88 | 0.70 | 0.70 | 0.67 | 0.80 | 0.61 |
| | Difference | 0.00 | 0.01 | 0.00 | 0.00 | 0.00 | 0.00 | −0.02 | −0.01 |
| Outliers removal | No | 0.69 | 0.37 | 0.87 | 0.64 | 0.65 | 0.34 | 0.70 | 0.48 |
| | Yes | 0.69 | 0.35 | 0.79 | 0.74 | 0.65 | 0.34 | 0.73 | 0.51 |
| | Difference | −0.01 | −0.01 | −0.08*** | 0.10*** | 0.00 | 0.01 | 0.03 | 0.04** |
| Missing value imputation | No | 0.68 | 0.35 | 0.78 | 0.65 | 0.65 | 0.34 | 0.72 | 0.49 |
| | Yes | 0.69 | 0.37 | 0.87 | 0.74 | 0.65 | 0.34 | 0.71 | 0.50 |
| | Difference | 0.01 | 0.03 | 0.09*** | 0.09** | 0.00 | 0.00 | −0.01 | 0.01 |
| Spatial adjustment | No | 0.54 | 0.26 | 0.72 | 0.56 | 0.54 | 0.20 | 0.59 | 0.34 |
| | Yes | 0.84 | 0.45 | 0.94 | 0.82 | 0.75 | 0.48 | 0.83 | 0.65 |
| | Difference | 0.31*** | 0.19*** | 0.23*** | 0.26*** | 0.21*** | 0.28*** | 0.24*** | 0.31*** |
| | S8 | 0.85 | 0.47 | 0.94 | 0.88 | 0.75 | 0.49 | 0.84 | 0.70 |
| | S9 | 0.85 | 0.46 | 0.95 | 0.87 | 0.76 | 0.47 | 0.86 | 0.68 |
| | Difference | 0.00 | 0.01 | −0.01 | 0.01 | 0.00 | 0.01 | −0.03 | 0.02 |
Correlation (ρ) between the G-BLUEs of two experiments on the same population of chickpea (CP) and sorghum (SG), for the traits LA3D and PH, obtained with (denoted as “yes”) and without (denoted as “no”) each data treatment (outlier detection, missing value imputation, and spatial adjustment).
| Outlier detection | No | 0.71 | 0.85 | 0.55 | 0.68 |
| | Yes | 0.66 | 0.90 | 0.54 | 0.67 |
| | Difference | −0.04 | 0.05* | −0.01 | −0.01 |
| Missing value imputation | No | 0.66 | 0.88 | 0.54 | 0.67 |
| | Yes | 0.69 | 0.87 | 0.55 | 0.68 |
| | Difference | 0.03* | −0.01 | 0.01 | 0.01 |
| Spatial adjustment | No | 0.65 | 0.84 | 0.53 | 0.65 |
| | Yes | 0.69 | 0.90 | 0.56 | 0.69 |
| | Difference | 0.04* | 0.06* | 0.03 | 0.04* |
| | S8 | 0.72 | 0.90 | 0.57 | 0.70 |
| | S9 | 0.71 | 0.87 | 0.56 | 0.69 |
| | Difference | 0.01 | 0.03* | 0.01 | 0.01 |
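The correlation ρ between the G-BLUEs of two experiments can be illustrated with a plain Pearson correlation, as in the sketch below. This is only an illustration under assumptions: the function name `pearson` and the genotype values are hypothetical, and the exact correlation procedure used for the table (for example, how it was averaged over days) is not specified in this record.

```python
import math


def pearson(x, y):
    """Pearson correlation between two equally long lists of numbers."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two lists of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant input")
    return sxy / math.sqrt(sxx * syy)


# Hypothetical G-BLUEs of the same five genotypes in two experiments.
blues_e1 = [41.2, 38.5, 45.0, 36.1, 43.3]
blues_e2 = [39.8, 37.0, 46.2, 35.5, 40.9]
rho = pearson(blues_e1, blues_e2)
```

A value of ρ close to 1 would indicate that the genotypes rank similarly in both experiments, which is the sense in which the table compares the data treatments.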
Results of the simulation using real data (SGE1 PH).
| Add | S5 | 0.99 | 0.98 | 0.96 | 0.94 | 0.92 |
| | S6 | 0.98 | 0.97 | 0.95 | 0.93 | 0.91 |
| | S7 | 1 | 1 | 1 | 1 | 0.99 |
| | S8 | 0.98 | 0.98 | 0.98 | 0.98 | 0.98 |
| | S9 | 0.99 | 0.96 | 0.96 | 0.94 | 0.92 |
| Add | S5 | 0.87 | 0.8 | 0.73 | 0.67 | 0.64 |
| | S6 | 0.95 | 0.89 | 0.8 | 0.72 | 0.64 |
| | S7 | 0.87 | 0.8 | 0.73 | 0.67 | 0.64 |
| | S8 | 0.96 | 0.9 | 0.81 | 0.73 | 0.65 |
| | S9 | 0.93 | 0.8 | 0.74 | 0.68 | 0.66 |
| Add | S5 | 0.84 | 0.68 | 0.57 | 0.48 | 0.39 |
| | S6 | 0.93 | 0.79 | 0.6 | 0.47 | 0.38 |
| | S7 | 0.85 | 0.72 | 0.62 | 0.52 | 0.43 |
| | S8 | 0.94 | 0.83 | 0.68 | 0.51 | 0.43 |
| | S9 | 0.88 | 0.69 | 0.58 | 0.47 | 0.4 |
FIGURE 2. Comparison of the biological growth pattern of the raw data for chickpea (CPE2) LA3D and sorghum (SGE2) PH and the series of genotypic BLUEs obtained from S5 to S9. The red line in each plot represents the average of the fitted logistic curves for all genotypes.
R² estimates of the logistic fit for the G-BLUEs time series data obtained with strategies S5–S9 for the traits LA3D and PH in each chickpea (CP) and sorghum (SG) experiment.
| S5 | 0.94 | 0.95 | 0.71 | 0.8 | 0.98 | 0.97 | 0.99 | 0.99 |
| S6 | 0.95 | 0.95 | 0.77 | 0.85 | 0.98 | 0.98 | 0.99 | 0.99 |
| S7 | 0.95 | 0.98 | 0.73 | 0.81 | 0.98 | 0.98 | 0.98 | 0.99 |
| S8 | 0.95 | 0.98 | 0.79 | 0.87 | 0.98 | 0.98 | 0.99 | 0.99 |
| S9 | 0.94 | 0.95 | 0.74 | 0.83 | 0.98 | 0.97 | 0.98 | 0.99 |
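How an R² for a logistic fit can be obtained is sketched below in Python. The sketch assumes a three-parameter logistic curve and fits it by brute-force least squares over a small candidate grid. Both the parameterisation and the fitting procedure are illustrative assumptions, not the method used to produce the table, and all names and data values are hypothetical.

```python
import math
from itertools import product


def logistic(t, asymptote, rate, midpoint):
    """Three-parameter logistic growth curve (assumed form)."""
    return asymptote / (1.0 + math.exp(-rate * (t - midpoint)))


def r_squared(observed, fitted):
    """Coefficient of determination: 1 - SS_residual / SS_total."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - f) ** 2 for o, f in zip(observed, fitted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


def fit_logistic_grid(days, values, asymptotes, rates, midpoints):
    """Return the (asymptote, rate, midpoint) candidate with the lowest
    sum of squared errors. A coarse stand-in for nonlinear least squares."""
    def sse(params):
        return sum((v - logistic(t, *params)) ** 2 for t, v in zip(days, values))
    return min(product(asymptotes, rates, midpoints), key=sse)


# Hypothetical G-BLUE time series: a logistic curve plus a small
# deterministic perturbation that mimics measurement noise.
days = list(range(0, 21, 2))
values = [logistic(t, 100.0, 0.4, 10.0) + (1.0 if i % 2 == 0 else -1.0)
          for i, t in enumerate(days)]

best = fit_logistic_grid(days, values,
                         asymptotes=[80.0, 100.0, 120.0],
                         rates=[0.2, 0.4, 0.6],
                         midpoints=[8.0, 10.0, 12.0])
r2 = r_squared(values, [logistic(t, *best) for t in days])
```

An R² near 1 means the time series follows a smooth sigmoidal growth pattern closely; noisier series give lower values, which is how the table separates the strategies.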
FIGURE 3. Change point analysis (CPA) plots illustrating the patterns of the daily heritability (h²) estimates and the Clust-Dist for (A) LA3D of CPE2 and (B) PH of SGE2. The vertical red lines in the plots denote the “change points” and the annotations between two change points (i.e., within each time-window) denote the corresponding growth phases.
The degrees of freedom (Df), sum of squares (SS), percent sum of squares (SS%), mean squares (MS), and the F-value (F-val) are shown for each source of variation, obtained from the G × TW analysis of LA3D and PH of chickpea experiments (CPE1, CPE2).
| Source of variation | Df (LA3D) | SS (LA3D) | SS% (LA3D) | MS (LA3D) | F-val (LA3D) | Df (PH) | SS (PH) | SS% (PH) | MS (PH) | F-val (PH) |
| Genotypic clusters (G) | 2 | 8.06E+09 | 7.618 | 4.03E+09 | 1,046.981*** | 2 | 5.12E+03 | 6.907 | 2.56E+03 | 898.522*** |
| Time window (TW) | 2 | 9.24E+10 | 87.361 | 4.62E+10 | 12,006.383*** | 2 | 6.23E+04 | 84.098 | 3.12E+04 | 10,939.513*** |
| G × TW | 4 | 2.02E+09 | 1.91 | 5.05E+08 | 131.258*** | 4 | 4.23E+03 | 5.708 | 1.06E+03 | 371.255*** |
| Residuals | 855 | 3.29E+09 | 3.111 | 3.85E+06 | | 855 | 2.44E+03 | 3.286 | 2.85E+00 | |
| Total | | 1.06E+11 | | | | | 74,123 | | | |
| Genotypic clusters (G) | 2 | 7.48E+08 | 4.196 | 3.74E+08 | 603.549*** | 2 | 7.20E+03 | 10.186 | 3.60E+03 | 1,002.001*** |
| Time window (TW) | 2 | 1.56E+10 | 87.476 | 7.80E+09 | 12,583.018*** | 2 | 5.71E+04 | 80.867 | 2.86E+04 | 7,955.314*** |
| G × TW | 4 | 9.55E+08 | 5.357 | 2.39E+08 | 385.258*** | 4 | 3.25E+03 | 4.602 | 8.13E+02 | 226.348*** |
| Residuals | 855 | 5.30E+08 | 2.972 | 6.20E+05 | | 855 | 3.07E+03 | 4.346 | 3.59E+00 | |
| Total | | 1.78E+10 | | | | | 70,669 | | | |
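The SS%, MS, and F-val columns follow from Df and SS by standard ANOVA arithmetic: SS% is the share of each SS in the total, MS is SS divided by Df, and F is the effect MS divided by the residual MS. The Python sketch below recomputes them for the first five value columns of the first block of the table above, taking the row with 4 degrees of freedom to be the G × TW interaction. The function name `anova_summary` is hypothetical, and the results only approximate the printed values because the table entries are rounded.

```python
# (Df, SS) pairs copied from the first block of the chickpea table.
rows = {
    "G": (2, 8.06e9),
    "TW": (2, 9.24e10),
    "G x TW": (4, 2.02e9),   # assumed label of the 4-Df row
    "Residuals": (855, 3.29e9),
}


def anova_summary(rows, residual_key="Residuals"):
    """Derive SS%, MS and F for each source from its Df and SS."""
    total_ss = sum(ss for _, ss in rows.values())
    res_df, res_ss = rows[residual_key]
    ms_residual = res_ss / res_df
    summary = {}
    for name, (df, ss) in rows.items():
        ms = ss / df
        summary[name] = {
            "ss_pct": 100.0 * ss / total_ss,
            "ms": ms,
            # No F statistic is defined for the residual row itself.
            "f": None if name == residual_key else ms / ms_residual,
        }
    return summary


summary = anova_summary(rows)
for name, stats in summary.items():
    f_text = "" if stats["f"] is None else f"{stats['f']:.1f}"
    print(f"{name:10s} SS%={stats['ss_pct']:.3f} MS={stats['ms']:.3g} F={f_text}")
```

Read this way, the table shows that the time window accounts for most of the variation, with the genotypic clusters and their interaction with the time window contributing smaller but significant shares.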
The degrees of freedom (Df), sum of squares (SS), percent sum of squares (SS%), mean squares (MS), and the F-value (F-val) are shown for each source of variation, obtained from the G × TW analysis of LA3D and PH of sorghum experiments (SGE1, SGE2).
| Source of variation | Df (LA3D) | SS (LA3D) | SS% (LA3D) | MS (LA3D) | F-val (LA3D) | Df (PH) | SS (PH) | SS% (PH) | MS (PH) | F-val (PH) |
| Genotypic clusters (G) | 2 | 3.99E+08 | 0.051 | 1.99E+08 | 10.785*** | 2 | 3.21E+04 | 0.371 | 1.61E+04 | 58.197*** |
| Time window (TW) | 2 | 7.64E+11 | 97.215 | 3.82E+11 | 20,665.584*** | 2 | 8.31E+06 | 95.947 | 4.16E+06 | 15,063.085*** |
| G × TW | 4 | 3.55E+08 | 0.045 | 8.87E+07 | 4.802** | 4 | 3.67E+03 | 0.042 | 9.18E+02 | 3.325* |
| Residuals | 1,143 | 2.11E+10 | 2.689 | 1.85E+07 | | 1,143 | 3.15E+05 | 3.64 | 2.76E+02 | |
| Total | | 7.86E+11 | | | | | 8,665,647 | | | |
| Genotypic clusters (G) | 2 | 2.86E+09 | 0.911 | 1.43E+09 | 138.322*** | 2 | 3.38E+04 | 1.153 | 1.69E+04 | 186.857*** |
| Time window (TW) | 2 | 2.98E+11 | 94.953 | 1.49E+11 | 14,413.423*** | 2 | 2.74E+06 | 93.53 | 1.37E+06 | 15,154.881*** |
| G × TW | 4 | 1.17E+09 | 0.372 | 2.92E+08 | 28.255*** | 4 | 5.25E+04 | 1.789 | 1.31E+04 | 144.959*** |
| Residuals | 1,143 | 1.18E+10 | 3.764 | 1.03E+07 | | 1,143 | 1.03E+05 | 3.527 | 9.05E+01 | |
| Total | | 3.14E+11 | | | | | 2,932,587 | | | |