Gustavo A Lobos, Carlos Poblete-Echeverría.
Abstract
This article describes free, publicly available software for efficient exploratory analysis of high-resolution spectral reflectance data. Such data can suffer from poor signal-to-noise ratios in some wavebands, or from invalid measurements caused by changes in incoming solar radiation or by operator fatigue that leads to poor sensor orientation. Exploratory data analysis is therefore essential for identifying data suitable for further analysis. General-purpose tools such as Excel are cumbersome for the large numbers of wavelengths and samples typically acquired in these studies, a problem this software is designed to overcome. The software, Spectral Knowledge (SK-UTALCA), was initially developed for plant breeding but is also suitable for other fields such as precision agriculture, crop protection, ecophysiology, plant nutrition, and soil fertility. Spectral reflectance indices (SRIs) are often used to relate crop characteristics to spectral data, and the software is loaded with 255 SRIs that can be applied quickly to the data. This article describes the architecture and functions of SK-UTALCA and the features of the data that led to the development of each of its modules.
Keywords: collinearity; noise; outlier; phenomic; phenotyping; scan; spectral reflectance index (SRI); wavelength
Year: 2017 PMID: 28119705 PMCID: PMC5220079 DOI: 10.3389/fpls.2016.01996
Source DB: PubMed Journal: Front Plant Sci ISSN: 1664-462X Impact factor: 5.753
Nomenclature related to spectrometer data collection.
| Term | Similar terms | Definition |
| --- | --- | --- |
| Plot | | Land area where a single genotype is growing; in other studies it could be considered a replication. |
| Scan | Data collection, spectrum or spectra collection, scanning | Action of collecting the spectrum or spectra in one scan or shot (informal terminology). |
| Samples | Sample spectra, samples of scan, scanned samples, scanned data, artifacts, features | Number of spectral signatures captured per scan; some spectrometers can register several samples within the same scan. |
| Integrations per sample | Spectrum average or averaging | Integration of spectra within the same sample. |
Main SK-UTALCA functionalities according to the program menu.
| Menu | Submenu | Option | Description |
| --- | --- | --- | --- |
| Input and output of information | Import X and Y data | Spectral data (X) | Import spectral data: the first column or row (depending on the equipment) must contain the assessed wavelengths. |
| | | Samples per plot | Indicate the number of samples per scan (see the nomenclature table for definitions). |
| | | Transpose data | The software works only with wavelengths as columns; the user can transpose the data if needed. |
| | | Response variable (Y) | Import response variable data (in columns); the first three columns must be codes (free criteria). |
| | Export data | Average | Export either the average of the samples per scan or each sample individually. |
| | | Empty data | Data can be exported including or excluding the cells deleted during cleaning of the data matrix. |
| Cleaning data matrix | Noise analysis | Wavelength segments | Ten different segments can be analyzed for the percentage change among a specified neighbor size. |
| | | | Noise elimination can be applied equally to all data (Group) or to each sample (Individual). Negative values can also be deleted. |
| | Scan analysis | Maximum variation coefficient | Threshold that separates scans whose samples keep the variation coefficient below the limit at every wavelength (Scans without problems) from scans in which it is exceeded (Scans with problems). |
| | | Samples to delete | If one or more samples within the same scan are inconsistent, they can be selected and deleted. |
| | Outlier analysis | | A graphical analysis of the cloud of data points (response variable vs. SRI) makes it possible to detect points that are out of range, identify the source of the problem, and delete them when there is clear evidence of a mistake. |
| Preliminary analysis | Collinearity analysis | | For a given response variable, linear or artificial neural network (ANN) analysis identifies wavelengths without collinearity. |
| | Individual wavelength analysis | | Different regression models and statistical parameters identify the wavelengths best associated with a given response variable. |
| | SRI analysis | Full report | Different regression models and a coefficient of determination threshold identify the SRIs best associated with a given response variable. The software will be launched with a database of 255 SRIs (Supplementary Table). |
| | | Detailed index report | For subsequent graphical representation, individual values of SRIs and response variables can be exported for each genotype or measurement. |
Figure 1. Main screen divided horizontally into three sections: analysis, input data, and command history. The screen shows the loaded databases (spectral and response variable data files); the transpose data option is also available for the spectral matrix.
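The import options above (wavelengths expected as columns, an optional transpose, and a declared number of samples per scan) can be illustrated with a minimal sketch. This is not SK-UTALCA code: the function names `transpose` and `average_per_scan` and the in-memory list-of-lists layout are assumptions made only for illustration.

```python
def transpose(matrix):
    """Swap rows and columns so that wavelengths end up as columns."""
    return [list(row) for row in zip(*matrix)]


def average_per_scan(spectra, samples_per_scan):
    """Average consecutive groups of `samples_per_scan` spectra.

    Assumes (hypothetically) that the samples of one scan are stored
    next to each other, one spectrum per row.
    """
    if len(spectra) % samples_per_scan != 0:
        raise ValueError("number of spectra is not a multiple of samples_per_scan")
    averaged = []
    for start in range(0, len(spectra), samples_per_scan):
        group = spectra[start:start + samples_per_scan]
        # zip(*group) yields the readings of all samples at one wavelength
        averaged.append([sum(values) / len(values) for values in zip(*group)])
    return averaged


# Hypothetical instrument export: one row per wavelength, first column = wavelength (nm),
# followed by two scans with two samples each.
raw = [
    [400, 0.10, 0.12, 0.30, 0.32],
    [401, 0.20, 0.22, 0.40, 0.42],
]
by_sample = transpose(raw)
wavelengths, spectra = by_sample[0], by_sample[1:]
scan_means = average_per_scan(spectra, samples_per_scan=2)
```

After these steps `wavelengths` holds the column labels and `scan_means` holds one averaged spectrum per scan.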
Figure 2. Example of noise analysis showing 400 scans (× 3 samples each) before (A) and after (B) the noise filter was applied. In both windows, red crosses (top) show where the maximum percentage of variation was exceeded and black crosses (top) where both criteria (% and the number of neighbors) were met.
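The two criteria mentioned in the caption (a maximum percentage of variation and a number of neighbors) could be combined roughly as sketched below. The exact rule used by SK-UTALCA is not given in this record, so everything here is an assumption: the change is measured against the previous wavelength, and a reading counts as noisy only when it exceeds the limit and at least `min_hits` readings inside a window of `window` neighbors on each side do so too. The names `percent_change`, `flag_noise`, `window`, and `min_hits` are hypothetical.

```python
def percent_change(spectrum):
    """Absolute percentage change of each reading relative to the previous one."""
    changes = [0.0]  # the first reading has no predecessor
    for prev, cur in zip(spectrum, spectrum[1:]):
        if prev == 0:
            changes.append(float("inf") if cur != 0 else 0.0)
        else:
            changes.append(abs(cur - prev) / abs(prev) * 100)
    return changes


def flag_noise(spectrum, max_pct, window, min_hits):
    """Return two boolean lists: readings above the limit, and readings meeting both criteria."""
    exceeded = [change > max_pct for change in percent_change(spectrum)]
    noisy = []
    for i, hit in enumerate(exceeded):
        lo = max(0, i - window)
        hi = min(len(exceeded), i + window + 1)
        # Assumed second criterion: enough neighbors in the window also exceed the limit.
        noisy.append(hit and sum(exceeded[lo:hi]) >= min_hits)
    return exceeded, noisy


# A single step in reflectance exceeds the limit once but is not flagged as noise.
step_exceeded, step_noisy = flag_noise(
    [0.50, 0.50, 0.50, 0.56, 0.56, 0.56], max_pct=10, window=1, min_hits=2
)

# A jagged stretch exceeds the limit repeatedly and is flagged.
jag_exceeded, jag_noisy = flag_noise(
    [0.50, 0.51, 0.52, 0.90, 0.40, 0.95, 0.53, 0.54], max_pct=10, window=1, min_hits=2
)
```

Under this rule an isolated jump is reported only by the first list, which loosely mirrors the distinction the figure draws between the two kinds of crosses.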
Figure 3. Example of scan analysis. The software divided the scans or plots between those that did not surpass the maximum accepted variation coefficient (Scans without problems) and those where it was exceeded (Scans with problems). Scan 399 was selected, and its first sample (red) was identified for deletion (Apply filter).
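The split described in this caption can be sketched as follows: for each scan, the coefficient of variation (CV) among its samples is computed at every wavelength, and the scan is reported as problematic if the CV exceeds the threshold at any wavelength. This is an illustrative sketch only. The function names, the dictionary layout, and the use of the sample standard deviation (n − 1) are assumptions, not details taken from the software.

```python
from statistics import mean, stdev


def max_cv(samples):
    """Largest coefficient of variation (%) across wavelengths for one scan.

    `samples` is a list of spectra (one per sample), all of the same length.
    """
    cvs = []
    for values in zip(*samples):  # readings of all samples at one wavelength
        m = mean(values)
        cvs.append(stdev(values) / abs(m) * 100 if m != 0 else float("inf"))
    return max(cvs)


def split_scans(scans, max_allowed_cv):
    """Separate scan ids into those within the CV threshold and those exceeding it."""
    without_problems, with_problems = [], []
    for scan_id, samples in scans.items():
        if max_cv(samples) <= max_allowed_cv:
            without_problems.append(scan_id)
        else:
            with_problems.append(scan_id)
    return without_problems, with_problems


# Hypothetical data: two scans, three samples each, two wavelengths.
scans = {
    398: [[0.10, 0.40], [0.10, 0.41], [0.10, 0.39]],
    399: [[0.30, 0.40], [0.10, 0.41], [0.10, 0.39]],  # first sample is off
}
ok, problems = split_scans(scans, max_allowed_cv=10)
```

Here scan 399 lands in `problems` because its first sample deviates strongly at the first wavelength, which is the kind of sample a user would then select for deletion.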
Figure 4. Example of outlier analysis showing four scatterplot graphs (NDVI, SR, PRI and WI vs. Yield) (A). Using the Edit function, NDVI vs. Yield was used to select scans with NDVI values below 0.31 (B) for deletion (C).
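The selection step in panel B amounts to computing an index per scan and listing the scans that fall below a cutoff. A minimal sketch is given below, using the common definition NDVI = (R_NIR − R_red) / (R_NIR + R_red). The band positions (800 nm and 680 nm), the dictionary layout, and the function names are assumptions for illustration; the bands SK-UTALCA uses for its NDVI entries are not stated in this record.

```python
def ndvi(reflectance, nir=800, red=680):
    """NDVI from a {wavelength_nm: reflectance} mapping (band choice is an assumption)."""
    r_nir, r_red = reflectance[nir], reflectance[red]
    return (r_nir - r_red) / (r_nir + r_red)


def scans_below(scans, threshold=0.31):
    """Ids of scans whose NDVI is below the threshold, as candidates for review."""
    return [scan_id for scan_id, refl in scans.items() if ndvi(refl) < threshold]


# Hypothetical reflectance values for two scans.
scans = {
    1: {680: 0.05, 800: 0.45},  # dense green canopy
    2: {680: 0.20, 800: 0.30},  # suspiciously low contrast
}
candidates = scans_below(scans)
```

The list only marks candidates. As the table of functionalities notes, points should be deleted only when there is clear evidence of a mistake.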
Figure 5. Example of collinearity analysis for deletion of wavelengths delivering the same predictive information for Yield. The analysis, considering a linear regression method (R square cutoff = 0.95), selected 131 wavelengths without collinearity.
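One simple way to remove wavelengths that carry the same information is a greedy pairwise filter: a wavelength is kept only if its R² with every wavelength already kept is below the cutoff. The sketch below shows that idea under stated assumptions. It is not the procedure implemented in SK-UTALCA (which works for a given response variable and also offers an ANN option), and the names `r_squared` and `drop_collinear` are hypothetical.

```python
def r_squared(x, y):
    """Coefficient of determination of a simple linear regression between x and y."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a constant column cannot be related linearly to anything
    return sxy * sxy / (sxx * syy)


def drop_collinear(columns, cutoff=0.95):
    """Greedy filter: keep a wavelength only if it is not collinear with any kept one.

    `columns` maps wavelength -> list of reflectance values (one per scan).
    """
    kept = []
    for wavelength, values in columns.items():
        if all(r_squared(values, columns[k]) < cutoff for k in kept):
            kept.append(wavelength)
    return kept


# Hypothetical data: 501 nm is almost a copy of 500 nm, 700 nm behaves differently.
columns = {
    500: [0.10, 0.20, 0.30, 0.40],
    501: [0.11, 0.21, 0.31, 0.41],
    700: [0.40, 0.10, 0.35, 0.20],
}
selected = drop_collinear(columns, cutoff=0.95)
```

The result of a greedy filter depends on the order in which wavelengths are visited, which is one reason a real implementation may choose differently.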
Figure 6. Example of the individual wavelength analysis module. The relationships were searched by considering Yield and a coefficient of determination higher than 0.3 (A). Results were plotted for visual analysis (B) and exported to a spreadsheet (C).
Figure 7. Example of the SRI analysis module. Three regression models were selected to search for SRI and response variables with a minimum adjusted coefficient of determination of 0.25 (A). The exported file shows the adjusted coefficient of determination for the best approximation (Best) and for each selected regression model (B). When a detailed report is required (C), the SRI value for each scan is calculated automatically (D).
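The report described in this caption (an adjusted coefficient of determination per regression model, plus the best one, compared against a minimum) can be sketched as follows. The adjusted value is computed as 1 − (1 − R²)(n − 1)/(n − p − 1), with n observations and p predictors. The two models used here (linear and logarithmic) are chosen only for illustration; which models SK-UTALCA offers is not stated in this record, and all function names are hypothetical.

```python
import math


def linear_fit(x, y):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sxx
    return my - slope * mx, slope


def adjusted_r2(observed, predicted, n_predictors=1):
    """Adjusted R2 = 1 - (1 - R2) * (n - 1) / (n - p - 1)."""
    n = len(observed)
    mean_obs = sum(observed) / n
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    r2 = 1 - ss_res / ss_tot
    return 1 - (1 - r2) * (n - 1) / (n - n_predictors - 1)


# Each model is a transformation of the SRI followed by a straight-line fit.
MODELS = {
    "linear": lambda v: v,
    "logarithmic": math.log,  # requires strictly positive SRI values
}


def score_sri(sri_values, response, threshold=0.25):
    """Adjusted R2 per model, the best model, and whether it reaches the threshold."""
    scores = {}
    for name, transform in MODELS.items():
        x = [transform(v) for v in sri_values]
        intercept, slope = linear_fit(x, response)
        predicted = [intercept + slope * v for v in x]
        scores[name] = adjusted_r2(response, predicted)
    best = max(scores, key=scores.get)
    return scores, best, scores[best] >= threshold


# Hypothetical data in which the response grows with the logarithm of the index.
scores, best, passes = score_sri([1, 2, 4, 8, 16], [0, 1, 2, 3, 4])
```

In this example the logarithmic model is selected as the best approximation and clears the 0.25 minimum, while the linear model scores lower.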