Maria Frizzarin (1, 2), Isobel Claire Gormley (2), Alessandro Casa (2), Sinéad McParland (1).
Abstract
Including all available data when developing equations to relate midinfrared spectra to a phenotype may be suboptimal for poorly represented spectra. Here, an alternative local changepoint approach was developed to predict six milk technological traits from midinfrared spectra. For each predictand, neighbours were identified objectively as the spectra most similar to it, using the Mahalanobis distances between the spectral principal components, and were subsequently used in partial least squares regression (PLSR) analyses. The performance of the local changepoint approach was compared with that of PLSR using all spectra (global PLSR) and with another LOCAL approach, in which a fixed number of neighbours, selected according to the correlation between the predictand and the available spectra, was used in the prediction. Global PLSR had the lowest root mean square error of validation (RMSEV) for five traits. The local changepoint approach had the lowest RMSEV for one trait; however, it outperformed the LOCAL approach for four traits. When the 5% of spectra with the greatest Mahalanobis distance from the centre of the global principal component space were analysed, the local changepoint approach outperformed the global PLSR and the LOCAL approach for two and five traits, respectively. The objective selection of neighbours improved prediction performance compared with using a fixed number of neighbours; however, it generally did not outperform the global PLSR.
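The neighbour-ranking step described in the abstract can be sketched in a few lines. This is a minimal illustration, not the authors' code: it assumes principal component (PC) scores have already been computed for every spectrum, and it relies on PC scores being mutually uncorrelated, so their covariance matrix is diagonal and the Mahalanobis distance reduces to a Euclidean distance with each component scaled by its variance. The names `score_variances`, `mahalanobis_pc`, and `rank_by_distance` are hypothetical.

```python
import math


def score_variances(scores):
    """Sample variance of each principal component (column) of a score matrix."""
    n, k = len(scores), len(scores[0])
    means = [sum(row[j] for row in scores) / n for j in range(k)]
    return [sum((row[j] - means[j]) ** 2 for row in scores) / (n - 1) for j in range(k)]


def mahalanobis_pc(a, b, variances):
    """Mahalanobis distance between two PC score vectors.

    PC scores are uncorrelated, so the covariance matrix is diagonal and the
    distance is a Euclidean distance with each component scaled by its variance.
    """
    return math.sqrt(sum((x - y) ** 2 / v for x, y, v in zip(a, b, variances)))


def rank_by_distance(target, scores, variances):
    """(index, distance) pairs for every spectrum, closest to the target first."""
    pairs = [(i, mahalanobis_pc(target, s, variances)) for i, s in enumerate(scores)]
    return sorted(pairs, key=lambda p: p[1])


# Toy example: four spectra described by two PC scores each (hypothetical values).
scores = [[0.0, 0.0], [1.0, 0.0], [0.0, 2.0], [3.0, 0.0]]
variances = [1.0, 4.0]  # assumed PC variances, passed in directly for clarity
print(rank_by_distance([0.0, 0.0], scores, variances))
# [(0, 0.0), (1, 1.0), (2, 1.0), (3, 3.0)]
```

With real data, `variances` would come from `score_variances(scores)`; the figures in this record refer to a four-dimensional principal component space.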
Keywords: local changepoint analysis; midinfrared spectroscopy; neighbours
Year: 2021 PMID: 34945635 PMCID: PMC8700986 DOI: 10.3390/foods10123084
Source DB: PubMed Journal: Foods ISSN: 2304-8158
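Mean, standard deviation (SD), median, minimum (Min), and maximum (Max) values for the six technological traits studied: rennet coagulation time (RCT), curd-firming time (k20), curd firmness at 30 and 60 min (a30, a60), casein micelle size (CMS), and pH.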
| Trait | Mean | SD | Median | Min | Max |
|---|---|---|---|---|---|
| RCT, min | 17.24 | 6.61 | 17.75 | 0.00 | 29.75 |
| k20, min | 4.95 | 2.88 | 4.25 | 0.00 | 13.56 |
| a30, mm | 31.78 | 15.49 | 31.71 | 0.00 | 74.12 |
| a60, mm | 32.22 | 12.50 | 30.66 | 0.00 | 66.00 |
| Casein micelle size, nm | 170.20 | 24.89 | 166.00 | 109.10 | 244.00 |
| pH | 6.66 | 0.09 | 6.65 | 6.42 | 6.93 |
Root mean square error of the validation samples (RMSEV) and correlation (r) between the true and predicted trait values from PLSR using the global approach, the LOCAL approach, and the local changepoint approach ¹.
| Trait | Global RMSEV | Global r | LOCAL RMSEV | LOCAL r | Changepoint MBIC², standardised, RMSEV | Changepoint MBIC², standardised, r | Changepoint MBIC², not standardised, RMSEV | Changepoint MBIC², not standardised, r | Changepoint SIC², standardised, RMSEV | Changepoint SIC², standardised, r | Changepoint SIC², not standardised, RMSEV | Changepoint SIC², not standardised, r |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| RCT, min | | | 6.29 | 0.35 | 6.28 | 0.45 | 6.23 | 0.44 | 6.30 | 0.42 | 6.53 | 0.40 |
| k20, min | 2.55 | 0.47 | 2.71 | 0.42 | 2.60 | 0.47 | | | 2.63 | 0.46 | 2.59 | 0.48 |
| a30, mm | | | 13.25 | 0.54 | 13.01 | 0.56 | 13.59 | 0.52 | 13.27 | 0.54 | 13.23 | 0.55 |
| a60, mm | | | 11.41 | 0.38 | 12.46 | 0.32 | 12.38 | 0.33 | 12.69 | 0.30 | 12.69 | 0.30 |
| CMS, nm | | | 25.22 | 0.21 | 25.97 | 0.16 | 25.71 | 0.17 | 25.16 | 0.22 | 26.73 | 0.12 |
| pH | | | 0.065 | 0.73 | 0.070 | 0.70 | 0.068 | 0.72 | 0.070 | 0.70 | 0.070 | 0.70 |
1 Bold numbers identify the lowest RMSEV and highest correlation value for each trait. 2 MBIC = modified Bayesian information criterion; SIC = Schwarz information criterion.
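The two metrics reported in these tables can be computed as follows. This is a minimal sketch assuming RMSEV is the square root of the mean squared difference between true and predicted values over the validation samples and r is the Pearson correlation between them; the function names and the data are illustrative only.

```python
import math


def rmsev(y_true, y_pred):
    """Root mean square error over the validation samples."""
    n = len(y_true)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)


def pearson_r(x, y):
    """Pearson correlation between true and predicted values."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    # Undefined (division by zero) if either series is constant.
    return sxy / math.sqrt(sxx * syy)


# Hypothetical validation data for one trait.
true_vals = [17.0, 12.5, 21.0, 9.0]
pred_vals = [15.5, 14.0, 19.0, 11.0]
print(round(rmsev(true_vals, pred_vals), 3), round(pearson_r(true_vals, pred_vals), 3))
# 1.768 0.985
```

Because r ignores bias and scale, the two metrics can rank settings differently: for pH with 50 neighbours, 20 PLSR factors give a higher r (0.65) than 5 factors (0.63) but also a higher RMSEV (0.083 vs 0.074).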
Root mean square error of the validation samples (RMSEV) and correlation (r) between the true and predicted trait values across different LOCAL settings ¹.
| Trait | Neigh ² | RMSEV, 5 fact | r, 5 fact | RMSEV, 10 fact | r, 10 fact | RMSEV, 20 fact | r, 20 fact |
|---|---|---|---|---|---|---|---|
| RCT, min | 25 | 7.09 | 0.32 | 8.49 | 0.29 | 10.11 | 0.20 |
| k20, min | 25 | 3.42 | 0.27 | 3.87 | 0.26 | 4.78 | 0.19 |
| a30, mm | 25 | 15.26 | 0.46 | 19.25 | 0.30 | 22.56 | 0.25 |
| a60, mm | 25 | 13.55 | 0.30 | 15.84 | 0.22 | 18.17 | 0.15 |
| CMS, nm | 25 | 30.08 | 0.19 | 34.99 | 0.17 | 41.11 | 0.15 |
| pH | 25 | 0.083 | 0.54 | 0.094 | 0.54 | 0.110 | 0.50 |
| RCT, min | 50 | 6.56 | 0.34 | 6.88 | 0.39 | 8.97 | 0.31 |
| k20, min | 50 | 2.86 | 0.39 | 3.10 | 0.37 | 4.28 | 0.25 |
| a30, mm | 50 | 13.93 | 0.52 | 14.73 | 0.49 | 17.50 | 0.43 |
| a60, mm | 50 | 12.00 | 0.38 | 13.11 | 0.38 | 15.85 | 0.31 |
| CMS, nm | 50 | 26.63 | 0.21 | 28.68 | | 34.42 | 0.23 |
| pH | 50 | 0.074 | 0.63 | 0.070 | 0.70 | 0.083 | 0.65 |
| RCT, min | 100 | | 0.35 | 6.34 | | 7.54 | 0.37 |
| k20, min | 100 | | | 2.81 | 0.41 | 3.49 | 0.34 |
| a30, mm | 100 | | 0.54 | 13.55 | | 16.21 | 0.44 |
| a60, mm | 100 | | 0.43 | 11.53 | | 13.09 | 0.44 |
| CMS, nm | 100 | | 0.21 | 26.55 | 0.26 | 31.36 | 0.24 |
| pH | 100 | 0.092 | 0.65 | | | 0.074 | 0.69 |
1 Bold numbers identify the lowest RMSEV and highest correlation value for each trait. 2 neigh = number of neighbours; fact = number of PLSR factors.
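The LOCAL neighbour selection compared in this table (a fixed number of neighbours, chosen by correlation with the predictand) can be sketched as below. This is a hedged illustration only: it ranks library spectra by Pearson correlation with the target spectrum and keeps the top `k`; the subsequent PLSR fit with a fixed number of factors is not shown, and the function names are hypothetical.

```python
import math


def pearson_r(x, y):
    """Pearson correlation between two spectra (equal-length absorbance vectors)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def local_neighbours(target, library, k):
    """Indices of the k library spectra most correlated with the target spectrum."""
    ranked = sorted(
        range(len(library)), key=lambda i: pearson_r(target, library[i]), reverse=True
    )
    return ranked[:k]


# Hypothetical four-wavelength spectra.
target = [1.0, 2.0, 3.0, 4.0]
library = [
    [1.0, 2.0, 3.0, 4.0],  # r = 1.0
    [4.0, 3.0, 2.0, 1.0],  # r = -1.0
    [1.0, 2.0, 3.0, 5.0],  # r ~ 0.98
    [2.0, 1.0, 4.0, 3.0],  # r = 0.6
]
print(local_neighbours(target, library, 2))  # [0, 2]
```

In the settings tabulated above, `k` corresponds to 25, 50, or 100 neighbours, each combined with 5, 10, or 20 PLSR factors.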
Figure 1. Boxplots of the residuals for each trait from the global, LOCAL, and local changepoint approaches, for each penalty tested (MBIC and SIC), with spectral data either standardised or unstandardised prior to neighbour selection.
Root mean square error of the validation samples (RMSEV) and correlation (r) between the true and predicted trait values for the 5% of observations with the greatest Mahalanobis distance from the centre of the principal component space, using the global, LOCAL, and local changepoint approaches ¹.
| Trait | Global RMSEV | Global r | LOCAL RMSEV | LOCAL r | Changepoint MBIC², standardised, RMSEV | Changepoint MBIC², standardised, r | Changepoint MBIC², not standardised, RMSEV | Changepoint MBIC², not standardised, r | Changepoint SIC², standardised, RMSEV | Changepoint SIC², standardised, r | Changepoint SIC², not standardised, RMSEV | Changepoint SIC², not standardised, r |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| RCT, min | | | 9.39 | 0.27 | 8.23 | 0.62 | 8.23 | 0.62 | 6.87 | 0.64 | 9.43 | 0.48 |
| k20, min | 2.88 | 0.52 | 5.03 | 0.00 | 2.17 | 0.75 | 2.14 | 0.77 | 2.18 | 0.76 | | |
| a30, mm | | 0.82 | 16.11 | 0.66 | 9.41 | | 12.81 | 0.74 | 11.70 | 0.75 | 9.55 | 0.84 |
| a60, mm | | | 8.92 | 0.82 | 8.80 | 0.84 | 7.51 | 0.87 | 8.52 | 0.85 | 8.06 | 0.85 |
| CMS, nm | 23.75 | 0.49 | 27.06 | 0.29 | 23.47 | 0.53 | 24.50 | 0.48 | 24.84 | 0.43 | | |
| pH | | | 0.076 | 0.66 | 0.091 | 0.59 | 0.091 | 0.60 | 0.082 | 0.65 | 0.094 | 0.55 |
1 Bold numbers identify the lowest RMSEV and highest correlation value for each trait. 2 MBIC = modified Bayesian information criterion; SIC = Schwarz information criterion.
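The subset analysed in this table (the 5% of observations farthest from the centre of the principal component space) could be identified along these lines. This is a minimal sketch that assumes the centre is the mean score vector and that the 5% cut-off is rounded up to a whole number of spectra; both choices, and the function name `edge_indices`, are assumptions rather than details taken from the study.

```python
import math


def edge_indices(scores, fraction=0.05):
    """Indices of the `fraction` of spectra with the greatest Mahalanobis distance
    from the centre (mean) of the PC score space, farthest first."""
    n, k = len(scores), len(scores[0])
    means = [sum(row[j] for row in scores) / n for j in range(k)]
    variances = [sum((row[j] - means[j]) ** 2 for row in scores) / (n - 1) for j in range(k)]
    # PC scores are uncorrelated, so a diagonal covariance matrix suffices.
    dist = [
        math.sqrt(sum((row[j] - means[j]) ** 2 / variances[j] for j in range(k)))
        for row in scores
    ]
    m = max(1, math.ceil(fraction * n))  # assumption: round the cut-off up
    return sorted(range(n), key=lambda i: dist[i], reverse=True)[:m]


# Hypothetical scores: 19 spectra near the centre plus one far away.
scores = [[(i % 3) - 1.0, (i % 2) - 0.5] for i in range(19)] + [[10.0, 10.0]]
print(edge_indices(scores))  # [19]
```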
Figure 2. Spectra in the centre and on the edge (the 5% with the greatest Mahalanobis distance from the centre) of the four-dimensional principal component space of standardised spectra.
Figure 3. Neighbours selected according to the MBIC penalty after standardising the spectra for a target spectrum (blue square) that lies in the centre ((A) n = 133, red dots) and at the edge ((B) n = 21, red dots) of the four-dimensional principal component space. The green triangles represent the other spectra not selected as neighbours.
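The local changepoint idea can be illustrated by sorting the distances from a target to all other spectra and keeping, as neighbours, the spectra that lie before a changepoint in those sorted distances. This is a simplified sketch, not the authors' implementation: it assumes a single change in mean, Gaussian segments with a common variance, and a SIC/BIC-style penalty of log(n) for the added changepoint. The MBIC penalty used for this figure differs and is not reproduced here, and the names `rss`, `changepoint_neighbours`, and `min_seg` are hypothetical.

```python
import math


def rss(values):
    """Residual sum of squares of values around their mean."""
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values)


def changepoint_neighbours(distances, min_seg=2):
    """Indices of the neighbours of a target spectrum.

    The distances are sorted in ascending order and a single change in mean is
    sought; spectra before the changepoint are returned as neighbours. If no
    split beats the log(n) penalty, every spectrum is returned.
    """
    order = sorted(range(len(distances)), key=lambda i: distances[i])
    d = [distances[i] for i in order]
    n = len(d)
    eps = 1e-12  # avoids log(0) when a segment is constant
    # -2 log-likelihood (up to a constant) of a Gaussian model with common variance
    no_change = n * math.log(rss(d) / n + eps)
    best_tau, best_cost = None, no_change
    for tau in range(min_seg, n - min_seg + 1):
        cost = n * math.log((rss(d[:tau]) + rss(d[tau:])) / n + eps)
        if cost < best_cost:
            best_tau, best_cost = tau, cost
    penalty = math.log(n)  # assumed SIC/BIC-style penalty for one extra parameter
    if best_tau is None or no_change - best_cost <= penalty:
        return order
    return order[:best_tau]


# Hypothetical Mahalanobis distances from one target to ten other spectra.
distances = [5.0, 0.1, 5.2, 0.2, 0.15, 5.1, 4.9, 0.12, 5.3, 5.05]
print(changepoint_neighbours(distances))  # [1, 7, 4, 3]
```

The selected neighbours would then be used to fit a PLSR model for that target alone, which is why the number of neighbours (and of PLSR factors) varies from one predictand to the next.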
Median (range in parentheses) number of PLSR factors used in the global PLSR and in the local changepoint PLSR. Results from the LOCAL approach are not reported because it uses a fixed number of PLSR factors for all observations.
| Trait | Global | Local changepoint, MBIC, spectra standardised | Local changepoint, MBIC, spectra not standardised | Local changepoint, SIC, spectra standardised | Local changepoint, SIC, spectra not standardised |
|---|---|---|---|---|---|
| RCT, min | 13 (13–14) | 5 (1–20) | 5 (1–20) | 4 (1–20) | 4 (1–20) |
| k20, min | 3 (3–13) | 3 (1–19) | 3 (1–18) | 2 (1–20) | 2 (1–20) |
| a30, mm | 13 (10–14) | 4 (1–20) | 3 (1–20) | 3 (1–20) | 3 (1–20) |
| a60, mm | 9 (8–9) | 3 (1–20) | 4 (1–20) | 3 (1–20) | 3 (1–20) |
| CMS, nm | 11 (9–13) | 2 (1–20) | 2 (1–20) | 2 (1–20) | 2 (1–20) |
| pH | 14 (14–14) | 9 (1–20) | 9 (1–20) | 8 (1–20) | 8 (1–20) |
Figure 4. First and second principal components of the dataset, with each point representing a spectrum and the points coloured according to the values of a30 and k20.