Literature DB >> 32083159

Data analysis on near infrared spectroscopy as a part of technology adoption for cocoa farmer in Aceh Province, Indonesia.

Purwana Satriyo1, Agus Arip Munawar1,2.   

Abstract

Presented manuscript described data analysis on near infrared spectroscopy used as adopted and portable technology for cocoa farmers in Aceh Province, Indonesia. The near infrared spectroscopy (NIRS) assisted farmers in post-harvest handling especially for cocoa quality evaluation. This technology was used to determine moisture content (MC) and fat content (FC) of intact cocoa bean samples rapidly and simultaneously. Near infrared spectra data were acquired as absorbance spectrum in wavelength range from 1000 to 2500 nm with co-added of 32 scans for a total of 72 intact bulk cocoa bean samples. Spectra data can be used to predict MC and FC of intact cocoa beans by establishing prediction models and validate with actual MC and FC measured by means of standard laboratory procedures. Prediction performances were evaluated using several statistical indicators: coefficient correlation (r), coefficient of determination (R2), root mean square error (RMSE) and residual predictive deviation (RPD) index. Near infrared spectra data can be enhanced using spectra pre-treatment methods to improve prediction performances. Moreover, prediction models can be developed using principal component regression (PCR), partial least squares regression (PLSR) and other regression approaches. Ideal prediction models should have r and R2 above 0.75, RPD index above 2.0 and RMSE lower than its standard deviation (SD). Dataset were available as raw MS Excel format and The Unscrambler files as *.unsb extension.
© 2020 The Author(s).

Entities:  

Keywords:  Cocoa; NIRS; Post-harvest; Spectroscopy; Technology

Year:  2020        PMID: 32083159      PMCID: PMC7021542          DOI: 10.1016/j.dib.2020.105251

Source DB:  PubMed          Journal:  Data Brief        ISSN: 2352-3409


Specifications Table Spectra data of intact cocoa bean samples can be used to predict several quality parameters of cocoa beans simultaneously and rapidly bypassing standard laboratory procedures. Provided dataset can be reanalysed and remodelled using different regression or spectra correction approaches. Obtained models were benefited for cocoa manufacturers and industries for fast quality inspection of their cocoa products. Spectral dataset can be corrected, enhanced using different pre-processing methods and applied onto prediction models. Data generated from adopted NIRS technology proven to be useful for cocoa farmers in evaluating quality parameters.

Data

Typical spectra data feature of intact cocoa bean sample in near infrared region is shown in Fig. 1. Spectra data can be represented as reflectance, absorbance or transmittance spectrum in a function of energy (cm−1) or wavelength (nm) of the electromagnetic radiation. Spectra data of intact cocoa bean contains important information such as moisture content, fat content, carbohydrates, fibre and other quality parameters that can be revealed through calibration using several regression approaches. There are adequate regression methods available like multiple linear regression (MLR), support vector machine regression (SVMR), artificial neural network (ANN) based regression and two most commonly used as calibration: principal component regression (PCR) and partial least squares regression (PLSR).
Fig. 1

Typical near infrared absorbance spectrum of intact cocoa bean sample.

Typical near infrared absorbance spectrum of intact cocoa bean sample. It is recommended to pre-process spectral data prior to calibration and prediction models development in order to achieve better prediction performance [3,4]. Several pre-processing techniques are widely available such as spectra normalization, de-trending (DT), standard normal variate (SNV), spectra derivatives, multiplicative scatter correction (MSC), orthogonal signal correction (OSC) and others. Sometimes, those methods are combined in order to increase and improve spectra data [5]. Chemical information buried in cocoa bean samples corresponds to inner quality parameters like fat, fibre, phenol and carbohydrates. They consisted from structured molecular bonds of C–H–O, C–H, O–H, N–H and R–O–H for alcohol. Specific wavelengths in near infrared region are corresponded to certain chemical properties and quality parameters of cocoa samples from which represents the amount of light reflected, absorbed or transmitted [6]. In NIRS applications and practices, there are some problems related to the selection of representative sample datasets used for calibration, cross validation and independent validation. Splitting a large dataset into calibration and validation subsets normally varied out by splitting onto 70:30 for those subsets respectively. NIRS users also found difficulties in identifying and detecting samples that are somehow considerably different from the majority of the remaining samples from which known as outliers. Based on literatures and common NIRS practices, outlier data can be identified by projecting spectra data onto principal component analysis (PCA) and applying Hotelling T2 ellipse as presented in Fig. 2. Data lies outside the ellipse needs to be checked due to their potential outliers.
Fig. 2

Spectra data projected onto principal component analysis (PCA) and Hotelling T2 ellipse.

Spectra data projected onto principal component analysis (PCA) and Hotelling T2 ellipse. To determine inner quality parameters in intact cocoa bean samples, NIR spectra data were regressed with actual measure moisture and fat contents using mentioned regression approaches. The two most common approaches are principal component regression (PCR) and partial least squares regression (PLSR) since these both regression methods were normally used in the emerging field of NIRS till date [7,8]. Both PCR and PLSR methods seek to find best correlation between near infrared spectra data and inner quality parameters i.e moisture content (MC) and fat content (FC) of cocoa bean samples as presented in Table 1, Table 2. To evaluate prediction performance, predicted results of moisture and fat content were then compared to the actual measured MC and FC as shown in Fig. 3 and Fig. 4 using principal component regression and partial least squares regression approaches respectively.
Table 1

Prediction performance between principal component regression (PCR) and partial least squares regression (PLSR) for moisture content prediction.

Regression approachStatistical indicators
FactorR2rRMSERPD
PCR70.820.900.542.37
PLSR70.880.940.423.05

R2: coefficient of determination, r: correlation coefficient, RMSE: the root mean square error, RPD: residual predictive deviation.

Table 2

Prediction performance between principal component regression (PCR) and partial least squares regression (PLSR) for fat content prediction.

Regression approachStatistical indicators
FactorR2rRMSERPD
PCR70.730.851.111.94
PLSR70.840.910.822.62

R2: coefficient of determination, r: correlation coefficient, RMSE: the root mean square error, RPD: residual predictive deviation.

Fig. 3

Prediction performance of moisture content (a) and fat content (b) by means of principal component regression (PCR) approach.

Fig. 4

Prediction performance of moisture content (c) and fat content (d) by means of partial least squares regression (PLSR) approach.

Prediction performance between principal component regression (PCR) and partial least squares regression (PLSR) for moisture content prediction. R2: coefficient of determination, r: correlation coefficient, RMSE: the root mean square error, RPD: residual predictive deviation. Prediction performance between principal component regression (PCR) and partial least squares regression (PLSR) for fat content prediction. R2: coefficient of determination, r: correlation coefficient, RMSE: the root mean square error, RPD: residual predictive deviation. Prediction performance of moisture content (a) and fat content (b) by means of principal component regression (PCR) approach. Prediction performance of moisture content (c) and fat content (d) by means of partial least squares regression (PLSR) approach. Prediction performance can be enhanced and improved by correcting spectral data prior to calibration development. It is strongly recommended to pre-process spectral data in order to generated and achieve more accurate and robust prediction results. Table 3 showed a comparison among different spectra correction methods in determining moisture and fat content in intact cocoa bean samples. Prediction models were established using partial least squares regression followed by cross validation method.
Table 3

Comparison among different spectra correction methods to the prediction performance of inner quality parameters in cocoa bean samples using partial least squares regression approach.

Quality parametersCorrection methodStatistical indicators
FactorR2rRMSERPD
Moisture contentMN50.900.950.403.18
Smoothing50.910.950.393.27
SNV50.910.950.383.37
EMSC50.920.950.373.46
Fat contentMN50.950.980.454.78
Smoothing50.900.950.683.16
SNV50.970.980.385.66
EMSC50.980.990.277.96

EMSC: extended multiplicative scatter correction, MN: mean normalization, R2: coefficient of determination, r: correlation coefficient, RMSE: the root mean square error, RPD: residual predictive deviation. SNV: standard normal variate.

Comparison among different spectra correction methods to the prediction performance of inner quality parameters in cocoa bean samples using partial least squares regression approach. EMSC: extended multiplicative scatter correction, MN: mean normalization, R2: coefficient of determination, r: correlation coefficient, RMSE: the root mean square error, RPD: residual predictive deviation. SNV: standard normal variate. Prediction accuracy and robustness normally improved when the models are constructed using enhanced spectral data. Yet, prediction performance may vary depends on spectra pre-processing method as shown in Fig. 5 for moisture content and fat content respectively. Spectra corrections can also be combined to increase prediction accuracy and robustness. However, combination selection must be chosen carefully to avoid overfitting and removal important information of quality attributes (see Fig. 6).
Fig. 5

Prediction performance for moisture content (MC) determination using enhanced EMSC spectra data and partial least square approach.

Fig. 6

Prediction performance for fat content (FC) determination using enhanced EMSC spectra data and partial least square approach.

Prediction performance for moisture content (MC) determination using enhanced EMSC spectra data and partial least square approach. Prediction performance for fat content (FC) determination using enhanced EMSC spectra data and partial least square approach.

Experimental design, materials, and methods

Spectra data acquisition

Near infrared spectral data of intact cocoa bean samples were acquired using a portable near infrared spectroscopy (FTIR PSD i15) in wavelength range from 1000 to 2500 nm. A total of 72 bulk intact cocoa bean amounted 50g per bulk were obtained from cocoa farmers in Pidie Jaya, Aceh Province, Indonesia. Cocoa beans were fermented for 3, 5 and 7 days in order to obtain different moisture content (MC) and fat content (FC) respectively. Spectra data were recorded as absorbance or Log (1/R) with co-added of 32 scans and resolution windows of 0.2 nm [9]. Spectra data were recorded and saved in two different file formats as *.csv and *.spa. For each spectra measurement, sample labelling was required automatically before acquisition to differ cocoa bean samples. Furthermore, predicted MC and FC of intact cocoa bean samples were obtained simultaneously by performing calibration models using principal component regression (PCR) and partial least squares regression (PLSR) followed by cross validation respectively. Data used for testing and validating models were obtained from calibration datasets from which cross validation method is employed [10].

Moisture and fat content measurements

After spectral data acquisition were completed, all cocoa bean samples were taken directly for actual inner quality parameters (moisture and fat content) measurement. Actual referenced moisture content was determined using a gravimetry method and measured in duplicate, then averaged. The moisture content of cocoa bean samples is expressed in % dry bulb. On the other hand, actual fat content was determined using Soxhlet method. Ten gram of cocoa sample was mixed in the tube with maximum of 150 ml n-hexane and extracted in Soxhlet apparatus at temperature of 95 °C for 6 h [11]. Fat content was then determined by evaporating those solvent using rotary evaporator until fat liquid is remained in the tube. Then, it dried at temperature of 105 °C for about 30 minutes. Fat content was expressed in % fat content and measured also in duplicate then averaged. Descriptive statistics of actual measured moisture and fat content of cocoa bean samples are shown in Table 4.
Table 4

Descriptive statistics of actual measured quality parameters of cocoa bean samples.

Moisture Content (%)Fat Content (%)
# of Sample7272
Mean9.0440.32
Max12.0845.75
Min6.7435.26
Range5.3410.49
Std. Deviation1.282.15
Variance1.634.64
RMS9.1240.38
Skewness0.840.10
Kurtosis0.07−0.46
Median8.7940.13
Q18.0938.55
Q39.5641.93
Descriptive statistics of actual measured quality parameters of cocoa bean samples.

Sample outlier detection

In many spectroscopy practices, there are some spectra data of related samples somehow considerably different from the majority of the remaining samples from which known as outliers. They can be found in the sample datasets used for model construction and validation, or arise among new samples during the use of those models for independent prediction. Several methods were widely employed to detect and identify outliers, among of them is the combination of principal component analysis (PCA) and Hoteling T2 ellipse. It used to define statistical boundaries assuming a normal distribution of scores of the PCA. First, raw spectral data of cocoa samples obtained from the NIRS instrument are projected onto PCA map. Then, Hoteling T2 ellipse is applied to this PCA projection. Typically, outliers are identified as samples found outside the ellipse confidence limit [12,13]. The ellipse usually established at the level of 95%. Another method that also can be used for outlier detection is the Mahalanobis distance and the spectral residual applied to raw spectra data. Raw spectra data refers to the original data before spectra correction and enhancement.

Spectra data corrections

To achieve robust and accurate prediction results, spectra data need to be corrected and pre-processed prior to prediction model development. To compensate some noises on spectra data, smoothing with Savitsky-Golay algorithm can be applied to eliminate noises. Then, mean normalization (MN) can be also a choice as spectra correction method to remove background irrelevant information and normalize spectral data. Moreover, standard normal variate (SNV) is a good choice to pre-process and enhance spectra data as this method is widely employed beside multiplicative scatter correction (MSC) either as full MSC or as extended (EMSC). Both these methods proven to be effective and improve prediction performance.

Prediction models

The main part of NIRS applications is to establish models used to predict desired quality parameters simultaneously of cocoa samples through a process called as calibration. In many NIRS cases, most classical linear algorithms such as multiple linear regression (MLR), principal component regression (PCR) and partial least squares regression (PLSR) have proven to be effective. In comparison, PLSR normally generated a better prediction results compared to the other two. Partial least square method can simultaneously decompose the spectral data (X-variable) and the actual measured quality parameters data (Y-variable). It considers the relationship between the two in the decomposition, strengthen the corresponding calculation and ensuring the best correction model [14,15]. Therefore, the PLSR method continue to be the workhorse for regression in NIRS applications. The original work of PLSR is a linear method which assuming a linear relationship of the modelled quality parameters or concentrations as a function of the near infrared spectral data variations. Weak nonlinearities may be solved by increasing the number of factor or also known as latent variables (LVs) included in the PLSR. Another most common method as NIRS user preference for constructing and developing prediction models is principal component regression (PCR). It is a similar multivariate regression method works based on principal component analysis and multiple linear regression. In recent years, prediction models in NIRS can be constructed using a non-linear regression approach like support vector regression and artificial neural networks. It was carried out due to the fact that there is a certain nonlinearity between the spectral data and the quality parameters data. Thus, nonlinear correction model must be established for the unique nonlinear characteristics of the system. The prediction performances were evaluated by means of these following statistical indicators: the coefficient of determination (R2) and correlation (r) between predicted and measured quality parameters (moisture and fat content), prediction error which is defined as the root mean square error (RMSE) and the residual predictive deviation (RPD) or some papers called it as the relative predictive determinant [8,16,17]. The RPD defined as the ratio between standard deviation (SD) of the actual measured value of moisture content (MC) and fat content (FC), and the RMSE of predicted quality parameters. Last but not least, optimum number of factor or known as latent variables (LVs) were chosen based on the minimization of RMSE during cross-validation.

Specifications Table

SubjectAgricultural and Biological SciencesPost-harvest Technology
Specific subject areaSpectroscopy, technology adoption for farmer, non-destructive technology for cocoa quality evaluation
Type of dataTableGraphSpectroscopic data
How data were acquiredNear infrared spectral data of intact cocoa bean samples were collected and acquired using a self-developed portable near infrared spectroscopy (FTIR PSD i15). A total of 72 bulk intact cocoa bean amounted 50g per bulk were obtained from cocoa farmers in Pidie Jaya, Aceh Province, Indonesia. Cocoa beans were fermented for 3, 5 and 7 days in order to obtain different moisture content (MC) and fat content (FC) respectively. Spectra data were recorded as absorbance or Log (1/R) spectrum in wavelength range from 1000 to 2500 nm with co-added of 32 scans with resolution windows of 0.2 nm. Predicted MC and FC of intact cocoa bean samples were obtained simultaneously by performing calibration models using principal component regression (PCR) and partial least squares regression (PLSR) followed by cross validation respectively. On the other hand, actual reference data of MC and FC were measured by means of standard laboratory procedures as proposed by Refs. [1,2]. Thermogravimetry and Soxhlet methods were employed to measure actual MC and FC of cocoa samples respectively. Spectra data and actual MC and FC were then combined as one table in MS Excel that can be used for further analysis.
Data formatRawAnalysedPresented as .xls and .unsb extension file formats
Parameters for data collectionIn cocoa trade markets, two main quality parameters considered are moisture content (MC) and fat content (FC). Both quality parameters were used as parameters for data collection and were predicted simultaneously using adopted NIRS technology.
Description of data collectionSpectra data were firstly subjected onto principal component analysis (PCA) and Hotelling T2 ellipse to detect potential outliers. Spectra data were then regressed with actual MC and FC to generate prediction models used to predict those both cocoa quality parameters. Predicted MC and FC were then compared with actual measured data in cross validation.
Data source locationSpectra data, actual moisture and fat content of intact cocoa beans were collected at the Department of Agricultural Engineering, Faculty of Agriculture Syiah Kuala University, Banda Aceh – Indonesia.
Data accessibilityCombined dataset are presented as MS Excel (.xlsx) and Unscrambler (.unsb) extension formats and available on this article. Dataset also can be found in Mendeley repository data: https://data.mendeley.com/datasets/7734j4fd98/1or https://doi.org/10.17632/7734j4fd98.1
Value of the Data

Spectra data of intact cocoa bean samples can be used to predict several quality parameters of cocoa beans simultaneously and rapidly bypassing standard laboratory procedures.

Provided dataset can be reanalysed and remodelled using different regression or spectra correction approaches.

Obtained models were benefited for cocoa manufacturers and industries for fast quality inspection of their cocoa products.

Spectral dataset can be corrected, enhanced using different pre-processing methods and applied onto prediction models.

Data generated from adopted NIRS technology proven to be useful for cocoa farmers in evaluating quality parameters.

  6 in total

1.  Infrared spectroscopy and outer product analysis for quantification of fat, nitrogen, and moisture of cocoa powder.

Authors:  Anezka Veselá; António S Barros; Andriy Synytsya; Ivonne Delgadillo; Jana Copíková; Manuel A Coimbra
Journal:  Anal Chim Acta       Date:  2007-08-26       Impact factor: 6.558

2.  Estimating cocoa bean parameters by FT-NIRS and chemometrics analysis.

Authors:  Ernest Teye; Xingyi Huang; Livingstone K Sam-Amoah; Jemmy Takrama; Daniel Boison; Francis Botchway; Francis Kumi
Journal:  Food Chem       Date:  2014-12-18       Impact factor: 7.514

Review 3.  Near infrared spectroscopy: A mature analytical technique with new perspectives - A review.

Authors:  Celio Pasquini
Journal:  Anal Chim Acta       Date:  2018-04-17       Impact factor: 6.558

4.  Determination of antioxidant capacity and phenolic content of chocolate by attenuated total reflectance-Fourier transformed-infrared spectroscopy.

Authors:  Yaxi Hu; Zhi Jie Pan; Wen Liao; Jiaqi Li; Pierre Gruget; David D Kitts; Xiaonan Lu
Journal:  Food Chem       Date:  2016-02-02       Impact factor: 7.514

5.  Assessment of cocoa (Theobroma cacao L.) butter content and composition throughout fermentations.

Authors:  Adrien Servent; Renaud Boulanger; Fabrice Davrieux; Marie-Neige Pinot; Eric Tardan; Nelly Forestier-Chiron; Clotilde Hue
Journal:  Food Res Int       Date:  2018-03-06       Impact factor: 6.475

6.  Near infrared spectroscopic data for rapid and simultaneous prediction of quality attributes in intact mango fruits.

Authors:  Agus Arip Munawar; Devi Wahyuni
Journal:  Data Brief       Date:  2019-11-11
  6 in total
  4 in total

1.  Dataset of adulteration with water in coconut milk using FTIR spectroscopy.

Authors:  Agustami Sitorus; Muhamad Muslih; Irwin Syahri Cebro; Ramayanty Bulan
Journal:  Data Brief       Date:  2021-04-20

2.  Calibration models database of near infrared spectroscopy to predict agricultural soil fertility properties.

Authors:  Agus Arip Munawar; Yuswar Yunus; Purwana Satriyo
Journal:  Data Brief       Date:  2020-04-08

3.  Enhanced near infrared spectral data to improve prediction accuracy in determining quality parameters of intact mango.

Authors:  Rita Hayati; Agus Arip Munawar; F Fachruddin
Journal:  Data Brief       Date:  2020-04-21

4.  Robust prediction performance of inner quality attributes in intact cocoa beans using near infrared spectroscopy and multivariate analysis.

Authors:  Rita Hayati; Zulfahrizal Zulfahrizal; Agus Arip Munawar
Journal:  Heliyon       Date:  2021-02-24
  4 in total

北京卡尤迪生物科技股份有限公司 © 2022-2023.