Fanglin Mu, Yu Gu, Jie Zhang, Lei Zhang.
Abstract
In this study, an electronic nose (E-nose) consisting of seven metal oxide semiconductor sensors is developed to identify milk sources (dairy farms) and to estimate the content of milk fat and protein, which are indicators of milk quality. The developed E-nose is a low-cost, non-destructive device. For milk source identification, three feature sets are analyzed: odor features from the E-nose, composition features from Dairy Herd Improvement (DHI) analysis, and fusion features. Each set is reduced in dimension by principal component analysis (PCA) or linear discriminant analysis (LDA), and three machine learning algorithms, logistic regression (LR), support vector machine (SVM), and random forest (RF), are then used to construct classification models for milk source identification. The results show that the SVM model based on the fusion features after LDA performs best, with an accuracy of 95%. Estimation models for milk fat and protein content are constructed from E-nose features using gradient boosting decision tree (GBDT), extreme gradient boosting (XGBoost), and random forest (RF). The RF models give the best performance (R² = 0.9399 for milk fat; R² = 0.9301 for milk protein), indicating that the proposed method can improve the estimation accuracy of milk fat and protein and provides a technical basis for predicting milk quality.
Keywords: electronic nose; milk; quality estimation; source identification
Year: 2020 PMID: 32751425 PMCID: PMC7435658 DOI: 10.3390/s20154238
Source DB: PubMed Journal: Sensors (Basel) ISSN: 1424-8220 Impact factor: 3.576
Figure 1. Structure diagram of E-nose system.
Gas sensor information in E-nose system.
| No. | Sensor | Sensitive Substance |
|---|---|---|
| 1 | TGS2600 | Polluting gas |
| 2 | TGS822 | Volatile substances of alcohol and organic solvents |
| 3 | TGS2611 | Methane gas |
| 4 | TGS826 | Ammonia |
| 5 | TGS2602 | Volatile organic compounds (VOC), benzene |
| 6 | TGS832 | Freon gas |
| 7 | TGS2620 | Alcohol, carbon monoxide, other volatile organic vapors |
Figure 2. E-nose detection structure.
Figure 3. Response curve and radar chart for E-nose data: (a–c) response curve of E-nose; (d) radar chart of E-nose.
Figure 4. Visualization of data dimensionality reduction: (a) Dairy Herd Improvement (DHI) data dimension reduction results by Principal Component Analysis (PCA); (b) E-nose data dimension reduction results by PCA; (c) Fusion data dimension reduction results by PCA; (d) DHI data dimension reduction results by Linear Discriminant Analysis (LDA); (e) E-nose data dimension reduction results by LDA; (f) Fusion data dimension reduction results by LDA.
Accuracy (mean of five-fold cross-validation) in milk source identification based on PCA and LDA (%).
| Features | Reduction | SVM Train | SVM Test | RF Train | RF Test | LR Train | LR Test |
|---|---|---|---|---|---|---|---|
| DHI | PCA | 19.50 | 15.50 | 17.63 | 18.50 | 19.88 | 18.00 |
| DHI | LDA | 57.75 | 58.50 | 52.13 | 53.50 | 53.38 | 56.00 |
| E-nose | PCA | 56.25 | 59.50 | 71.62 | 70.50 | 62.00 | 65.00 |
| E-nose | LDA | 85.75 | 85.00 | 82.13 | 80.50 | 84.38 | 81.50 |
| Fusion | PCA | 41.50 | 45.00 | 53.38 | 51.50 | 39.75 | 34.50 |
| Fusion | LDA | 95.50 | 95.00 | 92.50 | 94.00 | 93.50 | 92.50 |
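The train and test accuracies above are means over five cross-validation folds. A minimal sketch of how such figures can be produced for an LDA-then-SVM model is given here, assuming scikit-learn. The data are synthetic (not the study's measurements), and the number of classes, the feature count, the scaling step, the RBF kernel, and fitting LDA inside each fold are all assumptions made for illustration rather than details confirmed by the source.

```python
from sklearn.datasets import make_classification
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import StratifiedKFold, cross_validate
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for a fused feature matrix (NOT the study's data).
# 4 classes and 12 features are arbitrary, hypothetical choices.
X, y = make_classification(
    n_samples=200, n_features=12, n_informative=8, n_redundant=0,
    n_classes=4, n_clusters_per_class=1, random_state=0,
)

# Scale -> LDA projection -> SVM. Wrapping the steps in a pipeline means
# LDA is refit on the training part of every fold (an assumption here).
model = make_pipeline(
    StandardScaler(),
    LinearDiscriminantAnalysis(n_components=3),  # at most n_classes - 1
    SVC(kernel="rbf"),
)

cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
scores = cross_validate(
    model, X, y, cv=cv, scoring="accuracy", return_train_score=True
)

# Mean accuracy over the five folds, in percent, as in the table.
mean_train = 100 * scores["train_score"].mean()
mean_test = 100 * scores["test_score"].mean()
print(f"train {mean_train:.2f}%  test {mean_test:.2f}%")
```

Swapping `SVC` for `RandomForestClassifier` or `LogisticRegression`, or `LinearDiscriminantAnalysis` for `PCA`, would give the other cells of the table under the same assumptions.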
Estimation models for fat content based on three algorithms.
| Model | Training Set | | | Testing Set | | |
|---|---|---|---|---|---|---|
| | | | R² | | | R² |
| GBDT | 0.3267 | 0.1907 | 0.7201 | 0.3245 | 0.1926 | 0.7172 |
| XGBoost | 0.1063 | 0.0241 | 0.9645 | 0.1487 | 0.0573 | 0.9158 |
| RF | 0.1046 | 0.0253 | 0.9627 | 0.1253 | 0.0410 | 0.9399 |
Estimation models for protein content based on three algorithms.
| Model | Training Set | | | Testing Set | | |
|---|---|---|---|---|---|---|
| | | | R² | | | R² |
| GBDT | 0.1773 | 0.0498 | 0.7003 | 0.1770 | 0.0501 | 0.6985 |
| XGBoost | 0.0616 | 0.0071 | 0.9572 | 0.0766 | 0.0123 | 0.9257 |
| RF | 0.0488 | 0.0052 | 0.9687 | 0.0607 | 0.0116 | 0.9301 |
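The R² values reported for the fat and protein models measure how much of the variance in the measured content the predictions explain. The sketch below uses the standard definition, R² = 1 − SS_res / SS_tot; the source does not state how its values were computed, so this is an assumption, and the function name `r_squared` and the four sample values are hypothetical, not taken from the study.

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    mean_true = sum(y_true) / len(y_true)
    # Residual sum of squares: error left after prediction.
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    # Total sum of squares: spread of the measured values around their mean.
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Made-up measured and predicted contents (illustration only).
measured = [3.0, 3.5, 4.0, 4.5]
predicted = [3.1, 3.4, 4.2, 4.4]

r2 = r_squared(measured, predicted)
print(f"R^2 = {r2:.4f}")  # prints "R^2 = 0.9440"
```

A value of 1 means the predictions match the measurements exactly, so a testing-set R² that stays close to the training-set R² suggests the model generalizes rather than overfits.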
Figure 5. Model estimation error: (a) model errors for fat; (b) model errors for protein.