Rathimala Kannan, Haq'ul Aqif Abdul Halim, Kannan Ramakrishnan, Shahrinaz Ismail, Dedy Rahman Wijaya.
Abstract
Predictive maintenance employing machine learning techniques and big data analytics benefits industrial businesses in the Industry 4.0 era. Companies, however, face difficulties as they move from reactive to predictive manufacturing processes. The purpose of this paper is to demonstrate how data analytics and machine learning approaches can be used to predict production delays in a quarry firm as a case study. The dataset contains six months of production records for two machines, with 20 columns per record. The Cross Industry Standard Process for Data Mining (CRISP-DM) approach was followed to build the machine learning models. Five predictive models were created using the Decision Tree, Neural Network, Random Forest, Naïve Bayes and Logistic Regression algorithms. The results show that the Multilayer Perceptron Neural Network and Logistic Regression outperform the other techniques and accurately predict production delays with an F-measure score of 0.973. The quarry company's improved decision-making, which reduces potential production line delays, demonstrates the value of this study.
Keywords: Machine Learning; Prediction models; Production delay; Quarry Industry
Year: 2022 PMID: 35875725 PMCID: PMC9287717 DOI: 10.1186/s40537-022-00644-w
Source DB: PubMed Journal: J Big Data ISSN: 2196-1115
Fig. 1 Research methodology based on CRISP-DM
Fig. 2 Preview of the production dataset—the raw data
Fig. 3 The overall KNIME workflow for the prediction of production delay analysis
Fig. 4 Pie chart of the machine operation occurrences
Fig. 5 Correlation coefficients of variables
Fig. 6 Boxplot illustrating basic statistics and outliers of the production dataset
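The exploratory steps summarised in Figs. 4–6 (class balance, correlations, summary statistics) and the two normalization variants compared in the tables below can be reproduced outside KNIME. A minimal Python sketch under stated assumptions follows; the file name and the `delay` label column are placeholders, not taken from the paper:

```python
import pandas as pd

# Load the six months of production records (path and column names are placeholders).
df = pd.read_csv("production.csv")

# Exploratory statistics corresponding to Figs. 4-6.
print(df["delay"].value_counts(normalize=True))   # Fig. 4: occurrence proportions
print(df.corr(numeric_only=True))                 # Fig. 5: correlation coefficients
print(df.describe())                              # Fig. 6: quartiles, min/max, spread

# The two normalization variants compared in the results table below.
numeric = df.select_dtypes("number").drop(columns=["delay"], errors="ignore")
min_max = (numeric - numeric.min()) / (numeric.max() - numeric.min())  # Min-Max normalization
z_score = (numeric - numeric.mean()) / numeric.std()                   # Z-score normalization
```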
Identification of optimal machine learning models from each algorithm
| Machine Learning Technique | Accuracy | StdDev | Delay status | Sensitivity | Precision | F-measure |
|---|---|---|---|---|---|---|
| Decision Tree (Gain Ratio) | 0.963 | 0.014 | False | 0.98 | 0.926 | 0.952 |
| | | | True | 0.952 | 0.988 | 0.957 |
| Decision Tree (Gini Index) | 0.956 | 0.008 | False | 0.961 | 0.925 | 0.942 |
| | | | True | 0.952 | 0.976 | 0.964 |
| Neural Network—Multilayer Perceptron (Min–Max normalization) | 0.904 | 0.015 | False | 0.784 | 0.867 | 0.86 |
| | | | True | 0.976 | 0.952 | 0.927 |
| Neural Network—Multilayer Perceptron (Z-score normalization) | 0.919 | 0.015 | False | 0.941 | 0.857 | 0.897 |
| | | | True | 0.905 | 0.962 | 0.933 |
| Random Forest (Min–Max normalization, Gini Index) | 0.911 | 0.021 | False | 0.824 | 0.933 | 0.875 |
| | | | True | 0.964 | 0.9 | 0.931 |
| Random Forest (Z-score normalization, Gini Index) | 0.948 | 0.013 | False | 0.941 | 0.923 | 0.932 |
| | | | True | 0.952 | 0.964 | 0.958 |
| Random Forest (Min–Max normalization, Information Gain Ratio) | 0.881 | 0.008 | False | 0.745 | 0.927 | 0.826 |
| | | | True | 0.964 | 0.862 | 0.91 |
| Random Forest (Z-score normalization, Information Gain Ratio) | 0.963 | 0.008 | False | 0.961 | 0.942 | 0.951 |
| | | | True | 0.964 | 0.976 | |
| Naïve Bayes (Z-score normalization) | 0.622 | 0.0 | False | 0 | – | – |
| | | | True | 1 | 0.622 | 0.767 |
| Naïve Bayes (Min–Max normalization) | 0.378 | 0.005 | False | 1 | 0.378 | 0.548 |
| | | | True | 0 | – | – |
| Logistic Regression (Min–Max normalization) | 0.889 | 0.025 | False | 0.745 | 0.95 | 0.835 |
| | | | True | 0.976 | 0.863 | 0.916 |
| Logistic Regression (Z-score normalization) | 0.956 | 0.019 | False | 0.961 | 0.925 | 0.942 |
| | | | True | 0.952 | 0.976 | 0.964 |
The optimal model is denoted in bold font
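The Accuracy and StdDev columns suggest repeated evaluation across folds, with Sensitivity, Precision and F-measure reported separately for the False and True delay classes. The paper builds its models in KNIME; the sketch below is only an illustrative scikit-learn equivalent of such a per-class evaluation, with the fold count (10) and the example estimator being assumptions rather than details from the paper:

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold
from sklearn.metrics import precision_recall_fscore_support
from sklearn.tree import DecisionTreeClassifier

def evaluate(model, X, y, n_splits=10, seed=0):
    """Mean accuracy +/- std over folds, plus mean per-class sensitivity/precision/F-measure.
    X and y are assumed to be NumPy arrays with y encoded as 0 (no delay) / 1 (delay)."""
    accs, reports = [], []
    for train_idx, test_idx in StratifiedKFold(n_splits, shuffle=True, random_state=seed).split(X, y):
        model.fit(X[train_idx], y[train_idx])
        pred = model.predict(X[test_idx])
        accs.append((pred == y[test_idx]).mean())
        # Sensitivity is recall; labels=[0, 1] corresponds to the False/True rows of the table.
        prec, rec, f1, _ = precision_recall_fscore_support(
            y[test_idx], pred, labels=[0, 1], zero_division=0)
        reports.append((rec, prec, f1))
    rec, prec, f1 = (np.mean([r[i] for r in reports], axis=0) for i in range(3))
    return np.mean(accs), np.std(accs), rec, prec, f1

# Example with a decision tree; any of the five algorithms can be passed in:
# acc, std, sensitivity, precision, f_measure = evaluate(DecisionTreeClassifier(), X, y)
```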
Performance evaluation of optimal Machine Learning models
| Machine Learning Technique | Accuracy | Delay status | Sensitivity | Precision | F-measure |
|---|---|---|---|---|---|
| Decision Tree (Gini Index) | 0.935 | False | 0.917 | 0.917 | 0.917 |
| | | True | 0.947 | 0.947 | 0.947 |
| Neural Network—Multilayer Perceptron (Z-score normalization) | 0.968 | False | 1 | 0.923 | 0.96 |
| | | True | 0.947 | 1 | 0.973 |
| Random Forest (Z-score normalization, Information Gain Ratio) | 0.935 | False | 0.917 | 0.917 | 0.917 |
| | | True | 0.947 | 0.947 | 0.947 |
| Naïve Bayes (Z-score normalization) | 0.613 | False | 0 | – | – |
| | | True | 1 | 0.613 | 0.76 |
| Logistic Regression (Z-score normalization) | 0.968 | False | 1 | 0.923 | 0.96 |
| | | True | 0.947 | 1 | 0.973 |
The optimal model is denoted in bold font
Hyper-parameters used in the Machine learning models
| Machine learning model | Hyper-parameters |
|---|---|
| Decision tree | Quality measure / split criterion: Gain Ratio |
| | Pruning method: no pruning |
| Neural network—multilayer perceptron | Number of hidden layers: 1 |
| | Number of hidden neurons per layer: 10 |
| | Maximum number of iterations: 100 |
| Random forest | Quality measure / split criterion: Information Gain Ratio |
| | Number of models: 100 (static random seed) |
| Naïve Bayes | Default probability: 0.0001 |
| | Minimum standard deviation: 0.0001 |
| | Threshold standard deviation: 0.0 |
| Logistic regression | Solver: Stochastic average gradient |
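For orientation only, the sketch below instantiates roughly comparable scikit-learn estimators with these hyper-parameters. The mapping is approximate and assumed rather than taken from the paper: scikit-learn offers no Gain Ratio split criterion (entropy is used as the nearest stand-in), GaussianNB exposes no default-probability or standard-deviation thresholds, the random seed value is illustrative, and `solver="sag"` is assumed to correspond to the stochastic average gradient solver.

```python
from sklearn.tree import DecisionTreeClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.linear_model import LogisticRegression

models = {
    # Gain Ratio is not available; entropy is the closest split criterion.
    # Default settings (no max_depth, ccp_alpha=0.0) approximate "no pruning".
    "Decision tree": DecisionTreeClassifier(criterion="entropy", ccp_alpha=0.0),
    # One hidden layer with 10 neurons, at most 100 training iterations.
    "Multilayer perceptron": MLPClassifier(hidden_layer_sizes=(10,), max_iter=100),
    # 100 trees with a fixed seed (value illustrative); Information Gain Ratio
    # again approximated by entropy.
    "Random forest": RandomForestClassifier(n_estimators=100, criterion="entropy",
                                            random_state=42),
    # GaussianNB has no direct equivalent of the default-probability / std-dev settings.
    "Naive Bayes": GaussianNB(),
    # "sag" = stochastic average gradient solver.
    "Logistic regression": LogisticRegression(solver="sag", max_iter=1000),
}
```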