Abstract
The negative impact of absenteeism on organizations' productivity and profitability is well established. To decrease absenteeism, it is imperative to understand its underlying causes and to identify susceptible employee subgroups. Most studies apply hypothesis testing and regression models to identify features that are correlated with absenteeism; typically, these models are limited to finding simple correlations. We illustrate the use of interpretable classification algorithms for uncovering subgroups of employees with common characteristics and a similar level of absenteeism. This process may help human resource managers understand the underlying reasons for absenteeism, which, in turn, could stimulate measures to decrease it. Our proposed methodology uses an objective-based information gain measure in conjunction with an ordinal CART model. Our results indicate that the ordinal CART model outperforms conventional classifiers and, more importantly, identifies patterns in the data that other models did not reveal. We demonstrate the importance of interpretability for human resource management through three examples. The main contributions of this research are (1) the development of an information-based ordinal classifier for a published absenteeism dataset and (2) the illustration of an interpretable approach that could be of considerable value in supporting human resource management decision-making.
Keywords: absenteeism; decision tree; human resource management; information gain; interpretable machine learning models; ordinal classification
Year: 2020 PMID: 33286593 PMCID: PMC7517405 DOI: 10.3390/e22080821
Source DB: PubMed Journal: Entropy (Basel) ISSN: 1099-4300 Impact factor: 2.524
Description of the dataset’s features.
| Feature Name | Feature Type | Possible Values (for Nominal Variables) |
|---|---|---|
| ID | Numerical | |
| Reason for absence | Categorical | 21 categories according to the International Classification of Diseases (ICD) |
| Month of absence | Categorical | 1-January 2-February 3-March 4-April 5-May 6-June 7-July 8-August 9-September 10-October 11-November 12-December |
| Day of the week | Categorical | 2-Monday 3-Tuesday 4-Wednesday 5-Thursday 6-Friday |
| Season | Categorical | 1-summer 2-autumn 3-winter 4-spring |
| Transportation expense | Numerical | |
| Distance from residence to work (km) | Numerical | |
| Service time | Numerical | |
| Age | Numerical | |
| Workload (average daily) | Numerical | |
| Hit target | Numerical | |
| Disciplinary failure | Categorical | 1-yes 2-no |
| Education | Categorical | 1-high school 2-graduate 3-postgraduate 4-master/doctor |
| # of children | Numerical | |
| Social drinker | Categorical | 1-yes 2-no |
| Social smoker | Categorical | 1-yes 2-no |
| # of pets | Numerical | |
| Weight | Numerical | |
| Height | Numerical | |
| Body mass index | Numerical | |
| Absenteeism (hours) | Numerical | |
Categorization of absenteeism classes.
| Absenteeism Hours (h) | Absenteeism Class | Class Number | Share of Instances |
|---|---|---|---|
| 0 | not absent | 1 | 6% |
| 0 < h < 8 | Hours | 2 | 57% |
| 8 ≤ h … | Days | 3 | 34% |
| … | Weeks | 4 | 3% |
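The categorization above can be sketched as a small mapping function. This is a minimal illustration, not the authors' code: the 0-hour and 8-hour boundaries come from the table, while the boundary between "days" and "weeks" is not recoverable from it, so the sketch takes it as a required, caller-supplied parameter (`weeks_threshold`, a hypothetical name). Whether that boundary is inclusive is also an assumption.

```python
def absenteeism_class(hours: float, weeks_threshold: float) -> str:
    """Map an absence duration in hours to one of the four ordinal classes.

    The 0 and 8 hour boundaries follow the table. `weeks_threshold` marks
    where "days" ends and "weeks" begins; its value is NOT given in the
    table and must be supplied by the caller (assumption: the threshold
    itself belongs to "weeks").
    """
    if hours < 0:
        raise ValueError("hours must be non-negative")
    if hours == 0:
        return "not absent"
    if hours < 8:
        return "hours"
    if hours < weeks_threshold:
        return "days"
    return "weeks"


# Ordinal codes as listed in the table (1 = lowest absenteeism level).
ORDINAL_CODE = {"not absent": 1, "hours": 2, "days": 3, "weeks": 4}

# Usage with a purely hypothetical threshold of 40 hours.
labels = [absenteeism_class(h, weeks_threshold=40) for h in (0, 3, 8, 120)]
codes = [ORDINAL_CODE[label] for label in labels]
print(labels, codes)
```

Keeping the threshold explicit makes it easy to re-run the categorization under a different definition of "weeks".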
Distribution of training dataset classes before and after Synthetic Minority Oversampling Technique (SMOTE) implementation.
| Dataset | Not Absent | Hours | Days | Weeks | Total Instances |
|---|---|---|---|---|---|
| Training before SMOTE | 6% | 57% | 34% | 3% | 592 |
| Training after SMOTE | 25% | 25% | 25% | 25% | 1360 |
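The totals in this table can be cross-checked with a little arithmetic. The sketch below assumes that SMOTE was applied in its common configuration, in which every minority class is oversampled up to the size of the majority class; the source does not state the exact settings, so treat this as a consistency check rather than a description of the authors' pipeline.

```python
N_CLASSES = 4          # not absent, hours, days, weeks
TOTAL_BEFORE = 592     # training instances before SMOTE (from the table)
TOTAL_AFTER = 1360     # training instances after SMOTE (from the table)

# After balancing, every class holds the same number of instances.
per_class_after = TOTAL_AFTER // N_CLASSES

# Assumption: the majority class ("hours") is left untouched by SMOTE,
# so its size before oversampling equals the balanced per-class size.
majority_share_before = per_class_after / TOTAL_BEFORE

# Synthetic instances generated across the three minority classes.
synthetic_total = TOTAL_AFTER - TOTAL_BEFORE

print(per_class_after, round(majority_share_before * 100), synthetic_total)
```

Under that assumption the implied majority share before oversampling rounds to the 57% reported for the "hours" class, so the table is internally consistent.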
Entropy and objective-based entropy (OBE) measures with selected statistics and for two different probability distributions of the absenteeism classes (“not absent”, “hours”, “days”, and “weeks”).
| Probability Distribution | Entropy | | |
|---|---|---|---|
| (0.6,0.3,0,0.1) | 1.30 | 0.43 | 0.25 |
| (0.6,0.1,0,0.3) | 1.30 | 0.38 | 0.36 |
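The identical value of 1.30 in both rows is what standard Shannon entropy (base 2) gives for these two distributions: the second is a reordering of the first, and entropy ignores the order of the classes. The sketch below reproduces only that column; it does not attempt to reproduce the remaining columns, whose definitions are not given here.

```python
import math


def shannon_entropy(probs):
    """Shannon entropy in bits; zero-probability classes contribute nothing."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


# Class order: ("not absent", "hours", "days", "weeks")
dist_a = (0.6, 0.3, 0.0, 0.1)
dist_b = (0.6, 0.1, 0.0, 0.3)

h_a = shannon_entropy(dist_a)
h_b = shannon_entropy(dist_b)
print(round(h_a, 2), round(h_b, 2))
```

Because the two distributions place different mass on the most distant class ("weeks"), a measure that respects class order can separate them even though plain entropy cannot.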
Average performance measures of different learning models for the absenteeism at work dataset.
| Model | F-score | Precision | Recall | Accuracy | AUC | MSE | |
|---|---|---|---|---|---|---|---|
| Extreme Gradient Boosting (XGBoost) | 0.69 | 0.72 | 0.68 | 0.68 | 0.73 | 0.32 | 0.52 |
| Multi-Layer Perceptron (MLP) | 0.42 | 0.33 | 0.57 | 0.57 | 0.50 | 0.49 | 0.40 |
| K-Nearest Neighbor | 0.56 | 0.56 | 0.56 | 0.56 | 0.60 | 0.58 | 0.35 |
| Naïve Bayes | 0.41 | 0.54 | 0.34 | 0.34 | 0.56 | 1.46 | 0.02 |
| Random Forest (RF) | 0.67 | 0.68 | 0.67 | 0.67 | 0.70 | 0.35 | 0.51 |
| CART | 0.66 | 0.66 | 0.66 | 0.66 | 0.69 | 0.36 | 0.41 |
| Ordinal CART | 0.69 | 0.70 | 0.69 | 0.69 | 0.72 | | 0.53 |
| Ordinal CART | | | | | | 0.34 | |
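Reporting MSE next to accuracy is useful for ordinal targets because accuracy counts every misclassification equally, whereas a squared-error measure penalizes predictions that land far from the true class. The sketch below illustrates this on made-up labels. It assumes that MSE is computed on the integer class codes 1 to 4; the source does not spell out how its MSE column was computed, and the label vectors are hypothetical.

```python
def accuracy(y_true, y_pred):
    """Fraction of exactly correct predictions."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def ordinal_mse(y_true, y_pred):
    """Mean squared difference between integer class codes (assumed 1..4)."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical labels: 1 = not absent, 2 = hours, 3 = days, 4 = weeks.
y_true = [1, 2, 2, 3, 4]
near_miss = [1, 2, 3, 3, 4]  # one error, off by one class
far_miss = [1, 2, 2, 3, 1]   # one error, off by three classes

for name, y_pred in (("near miss", near_miss), ("far miss", far_miss)):
    print(name, accuracy(y_true, y_pred), ordinal_mse(y_true, y_pred))
```

Both prediction vectors have the same accuracy, but the one that confuses "weeks" with "not absent" has a much larger MSE.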
Figure 1. A comparative graph of Area Under the Curve (AUC) values (y-axis) for different learning models as a function of absenteeism classes (x-axis).
Figure 2. Relationship between age and absenteeism at work for different subgroups of employees. The LHS and RHS, respectively, show (i) a "simple" partition by age and (ii) a series of patterns revealed by our ordinal CART model.
Figure 3. Relationship between body characteristics and absenteeism at work for different subgroups of employees. The LHS and RHS, respectively, show (i) simple partitions by height and BMI and (ii) a more refined series of patterns revealed by the ordinal CART model.
Figure 4. Relationship between workload and absenteeism at work for different subgroups of employees. The LHS and RHS, respectively, show (i) a simple partition by workload and (ii) a more refined series of patterns revealed by the ordinal CART model.
Figure 5. A mechanism for guiding the selection and development of intervention programs for employee subgroups.