| Literature DB >> 32501316 |
Dana Pessach, Gonen Singer, Dan Avrahami, Hila Chalutz Ben-Gal, Erez Shmueli, Irad Ben-Gal.
Abstract
In this paper, we propose a comprehensive analytics framework that can serve as a decision support tool for HR recruiters in real-world settings in order to improve hiring and placement decisions. The proposed framework follows two main phases: a local prediction scheme for recruitments' success at the level of a single job placement, and a mathematical model that provides a global recruitment optimization scheme for the organization, taking into account multilevel considerations. In the first phase, a key property of the proposed prediction approach is the interpretability of the machine learning (ML) model, which in this case is obtained by applying the Variable-Order Bayesian Network (VOBN) model to the recruitment data. Specifically, we used a uniquely large dataset that contains recruitment records of hundreds of thousands of employees over a decade and represents a wide range of heterogeneous populations. Our analysis shows that the VOBN model can provide both high accuracy and interpretability insights to HR professionals. Moreover, we show that using the interpretable VOBN can lead to unexpected and sometimes counter-intuitive insights that might otherwise be overlooked by recruiters who rely on conventional methods. We demonstrate that it is feasible to predict the successful placement of a candidate in a specific position at a pre-hire stage and utilize predictions to devise a global optimization model. Our results show that in comparison to actual recruitment decisions, the devised framework is capable of providing a balanced recruitment plan while improving both diversity and recruitment success rates, despite the inherent trade-off between the two.
Keywords: Explainable artificial intelligence; Human resource analytics; Interpretable AI; Machine learning; Mathematical programming; Recruitment
Year: 2020 PMID: 32501316 PMCID: PMC7252110 DOI: 10.1016/j.dss.2020.113290
Source DB: PubMed Journal: Decis Support Syst ISSN: 0167-9236 Impact factor: 5.795
Fig. 1 Literature review based on the functional dimension.
Formulation notations.
| Input parameters |
|---|
| The set of candidates |
| The set of open positions (jobs) |
| Equals 1 if a candidate is assigned to a given position, and 0 otherwise |
| Probability that a candidate will succeed in a given position |
| Number of open jobs in a position |
| Value of a successful recruitment to a position |
| Set of candidate class types |
| Equals 1 if a candidate belongs to a given class, and 0 otherwise |
| Required proportion of workers of a given class |
| A parameter that balances the accuracy and demand objectives |
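To make the table concrete, the following is a minimal sketch of an assignment model built from these parameters. The symbols used here ($x_{ij}$ for the assignment indicator, $p_{ij}$ for the success probability, $d_j$ for the open jobs, $v_j$ for the position value, $a_{ik}$ for class membership, $PR_k$ for the required proportion, and $\lambda$ for the balancing parameter) are illustrative stand-ins for the rows above; the exact objectives and constraints are those of Formulation 1 and Formulation 2 in the paper, which may differ from this sketch.

```latex
\begin{align*}
\max_{x}\quad & \sum_{i}\sum_{j} p_{ij}\, v_{j}\, x_{ij}
              \;-\; \lambda \sum_{j}\Big(d_{j} - \sum_{i} x_{ij}\Big)
              && \text{(recruitment value vs.\ unmet demand)}\\
\text{s.t.}\quad
  & \sum_{j} x_{ij} \le 1 \quad \forall i
              && \text{(a candidate fills at most one position)}\\
  & \sum_{i} x_{ij} \le d_{j} \quad \forall j
              && \text{(no more hires than open jobs)}\\
  & \sum_{i} a_{ik}\, x_{ij} \ge PR_{k} \sum_{i} x_{ij} \quad \forall j,\, k
              && \text{(required class proportions, i.e.\ diversity)}\\
  & x_{ij} \in \{0,1\} \quad \forall i,\, j
\end{align*}
```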
Dimensions addressed by Formulation 1 and Formulation 2.
| Dimension view | Demand | Accuracy | Diversity |
|---|---|---|---|
| Position | √ | √ | |
| Organization - total value | √ | √ | √ |
| Organization - balance across business units | √ | √ | |
Fig. 2 Predicted probabilities of success of assigning sixteen candidates of two types of populations to four positions. The entries are color-coded by the success probability values, green - high probability, red - low probability. (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)
Fig. 3 Assignment of candidates to positions by four different solutions. For example, solution 1 (marked in red) suggests the following: i) recruiting 4 candidates to position 1409; ii) recruiting 6 candidates to position 1509; iii) recruiting 6 candidates to position 379 (note that none of them are of type 1); and iv) not recruiting any of the candidates to position 40. (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)
Illustrative example results. Entropy is used as a suitable measure for diversity in the case of more than two candidate types.
| Solution # | Description | Demand | | Diversity | | Average success probability |
|---|---|---|---|---|---|---|
| | | # of completely unassigned positions | | Minimum proportion of type 1 population | Average position entropy | |
| 1 | – | 1 | 6 | 0 | 0.413 | 0.756 |
| 2 | – | 0 | 2 | 0 | 0.703 | 0.711 |
| 3 | – | 0 | 2 | 0.25 | 0.858 | 0.701 |
| 4 | – | 0 | 3 | 0.33 | 0.939 | 0.667 |
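Since each solution is scored by average position entropy (diversity) and average success probability (accuracy), the following is a minimal sketch of how such scores can be computed for a given assignment. The data structures and function names are illustrative and are not the authors' code.

```python
import math

def position_entropy(type_counts):
    """Shannon entropy (base 2) of the candidate-type mix assigned to one position."""
    total = sum(type_counts)
    if total == 0:
        return 0.0
    probs = [c / total for c in type_counts if c > 0]
    return -sum(p * math.log2(p) for p in probs)

def score_assignment(assignment, success_prob, candidate_type, n_types):
    """Average position entropy (diversity) and average success probability (accuracy).
    assignment: dict position -> list of candidate ids
    success_prob: dict (candidate, position) -> predicted success probability
    candidate_type: dict candidate -> type index in range(n_types)"""
    entropies, probs = [], []
    for pos, candidates in assignment.items():
        counts = [0] * n_types
        for c in candidates:
            counts[candidate_type[c]] += 1
            probs.append(success_prob[(c, pos)])
        entropies.append(position_entropy(counts))
    return sum(entropies) / len(entropies), sum(probs) / len(probs)
```

With two candidate types, the per-position entropy ranges from 0 (a single type) to 1 (a 50/50 mix), so a higher average position entropy corresponds to a more balanced assignment, consistent with the rising diversity and falling success probability across solutions 1 to 4.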
Feature summary (after data preparation procedures).
| Feature cluster | Lifestyle | Family | Interview and test scores | Special interview scores | Education | Position | Nationality | Language | Residence | Culture | Background record | Gender | Age |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| # of Features | 62 | 30 | 29 | 14 | 7 | 5 | 5 | 5 | 2 | 2 | 1 | 1 | 1 |
| Avg. rank by GBM importance | 12 | 11 | 2 | 9 | 3 | 1 | 7 | 8 | 5 | 6 | 4 | 10 | 13 |
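As an illustration of how the last row of the feature summary could be produced, here is a minimal sketch that averages per-feature importance ranks within each feature cluster and then ranks the clusters. It assumes a fitted scikit-learn-style model exposing `feature_importances_`; the aggregation rule is one plausible reading of "Avg. rank by GBM importance", not necessarily the authors' exact computation.

```python
import numpy as np
from scipy.stats import rankdata

def cluster_importance_ranks(importances, feature_cluster):
    """importances: per-feature importances (e.g. model.feature_importances_).
    feature_cluster: list assigning each feature to a cluster name.
    Returns each cluster's rank (1 = most important), based on the average
    per-feature importance rank within the cluster."""
    feat_ranks = rankdata(-np.asarray(importances), method="average")  # 1 = most important feature
    clusters = sorted(set(feature_cluster))
    avg_rank = {c: feat_ranks[[fc == c for fc in feature_cluster]].mean() for c in clusters}
    order = rankdata([avg_rank[c] for c in clusters], method="ordinal")  # 1 = best average rank
    return dict(zip(clusters, order.astype(int)))
```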
Target definitions by HR department.
| Employment status | Completed expected time in position (Position dependent) | Reason for leaving | HR term | Target feature label |
|---|---|---|---|---|
| Left the organization | Yes | “Natural” reasons | Turnover | Successful recruitment |
| Left the organization | Yes | Negative reasons | Turnover | Unsuccessful recruitment |
| Left the organization | No | Negative reasons | Turnover | Unsuccessful recruitment |
| Employed in the organization | Yes | – | Retention | Successful recruitment |
| Employed in the organization | No | – | – | – |
| Employed in the organization | No | Promotion or job enrichment | Position change - promotion | Successful recruitment |
| Employed in the organization | No | Negative reasons | Position change - demotion | Unsuccessful recruitment |
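The table above is essentially a labeling rule. Below is a minimal sketch of one way to encode it; the field values, string labels, and the `None` return for the unresolved row are illustrative, not taken from the paper.

```python
def label_recruitment(status, completed_expected_time, reason):
    """Map HR record fields to the binary target used for training.
    status: 'left' or 'employed'; reason: category as in the table (illustrative strings)."""
    if status == "left":
        # Leaving for "natural" reasons after the expected time in position counts as success.
        if completed_expected_time and reason == "natural":
            return "successful"
        return "unsuccessful"
    # Still employed in the organization.
    if completed_expected_time:
        return "successful"
    if reason in ("promotion", "job enrichment"):
        return "successful"
    if reason == "negative":
        return "unsuccessful"
    return None  # too early to tell; excluded from the training set
```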
Evaluation of models.
| Model | Explainability/interpretability | AUC over all test samples | Number of positions with AUC > 0.7 | Average rank over all positions (1 = highest) |
|---|---|---|---|---|
| GBM (gradient boosting) | No | 0.730 | 174 | 3.07 |
| RF (random forest) | No | 0.719 | 164 | 3.39 |
| VOBN (variable-order Bayesian networks) | Yes | 0.705 | 145 | 3.78 |
| LR (logistic regression) | Partial | 0.700 | 129 | 4.30 |
| SVM (support vector machine) | No | 0.697 | 100 | 5.16 |
| C4.5 (J48) | Yes | 0.682 | 103 | 5.37 |
| CHAID | Yes | 0.681 | 105 | 5.12 |
| Naive Bayes | Yes | 0.677 | 80 | 5.81 |
| CART | Yes | 0.644 | 7 | 7.92 |
Note that both the RF and GBM models and their implementations are generally robust to noisy and high-dimensional datasets, since they base their decisions on multiple permutations of the dataset (see [56,66,67,68,69]). For the logistic regression and decision tree models, we implemented a feature-selection preprocessing step using information gain analysis (see [70]). For the SVM model, we used the built-in model as implemented in [71], which can deal with high dimensionality by testing different subsets of the data. In the VOBN model, there is a built-in preprocessing procedure that uses mutual information to identify the high-impact features (see Appendix A for further details).
We consider interpretable and non-interpretable models based on the classification presented in [72].
These results show the AUC for each position in the organization. The AUC scores were calculated over all the candidates that were recruited and placed in specific positions.
Out of 456 positions.
For each position, the compared algorithms were ranked by the AUC score; the values in this column represent the average rank of each algorithm over all positions. A lower rank implies a better average AUC score.
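A minimal sketch of how the per-position AUC and the average rank per algorithm could be computed, assuming scikit-learn and SciPy are available and that labels, scores, and position ids are NumPy arrays of equal length. The function names are illustrative and this is not the authors' evaluation code.

```python
import numpy as np
from scipy.stats import rankdata
from sklearn.metrics import roc_auc_score

def per_position_auc(y_true, scores, positions):
    """AUC per position for one model; positions holds the position id of each sample."""
    aucs = {}
    for pos in np.unique(positions):
        mask = positions == pos
        # AUC is undefined if a position has only one class among its hires.
        if len(np.unique(y_true[mask])) == 2:
            aucs[pos] = roc_auc_score(y_true[mask], scores[mask])
    return aucs

def average_ranks(model_scores, y_true, positions):
    """model_scores: dict model name -> predicted probabilities (same sample order).
    Returns the average rank of each model over positions (1 = best AUC)."""
    per_model = {m: per_position_auc(y_true, s, positions) for m, s in model_scores.items()}
    common = set.intersection(*(set(a) for a in per_model.values()))
    models = list(per_model)
    ranks = {m: [] for m in models}
    for pos in common:
        aucs = np.array([per_model[m][pos] for m in models])
        pos_ranks = rankdata(-aucs, method="average")  # higher AUC -> lower rank number
        for m, r in zip(models, pos_ranks):
            ranks[m].append(r)
    return {m: float(np.mean(r)) for m, r in ranks.items()}
```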
Fig. 4 High analytical score effect on administrator position dropout rate over various subpopulations.
Fig. 5 The effect of competencies on the position dropout rate for all positions and for a specific field support position.
Fig. 6 The relationship between poor-skill levels and position dropout rates for male candidates in different business units.
Fig. 7 The potential effect of the candidates' cultural background (A or B) on the dropout rate for all positions and for a specific administrative office position.
Fig. 8 The effect of the oral language score on turnover changes for specific subpopulations of candidates.
Results for a yearly plan.
| Solution | Method | Diversity requirement (PR) | Accuracy: average predicted success probability | Accuracy: standard deviation over the mean probabilities of all positions | Diversity: minimum proportion of type 1 population | Diversity: average position entropy | Diversity: mean difference in accuracy between candidate groups |
|---|---|---|---|---|---|---|---|
| 1.1 | Actual selection | – | 0.7087 | 0.1602 | 0 | 0.718 | 9.67% |
| 2.1 | 0 | 0.7654 | 0.1434 | 0.0057 | 0.619 | 7.43% | |
| 2.2 | 0.1 | 0.7653 | 0.1431 | 0.1003 | 0.653 | 7.33% | |
| 2.3 | 0.2 | 0.7648 | 0.1427 | 0.2 | 0.700 | 6.07% | |
| 2.4 | 0.3 | 0.7638 | 0.1422 | 0.3005 | 0.780 | 4.86% | |
| 2.5 | 0.4 | 0.7623 | 0.1414 | 0.4 | 0.810 | 3.69% | |
| 2.6 | 0.5 | 0.7607 | 0.1407 | 0.5 | 0.864 | 2.33% | |
Fig. 9 Pareto efficiency for a yearly plan of the real-world scenario.
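As an illustration of the Pareto view in Fig. 9, here is a minimal sketch of how Pareto-efficient solutions can be identified from the (average position entropy, average predicted success probability) pairs reported in the yearly-plan table above. The helper function and dictionary are illustrative.

```python
def pareto_front(points):
    """Return the points not dominated by any other point; each point is
    (diversity, accuracy), and larger is better on both axes."""
    return [p for p in points
            if not any(q != p and q[0] >= p[0] and q[1] >= p[1] for q in points)]

# (average position entropy, average predicted success probability) per solution,
# copied from the yearly-plan table above.
solutions = {
    "1.1 actual": (0.718, 0.7087),
    "2.1": (0.619, 0.7654), "2.2": (0.653, 0.7653), "2.3": (0.700, 0.7648),
    "2.4": (0.780, 0.7638), "2.5": (0.810, 0.7623), "2.6": (0.864, 0.7607),
}
front = pareto_front(list(solutions.values()))
efficient = {name for name, p in solutions.items() if p in front}
```

With these numbers, solutions 2.1 to 2.6 are all Pareto-efficient (each trades some success probability for more diversity), while the actual selection 1.1 is dominated, for example by solution 2.4, which is better on both axes.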
Notations.
| Notation | Description |
|---|---|
| L | Depth of the complete and balanced tree |
| R | Minimal frequency of samples per leaf required for statistical significance |
| s | Pattern, defined by the series of variables of the parent node |
| sb | Pattern defining the descendant leaf, given by the series of variables of the parent node |
| x | The value of the target variable; in our case, x ∈ X = {turnoverFalse, turnoverTrue} |
| X | Finite set of values of the target variable, X = {turnoverFalse, turnoverTrue} |
| – | Number of samples with the value x for a given pattern |
| – | The (ideal) code length difference between the descendant node sb and its parent node s |
| – | The estimated conditional probability for getting the value x given the descendant-leaf pattern sb |
| – | The estimated conditional probability for getting the symbol x given the parent pattern s |
| d | The size of the finite set X |
| C | The pruning constant, tuned to process requirements (with a default value) |
| t | The pattern size of an examined node (depth of leaf). |
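For context, the code length difference referred to in the table is, in variable-order / context-tree models of this kind, commonly computed as below. This is a sketch of the standard criterion rather than a verbatim reproduction of the paper's appendix, and the symbols $n(x \mid \cdot)$, $\hat{P}(x \mid \cdot)$, and $\Delta$ are written out here only to connect the rows of the table.

```latex
\Delta(sb) \;=\; \sum_{x \in X} n(x \mid sb)\,
\log\frac{\hat{P}(x \mid sb)}{\hat{P}(x \mid s)}
```

The descendant leaf sb is then kept only if this difference exceeds a threshold that grows with the pruning constant C, the alphabet size d, and the sample size, and if the leaf meets the minimal frequency R; otherwise it is pruned and the shorter parent pattern s is used.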