Jorge Gallego, Gonzalo Rivero, Juan Martínez.
Abstract
Is it possible to predict malfeasance in public procurement? With the proliferation of e-procurement systems in the public sector, anti-corruption agencies and watchdog organizations have access to valuable sources of information with which to identify transactions that are likely to become troublesome and why. In this article, we discuss the promises and challenges of using machine learning models to predict inefficiency and corruption in public procurement. We illustrate this approach with a dataset with more than two million public procurement contracts in Colombia. We trained machine learning models to predict which of them will result in corruption investigations, a breach of contract, or implementation inefficiencies. We then discuss how our models can help practitioners better understand the drivers of corruption and inefficiency in public procurement. Our approach will be useful to governments interested in exploiting large administrative datasets to improve the provision of public goods, and it highlights some of the tradeoffs and challenges that they might face throughout this process.
Keywords: Corruption; Forecasting; Inefficiency; Machine learning; Public procurement
Year: 2020 PMID: 32836592 PMCID: PMC7368425 DOI: 10.1016/j.ijforecast.2020.06.006
Source DB: PubMed Journal: Int J Forecast ISSN: 0169-2070
Overlap between the data provided by the Contraloría and Confecámaras.
| | In | Not in |
|---|---|---|
| In | 83 | 38,677 |
| Not in | 23,505 | 2,179,006 |
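A 2x2 overlap table of this kind can be tallied from two sets of flagged contract identifiers. The sketch below is illustrative only: the function name `overlap_counts`, the toy identifiers, and the set names are assumptions, not the authors' code. It also sums the four reported cells to recover the total number of contracts.

```python
def overlap_counts(all_ids, source_a, source_b):
    """Count contracts by membership in two flag lists (hypothetical helper)."""
    counts = {(a, b): 0 for a in (True, False) for b in (True, False)}
    for cid in all_ids:
        counts[(cid in source_a, cid in source_b)] += 1
    return counts

# Toy data: ten contracts, three flagged by one source and two by the other.
all_ids = range(1, 11)
source_a = {1, 2, 3}
source_b = {3, 4}
toy = overlap_counts(all_ids, source_a, source_b)

# Cells of the overlap table as reported above.
reported_cells = [83, 38_677, 23_505, 2_179_006]
total_contracts = sum(reported_cells)
print(toy, total_contracts)
```

The four reported cells sum to 2,241,271 contracts.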
Values used for the grid search in hyper-parameter optimization.
| Model | Parameter name | Values |
|---|---|---|
| Lasso | | |
| GBM | Number of trees | 30 regularly-spaced points between 10 and 3,000 |
| | Maximum tree depth | |
| | Shrinkage | 0.05, 0.01 |
| | Minimum node size | 25 |
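As a rough illustration of how the GBM grid above could be enumerated, the sketch below builds the Cartesian product of the listed values. The helper `linspace_int`, the dictionary keys, and the rounding of tree counts to integers are assumptions. The values for maximum tree depth and for the Lasso are not given in the table, so they are left out.

```python
from itertools import product

def linspace_int(start, stop, num):
    """Return `num` regularly spaced integers from start to stop, inclusive."""
    step = (stop - start) / (num - 1)
    return [round(start + i * step) for i in range(num)]

# Values taken from the table; rounding to whole trees is an assumption.
n_trees = linspace_int(10, 3000, 30)
shrinkage = [0.05, 0.01]
min_node_size = [25]

# One dictionary per candidate configuration (key names are hypothetical).
grid = [
    {"n_trees": t, "shrinkage": s, "min_node_size": m}
    for t, s, m in product(n_trees, shrinkage, min_node_size)
]
print(len(grid))
```

With these three parameters alone the grid has 60 configurations; any tree-depth values would multiply that count.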
Number of observations in each of the outcomes.
| Negatives | 2,217,692 | 2,202,513 | 1,989,784 |
| Positives | 23,579 | 38,758 | 251,487 |
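The degree of class imbalance follows directly from these counts. The short sketch below computes the share of positives per outcome column. Because the column headers did not survive, the columns are referred to by position only, and the variable names are assumptions.

```python
# Counts copied from the table above, in column order.
negatives = [2_217_692, 2_202_513, 1_989_784]
positives = [23_579, 38_758, 251_487]

totals = [n + p for n, p in zip(negatives, positives)]
shares = [p / t for p, t in zip(positives, totals)]

for col, (total, share) in enumerate(zip(totals, shares), start=1):
    print(f"column {col}: {total:,} contracts, {share:.2%} positive")
```

Each column sums to the same 2,241,271 contracts, with positives making up roughly 1.05%, 1.73%, and 11.22% of the observations.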
Fig. 1. ROC curves for all models using the three outcome variables.
Fig. 2. Precision–recall curves for all models using the three outcome variables.
Performance of the models.
| Model | Outcome | Unbalance | | MAP100 | MAP1000 | NDCG100 | NDCG1000 | Brier |
|---|---|---|---|---|---|---|---|---|
| GBM | Confecámaras | Raw | No | 0.23 | 0.11 | 0.64 | 0.57 | 0.05 |
| GBM | Confecámaras | Upsample | No | 0.28 | 0.11 | 0.66 | 0.57 | 0.01 |
| GBM | Contraloría | Raw | No | 0.61 | 0.22 | 0.83 | 0.63 | 0.06 |
| GBM | Contraloría | Upsample | No | 0.91 | 0.23 | 0.97 | 0.65 | 0.01 |
| GBM | Extension | Raw | No | 0.79 | 0.49 | 0.91 | 0.76 | 0.20 |
| GBM | Extension | Upsample | No | 0.94 | 0.57 | 0.98 | 0.80 | 0.08 |
| Lasso | Confecámaras | Raw | No | 0.20 | 0.11 | 0.62 | 0.56 | 0.05 |
| Lasso | Confecámaras | Upsample | No | 0.22 | 0.12 | 0.64 | 0.57 | 0.01 |
| Lasso | Contraloría | Raw | No | 0.22 | 0.13 | 0.64 | 0.57 | 0.09 |
| Lasso | Contraloría | Upsample | No | 0.32 | 0.13 | 0.69 | 0.58 | 0.02 |
| Lasso | Extension | Raw | No | 0.56 | 0.33 | 0.79 | 0.68 | 0.23 |
| Lasso | Extension | Upsample | No | 0.59 | 0.39 | 0.80 | 0.71 | 0.09 |
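For readers unfamiliar with the ranking metrics in this table, the sketch below shows one common way to compute average precision at k, NDCG at k with binary relevance, and the Brier score. The exact definitions used by the authors are not stated here, so the normalization choices (for example, dividing average precision by the number of hits found in the top k) and all function names are assumptions.

```python
import math

def rank_labels(scores, labels):
    """Return labels sorted by descending score (highest predicted risk first)."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return [labels[i] for i in order]

def average_precision_at_k(scores, labels, k):
    """Average precision over the top k, normalized by the hits found there."""
    hits, total = 0, 0.0
    for rank, y in enumerate(rank_labels(scores, labels)[:k], start=1):
        if y == 1:
            hits += 1
            total += hits / rank
    return total / hits if hits else 0.0

def ndcg_at_k(scores, labels, k):
    """NDCG at k with binary relevance and a log2 rank discount."""
    ranked = rank_labels(scores, labels)[:k]
    dcg = sum(y / math.log2(rank + 1) for rank, y in enumerate(ranked, start=1))
    n_ideal = min(k, sum(labels))  # an ideal ranking puts all positives first
    idcg = sum(1 / math.log2(rank + 1) for rank in range(1, n_ideal + 1))
    return dcg / idcg if idcg else 0.0

def brier_score(probs, labels):
    """Mean squared difference between predicted probability and outcome."""
    return sum((p - y) ** 2 for p, y in zip(probs, labels)) / len(labels)

# Toy example: four contracts, two of which are true positives.
scores = [0.9, 0.8, 0.3, 0.1]
labels = [1, 0, 1, 0]
print(average_precision_at_k(scores, labels, 3))
print(ndcg_at_k(scores, labels, 3))
print(brier_score(scores, labels))
```

Under these definitions, higher MAP and NDCG values mean that true positives are concentrated near the top of the ranking, while a lower Brier score indicates better-calibrated probabilities.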
Fig. 3. ROC curves for all models using the three outcome variables.
Fig. 4. Precision–recall curves for all models using the three outcome variables.
Performance of the models.
| Model | Outcome | Unbalance | | MAP100 | MAP1000 | NDCG100 | NDCG1000 | Brier |
|---|---|---|---|---|---|---|---|---|
| GBM | Confecámaras | Raw | No | 0.18 | 0.07 | 0.62 | 0.54 | 0.02 |
| GBM | Confecámaras | Upsample | No | 0.25 | 0.07 | 0.66 | 0.55 | 0.00 |
| GBM | Confecámaras | Raw | Yes | 0.17 | 0.06 | 0.60 | 0.54 | 0.02 |
| GBM | Confecámaras | Upsample | Yes | 0.22 | 0.06 | 0.65 | 0.54 | 0.00 |
| GBM | Contraloría | Raw | No | 0.73 | 0.32 | 0.87 | 0.68 | 0.06 |
| GBM | Contraloría | Upsample | No | 0.92 | 0.36 | 0.95 | 0.71 | 0.01 |
| GBM | Contraloría | Raw | Yes | 0.77 | 0.30 | 0.90 | 0.68 | 0.08 |
| GBM | Contraloría | Upsample | Yes | 0.92 | 0.27 | 0.97 | 0.67 | 0.01 |
| GBM | Extension | Raw | No | 0.55 | 0.47 | 0.78 | 0.74 | 0.13 |
| GBM | Extension | Upsample | No | 0.73 | 0.52 | 0.88 | 0.77 | 0.05 |
| GBM | Extension | Raw | Yes | 0.68 | 0.41 | 0.85 | 0.72 | 0.10 |
| GBM | Extension | Upsample | Yes | 0.82 | 0.43 | 0.92 | 0.74 | 0.05 |
| Lasso | Confecámaras | Raw | No | 0.15 | 0.05 | 0.58 | 0.53 | 0.02 |
| Lasso | Confecámaras | Upsample | No | 0.20 | 0.06 | 0.62 | 0.54 | 0.00 |
| Lasso | Confecámaras | Raw | Yes | 0.08 | 0.04 | 0.55 | 0.52 | 0.02 |
| Lasso | Confecámaras | Upsample | Yes | 0.11 | 0.04 | 0.56 | 0.52 | 0.00 |
| Lasso | Contraloría | Raw | No | 0.28 | 0.21 | 0.65 | 0.61 | 0.10 |
| Lasso | Contraloría | Upsample | No | 0.50 | 0.25 | 0.75 | 0.64 | 0.02 |
| Lasso | Contraloría | Raw | Yes | 0.52 | 0.18 | 0.76 | 0.60 | 0.07 |
| Lasso | Contraloría | Upsample | Yes | 0.55 | 0.18 | 0.82 | 0.61 | 0.01 |
| Lasso | Extension | Raw | No | 0.40 | 0.32 | 0.73 | 0.67 | 0.15 |
| Lasso | Extension | Upsample | No | 0.49 | 0.34 | 0.77 | 0.68 | 0.05 |
| Lasso | Extension | Raw | Yes | 0.51 | 0.29 | 0.75 | 0.66 | 0.13 |
| Lasso | Extension | Upsample | Yes | 0.55 | 0.30 | 0.78 | 0.67 | 0.05 |
Fig. 5. Distribution of variable importance.
Fig. 6. Important variables.
Fig. 7. Partial effects: Size of the contract.
Fig. 8. Partial effects: Location of the contract.