Marcelo Oliveira Vasconcelos (1, 2), Luís Cavique (2, 3).
Abstract
This data article describes a dataset on corruption and potentially related variables, created by integrating eight different systems of the Brazilian federal government and the Federal District. It contains real data on civil servants and military personnel. To comply with GDPR legislation, attributes that could identify a person were removed, anonymizing the data.
Keywords: Corruption; Data enrichment; Imbalanced learning; Public administration; Risk
Year: 2021 PMID: 35028345 PMCID: PMC8741413 DOI: 10.1016/j.dib.2021.107768
Source DB: PubMed Journal: Data Brief ISSN: 2352-3409
Integer and numeric attributes.
| Dim | Attribute | Min | 25% | Median | 75% | Max | Mean | Std |
|---|---|---|---|---|---|---|---|---|
| E | Salary (BRL) | 0.01 | 4651.28 | 7866.74 | 1464.63 | 40,140.21 | 9473.91 | 9752.79 |
| E | SalaryMinusTax | 0.00 | 3455.90 | 5388.11 | 8111.12 | 30,162.96 | 6725.79 | 8178.93 |
| E | QtySIGRHOff | 0.00 | 0.00 | 0.00 | 1.00 | 20.00 | 0.75 | 1.61 |
| E | QtySIAPEOff | 0.00 | 0.00 | 0.00 | 0.00 | 8.00 | 0.37 | 0.99 |
| E | QtySIGRHSIAPEOff | 0.00 | 0.00 | 0.00 | 2.00 | 20.00 | 1.12 | 1.85 |
| E | QtySIGRHfunc | 0.00 | 1.00 | 1.00 | 2.00 | 13.00 | 1.42 | 0.87 |
| E | QtySIAPEfunc | 0.00 | 0.00 | 0.00 | 0.00 | 4.00 | 0.03 | 0.23 |
| E | QtySIGRHSIAPEfunc | 0.00 | 1.00 | 1.00 | 2.00 | 13.00 | 1.45 | 0.91 |
| B | OwnershipPerc | 0.00 | 0.00 | 0.00 | 0.00 | 100.00 | 9.03 | 22.54 |
| B | QtFirmActivities | 0.00 | 0.00 | 0.00 | 1.00 | 11.00 | 0.79 | 1.81 |
| B | DaysOwnership | 0.00 | 0.00 | 0.00 | 499.00 | 43,795.00 | 917.40 | 2054.12 |
Monetary values are in Brazilian reais (BRL).
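The statistics in the table above can be reproduced for any numeric column with a few lines of standard-library Python. This is a sketch, not the authors' code: it assumes quartiles computed by linear interpolation and a sample standard deviation (n - 1 denominator), since the article does not state which conventions were used, and the input values are hypothetical. It also checks that a summary row is internally consistent (Min <= 25% <= Median <= 75% <= Max), a cheap test worth running on any published summary.

```python
import statistics


def describe(values):
    """Summary statistics in the layout of the numeric table (Min ... Std).

    Assumptions: quartiles use linear interpolation (the 'inclusive' method of
    statistics.quantiles) and Std is the sample standard deviation (n - 1).
    """
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return {
        "min": min(values),
        "25%": q1,
        "median": median,
        "75%": q3,
        "max": max(values),
        "mean": statistics.fmean(values),
        "std": statistics.stdev(values),
    }


def is_consistent(row):
    """A valid summary row must satisfy Min <= 25% <= Median <= 75% <= Max."""
    order = [row["min"], row["25%"], row["median"], row["75%"], row["max"]]
    return all(a <= b for a, b in zip(order, order[1:]))


# Hypothetical toy values (e.g. position counts), not taken from the dataset.
stats = describe([0, 0, 0, 1, 2, 3, 20])
print(stats)
print(is_consistent(stats))
```

Linear interpolation is also the default of NumPy's `percentile` and pandas' `describe` (which likewise reports the sample standard deviation), so the results should agree with those tools on the same data.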
Categorical attributes.
| Dim | Attribute | Number of Categories | Number of Examples |
|---|---|---|---|
| P | ElectivePosition | 8 | 1317 |
| P | CodParty | 32 | 1317 |
| P | CandElectiveStatus | 5 | 1317 |
| P | CandEducationLevel | 6 | 1317 |
| P | CandMaritalStatus | 5 | 1317 |
| P | CodRoundStatus | 7 | 1317 |
| B | TypeOfOwnership | 2 | 86,058 |
| B | CodFirmActivity | 976 | 86,058 |
| B | CodFirmLegal | 33 | 86,058 |
| B | CodFirmStatus | 5 | 86,058 |
| B | CodFirmSize | 3 | 86,058 |
| B | CodFirmTaxOption | 5 | 86,058 |
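The categorical table reports two counts per attribute. A plausible reading, given that political attributes have 1,317 examples and business attributes 86,058 out of 303,036 records, is that an example is a record in which the attribute is present: only civil servants who ran for office have political attributes, and only company partners have business attributes. The sketch below assumes that reading and uses hypothetical records in which `None` marks a missing value.

```python
from collections import Counter


def categorical_summary(rows, attribute):
    """Return (number of categories, number of examples) for one attribute.

    Assumption: an 'example' is a record where the attribute is present (not None).
    """
    values = [row.get(attribute) for row in rows]
    counts = Counter(v for v in values if v is not None)
    return len(counts), sum(counts.values())


# Hypothetical records; the party code "12" is illustrative only.
rows = [
    {"CodFirmSize": "ME", "CodParty": None},
    {"CodFirmSize": "EPP", "CodParty": "12"},
    {"CodFirmSize": "ME", "CodParty": None},
    {"CodFirmSize": None, "CodParty": None},
]
print(categorical_summary(rows, "CodFirmSize"))  # prints (2, 3)
print(categorical_summary(rows, "CodParty"))     # prints (1, 1)
```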
Boolean attributes.
| Dim | Attribute | True | False |
|---|---|---|---|
| C | CorruptionTG | 428 | 302,608 |
| C | CEIS | 132 | 302,904 |
| C | TCDFrestriction | 274 | 302,762 |
| C | CEPIM | 0 | 303,036 |
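The Boolean table shows how skewed the candidate target is: C.CorruptionTG is True in 428 of 303,036 records (about 0.14%), roughly 707 negatives per positive, which is why the keywords mention imbalanced learning. The sketch below computes these figures together with "balanced" class weights, a common general technique for countering such skew; it is not necessarily the weighting used in the related article. C.CEPIM has no True values, so it cannot serve as a target, and the function rejects that case.

```python
def imbalance_stats(n_true, n_false):
    """Positive rate, imbalance ratio, and 'balanced' class weights for a Boolean target.

    A balanced class weight is n_samples / (n_classes * n_samples_in_class),
    the same heuristic scikit-learn applies with class_weight="balanced".
    """
    if n_true == 0 or n_false == 0:
        raise ValueError("both classes need at least one example")
    n = n_true + n_false
    return {
        "positive_rate": n_true / n,
        "imbalance_ratio": n_false / n_true,
        "weight_true": n / (2 * n_true),
        "weight_false": n / (2 * n_false),
    }


# Counts for C.CorruptionTG from the Boolean attributes table.
stats = imbalance_stats(n_true=428, n_false=302_608)
for name, value in stats.items():
    print(f"{name}: {value:.4f}")
```

With these weights, each class contributes the same total weight (half the number of records) to a weighted loss, so the 428 positives are not drowned out by the majority class.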
Fig. 1. Pipeline of the ETL (extract, transform, and load) process: data from different sources are integrated into a dataset aggregated by four domains, which is then pre-processed (data enrichment and data cleansing).
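The integration step in Fig. 1 can be sketched as a left join of each domain onto the employment base, with attribute names prefixed by their domain letter (C, E, P, B) as in the attribute table. Everything here is a hypothetical illustration: the person ids, the dictionary layout, and the cleansing rule that a person absent from a sanction list receives False are assumptions, not the authors' pipeline. The id is used only for joining and is not copied to the output, in line with the removal of identifying attributes described in the abstract.

```python
# Hypothetical source extracts keyed by an internal person id (assumed layout).
employment = {1: {"Salary": 5000.0, "QtySIGRHOff": 1}, 2: {"Salary": 3200.0, "QtySIGRHOff": 0}}
corruption = {2: {"CEIS": True}}
political = {1: {"CodParty": "12"}}
business = {}

# Domain prefix -> source; the employment base defines the population (left join).
SOURCES = {"E": employment, "C": corruption, "P": political, "B": business}

# Assumed cleansing rule: a person absent from a sanction list gets False for that flag.
DEFAULTS = {"C": {"CorruptionTG": False, "CEIS": False, "TCDFrestriction": False, "CEPIM": False}}


def integrate(population):
    """Build one row per person with domain-prefixed attribute names (e.g. "C.CEIS")."""
    dataset = []
    for person_id in population:
        row = {}
        for domain, source in SOURCES.items():
            # Defaults first, then any values the source actually holds for this person.
            values = {**DEFAULTS.get(domain, {}), **source.get(person_id, {})}
            for attribute, value in values.items():
                row[f"{domain}.{attribute}"] = value
        # The person id is not copied into the row, so it is absent from the output.
        dataset.append(row)
    return dataset


dataset = integrate(employment)  # iterating a dict yields its keys (the person ids)
for row in dataset:
    print(row)
```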
Attributes/features description.
| # | Attribute name | Type | Brief Description |
|---|---|---|---|
| 1 | C.CorruptionTG | Boolean | Cases of dismissal for corruption; this attribute can serve as the target for machine learning |
| 2 | C.CEIS | Boolean | Individuals or legal entities sanctioned with restrictions on the right to participate in tenders or to contract with the Public Administration |
| 3 | C.TCDFrestriction | Boolean | Persons disqualified from holding a commissioned position or a position of trust within the Public Administration of the Federal District for up to eight years due to serious irregularities found by the TCDF (Federal District Court of Accounts) |
| 4 | C.CEPIM | Boolean | Private non-profit entities barred from entering into new agreements, onlending contracts, or partnership terms with the Federal Public Administration because of unresolved irregularities in previously signed agreements, onlending contracts, or partnership terms |
| 5 | E.Salary | Numeric | Salary (in Brazilian reais) of the civil servant or military member as received from either payroll base (SIGRH or SIAPE), or the sum of salaries for civil servants who hold more than one public position, as permitted by the Federal Constitution |
| 6 | E.SalaryMinusTax | Numeric | Salary net of deductions, obtained in the same way as E.Salary (SIGRH and SIAPE bases) |
| 7 | E.QtySIGRHOff | Int | Number of positions the civil servant or military member held in SIGRH up to Nov/2020, determined from the SIGRH base only |
| 8 | E.QtySIAPEOff | Int | Number of Public Security positions the civil servant or military member held in SIAPE up to Nov/2020 |
| 9 | E.QtySIGSIPOff | Int | Number of positions the civil servant or military member held up to Nov/2020 across both databases (SIGRH and SIAPE) |
| 10 | E.QtySIGRHfunc | Int | Number of functions the civil servant occupied in SIGRH up to Nov/2020 (civil servants other than Public Security) |
| 11 | E.QtySIAPEfunc | Int | Number of functions the civil servant or military member occupied in SIAPE up to Nov/2020 (Public Security) |
| 12 | E.QtySIGSIPfunc | Int | Number of functions the civil servant or military member occupied up to Nov/2020 across both databases (SIGRH and SIAPE) |
| 13 | P.ElectivePosition | Categorical | Type of elective office the civil servant ran for (president or vice-president, governor or vice-governor, mayor, senator, councilor, federal deputy, state deputy, or district deputy) |
| 14 | P.CodParty | Categorical | Code of the party under which the civil servant was registered for the election |
| 15 | P.CandElectiveSt | Categorical | Candidate's registration status, with values including 'Apt' (candidate eligible to appear on the ballot), 'Unfit' (candidate ineligible to appear on the ballot), and 'Registered' (candidacy filed but not yet judged by the electoral authority) |
| 16 | P.CandEducation | Categorical | Candidate's level of education: not disclosed, reads and writes, incomplete or complete elementary school, incomplete or complete high school, or incomplete or complete higher education |
| 17 | P.CandMaritalSt | Categorical | Marital status of the candidate civil servant: single, married, not disclosed, widowed, legally separated, or divorced |
| 18 | P.CodRoundSt | Categorical | Candidate's vote-count outcome in the election round: elected, elected by average, elected by electoral quotient, not elected, alternate, or null |
| 19 | B.OwnershipPerc | Numeric | Percentage of the company's share capital held by the civil servant or military member as of Nov/2020 |
| 20 | B.TypeOfOwner | Categorical | Type of partner under which the civil servant or military member is registered in the company |
| 21 | B.QtFirmAct | Int | Number of secondary activities registered by the company in which the civil servant or military member is a partner |
| 22 | B.CodFirmAct | Categorical | Main activity of the firm/company in which the civil servant or military member is a partner |
| 23 | B.CodFirmLegal | Categorical | Legal nature of the company in which the civil servant or military member is a partner, e.g., Mixed Economy Company, Publicly Held or Closely Held Corporation, Limited Business Company, Limited Partnership, or Partnership Limited by Shares, among others |
| 24 | B.CodFirmSt | Categorical | Status of the company in which the civil servant or military member is a partner: active, null, suspended, unsuitable, or closed |
| 25 | B.CodFirmSize | Categorical | Company size: Microenterprise (ME), Small Business (EPP), or medium or large, depending on the gross annual turnover of the head office and its branches, i.e., the global gross revenue as defined in tax legislation |
| 26 | B.DaysOwnership | Numeric | Number of days the civil servant had been a partner in the company as of Nov/2020 |
| 27 | B.CodFirmTaxOpt | Categorical | Indicates whether the company opted for Simples Nacional, the simplified taxation regime that eases tax payment for micro and small companies |
Source: An extract of this table was published in Vasconcelos et al. [4], pp. 512–513, https://link.springer.com/chapter/10.1007/978-3-030-86230-5_40.
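Several E-domain attributes are aggregates over the two payroll bases: Salary sums the salaries of civil servants who accumulate positions, and the combined counts appear to be sums of the per-base counts (the table means agree: 0.75 + 0.37 = 1.12 for positions and 1.42 + 0.03 = 1.45 for functions). The sketch below derives the position counts and salary from hypothetical payroll extracts; the input shapes and person ids are assumptions, and the function counts (QtySIGRHfunc, QtySIAPEfunc, QtySIGSIPfunc) would follow the same pattern.

```python
from collections import Counter


def employment_features(sigrh_positions, siape_positions, sigrh_salary, siape_salary):
    """Aggregate hypothetical payroll extracts into E-domain features per person.

    sigrh_positions / siape_positions: one person id per position held up to Nov/2020.
    sigrh_salary / siape_salary: current salary per person id in each base.
    Assumption: QtySIGSIPOff (listed as QtySIGRHSIAPEOff in the numeric table) is the
    sum of the two per-base counts, and Salary sums the salaries from both bases.
    """
    sigrh = Counter(sigrh_positions)  # a Counter returns 0 for ids it has not seen
    siape = Counter(siape_positions)
    people = set().union(sigrh, siape, sigrh_salary, siape_salary)
    features = {}
    for pid in people:
        features[pid] = {
            # Civil servants who accumulate public positions get the sum of their salaries.
            "Salary": sigrh_salary.get(pid, 0.0) + siape_salary.get(pid, 0.0),
            "QtySIGRHOff": sigrh[pid],
            "QtySIAPEOff": siape[pid],
            "QtySIGSIPOff": sigrh[pid] + siape[pid],
        }
    return features


features = employment_features(
    sigrh_positions=[101, 101, 102],
    siape_positions=[101],
    sigrh_salary={101: 5000.0, 102: 3000.0},
    siape_salary={101: 2500.0},
)
print(features[101])
# {'Salary': 7500.0, 'QtySIGRHOff': 2, 'QtySIAPEOff': 1, 'QtySIGSIPOff': 3}
```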
The Mendeley dataset is available at.
| Subject | Information Systems and Management |
|---|---|
| Specific subject area | Corruption, civil servant, logistic regression |
| Type of data | Table (csv file) |
| How data were acquired | Data were acquired from eight different databases of the Brazilian Government with |
| Data format | Mixed (raw and pre-processed) |
| Parameters for data collection | This dataset corresponds to data available in November 2020 and covers all civil servants and military personnel in the Federal District. |
| Description of data collection | Eight databases from the Brazilian federal government and the Federal District, covering sanctions, civil service payroll systems, political data, and firms/companies, were compiled and integrated for this research. |
| Data source location | Federal District, Brazil |
| Data accessibility | Repository name: Dataset for corruption risk assessment in a Public Administration |
| Related research article | Vasconcelos, M. O., Chaim, R. M., & Cavique, L. (2021). Imbalanced Learning in Assessing the Risk of Corruption in Public Administration. In G. Marreiros, F. S. Melo, N. Lau, H. Lopes Cardoso, & L. P. Reis (Eds.), Progress in Artificial Intelligence (pp. 510–523). Springer International Publishing. |
Literature related to corruption.
| DOMAINS | LITERATURE |
|---|---|
| Corruption (C) | Hanna and Wang |
| Employment (E) | Gans-Morse et al. |
| Political (P) | Pedersen and Johannsen |
| Business (B) | Carvalho |