Letícia Moreira Valle, Stefano Giacomazzi Dantas, Daniel Guerreiro E Silva, Ugo Silva Dias, Leonardo Monteiro Monasterio.
Abstract
Government transparency and openness are key factors in modernizing the state. The combination of transparency and digital information has given rise to the concept of Open Government, which increases citizen understanding and monitoring of government actions and in turn improves the quality of public services and of government decision making. To improve legislative transparency and the understanding of the Brazilian regulatory process and its characteristics, this paper introduces RegBR, the first national framework to centralize, classify and analyze regulations from the Brazilian government. We created a centralized database of Brazilian federal legislation, built from automated ETL routines and processed with data mining and machine learning techniques. Our framework evaluates different NLP models in a text classification task on our novel Portuguese legal corpus and performs regulatory analysis based on metrics that concern linguistic complexity, restrictiveness, law interest, and industry-specific citation relevance. Our results were examined over time and validated by correlating them with known episodes of regulatory change in Brazilian history, such as the implementation of new economic plans or the emergence of an energy crisis. The methods and metrics proposed by this framework can be used by policy makers to measure their own work and can serve as inputs for future studies that analyze government changes and their relationship with federal regulations.
Year: 2022 PMID: 36170260 PMCID: PMC9518867 DOI: 10.1371/journal.pone.0275282
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.752
Fig 1. RegBR ETL pipeline.
Fig 2. CNAE's structure example: Human health and social services.
Final classes.
| Class | Definition |
|---|---|
| 1 | Agriculture, livestock and forest |
| 2 | Extractive industry |
| 3 | Transformation industry |
| 4 | Electricity and gas |
| 5 | Water, sewage and waste |
| 6 | Construction |
| 7 | Commerce, accommodation & food and real estate services |
| 8 | Transportation, storage and mail |
| 9 | Information and communication |
| 10 | Financial, insurance and related services |
| 11 | Professional, scientific and technical activities |
| 12 | Administrative activities and complementary services |
| 13 | Public administration, defense and social security |
| 14 | Education |
| 15 | Human health and social service |
| 16 | Arts, culture, sports and recreation |
| 17 | Other services |
| 18 | Non-regulatory |
Hyperparameters used to generate the results.
| Models | Hyperparameters |
|---|---|
| | Maximum number of features = 10,000 / n-gram range = (1,1) / no constraints on word frequency |
| | Regularization coefficient = 2 |
| | Regularization coefficient = 1 |
| | Over-sampling using SMOTE and cleaning using Tomek links with default parameters |
| | Linear kernel, regularization coefficient = 0.5 |
| | XGBoost with 100 estimators |
| | LSA with 500 topics / SVM with linear kernel with regularization coefficient = 1 |
| | 300 dimensions |
| | 300 dimensions |
| | Embedding layer followed by a bi-directional with 64 hidden units, followed by a fully connected layer with 256 units with dropout = 0.1, and a final layer with 64 hidden units. |
| | Embedding layer followed by 4 convolutional layers with 36 filters of varying kernel size (1,2,3,5), followed by a fully connected layer with 144 hidden units with dropout = 0.1 |
| | BERT pre-trained (12 layers with 110 million parameters for the base model, and 24 layers with 335 million parameters for the large model) using Portuguese corpus [ |
| | One AWD-LSTM layer [ |
| | Fully connected network with 25 hidden units in the first layer, followed by batch normalization, dropout = 0.25 and a final layer with 18 units |
Normative acts information.
| Type | Normative act in Portuguese | Normative act in English |
|---|---|---|
| 1 | | Constitutional Amendment |
| 2 | | Laws |
| 3 | | Decree Law |
| 4 | | Provisional measure |
| 5 | | Supplementary Law |
| 6 | | Decree |
| 7 | | Resolution |
| 8 | | Ordinance |
Classification results with all data.
| Models | Accuracy | Average F1-score |
|---|---|---|
| Logistic Regression (LR) | 62.64 ± 1.03% | 0.571 ± 0.093 |
| Ridge Classifier (RC) | 63.77 ± 0.94% | 0.59 ± 0.006 |
| LR | 63.57 ± 0.83% | 0.597 ± 0.009 |
| SVM | 63.96 ± 1.19% | 0.592 ± 0.013 |
| XGBoost | 60.94 ± 0.79% | 0.553 ± 0.013 |
| | 63.59 ± 0.82% | |
| Ensemble | | 0.592 ± 0.10 |
| SVM | 59.50 ± 4.7% | 0.538 ± 0.045 |
| LSTM | 57.08 ± 1.02% | 0.5043 ± 0.03 |
| CNN | 59.99 ± 0.47% | 0.541 ± 0.072 |
| LSTM | 57.48 ± 0.38% | 0.5151 ± 0.088 |
| CNN | 59.48 ± 0.52% | 0.543 ± 0.064 |
| BERT | 61.84 ± 0.85% | 0.551 ± 0.024 |
| BERT | 48.70 ± 1.19% | 0.382 ± 0.067 |
| ULM-FiT | 55.29 ± 1.03% | 0.526 ± 0.055 |
| FC Neural Network | 58.58 ± 0.9% | 0.541 ± 0.011 |
Classification results with data post-1964.
| Models | Accuracy | Average F1-score |
|---|---|---|
| Logistic Regression (LR) | 65.93 ± 1.25% | 0.575 ± 0.015 |
| Ridge Classifier (RC) | 67.97 ± 0.95% | 0.612 ± 0.015 |
| LR | 66.15 ± 0.011% | 0.609 ± 0.013 |
| SVM | 67.72 ± 1.31% | |
| XGBoost | 63.91 ± 1.11% | 0.568 ± 0.015 |
| | 64.96 ± 1.83% | 0.591 ± 0.022 |
| Ensemble | | 0.616 ± 0.015 |
| SVM | 61.51 ± 3.94% | 0.531 ± 0.031 |
| LSTM | 58.37 ± 1.02% | 0.521 ± 0.012 |
| CNN | 61.66 ± 0.92% | 0.565 ± 0.079 |
| LSTM | 59.78 ± 1.36% | 0.533 ± 0.014 |
| CNN | 61.21 ± 1.57% | 0.565 ± 0.016 |
| BERT | 62.21 ± 0.94% | 0.514 ± 0.061 |
| BERT | 52.72 ± 0.89% | 0.428 ± 0.056 |
| ULM-FiT | 58.14 ± 0.92% | 0.538 ± 0.033 |
| FC Neural Network | 63.66 ± 0.94% | 0.569 ± 0.020 |
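
The two result tables report accuracy and an average F1-score, each with a spread. As a minimal sketch of how such figures can be computed, the Python below assumes that "Average F1-score" is a macro average (the unweighted mean of per-class F1 over the 18 classes) and that the ± value is a sample standard deviation across repeated runs or folds. Neither assumption is confirmed by the text above, and the function names `macro_f1`, `accuracy` and `summarize` are hypothetical.

```python
from statistics import mean, stdev


def macro_f1(y_true, y_pred, labels):
    """Unweighted mean of per-class F1 scores (assumed averaging scheme)."""
    scores = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        denom = 2 * tp + fp + fn
        # A class with no true and no predicted samples contributes 0.0 here.
        scores.append(2 * tp / denom if denom else 0.0)
    return mean(scores)


def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted class matches the true class."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def summarize(run_scores):
    """Return (mean, sample standard deviation) over several runs or folds."""
    return mean(run_scores), stdev(run_scores)
```

Macro averaging gives every class the same weight regardless of its size, which would be one reason for the F1 column to sit below the accuracy column when classes are imbalanced.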
Fig 3. Normative acts of regulatory agencies over time.
Fig 4. Regulatory stock evolution over time.
Fig 5. Brazilian restrictive word count.
Fig 6. String terms count.
Fig 7. Industry citation relevance metric for the year 2020.
Fig 8. Industry citation relevance for the electricity, finance and transportation economic sectors.
Fig 9. Average interest of normative acts on Google Trends.
Fig 10. Average interest of normative acts on DOU.
Fig 11. Median metrics from all sectors grouped together. A moving average of 14 years was used to smooth the curves.
Fig 12. Complexity projection. A moving average of 14 years was used to smooth the curves.
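
The captions of Fig 11 and Fig 12 mention smoothing with a 14-year moving average. The sketch below shows one way to do that for a yearly series in plain Python. It assumes a trailing window that averages over whatever history is available in the first years; the captions do not say whether the window is trailing or centered, and the function name `moving_average` is hypothetical.

```python
def moving_average(values, window=14):
    """Trailing moving average over a yearly series.

    Assumption: the first `window - 1` points are averaged over the
    shorter history available so far, rather than being dropped.
    """
    if window < 1:
        raise ValueError("window must be at least 1")
    smoothed = []
    running = 0.0
    for i, value in enumerate(values):
        running += value
        if i >= window:
            # Drop the value that just left the window.
            running -= values[i - window]
        smoothed.append(running / min(i + 1, window))
    return smoothed
```

A centered window would avoid the lag a trailing window introduces, at the cost of leaving the most recent years without a full window.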