Zhiheng Zhang, Yong Ma, Yongjun Hua.
Abstract
In recent years, financial fraud has been committed frequently and through a variety of means. How to identify financial fraud more efficiently and maintain order in the capital market is a problem that scholars across disciplines are discussing and urgently seeking to resolve. In this study, a financial fraud identification model is constructed based on the stacking ensemble learning algorithm. In addition to financial and nonfinancial variables, the text of the management discussion and analysis (MD&A) chapter of annual reports is introduced, with sentiment polarity, emotional tone, and text readability used as text variables. The results show that, when financial, nonfinancial, and text variables are all considered, the stacking ensemble learning model constructed in this study identifies fraud significantly better than each single-classifier model. In addition, the model performs better after the text variables are added. The model is therefore expected to provide a new and more effective method of identifying financial fraud.
Year: 2022 PMID: 36177320 PMCID: PMC9514921 DOI: 10.1155/2022/1780834
Source DB: PubMed Journal: Comput Intell Neurosci
Figure 1. Organizational framework of the paper.
Primary financial fraud identification indicators (financial and nonfinancial indicators).
| Dimension | Indicator name |
|---|---|
Primary financial fraud identification indicators (text indicators).
| Dimension | Indicator | Definition |
|---|---|---|
| Sentiment polarity | Positive | Positive words/MD&A words |
| Sentiment polarity | Negative | Negative words/MD&A words |
| Emotional tone | Strong modal | Strong modal words/MD&A words |
| Emotional tone | Weak modal | Weak modal words/MD&A words |
| Text readability | Professional term | Professional term words/MD&A words |
| Text readability | Average sentence length | Text length/number of sentences |
| Text readability | Text length | Text length |
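
The definitions in the table are simple ratios, so they can be illustrated with a short sketch. The word lists below are hypothetical placeholders (the dictionaries actually used are not given here), the tokenizer is a plain regular expression suited to English text, and text length is assumed to be counted in words.

```python
import re

# Hypothetical mini-lexicons for illustration only; the study's actual
# dictionaries are not reproduced in this section.
POSITIVE = {"growth", "improve", "strong"}
NEGATIVE = {"loss", "decline", "risk"}
STRONG_MODAL = {"must", "will", "always"}
WEAK_MODAL = {"may", "might", "could"}
PROFESSIONAL = {"goodwill", "amortization", "receivables"}


def text_indicators(mdna: str) -> dict:
    """Compute the seven text indicators of the table for one MD&A text."""
    words = re.findall(r"[a-z']+", mdna.lower())
    sentences = [s for s in re.split(r"[.!?]+", mdna) if s.strip()]
    n_words = len(words)
    if n_words == 0 or not sentences:
        raise ValueError("empty MD&A text")

    def ratio(lexicon: set) -> float:
        # Share of MD&A words that belong to the given lexicon.
        return sum(w in lexicon for w in words) / n_words

    return {
        "positive": ratio(POSITIVE),
        "negative": ratio(NEGATIVE),
        "strong_modal": ratio(STRONG_MODAL),
        "weak_modal": ratio(WEAK_MODAL),
        "professional_term": ratio(PROFESSIONAL),
        "avg_sentence_length": n_words / len(sentences),
        "text_length": n_words,  # assumption: length measured in words
    }


indicators = text_indicators("Revenue growth was strong. Receivables may decline.")
```

For Chinese-language annual reports, the regular-expression tokenizer would need to be replaced by a proper word segmenter; the ratio logic stays the same.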
Figure 2. Schematic diagram of text analysis and index calculation process.
Total variance explained.
| Component | Initial eigenvalues: Total | Initial eigenvalues: % of variance | Initial eigenvalues: Cumulative % | Extraction sums of squared loadings: Total | Extraction sums of squared loadings: % of variance | Extraction sums of squared loadings: Cumulative % |
|---|---|---|---|---|---|---|
| 1 | 11.534 | 14.239 | 14.239 | 11.534 | 14.239 | 14.239 |
| 2 | 6.818 | 8.418 | 22.657 | 6.818 | 8.418 | 22.657 |
| 3 | 6.127 | 7.565 | 30.221 | 6.127 | 7.565 | 30.221 |
| 4 | 4.784 | 5.906 | 36.127 | 4.784 | 5.906 | 36.127 |
| 5 | 3.832 | 4.731 | 40.858 | 3.832 | 4.731 | 40.858 |
| ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ |
| 23 | 1.013 | 1.251 | 78.449 | 1.013 | 1.251 | 78.449 |
| 24 | 0.999 | 1.233 | 79.682 | | | |
| ⋮ | ⋮ | ⋮ | ⋮ | | | |
| 81 | 0.001 | 0.001 | 100.000 | | | |
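
The percentages in the table can be checked with a little arithmetic. Assuming the principal component analysis was run on standardized variables (an assumption, though it fits the 81 components listed), each component's share of variance is its eigenvalue divided by 81. The sketch also applies the eigenvalue-greater-than-one rule, which is consistent with extraction stopping at component 23; the rule is not named in this section, so treat it as an inference.

```python
# Assumption: total variance equals the number of components (81), which
# holds when PCA is applied to standardized variables.
N_COMPONENTS = 81

# Eigenvalues copied from the rows of the table that are shown; the elided
# rows are not reconstructed here.
SHOWN_EIGENVALUES = {
    1: 11.534, 2: 6.818, 3: 6.127, 4: 4.784, 5: 3.832, 23: 1.013, 24: 0.999,
}


def percent_of_variance(eigenvalue: float, n_components: int = N_COMPONENTS) -> float:
    """Share of total variance carried by one component, in percent."""
    return 100.0 * eigenvalue / n_components


percent = {c: percent_of_variance(ev) for c, ev in SHOWN_EIGENVALUES.items()}

# Cumulative share of the first five components (all five rows are shown).
cumulative_first_five = sum(percent[c] for c in range(1, 6))

# Components among the shown rows that pass the eigenvalue > 1 rule.
retained = sorted(c for c, ev in SHOWN_EIGENVALUES.items() if ev > 1.0)
```

Because the tabulated eigenvalues are themselves rounded, the recomputed percentages agree with the table only to within about 0.001.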
Figure 3. RF algorithm schematic diagram.
Figure 4. AdaBoost algorithm schematic diagram.
Figure 5. Financial fraud identification model based on stacking ensemble learning.
Figure 6. Model implementation process.
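
A stacking model of the kind shown in Figure 5 can be sketched with scikit-learn. This is a minimal illustration, not the authors' implementation: the data are synthetic, the hyperparameters are arbitrary, the use of RF, AdaBoost, and GBDT as base learners is inferred from the single classifiers compared in the results tables, and the logistic-regression meta-learner is an assumption.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import (
    AdaBoostClassifier,
    GradientBoostingClassifier,
    RandomForestClassifier,
    StackingClassifier,
)
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Synthetic stand-in data; the study's sample is not reproduced here.
X, y = make_classification(
    n_samples=400, n_features=20, n_informative=8, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# First layer: the three single classifiers compared in the results tables.
base_learners = [
    ("rf", RandomForestClassifier(n_estimators=50, random_state=0)),
    ("ada", AdaBoostClassifier(n_estimators=50, random_state=0)),
    ("gbdt", GradientBoostingClassifier(n_estimators=50, random_state=0)),
]

# Second layer: a meta-learner trained on out-of-fold predictions of the
# base learners. Logistic regression is an assumed choice.
stack = StackingClassifier(
    estimators=base_learners,
    final_estimator=LogisticRegression(max_iter=1000),
    cv=5,
)
stack.fit(X_train, y_train)
test_accuracy = stack.score(X_test, y_test)
```

Training the meta-learner on cross-validated predictions (`cv=5`) rather than on in-sample predictions keeps it from simply learning which base model overfits the training data most.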
Confusion matrix.
| Reported | Predicted: Fraud (1) | Predicted: Nonfraud (0) |
|---|---|---|
| Fraud (1) | TP | FN |
| Nonfraud (0) | FP | TN |
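
The evaluation measures reported in the results tables follow from the four cells of the confusion matrix. The sketch below uses the standard definitions of accuracy, precision, recall, and F1; the counts in the usage line are hypothetical. AUC is omitted because it needs predicted scores, not just the four counts.

```python
def classification_metrics(tp: int, fn: int, fp: int, tn: int) -> dict:
    """Standard metrics from confusion-matrix counts (fraud = positive class)."""
    total = tp + fn + fp + tn
    if total == 0:
        raise ValueError("confusion matrix is empty")
    accuracy = (tp + tn) / total
    # Guard against division by zero when a class is never predicted or absent.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if (precision + recall)
        else 0.0
    )
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Hypothetical counts, chosen only to make the arithmetic easy to follow.
metrics = classification_metrics(tp=40, fn=10, fp=5, tn=45)
```

In a fraud setting, recall (the share of actual fraud cases that are caught) is often weighted more heavily than precision, since a missed fraud (FN) is usually costlier than a false alarm (FP).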
Comparison of identification results by introduced text information.
| Variable | Accuracy | Precision | Recall | F1 | AUC |
|---|---|---|---|---|---|
| Financial and nonfinancial variables | 0.8447 | 0.8654 | 0.8333 | 0.8490 | 0.8452 |
| Financial, nonfinancial, and text variables | 0.8738 | 0.8571 | 0.9057 | 0.8807 | 0.8733 |
The influence of each text index on the identification results.
| Variable | Accuracy | Precision | Recall | F1 | AUC |
|---|---|---|---|---|---|
| | 0.8544 | 0.8298 | 0.8478 | 0.8387 | 0.8567 |
| | 0.8252 | 0.8200 | 0.8200 | 0.8200 | 0.8251 |
| | 0.8641 | 0.8636 | 0.8261 | 0.8444 | 0.8604 |
Analysis based on financial and nonfinancial variables.
| Model | Accuracy | Precision | Recall | F1 | AUC |
|---|---|---|---|---|---|
| RF | 0.8155 | 0.8182 | 0.8333 | 0.8257 | 0.8146 |
| AdaBoost | 0.8155 | 0.8367 | 0.7885 | 0.8119 | 0.8158 |
| GBDT | 0.8058 | 0.8654 | 0.7895 | 0.8257 | 0.8057 |
| Stacking | 0.8447 | 0.8654 | 0.8333 | 0.8490 | 0.8452 |
Figure 7. Analysis based on financial and nonfinancial variables.
Introduction of text variables.
| Model | Accuracy | Precision | Recall | F1 | AUC |
|---|---|---|---|---|---|
| RF | 0.8252 | 0.8541 | 0.7885 | 0.8200 | 0.8256 |
| AdaBoost | 0.8350 | 0.7679 | 0.9149 | 0.8350 | 0.8414 |
| GBDT | 0.8252 | 0.8333 | 0.8000 | 0.8163 | 0.8245 |
| Stacking | 0.8738 | 0.8571 | 0.9057 | 0.8807 | 0.8733 |
Figure 8. Introduction of text variables.