Asoke K Nandi, Kuldeep Kaur Randhawa, Hong Siang Chua, Manjeevan Seera, Chee Peng Lim.
Abstract
With advances in machine learning, researchers continue to devise and implement effective intelligent methods for fraud detection in the financial sector. Indeed, credit card fraud leads to billions of dollars in losses for merchants every year. In this paper, a multi-classifier framework is designed to address the challenges of credit card fraud detection. An ensemble model with multiple machine learning classification algorithms is designed, in which the Behavior-Knowledge Space (BKS) is leveraged to combine the predictions from multiple classifiers. To ascertain the effectiveness of the developed ensemble model, publicly available data sets as well as real financial records are employed for performance evaluation. Statistical tests indicate that the developed model is more effective than the commonly used majority voting method for combining predictions from multiple classifiers, both in noisy data classification and in credit card fraud detection problems.
Year: 2022 PMID: 35051184 PMCID: PMC8775357 DOI: 10.1371/journal.pone.0260579
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Two-dimensional BKS.
| | 1 | 2 | … | |
|---|---|---|---|---|
| 1 | | | | |
| 2 | | | | |
Fig 1. A hierarchical agent-based framework with the BKS.
Prediction outputs of Agents 1 and 2.
| Data | Actual class | Predicted class (Agent 1) | Predicted class (Agent 2) |
|---|---|---|---|
| Sample 1 | 1 | 1 | 1 |
| Sample 2 | 1 | 2 | 1 |
| Sample 3 | 2 | 2 | 2 |
| Sample 4 | 1 | 1 | 1 |
| Sample 5 | 2 | 2 | 2 |
| Sample 6 | 2 | 1 | 2 |
Creation of BKS for the classification scenario in Table 2.
| Agent 2 (rows) / Agent 1 (columns) | Predicted Class = 1 | Predicted Class = 2 |
|---|---|---|
| Predicted Class = 1 | No. of actual Class 1 samples = 2; no. of actual Class 2 samples = 0 | No. of actual Class 1 samples = 1; no. of actual Class 2 samples = 0 |
| Predicted Class = 2 | No. of actual Class 1 samples = 0; no. of actual Class 2 samples = 1 | No. of actual Class 1 samples = 0; no. of actual Class 2 samples = 2 |
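The construction of the BKS from the two-agent example above can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' implementation: the names `samples`, `build_bks`, and `bks_predict` are hypothetical, and the choice to return a `fallback` value for an unseen combination of predictions is an assumption, since the handling of empty cells is not described here.

```python
from collections import Counter, defaultdict

# (actual class, Agent 1 prediction, Agent 2 prediction) for Samples 1-6,
# copied from the prediction table above.
samples = [
    (1, 1, 1),
    (1, 2, 1),
    (2, 2, 2),
    (1, 1, 1),
    (2, 2, 2),
    (2, 1, 2),
]


def build_bks(records):
    """Map each tuple of agent predictions to a Counter of actual classes."""
    table = defaultdict(Counter)
    for actual, *predictions in records:
        table[tuple(predictions)][actual] += 1
    return table


def bks_predict(table, predictions, fallback=None):
    """Return the most frequent actual class recorded in the matching cell."""
    cell = table.get(tuple(predictions))
    if not cell:
        return fallback  # assumption: no knowledge recorded for this combination
    return cell.most_common(1)[0][0]


bks = build_bks(samples)
print(bks_predict(bks, (2, 1)))  # Agent 1 says 2, Agent 2 says 1 -> 1
print(bks_predict(bks, (1, 2)))  # Agent 1 says 1, Agent 2 says 2 -> 2
```

With only two agents, a plain majority vote has no winner whenever the agents disagree, whereas the BKS cell resolves the disagreement from the class counts recorded for that exact combination of predictions.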
Fig 2. Configuration of the hierarchical agent-based framework used in the experiments.
List and descriptions of benchmark datasets.
| Data set | Problem | Instances | Features | Imbalance ratio (IR) |
|---|---|---|---|---|
| B1 | abalone-17_vs_7-8-9-10 | 2,338 | 8 | 39.3 |
| B2 | abalone-20_vs_8-9-10 | 1,916 | 8 | 72.7 |
| B3 | flare-F | 1,066 | 11 | 23.8 |
| B4 | pima | 768 | 8 | 1.9 |
| B5 | ring | 7,400 | 20 | 1.0 |
| B6 | spambase | 4,597 | 57 | 1.5 |
| B7 | twonorm | 7,400 | 20 | 1.0 |
| B8 | winequality-red-4 | 1,599 | 11 | 29.2 |
| B9 | winequality-white-3-9_vs_5 | 1,482 | 11 | 58.3 |
| B10 | Credit card transactions by European cardholders | 284,807 | 30 | 577.9 |
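As a quick sanity check on the IR column, the sketch below assumes that IR is the ratio of majority-class to minority-class instances, which is the usual definition but is not spelled out in the table. The helper names are hypothetical, and because the tabulated IR is rounded to one decimal place, the recovered minority count is only approximate.

```python
def imbalance_ratio(n_majority, n_minority):
    """IR under the assumed definition: majority count / minority count."""
    return n_majority / n_minority


def minority_count(n_instances, ir):
    """Invert n_instances = n_minority * (ir + 1) and round to an integer."""
    return round(n_instances / (ir + 1))


# B10: 284,807 instances with a tabulated IR of 577.9.
n_total = 284_807
n_min = minority_count(n_total, 577.9)
print(n_min)                                                # 492
print(round(imbalance_ratio(n_total - n_min, n_min), 1))    # 577.9
```

Under this assumption, roughly 492 of the 284,807 transactions in B10 belong to the minority (fraud) class.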
Accuracy rates.
| Data set | BKS | Voting |
|---|---|---|
| B1 | 0.9734 | 0.9717 |
| B2 | 0.9864 | 0.9864 |
| B3 | 0.9453 | 0.9439 |
| B4 | 0.7204 | 0.7205 |
| B5 | 0.9541 | 0.9466 |
| B6 | 0.9514 | 0.9498 |
| B7 | 0.9782 | 0.9352 |
| B8 | 0.9501 | 0.9521 |
| B9 | 0.9823 | 0.9809 |
| B10 | 0.9981 | 0.9980 |
F1 scores.
| Data set | BKS | Voting |
|---|---|---|
| B1 | 0.9863 | 0.9855 |
| B2 | 0.9931 | 0.9931 |
| B3 | 0.9715 | 0.9704 |
| B4 | 0.7675 | 0.7700 |
| B5 | 0.9539 | 0.9451 |
| B6 | 0.9601 | 0.9588 |
| B7 | 0.9782 | 0.9353 |
| B8 | 0.9742 | 0.9753 |
| B9 | 0.9911 | 0.9904 |
| B10 | 0.9991 | 0.9990 |
Accuracy rates with and without noise.
| Data set | Noise | BKS | Voting |
|---|---|---|---|
| B1 | 0% | 0.9734 | 0.9717 |
| | 10% | 0.9402 | 0.9263 |
| | 20% | 0.9120 | 0.8923 |
| B2 | 0% | 0.9864 | 0.9864 |
| | 10% | 0.9716 | 0.9625 |
| | 20% | 0.9438 | 0.9296 |
| B3 | 0% | 0.9453 | 0.9439 |
| | 10% | 0.9405 | 0.9221 |
| | 20% | 0.9334 | 0.9027 |
| B4 | 0% | 0.7204 | 0.7205 |
| | 10% | 0.7391 | 0.6859 |
| | 20% | 0.7235 | 0.6822 |
| B5 | 0% | 0.9541 | 0.9466 |
| | 10% | 0.9534 | 0.9483 |
| | 20% | 0.9528 | 0.9460 |
| B6 | 0% | 0.9514 | 0.9498 |
| | 10% | 0.8697 | 0.7874 |
| | 20% | 0.8627 | 0.8150 |
| B7 | 0% | 0.9782 | 0.9352 |
| | 10% | 0.9468 | 0.9226 |
| | 20% | 0.9365 | 0.9004 |
| B8 | 0% | 0.9501 | 0.9521 |
| | 10% | 0.9155 | 0.8967 |
| | 20% | 0.8650 | 0.8236 |
| B9 | 0% | 0.9823 | 0.9809 |
| | 10% | 0.9469 | 0.9341 |
| | 20% | 0.8910 | 0.8709 |
| B10 | 0% | 0.9981 | 0.9980 |
| | 10% | 0.9629 | 0.9535 |
| | 20% | 0.9571 | 0.9230 |
Fig 3. Number of BKS wins over majority voting on data sets with and without noise (red, yellow, and green lines indicate the number of wins required for significance at α = 0.1, 0.05, and 0.01, respectively).
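The win counts behind this comparison can be reproduced from the accuracy table with and without noise. The sketch below counts wins, losses, and ties per noise level and then applies a one-sided exact sign test. The choice of test and the decision to drop ties are assumptions made for illustration; the thresholds drawn in the figure may follow a different convention, and the function names are hypothetical.

```python
from math import comb

# (BKS, voting) accuracy pairs for B1..B10, copied from the noise table above.
accuracy = {
    "0%": [(0.9734, 0.9717), (0.9864, 0.9864), (0.9453, 0.9439),
           (0.7204, 0.7205), (0.9541, 0.9466), (0.9514, 0.9498),
           (0.9782, 0.9352), (0.9501, 0.9521), (0.9823, 0.9809),
           (0.9981, 0.9980)],
    "10%": [(0.9402, 0.9263), (0.9716, 0.9625), (0.9405, 0.9221),
            (0.7391, 0.6859), (0.9534, 0.9483), (0.8697, 0.7874),
            (0.9468, 0.9226), (0.9155, 0.8967), (0.9469, 0.9341),
            (0.9629, 0.9535)],
    "20%": [(0.9120, 0.8923), (0.9438, 0.9296), (0.9334, 0.9027),
            (0.7235, 0.6822), (0.9528, 0.9460), (0.8627, 0.8150),
            (0.9365, 0.9004), (0.8650, 0.8236), (0.8910, 0.8709),
            (0.9571, 0.9230)],
}


def count_outcomes(pairs):
    """Return (wins, losses, ties) for BKS against voting."""
    wins = sum(bks > vote for bks, vote in pairs)
    losses = sum(bks < vote for bks, vote in pairs)
    return wins, losses, len(pairs) - wins - losses


def sign_test_p_value(wins, losses):
    """One-sided exact sign test: P(X >= wins) for X ~ Binomial(n, 0.5).

    Ties are dropped before the test (an assumption of this sketch).
    """
    n = wins + losses
    return sum(comb(n, k) for k in range(wins, n + 1)) / 2 ** n


for noise, pairs in accuracy.items():
    wins, losses, ties = count_outcomes(pairs)
    print(noise, wins, losses, ties, sign_test_p_value(wins, losses))
```

At 10% and 20% noise, BKS wins on all ten data sets, which gives a one-sided p-value of 1/1024 under this sketch. Without noise there are 7 wins, 2 losses, and 1 tie.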
Comparison of F1 scores with literature (best in bold).
| Dataset | BKS | Voting | GEP | CUSBoost |
|---|---|---|---|---|
| B1 | **0.9863** | 0.9855 | 0.9048 | 0.3231 |
| B2 | **0.9931** | 0.9931 | - | 0.3363 |
| B3 | **0.9715** | 0.9704 | 0.927 | 0.1809 |
| B4 | 0.7675 | **0.7700** | - | 0.5543 |
| B8 | 0.9742 | **0.9753** | 0.9005 | 0.0939 |
| B9 | **0.9911** | 0.9904 | 0.8964 | 0.1674 |
List of features and description.
| No | Features | Description |
|---|---|---|
| 1 | Account Number | Anonymized account number |
| 2 | Transaction Amount | Amount spent in the transaction |
| 3 | Transaction Date | Date of said transaction |
| 4 | Transaction Time | Time of said transaction |
| 5 | Device Type | Type of device used for transaction |
| 6 | MCC | Merchant category code |
| 7 | Acquiring Country | Country where transaction took place |
| 8 | For Country | Country where card was issued |
| 9 | Transaction Type | Sale or cancellation |
| 10 | Transaction Amount Count | Count of transactions by cardholder |
| 11 | Transaction Amount Sum | Sum of total transactions by cardholder |
| 12 | Acquiring Country Count | Count of unique acquiring country |
| 13 | Acquiring Country Sum | Sum of acquiring country for transaction |
| 14 | MCC Count | Count of all MCC |
| 15 | MCC Sum | Sum of specific MCC for transaction |
| 16 | Device Type Count | Count of different device types used |
| 17 | Device Type Sum | Sum of specific device type used for transaction |
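Features 10 to 17 are aggregates derived from the raw transaction fields. A minimal sketch of how three of them could be computed per cardholder follows. The sample records, field order, and output key names are hypothetical, and details such as the aggregation time window are not given in the table, so this illustrates the idea rather than the exact feature engineering used.

```python
from collections import defaultdict

# Hypothetical records: (account number, transaction amount, acquiring country).
transactions = [
    ("A001", 25.0, "MY"),
    ("A001", 40.0, "SG"),
    ("A002", 10.0, "MY"),
    ("A001", 5.0, "MY"),
]


def aggregate_by_account(rows):
    """Compute per-cardholder count, sum, and unique-country aggregates."""
    stats = defaultdict(lambda: {"count": 0, "sum": 0.0, "countries": set()})
    for account, amount, country in rows:
        entry = stats[account]
        entry["count"] += 1               # feature 10: count of transactions
        entry["sum"] += amount            # feature 11: sum of transaction amounts
        entry["countries"].add(country)   # feature 12: unique acquiring countries
    return {
        account: {
            "transaction_amount_count": entry["count"],
            "transaction_amount_sum": entry["sum"],
            "acquiring_country_count": len(entry["countries"]),
        }
        for account, entry in stats.items()
    }


features = aggregate_by_account(transactions)
print(features["A001"])
```

Each transaction row would then be joined with the aggregates of its account before being passed to the classifiers.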
Fig 4. Feature importance using DT, RF, and XGBoost.
Accuracy rates and BKS wins with noise added.
| Noise | BKS | Voting | BKS Wins |
|---|---|---|---|
| 0% | 0.9993 | 0.9993 | 0 |
| 10% | 0.9961 | 0.9907 | 9 |
| 20% | 0.9872 | 0.9771 | |
| 30% | 0.9699 | 0.9576 | |
| 40% | 0.9656 | 0.9511 | |
F1 scores with noise added.
| Noise | BKS | Voting |
|---|---|---|
| 0% | 0.9996 | 0.9996 |
| 10% | 0.9970 | 0.9963 |
| 20% | 0.9935 | 0.9881 |
| 30% | 0.9845 | 0.9779 |
| 40% | 0.9822 | 0.9762 |
Accuracy rates with different ratios of minority to majority samples.
| Sampling | BKS | Voting |
|---|---|---|
| Original | 0.9993 | 0.9993 |
| 1:100 | 0.9995 | 0.9981 |
| 1:500 | 0.9993 | 0.9980 |