Abdallah Alshantti1, Adil Rasheed1,2.
Abstract
Financial institutions are increasingly interested in developing advanced systems to enhance their anti-money laundering (AML) programmes. In this study, we present a self-organising map (SOM) based approach to predict which bank accounts are possibly involved in money laundering cases, given their financial transaction histories. Our method takes advantage of the competitive and adaptive properties of the SOM to represent the accounts in a lower-dimensional space. We then categorise the SOM neurons and the accounts into money laundering risk levels and propose two investigation strategies, which enables us to measure the classification performance. Our results indicate that the framework is capable of identifying suspicious accounts already investigated by our partner bank under both proposed investigation strategies. We further validate the model by analysing its performance when different parameters of the dataset are modified.
Keywords: investigation strategies; money laundering; risk levels; self-organising map (SOM); suspicious accounts
Year: 2021 PMID: 34970642 PMCID: PMC8713506 DOI: 10.3389/frai.2021.761925
Source DB: PubMed Journal: Front Artif Intell ISSN: 2624-8212
Dataset attributes prior to aggregation.
| Attribute type | Number of attributes | Attribute names | Unique categories |
|---|---|---|---|
| Identifier | 1 | Account number | |
| Date | 1 | Transaction date | |
| Numerical | 1 | Transaction amount | |
| Categorical | 9 | CreditDebitCode | 2 |
| | | TransTypeID | 29 |
| | | TransMethodID | 21 |
| | | TransactionChannelID | 28 |
| | | TextCode | 84 |
| | | ProductCode | 292 |
| | | CurrencyCodeOrig | 112 |
| | | SourceSystemOrg | 110 |
| | | SourceSystemFetch | 3 |
| Binary | 1 | Class label | |
FIGURE 1. Architecture of the SOM money laundering classification framework.
FIGURE 2. SOM neurons risk categorisation matrix.
FIGURE 3. The self-organising map plot after training. Dark grey cells represent neurons that are further away from their neighbouring neurons, while light grey or white cells are those in close proximity to their neighbours. Darker markers, whether red crosses or green circles, indicate that many observations of the same class are attributed to a given neuron.
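The inter-neural distance shaded in Figure 3 can be illustrated with a small, self-contained sketch. This is not the authors' implementation: it trains a toy rectangular SOM with a Gaussian neighbourhood and a linearly decaying learning rate, then computes each neuron's mean Euclidean distance to its four grid neighbours. The grid size, decay schedules, neighbourhood definition and the function names `train_som`, `best_matching_unit` and `u_matrix` are all assumptions chosen for illustration.

```python
import math
import random


def best_matching_unit(weights, x):
    """Return the (row, col) of the neuron whose weight vector is closest to x."""
    best, best_d = (0, 0), float("inf")
    for r, row in enumerate(weights):
        for c, w in enumerate(row):
            d = sum((a - b) ** 2 for a, b in zip(w, x))
            if d < best_d:
                best, best_d = (r, c), d
    return best


def train_som(data, rows, cols, epochs=20, lr0=0.5, sigma0=None, seed=0):
    """Train a toy SOM with online updates (illustrative schedules, not the paper's)."""
    rng = random.Random(seed)
    dim = len(data[0])
    sigma0 = sigma0 or max(rows, cols) / 2.0
    # Random initial weights in [0, 1); inputs are assumed to be scaled to [0, 1].
    weights = [[[rng.random() for _ in range(dim)] for _ in range(cols)]
               for _ in range(rows)]
    total_steps = epochs * len(data)
    step = 0
    for _ in range(epochs):
        order = list(range(len(data)))
        rng.shuffle(order)
        for i in order:
            x = data[i]
            frac = step / total_steps
            lr = lr0 * (1.0 - frac)                  # learning rate decays linearly
            sigma = max(sigma0 * (1.0 - frac), 0.5)  # neighbourhood radius, floored at 0.5
            br, bc = best_matching_unit(weights, x)  # competitive step
            for r in range(rows):
                for c in range(cols):
                    grid_d2 = (r - br) ** 2 + (c - bc) ** 2
                    h = math.exp(-grid_d2 / (2.0 * sigma ** 2))
                    w = weights[r][c]
                    for k in range(dim):             # adaptive step: move towards x
                        w[k] += lr * h * (x[k] - w[k])
            step += 1
    return weights


def u_matrix(weights):
    """Mean Euclidean distance from each neuron to its 4-connected grid neighbours."""
    rows, cols = len(weights), len(weights[0])
    out = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            dists = []
            for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                nr, nc = r + dr, c + dc
                if 0 <= nr < rows and 0 <= nc < cols:
                    dists.append(math.dist(weights[r][c], weights[nr][nc]))
            out[r][c] = sum(dists) / len(dists) if dists else 0.0
    return out
```

Calling `u_matrix(train_som(data, 22, 22))` on feature vectors scaled to [0, 1] gives one distance per neuron; larger values correspond to the darker cells described in the caption. The 22 × 22 grid here is only a guess consistent with the 484 neurons counted in the tables.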
Fifth-fold cross-validation SOM neurons inter-neural distances against their suspicious accounts composition.
| Inter-neural distance | Suspicious composition 0–9.99% | Suspicious composition 10–34.99% | Suspicious composition 35+% | Total |
|---|---|---|---|---|
| 0–0.19 | 98 | 7 | 7 | 112 |
| 0.20–0.39 | 154 | 40 | 40 | 234 |
| 0.40–0.59 | 80 | 6 | 18 | 104 |
| 0.60–0.79 | 26 | 2 | 1 | 29 |
| 0.80+ | 5 | 0 | 0 | 5 |
| Total | 363 | 55 | 66 | 484 |
The distribution of the fifth-fold cross-validation training accounts along the risk categorisation matrix. Notation of the numbers is: all accounts (suspicious accounts).
| Inter-neural distance | Suspicious composition 0–9.99% | Suspicious composition 10–34.99% | Suspicious composition 35+% | Total |
|---|---|---|---|---|
| 0–0.19 | 3,122 (27) | 119 (21) | 158 (111) | 3,399 (159) |
| 0.20–0.39 | 2,973 (29) | 602 (109) | 647 (418) | 4,222 (556) |
| 0.40–0.59 | 855 (9) | 65 (11) | 233 (164) | 1,153 (184) |
| 0.60–0.79 | 265 (4) | 52 (7) | 4 (2) | 321 (13) |
| 0.80+ | 33 (0) | 0 (0) | 0 (0) | 33 (0) |
| Total | 7,248 (69) | 838 (148) | 1,042 (695) | 9,128 (912) |
Risk categorisation of the fifth-fold cross-validation training observations.
| Risk category | All accounts | Suspicious accounts |
|---|---|---|
| Low | 7,334 | 90 |
| Medium | 910 | 238 |
| High | 884 | 584 |
| Total | 9,128 | 912 |
The distribution of test accounts along the risk categorisation matrix. Notation of the numbers is: all accounts (suspicious accounts).
| Inter-neural distance | Suspicious composition 0–9.99% | Suspicious composition 10–34.99% | Suspicious composition 35+% | Total |
|---|---|---|---|---|
| 0–0.19 | 2,379 (33) | 94 (9) | 72 (43) | 2,545 (85) |
| 0.20–0.39 | 2,809 (46) | 513 (78) | 407 (247) | 3,729 (371) |
| 0.40–0.59 | 2,548 (64) | 513 (83) | 534 (324) | 3,595 (471) |
| 0.60–0.79 | 940 (12) | 174 (37) | 238 (128) | 1,352 (177) |
| 0.80+ | 123 (5) | 11 (2) | 55 (30) | 189 (37) |
| Total | 8,799 (160) | 1,305 (209) | 1,306 (772) | 11,410 (1,141) |
Risk categorisation of test observations.
| Risk category | All accounts | Suspicious accounts |
|---|---|---|
| Low | 8,770 | 164 |
| Medium | 1,395 | 246 |
| High | 1,245 | 731 |
| Total | 11,410 | 1,141 |
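The link between the risk categorisation matrix and the per-category totals can be checked with a short worked example. The cell-to-risk assignment in `RISK` below is an assumption: Figure 2 is not reproduced here, so the mapping was inferred as one assignment that reproduces the reported totals. The first and last distance bins are taken to cover values below 0.20 and from 0.80 upward, and the names `TEST_COUNTS`, `RISK` and `risk_totals` are illustrative.

```python
# Test-set counts as (all accounts, suspicious accounts).
# Rows: inter-neural distance bins from smallest to largest.
# Columns: suspicious composition 0-9.99%, 10-34.99%, 35+%.
TEST_COUNTS = [
    [(2379, 33), (94, 9), (72, 43)],
    [(2809, 46), (513, 78), (407, 247)],
    [(2548, 64), (513, 83), (534, 324)],
    [(940, 12), (174, 37), (238, 128)],
    [(123, 5), (11, 2), (55, 30)],
]

# Hypothetical risk level per cell, inferred from the reported totals
# (an assumption, not taken from the paper's Figure 2).
RISK = [
    ["Low", "Low", "Medium"],
    ["Low", "Medium", "High"],
    ["Low", "Medium", "High"],
    ["Low", "Medium", "High"],
    ["Medium", "High", "High"],
]


def risk_totals(counts, risk):
    """Sum (all, suspicious) account counts over the cells of each risk level."""
    totals = {"Low": [0, 0], "Medium": [0, 0], "High": [0, 0]}
    for count_row, risk_row in zip(counts, risk):
        for (n_all, n_suspicious), level in zip(count_row, risk_row):
            totals[level][0] += n_all
            totals[level][1] += n_suspicious
    return {level: tuple(pair) for level, pair in totals.items()}


print(risk_totals(TEST_COUNTS, RISK))
```

Under this assumed mapping the sums equal the test-set risk table (Low 8,770 with 164 suspicious, Medium 1,395 with 246, High 1,245 with 731). Applying the same mapping to the training matrix also reproduces the training totals.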
Safe strategy confusion matrix.
| Actual | Predicted normal | Predicted suspicious |
|---|---|---|
| Normal | 8,606 | 1,663 |
| Suspicious | 164 | 977 |
Fast strategy confusion matrix.
| Actual | Predicted normal | Predicted suspicious |
|---|---|---|
| Normal | 9,637 | 632 |
| Suspicious | 405 | 736 |
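Both confusion matrices can be reconstructed from the test-set risk counts, which makes the trade-off between the two strategies easier to see. The strategy definitions are not spelled out in this excerpt, so the flagged sets below are assumptions inferred from the counts alone: the safe strategy is taken to flag all medium- and high-risk accounts, and the fast strategy to flag high-risk accounts plus the 123 accounts (5 suspicious) on neurons in the largest inter-neural distance bin with under 10% suspicious composition. All names in the sketch are illustrative.

```python
# (all accounts, suspicious accounts) per risk category, test set.
RISK_TOTALS = {"Low": (8770, 164), "Medium": (1395, 246), "High": (1245, 731)}
# Test accounts in the largest-distance bin with under 10% suspicious composition.
ISOLATED_LOW_COMPOSITION = (123, 5)
TOTAL_ALL, TOTAL_SUSPICIOUS = 11410, 1141


def add_pairs(*pairs):
    """Element-wise sum of (all, suspicious) pairs."""
    return tuple(sum(values) for values in zip(*pairs))


def confusion_counts(flagged, total_all=TOTAL_ALL, total_suspicious=TOTAL_SUSPICIOUS):
    """Confusion-matrix counts when every flagged account is predicted suspicious."""
    flagged_all, flagged_suspicious = flagged
    tp = flagged_suspicious
    fp = flagged_all - flagged_suspicious
    fn = total_suspicious - tp
    tn = (total_all - total_suspicious) - fp
    return {"tn": tn, "fp": fp, "fn": fn, "tp": tp}


# Assumed flagged sets (inferred from the reported counts, not stated in the text).
SAFE_FLAGGED = add_pairs(RISK_TOTALS["Medium"], RISK_TOTALS["High"])
FAST_FLAGGED = add_pairs(RISK_TOTALS["High"], ISOLATED_LOW_COMPOSITION)

print("safe:", confusion_counts(SAFE_FLAGGED))
print("fast:", confusion_counts(FAST_FLAGGED))
```

With these assumed sets, the safe strategy flags 2,640 accounts and the fast strategy 1,368. The resulting counts equal the two confusion matrices, reading their rows as actual normal followed by actual suspicious.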
Classification performance metrics for the safe and fast strategies.
| Strategy | Accuracy | Precision | Recall | F1-score | AUC |
|---|---|---|---|---|---|
| Safe | 0.8399 | 0.3728 | 0.8562 | 0.5188 | 0.8472 |
| Fast | 0.9091 | 0.5480 | 0.6451 | 0.5897 | 0.7918 |
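The metrics can be recomputed from the confusion-matrix counts with the standard definitions, as in the sketch below. Two assumptions are made: the confusion-matrix rows are read as actual normal followed by actual suspicious (consistent with the 1,141 suspicious test accounts), and the AUC is treated as the area under a ROC curve with a single operating point, which equals the mean of recall and specificity. The function name and dictionary keys are illustrative.

```python
def classification_metrics(tn, fp, fn, tp):
    """Standard binary classification metrics from confusion-matrix counts."""
    total = tn + fp + fn + tp
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)        # true positive rate
    specificity = tn / (tn + fp)   # true negative rate
    return {
        "accuracy": (tp + tn) / total,
        "precision": precision,
        "recall": recall,
        "f1": 2 * precision * recall / (precision + recall),
        # Area under a ROC curve with one operating point (assumption).
        "auc_single_point": (recall + specificity) / 2,
    }


SAFE = classification_metrics(tn=8606, fp=1663, fn=164, tp=977)
FAST = classification_metrics(tn=9637, fp=632, fn=405, tp=736)

for name, scores in (("Safe", SAFE), ("Fast", FAST)):
    print(name, {key: round(value, 4) for key, value in scores.items()})
```

Recomputed this way, accuracy, recall and AUC agree with the tabulated values to within about 0.0001. Precision and F1 come out somewhat lower (about 0.370 and 0.517 for the safe strategy, 0.538 and 0.587 for the fast strategy), so the reported figures may have been obtained with a different averaging or rounding procedure.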
Classification rate recall scores.
| Suspicious ratio (%) | Safe, 10 features | Safe, 25 features | Safe, 50 features | Safe, 100 features | Safe, all features | Fast, 10 features | Fast, 25 features | Fast, 50 features | Fast, 100 features | Fast, all features |
|---|---|---|---|---|---|---|---|---|---|---|
| 5 | 0.790 | 0.802 | 0.778 | 0.824 | 0.826 | 0.368 | 0.594 | 0.516 | 0.464 | 0.432 |
| 10 | 0.836 | 0.840 | 0.850 | 0.818 | 0.828 | 0.660 | 0.624 | 0.610 | 0.580 | 0.530 |
| 20 | 0.832 | 0.826 | 0.856 | 0.838 | 0.822 | 0.680 | 0.714 | 0.644 | 0.584 | 0.596 |
| 50 | 0.792 | 0.830 | 0.842 | 0.840 | 0.864 | 0.606 | 0.644 | 0.582 | 0.692 | 0.738 |
Classification rate precision scores.
| Suspicious ratio (%) | Safe, 10 features | Safe, 25 features | Safe, 50 features | Safe, 100 features | Safe, all features | Fast, 10 features | Fast, 25 features | Fast, 50 features | Fast, 100 features | Fast, all features |
|---|---|---|---|---|---|---|---|---|---|---|
| 5 | 0.228 | 0.257 | 0.238 | 0.190 | 0.160 | 0.402 | 0.467 | 0.445 | 0.297 | 0.262 |
| 10 | 0.391 | 0.392 | 0.355 | 0.323 | 0.314 | 0.564 | 0.553 | 0.544 | 0.437 | 0.436 |
| 20 | 0.530 | 0.557 | 0.560 | 0.520 | 0.528 | 0.670 | 0.661 | 0.669 | 0.608 | 0.630 |
| 50 | 0.793 | 0.814 | 0.819 | 0.819 | 0.804 | 0.849 | 0.860 | 0.837 | 0.831 | 0.845 |
Recall scores’ p-values between different suspicious ratios.
| Suspicious ratio (%) | Safe, vs 10% | Safe, vs 20% | Safe, vs 50% | Fast, vs 10% | Fast, vs 20% | Fast, vs 50% |
|---|---|---|---|---|---|---|
| 5 | 0.299 | 0.549 | 0.457 | 0.587 | 0.059 | 0.378 |
| 10 | | 0.564 | 0.609 | | 0.049 | 0.609 |
| 20 | | | 0.879 | | | 0.114 |
Precision scores’ p-values between different suspicious ratios.
| Suspicious ratio (%) | Safe, vs 10% | Safe, vs 20% | Safe, vs 50% | Fast, vs 10% | Fast, vs 20% | Fast, vs 50% |
|---|---|---|---|---|---|---|
| 5 | 0.000 | 0.000 | 0.000 | 0.024 | 0.000 | 0.000 |
| 10 | | 0.000 | 0.000 | | 0.007 | 0.000 |
| 20 | | | 0.000 | | | 0.000 |