Nicolas Gauvrit, Jean-Charles Houillon, Jean-Paul Delahaye.
Abstract
The first significant (leftmost nonzero) digit of seemingly random numbers often appears to conform to a logarithmic distribution, with more 1s than 2s, more 2s than 3s, and so forth, a phenomenon known as Benford's law. When humans try to produce random numbers, they often fail to conform to this distribution. This failure is the basis of the so-called Benford analysis, which aims at detecting fabricated data. A generalized Benford's law (GBL), extending the classical Benford's law, has been defined recently. In two studies, we provide some empirical support for the generalized Benford analysis, broadening the classical Benford analysis. We also conclude that familiarity with the numerical domain involved, as well as cognitive effort, have only a mild effect on the method's accuracy and can hardly explain the positive results provided here.
Keywords: Benford analysis; fraud detection; generalized Benford’s law
Year: 2017 PMID: 28713450 PMCID: PMC5504535 DOI: 10.5709/acp-0212-x
Source DB: PubMed Journal: Adv Cogn Psychol ISSN: 1895-1171
Proportion of 1s, 2s,…, 9s as First Significant Digit in a Series Conforming to NBL
| Digit | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 |
|---|---|---|---|---|---|---|---|---|---|
| Prop (%) | 30.1 | 17.6 | 12.5 | 9.69 | 7.92 | 6.69 | 5.80 | 5.12 | 4.58 |
Note. NBL = Newcomb-Benford law.
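The proportions in the table above follow from the standard Newcomb-Benford formula, P(d) = log10(1 + 1/d). The short Python sketch below recomputes them and extracts the first significant digit of a number; the function names are illustrative and do not come from the article.

```python
import math


def benford_proportions():
    """Expected proportion of each first significant digit 1..9 under NBL."""
    return {d: math.log10(1 + 1 / d) for d in range(1, 10)}


def first_significant_digit(x):
    """Leftmost nonzero digit of a nonzero number."""
    # Scientific notation puts the first significant digit in position 0,
    # e.g. 0.00452 -> '4.520000000000000e-03'.
    return int(f"{abs(x):.15e}"[0])


props = benford_proportions()
for digit, p in props.items():
    print(digit, f"{100 * p:.2f}%")
```

Rounded to three significant figures, the printed percentages should agree with the table row labelled "Prop (%)".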
Figure 1. Mean discrepancy ± SEM from GBL as measured by D, for three functions: Log (corresponding to NBL), f(x) = π × x² (“Square”), and Square root. GBL = generalized Benford’s law. °p < .05. *p < .01. **p < .001. ***p < .0001.
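To make the idea of a discrepancy from GBL more concrete, here is a minimal Python sketch. It rests on two assumptions that this record does not state: that a series conforms to GBL for a function f when the fractional parts of f(x) are uniformly distributed on [0, 1), and that a Kolmogorov-Smirnov-type distance is an acceptable stand-in for D, whose exact definition is not given here. The sample is synthetic and all names are hypothetical.

```python
import math
import random


def ks_distance_from_uniform(values):
    """Kolmogorov-Smirnov distance between the empirical CDF of `values`
    (each in [0, 1)) and the CDF of the uniform distribution on [0, 1)."""
    xs = sorted(values)
    n = len(xs)
    return max(max((i + 1) / n - x, x - i / n) for i, x in enumerate(xs))


def gbl_discrepancy(data, f):
    """Distance of the fractional parts of f(x) from uniformity.
    ASSUMPTION: this is a stand-in for D, not the article's own definition."""
    fractional_parts = [f(x) % 1.0 for x in data]
    return ks_distance_from_uniform(fractional_parts)


# The three functions named in the figure caption.
FUNCTIONS = {
    "Log": math.log10,
    "Square": lambda x: math.pi * x ** 2,
    "Square root": math.sqrt,
}

# Synthetic sample (not data from the study): values spread log-uniformly
# over three orders of magnitude, so the Log discrepancy should be small.
rng = random.Random(0)
sample = [10 ** rng.uniform(0, 3) for _ in range(500)]

for name, f in FUNCTIONS.items():
    print(name, round(gbl_discrepancy(sample, f), 3))
```

Under these assumptions, a larger value means the series is further from GBL conformity for that function.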
Results of Two-Sample T-Tests (t) Comparing the Conformity of Fabricated and Real Data to GBL, With Log, Square and Square Root Function
| | Log | Square | Square Root |
|---|---|---|---|
| Cities | 2.07 ° | 1.55 | 4.01 ** |
| Numbers | 4.80 *** | −4.77 *** | 3.41 * |
| Stars | 4.20 ** | 3.90 ** | 1.59 |
| Tuberculosis | 4.02 ** | 1.85 | 2.52 |
Note. GBL = generalized Benford’s law. °p < .05. *p < .01. **p < .001. ***p < .0001.
Figure 2. Smoothed receiver operating characteristic (ROC) curves.
AUCs With 95% Confidence Intervals (DeLong). In Each Row, the Largest AUC is Bolded
| | Log | Square | Square Root |
|---|---|---|---|
| Cities | .55 (.42–.68) | .48 (.36–.61) | |
| Numbers | .78 (.69–.88) | .70 (.59–.81) | |
| Stars | .75 (.64–.86) | .59 (.46–.73) | |
| Tuberculosis | .58 (.47–.70) | .62 (.51–.73) | |
Note. AUC = Area under the curve.
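An AUC can be read as the probability that a randomly chosen fabricated series receives a higher discrepancy score than a randomly chosen real one, with ties counted as one half. The Python sketch below computes that quantity directly. It assumes that higher scores indicate fabrication, uses invented scores rather than data from the study, and does not attempt the DeLong confidence intervals.

```python
def auc(scores_fabricated, scores_real):
    """Proportion of (fabricated, real) pairs in which the fabricated series
    has the higher score; ties count as one half."""
    wins = 0.0
    for a in scores_fabricated:
        for b in scores_real:
            if a > b:
                wins += 1.0
            elif a == b:
                wins += 0.5
    return wins / (len(scores_fabricated) * len(scores_real))


# Hypothetical discrepancy scores, invented purely for illustration.
fabricated = [0.21, 0.35, 0.18, 0.40]
real = [0.10, 0.22, 0.15, 0.12]

print(auc(fabricated, real))
```

A value of .50 corresponds to chance-level discrimination, and 1.00 to perfect separation of fabricated from real series.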