Merylin Monaro1, Chiara Galante2, Riccardo Spolaor1, Qian Qian Li1, Luciano Gamberini2,3, Mauro Conti3,4, Giuseppe Sartori5,6.
Abstract
Identifying the true identity of a subject in the absence of external verification criteria (documents, DNA, fingerprints, etc.) is an unresolved issue. Here, we report an experiment on the verification of fake identities, identified by means of their specific keystroke dynamics as analysed in their written response using a computer keyboard. Results indicate that keystroke analysis can distinguish liars from truth tellers with a high degree of accuracy (around 95%), thanks to the use of unexpected questions that efficiently facilitate the emergence of deception clues.
Year: 2018 PMID: 29386583 PMCID: PMC5792443 DOI: 10.1038/s41598-018-20462-6
Source DB: PubMed Journal: Sci Rep ISSN: 2045-2322 Impact factor: 4.379
Figure 1. Example of the computer screen. Participants were instructed to respond by writing in the edit box located below the presented sentence, and to finish each response by pressing ENTER.
List of the 18 questions presented to participants divided by type (control, expected and unexpected questions).
| Question type | Question text |
|---|---|
| Control | What is your gender? |
| Expected | What is your name? |
| Unexpected | How old are you? (in letters) |
The table reports the t-value and p-value of the 4 attributes that revealed a statistically significant difference between the two groups (truthtellers vs liars) at a significance level of p < 0.0008. The effect size (Cohen's d) is also reported.
| Feature | t-value | p-value | Effect size (Cohen's d) |
|---|---|---|---|
| Errors | | | |
| Prompted-firstdigit | | | |
| Prompted-firstdigit adjusted GULPEASE | | | |
| Prompted-enter | | | |
Error rates on control, expected and unexpected questions for liars and truthtellers (number of wrong responses out of the total number of responses).
| Question type | Truthtellers | Liars |
|---|---|---|
| Control | 0/80 | 0/80 |
| Expected | 0/160 | 3/160 |
| Unexpected | 3/120 | 81/120 |
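The counts in the table above can be turned into proportions to make the contrast between question types explicit. The following is a minimal sketch; the dictionary layout and function names are illustrative choices, not part of the original study.

```python
# Error counts as (errors, total responses), copied from the table above.
ERROR_COUNTS = {
    "Control": {"Truthtellers": (0, 80), "Liars": (0, 80)},
    "Expected": {"Truthtellers": (0, 160), "Liars": (3, 160)},
    "Unexpected": {"Truthtellers": (3, 120), "Liars": (81, 120)},
}


def error_rate(errors, total):
    """Return the proportion of wrong responses."""
    return errors / total


def error_rates(counts):
    """Map each question type and group to its error proportion."""
    return {
        question: {group: error_rate(*pair) for group, pair in groups.items()}
        for question, groups in counts.items()
    }


if __name__ == "__main__":
    for question, groups in error_rates(ERROR_COUNTS).items():
        for group, rate in groups.items():
            print(f"{question:<10} {group:<12} {rate:.1%}")
```

On these counts, liars answer 67.5% of unexpected questions incorrectly, against 2.5% for truthtellers, while both groups make almost no errors on control and expected questions.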
This table reports the correlation matrix among the five final predictors.
| Feature | Number of errors | Prompted-firstdigit adjusted GULPEASE | Firstdigit-enter | Writing time | Time key before enter down |
|---|---|---|---|---|---|
| Number of errors | 1.00 | 0.51 | 0.25 | 0.46 | 0.44 |
| Prompted-firstdigit adjusted GULPEASE | 0.51 | 1.00 | 0.66 | 0.60 | 0.54 |
| Firstdigit-enter | 0.25 | 0.66 | 1.00 | 0.67 | 0.52 |
| Writing time | 0.46 | 0.60 | 0.67 | 1.00 | 0.67 |
| Time key before enter down | 0.44 | 0.54 | 0.52 | 0.67 | 1.00 |
The table reports the accuracy obtained on the training set using a 10-fold cross-validation procedure and on the test set (20 new participants) for four different machine learning classifiers. In addition to accuracy, the table reports the weighted averages of the True Positive Rate (TP Rate), False Positive Rate (FP Rate), Precision, Recall, F-Measure, Receiver Operating Characteristic (ROC) Area and Precision-Recall Curve (PRC) Area.
| Classifier | Accuracy | TP Rate | FP Rate | Precision | Recall | F-Measure | ROC Area | PRC Area |
|---|---|---|---|---|---|---|---|---|
| 10-fold cross-validation | | | | | | | | |
| Logistic | 90% | 0.900 | 0.100 | 0.904 | 0.900 | 0.900 | 0.959 | 0.948 |
| SVM (SMO) | 95% | 0.950 | 0.050 | 0.950 | 0.950 | 0.950 | 0.950 | 0.928 |
| LMT | 97.5% | 0.975 | 0.025 | 0.976 | 0.975 | 0.975 | 1.000 | 1.000 |
| Random Forest | 92.5% | 0.925 | 0.075 | 0.926 | 0.925 | 0.925 | 0.972 | 0.972 |
| Test set | | | | | | | | |
| Logistic | 100% | 1.000 | 0.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 |
| SVM (SMO) | 90% | 0.900 | 0.100 | 0.917 | 0.900 | 0.899 | 0.900 | 0.867 |
| LMT | 90% | 0.900 | 0.100 | 0.917 | 0.900 | 0.899 | 1.000 | 1.000 |
| Random Forest | 95% | 0.950 | 0.050 | 0.955 | 0.950 | 0.950 | 1.000 | 1.000 |
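The weighted averages in the table follow from a classifier's confusion matrix: each per-class metric is weighted by the share of participants that truly belong to that class. The sketch below illustrates the computation. The function name and the example confusion matrix are hypothetical: the matrix assumes a test set of 10 liars and 10 truthtellers with two misclassified participants from the same class, which is one configuration consistent with a 90% accuracy row, not data taken from the study.

```python
def weighted_metrics(confusion):
    """Compute support-weighted averages from confusion[actual][predicted] counts."""
    classes = list(confusion)
    total = sum(sum(row.values()) for row in confusion.values())
    sums = {"tp_rate": 0.0, "fp_rate": 0.0, "precision": 0.0, "f_measure": 0.0}
    correct = 0
    for cls in classes:
        support = sum(confusion[cls].values())           # true members of cls
        tp = confusion[cls][cls]                         # correctly labelled as cls
        fp = sum(confusion[other][cls] for other in classes if other != cls)
        negatives = total - support                      # true members of other classes
        tp_rate = tp / support if support else 0.0       # same quantity as recall
        fp_rate = fp / negatives if negatives else 0.0
        precision = tp / (tp + fp) if (tp + fp) else 0.0
        denom = precision + tp_rate
        f_measure = 2 * precision * tp_rate / denom if denom else 0.0
        weight = support / total
        sums["tp_rate"] += weight * tp_rate
        sums["fp_rate"] += weight * fp_rate
        sums["precision"] += weight * precision
        sums["f_measure"] += weight * f_measure
        correct += tp
    sums["accuracy"] = correct / total
    return sums


# Hypothetical confusion matrix (assumed 10/10 class split, 2 liars misclassified).
EXAMPLE = {
    "liar": {"liar": 8, "truthteller": 2},
    "truthteller": {"liar": 0, "truthteller": 10},
}

if __name__ == "__main__":
    for name, value in weighted_metrics(EXAMPLE).items():
        print(f"{name}: {value:.3f}")
```

Under these assumptions the sketch yields a weighted TP rate of 0.900, FP rate of 0.100, precision of 0.917 and F-measure of 0.899, which shows how weighted precision and F-measure can differ from accuracy even in a balanced two-class test set.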
The table reports the accuracies obtained from four different machine learning classifiers in the 10-fold cross-validation and on the test set, using only normalized measures as predictors. In addition to accuracy, the table reports the weighted averages of the True Positive Rate (TP Rate), False Positive Rate (FP Rate), Precision, Recall, F-Measure, Receiver Operating Characteristic (ROC) Area and Precision-Recall Curve (PRC) Area.
| Classifier | Accuracy | TP Rate | FP Rate | Precision | Recall | F-Measure | ROC Area | PRC Area |
|---|---|---|---|---|---|---|---|---|
| 10-fold cross-validation | | | | | | | | |
| Logistic | 90% | 0.900 | 0.100 | 0.900 | 0.900 | 0.900 | 0.946 | 0.912 |
| SVM (SMO) | 92.5% | 0.925 | 0.075 | 0.935 | 0.925 | 0.925 | 0.925 | 0.897 |
| LMT | 90% | 0.900 | 0.100 | 0.917 | 0.900 | 0.899 | 0.985 | 0.986 |
| Random Forest | 95% | 0.950 | 0.050 | 0.950 | 0.950 | 0.950 | 0.966 | 0.961 |
| Test set | | | | | | | | |
| Logistic | 100% | 1.000 | 0.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 |
| SVM (SMO) | 90% | 0.900 | 0.100 | 0.917 | 0.900 | 0.899 | 0.900 | 0.867 |
| LMT | 90% | 0.900 | 0.100 | 0.917 | 0.900 | 0.899 | 1.000 | 1.000 |
| Random Forest | 100% | 1.000 | 0.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 |
The table reports the accuracy obtained on a test set of 151 participants (86 liars and 65 truthtellers) recruited online. In addition to accuracy, the table reports the weighted averages of the True Positive Rate (TP Rate), False Positive Rate (FP Rate), Precision, Recall, F-Measure, Receiver Operating Characteristic (ROC) Area and Precision-Recall Curve (PRC) Area.
| Classifier | Accuracy | TP Rate | FP Rate | Precision | Recall | F-Measure | ROC Area | PRC Area |
|---|---|---|---|---|---|---|---|---|
| Logistic | 86.1% | 0.861 | 0.135 | 0.864 | 0.861 | 0.861 | 0.930 | 0.911 |
| SVM (SMO) | 88.7% | 0.887 | 0.093 | 0.902 | 0.887 | 0.888 | 0.897 | 0.857 |
| LMT | 90.1% | 0.901 | 0.086 | 0.908 | 0.901 | 0.901 | 0.959 | 0.953 |
| Random Forest | 90.7% | 0.907 | 0.078 | 0.916 | 0.907 | 0.908 | 0.980 | 0.977 |
The table reports the prototypical keystroke pattern of a liar and of a truthteller for the 5 predictors used in the classification models.
| Feature | Prototypical truthteller | Prototypical liar |
|---|---|---|
| Number of errors | 0/18 = 0.00 | 7/18 = 0.39 |
| Prompted-firstdigit adjusted GULPEASE | 1649 ms | 3508 ms |
| Firstdigit-enter | 3123 ms | 3567 ms |
| Writing time | 281 ms | 442 ms |
| Time key before enter down | 462 ms | 739 ms |
The number of errors is defined as the number of fields completed with wrong information. The prompted-firstdigit adjusted GULPEASE is the interval between the onset of the sentence on the computer screen and the first key press. The firstdigit-enter is the time between the first key press and the press of ENTER. The writing time is the firstdigit-enter divided by the number of characters typed. The time key before enter down is the time between the press of the last key and the press of ENTER.
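The timing definitions above can be illustrated with a minimal sketch that derives the features from one logged response. The `Response` record, its field names and the example timestamps are hypothetical. Two simplifications are assumed: the number of characters typed is taken to be the number of logged key presses, and the GULPEASE adjustment is omitted because it is not defined in this section, so the first feature is the raw prompted-firstdigit interval.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Response:
    """Hypothetical log of one typed answer; all times in milliseconds."""
    prompt_onset_ms: float       # when the sentence appeared on screen
    key_down_ms: List[float]     # press time of each character key, in order
    enter_down_ms: float         # press time of ENTER


def timing_features(response: Response) -> Dict[str, float]:
    """Derive the timing features defined in the text from one response."""
    if not response.key_down_ms:
        raise ValueError("response contains no key presses")
    first_key = response.key_down_ms[0]
    last_key = response.key_down_ms[-1]
    firstdigit_enter = response.enter_down_ms - first_key
    return {
        # Raw interval only: the GULPEASE adjustment is not applied here.
        "prompted_firstdigit": first_key - response.prompt_onset_ms,
        "firstdigit_enter": firstdigit_enter,
        # Assumes one logged key press per character typed.
        "writing_time": firstdigit_enter / len(response.key_down_ms),
        "time_key_before_enter_down": response.enter_down_ms - last_key,
    }


if __name__ == "__main__":
    # Made-up timestamps for illustration only.
    example = Response(
        prompt_onset_ms=0.0,
        key_down_ms=[1500.0, 1700.0, 1950.0, 2200.0],
        enter_down_ms=2700.0,
    )
    for name, value in timing_features(example).items():
        print(f"{name}: {value:.0f} ms")
```

Averaging these per-response values over a participant's answers would give figures comparable in kind to the prototypical values in the table above, although the exact aggregation used in the study is not specified here.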