Lutz Jäncke, Seyed A Valizadeh.
Abstract
We analysed a dataset comprising 118 subjects who were scanned three times (at baseline, 1-year follow-up, and 7-year follow-up) using structural magnetic resonance imaging (MRI) over the course of 7 years. We aimed to examine whether it is possible to identify individual subjects based on a restricted number of neuroanatomical features measured 7 years previously. We used FreeSurfer to compute 15 standard brain measures (total intracranial volume [ICV], total cortical thickness [CT], total cortical surface area [CA], cortical grey matter [CoGM], cerebral white matter [CeWM], cerebellar cortex [CBGM], cerebellar white matter [CBWM], subcortical volumes [thalamus, putamen, pallidum, caudatus, hippocampus, amygdala and accumbens] and brain stem volume). We used linear discriminant analysis (LDA), random forest machine learning (RF) and a newly developed rule-based identification approach (RBIA) for the identification process. Using RBIA, different sets of neuroanatomical features (ranging from 2 to 14) obtained at baseline were combined by if-then rules and compared to the same set of neuroanatomical features derived from the 7-year follow-up measurement. We achieved excellent identification results with LDA, while the identification results for RF were very good but not perfect. The RBIA produced the best results, achieving perfect participant identification for some four-feature sets. The identification results improved substantially when using larger feature sets, with 14 neuroanatomical features providing perfect identification. Thus, this study shows again that the human brain is highly individual in terms of neuroanatomical features. These results are discussed in the context of the current literature on brain plasticity and the scientific attempts to develop brain-fingerprinting techniques.
Keywords: FreeSurfer; MRI; linear discriminant analysis; random forest; rule-based identification approach; subject authentication; subject identification
Year: 2022 PMID: 35831945 PMCID: PMC9543309 DOI: 10.1111/ejn.15770
Source DB: PubMed Journal: Eur J Neurosci ISSN: 0953-816X Impact factor: 3.698
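The abstract describes the RBIA only as combining baseline features "by if-then rules" and comparing them with the same features at follow-up. A minimal sketch of that general idea is given below. It is not the authors' implementation: the relative tolerance `tol`, the function names, the subject IDs and all numeric values are hypothetical and chosen purely for illustration.

```python
from typing import Dict, List, Optional

Features = Dict[str, float]


def matches(baseline: Features, followup: Features,
            names: List[str], tol: float) -> bool:
    """If-then rule: every selected feature of the follow-up scan must lie
    within a relative tolerance of the baseline value (tol is an assumption)."""
    return all(
        abs(followup[n] - baseline[n]) <= tol * abs(baseline[n])
        for n in names
    )


def identify(followup: Features, baselines: Dict[str, Features],
             names: List[str], tol: float = 0.05) -> Optional[str]:
    """Return the single baseline subject that satisfies all rules,
    or None when zero or several subjects match."""
    candidates = [sid for sid, b in baselines.items()
                  if matches(b, followup, names, tol)]
    return candidates[0] if len(candidates) == 1 else None


# Toy data (invented values, not taken from the study).
baselines = {
    "s01": {"CoGM": 480.0, "Acc": 1.10},
    "s02": {"CoGM": 510.0, "Acc": 1.12},
    "s03": {"CoGM": 481.0, "Acc": 1.30},
}
followup_s01 = {"CoGM": 470.0, "Acc": 1.08}

print(identify(followup_s01, baselines, ["CoGM", "Acc"]))  # unique match
print(identify(followup_s01, baselines, ["CoGM"]))         # ambiguous match
```

In this toy case one feature alone leaves two candidate subjects, while adding a second feature makes the match unique, which mirrors the reported trend that larger feature sets identify subjects more reliably.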
Summary of the identification results broken down for LDA and RF
| | Accuracy | Sensitivity | Specificity | F1 |
|---|---|---|---|---|
| LDA | 1.00 | 1.00 | 1.00 | 1.00 |
| RF (500) | .78 | .80 | .99 | .73 |
| RF (5000) | .76 | 1.00 | .76 | .71 |
Note: Acc, accuracy; F1, F1‐score; Sens, sensitivity; Spec, specificity. For RF, two variants have been applied: one using 500 and one using 5000 trees for training.
Abbreviations: LDA, linear discriminant analysis; RF, random forest.
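For readers unfamiliar with these four metrics in a multi-class setting (one class per subject), the following sketch computes them from true and predicted subject labels. It assumes one-vs-rest counting with macro-averaging over subjects; the source does not state which averaging scheme was used, so this is an illustration of the definitions only, and the labels are invented.

```python
from typing import Dict, List


def macro_metrics(y_true: List[str], y_pred: List[str]) -> Dict[str, float]:
    """Accuracy plus macro-averaged sensitivity, specificity and F1,
    computed one-vs-rest for every class (subject) in y_true."""
    labels = sorted(set(y_true))
    n = len(y_true)
    pairs = list(zip(y_true, y_pred))
    sens, spec, f1 = [], [], []
    for c in labels:
        tp = sum(t == c and p == c for t, p in pairs)
        fp = sum(t != c and p == c for t, p in pairs)
        fn = sum(t == c and p != c for t, p in pairs)
        tn = n - tp - fp - fn
        sens.append(tp / (tp + fn) if tp + fn else 0.0)
        spec.append(tn / (tn + fp) if tn + fp else 0.0)
        denom = 2 * tp + fp + fn
        f1.append(2 * tp / denom if denom else 0.0)
    k = len(labels)
    return {
        "accuracy": sum(t == p for t, p in pairs) / n,
        "sensitivity": sum(sens) / k,
        "specificity": sum(spec) / k,
        "f1": sum(f1) / k,
    }


# Toy example: four subjects, one of them misidentified.
y_true = ["s01", "s02", "s03", "s04"]
y_pred = ["s01", "s02", "s04", "s04"]
print(macro_metrics(y_true, y_pred))
```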
Results of the stepwise LDA and the RF with feature ranking
| LDA feature selection | LDA ACC | LDA SPC | LDA SEN | LDA F1 | RF feature ranked | RF ACC | RF SPC | RF SEN | RF F1 |
|---|---|---|---|---|---|---|---|---|---|
| Hip | .01 | .71 | .01 | .00 | CeWM | .08 | .92 | .08 | .05 |
| Tha | .08 | .92 | .08 | .04 | Amy | .19 | .97 | .19 | .15 |
| CA | .24 | .97 | .24 | .18 | CT | .32 | .98 | .32 | .26 |
| Amy | .50 | .99 | .50 | .43 | CA | .43 | .99 | .43 | .36 |
| Put | .81 | 1.00 | .81 | .77 | CBGM | .58 | .99 | .58 | .52 |
| CoGM | .84 | 1.00 | .84 | .80 | BST | .66 | 1.00 | .66 | .60 |
| BST | .92 | 1.00 | .92 | .90 | Pal | .72 | 1.00 | .72 | .66 |
| CBWM | .98 | 1.00 | .98 | .98 | Cau | .78 | 1.00 | .78 | .73 |
| Acc | .99 | 1.00 | .99 | .99 | Hip | .73 | 1.00 | .73 | .67 |
| Cau | 1.00 | 1.00 | 1.00 | 1.00 | CBWM | .74 | 1.00 | .74 | .68 |
| CeWM | 1.00 | 1.00 | 1.00 | 1.00 | Put | .79 | 1.00 | .79 | .74 |
| Pal | 1.00 | 1.00 | 1.00 | 1.00 | CoGM | .76 | 1.00 | .76 | .72 |
| CBGM | 1.00 | 1.00 | 1.00 | 1.00 | Acc | .79 | 1.00 | .79 | .74 |
| CT | 1.00 | 1.00 | 1.00 | 1.00 | Tha | .76 | 1.00 | .76 | .71 |
Note: The anatomical features are listed from top to bottom according to their importance for subject identification.
Abbreviations: LDA, linear discriminant analysis; RF, random forest.
500 trees are used for training.
Significant difference in McNemar test.
Number of possible combinations comprising 2–14 features (n of possible combinations) for the ‘rule‐based‐identification‐approach (RBIA)’
| Number of features | n of possible combinations | Number of possible identifications | Number of correctly identified combinations | n of wrongly identified combinations | Proportion of correctly identified subjects |
|---|---|---|---|---|---|
| 2 | 91 | 10,738 | 4297 | 6441 | .40 |
| 3 | 364 | 42,952 | 34,683 | 8269 | .81 |
| 4 | 1001 | 118,118 | 107,928 | 10,190 | .91 |
| 5 | 2002 | 236,236 | 224,937 | 11,299 | .95 |
| 6 | 3003 | 354,354 | 344,256 | 10,098 | .97 |
| 7 | 3432 | 404,976 | 397,848 | 7128 | .98 |
| 8 | 3003 | 354,354 | 350,406 | 3948 | .99 |
| 9 | 2002 | 236,236 | 234,550 | 1686 | .99 |
| 10 | 1001 | 118,118 | 117,583 | 535 | 1.00 |
| 11 | 364 | 42,952 | 42,834 | 118 | 1.00 |
| 12 | 91 | 10,738 | 10,722 | 16 | 1.00 |
| 13 | 14 | 1652 | 1651 | 1 | 1.00 |
| 14 | 1 | 118 | 118 | 0 | 1.00 |
Note: Shown are also the ‘number of possible identifications’ (n of possible feature combinations * 118), the ‘number of correctly identified combinations’, the ‘n of wrongly identified combinations’, and the ‘proportion of correctly identified subjects’.
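The counts in this table follow from simple combinatorics: the number of k-feature sets drawn from 14 features is the binomial coefficient C(14, k), and each set is tested on all 118 subjects. The short sketch below reproduces the first rows; the function name is illustrative, and the "correct" counts are copied from the table above.

```python
from math import comb

N_FEATURES = 14   # size of the feature pool implied by the table (C(14, k))
N_SUBJECTS = 118  # subjects in the dataset


def possible_identifications(k: int) -> int:
    """Number of k-feature combinations times the number of subjects."""
    return comb(N_FEATURES, k) * N_SUBJECTS


# Correctly identified combinations, copied from the table (rows 2 to 4).
correct = {2: 4297, 3: 34683, 4: 107928}

for k, n_correct in correct.items():
    total = possible_identifications(k)
    print(k, comb(N_FEATURES, k), total, round(n_correct / total, 2))
```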
Shown are the best and worst combinations with respect to the accuracy (ACC) of subject identification broken down for each feature‐set 2–9 in the context of the ‘rule‐based‐identification‐approach (RBIA)’
| Number of features | Rank | Best: combination of anatomical features | Best: ACC | Worst: combination of anatomical features | Worst: ACC |
|---|---|---|---|---|---|
| 2 | 1 | CoGM, Acc | 88% | CT, Tha | 32% |
| | 2 | Pal, Acc | 87% | CT, CoGM | 34% |
| 3 | 1 | CeWM, Cau, Acc | 98% | CT, CA, CoGM | 37% |
| | 2 | CoGM, Pal, Acc | 97% | CA, CBGM, BST | 53% |
| 4 (1) | 1 | CT, CA, CoGM, CeWM | 100% | CT, CA, CoGM, CBGM | 93% |
| | 2 | CT, CA, CoGM, Acc | 100% | CT, CA, CoGM, Cau | 94% |
| 5 (2) | 1 | CT, CA, CeWM, Cau, Acc | 100% | CT, CA, CoGM, CBWM, BST | 79% |
| | 2 | CT, CA, CBGM, Tha, Acc | 100% | CT, CA, CoGM, CBGM, BST | 81% |
| 6 (3) | 1 | CT, CA, CoGM, CeWM, Cau, Acc | 100% | CT, CA, CoGM, CBGM, CBWM, BST | 87% |
| | 2 | CT, CA, CoGM, CBGM, Tha, Acc | 100% | CT, CA, CoGM, CBGM, Cau, BST | 87% |
| 7 (4) | 1 | CT, CA, CoGM, CeWM, CBGM, Tha, Acc | 100% | CT, CA, CoGM, CBGM, Tha, Cau, BST | 90% |
| | 2 | CT, CA, CoGM, CeWM, CBGM, Put, Acc | 100% | CT, CA, CoGM, CBGM, Put, Cau, BST | 92% |
| 8 (5) | 1 | CT, CA, CoGM, CeWM, CBGM, CBWM, Tha, Acc | 100% | CT, CA, CoGM, CBGM, Tha, Put, Cau, BST | 93% |
| | 2 | CT, CA, CoGM, CeWM, CBGM, CBWM, Put, Acc | 100% | CT, CA, CoGM, CeWM, CBWM, Tha, Put, BST | 94% |
| 9 (6) | 1 | CT, CA, CoGM, CeWM, CBGM, CBWM, Tha, Put, Acc | 100% | CT, CA, CoGM, CeWM, CBGM, Tha, Put, Cau, BST | 95% |
| | 2 | CT, CA, CoGM, CeWM, CBGM, CBWM, Tha, Pal, Hip | 100% | CT, CA, CoGM, CeWM, CBWM, Tha, Put, Cau, BST | 96% |
Note: From feature sets with 10 features, the accuracy is perfect for all possible 10‐feature sets. (1) Number of four‐feature combinations with 100% accuracy: 5; (2) number of five‐feature combinations with 100% accuracy: 67; (3) number of six‐feature combinations with 100% accuracy: 284; (4) number of seven‐feature combinations with 100% accuracy: 5649; (5) number of eight‐feature combinations with 100% accuracy: 934 and (6) number of nine‐feature combinations with 100% accuracy: 898.
FIGURE 1. Distribution of accuracy separately for each group of feature sets for the ‘rule‐based‐identification approach’ (RBIA). On the abscissa, the different feature sets are shown (2F = two‐feature set, 9F = nine‐feature set). Shown are box plots with the maximum and minimum of accuracies. There is a 100% correct identification starting from the four‐feature set on.