Narmin Ghaffari Laleh, Daniel Truhn, Gregory Patrick Veldhuizen, Tianyu Han, Marko van Treeck, Roman D Buelow, Rupert Langer, Bastian Dislich, Peter Boor, Volkmar Schulz, Jakob Nikolas Kather.
Abstract
Artificial Intelligence (AI) can support diagnostic workflows in oncology by aiding diagnosis and providing biomarkers directly from routine pathology slides. However, AI applications are vulnerable to adversarial attacks. Hence, it is essential to quantify and mitigate this risk before widespread clinical use. Here, we show that convolutional neural networks (CNNs) are highly susceptible to white- and black-box adversarial attacks in clinically relevant weakly-supervised classification tasks. Adversarially robust training and dual batch normalization (DBN) are possible mitigation strategies but require precise knowledge of the type of attack used in the inference. We demonstrate that vision transformers (ViTs) perform equally well compared to CNNs at baseline, but are orders of magnitude more robust to white- and black-box attacks. At a mechanistic level, we show that this is associated with a more robust latent representation of clinically relevant categories in ViTs compared to CNNs. Our results are in line with previous theoretical studies and provide empirical evidence that ViTs are robust learners in computational pathology. This implies that large-scale rollout of AI models in computational pathology should rely on ViTs rather than CNN-based classifiers to provide inherent protection against perturbation of the input data, especially adversarial attacks.
Year: 2022 PMID: 36175413 PMCID: PMC9522657 DOI: 10.1038/s41467-022-33266-0
Source DB: PubMed Journal: Nat Commun ISSN: 2041-1723 Impact factor: 17.694
Fig. 1 Cancer subtyping with Deep Learning.
A Image classification with ResNet, B with a Vision Transformer (ViT). C Area under the receiver operating characteristic curve (AUROC) for subtyping of renal cell carcinoma (RCC) into clear cell (cc), chromophobe (ch), and papillary (pap). The box shows the median and quartiles of five repetitions (points) and the whiskers extend to the rest of the distribution (n = 249 patients). We used a two-sided t-test without adjustments for the performance comparison between the two models. D Representative highly scoring image tiles for RCC, as selected by ResNet and ViT. E AUROC for subtyping gastric cancer into diffuse and intestinal. The box shows the median and quartiles of five repetitions (points) and the whiskers extend to the rest of the distribution (n = 249 patients). We used a two-sided t-test without adjustments for the performance comparison between the two models. F Highly scoring image tiles for gastric cancer, as selected by ResNet and ViT.
Fig. 2 Adversarial attacks on computational pathology.
A Adversarial attacks add noise to the image and flip the classification of renal cell carcinoma (RCC) subtyping into clear cell (cc), chromophobe (ch), and papillary (pap). The model’s prediction confidence is shown on each image. B Experimental design for the baseline (normal) training, white-box, and black-box attacks and for adversarially robust training. C Different attack algorithms yield different noise patterns. We used the Fast Gradient Sign Method (FGSM), Projected Gradient Descent (PGD), Fast Adaptive Boundary (FAB), Square attack, AutoAttack (AA), and AdvDrop. D Increasing the attack strength ɛ increases the amount of noise that is added to the image. The average threshold for human perception is ɛ = 0.19 for ResNet.
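To make the role of the attack strength ɛ concrete, the following is a minimal sketch of a single FGSM step on a toy logistic-regression classifier. It is not the pipeline used in the study: the four-value "image", the weights, and the helper names `predict` and `fgsm` are hypothetical and chosen only so that the gradient can be written in closed form without a deep learning framework.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def predict(weights, bias, x):
    """Probability of class 1 under a toy logistic-regression model."""
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)

def fgsm(weights, bias, x, label, eps):
    """One FGSM step: x_adv = clip(x + eps * sign(dLoss/dx), 0, 1).

    For binary cross-entropy on a logistic model the input gradient
    is dLoss/dx_i = (p - label) * w_i.
    """
    p = predict(weights, bias, x)
    grad = [(p - label) * w for w in weights]
    sign = [(g > 0) - (g < 0) for g in grad]  # -1, 0 or +1 per input value
    return [min(1.0, max(0.0, xi + eps * s)) for xi, s in zip(x, sign)]

# Hypothetical 4-value "image" and toy weights, for illustration only.
weights = [2.0, -3.0, 1.5, -0.5]
bias = 0.0
x = [0.6, 0.4, 0.5, 0.5]
label = 1

for eps in (0.0, 0.05, 0.1, 0.2):
    x_adv = fgsm(weights, bias, x, label, eps)
    print(f"eps={eps:.2f}  p(class 1)={predict(weights, bias, x_adv):.3f}")
```

Every input value moves by at most ɛ, yet in this toy setting the predicted class flips once ɛ is large enough. PGD can be read as repeating such a step several times while projecting back into the ɛ-ball around the original image.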
Fig. 3 Vision transformers are more robust to adversarial attacks than convolutional neural networks.
A Micro-averaged AUROC for ResNet and ViT under PGD attack for RCC subtyping without (left) and with (right) adversarially robust training. The attack strength is shown as ɛ * 10E-3. Values are the mean AUROC of five experiments ± the standard deviation. B AUROC for ResNet and ViT for gastric cancer subtyping. The attack strength is shown as ɛ * 10E-3. Values are the mean AUROC of five experiments ± the standard deviation. C First two principal components of the latent space of ResNet and ViT before (original) and after the attack (perturbed) for RCC subtyping, for the 150 highest-scoring image tiles. ViT has better separation of the clusters before the attack and its latent space retains its structure better after the attack. D Latent space for the gastric cancer subtyping experiment.
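The "mean AUROC of five experiments ± the standard deviation" summary can be sketched as follows. This is a simplified illustration, not the evaluation code of the study: it computes a plain binary AUROC by pairwise comparison rather than the micro-averaged multi-class AUROC used in the figure, it assumes the sample standard deviation, and all scores and the five AUROC values are hypothetical.

```python
from statistics import mean, stdev

def auroc(scores, labels):
    """Binary AUROC: probability that a positive outscores a negative.

    Ties count as half a win (equivalent to the Mann-Whitney U statistic).
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def summarize(aurocs):
    """Mean and sample standard deviation over repeated experiments."""
    return mean(aurocs), stdev(aurocs)

# Hypothetical prediction scores for one experiment.
print(auroc([0.9, 0.2, 0.8, 0.1], [1, 1, 0, 0]))

# Hypothetical AUROCs from five repetitions of the same experiment.
runs = [0.95, 0.96, 0.94, 0.97, 0.95]
m, sd = summarize(runs)
print(f"AUROC = {m:.3f} ± {sd:.3f}")
```

Plotting such a summary against ɛ for each model gives a robustness curve of the kind described in panels A and B.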
ViTs are more robust to adversarial attacks than ResNets, as measured by the attack success rate (ASR) for the RCC classification task
Normal models

| ɛ | FGSM ResNet | FGSM BiT | FGSM ViT | PGD ResNet | PGD BiT | PGD ViT | Square ResNet | Square BiT | Square ViT | FAB ResNet | FAB BiT | FAB ViT | AutoAttack ResNet | AutoAttack BiT | AutoAttack ViT | ɛ | AdvDrop ResNet | AdvDrop BiT | AdvDrop ViT |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 13.33% | 16.44% | 14.44% | 16.22% | 5.78% | 2.22% | 12.67% | 19.78% | 13.56% | 19.78% | 20 | 68.67% | 63.11% | |||||||
| 32.67% | 35.56% | 34.67% | 33.78% | 13.56% | 7.56% | 29.78% | 43.56% | 33.11% | 44.44% | 40 | 67.56% | 68.44% | |||||||
| 46.00% | 46.00% | 50.22% | 45.56% | 24.00% | 15.78% | 44.44% | 56.44% | 48.67% | 56.89% | 60 | 55.78% | 70.00% | |||||||
| 64.22% | 62.00% | 64.00% | 63.33% | 58.00% | 55.78% | 58.00% | 55.11% | 54.89% | 58.00% | 55.78% | - | - | - | - | |||||
The computation time t is the time needed to apply the attack to each image. For pairwise comparisons between ResNet, BiT, and ViT for the same experimental condition, the one with the lower (better) ASR is printed in bold. In this experiment, 450 randomly selected tiles from AACHEN-RCC were used (same tiles for all experiments).
The best value in each category is typeset in bold font.
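The record does not spell out how the attack success rate (ASR) is computed. The sketch below assumes one common definition: among tiles that the model classifies correctly before the attack, the fraction whose predicted class changes after the attack. The function name `attack_success_rate` and the example labels are hypothetical.

```python
def attack_success_rate(true_labels, clean_preds, adv_preds):
    """Fraction of initially correct predictions that the attack flips.

    Assumed definition: tiles misclassified before the attack are excluded,
    because the attack cannot "succeed" on them in this sense.
    """
    eligible = [(c, a)
                for t, c, a in zip(true_labels, clean_preds, adv_preds)
                if c == t]
    if not eligible:
        raise ValueError("no correctly classified samples to attack")
    flipped = sum(1 for c, a in eligible if a != c)
    return flipped / len(eligible)

# Hypothetical RCC subtype labels for six tiles (cc, ch, pap).
true_labels = ["cc", "cc", "ch", "pap", "pap", "ch"]
clean_preds = ["cc", "cc", "ch", "pap", "cc", "ch"]   # 5 of 6 correct
adv_preds   = ["ch", "cc", "pap", "pap", "cc", "cc"]  # 3 of those 5 flipped

asr = attack_success_rate(true_labels, clean_preds, adv_preds)
print(f"ASR = {asr:.2%}")
```

Under this definition a lower ASR means a more robust model, which matches the "lower (better)" convention stated in the table notes.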