Abstract
This work evaluates the efficacy of adversarial robustness under transfer from CIFAR 100 to CIFAR 10. This allows us to identify transfer learning strategies under which adversarial defences are successfully retained, in addition to revealing potential vulnerabilities. We study the extent to which features crafted by fast gradient sign methods (FGSM) and their iterative alternative (PGD) can preserve their defence properties against black and white-box attacks under three different transfer learning strategies. We find that using PGD examples during training leads to more general robustness that is easier to transfer. Furthermore, under successful transfer, it achieves 5.2% more accuracy against white-box PGD attacks than the considered baselines. In this paper, we study the effects of using robust optimisation in the source and target networks. Our empirical evaluation sheds light on how well such mechanisms generalise while achieving comparable results to non-transferred defences.
Abstract (translated by Google)
URL
https://arxiv.org/abs/1905.02675