Adversarial Example Researchers Need to Expand What is Meant by ‘Robustness’


The hypothesis in Ilyas et al. is a special case of a more general principle that is well accepted in the
distributional robustness literature — models lack robustness to distribution shift because they latch onto
superficial correlations in the data. Naturally, the same principle also explains adversarial examples
because they arise from a worst-case analysis of distribution shift. To obtain a more complete understanding
of robustness, adversarial example researchers should connect their work to the more general problem of
distributional robustness rather than remaining solely fixated on small gradient perturbations.

Detailed Response

The main hypothesis in Ilyas et al. (2019) happens to be a special case of a more general principle that is
commonly accepted in the robustness-to-distributional-shift literature: a model's lack of robustness is
largely because the model latches onto superficial statistics in the data. In the image domain, these
statistics may be unused by, and unintuitive to, humans, yet they may be useful for generalization in i.i.d.
settings. Separate experiments that eschew gradient perturbations and study robustness beyond adversarial
perturbations show similar results. For example, a recent work demonstrates that models can generalize to
the test examples by learning from high-frequency information that is both naturally occurring and
inconspicuous. Concretely, models were trained and tested with an extreme high-pass filter applied to the
data. The resulting high-frequency features appear almost entirely gray to humans, yet models are able to
achieve 50% top-1 accuracy on ImageNet-1K solely from these natural features that are usually “invisible.”
These hard-to-notice features can be made conspicuous by normalizing the filtered image to have
unit-variance pixel statistics, as shown in the figure below.

Figure 1: Models can achieve high accuracy using information from the input that would be unrecognizable
to humans. Shown above are models trained and tested with aggressive high-pass and low-pass filtering
applied to the inputs. With aggressive low-pass filtering, the model is still above 30% accuracy on ImageNet
even though the images appear to be simple globs of color. In the case of high-pass (HP) filtering, models
can achieve above 50% accuracy using features in the input that are nearly invisible to humans. As shown on
the right-hand side, the high-pass filtered images needed to be normalized in order to properly visualize
the high-frequency features.
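
For intuition, here is a minimal sketch of the kind of preprocessing described above: an FFT-based
high-pass filter followed by normalization to unit-variance pixel statistics for visualization. The cutoff
radius, input size, and function names are placeholder assumptions, not the exact settings used in the
referenced experiments.

    import numpy as np

    def high_pass_filter(image, cutoff=0.9):
        """Keep only the highest spatial frequencies of a grayscale (H, W) image
        by zeroing every Fourier coefficient closer to the origin than
        cutoff * max_radius. The cutoff value is a placeholder, not the exact
        filter used in the experiments."""
        spectrum = np.fft.fftshift(np.fft.fft2(image))
        h, w = image.shape
        yy, xx = np.mgrid[:h, :w]
        radius = np.hypot(yy - h / 2, xx - w / 2)
        mask = radius >= cutoff * radius.max()   # True only far from the origin
        return np.real(np.fft.ifft2(np.fft.ifftshift(spectrum * mask)))

    def normalize_for_viewing(filtered):
        """Rescale the nearly invisible high-frequency residue to zero mean and
        unit variance so a human can actually see it, as in Figure 1."""
        return (filtered - filtered.mean()) / (filtered.std() + 1e-8)

    image = np.random.rand(224, 224)      # stand-in for an ImageNet input
    hp = high_pass_filter(image)          # what the model is trained and tested on
    visible = normalize_for_viewing(hp)   # what must be shown for a human to see it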

Given the plethora of useful correlations that exist in natural data, we should expect that our models will
learn to exploit them. However, models relying on superficial statistics can generalize poorly should these
same statistics become corrupted after deployment. To obtain a more complete understanding of model
robustness, we measured test error after perturbing every image in the test set by a Fourier basis vector,
as shown in Figure 2. The naturally trained model is robust to low-frequency perturbations but,
interestingly, lacks robustness in the mid to high frequencies. In contrast, adversarial training improves
robustness to mid- and high-frequency perturbations while sacrificing performance on low-frequency
perturbations. For instance, adversarial training degrades performance on the low-frequency fog corruption
from 85.7% to 55.3%. Adversarial training similarly degrades robustness to contrast and low-pass filtered
noise. By taking a broader view of robustness beyond tiny $\ell_p$ perturbations, it becomes clear that the
robustness gained from adversarial training comes at a cost to robustness under other, more natural forms
of distribution shift.

Figure 2: Model sensitivity to additive noise aligned with different Fourier basis vectors on CIFAR-10.
We fix the additive noise to have a constant $\ell_2$ norm and measure test error under perturbations
aligned with each Fourier basis vector.
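
As a concrete, hypothetical sketch of this evaluation, the code below constructs additive noise aligned
with a single 2-D Fourier basis vector and adds it to an image at a fixed $\ell_2$ norm. The norm value,
image size, and function names are assumptions for illustration, not the exact protocol behind Figure 2.

    import numpy as np

    def fourier_basis_image(h, w, i, j):
        """Real-valued image aligned with the (i, j) 2-D Fourier basis vector,
        rescaled to unit l2 norm."""
        spectrum = np.zeros((h, w), dtype=complex)
        spectrum[i, j] = 1.0
        basis = np.real(np.fft.ifft2(spectrum))
        return basis / np.linalg.norm(basis)

    def perturb_with_fourier_noise(image, i, j, eps=4.0):
        """Add Fourier-basis noise of fixed l2 norm `eps` (a placeholder value)
        with a random sign to a grayscale image in [0, 1]."""
        h, w = image.shape
        noise = eps * np.random.choice([-1.0, 1.0]) * fourier_basis_image(h, w, i, j)
        return np.clip(image + noise, 0.0, 1.0)

    # Sweeping (i, j) over all frequencies and recording test error at each one
    # produces a sensitivity map like the one summarized in Figure 2.
    image = np.random.rand(32, 32)               # stand-in for a CIFAR-10 image
    perturbed = perturb_with_fourier_noise(image, i=5, j=7)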

How, then, can the research community create models that robustly generalize in the real world, given that
adversarial training can harm robustness to distributional shift? To do so, the research community must take
a broader view of robustness and accept that $\ell_p$ robustness, while a well-defined and interesting
problem, is not by itself a sufficient proxy for robustness to the distribution shifts models encounter
after deployment.
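
One way to act on this broader view is to report accuracy across a suite of distribution shifts rather than
a single $\ell_p$ number. The sketch below is a hypothetical evaluation harness: `predict`, `test_set`, and
the stand-in corruption functions are placeholders, loosely inspired by common-corruption benchmarks such as
Hendrycks & Dietterich (2019) rather than their actual recipes.

    import numpy as np
    from scipy.ndimage import gaussian_filter

    # Illustrative stand-in corruptions (not the official benchmark recipes).
    CORRUPTIONS = {
        "low_contrast": lambda x: np.clip(0.3 * (x - 0.5) + 0.5, 0.0, 1.0),
        "gaussian_noise": lambda x: np.clip(x + 0.1 * np.random.randn(*x.shape), 0.0, 1.0),
        "low_pass": lambda x: gaussian_filter(x, sigma=2.0),
    }

    def robustness_report(predict, test_set):
        """Accuracy per corruption type for an assumed `predict(image) -> label`
        function and a list of (image, label) pairs. Reporting accuracy across
        many shifts, rather than a single l_p number, gives a fuller picture of
        robustness."""
        report = {}
        for name, corrupt in CORRUPTIONS.items():
            correct = sum(int(predict(corrupt(x)) == y) for x, y in test_set)
            report[name] = correct / len(test_set)
        return report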

Response Summary: The demonstration of models that learn from
high-frequency components of the data is interesting and nicely aligns with our
findings. Now, even though susceptibility to noise could indeed arise from
non-robust useful features, this kind of brittleness (akin to adversarial examples)
of ML models has so far been predominantly viewed as a consequence of model
“bugs” that will be eliminated by “better” models. Finally, we agree that our
models need to be robust to a much broader set of perturbations — expanding the
set of relevant perturbations will help identify even more non-robust features
and further distill the useful features we actually want our models to rely on.

Response: The fact that models can learn to classify correctly based
purely on the high-frequency component of the training set is neat! This nicely
complements one of our takeaways: models
will rely on useful features even if these features appear incomprehensible to humans.

Also, while non-robustness to noise can be an indicator of models using
non-robust useful features, this is not how the phenomenon was predominantly viewed.
More often than not, the brittleness of ML models to noise was instead regarded
as an innate shortcoming of the models, e.g., due to poor margins. (This view is
even more prevalent in the adversarial robustness community.) Thus, it was often
expected that progress towards “better”/“bug-free” models would lead to them
being more robust to noise and adversarial examples.

Finally, we fully agree that the set of $L_p$-bounded perturbations is only a small subset of the
perturbations we want our models to be robust to. Expanding the set of relevant perturbations will help
identify even more non-robust features and further distill the useful features we actually want our models
to rely on.


References

  1. Benchmarking Neural Network Robustness to Common Corruptions and Perturbations
    Hendrycks, D. and Dietterich, T.G., 2019. CoRR, Vol abs/1903.12261.
  2. Measuring the Tendency of CNNs to Learn Surface Statistical Regularities
    Jo, J. and Bengio, Y., 2017. CoRR, Vol abs/1711.11561.
  3. Nightmare at Test Time: Robust Learning by Feature Deletion
    Globerson, A. and Roweis, S., 2006. Proceedings of the 23rd International Conference on Machine Learning, pp. 353–360. ACM. DOI: 10.1145/1143844.1143889
  4. A Robust Minimax Approach to Classification
    Lanckriet, G.R., Ghaoui, L.E., Bhattacharyya, C. and Jordan, M.I., 2003. J. Mach. Learn. Res., Vol 3, pp. 555–582. JMLR.org. DOI: 10.1162/153244303321897726
  5. Generalisation in Humans and Deep Neural Networks
    Geirhos, R., Temme, C.R.M., Rauber, J., Schütt, H.H., Bethge, M. and Wichmann, F.A., 2018. CoRR, Vol abs/1808.08750.
  6. A Fourier Perspective on Model Robustness in Computer Vision
    Yin, D., Lopes, R.G., Shlens, J., Cubuk, E.D. and Gilmer, J., 2019. CoRR, Vol abs/1906.08988.
  7. Motivating the Rules of the Game for Adversarial Example Research
    Gilmer, J., Adams, R.P., Goodfellow, I.J., Andersen, D. and Dahl, G.E., 2018. CoRR, Vol abs/1807.06732.
  8. Adversarial Examples Are a Natural Consequence of Test Error in Noise
    Ford, N., Gilmer, J., Carlini, N. and Cubuk, E.D., 2019. CoRR, Vol abs/1901.10513.
  9. Robustness of Classifiers: From Adversarial to Random Noise
    Fawzi, A., Moosavi-Dezfooli, S. and Frossard, P., 2016. Advances in Neural Information Processing Systems 29, pp. 1632–1640. Curran Associates, Inc.
  10. Natural Adversarial Examples
    Hendrycks, D., Zhao, K., Basart, S., Steinhardt, J. and Song, D., 2019. ICML 2019 Workshop on Understanding and Improving Generalization in Deep Learning.
  11. MNIST-C: A Robustness Benchmark for Computer Vision
    Mu, N. and Gilmer, J., 2019. CoRR, Vol abs/1906.02337.
  12. NICO: A Dataset Towards Non-I.I.D. Image Classification
    He, Y., Shen, Z. and Cui, P., 2019. CoRR, Vol abs/1906.02899.
  13. Do ImageNet Classifiers Generalize to ImageNet?
    Recht, B., Roelofs, R., Schmidt, L. and Shankar, V., 2019. CoRR, Vol abs/1902.10811.
  14. The Elephant in the Room
    Rosenfeld, A., Zemel, R.S. and Tsotsos, J.K., 2018. CoRR, Vol abs/1808.03305.
  15. Using Videos to Evaluate Image Model Robustness
    Gu, K., Yang, B., Ngiam, J., Le, Q.V. and Shlens, J., 2019. CoRR, Vol abs/1904.10076.



Reuse

Diagrams and text are licensed under Creative Commons Attribution CC-BY 4.0 with the source available on GitHub, unless noted otherwise. The figures that have been reused from other sources don’t fall under this license and can be recognized by a note in their caption: “Figure from …”.

Citation

For attribution in academic contexts, please cite this work as

Gilmer & Hendrycks, "A Discussion of 'Adversarial Examples Are Not Bugs, They Are Features': Adversarial Example Researchers Need to Expand What is Meant by 'Robustness'", Distill, 2019.

BibTeX citation

@article{gilmer2019a,
  author = {Gilmer, Justin and Hendrycks, Dan},
  title = {A Discussion of 'Adversarial Examples Are Not Bugs, They Are Features': Adversarial Example Researchers Need to Expand What is Meant by 'Robustness'},
  journal = {Distill},
  year = {2019},
  note = {https://distill.pub/2019/advex-bugs-discussion/response-1},
  doi = {10.23915/distill.00019.1}
}


