Score-CAM: Score-weighted visual explanations for convolutional neural networks H Wang, Z Wang, M Du, F Yang, Z Zhang, S Ding, P Mardziel, X Hu Proceedings of the IEEE/CVF conference on computer vision and pattern …, 2020 | 1109 | 2020 |
Universal and transferable adversarial attacks on aligned language models A Zou, Z Wang, N Carlini, M Nasr, JZ Kolter, M Fredrikson arXiv preprint arXiv:2307.15043, 2023 | 693 | 2023 |
Representation engineering: A top-down approach to AI transparency A Zou, L Phan, S Chen, J Campbell, P Guo, R Ren, A Pan, X Yin, ... arXiv preprint arXiv:2310.01405, 2023 | 165 | 2023 |
Globally-Robust Neural Networks K Leino, Z Wang, M Fredrikson Proceedings of ICML 2021, 2021 | 147 | 2021 |
Towards frequency-based explanation for robust CNN Z Wang, Y Yang, A Shrivastava, V Rawal, Z Ding arXiv preprint arXiv:2005.03141, 2020 | 51 | 2020 |
Smoothed Geometry for Robust Attribution Z Wang, H Wang, S Ramkumar, M Fredrikson, P Mardziel, A Datta Proceedings of NeurIPS 2020, 2020 | 49 | 2020 |
Consistent counterfactuals for deep models E Black, Z Wang, M Fredrikson, A Datta arXiv preprint arXiv:2110.03109, 2021 | 46 | 2021 |
Robust models are more interpretable because attributions look normal Z Wang, M Fredrikson, A Datta arXiv preprint arXiv:2103.11257, 2021 | 19 | 2021 |
Interpreting interpretations: Organizing attribution methods by criteria Z Wang, P Mardziel, A Datta, M Fredrikson Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern …, 2020 | 19 | 2020 |
Influence Patterns for Explaining Information Flow in BERT K Lu, Z Wang, P Mardziel, A Datta arXiv preprint arXiv:2011.00740, 2020 | 13 | 2020 |
Scaling in depth: Unlocking robustness certification on ImageNet K Hu, A Zou, Z Wang, K Leino, M Fredrikson arXiv preprint arXiv:2301.12549, 2023 | 11 | 2023 |
Machine learning explainability and robustness: connected at the hip A Datta, M Fredrikson, K Leino, K Lu, S Sen, Z Wang Proceedings of the 27th ACM SIGKDD Conference on Knowledge Discovery & Data …, 2021 | 11 | 2021 |
A recipe for improved certifiable robustness: Capacity and data K Hu, K Leino, Z Wang, M Fredrikson arXiv preprint arXiv:2310.02513, 2023 | 8 | 2023 |
Improving robust generalization by direct PAC-Bayesian bound minimization Z Wang, N Ding, T Levinboim, X Chen, R Soricut Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern …, 2023 | 7 | 2023 |
Unlocking deterministic robustness certification on ImageNet K Hu, A Zou, Z Wang, K Leino, M Fredrikson Advances in Neural Information Processing Systems 36, 2024 | 5 | 2024 |
Learning modulo theories M Fredrikson, K Lu, S Vijayakumar, S Jha, V Ganesh, Z Wang arXiv preprint arXiv:2301.11435, 2023 | 5 | 2023 |
Reconstructing Actions To Explain Deep Reinforcement Learning X Chen, Z Wang, Y Fan, B Jin, P Mardziel, C Joe-Wong, A Datta arXiv preprint arXiv:2009.08507, 2020 | 5* | 2020 |
Grounding neural inference with satisfiability modulo theories Z Wang, S Vijayakumar, K Lu, V Ganesh, S Jha, M Fredrikson Advances in Neural Information Processing Systems 36, 2024 | 4 | 2024 |
Transfer attacks and defenses for large language models on coding tasks C Zhang, Z Wang, R Mangal, M Fredrikson, L Jia, C Pasareanu arXiv preprint arXiv:2311.13445, 2023 | 3 | 2023 |
Is Certifying Robustness Still Worthwhile? R Mangal, K Leino, Z Wang, K Hu, W Yu, C Pasareanu, A Datta, ... arXiv preprint arXiv:2310.09361, 2023 | 2 | 2023 |