Remarkable progress in text-to-image diffusion models has raised a major concern about their potential to generate images of inappropriate or trademarked concepts. Concept erasing has been investigated with the goals of deleting target concepts from diffusion models while preserving other concepts with minimal distortion. To achieve these goals, recent concept erasing methods usually fine-tune the cross-attention layers of diffusion models. In this work, we first show that merely updating the cross-attention layers in diffusion models, which is mathematically equivalent to adding \emph{linear} modules to weights, may not be able to preserve diverse remaining concepts. Then, we propose a novel framework, dubbed Concept Pinpoint Eraser (CPE), which adds \emph{nonlinear} Residual Attention Gates (ResAGs) that selectively erase (or cut) target concepts while safeguarding remaining concepts from broad distributions by employing an attention anchoring loss to prevent forgetting. Moreover, we adversarially train CPE with ResAGs and learnable text embeddings in an iterative manner to maximize erasing performance and enhance robustness against adversarial attacks. Extensive experiments on the erasure of celebrities, artistic styles, and explicit content demonstrate that the proposed CPE outperforms prior art by preserving diverse remaining concepts while deleting the target concepts with robustness against attack prompts.
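The linear-versus-nonlinear distinction above can be written out as a short sketch. The symbols here are assumptions for illustration only: $W$ is a cross-attention projection weight, $\Delta W$ its fine-tuning update, $e$ a text embedding, and $g$ and $V$ a hypothetical gate and residual projection. The gated form is not the paper's exact ResAG definition.

```latex
% Fine-tuning a cross-attention projection W into W + \Delta W
% adds a term that is linear in every text embedding e:
(W + \Delta W)\, e = W e + \Delta W\, e

% Illustrative gated residual (assumed form, for intuition only):
W e + g(e)\, V e, \qquad g(e) \in [0, 1]
```

In the first line the change $\Delta W\, e$ vanishes only for embeddings in the null space of $\Delta W$, so a single linear update has limited freedom to change target concepts while leaving a broad set of remaining concepts untouched. In the second line a gate that is close to 1 for target embeddings and close to 0 otherwise can, in principle, suppress the change for remaining concepts.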
(a) Comparison of fine-tuning approaches for concept erasing. Previous methods can affect both target and remaining concepts because they merely fine-tune cross-attention (CA) layers. In contrast, our method, CPE, adaptively transmits the change for target concepts in order to erase them while suppressing it for remaining concepts, by using the proposed ResAGs. (b) Qualitative results on erasing the “Claude Monet” artistic style, compared with a baseline.
(a) Architecture of the ResAG module in CA layers for selectively erasing a target concept while preserving remaining concepts. (b) To erase multiple targets during inference, we merge multiple ResAGs by adding, for each token, only the ResAG of the target with the highest gate value.
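The merging rule in (b) can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `merge_resags` is hypothetical, each ResAG is assumed to expose one scalar gate value per token, and each ResAG output is assumed to be a residual that is added to the unmodified CA projection output.

```python
def merge_resags(base, resag_outputs, gates):
    """Merge several per-target ResAG residuals at inference time.

    base[t]             : unmodified CA projection output for token t (list of floats)
    resag_outputs[k][t] : residual produced by the ResAG of target k for token t
    gates[k][t]         : scalar gate value of target k for token t (assumed scalar)

    For each token, only the residual of the target with the highest
    gate value is added to the base output.
    """
    num_targets = len(gates)
    merged, chosen = [], []
    for t, base_vec in enumerate(base):
        # Pick the target whose gate is largest for this token.
        best = max(range(num_targets), key=lambda k: gates[k][t])
        residual = resag_outputs[best][t]
        merged.append([b + r for b, r in zip(base_vec, residual)])
        chosen.append(best)
    return merged, chosen


# Toy example: 2 targets, 2 tokens, 2-dimensional features.
base = [[1.0, 1.0], [2.0, 2.0]]
resag_outputs = [
    [[0.5, 0.0], [0.0, 0.0]],   # residuals from the ResAG of target 0
    [[0.0, 0.1], [0.0, -1.0]],  # residuals from the ResAG of target 1
]
gates = [
    [0.9, 0.1],  # target 0 gates per token
    [0.2, 0.8],  # target 1 gates per token
]
merged, chosen = merge_resags(base, resag_outputs, gates)
print(chosen)  # [0, 1]
print(merged)  # [[1.5, 1.0], [2.0, 1.0]]
```

Selecting a single ResAG per token, rather than summing all of them, keeps residuals from different erased targets from accumulating on the same token.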
Qualitative results of our CPE and baselines on multiple-concept erasure. We erased 50 celebrities at once. The results show that CPE successfully preserves both similar and dissimilar concepts.
Qualitative results on artistic style erasure. We erased 100 artistic styles at once. The results show that CPE successfully erases the target artistic styles while preserving diverse remaining concepts.
Number of explicit contents detected by the NudeNet detector on I2P, and preservation performance on MS-COCO 30K measured by CS and FID.
Qualitative results of CPE and baselines on robustness to adversarial attacks by UnlearnDiff (Zhang et al., 2023). The results show that CPE successfully defends against adversarial attack prompts.
@InProceedings{lee2024cpe,
author = {Lee, Byung Hyun and Lim, Sungjin and Lee, Seunggyu and Kang, Dong Un and Chun, Se Young},
title = {Concept Pinpoint Eraser for Text-to-image Diffusion Models via Residual Attention Gate},
booktitle = {ICLR},
year = {2025},
}