Universities logos

Memories of Forgotten Concepts

*Equal contribution, 1Tel Aviv University, 2Reichman University

Many studies aim to erase concepts from diffusion models. While these models may no longer generate images tied to the erased concept when prompted, we ask:

Is the concept truly erased? Could the model still reproduce it through other means?

Our work vs. prior art

Instead of analyzing a model that erased some concept by generating many images and analyzing them, we propose a method that analyzes it using latents that generate the erased concept.

Abstract

Diffusion models dominate the space of text-to-image generation, yet they may produce undesirable outputs, including explicit content or private data. To mitigate this, concept ablation techniques have been explored to limit the generation of certain concepts.

In this paper, we reveal that the erased concept information persists in the model and that erased concept images can be generated using the right latent. Utilizing inversion methods, we show that there exist latent seeds capable of generating high quality images of erased concepts.

Moreover, we show that these latents have likelihoods that overlap with those of images outside the erased concept. We extend this to demonstrate that for every image from the erased concept set, we can generate many seeds that generate the erased concept. Given the vast space of latents capable of generating ablated concept images, our results suggest that fully erasing concept information may be intractable, highlighting possible vulnerabilities in current concept ablation techniques.

Analysis

Starting with an image from the erased concept, we use image inversion to find a latent seed that generates the image. We explore how likely this latent is, along with the reconstruction error (PSNR) of the image using the latent and the diffusion model.

Analysis using inversion

We can also retrieve multiple distant latents that can generate a given image. This is done by inverting the diffusion model's decoder with different initializations (using arbitrary non erased images \( \mathcal{I}_{s_i} \)) before we perform image inversion.


Analysis using the 'Sequential Inversion Block'
The distant latents we find can be interpreted geometrically as points on a sphere (in the latent space) with a similar distance to the latent of the erased concept image.
All these latents produce images (using the Sequential Inversion Block as shown above) with low reconstruction error compared to \( \mathcal{I}_q \).
Geometric interpretation

BibTeX

@misc{rusanovsky2024memoriesforgottenconcepts,
     title={Memories of Forgotten Concepts},
     author={Matan Rusanovsky and Shimon Malnick and Amir Jevnisek and Ohad Fried and Shai Avidan},
     year={2024},
     eprint={2412.00782},
     archivePrefix={arXiv},
     primaryClass={cs.CV},
     url={https://arxiv.org/abs/2412.00782},
  }