Important Concepts

1. Introduction

Image diffusion models such as DALL-E 3, Imagen, and Stable Diffusion have attracted attention for their ability to generate high-quality synthetic images. However, recent results show that diffusion models are significantly less private, and more vulnerable to privacy breaches, than prior generative models such as GANs: state-of-the-art diffusion models memorize and can regenerate individual training examples.

1.1 Image Denoiser

Diffusion models are trained as image denoisers, following the denoising diffusion probabilistic model (DDPM) framework. Given a clean image $x$, training produces a noised image $x'\leftarrow \sqrt{a_t}x+\sqrt{1-a_t}\epsilon$, where $t\in[0,T]$ is the time-step, $\epsilon\sim \mathcal{N}(0,I)$ is a Gaussian noise vector, and $a_t \in [0,1]$ is a decaying parameter with $a_0=1$ and $a_T=0$. That is, initially $x'=x$, and noise is added to $x$ at each time-step as $t \rightarrow T$, eventually giving $x'=\epsilon$.
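As a concrete sketch, the forward noising step can be written in a few lines of PyTorch. The linear schedule for $a_t$ below is an illustrative assumption; real models use carefully tuned schedules.

```python
import torch

T = 1000  # number of diffusion time-steps (illustrative choice)
a = torch.linspace(1.0, 0.0, T + 1)  # decaying parameter a_t with a_0 = 1, a_T = 0

def noise_image(x, t):
    """Forward noising: x' = sqrt(a_t) * x + sqrt(1 - a_t) * eps."""
    eps = torch.randn_like(x)  # eps ~ N(0, I)
    x_noised = torch.sqrt(a[t]) * x + torch.sqrt(1.0 - a[t]) * eps
    return x_noised, eps  # return eps so it can serve as the training target
```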

A diffusion model $f_\theta$ removes the noise $\epsilon$ to recover the original image $x$ by predicting the noise that was added. This is done by stochastically minimizing the objective

$$ \frac{1}{N}\sum_{i=1}^{N} \mathbb{E}_{t,\epsilon}\, \mathscr{L}(x_i,t,\epsilon,f_\theta) $$

where $\mathscr{L}(x_i,t,\epsilon,f_\theta)= ||\epsilon - f_\theta(\sqrt{a_t}x_i+\sqrt{1-a_t}\epsilon,t)||^2_2$
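A minimal sketch of one stochastic training step for this objective, assuming `model(x_noised, t)` predicts the added noise and `x_batch` is a batch of clean images; the model, optimizer, and schedule here are placeholders:

```python
import torch
import torch.nn.functional as F

def training_step(model, x_batch, optimizer, a=None, T=1000):
    if a is None:
        a = torch.linspace(1.0, 0.0, T + 1)  # placeholder decay schedule for a_t
    # Sample a random time-step and noise vector per image (the E_{t, eps} above).
    t = torch.randint(1, T + 1, (x_batch.shape[0],))
    eps = torch.randn_like(x_batch)
    # Forward-noise the batch: x' = sqrt(a_t) x + sqrt(1 - a_t) eps.
    sqrt_a = torch.sqrt(a[t]).view(-1, 1, 1, 1)
    sqrt_one_minus_a = torch.sqrt(1.0 - a[t]).view(-1, 1, 1, 1)
    x_noised = sqrt_a * x_batch + sqrt_one_minus_a * eps
    # L(x, t, eps, f_theta) = ||eps - f_theta(x', t)||_2^2, averaged over the batch.
    loss = F.mse_loss(model(x_noised, t), eps)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```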

1.2 Image Generator

To generate an image, first sample a random vector $z_T\sim\mathcal{N}(0,I)$ and then apply the diffusion model $f_\theta$ to iteratively remove the noise from this random “image”, obtaining the final image $z_0$ from $z_T$ by the rule

$$ z_{t-1} = f_\theta(z_t,t)+\sigma_t\,\mathcal{N}(0,I) $$

for a noise schedule $\sigma_t$ (dependent on $a_t$) with $\sigma_1=0$, so the final denoising step is deterministic.
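The sampler can be sketched as a simple loop, again with placeholder components; the schedule below only illustrates the shape of the computation, with $\sigma_1 = 0$ making the final step deterministic:

```python
import torch

@torch.no_grad()
def generate(model, shape, T=1000):
    sigma = torch.linspace(0.0, 1.0, T + 1)  # placeholder noise schedule
    sigma[1] = 0.0                           # sigma_1 = 0: final step adds no noise
    z = torch.randn(shape)                   # z_T ~ N(0, I)
    for t in range(T, 0, -1):
        # z_{t-1} = f_theta(z_t, t) + sigma_t * fresh Gaussian noise
        z = model(z, t) + sigma[t] * torch.randn(shape)
    return z                                 # z_0 is the generated image
```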

Some diffusion models are conditioned to generate a particular type of image, for example:

  1. Class-conditional: the model takes a class label as input alongside the noised image and generates an image of that class (e.g. “dog” or “cat”)
  2. Text-conditional: the model instead takes the embedding of a text prompt, produced by a pre-trained language encoder, and generates an image that adheres to the prompt

Unconditional diffusion models are trained on a dataset $D=\{x_1,x_2,...,x_n\}$ and output a generated image $x_{gen} \leftarrow Gen(r)$ using fresh random noise $r$ as input. Conditional models are trained on annotated images, $D=\{(x_1,c_1),...,(x_n,c_n)\}$; when queried with a prompt $p$, the system outputs $x_{gen}\leftarrow Gen(p;r)$.
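The two query interfaces can be sketched as follows; the sampler is the loop from the previous sketch with an optional conditioning input, and `text_encoder` stands in for the pre-trained language encoder:

```python
import torch

@torch.no_grad()
def gen(model, r, T=1000, cond=None):
    """x_gen <- Gen(r) unconditionally, or Gen(p; r) with cond = text_encoder(p)."""
    sigma = torch.linspace(0.0, 1.0, T + 1)  # placeholder noise schedule
    sigma[1] = 0.0                           # final step is deterministic
    z = r  # the caller supplies the fresh random noise r as z_T
    for t in range(T, 0, -1):
        out = model(z, t) if cond is None else model(z, t, cond)
        z = out + sigma[t] * torch.randn_like(z)
    return z

# Unconditional:     x_gen = gen(model, r)
# Text-conditional:  x_gen = gen(model, r, cond=text_encoder(p))
```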

2. Extracting Training Data

2.1 Threat Model

Consider an adversary $\mathscr{A}$ that interacts with a diffusion model $Gen$ (backed by a neural network $f_\theta$) to extract images from the model’s training set $D$. The three adversarial goals are:

  1. Data extraction: Recover an image $\hat{x}$ that is nearly identical to some $x \in D$
  2. Data reconstruction: Given partial knowledge of a training image $x \in D$, recover its unknown features and hence the full image
  3. Membership inference: Given an image $x$, infer whether $x$ is in the training set $D$ (a loss-thresholding sketch follows this list)
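As a hedged illustration of the third goal, the simplest membership-inference attack thresholds the model’s denoising loss on a candidate image, on the intuition that training members are denoised more accurately. The averaging over $(t, \epsilon)$ draws and the threshold `tau` below are assumptions for illustration, not the exact attack from the literature:

```python
import torch

@torch.no_grad()
def membership_score(model, x, T=1000, n_samples=16):
    """Average denoising loss of x over random (t, eps) draws; lower = more member-like."""
    a = torch.linspace(1.0, 0.0, T + 1)  # placeholder decay schedule for a_t
    losses = []
    for _ in range(n_samples):
        t = int(torch.randint(1, T + 1, (1,)))
        eps = torch.randn_like(x)
        x_noised = torch.sqrt(a[t]) * x + torch.sqrt(1.0 - a[t]) * eps
        losses.append(torch.mean((eps - model(x_noised, t)) ** 2).item())
    return sum(losses) / len(losses)

def infer_membership(model, x, tau=0.1):
    # tau is a hypothetical threshold; in practice it would be calibrated on held-out data.
    return membership_score(model, x) < tau
```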

2.2 Defining Image Memorization