Image diffusion models such as DALL-E 3, Imagen, and Stable Diffusion have attracted attention for their ability to generate high-quality synthetic images. However, results show that diffusion models are much less private than prior generative models such as GANs and are more vulnerable to privacy breaches. Specifically, state-of-the-art diffusion models memorize individual training examples and can regenerate them.
Diffusion models are trained as image denoisers, following the denoising diffusion probabilistic model (DDPM) framework. Given a clean image $x$, training produces a noised image $x'\leftarrow \sqrt{a_t}x+\sqrt{1-a_t}\epsilon$, with time-step $t\in[0,T]$, Gaussian noise vector $\epsilon\sim \mathcal{N}(0,I)$, and a decaying parameter $a_t \in [0,1]$ where $a_0=1$ and $a_T=0$. That is, we initially have $x'=x$, and more noise is mixed into $x$ as $t \rightarrow T$, eventually giving $x'= \epsilon$.
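The noising step can be illustrated with a minimal NumPy sketch. The linear schedule in `make_alpha_schedule` is a hypothetical choice made only so that $a_0=1$ and $a_T=0$ hold; real models use other schedules, and the function names here are illustrative rather than taken from any library.

```python
import numpy as np

def make_alpha_schedule(T):
    """Hypothetical linear schedule with a_0 = 1 and a_T = 0 (an assumption)."""
    return np.linspace(1.0, 0.0, T + 1)

def add_noise(x, t, a, rng):
    """Return (x', eps) with x' = sqrt(a_t) * x + sqrt(1 - a_t) * eps."""
    eps = rng.standard_normal(x.shape)  # eps ~ N(0, I)
    x_noised = np.sqrt(a[t]) * x + np.sqrt(1.0 - a[t]) * eps
    return x_noised, eps

rng = np.random.default_rng(0)
T = 1000
a = make_alpha_schedule(T)
x = rng.uniform(-1.0, 1.0, size=(8, 8))  # stand-in for a clean image

x0, _ = add_noise(x, 0, a, rng)      # t = 0: the image is unchanged
xT, epsT = add_noise(x, T, a, rng)   # t = T: the image is pure noise
```

At the two endpoints the sketch reproduces the limiting cases described above: `x0` equals `x`, and `xT` equals the sampled noise.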
A diffusion model $f_\theta$ removes the noise $\epsilon$ to recover the original image $x$ by predicting the noise that was added. This is done by stochastically minimizing the objective
$$ \frac{1}{N}\sum_i \mathbb{E}_{t,\epsilon}\, \mathscr{L}(x_i,t,\epsilon,f_\theta) $$
where $\mathscr{L}(x_i,t,\epsilon,f_\theta)= \lVert\epsilon - f_\theta(\sqrt{a_t}x_i+\sqrt{1-a_t}\epsilon,t)\rVert^2_2$.
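As a rough illustration of this objective, the sketch below computes the per-sample loss and a Monte Carlo estimate of the averaged expectation. It only evaluates the objective and does not perform the stochastic minimization itself. Several pieces are assumptions: $t$ is drawn uniformly from $\{0,\dots,T\}$, the schedule is linear, and `zero_predictor` is a toy stand-in for the network $f_\theta$.

```python
import numpy as np

def denoising_loss(f_theta, x, t, eps, a):
    """Squared L2 error between the true noise and the predicted noise."""
    x_noised = np.sqrt(a[t]) * x + np.sqrt(1.0 - a[t]) * eps
    return np.sum((eps - f_theta(x_noised, t)) ** 2)

def estimate_objective(f_theta, images, a, rng, samples_per_image=4):
    """Monte Carlo estimate of (1/N) * sum_i E_{t,eps} L(x_i, t, eps, f_theta)."""
    T = len(a) - 1
    total = 0.0
    for x in images:
        per_image = 0.0
        for _ in range(samples_per_image):
            t = rng.integers(0, T + 1)          # assumed: uniform t in {0, ..., T}
            eps = rng.standard_normal(x.shape)  # eps ~ N(0, I)
            per_image += denoising_loss(f_theta, x, t, eps, a)
        total += per_image / samples_per_image
    return total / len(images)

def zero_predictor(x_noised, t):
    """Toy stand-in for f_theta: always predicts zero noise."""
    return np.zeros_like(x_noised)

rng = np.random.default_rng(0)
a = np.linspace(1.0, 0.0, 101)  # hypothetical schedule, a_0 = 1, a_T = 0
images = [rng.uniform(-1.0, 1.0, size=(4, 4)) for _ in range(3)]
loss = estimate_objective(zero_predictor, images, a, rng)
```

In practice $f_\theta$ would be a neural network and the estimate would be differentiated with respect to $\theta$ on mini-batches; the toy predictor only shows how the terms of the loss fit together.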
To generate an image, first sample a random vector $z_T\sim\mathscr{N}(0,I)$ and then apply the diffusion model $f_\theta$ to remove the noise from this random “image”. Noise removal is done iteratively, obtaining the final image $z_0$ from $z_T$ by the rule
$$ z_{t-1} = f_\theta(z_t,t)+\sigma_t\mathscr{N}(0,I) $$
for a noise schedule $\sigma_t$ (dependent on $a_t$) with $\sigma_1=0$, so that the final step, which produces $z_0$, adds no noise.
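The iterative rule can be sketched as a short loop. This is a minimal illustration under stated assumptions: `toy_step` is a hypothetical stand-in for $f_\theta$ (it simply pulls the sample towards an all-ones "image"), and the constant `sigma` array is an invented schedule chosen only to satisfy $\sigma_1=0$.

```python
import numpy as np

def sample(f_theta, sigma, shape, T, rng):
    """Iterate z_{t-1} = f_theta(z_t, t) + sigma_t * N(0, I) for t = T, ..., 1."""
    z = rng.standard_normal(shape)  # z_T ~ N(0, I)
    for t in range(T, 0, -1):
        noise = rng.standard_normal(shape)
        z = f_theta(z, t) + sigma[t] * noise
    return z  # z_0

def toy_step(z, t):
    """Hypothetical stand-in for f_theta: move halfway towards an all-ones image."""
    return 0.5 * z + 0.5 * np.ones_like(z)

T = 50
sigma = np.full(T + 1, 0.05)  # hypothetical schedule; index 0 is unused
sigma[1] = 0.0                # the last step (t = 1) adds no noise
rng = np.random.default_rng(0)
z0 = sample(toy_step, sigma, (4, 4), T, rng)
```

With a trained network in place of `toy_step`, the same loop would turn the random vector $z_T$ into a generated image $z_0$.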
Some diffusion models are unconditional, while others are conditioned to generate a particular type of image:
Unconditional diffusion models are trained on a dataset $D=\{x_1,x_2,\ldots,x_n\}$. Such a model outputs a generated image $x_{gen} \leftarrow Gen(r)$ using fresh random noise $r$ as input. Conditional models are trained on annotated images $D=\{(x_1,c_1),\ldots,(x_n,c_n)\}$ and, when queried with a prompt $p$, output $x_{gen}\leftarrow Gen(p;r)$.
Consider an adversary $\mathscr{A}$ that interacts with a diffusion model $Gen$ (backed by a neural network $f_\theta$) to extract images from the model’s training set $D$. The three adversarial goals include: