Generalization in diffusion models arises from geometry-adaptive harmonic representations (Kadkhodaie et al., 2023)

  1. Model variance vanishes as $N$ increases → the density implicitly represented by the DNN becomes independent of the particular training subset (see the denoiser-agreement sketch below)
    1. Increasing the training set size $N$ substantially improves test-set performance while worsening train-set performance (i.e., the model memorizes the training set less)
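
A minimal sketch of how one could probe this claim, assuming two denoisers already trained on disjoint subsets of equal size $N$; the denoiser call signature, the noise level `sigma`, and the cosine-similarity metric are my assumptions, not the paper's code:

```python
import torch
import torch.nn.functional as F

def output_agreement(denoiser_a, denoiser_b, clean_images, sigma=0.5):
    """Mean cosine similarity between two denoisers' outputs on the same noisy inputs."""
    noisy = clean_images + sigma * torch.randn_like(clean_images)  # identical noise for both models
    with torch.no_grad():
        out_a = denoiser_a(noisy)  # hypothetical interface: noisy image -> denoised estimate
        out_b = denoiser_b(noisy)
    cos = F.cosine_similarity(out_a.flatten(1), out_b.flatten(1), dim=1)
    return cos.mean().item()

# Assumed usage: denoiser_a trained on subset S1, denoiser_b on a disjoint S2 with |S1| = |S2| = N.
# If model variance vanishes, output_agreement(...) should approach 1.0 as N increases.
```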

Questions / Thoughts:

  1. The two subsets may still be statistically similar despite being “non-overlapping” → both models end up learning the same distribution
  2. Model trained on 5,000 vs. 10,000 samples: if the influential samples support the same mode, removing 5,000 samples can still produce (near-)identical generated images
  3. The model is effectively supported by a small number of (influential) samples
    1. D-TRAK: why does removing only ~32 samples work in the counterfactual experiment? (see the sketch after this list)
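
To make question 3.1 concrete, here is a hedged sketch of a generic counterfactual check in the style of data-attribution evaluations; this is not D-TRAK's actual code, and `train_model`, `generate`, `influence_scores`, and `k=32` are placeholders I am assuming. The idea: remove the top-k most influential training samples for one generation, retrain, regenerate with the same seed, and measure how much the image changes.

```python
import numpy as np

def counterfactual_change(train_set, influence_scores, train_model, generate, seed, k=32):
    """Distance between images generated before/after removing the top-k influential samples."""
    baseline_model = train_model(train_set)
    baseline_img = generate(baseline_model, seed)

    top_k = set(np.argsort(influence_scores)[-k:])       # indices of the most influential samples
    reduced_set = [x for i, x in enumerate(train_set) if i not in top_k]

    ablated_model = train_model(reduced_set)              # retrain without those samples
    ablated_img = generate(ablated_model, seed)           # same seed → same noise trajectory

    return float(np.linalg.norm(np.asarray(baseline_img) - np.asarray(ablated_img)))
```

A large change after dropping only ~32 samples would suggest the generation really is supported by a handful of training points, which is what question 3 above is getting at.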

Follow-up questions: