Locating and Editing Knowledge

Goals: locate and edit knowledge in a language model

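  1. Locating: Causal Tracing, using (i) a clean run, (ii) a corrupted run in which the subject-token embeddings are noised, and (iii) a corrupted run in which the hidden state of a target token at a given layer is restored to its clean-run value, to see whether the correct prediction recovers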
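  2. Editing: modify the model weights at the location identified in step 1 (e.g., ROME/MEMIT update MLP weights at the layers where Causal Tracing shows a strong effect)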
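
The three runs in the Locating step can be sketched as follows. This is a toy illustration, assuming numpy and a made-up residual network in place of a real transformer; the layer function, sizes, noise scale (3.0), and subject positions are all hypothetical. For each (layer, token) pair the sketch records the indirect effect: how much restoring that one clean hidden state raises the probability of the clean-run answer in the corrupted run.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy stand-in for a transformer: token embeddings pass through
# residual layers that mix information across positions; the last position's
# final hidden state is projected to vocabulary logits.
N_LAYERS, N_TOKENS, D, VOCAB = 4, 5, 8, 10
weights = [rng.normal(scale=0.5, size=(D, D)) for _ in range(N_LAYERS)]
unembed = rng.normal(size=(D, VOCAB))

def layer(h, W):
    # Each position sees its own state plus the mean over all positions.
    mixed = h + h.mean(axis=0, keepdims=True)
    return h + np.tanh(mixed @ W)

def run(emb, patch=None):
    """Return (answer probabilities, hidden states after each layer).

    patch = (layer_idx, token_idx, vector) overwrites one hidden state
    right after that layer, e.g. with a value saved from the clean run.
    """
    h = emb.copy()
    states = []
    for i, W in enumerate(weights):
        h = layer(h, W)
        if patch is not None and patch[0] == i:
            h[patch[1]] = patch[2]
        states.append(h.copy())
    logits = h[-1] @ unembed
    p = np.exp(logits - logits.max())
    return p / p.sum(), states

emb = rng.normal(size=(N_TOKENS, D))
subject = [1, 2]  # hypothetical positions of the subject tokens

# (i) Clean run: record every hidden state and the predicted answer.
p_clean, clean_states = run(emb)
answer = int(np.argmax(p_clean))

# (ii) Corrupted run: add Gaussian noise to the subject-token embeddings.
noisy = emb.copy()
noisy[subject] += rng.normal(scale=3.0, size=(len(subject), D))
p_corrupt, _ = run(noisy)

# (iii) Restoration: copy one clean hidden state into the corrupted run and
# measure how much probability on the clean answer comes back.
effects = np.zeros((N_LAYERS, N_TOKENS))
for l in range(N_LAYERS):
    for t in range(N_TOKENS):
        p_restored, _ = run(noisy, patch=(l, t, clean_states[l][t]))
        effects[l, t] = p_restored[answer] - p_corrupt[answer]

print(np.round(effects, 3))
```

In this toy setup, restoring the last token at the last layer recovers the clean prediction exactly, while restoring any other token at the last layer has no effect, because only the final position is read out.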

Questions:

Does Localization Inform Editing? Surprising Differences in Causality-Based Localization vs. Knowledge Editing in Language Models (NeurIPS, 2023)

Claim: A fact stored in a model can be changed by editing weights in a different location from where existing localization methods (Causal Tracing) suggest the fact is stored.
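
Editing methods such as ROME treat an MLP weight matrix as a key-value store and insert a fact with a closed-form rank-one update. The sketch below assumes a simplification: ROME's key covariance matrix is replaced by the identity, and every matrix and vector is a random placeholder rather than a real model activation. Because the update only needs a key and a target value at the chosen layer, nothing in it depends on where tracing localizes the fact, which is what makes it possible to edit layers other than the traced one.

```python
import numpy as np

def rank_one_edit(W, k_star, v_star):
    """Return W' = W + (v* - W k*) k*^T / (k*^T k*).

    Simplified ROME-style update (key covariance replaced by the identity):
    W' maps the key k* exactly to v*, and keys orthogonal to k* are unchanged.
    """
    residual = v_star - W @ k_star
    return W + np.outer(residual, k_star) / (k_star @ k_star)

rng = np.random.default_rng(0)
W = rng.normal(size=(6, 4))   # hypothetical MLP projection (d_out x d_in)
k_star = rng.normal(size=4)   # hypothetical key representing the subject
v_star = rng.normal(size=6)   # hypothetical value encoding the new fact

W_new = rank_one_edit(W, k_star, v_star)
print(np.allclose(W_new @ k_star, v_star))  # True

# A key orthogonal to k* is unaffected by the edit.
other = rng.normal(size=4)
other -= (other @ k_star) / (k_star @ k_star) * k_star
print(np.allclose(W_new @ other, W @ other))  # True
```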

  1. A substantial fraction of factual knowledge is stored outside the range of layers edited by ROME/MEMIT
  2. The correlation between Causal Tracing results and edit success is $\approx 0$, i.e., tracing does not predict which layer is best to edit
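
The statistic behind point 2 can be sketched as follows: for each fact there is one tracing effect and one edit-success score, and the question is how strongly they co-vary. This minimal sketch computes the Pearson correlation $r$ (and $r^2$, the fraction of variance in edit success explained by a linear fit on tracing). The numbers are synthetic, independently drawn placeholders, not results from the paper.

```python
import numpy as np

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length 1-D arrays."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    xc, yc = x - x.mean(), y - y.mean()
    return float((xc @ yc) / np.sqrt((xc @ xc) * (yc @ yc)))

rng = np.random.default_rng(0)
n_facts = 1000
# Synthetic placeholders: one tracing effect and one edit-success score per
# fact, drawn independently, so their true correlation is zero.
tracing_effect = rng.uniform(0.0, 1.0, size=n_facts)
edit_success = rng.uniform(0.0, 1.0, size=n_facts)

r = pearson_r(tracing_effect, edit_success)
print(f"r = {r:.3f}, r^2 = {r * r:.4f}")
```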