Locating and Editing Knowledge
Goals: locate and edit knowledge in a language model
- Locating: Causal Tracing using (i) a clean run, (ii) a corrupted run, and (iii) a corrupted run in which individual hidden states are restored to try to recover the target token
- Editing:
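The three runs of Causal Tracing listed above can be sketched on a toy network. This is a minimal illustration, not the setup from the paper: the model (`run`), its random weights, the mean-over-positions mixing that stands in for attention, and the names `causal_trace` and `noise_scale` are all assumptions made up for this example. The original method also averages over several noise samples, which is skipped here for brevity.

```python
import numpy as np

rng = np.random.default_rng(0)
N_LAYERS, N_POS, D, VOCAB = 4, 3, 8, 5

# Hypothetical toy weights; a real study would hook into a trained LM instead.
W = rng.normal(scale=0.8, size=(N_LAYERS, D, D))
U = rng.normal(scale=0.8, size=(N_LAYERS, D, D))
W_out = rng.normal(size=(VOCAB, D))


def run(embeddings, patch=None):
    """Forward pass over all token positions.

    patch = (layer, pos, vector) overwrites one hidden state after that layer.
    Returns (hidden states per layer, output distribution at the last position).
    """
    h = embeddings.copy()
    states = []
    for layer in range(N_LAYERS):
        mixed = h.mean(axis=0)  # crude stand-in for attention across positions
        h = np.tanh(h @ W[layer].T + mixed @ U[layer].T)
        if patch is not None and patch[0] == layer:
            h[patch[1]] = patch[2]  # restore a single clean state
        states.append(h.copy())
    logits = W_out @ h[-1]
    probs = np.exp(logits - logits.max())
    return states, probs / probs.sum()


def causal_trace(embeddings, subject_pos, noise_scale=3.0):
    # (i) clean run: record every hidden state and the predicted target token
    clean_states, clean_probs = run(embeddings)
    target = int(clean_probs.argmax())

    # (ii) corrupted run: add noise to the subject token's embedding
    corrupted = embeddings.copy()
    corrupted[subject_pos] += rng.normal(scale=noise_scale, size=D)
    _, corrupt_probs = run(corrupted)

    # (iii) restored runs: patch one clean state at a time into the corrupted
    # run and measure how much target probability comes back
    effect = np.zeros((N_LAYERS, N_POS))
    for layer in range(N_LAYERS):
        for pos in range(N_POS):
            _, probs = run(corrupted, patch=(layer, pos, clean_states[layer][pos]))
            effect[layer, pos] = probs[target] - corrupt_probs[target]
    return target, clean_probs[target], corrupt_probs[target], effect


if __name__ == "__main__":
    emb = rng.normal(size=(N_POS, D))
    target, p_clean, p_corrupt, effect = causal_trace(emb, subject_pos=0)
    print("target token:", target)
    print("P(target) clean / corrupted:", p_clean, p_corrupt)
    print("restoration effect per (layer, position):")
    print(effect)
```

Entries of `effect` with large values mark the (layer, position) states whose restoration recovers the most target probability; in this toy, restoring the final layer at the last position recovers the clean prediction by construction, since the output reads only that state.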
Questions:
- (follow-up paper): maybe ROME only finds where the “output” is stored, not where “knowledge” is stored
- Doesn’t work when the fact is rephrased with a different subject (xx founded Microsoft vs. Microsoft was founded by xx)
- Problem with fixing a prefix
- Side effects of directly editing the weights
- What’s the motivation behind focusing on MLP?
Does Localization Inform Editing? Surprising Differences in Causality-Based Localization vs. Knowledge Editing in Language Models (NeurIPS, 2023)
Claim: One can change how a fact is stored in a model by editing weights in a different location from where existing methods suggest the fact is stored.
- There is a substantial fraction of factual knowledge stored outside of the range of layers edited by ROME/MEMIT
- Correlation between Causal Tracing results and edit success $\approx 0$
- Localization: identifying components of a model responsible for a certain behavior
- Editing: changing model components in order to change model behavior
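As one concrete instance of "editing" in the sense above, here is a minimal sketch of a rank-one update to a single linear layer, in the spirit of ROME-style edits. It treats the layer as a key-value map and forces a chosen key `k` to map to a new value `v`. The function name `rank_one_edit` and the optional matrix `C` (standing in for a key covariance estimate, identity by default) are assumptions for illustration; choosing `k` and `v` for a real fact is the hard part and is not shown.

```python
import numpy as np


def rank_one_edit(W, k, v, C=None):
    """Return W' = W + (v - W k) (C^{-1} k)^T / ((C^{-1} k)^T k).

    The result satisfies W' @ k == v and differs from W by a rank-one matrix.
    C is an assumed (positive definite) key covariance; identity if omitted.
    """
    if C is None:
        C = np.eye(W.shape[1])
    u = np.linalg.solve(C, k)      # direction C^{-1} k
    residual = v - W @ k           # what the layer currently gets wrong for k
    return W + np.outer(residual, u) / (u @ k)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    W = rng.normal(size=(4, 6))    # toy layer: 6-dim keys -> 4-dim values
    k = rng.normal(size=6)         # key representing the subject (hypothetical)
    v = rng.normal(size=4)         # value encoding the new fact (hypothetical)
    W_new = rank_one_edit(W, k, v)
    print("max error on edited key:", np.abs(W_new @ k - v).max())
```

With the identity default for `C`, keys orthogonal to `k` are left unchanged, which is the sense in which such an edit is meant to be local in weight space. Note that nothing in the update itself depends on which layer `W` comes from, which is consistent with the question raised in these notes about whether localization results should determine where to edit.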