Knowledge Locating

Locating and Editing Knowledge

Goals: locate and edit knowledge in a language model

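Locating: Causal Tracing using (i) a clean run, (ii) a corrupted run, and (iii) a corrupted run with restoration, i.e. an attempt to restore the target token's probability by patching clean hidden states into the corrupted run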
Editing:
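
The three Causal Tracing runs listed under Locating can be sketched on a toy network. This is a minimal illustration, not the setup from the papers: the model (`make_toy_model`, `forward`), the causal averaging used in place of attention, the noise scale, and the choice of subject positions are all assumptions made for the example. The traced quantity is the target-token probability in the restored run minus the target-token probability in the corrupted run, for each (layer, position) that is restored.

```python
import numpy as np

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def make_toy_model(n_layers=4, d=8, vocab=5, seed=0):
    # Hypothetical stand-in for a transformer: random per-layer weights.
    rng = np.random.default_rng(seed)
    return {
        "W": rng.normal(scale=0.6, size=(n_layers, d, d)),  # per-token map
        "V": rng.normal(scale=0.6, size=(n_layers, d, d)),  # cross-token map
        "U": rng.normal(size=(d, vocab)),                   # unembedding
    }

def forward(model, x, patch=None):
    """Run the toy model on x of shape (T, d).

    patch: optional {(layer, pos): vector} that overwrites hidden states.
    Returns (probabilities at the last position, list of hidden states).
    """
    T = x.shape[0]
    A = np.tril(np.ones((T, T)))
    A /= A.sum(axis=1, keepdims=True)  # causal averaging over earlier tokens
    h = x.copy()
    states = []
    for layer in range(len(model["W"])):
        h = np.tanh(h @ model["W"][layer] + A @ h @ model["V"][layer])
        if patch:
            for (p_layer, pos), vec in patch.items():
                if p_layer == layer:
                    h[pos] = vec  # restore one clean hidden state
        states.append(h.copy())
    return softmax(h[-1] @ model["U"]), states

def causal_trace(model, x, subject_positions, noise_std=3.0, seed=1):
    rng = np.random.default_rng(seed)
    # (i) clean run: record every hidden state and the predicted target token
    p_clean, clean_states = forward(model, x)
    target = int(p_clean.argmax())
    # (ii) corrupted run: add noise to the subject token embeddings
    x_bad = x.copy()
    noise = rng.normal(scale=noise_std, size=x_bad[subject_positions].shape)
    x_bad[subject_positions] += noise
    p_bad, _ = forward(model, x_bad)
    # (iii) restoration: patch one clean state at a time into the corrupted run
    n_layers, T = len(clean_states), x.shape[0]
    effect = np.zeros((n_layers, T))
    for layer in range(n_layers):
        for t in range(T):
            fix = {(layer, t): clean_states[layer][t]}
            p_fix, _ = forward(model, x_bad, patch=fix)
            effect[layer, t] = p_fix[target] - p_bad[target]
    return p_clean[target], p_bad[target], effect

model = make_toy_model()
x = np.random.default_rng(42).normal(size=(3, 8))  # 3 tokens; 0-1 play the subject
p_clean, p_bad, effect = causal_trace(model, x, subject_positions=[0, 1])
print(effect.round(3))  # rows: layers, columns: token positions
```

Entries of `effect` that are large mark the states whose restoration recovers most of the clean prediction. A real run would average over several noise samples and many prompts; the single sample here only keeps the sketch short.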

Questions:

(follow-up paper): maybe ROME only finds where the “output” is stored, not where “knowledge” is stored
- Doesn’t work when the subject changes position (xx founded Microsoft vs. Microsoft was founded by xx)
- Problem with fixing a prefix
- Side effects of directly editing weights
What’s the motivation behind focusing on MLP?

Does Localization Inform Editing? Surprising Differences in Causality-Based Localization vs. Knowledge Editing in Language Models (NeurIPS, 2023)

Claim: One can change how a fact is stored in a model by editing weights in a different location than where existing methods suggest the fact is stored.

There is a substantial fraction of factual knowledge stored outside of the range of layers edited by ROME/MEMIT
Correlation between Causal Tracing results and edit success $\approx 0$

Localization: identifying components of a model responsible for a certain behavior
Editing: changing model components in order to change model behavior
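
To make "changing model components in order to change model behavior" concrete, here is a minimal sketch of a rank-one weight edit on a single linear layer. It is not ROME's or MEMIT's exact update rule: the plain least-change rank-one update is swapped in to keep the example short, and the names `rank_one_edit`, `k` (a key vector standing for the subject) and `v_new` (the value that should encode the new fact) are hypothetical.

```python
import numpy as np

def rank_one_edit(W, k, v_new):
    """Return W' = W + (v_new - W k) k^T / (k^T k), so that W' @ k == v_new."""
    residual = v_new - W @ k                    # what the layer currently gets wrong
    return W + np.outer(residual, k) / (k @ k)  # rank-one correction along k

rng = np.random.default_rng(0)
W = rng.normal(size=(4, 6))    # toy weight matrix of one layer
k = rng.normal(size=6)         # key: representation of the edited subject
v_new = rng.normal(size=4)     # value: desired output for that key
W_edit = rank_one_edit(W, k, v_new)

# A key orthogonal to k is unaffected by the edit.
k_other = rng.normal(size=6)
k_other -= (k_other @ k) / (k @ k) * k

print(np.allclose(W_edit @ k, v_new))
print(np.allclose(W_edit @ k_other, W @ k_other))
```

The edit rewrites the layer's output for `k` and leaves outputs for keys orthogonal to `k` unchanged. Keys that overlap with `k` do shift, which is one simple way to picture the side effects of direct editing raised in the questions above.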