OpenAI: Weight-Sparse Transformers Have Interpretable Circuits (English version)

Uploaded by: wx****a2

2025-11-27

4 MB, 31 pages

File list:

OpenAI：权重稀疏的变压器具有可解释的电路（英文版）.pdf

Resource summary

Finding human-understandable circuits in language models is a central goal of the field of mechanistic interpretability. We train models to have more understandable circuits by constraining most of their weights to be zeros, so that each neuron only has a few connections. To recover fine-grained circuits underlying each of several hand-crafted tasks, we prune the models to isolate the part responsible for the task. These circuits often contain neurons and residual channels that correspond to

