
OpenAI: Weight-sparse transformers have interpretable circuits (English version)

Published by: wx****a2
2025-11-27
4 MB, 31 pages
File list:
OpenAI:权重稀疏的变压器具有可解释的电路(英文版).pdf

Finding human-understandable circuits in language models is a central goal of the field of mechanistic interpretability. We train models to have more understandable circuits by constraining most of their weights to be zeros, so that each neuron only has a few connections. To recover fine-grained circuits underlying each of several hand-crafted tasks, we prune the models to isolate the part responsible for the task. These circuits often contain neurons and residual channels that correspond to



