
Conditional Memory via Scalable Lookup: A New Sparsity Axis for Large Language Models (English Edition)

Published by: wx****95
2026-01-15
1 MB · 33 pages
File list:
通过可伸缩查找的条件记忆 大型语言模型的新稀疏轴(英文版).pdf

While Mixture-of-Experts (MoE) scales capacity via conditional computation, Transformers lack a native primitive for knowledge lookup, forcing them to inefficiently simulate retrieval through computation. To address this, we introduce conditional memory as a complementary sparsity axis, instantiated via Engram, a module that modernizes classic N-gram embedding for O(1) lookup. By formulating the Sparsity Allocation problem, we uncover a U-shaped scaling law that optimizes the trade-off between memory and computation.
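The mechanism named in the abstract is a constant-time N-gram embedding lookup. As a rough sketch only, the code below shows one common way such a lookup can be realized: hash each N-gram of token ids into a fixed-size embedding table, so each access costs one hash plus one row read regardless of vocabulary size. The class name, bucket count, and rolling-hash constant are illustrative assumptions, not the paper's Engram implementation.

# Minimal sketch of hashed N-gram embedding with O(1) lookup.
# All names and constants here are illustrative assumptions,
# not the Engram module described in the paper.
import torch
import torch.nn as nn


class HashedNGramEmbedding(nn.Module):
    """Maps each N-gram of token ids to an embedding row via hashing.

    Lookup is O(1) per N-gram: one rolling hash plus one table read,
    independent of vocabulary size and table size.
    """

    def __init__(self, num_buckets: int = 1 << 20, dim: int = 256, n: int = 2):
        super().__init__()
        self.n = n
        self.num_buckets = num_buckets
        self.table = nn.Embedding(num_buckets, dim)

    def forward(self, token_ids: torch.Tensor) -> torch.Tensor:
        # token_ids: (batch, seq_len). Build N-grams from shifted slices.
        width = token_ids.size(1) - self.n + 1
        grams = torch.stack(
            [token_ids[:, i : i + width] for i in range(self.n)], dim=-1
        )  # (batch, seq_len - n + 1, n)
        # Cheap polynomial rolling hash into a fixed number of buckets.
        h = torch.zeros(grams.shape[:-1], dtype=torch.long, device=grams.device)
        for i in range(self.n):
            h = (h * 1000003 + grams[..., i]) % self.num_buckets
        return self.table(h)  # (batch, seq_len - n + 1, dim)


if __name__ == "__main__":
    emb = HashedNGramEmbedding(num_buckets=1 << 16, dim=64, n=2)
    ids = torch.randint(0, 50_000, (2, 10))
    print(emb(ids).shape)  # torch.Size([2, 9, 64])

Because collisions are possible in a hashed table, real systems typically size the bucket count well above the number of frequent N-grams or combine several independent hashes; the sketch omits such refinements for brevity.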


