DeepSeek-V4 Technical Report (English Version)

Abstract
We present a preview version of the DeepSeek-V4 series, comprising two strong Mixture-of-Experts (MoE) language models: DeepSeek-V4-Pro with 1.6T parameters (49B activated) and DeepSeek-V4-Flash with 284B parameters (13B activated), both supporting a context length of one million tokens. The DeepSeek-V4 series incorporates several key upgrades in architecture and optimization: (1) a hybrid attention architecture that combines Compressed Sparse Attention (CSA) and Heavily Compressed Attention (HCA) …
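The preview cuts off before explaining how CSA and HCA are actually defined or combined, so the following is only a minimal PyTorch sketch of one plausible interleaving scheme. The class names `CompressedSparseAttention`, `HeavilyCompressedAttention`, and `HybridAttentionStack`, the stand-in mechanisms (local-window attention for the sparse variant, a low-rank KV bottleneck for the compressed variant), and the 1:1 layer alternation are all assumptions for illustration, not the report's actual design.

```python
import torch
import torch.nn as nn


class CompressedSparseAttention(nn.Module):
    """Stand-in for CSA (assumed): causal attention restricted to a sliding window."""

    def __init__(self, d_model: int, n_heads: int, window: int = 256):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.window = window

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        T = x.size(1)
        idx = torch.arange(T, device=x.device)
        # True entries are masked out: future tokens, plus anything beyond the window.
        mask = (idx[None, :] > idx[:, None]) | (idx[:, None] - idx[None, :] >= self.window)
        out, _ = self.attn(x, x, x, attn_mask=mask)
        return out


class HeavilyCompressedAttention(nn.Module):
    """Stand-in for HCA (assumed): keys/values squeezed through a low-rank latent."""

    def __init__(self, d_model: int, n_heads: int, d_latent: int = 64):
        super().__init__()
        self.down = nn.Linear(d_model, d_latent)  # compress the KV stream
        self.up = nn.Linear(d_latent, d_model)    # re-expand for attention
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        T = x.size(1)
        causal = torch.triu(
            torch.ones(T, T, dtype=torch.bool, device=x.device), diagonal=1
        )
        kv = self.up(self.down(x))  # low-rank bottleneck on keys/values
        out, _ = self.attn(x, kv, kv, attn_mask=causal)
        return out


class HybridAttentionStack(nn.Module):
    """Alternates the two variants across layers; the 1:1 ratio is an assumption."""

    def __init__(self, d_model: int = 512, n_heads: int = 8, n_layers: int = 4):
        super().__init__()
        self.layers = nn.ModuleList(
            CompressedSparseAttention(d_model, n_heads) if i % 2 == 0
            else HeavilyCompressedAttention(d_model, n_heads)
            for i in range(n_layers)
        )
        self.norms = nn.ModuleList(nn.LayerNorm(d_model) for _ in range(n_layers))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        for norm, layer in zip(self.norms, self.layers):
            x = x + layer(norm(x))  # pre-norm residual connection
        return x


if __name__ == "__main__":
    x = torch.randn(2, 128, 512)  # (batch, sequence, d_model)
    print(HybridAttentionStack()(x).shape)  # torch.Size([2, 128, 512])
```

The general design motivation for such hybrids is that a sparse pattern keeps per-token cost low over very long contexts while a compressed variant shrinks the KV cache; whether DeepSeek-V4 realizes CSA and HCA this way cannot be determined from the truncated abstract.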


