DeepSeek-V3.2: Pushing the Frontier of Open Large Language Models
We introduce DeepSeek-V3.2, a model that harmonizes high computational efficiency with superior reasoning and agent performance. The key technical breakthroughs of DeepSeek-V3.2 are as follows: (1) DeepSeek Sparse Attention (DSA): We introduce DSA, an efficient attention mechanism that substantially reduces computational complexity while preserving model performance in long-context scenarios.
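The text above does not describe how DSA works internally, so the following is only a minimal sketch of the general idea behind sparse attention: a query attends to a small subset of keys rather than to all of them. The top-k selection rule, the function name `topk_sparse_attention`, and the parameter `k` are illustrative assumptions and should not be read as the actual DSA design.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def topk_sparse_attention(query, keys, values, k):
    """Attend from one query to only the k highest-scoring keys.

    Hypothetical illustration of sparse attention in general,
    NOT the DSA mechanism, whose details the text does not give.
    Returns the output vector and the sorted indices that were kept.
    """
    scale = 1.0 / math.sqrt(len(query))
    scores = [dot(query, key) * scale for key in keys]

    # Assumed selection rule: keep the k keys with the largest scores.
    k = min(k, len(keys))
    ranked = sorted(range(len(keys)), key=lambda i: scores[i], reverse=True)
    selected = ranked[:k]

    # Softmax and value aggregation run over the k selected keys only.
    weights = softmax([scores[i] for i in selected])
    out = [0.0] * len(values[0])
    for w, i in zip(weights, selected):
        for d, v in enumerate(values[i]):
            out[d] += w * v
    return out, sorted(selected)


if __name__ == "__main__":
    q = [1.0, 0.0]
    K = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0], [2.0, 0.0]]
    V = [[1.0, 0.0], [0.0, 1.0], [5.0, 5.0], [0.0, 2.0]]
    output, kept = topk_sparse_attention(q, K, V, k=2)
    print(kept, output)
```

Note that this sketch still scores every key before selecting, so only the softmax and value aggregation shrink to k entries. A practical long-context design would also need a cheaper selection step, which is not modeled here.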
DeepSeek Coder
DeepSeek Coder comprises a series of code language models trained from scratch on a mix of 87% code and 13% natural language in English and Chinese, with each model pre-trained on 2T tokens.
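As a quick worked example of what that mix means in absolute terms, the sketch below splits the 2T-token budget by the stated percentages. It assumes the 87% / 13% split is measured in tokens, which the text implies but does not state outright; the function name `token_split` is hypothetical.

```python
def token_split(total_tokens, code_percent):
    """Split a token budget into (code, natural-language) token counts.

    Assumes the percentages apply to token counts. Integer arithmetic
    is used to avoid floating-point rounding on large values.
    """
    code_tokens = total_tokens * code_percent // 100
    natural_language_tokens = total_tokens - code_tokens
    return code_tokens, natural_language_tokens


if __name__ == "__main__":
    TOTAL_TOKENS = 2 * 10**12  # 2T tokens per model, as stated in the text
    code, nl = token_split(TOTAL_TOKENS, 87)
    print(f"code: {code / 1e12:.2f}T tokens, natural language: {nl / 1e12:.2f}T tokens")
```

Under that assumption, each model sees about 1.74T code tokens and 0.26T natural-language tokens.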