DeepSeek满血微调秘籍开源！站在巨人肩膀打造私有模型，教程在此

明敏
2025-02-19
11:37:23

来源：量子位

震撼首发！

小明发自凹非寺

量子位 | 公众号 QbitAI

DeepSeek爆火甚至引发API低价内卷……

但是别忘了开源模型的最大好处是提供了“巨人的肩膀”啊！

微调DeepSeek-V3/R1，低成本打造高质量私有模型、提高业务竞争力，或许才是当下行业内更迫切的需求。

就在刚刚，已收获近4万GitHub StarColossal-AI发布开源大模型后训练工具箱，它包含：

DeepSeek-V3/R1满血671B LoRA低成本SFT微调；
完整的强化学习工具链PPO、GRPO、DPO、SimPO等；
无缝适配DeepSeek系列蒸馏模型在内的HuggingFace开源模型；
兼容支持英伟达GPU、华为昇腾NPU等多种硬件；
支持混合精度训练，gradient checkpoint等训练加速降低成本；
灵活的训练配置接口，支持自定义奖励函数、损失函数等；
提供灵活的并行策略配置接口，包括数据并行、模型并行、专家并行、ZeRO和Offload等，以适应不同硬件规模。

开源地址：https://github.com/hpcaitech/ColossalAI

低成本监督微调满血版DeepSeek-V3/R1-671B

6710亿参数规模的DeepSeek-V3/R1低成本微调，仅需以下几步，即可快速完成。

数据集准备

该脚本接收JSONL格式的文件作为输入数据集，例如：

https://github.com/hpcaitech/ColossalAI/blob/main/applications/ColossalChat/examples/training_scripts/lora_sft_data.jsonl。

数据集的每一行应为一个聊天对话列表。例如：

[{“role”: “user”, “content”: “你好，最近怎么样？”}, {“role”: “assistant”, “content”: “我很好。今天有什么可以帮你的吗？”}]

[{“role”: “user”, “content”: “火烧赤壁曹操为何不拨打119求救？”}, {“role”: “assistant”, “content”: “因为在三国时期，还没有电话和现代的消防系统，所以曹操无法拨打119求救。”}]

该数据格式，兼容Huggingface chat template，支持自定义system prompt，因此可灵活按需配置。

模型权重准备

为保证更好的微调效果，使用BF16权重进行微调。

如果已下载了FP8的DeepSeek-V3/R1权重，可以使用DeepSeek官方脚本https://github.com/deepseek-ai/DeepSeek-V3/blob/main/inference/fp8_cast_bf16.py通过GPU将权重转换为BF16。

对于使用国产华为昇腾算力，可以下载https://gitee.com/ascend/ModelZoo-PyTorch/blob/master/MindIE/LLM/DeepSeek/DeepSeek-V2/NPU_inference/fp8_cast_bf16.py脚本转换权重。

使用方法

在准备好数据集和模型权重后，可使用Colossal-AI 提供的一键启动脚本https://github.com/hpcaitech/ColossalAI/blob/main/applications/ColossalChat/examples/training_scripts/lora_finetune.py。

该脚本与常见SFT脚本类似，且完全兼容HuggingFace PEFT，启动命令：

colossalai run —hostfile path-to-host-file —nproc_per_node 8 lora_finetune.py —pretrained path-to-DeepSeek-R1-bf16 —dataset path-to-dataset.jsonl —plugin moe —lr 2e-5 —max_length 256 -g —ep 8 —pp 3 —batch_size 24 —lora_rank 8 —lora_alpha 16 —num_epochs 2 —warmup_steps 8 —tensorboard_dir logs —save_dir DeepSeek-R1-bf16-lora

有关每个参数的更多详细信息，可以运行python lora_finetune.py—help查看。该脚本可通过tensorboard记录学习率、loss、grad norm信息，方便对训练进行监控。

使用LoRA优化硬件资源消耗

通过使用LoRA等优化，示例命令已将SFT DeepSeek-V3/R1-671B最低硬件要求降低近10倍，可使用32个Ascend 910B NPU 64GB（使用 ep=8,pp=4）或24个H100/H800 GPU（使用 ep=8,pp=3）。如果你通过—zero_cpu_offload启用CPU offload，硬件要求可以进一步降低，但会损失一定的训练速度。

如下图验证，在SFT DeepSeek V3/R1 671B时，Loss 可以顺利降低。

对于资金充裕的开发团队，也可以使用上述脚本，将并行度高效扩展至数百及数千卡，快速完成DeepSeek-V3/R1-671B全参微调或并行加速。

对于预算有限，又想借助强化学习构建自己的类DeepSeek-R1模型， Colossal-AI也提供了解决方案，并利用小模型对算法进行了验证。

通过强化学习微调蒸馏版DeepSeek

Colossal-AI团队验证并实现了DeepSeek论文中的GRPO算法及verifiable reward，使用Qwen2.5-3B-Base模型进行了实验。其中，奖励的设计如下：

奖励=0，如果格式是正确的；
奖励=1，如果格式是正确的但是结果是错误的；
奖励=10，如果格式与结果都是正确的。

Colossal-AI团队以Qwen2.5-3B-Base模型为例，提供了用于验证GRPO的对话模板及设定

（https://github.com/hpcaitech/ColossalAI/blob/main/applications/ColossalChat/conversation_template/Qwen_Qwen2.5-3B.json），通过配置以下bash 文件，即可一键启动：
https://github.com/hpcaitech/ColossalAI/blob/main/applications/ColossalChat/examples/training_scripts/train_grpo.sh

同时，在GRPO章节，Colossal-AI团队还提供了验证过程中的部分发现及各种参数的详细描述，可供参考。

代码中设计了可灵活配置奖励函数的模板，因此，用户可根据自己的具体情况设计自己的奖励函数体系。

由下图可以看到，即使是3B的模型，平均奖励与模型回复长度随着时间逐步增长。

随着训练的进行，我们可以看到一些有意思的例子。例如随着训练迭代，模型开始了自我纠正：

Colossal-AI：最佳后训练工具箱

Colossal-AI在深耕大模型预训练降本增效的基础上，致力于进一步成为开发者开箱即用的最佳后训练工具，帮助用户基于开源模型，低成本快速构建私有模型。

开源地址：https://github.com/hpcaitech/ColossalAI

2025 年 2 月
一	二	三	四	五	六	日
	1	2
3	4	5	6	7	8	9
10	11	12	13	14	15	16
17	18	19	20	21	22	23
24	25	26	27	28

ufabet มีเกมให้เลือกเล่นมากมาย: เกมเดิมพันหลากหลาย ครบทุกค่ายดัง

tornado crypto mixer Discover the power of privacy with TornadoCash! Learn how this decentralized mixer ensures your transactions remain confidential.

ดูบอลสด Very well presented. Every quote was awesome and thanks for sharing the content. Keep sharing and keep motivating others.

ดูบอลสด Pretty! This has been a really wonderful post. Many thanks for providing these details.

ดูบอลสด Hi there to all, for the reason that I am genuinely keen of reading this website’s post to be updated on a regular basis. It carries pleasant stuff.

Obrazy Sztuka Nowoczesna Thank you for this wonderful contribution to the topic. Your ability to explain complex ideas simply is admirable.

ufabet Hi there to all, for the reason that I am genuinely keen of reading this website’s post to be updated on a regular basis. It carries pleasant stuff.

ufabet You’re so awesome! I don’t believe I have read a single thing like that before. So great to find someone with some original thoughts on this topic. Really.. thank you for starting this up. This website is something that is needed on the internet, someone with a little originality!

ufabet Very well presented. Every quote was awesome and thanks for sharing the content. Keep sharing and keep motivating others.

DeepSeek满血微调秘籍开源！站在巨人肩膀打造私有模型，教程在此

DeepSeek满血微调秘籍开源！站在巨人肩膀打造私有模型，教程在此

低成本监督微调满血版DeepSeek-V3/R1-671B

数据集准备

模型权重准备

使用方法

使用LoRA优化硬件资源消耗

通过强化学习微调蒸馏版DeepSeek

Colossal-AI：最佳后训练工具箱

test

test

文心AIGC

test

test