DeepSick-R1

"Too much feeling sick while reproducing DeepSeek-R1!!"

🚑 Why look at this repository when there are already many open-source codebases for building DeepSeek-R1?

  • The codebase is short and split across only a few files, which keeps it easy to read.
  • It does not rely on the Hugging Face GRPOTrainer class, which can be frustrating to customize for individual research and production needs because of its complexity.
  • Only three files (main.py, trainer.py, and utils.py) need to be understood for training, while well-known repositories such as Open-R1, R1-V, verl, and TinyZero have 1000+ code files, many config files, and deep folder hierarchies.
  • vLLM is used so that answer candidates can be generated quickly.
  • Even with vLLM integrated, the total number of code lines stays small.
  • For multi-GPU training, one GPU is dedicated to the vLLM model for generation while the remaining GPUs focus on training (see the sketch below).

Requirements!!: This repository requires at least two GPUs, because vLLM must be assigned its own GPU so that the inference GPU stays separate from the training GPUs.
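
The split works roughly as follows (a minimal sketch, not the repository's actual code; the model name, sampling settings, and the device keyword are assumptions that match vLLM 0.7.x-style integrations and may need adjusting):

# Illustrative sketch of the GPU split (not the repo's actual code).
# Assumption: GPUs 0..N-2 run DeepSpeed/accelerate training processes,
# and the last GPU (N-1) hosts a single vLLM engine for generation.
import torch
from vllm import LLM, SamplingParams

num_gpus = torch.cuda.device_count()        # e.g. 8 on an 8x A100 node
generation_gpu = f"cuda:{num_gpus - 1}"     # the GPU accelerate never touches

# Only one training process (e.g. rank 0) builds the vLLM engine.
llm = LLM(
    model="Qwen/Qwen2-VL-2B-Instruct",
    device=generation_gpu,                  # pin vLLM to the spare GPU
    gpu_memory_utilization=0.9,
)
sampling = SamplingParams(n=4, temperature=1.0, max_tokens=512)

# Each GRPO step: send prompts to the vLLM GPU, get candidate completions back.
outputs = llm.generate(["<image + question prompt>"], sampling)
candidates = [o.text for o in outputs[0].outputs]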


🚀 Short Talks

  • Training Qwen2-VL-2B-Instruct on 100k QA samples with 2 NVIDIA A100 80GB GPUs takes about 14 hours.
  • Increasing to 8 NVIDIA A100 80GB GPUs cuts this to about 4.5 hours (data communication between the vLLM GPU and the training GPUs may become a bottleneck).
  • GPU memory usage was 40~60GB when unfreezing all MLP parameters in the LLM decoder, with a batch size of 2, 4 generations per prompt, and 4 GRPO iterations (see the sketch after this list).
  • This repository handles vision-language models (VLMs) only, but the code is simple enough that it can easily be adapted into an LLM-only version.
  • The current version does not support Qwen2.5-VL or the latest vLLM because of a FlashAttention issue in the latest vLLM release and model-parameter access issues. The code will be updated once these are resolved.
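
For reference, selective unfreezing of the decoder MLP blocks looks roughly like this (a minimal sketch; the module-name matching assumes the usual Qwen2-VL parameter layout and is not taken from this repository):

# Sketch of unfreezing only the MLP blocks of the LLM decoder.
# Assumption: decoder MLP parameters are named "model.layers.<i>.mlp.*"
# and vision-tower parameters start with "visual.", as in Qwen2-VL.
from transformers import Qwen2VLForConditionalGeneration

model = Qwen2VLForConditionalGeneration.from_pretrained("Qwen/Qwen2-VL-2B-Instruct")

for name, param in model.named_parameters():
    # Train only decoder MLP weights; keep the vision tower, embeddings,
    # attention blocks, and lm_head frozen.
    param.requires_grad = ".mlp." in name and not name.startswith("visual")

trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
print(f"trainable params: {trainable / 1e6:.1f}M")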

🍉 Install

#!/bin/bash
conda create -n deepsick python=3.12 -y
conda activate deepsick

# install vllm (pinned to 0.7.2: FlashAttention errors occur with the latest vllm)
pip install vllm==0.7.2

# install training dependencies
pip install trl wandb debugpy datasets deepspeed accelerate

# flash attention
pip install flash-attn --no-build-isolation

🍲 What to read to understand the code

# Total 825 lines
main.py (286 lines)
trainer.py (108 lines)
utils.py (431 lines)

💻 Training with multiple GPUs

DeepSpeed-ZeRO3 is used.

# ds_accel.yaml is the config file for deepspeed zero3
bash train.sh

In train.sh you can see the n_gpu variable, which automatically computes the number of processes to pass to accelerate (and hence DeepSpeed). Because vLLM and accelerate are not compatible on the same devices, this simple trick reserves one GPU for vLLM: with CUDA_DEVICES="0,1,2,3,4,5,6,7" the string is 15 characters long, so n_gpu = (15 + 1) / 2 - 1 = 7, launching 7 training processes on GPUs 0-6 and leaving GPU 7 free for generation.

#!/usr/bin/env bash
CUDA_DEVICES="0,1,2,3,4,5,6,7"
length=${#CUDA_DEVICES}
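# n_gpu = (number of listed GPUs) - 1; one GPU is held back for vLLM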
n_gpu=$(( ( (length + 1) / 2 ) - 1 ))

CUDA_VISIBLE_DEVICES=$CUDA_DEVICES \
accelerate launch --config_file ds_accel.yaml \
--num_processes=$n_gpu \
main.py \
--wandb True
