FlagScale (flagos-ai/FlagScale): a large model toolkit based on open-source projects.

🔥 Latest News

[2025/09] Released v0.9.0:
- Training & Finetuning: Added LoRA for efficient finetuning, improved the autotuner for cross-chip heterogeneous training, and enabled distributed RWKV training.
- Inference & Serving: Introduced DiffusionEngine for FLUX.1-dev, Qwen-Image, and Wan2.1-T2V, and added support for automatic multi-model orchestration and dynamic scaling.
- Embodied AI: Full lifecycle support for RoboBrain, Robotics, and PI0, plus semantic retrieval of MCP-based skills for RoboOS.
- Elastic & Fault Tolerance: Automatically detects task status (errors, hangs, etc.) and records it periodically.
- Hardware & System: Broader chip support, an upgraded patch mechanism with file-level diffs, and enhanced CI/CD for different chips.
[2025/04] Released v0.8.0:
- Introduced a new flexible and robust multi-backend mechanism and updated vendor adaptation methods.
- Enabled heterogeneous prefill-decoding disaggregation across vendor chips within a single instance via FlagCX (beta).
- Upgraded DeepSeek-V3 pre-training with the new Megatron-LM and added heterogeneous pre-training across different chips for MoE models like DeepSeek-V3.
[2025/02] Released v0.6.5:
- Added support for DeepSeek-V3 distributed pre-training (beta) and DeepSeek-V3/R1 serving across multiple chips.
- Introduced an auto-tuning feature for serving and a new CLI feature for one-click deployment.
- Enhanced the CI/CD system to support more chips and integrated the workflow of FlagRelease.
[2024/11] Released v0.6.0:
- Introduced general multi-dimensional heterogeneous parallelism and CPU-based communication between different chips.
- Added full support for LLaVA-OneVision, achieving SOTA results on the Infinity-MM dataset.
- Open-sourced the optimized CFG implementation and accelerated the generation and understanding tasks for Emu3.
- Implemented the auto-tuning feature and enhanced the CI/CD system.
[2024/04] Released v0.3: Achieved heterogeneous hybrid training of the Aquila2-70B-Expr model on a cluster using both NVIDIA and Iluvatar chips. Adapted the Aquila2 series to AI chips from six different manufacturers.
[2023/11] Released v0.2: Introduced training support for Aquila2-70B-Expr, enabling heterogeneous training across chips with the same or compatible architectures.
[2023/10] Released v0.1: Supported Aquila models with optimized training schemes for Aquila2-7B and Aquila2-34B, including parallel strategies, optimizations, and hyper-parameter settings.

🔗 About

FlagScale is a comprehensive toolkit designed to support the entire lifecycle of large models, developed with the backing of the Beijing Academy of Artificial Intelligence (BAAI). It builds on the strengths of several prominent open-source projects, including Megatron-LM and vllm, to provide a robust, end-to-end solution for managing and scaling large models.

The primary objective of FlagScale is to enable seamless scalability across diverse hardware architectures while maximizing computational resource efficiency and enhancing model performance. By offering essential components for model development, training, and deployment, FlagScale seeks to establish itself as an indispensable toolkit for optimizing both the speed and effectiveness of large model workflows.

FlagScale is also a part of FlagAI-Open, an open-source initiative by BAAI that aims to foster an open-source ecosystem for AI technologies. It serves as a platform where developers, researchers, and AI enthusiasts can collaborate on various AI projects, contribute to the development of cutting-edge AI solutions, and share their work with the global community.

Join our WeChat group.

✏️ Support List

Platform

Vendors	vllm	megatron
BI V150	✅	✅
Cambricon MLU	✅	✅
Huawei Atlas800 TA3 (Ascend)	✅	✅
Hygon BW1000	✅	✅
Kunlunxin R310p	✅	✅
Metax C550	✅	✅
MUSA S5000	✅	✅
Tsing Micro	✅	✅
NVIDIA+Cambricon MLU		✅

Model

Training

Model	Example config file
DeepSeek-V3	16b_a3b.yaml
Qwen2/2.5/3	235b_a22b.yaml
Qwen2.5-VL	7b.yaml
QwQ	32b.yaml
LLaMA2	7b.yaml
LLaMA3/3.1	70b.yaml
LLaVA-OneVision	7b.yaml
LLaVA1.5	7b.yaml
Mixtral	8x7b.yaml
RWKV	7b.yaml
Aquila	7b.yaml
...	...

Serve/Inference

Model	Example config file
DeepSeek-V3	671b.yaml
DeepSeek-R1	671b.yaml
Qwen2.5	72b.yaml
Qwen3	8b.yaml
Qwen2.5-VL	32b_instruct.yaml
Qwen3-Omni	30b.yaml
QwQ	32b.yaml
Grok2	270b.yaml
Kimi-K2	1t.yaml
...	...

🚀 Quick Start

FlagScale leverages Hydra for configuration management. The configurations are organized into two levels: an outer experiment-level YAML file and an inner task-level YAML file.

The experiment-level YAML file defines the experiment directory, backend engine, task type, and other environment-related settings.
The task-level YAML file specifies the model, dataset, and parameters for specific tasks such as training or inference.

All valid configurations in the task-level YAML file correspond to the arguments used in backend engines such as Megatron-LM and vllm, with hyphens (-) replaced by underscores (_). For a complete list of available configurations, please refer to the backend engine documentation. Simply copy and modify the existing YAML files in the examples folder to get started.
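To make the underscore/hyphen correspondence concrete, here is a minimal sketch that flattens a task-level config section into backend-style command-line flags. The helper `to_cli_args` is hypothetical and is not FlagScale's own translation code; it assumes the section has already been flattened to one level, that booleans map to switch-style flags, and that lists map to multi-value flags.

```python
from typing import Any


def to_cli_args(task_cfg: dict[str, Any]) -> list[str]:
    """Flatten a task-level config section into backend-style CLI flags.

    Hypothetical helper: YAML keys use underscores, CLI flags use hyphens,
    e.g. tensor_model_parallel_size -> --tensor-model-parallel-size.
    """
    args: list[str] = []
    for key, value in task_cfg.items():
        flag = "--" + key.replace("_", "-")
        if value is None or value is False:
            # Unset or disabled options are omitted entirely.
            continue
        if value is True:
            # Booleans become switch-style flags with no value.
            args.append(flag)
        elif isinstance(value, (list, tuple)):
            # Sequences become one flag followed by several values.
            args.append(flag)
            args.extend(str(v) for v in value)
        else:
            args.extend([flag, str(value)])
    return args


if __name__ == "__main__":
    cfg = {
        "tensor_model_parallel_size": 2,
        "use_flash_attn": True,
        "seq_length": 4096,
        "data_path": ["/path/to/data/pile_wikipedia_demo"],
        "fp16": False,
    }
    print(" ".join(to_cli_args(cfg)))
    # --tensor-model-parallel-size 2 --use-flash-attn --seq-length 4096 --data-path /path/to/data/pile_wikipedia_demo
```

Dropping `False` values rather than emitting them mirrors how switch-style flags usually work: the flag's absence means "off".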

🔧 Setup

We recommend using the latest release of NGC's PyTorch container for setup.

Clone the repository:

git clone https://github.com/FlagOpen/FlagScale.git

Install the requirements:

We offer two installation methods:

Source Installation

PYTHONPATH=./:$PYTHONPATH pip install . --no-build-isolation --verbose \
--config-settings=device=<device> \
--config-settings=backend=<backend>

# For vllm:
--config-settings=device=gpu
--config-settings=backend=vllm
# For megatron:
--config-settings=device=gpu
--config-settings=backend=Megatron-LM
# Or specify both:
--config-settings=device=gpu
--config-settings=backend=vllm,Megatron-LM
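If you script the source installation (for example in CI), the options above can be assembled programmatically. This is a hedged sketch: `source_install_cmd` is a hypothetical helper, and it only builds the argument list. `PYTHONPATH=./:$PYTHONPATH` is an environment variable, not an argument, so it is expected to be set in the environment of whatever runs the command.

```python
import shlex


def source_install_cmd(device: str, backends: list[str]) -> list[str]:
    """Build the pip source-installation command for one device and 1+ backends.

    Multiple backends are joined with a comma, matching the
    backend=vllm,Megatron-LM form shown above.
    """
    if not backends:
        raise ValueError("at least one backend is required")
    return [
        "pip", "install", ".", "--no-build-isolation", "--verbose",
        f"--config-settings=device={device}",
        f"--config-settings=backend={','.join(backends)}",
    ]


if __name__ == "__main__":
    # Print a shell-safe version of the command for both backends.
    print(shlex.join(source_install_cmd("gpu", ["vllm", "Megatron-LM"])))
```

To actually run it, pass the list to `subprocess.run(..., check=True)` with `PYTHONPATH` added to the child environment.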

Whl Installation

# For vllm backend:
PYTHONPATH=./:$PYTHONPATH pip install .[vllm-gpu] --no-build-isolation --verbose
flagscale install --backend=vllm --device=gpu
# For megatron backend:
PYTHONPATH=./:$PYTHONPATH pip install .[megatron-gpu] --no-build-isolation --verbose
flagscale install --backend=megatron --device=gpu

Installation steps vary considerably across chip environments, and the methods above currently support GPUs only. Support for more backends and chips is planned.

🎈 Run a Task

FlagScale provides a unified runner for various tasks, including training, inference, and serving. Specify a configuration file to run a task with a single command; the runner loads the configuration and executes the task. The following examples show how to run training, inference, and serving tasks.

Train

Requires the Megatron-LM backend environment; see Setup for details.

Prepare the demo dataset:

We provide a small preprocessed sample (.bin and .idx files) of the Pile dataset.

mkdir -p /path/to/data && cd /path/to/data
wget https://model.ks3-cn-beijing.ksyuncs.com/nlpdata/pile_wikipedia_demo.idx
wget https://model.ks3-cn-beijing.ksyuncs.com/nlpdata/pile_wikipedia_demo.bin
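Before editing the config, it can help to confirm that both files arrived intact. The sketch below assumes the demo files use Megatron-LM's indexed-dataset format, whose .idx file begins with the magic bytes `MMIDIDX\x00\x00`, and that `data_path` is set to the shared prefix without an extension (the usual Megatron-LM convention). `check_dataset_prefix` is a hypothetical helper, not part of FlagScale.

```python
import os

# Magic bytes at the start of a Megatron-LM indexed-dataset .idx file.
MEGATRON_IDX_HEADER = b"MMIDIDX\x00\x00"


def check_dataset_prefix(prefix: str) -> str:
    """Verify that <prefix>.bin and <prefix>.idx exist and return the prefix.

    The returned prefix (no extension) is what would go into data_path,
    e.g. /path/to/data/pile_wikipedia_demo.
    """
    bin_path, idx_path = prefix + ".bin", prefix + ".idx"
    for path in (bin_path, idx_path):
        if not os.path.isfile(path):
            raise FileNotFoundError(f"missing dataset file: {path}")
    # A truncated or HTML-error download will fail this header check.
    with open(idx_path, "rb") as f:
        header = f.read(len(MEGATRON_IDX_HEADER))
    if header != MEGATRON_IDX_HEADER:
        raise ValueError(f"{idx_path} does not look like a Megatron index file")
    return prefix


if __name__ == "__main__":
    print(check_dataset_prefix("/path/to/data/pile_wikipedia_demo"))
```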

Edit config:

Modify the data path in ./examples/aquila/conf/train/7b.yaml

data:
    data_path: ${data_path:??}  # modify data path here
    split: 1
    tokenizer:
        legacy_tokenizer: true
        tokenizer_type: AquilaTokenizerFS
        vocab_file: ./examples/aquila/tokenizer/vocab.json
        merge_file: ./examples/aquila/tokenizer/merges.txt
        special_tokens_file: ./examples/aquila/tokenizer/special_tokens.txt
        vocab_size: 100008

Start the distributed training job:

python run.py --config-path ./examples/aquila/conf --config-name train action=run

Stop the distributed training job:

python run.py --config-path ./examples/aquila/conf --config-name train action=stop
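The trailing `action=run` / `action=stop` arguments are Hydra-style command-line overrides, which use `key=value` with dotted paths for nested keys. The following minimal sketch shows how such overrides map onto a nested config. It is a simplified stand-in, not Hydra's actual parser: it handles only plain assignments (no `+key` or `~key`) and parses only ints, booleans, and `null`. Key names other than `action` are illustrative.

```python
from typing import Any


def parse_override(override: str) -> tuple[list[str], Any]:
    """Split a Hydra-style 'a.b.c=value' override into a key path and a value."""
    key, sep, raw = override.partition("=")
    if not sep or not key:
        raise ValueError(f"expected key=value, got {override!r}")
    # Minimal value parsing: bools, null, ints; everything else stays a string.
    if raw.lower() in ("true", "false"):
        value: Any = raw.lower() == "true"
    elif raw.lower() == "null":
        value = None
    else:
        try:
            value = int(raw)
        except ValueError:
            value = raw
    return key.split("."), value


def apply_overrides(cfg: dict, overrides: list[str]) -> dict:
    """Apply overrides in order, creating intermediate sections as needed."""
    for override in overrides:
        path, value = parse_override(override)
        node = cfg
        for part in path[:-1]:
            node = node.setdefault(part, {})
        node[path[-1]] = value
    return cfg


if __name__ == "__main__":
    cfg = {"action": "run", "experiment": {"exp_dir": "./outputs"}}
    apply_overrides(cfg, ["action=stop", "experiment.exp_dir=./outputs_7b"])
    print(cfg)  # {'action': 'stop', 'experiment': {'exp_dir': './outputs_7b'}}
```

Overrides are applied left to right, so a later override of the same key wins.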

Inference

Requires the vLLM backend environment; see Setup for details.

Prepare the model:

modelscope download --model BAAI/Aquila-7B --local_dir /workspace/models/BAAI/Aquila-7B

Edit config:

Set the model and tokenizer paths in ./examples/aquila/conf/inference/7b.yaml:

llm:
    model: /workspace/models/BAAI/Aquila-7B         # modify path here
    tokenizer: /workspace/models/BAAI/Aquila-7B     # modify path here
    trust_remote_code: true
    tensor_parallel_size: 1
    pipeline_parallel_size: 1
    gpu_memory_utilization: 0.5
    seed: 1234
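Two fields in the `llm` block interact with your hardware: vLLM needs `tensor_parallel_size * pipeline_parallel_size` GPUs, and `gpu_memory_utilization` is the fraction of each GPU's memory vLLM may use (vLLM's default is 0.9). The sketch below checks both before launch. `check_llm_config` is a hypothetical helper; the GPU count and per-GPU memory are inputs you supply, not values read from the device.

```python
from typing import Any


def check_llm_config(llm: dict[str, Any], num_gpus: int, gpu_mem_gib: float) -> float:
    """Sanity-check the llm block and return the per-GPU memory budget in GiB."""
    # vLLM runs one worker per GPU: tensor-parallel x pipeline-parallel.
    world_size = llm.get("tensor_parallel_size", 1) * llm.get("pipeline_parallel_size", 1)
    if world_size > num_gpus:
        raise ValueError(f"config needs {world_size} GPUs but only {num_gpus} available")
    util = llm.get("gpu_memory_utilization", 0.9)
    if not 0.0 < util <= 1.0:
        raise ValueError("gpu_memory_utilization must be in (0, 1]")
    return util * gpu_mem_gib


if __name__ == "__main__":
    # Values from the example config above, on one hypothetical 80 GiB GPU.
    sample = {"tensor_parallel_size": 1, "pipeline_parallel_size": 1,
              "gpu_memory_utilization": 0.5}
    print(check_llm_config(sample, num_gpus=1, gpu_mem_gib=80.0))  # 40.0
```

With `gpu_memory_utilization: 0.5`, roughly half of the GPU's memory remains available to other processes.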

Start inference:

python run.py --config-path ./examples/aquila/conf --config-name inference action=run

Serve

Set up the environment:

PYTHONPATH=./:$PYTHONPATH pip install . --config-settings=domain=robotics --config-settings=device=gpu --verbose --no-build-isolation

Download the tokenizer:

mkdir -p /models/physical-intelligence/
cd /models/physical-intelligence/
git lfs install
git clone https://huggingface.co/physical-intelligence/fast

Edit config:

./examples/robobrain_x0/conf/serve/robobrain_x0.yaml

Change three fields:
- engine_args.model_sub_task -> /models/BAAI/RoboBrain-X0-Preview
- engine_args.port -> an available port in your environment, for example 5001
- engine_args.tokenizer_path -> /models/physical-intelligence/fast
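To pick a value for `engine_args.port`, you can check whether a port is currently free. This minimal sketch uses only the standard `socket` module; `is_port_free` is a hypothetical helper. The check is advisory: another process can still take the port between this check and server startup.

```python
import socket


def is_port_free(port: int, host: str = "0.0.0.0") -> bool:
    """Return True if a TCP socket can currently bind to host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        try:
            sock.bind((host, port))
        except OSError:
            # Typically EADDRINUSE (or a permission error for privileged ports).
            return False
    return True


if __name__ == "__main__":
    print(5001, "free" if is_port_free(5001) else "in use")
```

Binding to `0.0.0.0` by default makes the check conservative, since it also fails if another process holds the port on any single interface.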

Start the server:

python run.py --config-path ./examples/robobrain_x0/conf --config-name serve action=run

Stop the server:

python run.py --config-path ./examples/robobrain_x0/conf --config-name serve action=stop

🧱 DeepSeek-R1 Serving

FlagScale supports serving DeepSeek-R1 and provides the flagscale serve command for one-click deployment. After configuring two YAML files, you can serve the model with a single flagscale serve command.

Configure the YAML files:

FlagScale/
├── examples/
│   └── deepseek_r1/
│       └── conf/
│           ├── serve.yaml
│           ├── hostfile.txt # Set hostfile (optional)
│           └── serve/
│               └── 671b.yaml # Set model parameters and server port

Note: When a task spans multiple nodes, hostfile.txt is required, and its path must be set in serve.yaml.
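The exact hostfile syntax FlagScale expects is not shown here. The following sketch assumes a DeepSpeed-style format, with one node per line written as `<host> slots=<N>` plus optional `key=value` fields such as `type=gpu`, and `#` comments. `parse_hostfile` and the addresses are hypothetical; check the example hostfiles in the repository for the authoritative format.

```python
def parse_hostfile(text: str) -> list[tuple[str, int]]:
    """Parse '<host> slots=<N> [key=value ...]' lines into (host, slots) pairs.

    Blank lines and '#' comments are skipped. The format is an assumption.
    """
    hosts: list[tuple[str, int]] = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        line = line.split("#", 1)[0].strip()
        if not line:
            continue
        host, *fields = line.split()
        options = dict(f.split("=", 1) for f in fields if "=" in f)
        if "slots" not in options:
            raise ValueError(f"line {lineno}: missing slots=N")
        hosts.append((host, int(options["slots"])))
    return hosts


if __name__ == "__main__":
    example = """
    # two nodes, 8 GPUs each (hypothetical addresses)
    192.168.1.10 slots=8 type=gpu
    192.168.1.11 slots=8 type=gpu
    """
    nodes = parse_hostfile(example)
    print(len(nodes), "nodes,", sum(s for _, s in nodes), "slots")  # 2 nodes, 16 slots
```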

Install the FlagScale CLI:

cd FlagScale
PYTHONPATH=./:$PYTHONPATH pip install . --verbose --no-build-isolation

One-click serve:
```
flagscale serve deepseek_r1
```

Custom service parameters:

flagscale serve <MODEL_NAME> <MODEL_CONFIG_YAML>

The model config YAML holds the deployment-specific parameters, such as the model settings and the server port.

🎨 Contributing

Before submitting a PR, use the patch tool to record your modifications to the specified backend under third_party:

cd FlagScale
python tools/patch/patch.py --backend Megatron-LM
python tools/patch/patch.py --backend vllm

📄 License

This project is licensed under the Apache License (Version 2.0). This project also contains other third-party components under other open-source licenses. See the LICENSE file for more information.
