from transformers import AutoModelForCausalLM, AutoTokenizer
checkpoint = "amd/Instella-3B-Math"
tokenizer = AutoTokenizer.from_pretrained(checkpoint, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(checkpoint, device_map="auto", trust_remote_code=True)
prompt = [{"role": "user", "content": "Natalia sold clips to 48 of her friends in April, and then she sold half as many clips in May. How many clips did Natalia sell altogether in April and May? Let's think step by step and output the final answer within \\boxed{}."}]
inputs = tokenizer.apply_chat_template(
    prompt,
    add_generation_prompt=True,
    return_tensors='pt'
)
tokens = model.generate(
    inputs.to(model.device),
    max_new_tokens=1024,
    temperature=0.8,
    do_sample=True
)
print(tokenizer.decode(tokens[0], skip_special_tokens=False))
We conducted two-stage math SFT to enhance the math capabilities of the base Instella-3B-Instruct model.
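As a side note on the inference example above: the prompt asks the model to put its final answer inside \boxed{}, so the answer can be pulled out of the decoded text programmatically. The helper below is a minimal sketch and is not part of the Instella codebase; the name extract_boxed is hypothetical. It returns the contents of the last \boxed{...} and tolerates nested braces such as \frac{1}{2}. For the sample prompt, the expected answer is 48 + 48/2 = 72, although sampled generations can differ.

```python
def extract_boxed(text):
    # Return the contents of the last \boxed{...} in `text`, or None if there is none.
    marker = "\\boxed{"
    start = text.rfind(marker)
    if start == -1:
        return None
    depth = 1  # we are inside the opening brace of \boxed{
    chars = []
    for ch in text[start + len(marker):]:
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth == 0:
                return "".join(chars)
        chars.append(ch)
    return None  # braces never balanced, e.g. the generation was truncated


# Usage with the example above (assumes `inputs` is a tensor, as in that example).
# Decode only the newly generated tokens: the prompt itself contains "\boxed{}",
# which would otherwise be picked up when the model gives no boxed answer.
# new_tokens = tokens[0][inputs.shape[-1]:]
# answer = extract_boxed(tokenizer.decode(new_tokens, skip_special_tokens=True))
```

Taking the last occurrence is a design choice: reasoning traces sometimes box intermediate values before the final answer.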
We use the Instella codebase for stage 1 math SFT, in which the model is trained on the OpenMathInstruct-2 dataset with a context length of 4096. Please follow the installation guide to set up the environment.
Run the following commands to prepare the stage 1 math SFT data:
git clone https://github.com/AMD-AIG-AIMA/Instella.git
cd Instella
bash scripts/prepare_math_sft_data.sh
Launch the SFT job with the SFT config file:
torchrun --nproc_per_node=8 scripts/train.py configs/instella-3b-sft-math-stage1.yaml
Note: You need to convert the Hugging Face Instella-3B-Instruct checkpoint to PyTorch format and then update load_path in the config file to point to the converted checkpoint. Please see the instructions for checkpoint conversion here.
In the stage 2 math SFT, we continue to train the model on the English subset of the AM-DeepSeek-R1-Distilled-1.4M dataset with 1.3M samples, and increase the context length to 32K. We provide a script to prepare the dataset for training. The training is based on open-instruct. To run the stage 2 math SFT training:
cd sft
bash scripts/finetune_with_accelerate_config_stage3.sh configs/train_configs/instella/instella-3b-sft-math-stage2.yaml
Note: Please update model_name_or_path in the config to your stage 1 math SFT model, which must first be converted to the Hugging Face format (see the instructions here). In addition, please update dataset_name to the Hugging Face repo of your processed dataset.
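The config updates described in the notes above (model_name_or_path and dataset_name here, load_path for stage 1) can be made by hand or scripted. The snippet below is a rough sketch using only the standard library. It assumes each key is a top-level `key: value` line in the YAML file, which has not been checked against the actual configs; the helper name set_top_level_key and all paths and repo names in the usage comments are hypothetical placeholders.

```python
import re
from pathlib import Path


def set_top_level_key(config_path, key, value):
    # Overwrite a top-level `key: value` line in a YAML file.
    # Returns True if an existing line was replaced, False if the key was appended.
    # Assumption: the key is an unindented scalar entry; nested keys are not handled.
    path = Path(config_path)
    lines = path.read_text().splitlines()
    pattern = re.compile(rf"^{re.escape(key)}\s*:")
    replaced = False
    for i, line in enumerate(lines):
        if pattern.match(line):
            lines[i] = f"{key}: {value}"
            replaced = True
            break
    if not replaced:
        lines.append(f"{key}: {value}")
    path.write_text("\n".join(lines) + "\n")
    return replaced


# Hypothetical usage; replace the placeholder values with your own:
# cfg = "configs/train_configs/instella/instella-3b-sft-math-stage2.yaml"
# set_top_level_key(cfg, "model_name_or_path", "/path/to/stage1-hf-checkpoint")
# set_top_level_key(cfg, "dataset_name", "your-org/your-processed-dataset")
```

A plain text rewrite keeps comments and key order in the file intact, which a YAML load/dump round trip would not necessarily do.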
We conduct GRPO after SFT using VERL.
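For readers unfamiliar with GRPO (Group Relative Policy Optimization): its core idea is to sample a group of responses per prompt and score each response relative to the rest of its group, instead of training a separate value model. The snippet below is a minimal sketch of that normalization step only, not VERL's implementation. The function name and the eps value are illustrative assumptions, and implementations differ in details such as whether the population or sample standard deviation is used.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-6):
    # `rewards` holds one scalar reward per sampled response to the SAME prompt.
    # Each advantage is the reward's offset from the group mean, scaled by the
    # group's standard deviation (population std here; eps avoids division by zero).
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Example: four sampled answers to one math problem, reward 1.0 if correct else 0.0.
advantages = group_relative_advantages([1.0, 0.0, 0.0, 1.0])
# Correct answers get a positive advantage, incorrect ones a negative advantage.
```

When every response in a group receives the same reward, all advantages are zero, so that prompt contributes no learning signal for the step.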
Run the following command to build the Docker image rocm6.3.4:vllm-0.8.5-numa-patch-ubuntu-22.04-cascade-fix:
docker build -f docker/dockerfile -t rocm6.3.4:vllm-0.8.5-numa-patch-ubuntu-22.04-cascade-fix .
Run the following commands to prepare the RL data:
cd verl
python examples/data_preprocess/big_math.py
python examples/data_preprocess/deep_math.py
python examples/data_preprocess/deepscaler.py
Launch the GRPO training:
cd verl
sbatch ./scripts/run_grpo_multi_nodes.sh
Our evaluations are based on the DeepScaleR codebase. All our evaluations are done on a single node with 8 AMD Instinct™ MI300X GPUs.
Run the following commands to set up the evaluation environment:
# For AMD GPUs:
cd evals
# Start the docker container:
bash start_docker.sh
docker exec -it instella-math-eval bash
# Install dependencies
bash install_setup.sh
All the processed test datasets are available in evals/deepscaler/data_processed. Run the following command to reproduce our results:
bash run_eval_math.sh
The RL training codebase is built from VERL.
The evaluation codebase is built from DeepScaleR.
- The Instella-3B-Math models are licensed for academic and research purposes under a ResearchRAIL license.
- Refer to the LICENSE and NOTICE files for more information.
Feel free to cite our Instella paper and give us a star⭐ if you find our work helpful :)
@article{liu2025instella,
  title={Instella: Fully Open Language Models with Stellar Performance},
  author={Liu, Jiang and Wu, Jialian and Yu, Xiaodong and Su, Yusheng and Mishra, Prakamya and Ramesh, Gowtham and Ranjan, Sudhanshu and Manem, Chaitanya and Sun, Ximeng and Wang, Ze and Brahma, Pratik Prabhanjan and Liu, Zicheng and Barsoum, Emad},
  journal={arXiv preprint arXiv:2511.10628},
  year={2025}
}