Instella-Math✨: Fully Open Language Model with Reasoning Capability

Getting Started

Example Usage

from transformers import AutoModelForCausalLM, AutoTokenizer
checkpoint = "amd/Instella-3B-Math"

# trust_remote_code=True lets transformers load the model code shipped with the checkpoint
tokenizer = AutoTokenizer.from_pretrained(checkpoint, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(checkpoint, device_map="auto", trust_remote_code=True)

# The prompt asks the model to reason step by step and to put the final answer in \boxed{}
prompt = [{"role": "user", "content": "Natalia sold clips to 48 of her friends in April, and then she sold half as many clips in May. How many clips did Natalia sell altogether in April and May? Let's think step by step and output the final answer within \\boxed{}."}]
# Apply the chat template and tokenize the conversation in one step
inputs = tokenizer.apply_chat_template(
    prompt,
    add_generation_prompt=True,
    return_tensors='pt'
)

# Sample a response of up to 1024 new tokens
tokens = model.generate(
    inputs.to(model.device),
    max_new_tokens=1024,
    temperature=0.8,
    do_sample=True
)

# Decode the prompt and the response, keeping special tokens visible
print(tokenizer.decode(tokens[0], skip_special_tokens=False))
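
Since the prompt asks for the final answer inside \boxed{}, it can be handy to pull that answer out of the decoded text. The helper below is a minimal sketch and is not part of this repository: the function name extract_boxed is hypothetical, and it assumes the answer of interest is the last \boxed{...} in the text, with balanced braces.

```python
def extract_boxed(text):
    """Return the contents of the last \\boxed{...} in text, or None if absent."""
    marker = "\\boxed{"
    start = text.rfind(marker)
    if start == -1:
        return None
    depth = 1  # we are inside the opening brace of \boxed{
    chars = []
    for ch in text[start + len(marker):]:
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth == 0:
                # matching close brace found: everything collected is the answer
                return "".join(chars)
        chars.append(ch)
    return None  # braces never closed (e.g. generation was truncated)


# Hand-written example strings (not actual model output)
print(extract_boxed("48 + 24 = 72, so the answer is \\boxed{72}."))
print(extract_boxed("The result is \\boxed{\\frac{1}{2}}"))
```

Brace counting is used instead of a simple regular expression so that nested LaTeX such as \frac{1}{2} is returned whole.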

Supervised Fine-tuning (SFT)

We conduct math SFT in two stages to enhance the math capabilities of the base Instella-3B-Instruct model.

Stage 1

We use the Instella codebase for the stage 1 math SFT, where the model is trained on the OpenMathInstruct-2 dataset with a context length of 4096. Please follow the installation guide to set up the environment.

Run the following commands to prepare the stage 1 math SFT data:

git clone https://github.com/AMD-AIG-AIMA/Instella.git
cd Instella
bash scripts/prepare_math_sft_data.sh

Launch the SFT job with the SFT config file:

torchrun --nproc_per_node=8 scripts/train.py configs/instella-3b-sft-math-stage1.yaml

Note: You need to convert the Hugging Face Instella-3B-Instruct checkpoint to PyTorch format and then update load_path in the config file to point to the converted checkpoint. Please see the instructions for checkpoint conversion here.

Stage 2

In the stage 2 math SFT, we continue training the model on the English subset of the AM-DeepSeek-R1-Distilled-1.4M dataset (1.3M samples) and increase the context length to 32K. We provide a script to prepare the dataset for training. The training is based on open-instruct. To run the stage 2 math SFT training:

cd sft

bash scripts/finetune_with_accelerate_config_stage3.sh configs/train_configs/instella/instella-3b-sft-math-stage2.yaml

Note: Please update model_name_or_path in the config to point to your stage 1 math SFT model, which must first be converted to the Hugging Face format (see the instructions here). In addition, please update dataset_name to the Hugging Face repo of your processed dataset.

Reinforcement Learning (GRPO)

We conduct GRPO after SFT using VERL.
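
For readers unfamiliar with GRPO, its central idea is to score each sampled response relative to the other responses drawn for the same prompt, rather than against a learned value function. The snippet below is a minimal sketch of that group-relative advantage computation only. It is not VERL's implementation; the function name, the eps value, and the use of the population standard deviation are assumptions made for illustration.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-6):
    """Normalize the rewards of one prompt's sampled responses within the group.

    rewards: list of scalar rewards, one per response to the same prompt.
    eps: small constant (assumed value) that avoids division by zero
         when every response in the group gets the same reward.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; an assumption of this sketch
    return [(r - mu) / (sigma + eps) for r in rewards]


# Four responses to one prompt, rewarded 1.0 if the final answer was correct
advantages = group_relative_advantages([1.0, 0.0, 0.0, 1.0])
print(advantages)
```

Responses that beat the group mean receive a positive advantage and the rest a negative one. When all responses in a group earn the same reward, every advantage is zero and that group contributes no learning signal.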

Installation

Run the following command to build the Docker image rocm6.3.4:vllm-0.8.5-numa-patch-ubuntu-22.04-cascade-fix:

docker build -f docker/dockerfile -t rocm6.3.4:vllm-0.8.5-numa-patch-ubuntu-22.04-cascade-fix .

Data Preparation

Run the following commands to prepare the RL data:

cd verl
python examples/data_preprocess/big_math.py
python examples/data_preprocess/deep_math.py
python examples/data_preprocess/deepscaler.py

Training

cd verl
sbatch ./scripts/run_grpo_multi_nodes.sh

Evaluation

Our evaluations are based on the DeepScaleR codebase. All our evaluations are done on a single node with 8 AMD Instinct™ MI300X GPUs.

Run the following commands to set up the evaluation environment:

# For AMD GPUs:
cd evals
# Start the docker container:
bash start_docker.sh
docker exec -it instella-math-eval bash
# Install dependencies
bash install_setup.sh

All the processed test datasets are available in evals/deepscaler/data_processed. Run the following command to reproduce our results:

bash run_eval_math.sh

Acknowledgement

The RL training codebase is built from VERL.

The evaluation codebase is built from DeepScaleR.

License

The Instella-3B-Math models are licensed for academic and research purposes under a ResearchRAIL license.
Refer to the LICENSE and NOTICE files for more information.

Citations

Feel free to cite our Instella paper and give us a star⭐ if you find our work helpful :)

@article{liu2025instella,
  title={Instella: Fully Open Language Models with Stellar Performance},
  author={Liu, Jiang and Wu, Jialian and Yu, Xiaodong and Su, Yusheng and Mishra, Prakamya and Ramesh, Gowtham and Ranjan, Sudhanshu and Manem, Chaitanya and Sun, Ximeng and Wang, Ze and Brahma, Pratik Prabhanjan and Liu, Zicheng and Barsoum, Emad},
  journal={arXiv preprint arXiv:2511.10628},
  year={2025}
}
