Tiny Think Instruction

Purpose

This repository is designed for full fine-tuning of Tiny Models on a single local Blackwell GPU using:

PyTorch & Hugging Face Transformers
TRL (SFT, DPO)
vLLM (CUDA 12.8 backend)
lm-evaluation-harness (evaluation)

Hardware (CRITICAL)

All experiments run on one local machine only:

GPU: NVIDIA GeForce RTX 5060 Ti (16 GB VRAM, Blackwell)
CPU: AMD Ryzen 7 9700X
RAM: 32 GB
GPUs: 1 (no distributed or multi-GPU training)

Assume:

No DeepSpeed
No FSDP
No multi-node or multi-GPU setups
Everything must fit in 16 GB VRAM
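The assumptions above can be checked before a run starts. The following is a minimal sketch, not part of the repository: the function names and the 16 GiB ceiling are illustrative assumptions, and PyTorch is imported lazily so the pure check can be used without a GPU.

```python
GIB = 1024 ** 3


def hardware_problems(device_count: int, total_vram_bytes: int,
                      max_vram_gib: float = 16.0) -> list[str]:
    """Return violations of the single-GPU, 16 GB assumptions (hypothetical helper)."""
    problems = []
    if device_count != 1:
        problems.append(f"expected exactly 1 GPU, found {device_count}")
    if total_vram_bytes > max_vram_gib * GIB:
        # A larger card can hide configurations that would not fit the target GPU.
        problems.append(
            f"GPU reports {total_vram_bytes / GIB:.1f} GiB, above the "
            f"{max_vram_gib:.0f} GiB target"
        )
    return problems


def probe_local_gpu() -> tuple[int, int]:
    """Query PyTorch for the GPU count and the VRAM of device 0."""
    import torch  # imported lazily; only needed on the training machine

    if not torch.cuda.is_available():
        return 0, 0
    count = torch.cuda.device_count()
    total = torch.cuda.get_device_properties(0).total_memory
    return count, total
```

On the training machine, `hardware_problems(*probe_local_gpu())` should return an empty list; anything else suggests the run is outside the scope described here.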

Python Environment (STRICT)

This repository does NOT support arbitrary installs.

You must use:

Python 3.12
uv
A local .venv
A strict installation order

Environment Rules

If .venv exists
- Activate it
- Do NOT recreate it
- Do NOT reinstall packages unless debugging
If .venv does NOT exist
- Create it
- Install dependencies step by step in the exact order below

Environment Setup (AUTHORITATIVE)

Step 1: Create or Activate venv

if [ -d ".venv" ]; then
  source .venv/bin/activate
else
  uv venv .venv --python=3.12 --seed
  source .venv/bin/activate
fi

Step 2: Install Dependencies (ORDER MATTERS)

uv pip install "lm-eval[api]"
uv pip install langdetect immutabledict
uv pip install sympy math_verify antlr4-python3-runtime==4.11
uv pip install -U vllm --torch-backend=cu128
uv pip install trl
uv pip install liger-kernel
uv pip install kernels
uv pip install wandb

Do not:

Collapse this into requirements.txt
Change the order
Downgrade Python
Mix pip/conda installs

If something breaks, assume install order or CUDA backend mismatch first.
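Before suspecting install order, it can help to confirm the basics: Python 3.12, an active virtual environment, and the expected packages. The sketch below is an illustration only; the helper names are hypothetical, and the distribution names in `REQUIRED` are assumed from the install commands above.

```python
import sys
from importlib import metadata

# Distribution names assumed from the install commands; adjust if they differ.
REQUIRED = ("lm_eval", "vllm", "trl", "liger_kernel", "kernels", "wandb")


def in_virtualenv() -> bool:
    """True when the interpreter runs inside a venv (prefix differs from base)."""
    return sys.prefix != sys.base_prefix


def missing_packages(names=REQUIRED, lookup=metadata.version) -> list[str]:
    """Return the names for which no installed distribution is found."""
    missing = []
    for name in names:
        try:
            lookup(name)
        except metadata.PackageNotFoundError:
            missing.append(name)
    return missing


def environment_problems(version_info=sys.version_info) -> list[str]:
    """Collect deviations from the environment rules as readable messages."""
    problems = []
    if tuple(version_info[:2]) != (3, 12):
        problems.append(f"Python 3.12 required, found {version_info[0]}.{version_info[1]}")
    if not in_virtualenv():
        problems.append("no virtual environment is active (expected .venv)")
    problems.extend(f"missing package: {name}" for name in missing_packages())
    return problems
```

Calling `environment_problems()` inside the activated `.venv` should return an empty list. This check only reports; it deliberately does not install or reinstall anything.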

Supported Training Approaches

This repository supports:

Full Fine-Tuning

Full weight updates (no adapters)

Full fine-tuning is:

Allowed
Experimental
Heavily constrained by VRAM

Because the base model is so small, full fine-tuning is the preferred default for quality.
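To make the VRAM constraint concrete, here is a back-of-the-envelope estimate of the persistent training state. It is a sketch under stated assumptions: fp32 weights and gradients, an Adam-style optimizer with two fp32 moments per parameter, and a parameter count of 140M taken from the model name rather than measured. Activations, CUDA context, and framework overhead are not included.

```python
GIB = 1024 ** 3


def training_state_bytes(n_params: int, weight_bytes: int = 4,
                         grad_bytes: int = 4, optim_bytes: int = 8) -> int:
    """Bytes held by weights, gradients, and optimizer state (activations excluded)."""
    return n_params * (weight_bytes + grad_bytes + optim_bytes)


# Assumed parameter count, read off the model name; the true count may differ.
N_PARAMS = 140_000_000

state_gib = training_state_bytes(N_PARAMS) / GIB
print(f"weights + grads + optimizer state: {state_gib:.2f} GiB")
```

Under these assumptions the fixed state is a little over 2 GiB, so most of the 16 GB budget goes to activations, which scale with batch size and sequence length.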

Supported Post-Training Algorithms

Supervised Fine-Tuning (SFT)
Direct Preference Optimization (DPO)

All training must:

Run on a single GPU
Be memory-aware
Avoid distributed assumptions
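As an illustration of memory-aware single-GPU settings, the sketch below builds keyword arguments for TRL's `SFTConfig`. The batch size, accumulation steps, and output directory are hypothetical placeholders, not values from this repository; TRL is imported lazily so the settings can be inspected without it.

```python
MODEL_ID = "facebook/MobileLLM-R1-140M-base"


def single_gpu_sft_kwargs(output_dir: str, micro_batch: int = 4,
                          grad_accum: int = 8) -> dict:
    """Illustrative SFTConfig arguments for one GPU (values are assumptions)."""
    return {
        "output_dir": output_dir,
        "per_device_train_batch_size": micro_batch,  # keep small to limit activations
        "gradient_accumulation_steps": grad_accum,   # recover a larger effective batch
        "gradient_checkpointing": True,              # trade compute for memory
        "bf16": True,
        "report_to": "wandb",
    }


def effective_batch_size(kwargs: dict) -> int:
    """On a single GPU: micro-batch size times accumulation steps."""
    return kwargs["per_device_train_batch_size"] * kwargs["gradient_accumulation_steps"]


def build_trainer(train_dataset, output_dir: str = "runs/sft-sketch"):
    """Construct an SFTTrainer for full fine-tuning (no adapters passed)."""
    from trl import SFTConfig, SFTTrainer  # lazy import; requires the venv above

    config = SFTConfig(**single_gpu_sft_kwargs(output_dir))
    return SFTTrainer(model=MODEL_ID, args=config, train_dataset=train_dataset)
```

No launcher, DeepSpeed, or FSDP settings appear anywhere in this sketch, which matches the single-GPU rule. If a run exceeds memory, lowering `micro_batch` and raising `grad_accum` keeps the effective batch size unchanged.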

Documentation (AUTHORITATIVE)

Use the documentation for TRL, lm-evaluation-harness, and vLLM (and links from them) as the single source of truth for implementation decisions, troubleshooting, and documentation.

Evaluation + Contamination Rules

Use eval/run_eval_vllm_multi.sh as the main entrypoint:
- Default MODE=lm_eval uses eval/run_eval_vllm.sh (lm-eval, offline vLLM)
- MODE=math_eval uses eval/math_eval_vllm.py (GSM8K, MATH500; boxed-answer parsing)
Never train on benchmark test data (questions or answers) used in eval tasks
Do not edit evaluation scripts or task lists unless explicitly asked
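One way to enforce the contamination rule is to drop training samples that share a long word n-gram with any benchmark test item. This is a minimal sketch of that idea, not the repository's method: the function names, the normalization, and the 8-gram window are all assumptions, and the example texts are invented.

```python
import re


def normalize(text: str) -> str:
    """Lowercase and collapse every non-alphanumeric run to a single space."""
    return re.sub(r"[^a-z0-9]+", " ", text.lower()).strip()


def ngrams(text: str, n: int = 8) -> set[tuple[str, ...]]:
    """Word n-grams of the normalized text; short texts yield one whole-text tuple."""
    tokens = normalize(text).split()
    if not tokens:
        return set()
    if len(tokens) < n:
        return {tuple(tokens)}
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def build_index(benchmark_texts, n: int = 8) -> set[tuple[str, ...]]:
    """Union of n-grams over all benchmark questions and answers."""
    index: set[tuple[str, ...]] = set()
    for text in benchmark_texts:
        index |= ngrams(text, n)
    return index


def decontaminate(train_texts, benchmark_texts, n: int = 8) -> list[str]:
    """Keep only training samples that share no n-gram with the benchmark."""
    index = build_index(benchmark_texts, n)
    return [t for t in train_texts if ngrams(t, n).isdisjoint(index)]


benchmark = ["A farmer has 12 hens and each hen lays 3 eggs per day how many eggs in a week"]
train = [
    "Problem: a farmer has 12 hens, and each hen lays 3 eggs per day. How many eggs?",
    "What is two plus two",
]
clean = decontaminate(train, benchmark)
```

Here the first training sample is removed because it overlaps the benchmark item after normalization, while the second is kept. N-gram overlap catches verbatim and lightly reformatted copies but not paraphrases, so it is a lower bound on contamination.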

Model Selection

All experiments must use facebook/MobileLLM-R1-140M-base as the base model.

Agent Rules

Assume staff-level ML engineering context
Prefer correctness over convenience
Never introduce distributed complexity
Never reorder installs
If it doesn’t fit on RTX 5060 Ti (16 GB) → it’s out of scope
