packages/backend/src/assets/ai.json (2 changes: 1 addition & 1 deletion)
@@ -292,7 +292,7 @@
},
{
"id": "hf.openai.gpt-oss-20b",
"name": "openai/gtp-oss-20b (Unsloth quantization)",
"name": "openai/gpt-oss-20b (Unsloth quantization)",
"description": "\r\n# Welcome to the gpt-oss series, [OpenAI’s open-weight models](https://openai.com/open-models) designed for powerful reasoning, agentic tasks, and versatile developer use cases.\r\n\r\nWe’re releasing two flavors of the open models:\r\n- `gpt-oss-120b` — for production, general purpose, high reasoning use cases that fits into a single H100 GPU (117B parameters with 5.1B active parameters)\r\n- `gpt-oss-20b` — for lower latency, and local or specialized use cases (21B parameters with 3.6B active parameters)\r\n\r\nBoth models were trained on our [harmony response format](https://github.com/openai/harmony) and should only be used with the harmony format as it will not work correctly otherwise.\r\n\r\n> [!NOTE]\r\n> This model card is dedicated to the smaller `gpt-oss-20b` model. Check out [`gpt-oss-120b`](https://huggingface.co/openai/gpt-oss-120b) for the larger model.\r\n\r\n# Highlights\r\n\r\n* **Permissive Apache 2.0 license:** Build freely without copyleft restrictions or patent risk—ideal for experimentation, customization, and commercial deployment.\r\n* **Configurable reasoning effort:** Easily adjust the reasoning effort (low, medium, high) based on your specific use case and latency needs.\r\n* **Full chain-of-thought:** Gain complete access to the model’s reasoning process, facilitating easier debugging and increased trust in outputs. It’s not intended to be shown to end users.\r\n* **Fine-tunable:** Fully customize models to your specific use case through parameter fine-tuning.\r\n* **Agentic capabilities:** Use the models’ native capabilities for function calling, [web browsing](https://github.com/openai/gpt-oss/tree/main?tab=readme-ov-file#browser), [Python code execution](https://github.com/openai/gpt-oss/tree/main?tab=readme-ov-file#python), and Structured Outputs.\r\n* **Native MXFP4 quantization:** The models are trained with native MXFP4 precision for the MoE layer, making `gpt-oss-120b` run on a single H100 GPU and the `gpt-oss-20b` model run within 16GB of memory.\r\n\r\n---\r\n\r\n# Inference examples\r\n\r\n## Transformers\r\nYou can use `gpt-oss-120b` and `gpt-oss-20b` with Transformers. If you use the Transformers chat template, it will automatically apply the [harmony response format](https://github.com/openai/harmony). If you use `model.generate` directly, you need to apply the harmony format manually using the chat template or use our [openai-harmony](https://github.com/openai/harmony) package.\r\n\r\nTo get started, install the necessary dependencies:\r\n```\r\npip install -U transformers kernels torch \r\n```\r\n\r\n```py\r\nfrom transformers import pipeline\r\nimport torch\r\n\r\nmodel_id = \"openai/gpt-oss-20b\"\r\n\r\npipe = pipeline(\r\n \"text-generation\",\r\n model=model_id,\r\n torch_dtype=\"auto\",\r\n device_map=\"auto\",\r\n)\r\n\r\nmessages = [\r\n {\"role\": \"user\", \"content\": \"Explain quantum mechanics clearly and concisely.\"},\r\n]\r\n\r\noutputs = pipe(\r\n messages,\r\n max_new_tokens=256,\r\n)\r\nprint(outputs[0][\"generated_text\"][-1])\r\n```\r\n\r\n## vLLM\r\nvLLM recommends using [uv](https://docs.astral.sh/uv/) for Python dependency management. 
You can spin up an OpenAI-compatible webserver:\r\n```\r\nuv pip install --pre vllm==0.10.1+gptoss \\\r\n --extra-index-url https://wheels.vllm.ai/gpt-oss/ \\\r\n --extra-index-url https://download.pytorch.org/whl/nightly/cu128 \\\r\n --index-strategy unsafe-best-match\r\n\r\nvllm serve openai/gpt-oss-20b\r\n```\r\n\r\n## PyTorch / Triton\r\nSee [reference implementations](https://github.com/openai/gpt-oss?tab=readme-ov-file#reference-pytorch-implementation).\r\n\r\n## Ollama\r\n```bash\r\n# gpt-oss-20b\r\nollama pull gpt-oss:20b\r\nollama run gpt-oss:20b\r\n```\r\n\r\n## LM Studio\r\n```bash\r\n# gpt-oss-20b\r\nlms get openai/gpt-oss-20b\r\n```\r\n\r\n# Download the model\r\n```bash\r\n# gpt-oss-20b\r\nhuggingface-cli download openai/gpt-oss-20b --include \"original/*\" --local-dir gpt-oss-20b/\npip install gpt-oss\npython -m gpt_oss.chat model/\r\n```\r\n\r\n# Reasoning levels\r\n* **Low:** Fast responses for general dialogue.\r\n* **Medium:** Balanced speed and detail.\r\n* **High:** Deep and detailed analysis.\r\n\r\n# Tool use\r\n* Web browsing (built-in tools)\r\n* Function calling with schemas\r\n* Agentic operations\r\n\r\n# Fine-tuning\r\nThe smaller model `gpt-oss-20b` can be fine-tuned on consumer hardware, larger `gpt-oss-120b` can be fine-tuned on a single H100 node.",
"registry": "Hugging Face",
"license": "Apache-2.0",
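The fix itself is a one-character transposition in the entry's display name ("gtp" vs. "gpt"). Since each catalog entry pairs a machine-readable `id` with a human-readable `name`, this kind of slip can be caught mechanically. Below is a minimal sketch of such a check, assuming `ai.json` keeps its entries in a top-level `models` array and that each `id` ends with the model slug; both the schema details and the entry shape are assumptions inferred from this diff, not confirmed by it:

```ts
import { readFileSync } from "node:fs";

// Entry shape inferred from the fields visible in this diff; the real schema may differ.
interface CatalogModel {
  id: string;   // e.g. "hf.openai.gpt-oss-20b"
  name: string; // e.g. "openai/gpt-oss-20b (Unsloth quantization)"
}

const raw = readFileSync("packages/backend/src/assets/ai.json", "utf8");
// Assumption: models live in a top-level "models" array.
const models: CatalogModel[] = JSON.parse(raw).models ?? [];

let mismatches = 0;
for (const model of models) {
  // Treat everything after the registry/org prefix as the expected slug,
  // e.g. "hf.openai.gpt-oss-20b" -> "gpt-oss-20b".
  const slug = model.id.split(".").slice(2).join(".");
  if (slug && !model.name.includes(slug)) {
    console.warn(`name/id mismatch in ${model.id}: "${model.name}" does not mention "${slug}"`);
    mismatches++;
  }
}
process.exitCode = mismatches > 0 ? 1 : 0;
```

Against the pre-merge file this would flag `hf.openai.gpt-oss-20b`, whose name read "openai/gtp-oss-20b (Unsloth quantization)"; after this change the entry passes. Entries whose ids do not follow a registry.org.slug pattern would need a looser heuristic.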