
Conversation

@wusar wusar commented Aug 18, 2025

Description

This PR adds support for OpenAI's GPT-OSS (Mixture-of-Experts) model on Ascend NPU through the vLLM Ascend framework, extending vLLM Ascend's model registry to cover the GPT-OSS architecture.

What this PR does

  • Adds GPT-OSS model implementation to vLLM Ascend model registry
  • Implements the MoE architecture (128 experts, top-4 routing) using AscendFusedMoE
  • Adds a sliding-window attention mechanism (128 tokens on even layers, global attention on odd layers); see the sketch after this list
  • Integrates YARN RoPE scaling compatible with the GPT-OSS configuration
  • Provides model conversion tools from HuggingFace format to vLLM-compatible format
  • Includes usage examples and documentation
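
A rough sketch of the alternating attention pattern and routing parameters (the constants and helper below are illustrative assumptions for this description, not the exact fields defined by GPTOSSConfig in this PR):

from typing import Optional

SLIDING_WINDOW = 128   # window size applied on alternating layers
NUM_EXPERTS = 128      # MoE expert count
TOP_K = 4              # experts routed per token

def sliding_window_for_layer(layer_idx: int) -> Optional[int]:
    """Even layers attend within a 128-token window; odd layers attend globally."""
    return SLIDING_WINDOW if layer_idx % 2 == 0 else None

assert sliding_window_for_layer(0) == SLIDING_WINDOW
assert sliding_window_for_layer(1) is None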

Technical Implementation

Core Model Components

  • GPTOSSConfig: Model configuration class with MoE and attention parameters
  • GPTOSSAttention: Sliding window attention implementation with RoPE support
  • GPTOSSMoELayer: MoE layer using AscendFusedMoE for expert routing
  • GPTOSSDecoderLayer: Complete transformer decoder layer (composition sketched after this list)
  • GPTOSSModel: Main model class with embedding and layer stack
  • GPTOSSForCausalLM: Causal language modeling wrapper
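
For orientation, a minimal sketch of how these pieces compose inside the decoder layer (submodules are stubbed with nn.Identity; this is an illustration, not the PR's actual code):

import torch
from torch import nn

class DecoderLayerSketch(nn.Module):
    """Illustrative composition of a GPT-OSS decoder layer.

    The real submodules (GPTOSSAttention, GPTOSSMoELayer, vLLM's RMSNorm)
    are stubbed with nn.Identity so only the structure is shown.
    """

    def __init__(self) -> None:
        super().__init__()
        self.input_layernorm = nn.Identity()           # stand-in for RMSNorm
        self.self_attn = nn.Identity()                 # stand-in for GPTOSSAttention
        self.post_attention_layernorm = nn.Identity()  # stand-in for RMSNorm
        self.mlp = nn.Identity()                       # stand-in for GPTOSSMoELayer

    def forward(self, hidden_states: torch.Tensor) -> torch.Tensor:
        # Pre-norm residual block around attention.
        hidden_states = hidden_states + self.self_attn(self.input_layernorm(hidden_states))
        # Pre-norm residual block around the MoE MLP.
        return hidden_states + self.mlp(self.post_attention_layernorm(hidden_states))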

Files Added/Modified

Core Implementation

  • vllm_ascend/models/gpt_oss.py - GPT-OSS model implementation
  • vllm_ascend/models/__init__.py - Model registry updates

Supporting Tools and Examples

  • tools/convert_gpt_oss.py - Model conversion utility
  • examples/gpt_oss_example.py - Usage examples
  • scripts/gpt_oss_quickstart.sh - Quick start script

Testing & Documentation

  • tests/ut/test_gpt_oss_model.py - Unit tests
  • docs/source/models/gpt_oss.md - Documentation
  • GPT_OSS_MIGRATION_README.md - Migration guide

Usage Example

from vllm import LLM, SamplingParams

# Initialize GPT-OSS model on Ascend NPU
llm = LLM(
    model="./gpt-oss-20b-converted",
    device="ascend", 
    tensor_parallel_size=1,
    dtype="bfloat16",
    trust_remote_code=True
)

# Generate text and print the completions
prompts = ["Hello, how are you?"]
sampling_params = SamplingParams(temperature=0.8, max_tokens=100)
outputs = llm.generate(prompts, sampling_params)
for output in outputs:
    print(output.outputs[0].text)

- vLLM version: v0.10.1.1
- vLLM main: https://github.com/vllm-project/vllm/commit/b00e69f8ca55f4a82847d39466f57ceb748324c1

@wusar wusar marked this pull request as draft August 18, 2025 08:38

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request introduces support for the GPT-OSS model on Ascend NPU, a significant enhancement for the vLLM Ascend framework. The implementation is comprehensive, covering the model configuration, attention mechanism with sliding window and YARN RoPE scaling, and a Mixture-of-Experts layer using AscendFusedMoE. The code is well-structured and aligns with vLLM's architectural patterns. My review has identified a few areas with unused parameters and attributes within the new model implementation. Addressing these will improve code clarity, reduce potential for bugs, and enhance maintainability.

Comment on lines 194 to 197
        # Sink attention weights for streaming attention
        self.sinks = nn.Parameter(
            torch.zeros(self.num_heads, dtype=torch.float32)
        )

high

The self.sinks parameter is initialized, presumably for streaming attention, but it is not used within the forward method. This results in dead code that should be removed to improve clarity and avoid unnecessary memory allocation.
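
For context, a minimal sketch of how per-head sink logits are typically folded into attention, following the public GPT-OSS reference implementation (this is background for the comment, not code from this PR):

import torch

def softmax_with_sinks(scores: torch.Tensor, sinks: torch.Tensor) -> torch.Tensor:
    # scores: [num_heads, q_len, kv_len]; sinks: [num_heads]
    sink_logits = sinks.view(-1, 1, 1).expand(-1, scores.shape[-2], 1)
    combined = torch.cat([scores, sink_logits], dim=-1)  # one extra logit per head
    probs = torch.softmax(combined, dim=-1)
    return probs[..., :-1]  # drop the sink column; rows then sum to less than 1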

Comment on lines 253 to 254
        # Custom swiglu activation with limit
        self.swiglu_limit = config.swiglu_limit

high

The self.swiglu_limit attribute is set from the configuration but is never used in the GPTOSSMoELayer. If this parameter is not required by the AscendFusedMoE layer or elsewhere, it should be removed to eliminate dead code.
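
For reference, the public GPT-OSS implementation applies a limit-clamped SwiGLU roughly as sketched below; the alpha and limit values follow that reference and are assumptions here, and this PR may instead rely on AscendFusedMoE to apply the activation:

import torch

def clamped_swiglu(gate: torch.Tensor, up: torch.Tensor,
                   alpha: float = 1.702, limit: float = 7.0) -> torch.Tensor:
    gate = gate.clamp(max=limit)          # cap the gating pre-activation
    up = up.clamp(min=-limit, max=limit)  # symmetric clamp on the linear branch
    return (gate * torch.sigmoid(alpha * gate)) * (up + 1)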

Comment on lines 256 to 260
    def forward(
        self,
        hidden_states: torch.Tensor,
        attn_metadata: Optional[AttentionMetadata] = None,
    ) -> torch.Tensor:

high

The attn_metadata parameter is included in the forward method's signature but is not used within the method's body. To improve code clarity and maintainability, this unused parameter should be removed.

    def forward(
        self,
        hidden_states: torch.Tensor,
    ) -> torch.Tensor:

        # MLP
        hidden_states, residual = self.post_attention_layernorm(
            hidden_states, residual)
        hidden_states = self.mlp(hidden_states, attn_metadata)

high

In light of removing the unused attn_metadata parameter from GPTOSSMoELayer.forward, this call site should be updated to no longer pass the attn_metadata argument.

        hidden_states = self.mlp(hidden_states)

👋 Hi! Thank you for contributing to the vLLM Ascend project. The following points will speed up your PR merge:

  • A PR should do only one thing; smaller PRs enable faster reviews.
  • Every PR should include unit tests and end-to-end tests to ensure it works and is not broken by future PRs.
  • Write the commit message and fill in the PR description to help reviewers and future developers understand the change.

If CI fails, you can run linting and testing checks locally according to Contributing and Testing.

github-actions bot commented Sep 8, 2025

This pull request has conflicts, please resolve those before we can evaluate the pull request.
