
Failed to run Qwen3-8B; a full example is needed #97

@new-TonyWang

Description


Search before asking

  • I have searched the issues and found no similar issues.

Version

Megatron: main branch at commit "e8749f88691cac1eeefd11b6b68cb8a6557356d5", with some local changes to the parallel_state code
mbridge: 0.15.1
Megatron patch:

parallel_state_patch.patch
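When collecting environment details for a report like this, a small script can print the installed package versions in one pass. This is a generic sketch using the standard library's importlib.metadata; the distribution names it queries (for example "megatron-core" and "mbridge") are assumptions and may not match how these packages were installed here, and the Megatron commit id still has to be read from git.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Dict, Iterable


def report_versions(packages: Iterable[str]) -> Dict[str, str]:
    """Map each distribution name to its installed version, or 'not installed'."""
    result: Dict[str, str] = {}
    for pkg in packages:
        try:
            result[pkg] = version(pkg)
        except PackageNotFoundError:
            result[pkg] = "not installed"
    return result


if __name__ == "__main__":
    # Assumed distribution names; adjust to match the local environment.
    for name, ver in report_versions(["torch", "sglang", "megatron-core", "mbridge"]).items():
        print(f"{name}: {ver}")
```

Editable installs (such as an SGLang checkout installed with pip -e) are still reported by importlib.metadata, so the output reflects what the training and inference processes actually import.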

I am running Qwen3-8B with one training process and one inference process using the code below. It still fails, even after I fixed some bugs in awex (e.g. #96).
It looks like the SGLang weight converter in awex (awex/converter/sglang_converter.py) does not recognize the parameter "self_attn.q_norm.weight".

awex_example.py

error log:

  File "/inspire/hdd/project/qianghuaxuexi/public/wty/sglang/python/sglang/srt/managers/scheduler.py", line 2713, in run_scheduler_process
    scheduler.event_loop_overlap()
  File "/inspire/hdd/global_user/wangtongyu-25057/miniconda3/envs/py12_dev/lib/python3.12/site-packages/torch/utils/_contextlib.py", line 120, in decorate_context
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/inspire/hdd/project/qianghuaxuexi/public/wty/sglang/python/sglang/srt/managers/scheduler.py", line 1009, in event_loop_overlap
    self.process_input_requests(recv_reqs)
  File "/inspire/hdd/project/qianghuaxuexi/public/wty/sglang/python/sglang/srt/managers/scheduler.py", line 1187, in process_input_requests
    output = self._request_dispatcher(recv_req)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/inspire/hdd/project/qianghuaxuexi/public/wty/sglang/python/sglang/utils.py", line 507, in __call__
    return fn(obj)
           ^^^^^^^
  File "/inspire/hdd/project/qianghuaxuexi/public/wty/sglang/python/sglang/srt/managers/scheduler.py", line 2479, in execute_task_in_model_worker
    result = self.model_worker.execute_task_in_model_worker(task_spec)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/inspire/hdd/project/qianghuaxuexi/public/wty/sglang/python/sglang/srt/managers/tp_worker.py", line 106, in execute_task_in_model_worker
    return self.model_runner.execute_task_in_model_worker(task_spec, models=models)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/inspire/hdd/project/qianghuaxuexi/public/wty/sglang/python/sglang/srt/model_executor/model_runner.py", line 931, in execute_task_in_model_worker
    return task_func(**kwargs)
           ^^^^^^^^^^^^^^^^^^^
  File "/inspire/hdd/project/qianghuaxuexi/public/wty/asystem-awex/awex/meta/infer_meta_resolver.py", line 169, in _get_model_param_info
    for hf_name, hf_param in sglang_to_hf_weight_converter.convert_param(
                             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/inspire/hdd/global_user/wangtongyu-25057/miniconda3/envs/py12_dev/lib/python3.12/site-packages/torch/utils/_contextlib.py", line 120, in decorate_context
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/inspire/hdd/project/qianghuaxuexi/public/wty/asystem-awex/awex/converter/sglang_converter.py", line 294, in convert_param
    converted_params = self._convert_layer_norm_param(
                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/inspire/hdd/project/qianghuaxuexi/public/wty/asystem-awex/awex/converter/sglang_converter.py", line 255, in _convert_layer_norm_param
    raise NotImplementedError(f"Unsupported layer norm parameter name: {name}")
NotImplementedError: Unsupported layer norm parameter name: self_attn.q_norm.weight
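The traceback only surfaces the first name the converter rejects. To list every per-layer norm parameter that might be unsupported before iterating on a fix, the model's parameter names can be scanned up front. This is a hypothetical helper, not part of awex: it assumes decoder-layer parameters are named "model.layers.<i>.<suffix>", and the set of supported suffixes has to be filled in by hand from the converter's own code.

```python
import re
from typing import Iterable, List, Set

# Assumed naming: decoder-layer parameters look like
# "model.layers.<i>.<suffix>", e.g. "model.layers.0.self_attn.q_norm.weight".
_LAYER_PARAM = re.compile(r"^model\.layers\.\d+\.(?P<suffix>.+)$")


def unsupported_norm_names(param_names: Iterable[str], supported: Set[str]) -> List[str]:
    """Return the sorted per-layer norm suffixes that are not in `supported`."""
    missing = set()
    for full_name in param_names:
        match = _LAYER_PARAM.match(full_name)
        if match is None:
            continue  # embeddings, final norm, lm_head, ...
        suffix = match.group("suffix")
        if "norm" in suffix and suffix not in supported:
            missing.add(suffix)
    return sorted(missing)
```

For example, passing the names from model.named_parameters() in the SGLang worker together with a guessed supported set of {"input_layernorm.weight", "post_attention_layernorm.weight"} would be expected to report both "self_attn.k_norm.weight" and "self_attn.q_norm.weight" for a Qwen3 model.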

Component(s)

Framework

Minimal reproduce step

Megatron: main branch at commit "e8749f88691cac1eeefd11b6b68cb8a6557356d5", with some local changes to the parallel_state code
mbridge: 0.15.1
Megatron patch:
parallel_state_patch.patch
Code:
awex_example.py

What did you expect to see?

The example runs successfully.

What did you see instead?

"NotImplementedError: Unsupported layer norm parameter name: self_attn.q_norm.weight"
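One possible direction for a fix is to teach the layer-norm branch of the converter about Qwen3's QK-norm weights. In Qwen3, "self_attn.q_norm.weight" and "self_attn.k_norm.weight" are RMSNorm weights of size head_dim applied per attention head to the queries and keys, and SGLang's Qwen3 model appears to use the same names as the Hugging Face checkpoint, so a straight pass-through is plausibly enough (assuming these small weights are replicated rather than sharded across tensor-parallel ranks). The function below is a hypothetical standalone sketch that mirrors the error message; it is not awex's actual _convert_layer_norm_param, whose signature and surrounding logic may differ.

```python
from typing import Any, List, Tuple

# Hypothetical sketch, not awex's real implementation. Names are relative to a
# decoder layer, matching the name reported in the NotImplementedError.
_PASSTHROUGH_NORMS = frozenset({
    "input_layernorm.weight",
    "post_attention_layernorm.weight",
    # Qwen3 QK-norm: per-head RMSNorm weights of shape [head_dim]. Assumed to
    # share names between SGLang and Hugging Face, so no renaming is done.
    "self_attn.q_norm.weight",
    "self_attn.k_norm.weight",
})


def convert_layer_norm_param(name: str, param: Any) -> List[Tuple[str, Any]]:
    """Return (hf_name, tensor) pairs for one SGLang layer-norm parameter."""
    if name in _PASSTHROUGH_NORMS:
        return [(name, param)]
    raise NotImplementedError(f"Unsupported layer norm parameter name: {name}")
```

Keeping the explicit NotImplementedError for unknown names preserves the current fail-fast behavior, so any other model-specific norm still surfaces loudly instead of being silently dropped.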

Anything Else?

No response

Are you willing to submit a PR?

  • I'm willing to submit a PR!

Metadata


Labels: bug (Something isn't working)
