
[upstream_ut] failed on setup with "worker 'gw0' crashed while running test_nestedtensor_xpu.py" #2443

@daisyden

🐛 Describe the bug

Cases:
op_ut,third_party.torch-xpu-ops.test.xpu.test_nestedtensor_xpu.TestNestedTensorAutogradXPU,test_layer_norm_backward_5d_size_32_xpu
op_ut,third_party.torch-xpu-ops.test.xpu.test_nestedtensor_xpu.TestNestedTensorAutogradXPU,test_layer_norm_backward_size_2_xpu
op_ut,third_party.torch-xpu-ops.test.xpu.test_nestedtensor_xpu.TestNestedTensorAutogradXPU,test_layer_norm_backward_5d_size_128_xpu
op_ut,third_party.torch-xpu-ops.test.xpu.test_nestedtensor_xpu.TestNestedTensorAutogradXPU,test_layer_norm_backward_5d_size_4_xpu
op_ut,third_party.torch-xpu-ops.test.xpu.test_nestedtensor_xpu.TestNestedTensorAutogradXPU,test_layer_norm_backward_5d_size_2_xpu

pytest_command:
cd <pytorch> && PYTORCH_TEST_WITH_SLOW=1 pytest -v third_party/torch-xpu-ops/test/xpu/test_nestedtensor_xpu.py -k test_layer_norm_backward_size_2_xpu
cd <pytorch> && PYTORCH_TEST_WITH_SLOW=1 pytest -v third_party/torch-xpu-ops/test/xpu/test_nestedtensor_xpu.py -k test_layer_norm_backward_5d_size_32_xpu
cd <pytorch> && PYTORCH_TEST_WITH_SLOW=1 pytest -v third_party/torch-xpu-ops/test/xpu/test_nestedtensor_xpu.py -k test_layer_norm_backward_5d_size_4_xpu
cd <pytorch> && PYTORCH_TEST_WITH_SLOW=1 pytest -v third_party/torch-xpu-ops/test/xpu/test_nestedtensor_xpu.py -k test_layer_norm_backward_5d_size_2_xpu
cd <pytorch> && PYTORCH_TEST_WITH_SLOW=1 pytest -v third_party/torch-xpu-ops/test/xpu/test_nestedtensor_xpu.py -k test_layer_norm_backward_5d_size_128_xpu
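
Each command above follows mechanically from the matching entry under "Cases". As a minimal sketch of that mapping (the helper name `case_to_command` is hypothetical, and it assumes no directory name in the dotted path contains a dot):

```python
def case_to_command(case: str, env: str = "PYTORCH_TEST_WITH_SLOW=1") -> str:
    """Turn 'suite,dotted.module.Class,test_name' into a pytest command line."""
    # The suite name (e.g. 'op_ut') is not needed to build the command.
    _suite, dotted, test_name = case.strip().split(",")
    # The last dotted component is the test class; the rest is the module.
    module_path, _class_name = dotted.rsplit(".", 1)
    # Assumption: every remaining dot is a path separator.
    file_path = module_path.replace(".", "/") + ".py"
    return f"{env} pytest -v {file_path} -k {test_name}"


case = (
    "op_ut,third_party.torch-xpu-ops.test.xpu.test_nestedtensor_xpu."
    "TestNestedTensorAutogradXPU,test_layer_norm_backward_size_2_xpu"
)
print(case_to_command(case))
```

Note that `-k` matches by substring, so `-k test_layer_norm_backward_size_2_xpu` would also select any other test whose name contains that string.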

Error Message:
failed on setup with "worker 'gw0' crashed while running 'third_party/torch-xpu-ops/test/xpu/test_nestedtensor_xpu.py::TestNestedTensorAutogradXPU::test_layer_norm_backward_5d_size_128_xpu'"

Trace Example:

Command: cd <pytorch> && PYTORCH_TEST_WITH_SLOW=1 pytest -v third_party/torch-xpu-ops/test/xpu/test_nestedtensor_xpu.py -k test_layer_norm_backward_size_2_xpu
worker 'gw4' crashed while running 'third_party/torch-xpu-ops/test/xpu/test_nestedtensor_xpu.py::TestNestedTensorAutogradXPU::test_layer_norm_backward_size_2_xpu'
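
A "worker crashed" report from pytest-xdist means the worker process exited unexpectedly, so no Python traceback is available. One possible way to get more detail (a suggestion, not something done in this issue) is to rerun a single case in one process with `faulthandler` enabled and look at the exit status. The sketch below assumes a POSIX system, where `subprocess` reports death by signal as a negative return code, and assumes pytest-xdist is installed so that `-n 0` is accepted. The helper names are hypothetical.

```python
import os
import signal
import subprocess
import sys

TEST_FILE = "third_party/torch-xpu-ops/test/xpu/test_nestedtensor_xpu.py"


def build_command(test_name):
    """Build a single-process pytest invocation with faulthandler enabled."""
    return [
        sys.executable, "-X", "faulthandler",  # dump a traceback on fatal signals
        "-m", "pytest", "-v", TEST_FILE,
        "-k", test_name,
        "-n", "0",  # assumption: pytest-xdist installed; 0 runs in-process
    ]


def describe_exit(returncode):
    """Summarize a subprocess return code (POSIX: negative means a signal)."""
    if returncode < 0:
        try:
            name = signal.Signals(-returncode).name
        except ValueError:
            name = f"signal {-returncode}"
        return f"killed by {name}"
    return "passed" if returncode == 0 else f"pytest exit code {returncode}"


def run_isolated(test_name, cwd="."):
    """Run one test from the given checkout directory and describe the result."""
    env = dict(os.environ, PYTORCH_TEST_WITH_SLOW="1")
    proc = subprocess.run(build_command(test_name), cwd=cwd, env=env)
    return describe_exit(proc.returncode)
```

For example, `run_isolated("test_layer_norm_backward_size_2_xpu", cwd="<pytorch>")` with the real checkout path in place of `<pytorch>` would return a string such as "killed by SIGSEGV" if the process dies from a segmentation fault.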

Versions

#2432

Labels:
skipped (Used for temp UT failure to parallel fix)
