Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[BUG] Stream-K kernel breaks for some GEMM Problem-K #2100

Open
manishucsd opened this issue Feb 12, 2025 · 3 comments
Open

[BUG] Stream-K kernel breaks for some GEMM Problem-K #2100

manishucsd opened this issue Feb 12, 2025 · 3 comments
Labels
? - Needs Triage bug Something isn't working

Comments

@manishucsd
Copy link
Contributor

manishucsd commented Feb 12, 2025

GEMM Problem Shape --m=8 --n=8192 --k=8192 Does NOT Work

/tools/profiler/cutlass_profiler --dist=uniform,min:-2.3,max:2.3,scale:-1 --kernels=cutlass3x_sm90_tensorop_s64x128x16gemm_bf16_bf16_f32_bf16_bf16_128x128x64_4x1x1_0_tnn_align8_stream_k_warpspecialized_cooperative_epi_nosmem --m=8 --n=8192  --k=8192 --verification-enabled
=false 



=============================
  Problem ID: 1

        Provider: CUTLASS
   OperationKind: gemm
       Operation: cutlass3x_sm90_tensorop_s64x128x16gemm_bf16_bf16_f32_bf16_bf16_128x128x64_4x1x1_0_tnn_align8_stream_k_warpspecialized_cooperative_epi_nosmem

          Status: Success
    Verification: OFF
     Disposition: Failed


       Arguments: --gemm_kind=universal --m=8 --n=8192 --k=8192 --A=bf16:row --B=bf16:column --C=bf16:column --D=bf16:column  \
                  --alpha=1 --beta=0 --split_k_mode=serial --split_k_slices=1 --batch_count=1 --raster_order=heuristic  \
                  --runtime_input_datatype_a=invalid --runtime_input_datatype_b=invalid --use_pdl=false --enable_sm90_mixed_dtype_shuffle_test=false  \
                  --swizzle_size=1 --op_class=tensorop --accum=f32 --cta_m=128 --cta_n=128 --cta_k=64 --cluster_m=1 --cluster_n=1  \
                  --cluster_k=1 --cluster_m_fallback=0 --cluster_n_fallback=0 --cluster_k_fallback=0 --stages=7 --warps_m=4  \
                  --warps_n=2 --warps_k=1 --inst_m=64 --inst_n=128 --inst_k=16 --min_cc=90 --max_cc=90

           Bytes: 134479872  bytes
           FLOPs: 1073872896  flops
           FLOPs/Byte: 7

GEMM Problem Shape --m=8 --n=8192 --k=128 Works

./tools/profiler/cutlass_profiler --dist=uniform,min:-2.3,max:2.3,scale:-1 --kernels=cutlass3x_sm90_tensorop_s64x128x16gemm_bf16_bf16_f32_bf16_bf16_128x128x64_4x1x1_0_tnn_align8_stream_k_warpspecialized_cooperative_epi_nosmem --m=8 --n=8192  --k=128 --verification-enabled=false



=============================
  Problem ID: 1

        Provider: CUTLASS
   OperationKind: gemm
       Operation: cutlass3x_sm90_tensorop_s64x128x16gemm_bf16_bf16_f32_bf16_bf16_128x128x64_4x1x1_0_tnn_align8_stream_k_warpspecialized_cooperative_epi_nosmem

          Status: Success
    Verification: OFF
     Disposition: Not verified


       Arguments: --gemm_kind=universal --m=8 --n=8192 --k=128 --A=bf16:row --B=bf16:column --C=bf16:column --D=bf16:column  \
                  --alpha=1 --beta=0 --split_k_mode=serial --split_k_slices=1 --batch_count=1 --raster_order=heuristic  \
                  --runtime_input_datatype_a=invalid --runtime_input_datatype_b=invalid --use_pdl=false --enable_sm90_mixed_dtype_shuffle_test=false  \
                  --swizzle_size=1 --op_class=tensorop --accum=f32 --cta_m=128 --cta_n=128 --cta_k=64 --cluster_m=1 --cluster_n=1  \
                  --cluster_k=1 --cluster_m_fallback=0 --cluster_n_fallback=0 --cluster_k_fallback=0 --stages=7 --warps_m=4  \
                  --warps_n=2 --warps_k=1 --inst_m=64 --inst_n=128 --inst_k=16 --min_cc=90 --max_cc=90

           Bytes: 2230272  bytes
           FLOPs: 16908288  flops
           FLOPs/Byte: 7

         Runtime: 0.0130992  ms
          Memory: 158.567 GiB/s

            Math: 1290.79 GFLOP/s
@manishucsd manishucsd added ? - Needs Triage bug Something isn't working labels Feb 12, 2025
@manishucsd
Copy link
Contributor Author

  Problem ID: 1

        Provider: CUTLASS
   OperationKind: gemm
       Operation: cutlass3x_sm90_tensorop_s64x128x16gemm_bf16_bf16_f32_bf16_bf16_128x128x64_4x1x1_0_tnn_align8_stream_k_warpspecialized_cooperative_epi_tma

          Status: Success
    Verification: OFF
     Disposition: Failed

Another one that failed . The common pattern between these are cluster=4x1x1, stream_k, ws

@manishucsd
Copy link
Contributor Author

manishucsd commented Feb 12, 2025

  Problem ID: 1

        Provider: CUTLASS
   OperationKind: gemm
       Operation: cutlass3x_sm90_tensorop_s64x128x16gemm_bf16_bf16_f32_bf16_bf16_128x128x64_4x2x1_0_tnn_align8_stream_k_warpspecialized_cooperative_epi_nosmem

          Status: Success
    Verification: OFF
     Disposition: Failed
Problem ID: 1

        Provider: CUTLASS
   OperationKind: gemm
       Operation: cutlass3x_sm90_tensorop_s64x128x16gemm_bf16_bf16_f32_bf16_bf16_128x128x64_4x4x1_0_tnn_align8_stream_k_warpspecialized_cooperative_epi_nosmem

          Status: Error: internal
    Verification: OFF
     Disposition: Failed

@hwu36
Copy link
Collaborator

hwu36 commented Feb 12, 2025

@jackkosaian

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
? - Needs Triage bug Something isn't working
Projects
None yet
Development

No branches or pull requests

2 participants