@xuzhao9 xuzhao9 commented Dec 1, 2025

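CuTe DSL now supports compiling with TVM FFI: https://docs.nvidia.com/cutlass/latest/media/docs/pythonDSL/cute_dsl_general/compile_with_tvm_ffi.html

This adds a nop_cutedsl_tvm_ffi benchmark to launch_latency so its launch overhead can be compared against nop_cutedsl and the Triton no-op kernels.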

$ LD_LIBRARY_PATH=$HOME/.conda/envs/py312/lib python run.py --op launch_latency --only nop_python_function,nop_triton_kernel,nop_triton_compiled_kernel_run,nop_cutedsl_tvm_ffi,nop_cutedsl --force
  x_val    nop_python_function-walltime    nop_triton_kernel-walltime    nop_triton_compiled_kernel_run-walltime    nop_cutedsl_tvm_ffi-walltime    nop_cutedsl-walltime
-------  ------------------------------  ----------------------------  -----------------------------------------  ------------------------------  ----------------------
      0                     2.58363e-05                    0.00630748                                 0.00326835                     0.00654917               0.00668514
     19                     2.85705e-05                    0.0138997                                  0.00484651                     0.000710644              0.0444969
average                     2.72034e-05                    0.0101036                                  0.00405743                     0.00362991               0.025591
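
To make the comparison in the table easier to read, the ratios between columns can be computed directly. The following is a minimal sketch that only re-uses the walltime figures printed above (units as reported by the benchmark); the `walltime` dictionary and the `ratio` helper are illustrative names, not part of tritonbench.

```python
# Walltime figures copied verbatim from the table above.
# Keys are x_val; values map benchmark name -> reported walltime.
walltime = {
    0: {
        "nop_python_function": 2.58363e-05,
        "nop_triton_kernel": 0.00630748,
        "nop_triton_compiled_kernel_run": 0.00326835,
        "nop_cutedsl_tvm_ffi": 0.00654917,
        "nop_cutedsl": 0.00668514,
    },
    19: {
        "nop_python_function": 2.85705e-05,
        "nop_triton_kernel": 0.0138997,
        "nop_triton_compiled_kernel_run": 0.00484651,
        "nop_cutedsl_tvm_ffi": 0.000710644,
        "nop_cutedsl": 0.0444969,
    },
}


def ratio(x_val, baseline, candidate):
    """Return how many times larger the baseline walltime is than the candidate's."""
    row = walltime[x_val]
    return row[baseline] / row[candidate]


for x_val in walltime:
    vs_cutedsl = ratio(x_val, "nop_cutedsl", "nop_cutedsl_tvm_ffi")
    vs_triton = ratio(x_val, "nop_triton_compiled_kernel_run", "nop_cutedsl_tvm_ffi")
    print(f"x_val={x_val}: {vs_cutedsl:.2f}x vs nop_cutedsl, "
          f"{vs_triton:.2f}x vs nop_triton_compiled_kernel_run")
```

With these two rows, the tvm-ffi path is roughly on par with nop_cutedsl at x_val 0 (about 1.02x) and much lower at x_val 19 (about 62.6x lower than nop_cutedsl, about 6.8x lower than nop_triton_compiled_kernel_run). With only two data points this is an indication rather than a firm result.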

CuTe DSL:

x_val 0
  nop_cutedsl-walltime_kineto_trace: https://www.internalfb.com/intern/perfdoctor/trace_view?filepath=tree/traces/tritonbench/devgpu089.cco2.facebook.com_2596165.1764990930051555915.pt.trace.json.gz&bucket=pyper_traces
  nop_cutedsl_tvm_ffi-walltime_kineto_trace: https://www.internalfb.com/intern/perfdoctor/trace_view?filepath=tree/traces/tritonbench/devgpu089.cco2.facebook.com_2596165.1764990932868898706.pt.trace.json.gz&bucket=pyper_traces
x_val 19
  nop_cutedsl-walltime_kineto_trace: https://www.internalfb.com/intern/perfdoctor/trace_view?filepath=tree/traces/tritonbench/devgpu089.cco2.facebook.com_2596165.1764990935475979650.pt.trace.json.gz&bucket=pyper_traces
  nop_cutedsl_tvm_ffi-walltime_kineto_trace: https://www.internalfb.com/intern/perfdoctor/trace_view?filepath=tree/traces/tritonbench/devgpu089.cco2.facebook.com_2596165.1764990939806473026.pt.trace.json.gz&bucket=pyper_traces

Triton:

x_val 0
  nop_triton_kernel-walltime_kineto_trace: https://www.internalfb.com/intern/perfdoctor/trace_view?filepath=tree/traces/tritonbench/devgpu089.cco2.facebook.com_2945792.1764991259550770338.pt.trace.json.gz&bucket=pyper_traces
  nop_triton_compiled_kernel_run-walltime_kineto_trace: https://www.internalfb.com/intern/perfdoctor/trace_view?filepath=tree/traces/tritonbench/devgpu089.cco2.facebook.com_2945792.1764991262721098781.pt.trace.json.gz&bucket=pyper_traces
x_val 19
  nop_triton_kernel-walltime_kineto_trace: https://www.internalfb.com/intern/perfdoctor/trace_view?filepath=tree/traces/tritonbench/devgpu089.cco2.facebook.com_2945792.1764991265462897437.pt.trace.json.gz&bucket=pyper_traces
  nop_triton_compiled_kernel_run-walltime_kineto_trace: https://www.internalfb.com/intern/perfdoctor/trace_view?filepath=tree/traces/tritonbench/devgpu089.cco2.facebook.com_2945792.1764991270028992898.pt.trace.json.gz&bucket=pyper_traces
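
For readers unfamiliar with walltime launch-latency measurements, the idea can be sketched in plain Python: time a call to a function that does no work, so that only the call overhead is measured. This is a generic illustration under that assumption, not the tritonbench implementation; `nop` and `measure_walltime` are hypothetical names.

```python
import statistics
import time


def nop():
    """A function that does nothing, so only call overhead is measured."""
    pass


def measure_walltime(fn, repeats=1000):
    """Return the median host-side wall-clock time (seconds) of calling fn()."""
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        samples.append(time.perf_counter() - start)
    return statistics.median(samples)


median_s = measure_walltime(nop)
print(f"median call overhead: {median_s:.3e} s")
```

For a GPU kernel launcher, the same pattern would wrap the launch call instead of `nop`; whether and where to synchronize with the device is a separate design choice that this sketch does not cover.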

@meta-cla meta-cla bot added the cla signed label Dec 1, 2025
@xuzhao9 xuzhao9 changed the title from "[launch-latency] Add tvm-ffi" to "[launch-latency] Add tvm-ffi to cutedsl" Dec 1, 2025
@xuzhao9 xuzhao9 requested a review from htyu December 1, 2025 18:31
@xuzhao9 xuzhao9 temporarily deployed to docker-s3-upload December 1, 2025 18:34 — with GitHub Actions Inactive
@xuzhao9 xuzhao9 force-pushed the xz9/add-walltime-triton branch from 69ba646 to 73d5a35 Compare December 5, 2025 22:31
@xuzhao9 xuzhao9 temporarily deployed to docker-s3-upload December 5, 2025 22:31 — with GitHub Actions Inactive
@xuzhao9 xuzhao9 temporarily deployed to docker-s3-upload December 6, 2025 03:20 — with GitHub Actions Inactive
@xuzhao9 xuzhao9 merged commit 627f73b into main Dec 8, 2025
7 of 8 checks passed
@xuzhao9 xuzhao9 deleted the xz9/add-walltime-triton branch December 9, 2025 17:28