Questions about topk_per_row_kernel in topk_plain_kernels.cu #2046

@zjin-lcf

In the source file https://github.com/ROCm/aiter/blob/main/csrc/kernels/topk_plain_kernels.cu, the topk_per_row kernel is selected when the following condition holds:

        // Use topk_per_row kernel when:
        // n + K log²K ≥ 3 × Factor(n) × n
        // where Factor(n) = 1/3 + 1.6/(log₂(n) - 9.5)

Could you please explain where the magic numbers (3, 1/3, 1.6, and 9.5) come from?
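
To make the question concrete, the condition can be evaluated on the host as in the minimal sketch below. It is not taken from the aiter source: the function names `factor` and `per_row_condition` are hypothetical, and it assumes that n is the row length, that K is the number of selected elements, and that "log" in the K log²K term means log₂. Expanding the right-hand side gives 3 × Factor(n) × n = n + 4.8 n/(log₂(n) - 9.5), so under this reading the n terms cancel and the test reduces to K log²K ≥ 4.8 n/(log₂(n) - 9.5). The denominator is positive only for n above roughly 2^9.5 ≈ 724, so the sketch asserts that precondition rather than guessing what the real code does for shorter rows.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical helper, not from the aiter source.
// Factor(n) = 1/3 + 1.6 / (log2(n) - 9.5), as written in the source comment.
double factor(double n) {
    // The formula is only meaningful when the denominator is positive.
    assert(std::log2(n) > 9.5);
    return 1.0 / 3.0 + 1.6 / (std::log2(n) - 9.5);
}

// Hypothetical helper: evaluates n + K*log^2(K) >= 3 * Factor(n) * n.
// Assumption: "log" in the K term is base 2; the source comment does not say.
bool per_row_condition(double n, double k) {
    const double log_k = std::log2(k);
    const double lhs = n + k * log_k * log_k;
    const double rhs = 3.0 * factor(n) * n;
    return lhs >= rhs;
}
```

Under these assumptions, n = 65536 gives a right-hand side of about 113,932. With K = 64 the left-hand side is 67,840, so the condition is false; with K = 2048 it is 313,344, so the condition is true.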

When this HIP kernel is ported to CUDA, which parts would users need to retune for performance on an NVIDIA GPU?

Thanks
