Pinned

  1. flash-attention-cuda

    Custom FlashAttention CUDA kernel for Llama 3 8B inference, achieving a 2.26x speedup over a naive attention baseline. Developed on an RTX 4060 Ti, targeting Jetson AGX Orin deployment. A PyTorch sketch of the underlying tiled online-softmax algorithm follows this list.

    Python

  2. flashattn-cuda-metal

    FlashAttention CUDA kernel implementation and Metal port (RTX 4060 Ti, Apple M4 Pro)

    Python

  3. sdpa-attention-benchmark

    Benchmarks PyTorch SDPA backends (math vs. flash) on an RTX 4060 Ti, with Nsight Systems profiling. A minimal sketch of the backend comparison follows this list.

    Python
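The CUDA source of flash-attention-cuda isn't shown here. As a reference for what a FlashAttention-style kernel computes, below is a minimal PyTorch sketch of tiled attention with online softmax, the algorithm behind the reported speedup. The `tiled_attention` function, tile size, and tensor shapes are illustrative assumptions, not code from the repository.

```python
import torch

def tiled_attention(q, k, v, tile=128):
    """Numerically stable attention computed one key/value tile at a time,
    using the online-softmax rescaling trick that FlashAttention builds on.
    Shapes: q, k, v are (seq_len, head_dim); a real kernel adds batch/heads.
    NOTE: an illustrative sketch, not the repository's kernel.
    """
    seq_len, head_dim = q.shape
    scale = head_dim ** -0.5
    # Running statistics per query row: max logit m, softmax denominator l,
    # and the unnormalized output accumulator acc.
    m = torch.full((seq_len, 1), float("-inf"))
    l = torch.zeros(seq_len, 1)
    acc = torch.zeros(seq_len, head_dim)
    for start in range(0, seq_len, tile):
        k_t = k[start:start + tile]            # (tile, head_dim)
        v_t = v[start:start + tile]
        s = (q @ k_t.T) * scale                # scores for this tile only
        m_new = torch.maximum(m, s.max(dim=-1, keepdim=True).values)
        # Rescale previous running sums to the new max, then fold in
        # this tile's contribution.
        p = torch.exp(s - m_new)
        correction = torch.exp(m - m_new)
        l = l * correction + p.sum(dim=-1, keepdim=True)
        acc = acc * correction + p @ v_t
        m = m_new
    return acc / l

if __name__ == "__main__":
    torch.manual_seed(0)
    q, k, v = (torch.randn(512, 64) for _ in range(3))
    ref = torch.softmax((q @ k.T) * 64 ** -0.5, dim=-1) @ v
    print(torch.allclose(tiled_attention(q, k, v), ref, atol=1e-5))
```

The point of the tiling is that each key/value tile is read once and the full seq_len x seq_len score matrix is never materialized; a fused CUDA kernel exploits exactly this to keep the working set in on-chip SRAM and cut HBM traffic, which is where the speedup over naive attention comes from.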
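For the sdpa-attention-benchmark comparison, here is a minimal sketch of forcing and timing individual SDPA backends, assuming PyTorch >= 2.3 and a CUDA device. The `bench` helper, tensor shapes, and iteration counts are illustrative assumptions, not taken from the repository.

```python
import torch
import torch.nn.functional as F
from torch.nn.attention import SDPBackend, sdpa_kernel

def bench(backend, q, k, v, iters=50, warmup=10):
    """Time F.scaled_dot_product_attention with one SDPA backend forced."""
    with sdpa_kernel(backend):
        for _ in range(warmup):
            F.scaled_dot_product_attention(q, k, v)
        torch.cuda.synchronize()
        start = torch.cuda.Event(enable_timing=True)
        end = torch.cuda.Event(enable_timing=True)
        start.record()
        for _ in range(iters):
            F.scaled_dot_product_attention(q, k, v)
        end.record()
        torch.cuda.synchronize()
    return start.elapsed_time(end) / iters  # milliseconds per call

if __name__ == "__main__":
    # Illustrative shapes: (batch, heads, seq_len, head_dim).
    # The flash backend requires half precision on CUDA, hence float16.
    q, k, v = (torch.randn(1, 32, 4096, 128, device="cuda", dtype=torch.float16)
               for _ in range(3))
    for backend in (SDPBackend.MATH, SDPBackend.FLASH_ATTENTION):
        print(backend.name, f"{bench(backend, q, k, v):.3f} ms")
```

To get a kernel-level view of the same runs, the script can be launched under Nsight Systems (e.g. `nsys profile python bench.py`), which is presumably how the repository's profiling traces were captured.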