Popular repositories
- wmma-flashattention-v2 (CUDA): An optimized FlashAttention-2 kernel built on the WMMA API. A minimal WMMA sketch follows this list.
- Parallel-GAE (CUDA): A parallel implementation of GAE for long-horizon RL; processes 65M tokens in ~1 ms on an A100. A parallel-scan sketch also follows this list.
- flashinfer (Python; forked from flashinfer-ai/flashinfer): FlashInfer, a kernel library for LLM serving.
- vllm (Python; forked from vllm-project/vllm): A high-throughput and memory-efficient inference and serving engine for LLMs.
- sglang (Python; forked from sgl-project/sglang): SGLang, a high-performance serving framework for large language models and multimodal models.
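
The two pinned CUDA repos name concrete techniques, so two short sketches follow; both are illustrative sketches under stated assumptions, not code from the repositories. First, the WMMA API: it exposes Tensor Cores through warp-level matrix fragments, and a WMMA-based FlashAttention-2 kernel builds its Q·Kᵀ and P·V products out of exactly this kind of tile multiply. The kernel name below is hypothetical.

```cuda
#include <cuda_fp16.h>
#include <mma.h>
using namespace nvcuda;

// One warp multiplies a single 16x16x16 half-precision tile on Tensor
// Cores and accumulates in fp32. Launch with one warp: <<<1, 32>>>.
__global__ void wmma_tile_gemm(const half *A, const half *B, float *C) {
    wmma::fragment<wmma::matrix_a, 16, 16, 16, half, wmma::row_major> a_frag;
    wmma::fragment<wmma::matrix_b, 16, 16, 16, half, wmma::col_major> b_frag;
    wmma::fragment<wmma::accumulator, 16, 16, 16, float> c_frag;

    wmma::fill_fragment(c_frag, 0.0f);          // zero the output tile
    wmma::load_matrix_sync(a_frag, A, 16);      // leading dimension 16
    wmma::load_matrix_sync(b_frag, B, 16);
    wmma::mma_sync(c_frag, a_frag, b_frag, c_frag);   // C += A * B
    wmma::store_matrix_sync(C, c_frag, 16, wmma::mem_row_major);
}
```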
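
Second, Parallel-GAE: the GAE recurrence A_t = δ_t + γλ(1 − done_t)·A_{t+1} is an affine map in A_{t+1}, and affine maps compose associatively, so the usual sequential backward loop can be replaced by a parallel suffix scan; that is presumably how a whole trajectory gets processed in parallel. The single-block sketch below assumes one trajectory with T == blockDim.x ≤ 1024 steps and A_T = 0; the kernel name and layout are hypothetical.

```cuda
#include <cuda_runtime.h>

// GAE as a parallel suffix scan. Each step is an affine map
// x -> delta_t + g_t * x with g_t = gamma*lambda*(1 - done_t); maps
// compose associatively, (c2,d2) after (c1,d1) = (c2*c1, d2 + c2*d1),
// so a Hillis-Steele scan over (c, d) pairs replaces the backward loop.
__global__ void gae_suffix_scan(const float *delta, const float *not_done,
                                float *adv, float gamma, float lam) {
    extern __shared__ float smem[];
    const int T = blockDim.x;
    float *c = smem;       // multiplicative parts of the maps
    float *d = smem + T;   // additive parts of the maps

    // Load in reversed time order so an ordinary inclusive scan
    // accumulates suffix compositions f_t o f_{t+1} o ... o f_{T-1}.
    const int i = threadIdx.x;
    const int t = T - 1 - i;
    c[i] = gamma * lam * not_done[t];
    d[i] = delta[t];
    __syncthreads();

    for (int off = 1; off < T; off <<= 1) {
        float cp = 0.f, dp = 0.f;
        const bool has = i >= off;
        if (has) { cp = c[i - off]; dp = d[i - off]; }
        __syncthreads();               // all reads done before any writes
        if (has) {
            d[i] += c[i] * dp;         // current map applied after prefix
            c[i] *= cp;
        }
        __syncthreads();
    }
    adv[t] = d[i];                     // A_t = composed map applied to A_T = 0
}
```

A launch for a 1024-step trajectory would look like `gae_suffix_scan<<<1, 1024, 2 * 1024 * sizeof(float)>>>(delta, not_done, adv, 0.99f, 0.95f);` with `not_done[t] = 1 - done[t]`. Scaling to tens of millions of tokens would need a multi-block scan, which this sketch omits.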
