I'm a research engineer at Huawei Canada, working on large-scale LLM infrastructure.
Some topics I've worked on at Huawei:
- MoE-Attention disaggregation
- Fault-tolerant LLM inference
- Asynchronous RL fine-tuning
- On-policy distillation
- Elastic LLM inference scaling

I currently contribute to AReaL, focusing primarily on the Ray scheduler, vLLM components, and the Ascend fork.


