@lih827
Contributor

lih827 commented Nov 12, 2025

Calculate send_token_idx in the layout kernel.
Calculate recv_count and recv_offset in the notify dispatch kernel.
Change num_recv_tokens_per_expert_list from a List to a tensor.
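
For readers unfamiliar with these names, here is a rough host-side sketch of what such index and offset calculations could look like. It is not the kernel code from this PR. The function names `layout_send_token_idx` and `notify_recv_layout` are hypothetical, and the sketch assumes that each token has a single destination rank, that send_token_idx is the token's slot within that rank's send buffer, that recv_count is the number of tokens arriving from each source rank, and that recv_offset is the exclusive prefix sum of recv_count. The PR description does not confirm any of these semantics.

```python
from itertools import accumulate


def layout_send_token_idx(token_dst_rank, num_ranks):
    """Assumed semantics: slot of each token inside its destination rank's send buffer."""
    next_slot = [0] * num_ranks          # running token count per destination rank
    send_token_idx = []
    for dst in token_dst_rank:
        send_token_idx.append(next_slot[dst])
        next_slot[dst] += 1
    return send_token_idx, next_slot     # next_slot now holds the per-rank send counts


def notify_recv_layout(send_counts_all_ranks, my_rank):
    """Assumed semantics: per-source receive counts and their exclusive prefix sum."""
    # send_counts_all_ranks[src][dst] is how many tokens rank src sends to rank dst.
    recv_count = [counts[my_rank] for counts in send_counts_all_ranks]
    recv_offset = list(accumulate(recv_count, initial=0))[:-1]
    return recv_count, recv_offset


# Hypothetical example: 5 tokens routed across 2 ranks.
idx, counts = layout_send_token_idx([1, 0, 1, 1, 0], num_ranks=2)
recv_count, recv_offset = notify_recv_layout([[2, 3], [4, 1]], my_rank=1)
print(idx, counts, recv_count, recv_offset)
```

Computing these values inside the kernels rather than on the host would avoid an extra pass over the routing data, which is one plausible reading of where the dispatch improvement reported below comes from.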

Before:
[tuning] Dispatch (BF16) 115.81 GB/s (HCCS), avg_t: 4036.01 us
[tuning] Combine 98.46 GB/s (HCCS), avg_t: 4747.07 us

After:
[tuning] Dispatch (BF16) 133.23 GB/s (HCCS), avg_t: 3508.14 us
[tuning] Combine 101.20 GB/s (HCCS), avg_t: 4618.60 us
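
To put the logs in relative terms, the short script below computes the percentage change for each metric. The numbers are copied from the tuning output above. The dictionary layout and the helper name `relative_change` are only illustrative.

```python
# (bandwidth in GB/s, average time in us), copied from the tuning logs in this PR.
before = {"dispatch": (115.81, 4036.01), "combine": (98.46, 4747.07)}
after = {"dispatch": (133.23, 3508.14), "combine": (101.20, 4618.60)}


def relative_change(old, new):
    """Percentage change from old to new (positive means an increase)."""
    return (new - old) / old * 100.0


changes = {}
for name in before:
    bw_old, t_old = before[name]
    bw_new, t_new = after[name]
    changes[name] = (relative_change(bw_old, bw_new), relative_change(t_old, t_new))
    print(f"{name}: bandwidth {changes[name][0]:+.1f}%, avg_t {changes[name][1]:+.1f}%")
```

Dispatch bandwidth rises by about 15.0% (average time falls by about 13.1%), while Combine improves by about 2.8% (average time falls by about 2.7%).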

@gemini-code-assist
Contributor

Warning

You have reached your daily quota limit. Please wait up to 24 hours and I will start processing your requests again!

@lih827
Contributor Author

lih827 commented Nov 12, 2025

/gemini review

@gemini-code-assist
Contributor

Warning

You have reached your daily quota limit. Please wait up to 24 hours and I will start processing your requests again!

@Yael-X
Collaborator

Yael-X commented Nov 13, 2025

Please update the project homepage with the latest performance data.
