@baoqiwen baoqiwen commented Jan 7, 2026

PR Category

Operator Mechanism

PR Types

Improvements

Description

Pcard-89071

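Optimize tokens_zip_kernel

  1. Increase the vectorization width and rework the local_row_fetchlist logic, moving it into shared memory to improve occupancy.
  2. Instruction-level optimization: use genuinely vectorized instructions to reduce memory-access pressure.
  3. Reduce the number of registers used.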
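
The three points above can be illustrated with a minimal CUDA sketch. It is not the actual tokens_zip_kernel from this PR: the kernel name `zip_rows_sketch`, the constant `kMaxExperts`, the fetch-list layout (one source row index per expert, `-1` for unused slots), and the plain summation are all assumptions made for illustration. It only shows the general pattern of staging a per-row fetch list in shared memory and moving data with 128-bit `float4` accesses.

```cuda
#include <cuda_runtime.h>
#include <cstddef>
#include <vector>

constexpr int kMaxExperts = 8;  // hypothetical upper bound on sources per output row

// One block produces one output row by summing up to num_experts source rows.
// fetch_list[row * num_experts + e] is a source row index, or -1 if unused.
// Assumes num_experts <= kMaxExperts, num_experts <= blockDim.x,
// hidden % 4 == 0, and 16-byte aligned buffers.
__global__ void zip_rows_sketch(const float* __restrict__ src,
                                const int* __restrict__ fetch_list,
                                float* __restrict__ dst,
                                int num_experts, int hidden) {
  // The row map is staged once per block in shared memory instead of being
  // held in a per-thread array, which keeps per-thread register use lower.
  __shared__ int s_fetch[kMaxExperts];
  const int row = blockIdx.x;
  if (threadIdx.x < num_experts) {
    s_fetch[threadIdx.x] = fetch_list[row * num_experts + threadIdx.x];
  }
  __syncthreads();

  const int vecs_per_row = hidden / 4;
  const float4* src4 = reinterpret_cast<const float4*>(src);
  float4* dst4 = reinterpret_cast<float4*>(dst);
  for (int v = threadIdx.x; v < vecs_per_row; v += blockDim.x) {
    float4 acc = make_float4(0.f, 0.f, 0.f, 0.f);
    for (int e = 0; e < num_experts; ++e) {
      const int src_row = s_fetch[e];
      if (src_row < 0) continue;  // unused slot
      // A single 128-bit load instead of four 32-bit loads.
      const float4 x = src4[static_cast<size_t>(src_row) * vecs_per_row + v];
      acc.x += x.x; acc.y += x.y; acc.z += x.z; acc.w += x.w;
    }
    // A single 128-bit store.
    dst4[static_cast<size_t>(row) * vecs_per_row + v] = acc;
  }
}

// Host helper (hypothetical) that runs the sketch on a tiny input.
// CUDA error checking is omitted for brevity.
std::vector<float> run_zip_sketch() {
  const int rows = 2, num_experts = 2, hidden = 8, src_rows = 3;
  std::vector<float> h_src(src_rows * hidden);
  for (int i = 0; i < src_rows * hidden; ++i) h_src[i] = static_cast<float>(i);
  // Output row 0 sums source rows 0 and 2; output row 1 copies source row 1.
  const std::vector<int> h_fetch = {0, 2, 1, -1};
  std::vector<float> h_dst(rows * hidden, 0.f);

  float* d_src = nullptr;
  float* d_dst = nullptr;
  int* d_fetch = nullptr;
  cudaMalloc(&d_src, h_src.size() * sizeof(float));
  cudaMalloc(&d_dst, h_dst.size() * sizeof(float));
  cudaMalloc(&d_fetch, h_fetch.size() * sizeof(int));
  cudaMemcpy(d_src, h_src.data(), h_src.size() * sizeof(float),
             cudaMemcpyHostToDevice);
  cudaMemcpy(d_fetch, h_fetch.data(), h_fetch.size() * sizeof(int),
             cudaMemcpyHostToDevice);

  zip_rows_sketch<<<rows, 32>>>(d_src, d_fetch, d_dst, num_experts, hidden);

  cudaMemcpy(h_dst.data(), d_dst, h_dst.size() * sizeof(float),
             cudaMemcpyDeviceToHost);
  cudaFree(d_src);
  cudaFree(d_dst);
  cudaFree(d_fetch);
  return h_dst;
}
```

In this sketch, calling `run_zip_sketch()` from a `main` compiled with `nvcc` returns the two merged rows. Whether the real kernel sums, weights, or casts its inputs is not stated in the PR description, so the accumulation step here should be read as a placeholder.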

@paddle-bot

paddle-bot bot commented Jan 7, 2026

Your PR has been submitted. Thanks for your contribution to the open-source project!
Please watch for the CI test results. See the Paddle CI Manual for details.

@baoqiwen baoqiwen force-pushed the bqw_unpermute branch 4 times, most recently from 66cb620 to 7c5a1a6 on January 7, 2026 13:38
Contributor

@A-nnonymous A-nnonymous left a comment

LGTM, polish if convenient

@baoqiwen
Contributor Author

baoqiwen commented Jan 8, 2026

/re-run all-failed

Contributor

@wanghuancoder wanghuancoder left a comment

LGTM

@baoqiwen
Contributor Author

baoqiwen commented Jan 8, 2026

/re-run all-failed

@baoqiwen
Contributor Author

baoqiwen commented Jan 9, 2026

/re-run all-failed

@lshpku lshpku merged commit f243a9c into PaddlePaddle:develop Jan 9, 2026
114 of 119 checks passed
baoqiwen added a commit to baoqiwen/Paddle that referenced this pull request Jan 9, 2026
baoqiwen added a commit to baoqiwen/Paddle that referenced this pull request Jan 9, 2026
baoqiwen added a commit to baoqiwen/Paddle that referenced this pull request Jan 9, 2026
baoqiwen added a commit to baoqiwen/Paddle that referenced this pull request Jan 12, 2026

4 participants