Member

@SigureMo SigureMo commented Jan 3, 2026

PR Category

Execute Infrastructure

PR Types

Devs

Description

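Building on the approach in #77167, this PR attempts to optimize the build architecture across all pipelines. It uses an on-machine local L1 cache plus a cross-machine L2 cache on CFS to reduce build time. Each machine needs only about 50 GB of storage (the cache is shared across pipelines, so 50 GB in total is enough).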

Original architecture:

l1-ccache drawio

New architecture:

l2-ccache drawio
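
The two-tier layout described above can be sketched with ccache's own settings. This is a minimal illustration, not the configuration used in this PR: the directory paths and the helper name `two_tier_ccache_env` are hypothetical, and it assumes a ccache release that supports remote storage (the `CCACHE_REMOTE_STORAGE` setting with the `file:` backend).

```python
def two_tier_ccache_env(l1_dir, l2_dir, l1_max_size="50G"):
    """Build environment variables for an L1 local cache plus an L2 shared cache.

    l1_dir: directory on the machine's local disk (hypothetical path).
    l2_dir: directory on the shared file system, e.g. a CFS mount (hypothetical path).
    """
    return {
        # L1: ccache's local storage, capped at the per-machine budget.
        "CCACHE_DIR": l1_dir,
        "CCACHE_MAXSIZE": l1_max_size,
        # L2: ccache's file-based remote storage, consulted when L1 misses.
        "CCACHE_REMOTE_STORAGE": f"file:{l2_dir}",
    }


# Example with assumed paths; the real mount points are not given in the PR text.
env = two_tier_ccache_env("/home/ccache_l1", "/mnt/cfs/ccache_l2")
```

The resulting dictionary could be merged into the build's environment, for instance with `subprocess.run(cmd, env={**os.environ, **env})`.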

First build of the coverage pipeline (cold cache), 5h30m:

Cacheable calls:   4750 / 4750 (100.0%)
  Hits:               0 / 4750 ( 0.00%)
    Direct:           0
    Preprocessed:     0
  Misses:          4750 / 4750 (100.0%)
Local storage:
  Cache size (GB):  7.6 / 50.0 (15.26%)
  Hits:               0 / 4750 ( 0.00%)
  Misses:          4750 / 4750 (100.0%)
Remote storage:
  Hits:               0 / 4750 ( 0.00%)
  Misses:          4750 / 4750 (100.0%)

Repeated build of the coverage pipeline, 38 min, lower than the 55 min of the L1 CFS setup:

Cacheable calls:   4750 / 4750 (100.0%)
  Hits:            4744 / 4750 (99.87%)
    Direct:        4744 / 4744 (100.0%)
    Preprocessed:     0 / 4744 ( 0.00%)
  Misses:             6 / 4750 ( 0.13%)
Local storage:
  Cache size (GB):  7.7 / 50.0 (15.39%)
  Hits:            4744 / 4750 (99.87%)
  Misses:             6 / 4750 ( 0.13%)
Remote storage:
  Hits:               0 /    6 ( 0.00%)
  Misses:             6 /    6 (100.0%)
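
To make the numbers above easier to compare, the following sketch parses the `N / M` counters from the repeated-build statistics and recomputes the hit rate and the time saving. The helper names (`parse_counts`, `hit_rate`, `percent_reduction`) are hypothetical and the parser only handles the line layout shown in this PR; it is not a general ccache statistics parser.

```python
import re

# Statistics of the repeated coverage build, copied from the PR description.
WARM_BUILD_STATS = """\
Cacheable calls:   4750 / 4750 (100.0%)
  Hits:            4744 / 4750 (99.87%)
    Direct:        4744 / 4744 (100.0%)
    Preprocessed:     0 / 4744 ( 0.00%)
  Misses:             6 / 4750 ( 0.13%)
Local storage:
  Cache size (GB):  7.7 / 50.0 (15.39%)
  Hits:            4744 / 4750 (99.87%)
  Misses:             6 / 4750 ( 0.13%)
Remote storage:
  Hits:               0 /    6 ( 0.00%)
  Misses:             6 /    6 (100.0%)
"""

# Matches lines of the form "Label:  N / M ..." with integer N and M.
_STAT_LINE = re.compile(r"^\s*(?P<label>[^:]+):\s+(?P<num>\d+)\s*/\s*(?P<den>\d+)")


def parse_counts(stats_text):
    """Return {(section, label): (numerator, denominator)} for integer counters."""
    counts = {}
    section = "Overall"  # counters that appear before any section header
    for line in stats_text.splitlines():
        stripped = line.strip()
        if stripped.endswith(":"):  # e.g. "Local storage:"
            section = stripped[:-1]
            continue
        match = _STAT_LINE.match(line)
        if match:  # non-integer lines such as "Cache size (GB)" are skipped
            key = (section, match["label"].strip())
            counts[key] = (int(match["num"]), int(match["den"]))
    return counts


def hit_rate(pair):
    """Percentage of hits for a (numerator, denominator) pair."""
    num, den = pair
    return 100.0 * num / den if den else 0.0


def percent_reduction(before_min, after_min):
    """Relative time saving in percent when going from before_min to after_min."""
    return 100.0 * (before_min - after_min) / before_min


counts = parse_counts(WARM_BUILD_STATS)
print(round(hit_rate(counts[("Local storage", "Hits")]), 2))
print(round(percent_reduction(55, 38), 1))
```

In these statistics the remote storage is queried 6 times, which equals the number of local misses, so the L2 cache is consulted only when L1 misses.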

TODOs

  • Add a pipeline that periodically cleans up the L2 ccache
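
One possible shape for this cleanup, sketched under assumptions since the PR leaves it as a TODO: delete files in the shared cache directory whose modification time is older than a cutoff. The function name `evict_stale_entries`, the 30-day threshold in the usage note, and the directory layout are all hypothetical. The approach is only meaningful if entry mtimes reflect recent use.

```python
import os
import time
from pathlib import Path


def evict_stale_entries(cache_root, max_age_days, now=None):
    """Delete files under cache_root not modified within max_age_days.

    Returns the list of removed paths. Directories are left in place.
    """
    now = time.time() if now is None else now
    cutoff = now - max_age_days * 86400
    removed = []
    # Materialize the listing first so deletions do not disturb the walk.
    for path in list(Path(cache_root).rglob("*")):
        try:
            if path.is_file() and path.stat().st_mtime < cutoff:
                path.unlink()
                removed.append(path)
        except FileNotFoundError:
            # Another machine sharing the cache may have removed the file already.
            continue
    return removed
```

A scheduled job could call, for example, `evict_stale_entries("/mnt/cfs/ccache_l2", 30)`, where the path is an assumed mount point.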

Copilot AI review requested due to automatic review settings January 3, 2026 09:08

paddle-bot bot commented Jan 3, 2026

Your PR has been submitted. Thanks for your contribution to this open-source project!
Please watch for the automated CI test results. See the Paddle CI Manual for details.

@SigureMo SigureMo marked this pull request as draft January 3, 2026 09:08
Contributor

Copilot AI left a comment

Pull request overview

This PR optimizes the CI build pipeline by implementing a two-tier ccache architecture: L1 local cache on individual machines (50GB) and L2 shared cache on CFS (Common File System) for cross-machine sharing.

Key changes:

  • Migrated from single-tier ccache (200GB) to L1 (50GB local) + L2 (shared CFS) architecture
  • Updated runner group from GZ_BD-CPU to coverage-build-l2-debug for dedicated L2-optimized infrastructure
  • Modified Docker volume mounts to support the new cache directory structure
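
As a rough illustration of the volume-mount change listed above, the sketch below assembles `docker run` arguments that expose both cache tiers inside the build container. All paths and the helper name `cache_mount_args` are hypothetical; the actual mounts in the workflow files are not reproduced here.

```python
def cache_mount_args(
    l1_host_dir,
    l2_host_dir,
    l1_container_dir="/root/.ccache",
    l2_container_dir="/root/.ccache_l2",
):
    """Return docker run arguments mounting the L1 and L2 cache directories."""
    return [
        # Bind-mount the machine-local L1 cache.
        "-v", f"{l1_host_dir}:{l1_container_dir}",
        # Bind-mount the shared L2 cache (for example a CFS mount on the host).
        "-v", f"{l2_host_dir}:{l2_container_dir}",
        # Point ccache at the two tiers from inside the container.
        "-e", f"CCACHE_DIR={l1_container_dir}",
        "-e", f"CCACHE_REMOTE_STORAGE=file:{l2_container_dir}",
    ]


# Assumed host paths, for illustration only.
args = cache_mount_args("/home/ccache_l1", "/mnt/cfs/ccache_l2")
```

The list can be spliced into a command such as `["docker", "run", *args, image, ...]`.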

@SigureMo SigureMo marked this pull request as ready for review January 6, 2026 20:54
@SigureMo SigureMo changed the title from "[CI] Use L2 ccache optimize build time" to "[CI] Use L2 ccache reduce build time in all workflows" Jan 7, 2026
@SigureMo SigureMo merged commit ad925a3 into PaddlePaddle:develop Jan 7, 2026
82 of 84 checks passed
@SigureMo SigureMo deleted the sot/use-l2-ccache-optimize-build-time branch January 7, 2026 03:01
2 participants