-
Notifications
You must be signed in to change notification settings - Fork 1
Enable RBE for builds and test #151
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
e611e5f to
54661a1
Compare
| exec_properties = { | ||
| "container-image": "docker://rocm/tensorflow-build@sha256:7cd444ac48657fee2f5087fbda7766266704d3f8fb2299f681952ae4eabed060", | ||
| "OSFamily": "Linux", | ||
| "Pool": "linux_x64_gpu", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please remove this Pool name. It seems in our setup we do not need it. It should be default then engflow will redirect test actions automatically into the gpu pool. Also we could remove the gpu suffix.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That requires this change, correct? openxla/xla#32954 How do you plan on getting that change into a branch that we can actually use? I'm 90% sure that using openxla/xla's main branch is going to bust our master branch. I would say to just land it in rocm-jaxlib-v0.7.1, but we really shouldn't be putting anything other than bug fixes in there.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I thought we could merge these kind of things into 0.7.1. It is infrastructure change not really affecting functionality.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not a fan, but I guess that's fine for now. It pollutes a branch that should be frozen. Really wish you guys had separate downstream branches for development and release.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll update my PR once that's in 0.7.1. I don't want to merge a change to our master that's going to bust our build because the RBE cluster and it's pools are screwy.
0de3314 to
9645465
Compare
9645465 to
75f5f9a
Compare
Motivation
Add the proper Bazel configuration to do remote builds with Bazel's RBE feature. This should greatly improve build times in CI.
Test Plan
Ran the
build/ci_build.pyscript locally and confirmed that it'll produce wheels with ROCm 7