-
Notifications
You must be signed in to change notification settings - Fork 763
HPCX: add configs to build OSS part of HPCX #7725
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
Note that UCX from HPCX 2.3 is exactly the same as UCX 1.5.0rc1 (git checkout 02078b9), and OpenMPI 4.0.0 corresponds to v4.0.0-34-g03cf3e4, ie. git checkout 03cf3e4, which is a checkout of the v4.0.x branch of OpenMPI. My personal opinion is that if you compile your own OpenMPI, then HPCX is not so useful except exactly the proprietary components (hcoll and perhaps sharp in particular). Especially now since UCX 1.5.0 without rc has been released already. OMPI 4.0.1 should be coming soon too I guess. |
|
@bartoldeman I've update configs to match HPCX v2.3 exactly. Currently, the idea is to have open source parts of HPCX available through EasyBuild system. Once HPCX v2.4 available, I'll create an update for EB configs. Have a question regarding CI: I checked my configs with |
|
@bartoldeman could you please take a look? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
UCX belongs at the GCCcore level, and it's better to use official releases of it instead of release candidates. See #7924
|
@amaslenn The problem is that we require that easyconfig files are organized in subdirectories based on software name. So, It may be nicer to use a That would involve renaming the I think the latter makes more sense, maybe? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you're picking this up from OpenMPI directly then it is not the Mellanox version and it is just a normal OpenMPI
And OpenMPI belongs at the GCC level not foss.
|
This basically looks redundant since it's just a tweaked build of OpenMPI with UCX. |
|
@boegel @akesandgren I updated configs, now both UCX and OMPI are present in standard locations, but I'm still getting "missing dependencies" issue. Could you please help me understand why? I also applied all the changes you've suggested. Could you please review and let me know if I missed something? |
|
|
||
| dependencies = [ | ||
| ('UCX', '1.5.0rc1', '-hpcx', ('GCCcore', '8.2.0')), | ||
| ('OpenMPI', '4.0.x', '-hpcx', ('GCC', '8.2.0-2.31.1')) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You're still requesting 4.0.x version, but the OpenMPI easyconfig below is 4.0.0. You have to decide if you want a 4.0.0 version or the "4.0.x" version. Do not call the OpenMPI easyconfig 4.0.0 if it is not using the official 4.0.0 release.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Argh, missed that. Thanks! Version renamed to 4.0.x. Is OK?
| 'git_config': { | ||
| 'url': 'https://github.com/open-mpi', | ||
| 'repo_name': 'ompi', | ||
| 'commit': '03cf3e4', |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And repeating what i said in the other comment. Do not set the version of this OpenMPI build to 4.0.0 if it is not using the official 4.0.0 release from OpenMPI!
| toolchain = {'name': 'GCC', 'version': '8.2.0-2.31.1'} | ||
|
|
||
| dependencies = [ | ||
| ('UCX', '1.5.0rc1', '-hpcx', ('GCCcore', '8.2.0')), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You only need
('UCX', '1.5.0rc1', '-hpcx'),
('OpenMPI', '4.0.x', '-hpcx'),
It will find the correct UCX, OpenMPI version by itself.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Removed.
|
|
||
| dependencies = [ | ||
| ('zlib', '1.2.11'), | ||
| ('UCX', '1.5.0rc1', '-hpcx', ('GCCcore', '8.2.0')) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same here, just ('UCX', '1.5.0rc1', '-hpcx'),
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Removed.
|
@akesandgren looks good? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
|
Test report by @akesandgren |
| toolchainopts = {'pic': True} | ||
|
|
||
| source_urls = ['https://github.com/openucx/ucx/releases/download/v1.5.0-rc1'] | ||
| sources = ['ucx-1.5.0.tar.gz'] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmmmm, important note here, since you're pulling down 1.5.0-rc1 the source should file not be called ucx-1.5.0.tar.gz.
You need to do it like this or there will be checksum conflicts for people who already have the real 1.5.0 downloaded.
sources = [
{'filename': 'ucx-1.5.0-rc1.tar.gz', 'download_filename', 'ucx-1.5.0.tar.gz'}
]
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
|
@akesandgren could you please re-run tests? |
| 'filename': 'ucx-1.5.0rc1.tar.gz', | ||
| 'download_filename': 'ucx-1.5.0.tar.gz' | ||
| }] | ||
| source_urls = ['https://github.com/openucx/ucx/releases/download/v1.5.0-rc1'] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Source_urls should come before sources line
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed.
|
Test report by @akesandgren |
|
Test report by @akesandgren |
| 'commit': '03cf3e4', | ||
| }, | ||
| }] | ||
| checksums = ['ef53605823b01499e09844a610614a555724603c72a1bb9a5ca59b35389f67ec'] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This checksum is never going to be the same due to how it is created.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why? Is there any way to fix that?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
github produces the tar file on demand by first making a checkout of that commit causing directories to have timestamps that differs every time.
Don't think there is a way to fix that.
@boegel ??
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@boegel could you please advise?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry for not coming back to this...
For now, there's no way to reliable checksum tarballs that were created via git_config, see also discussion in easybuilders/easybuild-framework#2727
|
@akesandgren I switched to use official OMPI release. Could you please review/test? |
|
Test report by @akesandgren |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
|
Going in, thanks @amaslenn! |
|
Test report by @boegel |
No description provided.