@amaslenn
Contributor

No description provided.

@bartoldeman
Contributor

Note that UCX from HPCX 2.3 is exactly the same as UCX 1.5.0rc1 (git checkout 02078b9), and its OpenMPI 4.0.0 corresponds to v4.0.0-34-g03cf3e4 (git checkout 03cf3e4), which is a commit on the v4.0.x branch of OpenMPI.

My personal opinion is that if you compile your own OpenMPI, HPCX is not very useful apart from its proprietary components (hcoll and perhaps sharp in particular), especially now that the final UCX 1.5.0 has already been released. OMPI 4.0.1 should be coming soon too, I guess.

@amaslenn
Contributor Author

@bartoldeman I've updated the configs to match HPCX v2.3 exactly. Currently, the idea is to make the open source parts of HPCX available through EasyBuild. Once HPCX v2.4 is available, I'll update the EB configs.

I have a question regarding CI: I checked my configs with --robot=<scripts path> since they are not in the repo yet. Is there a way to do that in CI? Or is there another way to pass verification?

@amaslenn
Contributor Author

@bartoldeman could you please take a look?

@boegel boegel added the new label Mar 19, 2019
Contributor

@akesandgren akesandgren Mar 19, 2019

UCX belongs at the GCCcore level, and it's better to use official releases of it instead of release candidates. See #7924

@boegel
Member

boegel commented Mar 19, 2019

@amaslenn The problem is that we require that easyconfig files are organized in subdirectories based on software name.

So, HPCX_OMPI-4.0.0-foss-2018b.eb should be located in the h/HPCX_OMPI subdirectory; similar for HPCX_UCX.

It may be nicer to use a versionsuffix here instead, to keep things together.

That would involve renaming the HPCX_OMPI-4.0.0-foss-2018b.eb easyconfig file to HPCX-4.0.0-foss-2018b-ompi.eb, changing the software name to HPCX (for all 3 easyconfigs), and using versionsuffix = '-ompi' (and similar for HPCX_UCX).

I think the latter makes more sense, maybe?
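
To make the two options concrete, here is a rough sketch of how the easyconfig filename and its subdirectory follow from the easyconfig parameters. The helper names `eb_filename` and `eb_subdir` are made up for illustration and are not part of EasyBuild; the sketch only assumes the `<first letter>/<software name>/` layout and the `<name>-<version>-<toolchain><versionsuffix>.eb` naming described in this comment.

```python
def eb_filename(name, version, toolchain, versionsuffix=''):
    """Compose '<name>-<version>-<tcname>-<tcversion><versionsuffix>.eb'."""
    tc_label = '%s-%s' % (toolchain['name'], toolchain['version'])
    return '%s-%s-%s%s.eb' % (name, version, tc_label, versionsuffix)


def eb_subdir(name):
    """Subdirectory: lowercase first letter of the software name, then the name."""
    return '%s/%s' % (name[0].lower(), name)


foss = {'name': 'foss', 'version': '2018b'}

# Option 1: keep HPCX_OMPI as the software name.
print(eb_subdir('HPCX_OMPI'), eb_filename('HPCX_OMPI', '4.0.0', foss))

# Option 2: software name HPCX plus versionsuffix = '-ompi'.
print(eb_subdir('HPCX'), eb_filename('HPCX', '4.0.0', foss, versionsuffix='-ompi'))
```

With the second option, all HPCX easyconfigs end up together under `h/HPCX`, which is the "keep things together" benefit mentioned above.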

Contributor

If you're picking this up from OpenMPI directly, then it is not the Mellanox version; it is just a normal OpenMPI.
And OpenMPI belongs at the GCC level, not foss.

@akesandgren
Contributor

This basically looks redundant since it's just a tweaked build of OpenMPI with UCX.

@amaslenn
Contributor Author

@boegel @akesandgren I updated the configs; both UCX and OMPI are now in their standard locations, but I'm still getting a "missing dependencies" error. Could you please help me understand why?

I also applied all the changes you've suggested. Could you please review and let me know if I missed something?


dependencies = [
('UCX', '1.5.0rc1', '-hpcx', ('GCCcore', '8.2.0')),
('OpenMPI', '4.0.x', '-hpcx', ('GCC', '8.2.0-2.31.1'))
Contributor

You're still requesting version 4.0.x, but the OpenMPI easyconfig below is 4.0.0. You have to decide whether you want version 4.0.0 or "4.0.x". Do not call the OpenMPI easyconfig 4.0.0 if it is not using the official 4.0.0 release.
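
The mismatch can be illustrated with a small sketch. It assumes that a dependency is matched on the literal name, version, and versionsuffix strings, so the `x` in `4.0.x` is not treated as a wildcard. The function `mismatched_fields` is a hypothetical helper for this illustration, not an EasyBuild API.

```python
def mismatched_fields(requested, provided):
    """Return the keys whose values differ between two easyconfig specs."""
    return [key for key in sorted(requested) if requested[key] != provided.get(key)]


# What the HPCX easyconfig asks for in its dependencies list.
requested = {'name': 'OpenMPI', 'version': '4.0.x', 'versionsuffix': '-hpcx'}

# What the OpenMPI easyconfig in the PR declares at this point.
provided = {'name': 'OpenMPI', 'version': '4.0.0', 'versionsuffix': '-hpcx'}

# A non-empty result means the dependency cannot be satisfied by this easyconfig.
print(mismatched_fields(requested, provided))
```

Either side can be changed to make the specs agree, but the version string should say 4.0.0 only if the sources are the official 4.0.0 release.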

Contributor Author

Argh, missed that. Thanks! Version renamed to 4.0.x. Is that OK?

'git_config': {
'url': 'https://github.com/open-mpi',
'repo_name': 'ompi',
'commit': '03cf3e4',
Contributor

Repeating what I said in the other comment: do not set the version of this OpenMPI build to 4.0.0 if it is not using the official 4.0.0 release from OpenMPI!

toolchain = {'name': 'GCC', 'version': '8.2.0-2.31.1'}

dependencies = [
('UCX', '1.5.0rc1', '-hpcx', ('GCCcore', '8.2.0')),
Contributor

You only need:
('UCX', '1.5.0rc1', '-hpcx'),
('OpenMPI', '4.0.x', '-hpcx'),

EasyBuild will find the correct UCX and OpenMPI versions by itself.
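
As a simplified picture of why the explicit toolchain element can be dropped, the sketch below walks a hand-written toolchain hierarchy until it finds an easyconfig that provides the dependency. The `SUBTOOLCHAINS` table, the `AVAILABLE` set, and the `resolve` function are assumptions made for this illustration; they are not EasyBuild's actual resolver or data structures.

```python
# Hand-written hierarchy (assumption for this sketch):
# GCC 8.2.0-2.31.1 sits on top of GCCcore 8.2.0.
SUBTOOLCHAINS = {
    ('GCC', '8.2.0-2.31.1'): [('GCC', '8.2.0-2.31.1'), ('GCCcore', '8.2.0')],
}

# Easyconfigs assumed to exist: (name, version, versionsuffix, toolchain).
AVAILABLE = {
    ('UCX', '1.5.0rc1', '-hpcx', ('GCCcore', '8.2.0')),
}


def resolve(dep, parent_toolchain):
    """Return the first toolchain in the hierarchy that provides dep, or None."""
    for toolchain in SUBTOOLCHAINS[parent_toolchain]:
        if dep + (toolchain,) in AVAILABLE:
            return toolchain
    return None


# The three-element form is enough: the GCCcore build of UCX is found
# starting from the GCC toolchain of the easyconfig that depends on it.
print(resolve(('UCX', '1.5.0rc1', '-hpcx'), ('GCC', '8.2.0-2.31.1')))
```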

Contributor Author

Removed.


dependencies = [
('zlib', '1.2.11'),
('UCX', '1.5.0rc1', '-hpcx', ('GCCcore', '8.2.0'))
Contributor

Same here, just ('UCX', '1.5.0rc1', '-hpcx'),

Contributor Author

Removed.

@amaslenn
Contributor Author

@akesandgren looks good?

Contributor

@akesandgren akesandgren left a comment

LGTM

@akesandgren akesandgren added this to the 3.x milestone Mar 21, 2019
@akesandgren
Contributor

Test report by @akesandgren
FAILED
Build succeeded for 0 out of 3 (3 easyconfigs in this PR)
b-an03.hpc2n.umu.se - Linux ubuntu 16.04, Intel(R) Xeon(R) CPU E5-2690 v4 @ 2.60GHz, Python 2.7.12
See https://gist.github.com/c585b499d62f0ac47f169fb3688e8f95 for a full test report.

toolchainopts = {'pic': True}

source_urls = ['https://github.com/openucx/ucx/releases/download/v1.5.0-rc1']
sources = ['ucx-1.5.0.tar.gz']
Contributor

Hmmmm, important note here: since you're pulling down 1.5.0-rc1, the source file should not be called ucx-1.5.0.tar.gz.
You need to do it like this, or there will be checksum conflicts for people who already have the real 1.5.0 downloaded.

sources = [
    {'filename': 'ucx-1.5.0-rc1.tar.gz', 'download_filename': 'ucx-1.5.0.tar.gz'}
]
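
The effect of the `filename` / `download_filename` split can be sketched as follows: the file is fetched under its upstream name and stored locally under a name that cannot collide with the final 1.5.0 tarball. The `download_plan` function is a hypothetical helper written for this illustration; it only mimics the idea and is not how EasyBuild implements downloads.

```python
source_urls = ['https://github.com/openucx/ucx/releases/download/v1.5.0-rc1']
sources = [
    {'filename': 'ucx-1.5.0-rc1.tar.gz', 'download_filename': 'ucx-1.5.0.tar.gz'},
]


def download_plan(source_urls, sources):
    """Yield (url, local_name): fetch by download_filename, store as filename."""
    for src in sources:
        # Fall back to 'filename' when no separate download name is given.
        remote_name = src.get('download_filename', src['filename'])
        for base in source_urls:
            yield base.rstrip('/') + '/' + remote_name, src['filename']


for url, local_name in download_plan(source_urls, sources):
    print(url, '->', local_name)
```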

Contributor Author

Done.

@amaslenn
Contributor Author

@akesandgren could you please re-run tests?

'filename': 'ucx-1.5.0rc1.tar.gz',
'download_filename': 'ucx-1.5.0.tar.gz'
}]
source_urls = ['https://github.com/openucx/ucx/releases/download/v1.5.0-rc1']
Contributor

source_urls should come before the sources line.

Contributor Author

Fixed.

@akesandgren
Contributor

Test report by @akesandgren
FAILED
Build succeeded for 1 out of 3 (3 easyconfigs in this PR)
b-an03.hpc2n.umu.se - Linux ubuntu 16.04, Intel(R) Xeon(R) CPU E5-2690 v4 @ 2.60GHz, Python 2.7.12
See https://gist.github.com/4859b6d7e140cea68d7b1b08614781a5 for a full test report.

@akesandgren
Contributor

Test report by @akesandgren
FAILED
Build succeeded for 1 out of 3 (3 easyconfigs in this PR)
b-an03.hpc2n.umu.se - Linux ubuntu 16.04, Intel(R) Xeon(R) CPU E5-2690 v4 @ 2.60GHz, Python 2.7.12
See https://gist.github.com/03fdc914f7b6386f19b7a9ee298ac7c7 for a full test report.

'commit': '03cf3e4',
},
}]
checksums = ['ef53605823b01499e09844a610614a555724603c72a1bb9a5ca59b35389f67ec']
Contributor

This checksum is never going to be reproducible because of how the tarball is created.

Contributor Author

Why? Is there any way to fix that?

Contributor

GitHub produces the tar file on demand by first making a checkout of that commit, so the directories get timestamps that differ every time.
I don't think there is a way to fix that.

@boegel ??

Contributor Author

@boegel could you please advise?

Member

Sorry for not coming back to this...

For now, there's no way to reliably checksum tarballs that were created via git_config; see also the discussion in easybuilders/easybuild-framework#2727
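
The timestamp effect described in this thread can be reproduced in isolation. The sketch below builds two in-memory tar archives with identical file content but different modification times and compares their SHA-256 checksums. It uses only the Python standard library; the member name `ompi/README` and the function `tar_sha256` are made up for the illustration, and the sketch does not reproduce the actual git_config tarball creation.

```python
import hashlib
import io
import tarfile


def tar_sha256(payload, mtime):
    """Build an uncompressed tar archive in memory and return its SHA-256."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode='w') as tar:
        info = tarfile.TarInfo(name='ompi/README')
        info.size = len(payload)
        info.mtime = mtime  # the only thing that varies between runs
        tar.addfile(info, io.BytesIO(payload))
    return hashlib.sha256(buf.getvalue()).hexdigest()


content = b'same file content in both archives\n'

first = tar_sha256(content, mtime=1553000000)
second = tar_sha256(content, mtime=1553000060)  # one minute later

# Identical content, different timestamps: the archive checksums differ.
print(first == second)
```

Because the timestamp is stored in the tar header, a fresh checkout packed at a different time yields a different archive checksum even though the sources themselves are unchanged.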

@amaslenn
Contributor Author

@akesandgren I switched to the official OMPI release. Could you please review/test?

@akesandgren
Contributor

Test report by @akesandgren
SUCCESS
Build succeeded for 3 out of 3 (3 easyconfigs in this PR)
b-an03.hpc2n.umu.se - Linux ubuntu 16.04, Intel(R) Xeon(R) CPU E5-2690 v4 @ 2.60GHz, Python 2.7.12
See https://gist.github.com/314c2f3576b2bd975a0883247e63574c for a full test report.

Contributor

@akesandgren akesandgren left a comment

LGTM

@akesandgren
Contributor

Going in, thanks @amaslenn!

@akesandgren akesandgren merged commit 11db044 into easybuilders:develop Apr 1, 2019
@amaslenn amaslenn deleted the hpcx branch April 2, 2019 04:59
@easybuilders easybuilders deleted a comment from boegelbot Apr 2, 2019
@easybuilders easybuilders deleted a comment from boegelbot Apr 2, 2019
@easybuilders easybuilders deleted a comment from boegelbot Apr 2, 2019
@easybuilders easybuilders deleted a comment from boegelbot Apr 2, 2019
@boegel boegel modified the milestones: 3.x, 3.9.0 Apr 2, 2019
@boegel
Member

boegel commented Apr 2, 2019

Test report by @boegel
SUCCESS
Build succeeded for 3 out of 3 (3 easyconfigs in this PR)
node2416.golett.os - Linux centos linux 7.6.1810, Intel(R) Xeon(R) CPU E5-2680 v3 @ 2.50GHz, Python 2.7.5
See https://gist.github.com/89d87b5e31456afe76e6ff97a31f6796 for a full test report.


4 participants