-
Notifications
You must be signed in to change notification settings - Fork 94
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Platforms.groups #3793
Platforms.groups #3793
Conversation
1d0fff1
to
3e5a94b
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks ok.
b93bd3f
to
631cfc5
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks fine, but I wonder if it might be worth implementing a non-random selection method now to prevent test flakiness - which is likely to be painful even if it occurs infrequently.
cylc/flow/cfgspec/globalcfg.py
Outdated
@@ -469,7 +469,7 @@ | |||
Conf('hosts', VDR.V_STRING_LIST, ['localhost']) | |||
|
|||
# Platform Groups | |||
with Conf('platform groups'): | |||
with Conf('platform aliases'): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have a vague feeling that we decided in the workshop to use "groups" rather than "aliases", perhaps because "groups" is more descriptive? - the purpose of this is to group a bunch of platforms together to be treated as one. However my notes don't mention that, and I suppose we should have picked this up in the proposal review stage if it mattered. "Aliases" isn't too bad anyway, so no biggie.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, you're right, platform groups are listed in the "upgraded config" here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right then - I'll fix that.
One way of doing it would be to create a platform generator the first time an alias is used, then yield the next true platform name each time the alias is used. Then we could implement any selection method quite easily. Just for the hell of it, I tried "random" and "endless cycle" this way (cycling would enable a non-flaky test, and might be a useful real-world option too?) - it seems to work fine, here's the diff in case you want to consider it. I guess a similar method could be used for host selection within platforms too? diff --git a/cylc/flow/platforms.py b/cylc/flow/platforms.py
index 666889fd5..73e991f42 100644
--- a/cylc/flow/platforms.py
+++ b/cylc/flow/platforms.py
@@ -19,11 +19,33 @@
import random
import re
from copy import deepcopy
+from itertools import cycle
from cylc.flow.exceptions import PlatformLookupError
from cylc.flow.cfgspec.glbl_cfg import glbl_cfg
+platform_generators = {}
+
+
+def random_gen(stuff):
+ while True:
+ yield random.choice(stuff)
+
+
+def choose_platform(alias, platforms, method="random"):
+ method = "cycle" # TEST ME
+ if alias not in platform_generators:
+ if method == "cycle":
+ gen = cycle(platforms)
+ elif method == "random":
+ gen = random_gen(platforms)
+ platform_generators[alias] = gen
+ else:
+ gen = platform_generators[alias]
+ return next(gen)
+
+
def platform_from_name(platform_name=None, platforms=None):
"""
Find out which job platform to use given a list of possible platforms and
@@ -55,9 +77,11 @@ def platform_from_name(platform_name=None, platforms=None):
for platform_name_re in reversed(list(platform_aliases)):
if re.fullmatch(platform_name_re, platform_name):
- platform_name = random.choice(
- platform_aliases[platform_name_re]['platforms']
- )
+ # platform_name is an alias; replace with a real platform
+ platform_name = choose_platform(
+ platform_name,
+ platform_aliases[platform_name_re]['platforms'])
+ print('>>>>>>>>>>', platform_name) # DEBUG
# The list is reversed to allow user-set platforms (which are loaded
# later than site set platforms) to be matched first and override site |
(This would require adding |
This seems like a reasonable template @hjoliver - and a good way around the flaky test. Thanks. I'll get on it. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good.
cylc/flow/platforms.py
Outdated
|
||
if platform_name is None: | ||
platform_data = deepcopy(platforms['localhost']) | ||
platform_data['name'] = 'localhost' | ||
return platform_data | ||
|
||
for platform_name_re in reversed(list(platform_aliases)): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Funny thought, what if someone defines both a platform and a platform group with the same name?
- Should we validate platform groups to prevent this from occurring.
- Otherwise the approach used here (go through platform groups before platforms) seems sensible to me.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are cases where this might be reasonable:
[platform aliases]
# Site user prevents anyone specifying a login node
# (unless they do it explicitly by creating their own platform)
[[hpc\d?]]
platforms = hpc1, hpc2 #, hpc3 is out of service, but everyone's hpc3 tasks work anyway!
...
(although I can't off the top of my head imagine why hpc 1-3 are separate platforms (I suppose that they might have different bs command templates or something...))
...but equally allows wonderful brainaches...
[platform aliases]
[[chocolate]]
platforms = vanilla
[[vanilla]]
platforms = chocolate
On consideration I'd say leave it, but perhaps make sure we document in the section of the docs for site managers.
No, this would be platform selection method - host selection method would be added to the platform... |
0512197
to
a013578
Compare
Nice simple approach, a There's just one barrier to overcome which is that these generators would be long lived? Which means that the list of hosts provided to the generator would be fossilised for the life of the workflow run. This gets in the way of automatically reloading the global config at regular intervals would would enable site admins to, say, remove a login node or drain a system. [platform groups]
[[my_group]]
- platforms = foo, bar, baz
+ platforms = bar, baz Note we will encounter the same issues when it comes to selecting hosts from a platform: [platforms]
[[my_platform]]
- hosts = foo, bar, baz
+ hosts = bar, baz So I think we need to be passing the list of platforms to the selection method with each call. For a selection method like def cycle_selection(_):
index = 0
def _cycle_selection(platforms):
nonlocal index
index += 1
return platforms[index % len(platforms)]
return _cycle_selection
METHODS = {
'cycle': cycle_selection()
}
def choose_platform(alias, platforms, method="random"):
meth = METHODS[method](platforms) There is a future nice-to-have about allowing user-defined selection functions which would require this interface anyway
|
See the related #3800 for thoughts about integrating the existing Cylc host-selection logic into platforms. |
I find myself uncomfortable doing too much jiggery-pokery for a platform selection method I don't think will ever actually be used. I'll have a quick play though. |
Hmm, I kinda thought the generator approach was more elegant. Presumably we could just "reset" a generator by replacement, at the appropriate time: >>> from itertools import cycle
>>> lunch = ['cat', 'dog', 'sausage']
>>> gen = cycle(lunch)
>>> next(gen)
1
>>> next(gen)
2
>>> gen = cycle(lunch) # RESET
>>> next(gen)
1
>>> next(gen)
2 However, I haven't actually thought about the mechanics of how to do that in context, and @wxtim has implemented your approach, which works, - so no biggie 👍 |
74dacf0
to
a013578
Compare
I'm afraid that it all became a bit wierd and complex - @oliver-sanders original fix didn't remember which platform_group was which, and I had trouble getting modified logic working. Since in the current form the test is very, very unlikely (I can make it slower to fail correctly and less likely to give a false negative if you want?) to give a false negative I'd like to commend the PR in its original form to @oliver-sanders and @hjoliver. No reason that we can't have a follow on request. |
Alright, fine with me 👍 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good, tested as working. 👍
Maybe bang up a follow-up issue for better/additional section methods?
FYI: we realised that we are going to require a more advanced platform/host selection interface in order to be able to automatically handle network/system outages. Here's a purely illustrative example to explain what I mean by this: def host_from_platform_group(group):
for platform in group:
for host in platform['hosts']
yield host
def submit_job(job):
for host in host_from_platform_group(job['platform']):
try:
submit(job, host)
break
except (PlatformLookupError, HostSelectError, socket.gaierror, SomeTimeoutError):
continue |
Agreed, but can we get this mergéd? |
Is there a follow-up issue up for this? |
|
Synopsis
Part addresses issue #3686
As per the platforms proposal it was agreed that platform aliases should be available. In their most basic form (here) these allow a task with a platform set by the user to be assigned by the site administrator to any of a set of platforms.
Work Carried Out
[platform aliases][ALIAS]platforms
config to theglobal.cylc
config spec.random.choice
to pick a platform from the list.Work not carried out
Additional infrastructure for more sophisticated platform from alias selection.
Discussion Point
Can anyone suggest a better way to test this. My unit test is horrible, because I cannot see a way to test this functionality nicely.