Skip to content

refactor: refine shell and scheduler arguments #805

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 5 commits into
base: develop
Choose a base branch
from
Open

Conversation

aplowman
Copy link
Contributor

@aplowman aplowman commented Apr 22, 2025

  • move Scheduler.options to QueuedScheduler and rename (with deprecation warning) as directives since it only applies to schedulers that use a job submission system
  • remove Scheduler attributes shell_args and shebang_args (and class variables DEFAULT_SHELL_ARGS and DEFAULT_SHEBANG_ARGS) and replace with Scheduler.shebang_executable, which will typically not be used, and is a way to override the shell's _executable and executable_args in the jobscript shebang line.
  • remove unused DEFAULT_SHELL_EXECUTABLE class variable from Scheduler sub-classes
  • add Shell.executable_args (e.g. for using bash in --login mode) and include this in Shell.executable
  • aims to fix Executables on the PATH are no longer found #762

@gcapes please could you test this PR on CSF3 and check if it resolves your issue?

Recommended changes to configuration files

If the schedulers.[scheduler-name].defaults.shebang_args is used in a configuration file, this will need to be removed and added to the shell's new executable_args configuration, e.g. shells.bash.defaults.executable_args, which should be a list of strings. An example change is shown here.

Recommended changes to workflow template files

The Scheduler argument options (for verbatim scheduler directives) has been renamed directives. The options key can still be used, but a deprecation warning will be printed.

- move `Scheduler.options` to `QueuedScheduler` and rename (with deprecation warning) as `directives` since it only applies to schedulers that use a job submission system
- remove `Scheduler` attributes `shell_args` and `shebang_args` (and class variables `DEFAULT_SHELL_ARGS` and `DEFAULT_SHEBANG_ARGS`) and replace with `Scheduler.shebang_executable`, which will typically not be used, and is a way to override the shell's executable and executable arguments in the jobscript shebang line.
- remove unused `DEFAULT_SHELL_EXECUTABLE` class variable from `Scheduler` sub-classes
- add `Shell.executable_args` and include this in `Shell.executable`
@aplowman aplowman requested a review from a team as a code owner April 22, 2025 15:29
@aplowman aplowman marked this pull request as draft April 22, 2025 15:39
@aplowman aplowman marked this pull request as ready for review April 22, 2025 16:27
Copy link
Collaborator

@dkfellows dkfellows left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Subject to one question, this is fine.

The question: what's the way of handling adding extra bits to a shebang line? The classic example of that is running a script interpreter with PATH searching using: #!/usr/bin/env interpreter though I've also seen an occasional need to pass other options there too. It's more common with Perl scripts I suppose...

In any case, as long as there is a documented answer, whatever it is, I'm happy with this PR.

@gcapes
Copy link
Collaborator

gcapes commented Apr 23, 2025

Thanks Adam.

I made a new nenv and installed hpcflow using pip install git+https://github.com/hpcflow/hpcflow-new.git@fix/shell
then installed matflow.

On CSF3 using SGE it has worked :)

Using SLURM, it doesn't. I updated the config like this:

  manchester-CSF3-slurm:
    invocation:
      environment_setup:
      match:
        hostname: login1.csf3.man.alces.network
    config:
      machine: manchester-CSF3-new
      telemetry: true
      log_file_path: logs/<<app_name>>_v<<app_version>>.log
      environment_sources:
      - /mnt/iusers01/support/mbexegc2/.matflow-new/envs-slurm.yaml
      task_schema_sources: []
      command_file_sources: []
      parameter_sources: []
      default_scheduler: slurm
      default_shell: bash
      schedulers:
        direct:
          defaults: {}
        slurm:
          defaults: {}
          partitions:
            serial:
              num_cores:
              - 1
              - 1
              - 1
            multicore:
              num_nodes:
              - 1
              - 1
              - 1
              num_cores:
              - 2
              - 1
              - 168
              num_cores_per_node:
              - 2
              - 1
              - 168
              parallel_modes:
              - distributed
              - shared
              - hybrid
            multinode:
              num_nodes:
              - 2
              - 1
              - 
              num_cores:
              - 80
              - 40
              - 
              num_cores_per_node:
              - 40
              - 40
              - 40
              parallel_modes:
              - distributed
              - hybrid
      shells:
        bash:
          defaults:
            executable_args: [--login]

But the workflow wouldn't submit.

$ matflow go wheresmatlab.yaml 
Traceback (most recent call last):
  File "/net/scratch/mbexegc2/matflow-demo-workflows/.venv-hpcflowlocal/bin/matflow", line 8, in <module>
    sys.exit(cli())
  File "/net/scratch/mbexegc2/matflow-demo-workflows/.venv-hpcflowlocal/lib64/python3.9/site-packages/click/core.py", line 1161, in __call__
    return self.main(*args, **kwargs)
  File "/net/scratch/mbexegc2/matflow-demo-workflows/.venv-hpcflowlocal/lib64/python3.9/site-packages/click/core.py", line 1082, in main
    rv = self.invoke(ctx)
  File "/net/scratch/mbexegc2/matflow-demo-workflows/.venv-hpcflowlocal/lib64/python3.9/site-packages/click/core.py", line 1697, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/net/scratch/mbexegc2/matflow-demo-workflows/.venv-hpcflowlocal/lib64/python3.9/site-packages/click/core.py", line 1443, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/net/scratch/mbexegc2/matflow-demo-workflows/.venv-hpcflowlocal/lib64/python3.9/site-packages/click/core.py", line 788, in invoke
    return __callback(*args, **kwargs)
  File "/net/scratch/mbexegc2/matflow-demo-workflows/.venv-hpcflowlocal/lib64/python3.9/site-packages/hpcflow/sdk/cli.py", line 253, in make_and_submit_workflow
    out = app.make_and_submit_workflow(
  File "/net/scratch/mbexegc2/matflow-demo-workflows/.venv-hpcflowlocal/lib64/python3.9/site-packages/hpcflow/sdk/app.py", line 1644, in <lambda>
    return lambda *args, **kwargs: func(*args, **kwargs)
  File "/net/scratch/mbexegc2/matflow-demo-workflows/.venv-hpcflowlocal/lib64/python3.9/site-packages/hpcflow/sdk/app.py", line 2882, in _make_and_submit_workflow
    wk = self._make_workflow(
  File "/net/scratch/mbexegc2/matflow-demo-workflows/.venv-hpcflowlocal/lib64/python3.9/site-packages/hpcflow/sdk/app.py", line 2762, in _make_workflow
    wk = self.Workflow.from_file(
  File "/net/scratch/mbexegc2/matflow-demo-workflows/.venv-hpcflowlocal/lib64/python3.9/site-packages/hpcflow/sdk/log.py", line 81, in wrapper
    return func(*args, **kwargs)
  File "/net/scratch/mbexegc2/matflow-demo-workflows/.venv-hpcflowlocal/lib64/python3.9/site-packages/hpcflow/sdk/core/workflow.py", line 1293, in from_file
    template = cls._app.WorkflowTemplate.from_file(
  File "/net/scratch/mbexegc2/matflow-demo-workflows/.venv-hpcflowlocal/lib64/python3.9/site-packages/hpcflow/sdk/log.py", line 81, in wrapper
    return func(*args, **kwargs)
  File "/net/scratch/mbexegc2/matflow-demo-workflows/.venv-hpcflowlocal/lib64/python3.9/site-packages/hpcflow/sdk/core/workflow.py", line 696, in from_file
    return cls.from_YAML_file(path_, variables=variables)
  File "/net/scratch/mbexegc2/matflow-demo-workflows/.venv-hpcflowlocal/lib64/python3.9/site-packages/hpcflow/sdk/log.py", line 81, in wrapper
    return func(*args, **kwargs)
  File "/net/scratch/mbexegc2/matflow-demo-workflows/.venv-hpcflowlocal/lib64/python3.9/site-packages/hpcflow/sdk/core/workflow.py", line 622, in from_YAML_file
    return cls._from_data(data)
  File "/net/scratch/mbexegc2/matflow-demo-workflows/.venv-hpcflowlocal/lib64/python3.9/site-packages/hpcflow/sdk/log.py", line 81, in wrapper
    return func(*args, **kwargs)
  File "/net/scratch/mbexegc2/matflow-demo-workflows/.venv-hpcflowlocal/lib64/python3.9/site-packages/hpcflow/sdk/core/workflow.py", line 539, in _from_data
    ts_dat, shared_data=cls._app._shared_data
  File "/net/scratch/mbexegc2/matflow-demo-workflows/.venv-hpcflowlocal/lib64/python3.9/site-packages/hpcflow/sdk/app.py", line 1697, in _shared_data
    return cast("Mapping[str, Any]", self.template_components)
  File "/net/scratch/mbexegc2/matflow-demo-workflows/.venv-hpcflowlocal/lib64/python3.9/site-packages/hpcflow/sdk/app.py", line 1691, in template_components
    self._load_template_components()
  File "/net/scratch/mbexegc2/matflow-demo-workflows/.venv-hpcflowlocal/lib64/python3.9/site-packages/hpcflow/sdk/log.py", line 81, in wrapper
    return func(*args, **kwargs)
  File "/net/scratch/mbexegc2/matflow-demo-workflows/.venv-hpcflowlocal/lib64/python3.9/site-packages/hpcflow/sdk/app.py", line 1765, in _load_template_components
    for env_j in read_YAML_file(e_path):
  File "/net/scratch/mbexegc2/matflow-demo-workflows/.venv-hpcflowlocal/lib64/python3.9/site-packages/hpcflow/sdk/log.py", line 81, in wrapper
    return func(*args, **kwargs)
  File "/net/scratch/mbexegc2/matflow-demo-workflows/.venv-hpcflowlocal/lib64/python3.9/site-packages/hpcflow/sdk/core/utils.py", line 457, in read_YAML_file
    return read_YAML_str(yaml_str, typ=typ, variables=variables)
  File "/net/scratch/mbexegc2/matflow-demo-workflows/.venv-hpcflowlocal/lib64/python3.9/site-packages/hpcflow/sdk/log.py", line 81, in wrapper
    return func(*args, **kwargs)
  File "/net/scratch/mbexegc2/matflow-demo-workflows/.venv-hpcflowlocal/lib64/python3.9/site-packages/hpcflow/sdk/core/utils.py", line 435, in read_YAML_str
    return yaml.load(yaml_str)
  File "/net/scratch/mbexegc2/matflow-demo-workflows/.venv-hpcflowlocal/lib64/python3.9/site-packages/ruamel/yaml/main.py", line 451, in load
    return constructor.get_single_data()
  File "/net/scratch/mbexegc2/matflow-demo-workflows/.venv-hpcflowlocal/lib64/python3.9/site-packages/ruamel/yaml/constructor.py", line 114, in get_single_data
    node = self.composer.get_single_node()
  File "_ruamel_yaml.pyx", line 706, in ruamel.yaml.clib._ruamel_yaml.CParser.get_single_node
  File "_ruamel_yaml.pyx", line 724, in ruamel.yaml.clib._ruamel_yaml.CParser._compose_document
  File "_ruamel_yaml.pyx", line 773, in ruamel.yaml.clib._ruamel_yaml.CParser._compose_node
  File "_ruamel_yaml.pyx", line 850, in ruamel.yaml.clib._ruamel_yaml.CParser._compose_sequence_node
  File "_ruamel_yaml.pyx", line 775, in ruamel.yaml.clib._ruamel_yaml.CParser._compose_node
  File "_ruamel_yaml.pyx", line 889, in ruamel.yaml.clib._ruamel_yaml.CParser._compose_mapping_node
  File "_ruamel_yaml.pyx", line 773, in ruamel.yaml.clib._ruamel_yaml.CParser._compose_node
  File "_ruamel_yaml.pyx", line 850, in ruamel.yaml.clib._ruamel_yaml.CParser._compose_sequence_node
  File "_ruamel_yaml.pyx", line 775, in ruamel.yaml.clib._ruamel_yaml.CParser._compose_node
  File "_ruamel_yaml.pyx", line 891, in ruamel.yaml.clib._ruamel_yaml.CParser._compose_mapping_node
  File "_ruamel_yaml.pyx", line 904, in ruamel.yaml.clib._ruamel_yaml.CParser._parse_next_event
ruamel.yaml.parser.ParserError: while parsing a block mapping
  in "<unicode string>", line 27, column 5
did not find expected key
  in "<unicode string>", line 38, column 5

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Executables on the PATH are no longer found
3 participants