Update examples #12

Merged: 7 commits, Jul 22, 2025
31 changes: 30 additions & 1 deletion docs/source/installation.rst
@@ -73,7 +73,7 @@
via the ``environment_sources`` key.
Once this has been done, your environment file can be opened using ``{{ app_module }} open env-source``.
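
For reference, a config entry registering an environments file might look roughly like the sketch below; this is a hedged illustration only (the file path is a placeholder, and the exact layout of the config file should be checked against your installation's configuration documentation):

.. code-block:: yaml

   # Hypothetical config snippet; replace the path with the location of your
   # own environments file.
   environment_sources:
     - /full/path/to/my_environments.yaml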

Below is an example environments file that defines an environment for running Pyton scripts.
Below is an example environments file that defines an environment for running Python scripts.
Domain-specific tools can be added to the environments file as required, each with their own
setup instructions for loading that tool on your machine.

@@ -88,3 +88,32 @@
start: 1
stop: 32
parallel_mode: null

Note also that any {{ app_name }} environment which activates a Python virtual environment
as part of its ``setup`` must also have the {{ app_name }} Python package installed,
and it must be the same version as the one used to submit the workflow.
In practice, this is most easily achieved by creating a single Python virtual environment
and using it both in each of these {{ app_name }} environments and to submit workflows.
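
As an illustration only, such an environment definition might look roughly like the sketch below (the environment name, the venv path and the omitted ``executables`` section are placeholders; follow the Python example above for the full schema):

.. code-block:: yaml

   # Hypothetical environments-file entry: the venv activated in ``setup`` must
   # have the same version of the workflow package installed as the one used to
   # submit the workflow.
   - name: python_env
     setup: |
       source /full/path/to/my_venv/bin/activate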

Tips for SLURM
**************

{{ app_name }} currently has a fault such that it does not select a SLURM partition
based on the resources requested in your workflow file.
As such, users must define the partition manually in their workflow files, e.g.

.. code-block:: yaml

resources:
any:
scheduler_args:
directives:
--time: 00:30:00
--partition: serial

Note that for many SLURM schedulers, a time limit must also be specified, as shown above.

A `default time limit and partition <https://github.com/hpcflow/matflow-configs/blob/main/manchester-CSF3.yaml#L21-L25>`_
can be set in the config file; these are used for tasks that don't set them explicitly
in a ``resources`` block like the example above.
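
As a rough sketch only (the ``schedulers``/``defaults`` key names and the values shown here are assumptions; the linked config file is the authoritative reference), such a default might look like:

.. code-block:: yaml

   # Hypothetical config snippet; check the linked manchester-CSF3.yaml for the
   # exact key names used by your installation.
   schedulers:
     slurm:
       defaults:
         directives:
           --time: 00:30:00
           --partition: serial
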
@@ -1,6 +1,6 @@
template_components:
task_schemas:
- objective: process_some_data
- objective: process_data
inputs:
- parameter: input_data
outputs:
@@ -10,21 +10,21 @@ template_components:
- input_file: my_input_file
from_inputs:
- input_data
script: <<script:/path/to/generate_input_file.py>>
script: <<script:/full/path/to/generate_input_file.py>>
environments:
- scope:
type: any
environment: python_env
script_exe: python_script
script: <<script:/path/to/process_input_file.py>>
script: <<script:/full/path/to/process_input_file.py>>
save_files:
- processed_file
output_file_parsers:
parsed_output:
from_files:
- my_input_file
- processed_file
script: <<script:/path/to/parse_output.py>>
script: <<script:/full/path/to/parse_output.py>>
save_files:
- parsed_output

@@ -33,23 +33,25 @@ template_components:
- parameter: input_data
- parameter: path
actions:
- script: <<script:/path/to/generate_input_file.py>>
- script: <<script:/full/path/to/generate_input_file.py>>
script_data_in: direct
script_exe: python_script
save_files:
save_files:
- my_input_file
environments:
- scope:
type: any
environment: python_env
- script: <<script:/path/to/process_input_file.py>>
requires_dir: true
- script: <<script:/full/path/to/process_input_file.py>>
script_exe: python_script
environments:
- scope:
type: any
environment: python_env
save_files:
- processed_file
requires_dir: true

command_files:
- label: my_input_file
@@ -64,7 +66,7 @@ template_components:


tasks:
- schema: process_some_data
- schema: process_data
inputs:
input_data: [1, 2, 3, 4]
- schema: process_data_without_input_file_generator
124 changes: 19 additions & 105 deletions docs/source/user/getting_started/advanced_workflow_concepts.rst
@@ -12,13 +12,10 @@ Requesting resources can be done using a ``resources`` block, either for the who

resources:
any:
scheduler: sge # Setting the scheduler is not normally needed because a
# `default_scheduler` will be set in the config file.
shell_args:
executable_args: ["--login"]
scheduler_args:
directives:
-l: short
--time: 1:00:00
--partition: multicore

or at the task level

@@ -67,22 +64,20 @@ resources, and will run the command which matches those resources.
There are lots of :ref:`resource options <reference/_autosummary/{{ app_module }}.ResourceSpec:{{ app_module }}.ResourceSpec>`
available that can be requested.

Scheduler arguments can be passed like this e.g. to target high memory nodes:
Scheduler arguments can be passed like this e.g. to set a time limit of 1 hour

.. code-block:: yaml

resources:
any:
num_cores: 10
SGE_parallel_env: smp.pe
scheduler_args:
directives:
-l: mem512
any:
scheduler_args:
directives:
--time: 1:00:00
num_cores: 10

Anything specified under ``directives`` is passed directly to the scheduler in the jobscript (i.e. it isn't processed by {{ app_name }} at all).
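
For example, with a SLURM scheduler the directives below are forwarded verbatim into the generated jobscript header (roughly as ``#SBATCH`` lines; the exact rendering depends on the jobscript template, so treat this as a hedged illustration):

.. code-block:: yaml

   resources:
     any:
       scheduler_args:
         directives:
           # Forwarded unmodified, e.g. producing a header line along the
           # lines of "#SBATCH --time=1:00:00".
           --time: 1:00:00
           --partition: multicore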

If you have set resource options at the top level (for the whole workflow), but would like to
"unset" them for a particular task, you can pass an empty dictionary:

.. code-block:: yaml
@@ -93,7 +88,6 @@ you can pass an empty dictionary:
num_cores: 16
scheduler_args:
directives: {} # "Clear" any previous directives which have been set.
inputs:


Task sequences
@@ -141,41 +135,9 @@ Then whichever parameters are linked with the group in the task schema will be r

Here is an example workflow using sequences and groups that you might wish to run to solidify your understanding

.. code-block:: yaml

# groups_workflow.yaml

template_components:
task_schemas:
- objective: s1
inputs:
- parameter: p1
outputs:
- parameter: p2
actions:
- commands:
- command: echo $(( <<parameter:p1>> + 1 )) # This is printed to stdout
- command: echo $(( <<parameter:p1>> + 1 )) # This is captured as p2
stdout: <<int(parameter:p2)>>
- objective: s2
inputs:
- parameter: p2
group: my_group
outputs:
- parameter: p3
actions:
- commands:
- command: echo <<parameter:p2>> # This one is printed to stdout
- command: echo $(( <<sum(parameter:p2)>> )) # This is captured as p3
stdout: <<int(parameter:p3)>>
tasks:
- schema: s1
sequences:
- path: inputs.p1
values: [1, 2]
groups:
- name: my_group
- schema: s2
.. literalinclude:: groups_workflow.yaml
:language: YAML


Task schema shortcuts
@@ -243,63 +205,15 @@ This is because an output file parser only has one named output parameter,
so a dictionary isn't needed to distinguish different output parameters.

The :ref:`previous example <command_files_example_workflow>` has been reworked and
expanded below to demonstrate ``input_file_generators`` and ``output_file_parsers``.
expanded below to demonstrate ``input_file_generators`` and ``output_file_parsers``,
along with the alternative code which would be needed to achieve the same result
as the input file generator:

.. code-block:: yaml
.. literalinclude:: advanced_workflow.yaml
:language: yaml

# workflow.yaml

template_components:
task_schemas:
- objective: process_some_data
inputs:
- parameter: input_data
outputs:
- parameter: parsed_output
actions:
- input_file_generators:
- input_file: my_input_file
from_inputs:
- input_data
script: <<script:/full/path/to/generate_input_file.py>>
environments:
- scope:
type: any
environment: python_env
script_exe: python_script
script: <<script:/full/path/to/process_input_file.py>>
save_files:
- processed_file
output_file_parsers:
parsed_output:
from_files:
- my_input_file
- processed_file
script: <<script:/full/path/to/parse_output.py>>
save_files:
- parsed_output

This workflow uses the same python scripts as before, with the addition of

.. code-block:: python

# parse_output.py

import json
def parse_output(my_input_file: str, processed_file: str):
"""Do some post-processing of data files.

In this instance, we're just making a dictionary containing both the input
and output data.
"""
with open(my_input_file, "r") as f:
input_data = json.load(f)
with open(processed_file, "r") as f:
processed_data = json.load(f)

combined_data = {"input_data": input_data, "output_data": processed_data}
# Save file so we can look at the data
with open("parsed_output.json", "w") as f:
json.dump(combined_data, f, indent=2)

return {"parsed_output": combined_data}
This workflow uses the same Python scripts as before, with the addition of ``parse_output.py``:

.. literalinclude:: parse_output.py
:language: python
41 changes: 41 additions & 0 deletions docs/source/user/getting_started/command_files_example.yaml
@@ -0,0 +1,41 @@
# workflow.yaml
template_components:
task_schemas:
- objective: process_data
inputs:
- parameter: input_data
- parameter: path
default_value: input_file.json
actions:
- script: <<script:/path/to/generate_input_file.py>>
script_data_in: direct
script_exe: python_script
    save_files: # A copy of any command files listed here will be saved in the artifacts directory
- my_input_file
environments:
- scope:
type: any
environment: python_env
- script: <<script:/path/to/process_input_file.py>>
script_exe: python_script
environments:
- scope:
type: any
environment: python_env
save_files:
- processed_file

command_files:
- label: my_input_file
name:
name: input_file.json
- label: processed_file
name:
name: processed_file.json


tasks:
- schema: process_data
inputs:
input_data: [1, 2, 3, 4]
path: input_file.json