[WIP] Pulsar client for ARC #401

Draft
wants to merge 15 commits into
base: master

Conversation

kysrpex
Contributor

@kysrpex kysrpex commented Jul 3, 2025

Implement a Pulsar client that runs jobs on computing infrastructure using the Advanced Resource Connector (ARC) middleware.

Requires #399, jmchilton/simple-job-files#1, jmchilton/simple-job-files#2, jmchilton/simple-job-files#3.

mvdbeek and others added 9 commits February 3, 2025 15:09
This should be the general strategy for collecting input and output
files for ARC, DIRAC, AWS Batch, etc.
Change base image from `conda/miniconda3` (based off Debian Stretch) to `python:3.12-bookworm`. Miniconda is not required in the base image.

Add the Galaxy Depot repository, which provides SLURM DRMAA packages for Debian Buster and newer releases.

Do not install the package `apt-transport-https`; it is now a dummy package (see https://packages.debian.org/en/bookworm/apt-transport-https). Install the package `slurm` instead of `slurm-llnl`.

Newer versions of the `munge` package include the binary `/usr/sbin/mungekey` instead of `/usr/sbin/create-munge-key`. Nevertheless, the key seems to be created automatically when installing the package, as running `mungekey` yields 'mungekey: Error: Failed to create "/etc/munge/munge.key": File exists'.
Build wheel automatically when building the Docker image. Exclude the source code from the output image through a multistage build.
…oexecutionLaunchMixin` to `BaseRemoteConfiguredJobClient`
@kysrpex
Contributor Author

kysrpex commented Jul 3, 2025

This PR is definitely not fully ready yet, but of course I really appreciate your comments. Please focus on 4755525 and subsequent commits, previous commits belong to #399.

kysrpex added 6 commits July 17, 2025 15:15
…Collector`

At the moment, JSON staging and outputs manifests are constructed by tracking all actions mapped by the `FileActionMapper` using a list `FileActionMapper.actions`. This makes the `FileActionMapper` stateful, requires including `file_type` as a keyword argument for `BaseAction` and its children, and requires defining a `finalize()` method for `FileActionMapper` and for `BaseAction` and its children.

Paying the small price of refactoring `JsonTransferAction`, generate the staging manifest from `FileStager.transfer_tracker.remote_staging_actions` and the output manifest as `ResultsCollector` collects the outputs.
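
A minimal sketch of the idea, not the code in this PR: the manifest is derived from actions the tracker has already recorded, so the mapper itself does not need to accumulate state. `StagingAction`, `TransferTracker`, `build_staging_manifest`, the field names, and the example URL are all hypothetical stand-ins; only the attribute name `remote_staging_actions` is taken from the commit message.

```python
import json
from dataclasses import dataclass, field


@dataclass
class StagingAction:
    """Hypothetical stand-in for one remote staging action."""
    url: str
    path: str


@dataclass
class TransferTracker:
    """Hypothetical stand-in for the tracker held by the file stager."""
    remote_staging_actions: list = field(default_factory=list)


def build_staging_manifest(tracker):
    """Derive the manifest from what the tracker has already recorded."""
    return [{"url": a.url, "path": a.path} for a in tracker.remote_staging_actions]


# The tracker records actions while files are staged (assumed behaviour).
tracker = TransferTracker()
tracker.remote_staging_actions.append(
    StagingAction(
        url="https://galaxy.example.org/api/jobs/1/files?path=input.fa",
        path="inputs/input.fa",
    )
)

# The manifest is computed on demand; nothing is stored on a mapper object.
manifest_json = json.dumps(build_staging_manifest(tracker), indent=2)
```

Under these assumptions the mapper can stay stateless, because the manifest is a pure function of the tracker's contents.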
Set `JsonTransferAction.whole_directory_transfer_supported` to `False`, as the job files API is not capable of serving directories.
Using `basename(action.path)` creates a flat structure for each file type (e.g. job_directory/unstructured/human.fa), but Pulsar expects tree structures to work (e.g. job_directory/unstructured/f0d0164494db6cbf92c12aeb6119ac38/bwa/human.fa).
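
The difference between the two layouts can be sketched as follows. This is an illustration only: `flat_destination`, `tree_destination`, and the source root `/data/unstructured` are hypothetical, and the actual change in this PR may compute the relative path differently.

```python
import posixpath


def flat_destination(job_directory, file_type, source_path):
    """Place the file directly under its file-type directory (flat layout)."""
    return posixpath.join(job_directory, file_type, posixpath.basename(source_path))


def tree_destination(job_directory, file_type, source_path, source_root):
    """Keep the path relative to source_root so subdirectories survive (tree layout)."""
    relative = posixpath.relpath(source_path, source_root)
    return posixpath.join(job_directory, file_type, relative)


# Hypothetical source location; only the destination shapes come from the commit message.
root = "/data/unstructured"
source = root + "/f0d0164494db6cbf92c12aeb6119ac38/bwa/human.fa"

flat = flat_destination("job_directory", "unstructured", source)
tree = tree_destination("job_directory", "unstructured", source, root)
```

In the flat layout, two files that share a basename but live in different subdirectories would map to the same destination, which is one reason a tree layout is safer.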
Implement a Pulsar client that runs jobs on computing infrastructure using the Advanced Resource Connector (ARC) middleware.

3 participants