- 
                Notifications
    You must be signed in to change notification settings 
- Fork 55
Introduce file staging delegation via JSON staging manifest #399
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
This should be the general strategy for collecting input and output files for ARC, DIRAC, AWS batch etc.
| @mvdbeek That's all I needed to build the ARC integration (coming soon as a different PR). You may want to compare it to your branch offline_connector_marius. I stripped out the Pulsar client (that comes in the other PR) and a couple of things that were not needed to complete the integration. | 
Change base image from `conda/miniconda3` (based off Debian Stretch) to `python:3.12-bookworm`. Miniconda is not required in the base image. Add the Galaxy Depot repository, which provides SLURM DRMAA packages for Debian Buster and newer releases. Do not install the package `apt-transport-https`, it is now a dummy package, see https://packages.debian.org/en/bookworm/apt-transport-https. Install the package `slurm` instead of `slurm-llnl`. Newer versions of the `munge` package include the binary `/usr/sbin/mungekey` instead of `/usr/sbin/create-munge-key`. Nevertheless, the key seems to be created automatically when installing the package, as running `mungekey` yields 'mungekey: Error: Failed to create "/etc/munge/munge.key": File exists'.
Build wheel automatically when building the Docker image. Exclude the source code from the output image through a multistage build.
…oexecutionLaunchMixin` to `BaseRemoteConfiguredJobClient`
1e1a3c7    to
    7c8371a      
    Compare
  
    | If this one is fine, then #370 should be closed. | 
| @@ -1,41 +1,60 @@ | |||
| FROM conda/miniconda3 | |||
| FROM python:3.12-bookworm | |||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think it is fair at all to say "Miniconda" is not required in the base image as the commit message suggests. I get that it isn't the modality that you wish to run it in but it is documented in https://pulsar.readthedocs.io/en/latest/containers.html#co-execution as an option and it is an option that makes a lot of sense to me. Is the slurm stuff required for your use case?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think it is fair at all to say "Miniconda" is not required in the base image as the commit message suggests.
I am assuming "Miniconda" is not required in the base image because I saw line 60 (the last line): RUN _pulsar-conda-init --conda_prefix=/pulsar_dependencies/conda. It looks like it installs Miniconda, thus it should make it to the Docker image without needing to put it in the base image. But I am not familiar with Pulsar so I am not sure if line 60 actually has an effect equivalent to including it in the base image.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
About the SLURM stuff, it is not needed for this use case but it was already in the old Dockerfile (it is needed for other use cases).
    # Install packages
    && apt-get update \
    && apt-get install -y --no-install-recommends gcc \
        libcurl4-openssl-dev \
        cvmfs cvmfs-config-default \
        slurm-llnl slurm-drmaa-dev \
        bzip2 \
Because I am changing the base image to Debian Bookworm, the changes are required not to break the build with the new base image (that's why the whole commit has the title "Change base image in coexecutor Dockerfile"). slurm-drmaa1 is not available on the Debian repos anymore for Bookworm (it has to be installed from the Galaxy Depot). The latest version of slurm-drmaa1 available in the Galaxy depot (1.1.4) requires a library libslurm36 from Debian Bullseye. There exists a new version 1.1.5 of libslurm-drmaa1 on GitHub that maybe works on Bookworm without libraries from Bullseye, but @natefoo has not published a Debian package yet.
…Collector` At the moment, JSON staging and outputs manifests are constructed by tracking all actions mapped by the `FileActionMapper` using a list `FileActionMapper.actions`. This makes the `FileActionMapper` stateful, requires including `file_type` as keyword argument for `BaseAction` and its children, requires defining a finalize()` method for `FileActionMapper` and for `BaseAction` and its children. Paying the small price of refactoring `JsonTransferAction`, generate the staging manifest from `FileStager.transfer_tracker.remote_staging_actions` and the output manifest as `ResultsCollector` collects the outputs.
Set `JsonTransferAction.whole_directory_transfer_supported` to `False`, as the job files API is not capable of serving directories.
Using `basename(action.path)` creates a flat structure for each file type (e.g. job_directory/unstructured/human.fa), but Pulsar expects tree structures to work (e.g. job_directory/unstructured/f0d0164494db6cbf92c12aeb6119ac38/bwa/human.fa).
1480061    to
    496185e      
    Compare
  
    
Introduce a JSON transfer action that indicates that the pulsar server should create a JSON manifest that can be used to stage files by an external system that can stage files in and out of the compute environment. This should be the general strategy for collecting input and output files for ARC, DIRAC, AWS batch etc.
Before launching jobs, Pulsar clients receive the staging manifest as a dictionary. When the job is complete, the script
pulsar-create-output-manifestcan be used to create a manifest declaring which files should be staged out by the external system.This PR also updates the coexecutor Dockerfile so that it works again and so that it builds the Pulsar wheel (no need to build it separately anymore).