
Conversation

@leongdl (Contributor) commented Jan 6, 2026

Fixes: #634

Usage:

--force-s3-check / --no-force-s3-check
A boolean flag pair for controlling S3 verification behavior during job attachment uploads.

Usage
# Force S3 HEAD verification on every file (skip cache, skip integrity check)
deadline bundle submit /path/to/job-bundle --force-s3-check

# Use cache with integrity sampling (default behavior)
deadline bundle submit /path/to/job-bundle --no-force-s3-check

# Omit the flag to use config setting (defaults to false)
deadline bundle submit /path/to/job-bundle
Behavior
| Flag | Value | Behavior |
| --- | --- | --- |
| --force-s3-check | True | Always verify files exist in S3 via a HEAD request before skipping upload. Slower but most reliable. Skips integrity sampling since every file is verified anyway. |
| --no-force-s3-check | False | Use the local cache with periodic integrity sampling against S3. If sampled files are missing from S3, the cache is reset and files are re-uploaded. |
| (omitted) | None | Falls back to settings.force_s3_check in the config file. |
Config File
~/.deadline/config:

[settings]
force_s3_check = false
When to Use
Use --force-s3-check when S3 bucket contents may be out of sync with your local cache (e.g., after bucket lifecycle policies, manual deletions, or cross-machine workflows).
Use --no-force-s3-check (or omit) for faster uploads when you trust your local cache is accurate.

What was the problem/requirement? (What/Why)

When S3 Lifecycle policies delete job attachment files from the CAS bucket, the local s3_check_cache.db becomes stale. Subsequent job submissions trust the cached hash and skip uploading, resulting in jobs that reference non-existent S3 objects. Users had to manually delete ~/.deadline/cache/s3_check_cache.db to recover.
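For reference, the manual recovery this change replaces amounted to deleting that cache database; a minimal sketch in Python, using the path described above:

from pathlib import Path

# Previous manual workaround: remove the stale S3 check cache so the next
# submission falls back to S3 HEAD checks and rebuilds the cache.
cache_db = Path.home() / ".deadline" / "cache" / "s3_check_cache.db"
if cache_db.exists():
    cache_db.unlink()
    print(f"Removed stale cache: {cache_db}")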

What was the solution? (How)

Added a --verify-job-attachment-existence CLI flag and corresponding settings.verify_job_attachment_existence config setting. When enabled:

  • The upload phase skips the S3CheckCache lookup entirely
  • Always performs an S3 HEAD request to verify the object exists
  • Re-uploads the file if the S3 object is missing
  • Updates the S3CheckCache after successful verification

This approach keeps all verification logic in the upload phase without coupling the HashCache and S3CheckCache.
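For illustration, a minimal boto3 sketch of the kind of S3 HEAD check this relies on. The helper name object_exists_in_cas is made up here (in upload.py the equivalent logic lives in file_already_uploaded), and the real error-code handling may differ:

import boto3
from botocore.exceptions import ClientError

def object_exists_in_cas(s3_client, bucket: str, key: str) -> bool:
    """Return True if the CAS object exists, using an S3 HEAD request."""
    try:
        s3_client.head_object(Bucket=bucket, Key=key)
        return True
    except ClientError as exc:
        # HEAD on a missing object (e.g. expired by a lifecycle policy)
        # surfaces as a 404-style error code.
        if exc.response.get("Error", {}).get("Code") in ("404", "NotFound", "NoSuchKey"):
            return False
        raise

# When verification is enabled: a missing object triggers a re-upload, and the
# S3CheckCache entry is refreshed after the HEAD or the upload succeeds.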

What is the impact of this change?

  • Users with S3 Lifecycle policies can enable verification to ensure job attachments exist before submission
  • Default behavior (flag disabled) is unchanged - no performance impact for normal workflows
  • When enabled, adds one S3 HEAD request per file with a HashCache hit

How was this change tested?

See DEVELOPMENT.md for information on running tests.

  • Have you run the unit tests?
Required test coverage of 80.0% reached. Total coverage: 82.80%
======================================================================= slowest 5 durations =======================================================================
5.42s call     test/unit/deadline_client/cli/test_cli_job.py::test_cli_job_download_output_stdout_with_only_required_input
5.33s call     test/unit/deadline_client/cli/test_cli_handle_web_url.py::test_cli_handle_web_url_download_output_only_required_input
4.62s call     test/unit/deadline_client/api/test_job_bundle_submission_asset_refs.py::test_create_job_from_job_bundle_with_all_asset_ref_variants
4.32s call     test/unit/deadline_client/api/test_api_farm.py::test_list_farms_paginated
4.13s call     test/unit/deadline_job_attachments/test_upload.py::TestUpload::test_asset_management_no_outputs_large_number_of_inputs_already_uploaded[2023-03-03-200]
========================================================== 2035 passed, 64 skipped, 1 xfailed in 40.40s ===========================================================
  • Additional testing

    • 1: Tested with a fresh config file; ran the config GUI to toggle the setting on and off.
    • 2: Tested with deadline bundle submit: submitted a job, deleted the CAS object to simulate S3 eviction, and re-submitted to confirm the file was uploaded again (config setting checked).
    • 3: Tested with deadline bundle submit: submitted a job, deleted the CAS object to simulate S3 eviction, and re-submitted to confirm the file was not uploaded again (config setting unchecked).
  • Have you run the integration tests?

    • Not applicable.
  • Have you made changes to the download or asset_sync modules? If so, then it is highly recommended that you ensure that the docker-based unit tests pass.

    • No

Was this change documented?

  • Are relevant docstrings in the code base updated?
  • Has the README.md been updated? If you modified CLI arguments, for instance.

Does this PR introduce new dependencies?

This library is designed to be integrated into third-party applications that have bespoke and customized deployment environments. Adding dependencies will increase the chance of library version conflicts and incompatibilities. Please evaluate the addition of new dependencies. See the Dependencies section of DEVELOPMENT.md for more details.

  • This PR adds one or more new dependency Python packages. I acknowledge I have reviewed the considerations for adding dependencies in DEVELOPMENT.md.
  • This PR does not add any new dependencies.

Is this a breaking change?

No. This is an additive change with a new opt-in flag. Default behavior is unchanged.

Does this change impact security?

No. This change only adds an optional S3 HEAD verification step during upload. It does not modify file permissions, create new files/directories, or change authentication/authorization flows.


By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.


Design Proposal: HashCache Job Attachment Verification Option

Problem Statement

Currently, the HashCache and S3CheckCache operate independently during job submission:

  1. HashCache (_process_input_path in upload.py:1053-1108): Looks up file by (file_path, hash_algorithm). If found and mtime matches, it skips hashing entirely and uses the cached hash.

  2. S3CheckCache (upload_object_to_cas in upload.py:639-680): During upload phase, checks if the S3 key exists in cache. If found, skips both the S3 HEAD request and upload.

The Gap: When a file is in HashCache with matching mtime, the system trusts the cached hash. But if the corresponding object was deleted from S3 (bucket cleanup, lifecycle policy, manual deletion), the S3CheckCache may be stale or the entry may have expired (30-day TTL). This results in a job referencing non-existent S3 objects.

Current Code Flow

┌─────────────────────────────────────────────────────────────────────────────┐
│                           HASHING PHASE                                      │
│  _process_input_path() [upload.py:1053]                                     │
│                                                                              │
│  1. HashCache.get_connection_entry(file_path, hash_alg)                     │
│  2. If entry exists AND mtime matches:                                       │
│       → FileStatus.UNCHANGED, use cached hash (NO HASHING)                  │
│  3. If entry exists AND mtime differs:                                       │
│       → FileStatus.MODIFIED, rehash, update cache                           │
│  4. If no entry:                                                             │
│       → FileStatus.NEW, hash file, add to cache                             │
└─────────────────────────────────────────────────────────────────────────────┘
                                    │
                                    ▼
┌─────────────────────────────────────────────────────────────────────────────┐
│                           UPLOAD PHASE                                       │
│  upload_object_to_cas() [upload.py:639]                                     │
│                                                                              │
│  1. S3CheckCache.get_connection_entry(s3_key)                               │
│  2. If entry exists (and not expired):                                       │
│       → SKIP upload entirely (no S3 HEAD, no upload)                        │
│  3. If no entry:                                                             │
│       → file_already_uploaded() does S3 HEAD request                        │
│       → Upload if not in S3, then add to S3CheckCache                       │
└─────────────────────────────────────────────────────────────────────────────┘

Proposed Solution

Add a --verify-job-attachment-existence CLI option (and corresponding API parameter) that performs an S3 HEAD check for files found in HashCache before trusting the cached hash.

New Flow with verify_job_attachment_existence=True

┌─────────────────────────────────────────────────────────────────────────────┐
│                           HASHING PHASE (unchanged)                          │
│  _process_input_path() [upload.py:1053]                                     │
│                                                                              │
│  (Same as current behavior - HashCache lookup, hash if needed)              │
└─────────────────────────────────────────────────────────────────────────────┘
                                    │
                                    ▼
┌─────────────────────────────────────────────────────────────────────────────┐
│                    UPLOAD PHASE (with verification)                          │
│  upload_object_to_cas() [upload.py:639]                                     │
│                                                                              │
│  IF verify_job_attachment_existence:                                         │
│    → Skip S3CheckCache lookup entirely                                       │
│    → Always do S3 HEAD request                                               │
│    → If NOT EXISTS: upload file                                              │
│    → Update S3CheckCache after HEAD/upload                                   │
│  ELSE:                                                                       │
│    → (current behavior)                                                      │
└─────────────────────────────────────────────────────────────────────────────┘

Implementation Plan

1. CLI Option (bundle_group.py)

@click.option(
    "--verify-job-attachment-existence",
    is_flag=True,
    help="Verify that cached file hashes exist in S3 before skipping re-hash. "
         "Use when S3 bucket contents may have been modified outside of normal workflow.",
)

2. API Parameter (_submit_job_bundle.py)

Add verify_job_attachment_existence: bool = False parameter to create_job_from_job_bundle().

Pass through to _upload_attachments().

3. Core Implementation (upload.py)

3.1 Modify S3AssetUploader.upload_object_to_cas()

Add parameter:

  • verify_job_attachment_existence: bool = False

New logic:

def upload_object_to_cas(
    self,
    file: base_manifest.BaseManifestPath,
    hash_algorithm: HashAlgorithm,
    s3_bucket: str,
    source_root: Path,
    s3_cas_prefix: str,
    s3_check_cache: S3CheckCache,
    progress_tracker: Optional[ProgressTracker] = None,
    verify_job_attachment_existence: bool = False,
) -> Tuple[bool, int]:
    local_path = source_root.joinpath(file.path)
    s3_upload_key = self._generate_s3_upload_key(file, hash_algorithm, s3_cas_prefix)
    is_uploaded = False

    # Skip S3CheckCache lookup if verification is requested - always do S3 HEAD
    if not verify_job_attachment_existence:
        if s3_check_cache.get_connection_entry(
            s3_key=f"{s3_bucket}/{s3_upload_key}", 
            connection=s3_check_cache.get_local_connection()
        ):
            logger.debug(
                f"skipping {local_path} because {s3_bucket}/{s3_upload_key} exists in the cache"
            )
            return (is_uploaded, file.size)

    # Always do S3 HEAD check (either cache miss or verification requested)
    if self.file_already_uploaded(s3_bucket, s3_upload_key):
        logger.debug(
            f"skipping {local_path} because it has already been uploaded to s3://{s3_bucket}/{s3_upload_key}"
        )
    else:
        self.upload_file_to_s3(
            local_path=local_path,
            s3_bucket=s3_bucket,
            s3_upload_key=s3_upload_key,
            progress_tracker=progress_tracker,
        )
        is_uploaded = True

    # Update S3CheckCache after successful HEAD or upload
    s3_check_cache.put_entry(
        S3CheckCacheEntry(
            s3_key=f"{s3_bucket}/{s3_upload_key}",
            last_seen_time=self._get_current_timestamp(),
        )
    )

    return (is_uploaded, file.size)

3.2 Modify S3AssetUploader.upload_input_files()

Pass verify_job_attachment_existence to upload_object_to_cas() calls.

3.3 Modify S3AssetManager.upload_assets()

Add parameter and pass through to upload_input_files().
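A rough sketch of the pass-through described in 3.2 and 3.3 (signatures are heavily simplified; the real methods take many more arguments and all existing defaults stay unchanged):

from typing import List

class S3AssetUploader:
    def upload_input_files(
        self,
        files: List[str],
        verify_job_attachment_existence: bool = False,  # new, forwarded per file
    ) -> None:
        for f in files:
            self.upload_object_to_cas(
                f, verify_job_attachment_existence=verify_job_attachment_existence
            )

    def upload_object_to_cas(self, f: str, verify_job_attachment_existence: bool = False) -> None:
        ...  # logic from section 3.1

class S3AssetManager:
    def __init__(self, uploader: S3AssetUploader) -> None:
        self.uploader = uploader

    def upload_assets(
        self,
        input_files: List[str],
        verify_job_attachment_existence: bool = False,  # new, forwarded unchanged
    ) -> None:
        self.uploader.upload_input_files(
            input_files, verify_job_attachment_existence=verify_job_attachment_existence
        )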

4. Config File Setting (Optional Enhancement)

Add settings.verify_job_attachment_existence to allow setting this as a default behavior.

Design Decision

Selected: Option D - Always S3 HEAD in upload phase

This approach is the simplest and keeps all verification logic in the upload phase:

  • Hashing phase remains completely unchanged
  • Upload phase skips S3CheckCache lookup when flag is set, always does S3 HEAD
  • If S3 HEAD fails (object missing), upload proceeds normally
  • No coupling between caches, no new return values needed

Why HashCache invalidation is NOT needed: The HashCache stores the mapping of (file_path, hash_algorithm) → (file_hash, mtime). This remains correct even when S3 objects are deleted - the local file hasn't changed, so its hash is still valid. The only stale cache is S3CheckCache, which Option D bypasses entirely by doing the actual S3 HEAD check.
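To make that concrete, a small illustrative sketch (the class and field names below are assumptions, not the real HashCache schema): the cached hash is keyed only on local file state, so deleting the S3 object cannot invalidate it.

import os
from dataclasses import dataclass

@dataclass
class HashCacheEntry:  # illustrative stand-in for a HashCache row
    file_path: str
    hash_algorithm: str
    file_hash: str
    mtime: float

def cached_hash_is_still_valid(entry: HashCacheEntry) -> bool:
    # Only a change to the local file (a newer mtime) invalidates the entry.
    # Whether the corresponding S3 object still exists is irrelevant here;
    # that question is answered separately by the S3 HEAD check.
    return os.path.getmtime(entry.file_path) == entry.mtime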

Alternative approaches rejected:

  • Option A: Don't invalidate S3CheckCache - upload phase would skip due to stale cache entry
  • Option B: Return missing S3 keys from hashing, invalidate in orchestration layer - adds complexity and changes return signatures
  • Option C: Track files needing forced upload and skip cache in upload phase - similar to D but more complex tracking
  • Option D: (Selected) Always S3 HEAD in upload phase - simplest approach, no cache coupling

Files to Modify

| File | Changes |
| --- | --- |
| src/deadline/client/cli/_groups/bundle_group.py | Add the --verify-job-attachment-existence CLI option |
| src/deadline/client/api/_submit_job_bundle.py | Add the verify_job_attachment_existence parameter, pass to _upload_attachments |
| src/deadline/client/api/_job_attachment.py | No changes needed |
| src/deadline/job_attachments/upload.py | Modify upload_object_to_cas, upload_input_files, and S3AssetManager.upload_assets to accept and use the flag |

Performance Considerations

  • Default OFF: No performance impact for normal workflows
  • When enabled: Adds one S3 HEAD request per file with HashCache hit (unchanged files)
  • Mitigation: Could batch HEAD requests or use S3 inventory for large submissions (see the batching sketch after this list)
  • Trade-off: Extra latency vs. guaranteed S3 consistency
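A sketch of the batching mitigation mentioned above (purely illustrative; the uploader does not do this today): HEAD requests for HashCache hits could be issued concurrently to amortize the per-file latency.

from concurrent.futures import ThreadPoolExecutor
from typing import Dict, Iterable, Tuple

import boto3
from botocore.exceptions import ClientError

def head_many(bucket: str, keys: Iterable[str], max_workers: int = 16) -> Dict[str, bool]:
    """Check many CAS keys concurrently; returns a mapping of key -> exists."""
    s3 = boto3.client("s3")  # boto3 clients are safe to share across threads

    def exists(key: str) -> Tuple[str, bool]:
        try:
            s3.head_object(Bucket=bucket, Key=key)
            return key, True
        except ClientError as exc:
            if exc.response.get("Error", {}).get("Code") in ("404", "NotFound"):
                return key, False
            raise

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return dict(pool.map(exists, keys))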

Alternative Approaches Considered

  1. Always verify S3 existence: Too expensive for normal workflows
  2. Periodic S3CheckCache validation: Already exists (verify_hash_cache_integrity samples 30 entries), but only runs at upload time
  3. Invalidate HashCache when S3CheckCache entry expires: Complex coupling between caches
  4. Store S3 verification timestamp in HashCache: Adds complexity to cache schema

Testing Strategy

  1. Unit tests for upload_object_to_cas with verify_job_attachment_existence=True (a sketch follows after this list)
  2. Integration test: Submit job, delete S3 object, resubmit with flag - verify file is re-uploaded
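A hedged sketch of what the unit test in item 1 might look like. It exercises a local stand-in for the section 3.1 logic rather than the real upload_object_to_cas, so all names below (the stand-in function and the mocked methods) are illustrative:

from unittest.mock import MagicMock

def upload_object_to_cas_sketch(uploader, s3_check_cache, s3_key, verify=False):
    """Stand-in for the section 3.1 logic, for illustration only."""
    if not verify and s3_check_cache.get_connection_entry(s3_key):
        return False  # trusted the cache, nothing uploaded
    uploaded = False
    if not uploader.file_already_uploaded(s3_key):
        uploader.upload_file_to_s3(s3_key)
        uploaded = True
    s3_check_cache.put_entry(s3_key)
    return uploaded

def test_verify_skips_cache_and_reuploads_missing_object():
    uploader = MagicMock()
    uploader.file_already_uploaded.return_value = False  # object evicted from S3
    cache = MagicMock()

    uploaded = upload_object_to_cas_sketch(uploader, cache, "bucket/key", verify=True)

    assert uploaded is True
    cache.get_connection_entry.assert_not_called()   # cache lookup bypassed
    uploader.upload_file_to_s3.assert_called_once()  # missing object re-uploaded
    cache.put_entry.assert_called_once()             # cache refreshed afterwards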

Persisted Configuration Design

Overview

In addition to the CLI flag --verify-job-attachment-existence, this setting can be persisted in the Deadline Cloud configuration file (~/.deadline/config). This allows users to enable verification by default without specifying the flag on every submission.

Precedence

CLI flag (--verify-job-attachment-existence) > Config file setting > Default (false)

When the CLI flag is provided, it overrides the config file setting. This follows the existing pattern used by --yes / settings.auto_accept.

Configuration File Changes

1. Add Setting to SETTINGS Dictionary (config_file.py)

SETTINGS: Dict[str, Dict[str, Any]] = {
    # ... existing settings ...
    "settings.verify_job_attachment_existence": {
        "default": "false",
        "description": (
            "When enabled, always verify that cached file hashes exist in S3 before skipping upload. "
            "Use when S3 bucket contents may have been modified outside of normal workflow "
            "(e.g., bucket lifecycle policies, manual deletions)."
        ),
    },
}

CLI Integration (bundle_group.py)

The CLI flag behavior:

  • If --verify-job-attachment-existence is provided → use True
  • If not provided → read from settings.verify_job_attachment_existence config
@click.option(
    "--verify-job-attachment-existence",
    is_flag=True,
    default=None,  # None means "not specified", allowing config fallback
    help="Verify that cached file hashes exist in S3 before skipping upload. "
         "Overrides the 'settings.verify_job_attachment_existence' config setting.",
)
def bundle_submit(..., verify_job_attachment_existence: Optional[bool], ...):
    # Resolve the effective value
    if verify_job_attachment_existence is None:
        verify_job_attachment_existence = config_file.str2bool(
            config_file.get_setting("settings.verify_job_attachment_existence", config=config)
        )
    # Pass to create_job_from_job_bundle()

GUI Integration (Optional)

Add a checkbox in DeadlineConfigDialog under "General settings":

# In deadline_config_dialog.py, _init_general_settings_group()
self.verify_job_attachment_existence = self._init_checkbox_setting(
    group,
    layout,
    "settings.verify_job_attachment_existence",
    tr("Verify job attachment existence in S3"),
)

Config File Example

[settings]
verify_job_attachment_existence = true

Files to Modify (Additional)

| File | Changes |
| --- | --- |
| src/deadline/client/config/config_file.py | Add settings.verify_job_attachment_existence to the SETTINGS dict |
| src/deadline/client/cli/_groups/bundle_group.py | Update the CLI option to support config fallback |
| src/deadline/client/ui/dialogs/deadline_config_dialog.py | (Optional) Add a checkbox in General settings |

User Workflow

  1. One-time setup: User runs deadline config set settings.verify_job_attachment_existence true
  2. All subsequent submissions: Verification is enabled by default
  3. Override when needed: User can still use --no-verify-job-attachment-existence (if implemented) to disable for a single submission

Alternative: --no-verify-job-attachment-existence Flag

To allow users to disable verification when the config is enabled, consider adding a negation flag:

@click.option(
    "--verify-job-attachment-existence/--no-verify-job-attachment-existence",
    is_flag=True,
    default=None,
    help="Verify that cached file hashes exist in S3 before skipping upload.",
)

This provides full flexibility:

  • Config false + no flag → false
  • Config false + --verify-job-attachment-existence → true
  • Config true + no flag → true
  • Config true + --no-verify-job-attachment-existence → false

@github-actions github-actions bot added the waiting-on-maintainers Waiting on the maintainers to review. label Jan 6, 2026
@leongdl leongdl force-pushed the always-hash branch 2 times, most recently from fc6c8d2 to 87de313 Compare January 6, 2026 01:07
@leongdl leongdl force-pushed the always-hash branch 4 times, most recently from 9566d8f to cce76e1 Compare January 6, 2026 22:04
@leongdl leongdl marked this pull request as ready for review January 6, 2026 22:38
@leongdl leongdl requested review from a team as code owners January 6, 2026 22:38
stangch previously approved these changes Jan 6, 2026
mwiebe previously approved these changes Jan 7, 2026

@baxeaz (Contributor) commented Jan 7, 2026

I guess one other option that doesn't seem to be mentioned is to just have the option do reset_s3_check_cache. If I have a bucket with data from many different scenes I've worked on and I know a lot of the data in my bucket may have been removed by lifecycle rules/altered somehow I think I could get in a state where I'd have to keep using this option until I think I'd covered all of the possible assets I have in my local S3 cache vs the one time "ok, just invalidate all of my local recorded S3 hashes". This way does save even looking at the DB which is nice though. Maybe there's a case longer term for both options?

@leongdl (Contributor, Author) commented Jan 7, 2026

> I guess one other option that doesn't seem to be mentioned is to just have the option do reset_s3_check_cache. If I have a bucket with data from many different scenes I've worked on and I know a lot of the data in my bucket may have been removed by lifecycle rules/altered somehow I think I could get in a state where I'd have to keep using this option until I think I'd covered all of the possible assets I have in my local S3 cache vs the one time "ok, just invalidate all of my local recorded S3 hashes". This way does save even looking at the DB which is nice though. Maybe there's a case longer term for both options?

Good suggestion - but deleting the database is more invasive, right? In the end, both would lead to an S3 HEAD and then an upload. The current option does the HEAD and updates the database with the current status.

We can add this as another option too, both would solve this problem.

@patrickjahns commented Jan 8, 2026

> I guess one other option that doesn't seem to be mentioned is to just have the option do reset_s3_check_cache. If I have a bucket with data from many different scenes I've worked on and I know a lot of the data in my bucket may have been removed by lifecycle rules/altered somehow I think I could get in a state where I'd have to keep using this option until I think I'd covered all of the possible assets I have in my local S3 cache vs the one time "ok, just invalidate all of my local recorded S3 hashes". This way does save even looking at the DB which is nice though. Maybe there's a case longer term for both options?

From our experience - several "3D artists" work on projects and use defined queues (which are tied to buckets). For us, these artists share common libraries with textures etc. for different projects. We often encounter the issue that these shared files in particular expire - even though they are frequently used. So a single artist will not be aware of when shared files were uploaded, and it's difficult to gauge when files might have expired due to lifecycle policies. Right now we remind the artists to frequently clear their cache - nevertheless we experience frequent aborts from jobs because of missing assets. While having a CLI command would be nice - right now just deleting the local cache folder does the same job and is a cumbersome experience.

The strategy from this PR seems more reasonable, especially if a team works on projects with shared resources. The performance trade-off is okay for us.

@leongdl leongdl dismissed stale reviews from mwiebe and stangch via c3ed5a0 January 8, 2026 19:27
@leongdl leongdl force-pushed the always-hash branch 3 times, most recently from 20ebb0f to 13b6a68 Compare January 8, 2026 19:50
Comment on lines 219 to 220
- True: Skip cache, always do S3 HEAD to verify existence (skips integrity check)
- False/None: Use cache with periodic integrity sampling against S3 (default)

Contributor:

Attempted wording improvement, including phrase "S3 check cache" because that's what it's called in the code and on disk so easier to connect the dots.

Suggested change:
- - True: Skip cache, always do S3 HEAD to verify existence (skips integrity check)
- - False/None: Use cache with periodic integrity sampling against S3 (default)
+ - True: Skip the S3 check cache, always check whether uploads are already in S3.
+ - False/None: Use the S3 check cache, with periodic integrity sampling against S3 (default)

Contributor Author:

Thx for the suggestion! Updated.

Comment on lines 665 to 666
- True: Skip cache, always do S3 HEAD to verify existence
- False/None: Use cache, fall back to S3 HEAD if not in cache (default)

Contributor:

Suggested change:
- - True: Skip cache, always do S3 HEAD to verify existence
- - False/None: Use cache, fall back to S3 HEAD if not in cache (default)
+ - True: Skip the S3 check cache, always check whether uploads are already in S3.
+ - False/None: Use the S3 check cache, with periodic integrity sampling against S3 (default)

Contributor Author:

Thx for the suggestion, updated.

Comment on lines 1530 to 1531
- True: Skip cache, always do S3 HEAD to verify existence (skips integrity check)
- False/None: Use cache with periodic integrity sampling against S3 (default)

Contributor:

Suggested change:
- - True: Skip cache, always do S3 HEAD to verify existence (skips integrity check)
- - False/None: Use cache with periodic integrity sampling against S3 (default)
+ - True: Skip the S3 check cache, always check whether uploads are already in S3.
+ - False/None: Use the S3 check cache, with periodic integrity sampling against S3 (default)

Contributor Author:

Thx for the suggestion, updated.

@sonarqubecloud (bot) commented Jan 8, 2026

@leongdl leongdl enabled auto-merge (squash) January 8, 2026 20:58
"Add...": "添加...",
"All": "全部",
"All fields below are optional": "以下所有字段均为可选",
"Always check S3 job attachments": "始终检查 S3 作业附件",

Contributor:

Do we translate job attachments to 作业附件? Also curious why the simplified is 作业 while traditional is 任務.

@leongdl leongdl merged commit c53f3be into aws-deadline:mainline Jan 8, 2026
26 of 27 checks passed

@godobyte (Contributor) commented Jan 8, 2026

nit - PR description seems to be outdated with previous names.

folouiseAWS pushed a commit to folouiseAWS/deadline-cloud that referenced this pull request Jan 9, 2026


Successfully merging this pull request may close these issues.

Feature request: Support setting S3 Lifecycle configuration for S3 assets without erroring when they are deleted
