
Stop and fail jobs where the ongoing error rate exceeds a threshold #1176

@carlosgjs

Summary

If a job runs into a widespread data or processing issue, it should be stopped early to avoid wasting compute resources and time.

For example, consider a job with 10,000 images in which the model errors out on 9 out of 10 images for some reason, or in which no images can be loaded at all because of an incorrect data path. In such cases it is wasteful to process all 10,000 images.

A possible criterion: once at least 200 images have been processed, stop the job if the failure rate (failed images / processed images) exceeds 60%.
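The criterion can be expressed as a small predicate. This is a minimal sketch: the function name should_stop_job and its parameters are hypothetical, and it reads ">60%" as failed images divided by processed images (if the intent is failures divided by successes, the comparison would need to change).

```python
def should_stop_job(processed: int, failed: int,
                    min_processed: int = 200,
                    max_failure_rate: float = 0.6) -> bool:
    """Return True if enough images were processed and too many of them failed.

    Hypothetical helper: the name and default thresholds are illustrative only.
    """
    # Don't judge the job until a minimum sample has been processed,
    # so a handful of early failures cannot stop a healthy job.
    if processed < min_processed:
        return False
    failure_rate = failed / processed
    return failure_rate > max_failure_rate


print(should_stop_job(200, 130))  # True: 65% of 200 images failed
print(should_stop_job(200, 100))  # False: a 50% failure rate is under the threshold
print(should_stop_job(150, 150))  # False: fewer than 200 images processed so far
```

The minimum sample size keeps the check from reacting to noise at the start of a job, while still stopping the 10,000-image example after only 200 images.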

Implementation Details

The check could be implemented in process_nats_pipeline_result().
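One way the check might hook into per-result processing is sketched below. Everything here is hypothetical: ErrorRateTracker, on_pipeline_result and the stop_job callback are illustrative names, the real signature of process_nats_pipeline_result() is not shown, and it assumes each result can be classified as a success or a failure. The thresholds are the 200 images and 60% from the criterion above.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class ErrorRateTracker:
    """Hypothetical per-job counter; not part of the existing codebase."""

    min_processed: int = 200
    max_failure_rate: float = 0.6
    processed: int = 0
    failed: int = 0
    stopped: bool = False

    def record(self, success: bool) -> bool:
        """Count one result and return True if the job should be stopped now."""
        self.processed += 1
        if not success:
            self.failed += 1
        # Skip the check until enough images are in, or if already stopped.
        if self.stopped or self.processed < self.min_processed:
            return False
        if self.failed / self.processed > self.max_failure_rate:
            self.stopped = True  # signal the stop only once
            return True
        return False


def on_pipeline_result(tracker: ErrorRateTracker, success: bool,
                       stop_job: Callable[[str], None]) -> None:
    """Hypothetical hook that would be called once per pipeline result,
    e.g. from inside process_nats_pipeline_result()."""
    if tracker.record(success):
        rate = tracker.failed / tracker.processed
        stop_job(f"Stopped: {tracker.failed}/{tracker.processed} images failed ({rate:.0%})")


# Simulate a job in which 9 out of every 10 images fail.
reasons: list[str] = []
tracker = ErrorRateTracker()
for i in range(1000):
    on_pipeline_result(tracker, success=(i % 10 == 0), stop_job=reasons.append)
    if tracker.stopped:
        break
print(tracker.processed, tracker.failed)  # 200 180
print(reasons)  # ['Stopped: 180/200 images failed (90%)']
```

The counters here live in memory for a single process. If results for one job are handled by several workers in parallel, the counts would instead need to be kept in shared storage with atomic increments so every worker sees the same totals.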
