feat(breadbox): Add upload progress updates #348

pgm · 2025-07-29T16:43:17Z

We poll for task status after we've uploaded a file. However, uploads of large files now take more than a minute. When something goes wrong, it's hard to tell whether the upload is stuck or just slow, so I'm adding progress updates like the ones we have for other "long" tasks that run inside Celery.

pgm · 2025-07-30T17:23:54Z

breadbox/breadbox/io/data_validation.py

@@ -469,7 +470,9 @@ def validate_and_upload_dataset_files(
    )

    # TODO: Move save function to api layer. Need to make sure the db save is successful first
-    save_dataset_file(dataset_id, data_dfw, value_type, filestore_location)
+    save_dataset_file(
+        dataset_id, data_dfw, value_type, filestore_location, ProgressTracker()


This is a legacy code path, so I don't care that it's not reporting status. We're not supposed to use this anyway.
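To illustrate why passing a bare ProgressTracker() is harmless on this path, here is a minimal sketch, assuming a tracker that by default only remembers the latest message and reports it nowhere. It is not breadbox's actual ProgressTracker: only the no-argument constructor and update_message appear in this PR, while the class name ProgressTrackerSketch, the on_update hook, and the save_rows helper are hypothetical.

```python
from typing import Callable, List, Optional


class ProgressTrackerSketch:
    """Illustrative stand-in; NOT the breadbox ProgressTracker."""

    def __init__(self, on_update: Optional[Callable[[str], None]] = None):
        # on_update is a hypothetical hook, e.g. something that copies the
        # message into task state for a poller. None means "report nowhere".
        self._on_update = on_update
        self.message = ""

    def update_message(self, message: str) -> None:
        # Always remember the latest message; forward it only if asked to.
        self.message = message
        if self._on_update is not None:
            self._on_update(message)


def save_rows(rows: List[str], progress: ProgressTrackerSketch) -> int:
    """Hypothetical save routine that reports after each row."""
    for i, _ in enumerate(rows, start=1):
        progress.update_message(f"Wrote {i} of {len(rows)} rows")
    return len(rows)


# Legacy-style call: updates land on the object but go nowhere else.
silent = ProgressTrackerSketch()
save_rows(["a", "b"], silent)

# Task-style call: updates are forwarded to whoever is listening.
seen: List[str] = []
save_rows(["a", "b"], ProgressTrackerSketch(on_update=seen.append))
```

With this shape, the save function can always call update_message unconditionally, and callers that don't care about status just pass a default instance.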

jessica-cheng

This looks good to me, though I notice that ProgressTracker only updates messages in the code block for when our df_wrapper is a ParquetDataFrameWrapper. I know this is mostly what we're testing, but for completeness it would be nice if it also updated messages in the other code path, for when our df_wrapper is a DataFrame.

jessica-cheng · 2025-07-30T19:28:54Z

breadbox/breadbox/io/hdf5_utils.py

        create_index_dataset(f, "features", pd.Index(df_wrapper.get_column_names()))
        create_index_dataset(f, "samples", pd.Index(df_wrapper.get_index_names()))
+        progress.update_message("Complete")


Maybe we should put this in the finally statement?

rcreasi · 2025-07-30

finally is run on failure too.

jessica-cheng · 2025-08-04 (edited)

@rcreasi That's true; it doesn't make sense for progress to be reported as complete on failure. I realized we weren't catching failures in this try block, so I've added an exception handler.
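The point about finally can be shown with a small sketch. The function and message names below are hypothetical and are not taken from hdf5_utils.py; the example only demonstrates standard Python try/except/else/finally semantics, where the success message belongs in the try body or an else clause and finally is reserved for cleanup.

```python
from typing import Callable, List


def write_with_progress(write: Callable[[], None], messages: List[str]) -> None:
    """Run write() and record a terminal status message (illustrative only)."""
    try:
        write()
    except Exception as exc:
        # Report the failure, then re-raise so the caller still sees it.
        messages.append(f"Failed: {exc}")
        raise
    else:
        # Only reached when write() did not raise.
        messages.append("Complete")
    finally:
        # Runs on success and failure alike, so only cleanup belongs here.
        messages.append("cleanup")


def failing_write() -> None:
    raise ValueError("bad file")


ok_messages: List[str] = []
write_with_progress(lambda: None, ok_messages)

bad_messages: List[str] = []
try:
    write_with_progress(failing_write, bad_messages)
except ValueError:
    pass
```

Putting the "Complete" update as the last statement inside the try block, as the diff above does, behaves the same as the else clause here: it is skipped whenever an earlier statement raises.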

pgm · 2025-08-01T18:37:04Z

@jessica-cheng

> I know this is mostly what we're testing, but for completeness it would be nice if it also updated messages in the other code path, for when our df_wrapper is a DataFrame.

Yes, in the other code path, there's no incremental progress to report, but I can sprinkle a few updates in so we can see which stage of the process we're at.
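One way to "sprinkle" stage updates where there is no incremental progress is sketched below. This is an assumption about the general approach, not the code in this PR: the run_stages helper and the stage names are hypothetical, and update_message stands in for whatever callable the tracker exposes.

```python
from typing import Callable, List, Tuple


def run_stages(
    stages: List[Tuple[str, Callable[[], None]]],
    update_message: Callable[[str], None],
) -> None:
    """Run named stages in order, announcing each one before it starts."""
    total = len(stages)
    for n, (name, step) in enumerate(stages, start=1):
        update_message(f"Step {n}/{total}: {name}")
        step()
    # Reached only if every stage finished without raising.
    update_message("Complete")


# Hypothetical stages for a non-incremental save; each step is a no-op here.
log: List[str] = []
run_stages(
    [
        ("Validating data", lambda: None),
        ("Writing values", lambda: None),
        ("Writing indexes", lambda: None),
    ],
    log.append,
)
```

If an upload stalls, the last message recorded identifies the stage it was in, which is the information the poller needs to distinguish stuck from slow.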

github-actions added 2 commits July 29, 2025 12:33

Added progress tracker to be used for uploads

0536345

fixed unit tests

0ffb50d

pgm requested a review from jessica-cheng July 29, 2025 16:43

Merge branch 'master' into add-upload-progress

3132c51

pgm commented Jul 30, 2025

jessica-cheng approved these changes Jul 30, 2025
