-
Notifications
You must be signed in to change notification settings - Fork 1
feat(breadbox): Add upload progress updates #348
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
@@ -469,7 +470,9 @@ def validate_and_upload_dataset_files( | |||
) | |||
|
|||
# TODO: Move save function to api layer. Need to make sure the db save is successful first | |||
save_dataset_file(dataset_id, data_dfw, value_type, filestore_location) | |||
save_dataset_file( | |||
dataset_id, data_dfw, value_type, filestore_location, ProgressTracker() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a legacy code path, so I don't care that it's not reporting status. We're not supposed to use this anyway.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks good to me though I notice that ProgressTracker is only updating messages in the code block for when our df_wrapper isParquetDataFrameWrapper
. I know this is mostly what we're testing but it would be nice for completeness that it updates messages in the other code path for when our df_wrapper is DataFrame
create_index_dataset(f, "features", pd.Index(df_wrapper.get_column_names())) | ||
create_index_dataset(f, "samples", pd.Index(df_wrapper.get_index_names())) | ||
progress.update_message("Complete") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe we should put this in the finally statement?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
finally
is run on failure too.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@rcreasi That's true it doesn't make sense for progress to be considered complete for failures. I realize we haven't been catching for failures in this try block so I've added one recently.
Yes, in the other code path, there's no incremental progress to report, but I can sprinkle a few updates in so we can see which stage of the process we're at. |
We poll for task status after we've uploaded a file. However, our file uploads are now taking > 1 minute for large files. In the event that things go wrong, it's hard to tell whether it's stuck or just slow, so I'm adding progress updates like we have for other "long" tasks that run inside of celery.