Description
Storytime topic: zipped_moab_versions rows are created before zip creation, to indicate that replication has started for the druid/version/endpoint. A related problem is the inscrutability and inconvenience of figuring out what needs cleanup in the DB, as happens when the ZMV rows get created but the related jobs then get dropped from the replication queue. Would collapsing zip_parts into zipped_moab_versions help with this? That is ticketed as #2505.
Blocked by storytime: before remediating and losing possible test data, we want to discuss data model and/or code changes that might make this corner of replication tracking more legible. That said, it would not be terribly hard to recreate this in stage (e.g. pause zipmaker, version some stuff, let the zipmaker jobs get queued, then delete the queue contents to simulate replication jobs getting inadvertently evicted and leaving permanently childless ZMVs).
The query will need to join back to PreservedObject to get the druid, since that's what the method wants. We should use the prune_replication_failures method rather than deleting rows directly, because it's tested, has some safeguards in place to prevent pruning too aggressively, and makes sure that deletes are done transactionally and in the right order.
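As a rough illustration of the join described above, here is a minimal sketch that finds childless ZMVs and maps them back to druids. It runs against an in-memory SQLite database rather than the real catalog. The table and column names (`preserved_objects.druid`, `zipped_moab_versions.preserved_object_id`, `zip_parts.zipped_moab_version_id`) are assumptions about the schema, and the helper names and sample druids are made up for the example. It only produces the list of candidates; the actual cleanup would still go through prune_replication_failures, whose exact signature is not assumed here.

```python
import sqlite3


def childless_zmv_druids(conn):
    """Return distinct (druid, version) pairs whose ZMV rows have no zip_parts.

    Schema names are assumptions made for illustration only.
    """
    query = """
        SELECT po.druid, zmv.version
        FROM zipped_moab_versions AS zmv
        JOIN preserved_objects AS po
          ON po.id = zmv.preserved_object_id
        LEFT JOIN zip_parts AS zp
          ON zp.zipped_moab_version_id = zmv.id
        WHERE zp.id IS NULL              -- no child zip_parts row exists
        GROUP BY po.druid, zmv.version   -- collapse multiple endpoints
        ORDER BY po.druid, zmv.version
    """
    return conn.execute(query).fetchall()


def demo_connection():
    """Build a tiny in-memory database with made-up rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE preserved_objects (id INTEGER PRIMARY KEY, druid TEXT);
        CREATE TABLE zipped_moab_versions (
            id INTEGER PRIMARY KEY,
            preserved_object_id INTEGER,
            version INTEGER,
            zip_endpoint_id INTEGER
        );
        CREATE TABLE zip_parts (
            id INTEGER PRIMARY KEY,
            zipped_moab_version_id INTEGER
        );

        INSERT INTO preserved_objects VALUES
            (1, 'bb111cc2222'), (2, 'dd333ff4444');

        -- ZMVs 2 and 3 (same druid/version, two endpoints) never got zip_parts.
        INSERT INTO zipped_moab_versions VALUES
            (1, 1, 1, 1), (2, 1, 2, 1), (3, 1, 2, 2), (4, 2, 1, 1);

        INSERT INTO zip_parts VALUES (1, 1), (2, 4);
    """)
    return conn


if __name__ == "__main__":
    for druid, version in childless_zmv_druids(demo_connection()):
        print(druid, version)
```

Grouping by druid and version means each pair shows up once even when several endpoints are affected, which seems to fit a method that is handed a druid rather than a ZMV id.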
This could take a while (possibly days) to run, so it should be done in a screen session.
If it's looking like this will take weeks to run, we can discuss a more efficient approach.