This PR refactors the code for two major changes:
1. It introduces the concept of a "multi-queue", which wraps multiple sub-queues of the same type.
This is necessary because of the unconventional way we use our `Ready` queue. The `BatchingService` contains logic to "hold" anchor request SQS messages until one of two conditions is satisfied: either the batch linger duration has elapsed, or the batch is full. While requests are held this way, they are considered "in flight" with respect to SQS. Once a batch is formed and recorded, all the messages that make up the batch are ACK'd and thus deleted from SQS. If a batch fails to be created, all of its messages are NACK'd (if the error is handled) or simply not ACK'd (if the error is unhandled or the process crashes, in which case they reappear once their SQS visibility timeout expires). Either way, they become visible to the `BatchingService` again once it recovers or retries.
This allows SQS to act as the persistence layer for messages flowing through the system, instead of storing batches in a DB so that messages can be recovered after a failure. While the latter is doable, it pushes the (non-trivial) complexity of maintaining batches down to the DB.
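The hold-then-ACK/NACK flow, together with the multi-queue wrapper this item introduces, can be sketched as follows. This is a minimal in-memory illustration, not the service's code: `InMemoryQueue`, `MultiQueue`, `collectBatch`, and `commitBatch` are hypothetical names, and the queue only simulates SQS. Real SQS deletes a message via its receipt handle and can make an in-flight message visible immediately by setting its visibility timeout to 0 with `ChangeMessageVisibility`. The clock is injected so the linger loop is deterministic.

```typescript
// Hypothetical in-memory stand-ins for SQS; names are illustrative only.
interface Message { id: string; body: string; }

interface Queue {
  receive(max: number): Message[]; // received messages become "in flight"
  ack(msg: Message): void; // delete an in-flight message
  nack(msg: Message): void; // make an in-flight message visible again
}

// A single queue with a cap on in-flight messages, like an SQS quota.
class InMemoryQueue implements Queue {
  private readonly maxInFlight: number;
  private visible: Message[] = [];
  private inFlight = new Map<string, Message>();

  constructor(maxInFlight: number) {
    this.maxInFlight = maxInFlight;
  }

  send(msg: Message): void {
    this.visible.push(msg);
  }

  receive(max: number): Message[] {
    const room = Math.max(0, this.maxInFlight - this.inFlight.size);
    const taken = this.visible.splice(0, Math.min(max, room));
    for (const m of taken) this.inFlight.set(m.id, m);
    return taken;
  }

  ack(msg: Message): void {
    this.inFlight.delete(msg.id);
  }

  nack(msg: Message): void {
    if (this.inFlight.delete(msg.id)) this.visible.push(msg);
  }

  get counts(): { visible: number; inFlight: number } {
    return { visible: this.visible.length, inFlight: this.inFlight.size };
  }
}

// Wraps several sub-queues behind the same interface. Receives draw from each
// sub-queue in order; ACK/NACK is routed back to the message's sub-queue.
// Assumes message IDs are unique across sub-queues.
class MultiQueue implements Queue {
  private readonly subQueues: Queue[];
  private origin = new Map<string, Queue>();

  constructor(subQueues: Queue[]) {
    this.subQueues = subQueues;
  }

  receive(max: number): Message[] {
    const out: Message[] = [];
    for (const q of this.subQueues) {
      if (out.length >= max) break;
      for (const m of q.receive(max - out.length)) {
        this.origin.set(m.id, q);
        out.push(m);
      }
    }
    return out;
  }

  ack(msg: Message): void {
    this.origin.get(msg.id)?.ack(msg);
    this.origin.delete(msg.id);
  }

  nack(msg: Message): void {
    this.origin.get(msg.id)?.nack(msg);
    this.origin.delete(msg.id);
  }
}

// Hold messages in flight until the batch is full or the linger time is up.
// A real service would long-poll or sleep instead of spinning on empty receives.
function collectBatch(queue: Queue, maxSize: number, lingerMs: number, now: () => number): Message[] {
  const start = now();
  const held: Message[] = [];
  while (held.length < maxSize && now() - start < lingerMs) {
    held.push(...queue.receive(maxSize - held.length));
  }
  return held;
}

// Record the batch, then ACK every message; on a handled error, NACK them all
// so they become visible again. (A crash here ACKs nothing, and SQS would
// redeliver the messages once their visibility timeout expires.)
function commitBatch(queue: Queue, held: Message[], record: (ids: string[]) => void): boolean {
  try {
    record(held.map((m) => m.id));
  } catch {
    for (const m of held) queue.nack(m);
    return false;
  }
  for (const m of held) queue.ack(m);
  return true;
}
```

Because `MultiQueue` exposes the same interface as a single queue, the batching logic does not need to know how many sub-queues exist, and each sub-queue contributes its own in-flight allowance.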
With this context and the quotas SQS imposes, a single queue limits batches to 120,000 requests, since that is the maximum number of in-flight messages a queue can have. A multi-queue lets the `BatchingService` hold far more anchor requests in flight, up to 120,000 per sub-queue (so N sub-queues can hold up to N × 120,000), which allows batches to be built at virtually any size we want.

2. It changes the way batches are sent to Anchor Workers.
Previously, the contents of a batch, including the batch ID and a list of anchor request IDs (anchor DB UUIDs), were sent in the SQS message payload. SQS has a 256 KB maximum message size, which only fits a few thousand anchor request IDs sent this way.
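To make the "few thousand" ceiling concrete, here is a rough estimate of how many request IDs fit in one message. It assumes a hypothetical JSON payload of the form `{ batchId, requestIds }` with 36-character UUID strings. The actual message schema is not given here, so treat the number as an order-of-magnitude check only.

```typescript
// Rough capacity estimate for inlining request IDs in one queue message.
// The payload shape ({ batchId, requestIds }) is a hypothetical stand-in.
const UUID_EXAMPLE = "123e4567-e89b-12d3-a456-426614174000"; // 36 characters

function payloadBytes(batchId: string, requestIds: string[]): number {
  // The JSON here is pure ASCII, so its character count equals its byte count.
  return JSON.stringify({ batchId, requestIds }).length;
}

function maxRequestIdsPerMessage(limitBytes: number): number {
  const empty = payloadBytes(UUID_EXAMPLE, []); // envelope with "[]"
  const perId = JSON.stringify(UUID_EXAMPLE).length + 1; // quoted UUID + comma
  // n IDs cost empty + n * perId - 1 bytes (no comma before the first ID).
  return Math.max(0, Math.floor((limitBytes - empty + 1) / perId));
}
```

With a 256 KiB (262,144-byte) limit, `maxRequestIdsPerMessage(262144)` returns 6,719 under these assumptions, consistent with the few-thousand-ID ceiling.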
To work around this limit, the Scheduler writes the batch JSON to S3 (using the batch ID as the object key) and sends only the batch ID to Anchor Workers.
With this PR, Anchor Workers detect when a batch arrives without anchor request IDs and use the batch ID to load the batch from S3, after which it is anchored like any other batch.
This allows batches of virtually any size to be sent to Anchor Workers.
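The Scheduler/Anchor Worker round trip can be sketched as below. This is a minimal illustration under stated assumptions: `BlobStore` is a hypothetical interface standing in for S3's PutObject/GetObject, and `publishBatch`/`resolveBatch` are hypothetical helpers. The real message schema and S3 client calls are not part of this description.

```typescript
interface Batch { batchId: string; requestIds: string[]; }

// Message sent to Anchor Workers: request IDs are optional, so a message with
// only a batch ID means "load the batch from the object store".
interface BatchMessage { batchId: string; requestIds?: string[]; }

// Hypothetical stand-in for S3 (PutObject/GetObject), keyed by batch ID.
interface BlobStore {
  put(key: string, body: string): void;
  get(key: string): string | undefined;
}

class InMemoryBlobStore implements BlobStore {
  private objects = new Map<string, string>();
  put(key: string, body: string): void {
    this.objects.set(key, body);
  }
  get(key: string): string | undefined {
    return this.objects.get(key);
  }
}

// Scheduler side: persist the full batch first, then send only its ID.
function publishBatch(store: BlobStore, batch: Batch): BatchMessage {
  store.put(batch.batchId, JSON.stringify(batch));
  return { batchId: batch.batchId };
}

// Worker side: use inline request IDs if present; otherwise load by batch ID.
function resolveBatch(store: BlobStore, msg: BatchMessage): Batch {
  if (msg.requestIds !== undefined) {
    return { batchId: msg.batchId, requestIds: msg.requestIds };
  }
  const body = store.get(msg.batchId);
  if (body === undefined) {
    throw new Error(`batch ${msg.batchId} not found in object store`);
  }
  const batch = JSON.parse(body) as Batch;
  if (!Array.isArray(batch.requestIds)) {
    throw new Error(`batch ${msg.batchId} has no request IDs`);
  }
  return batch;
}
```

The object is written before the message is sent, so a worker never receives a batch ID whose object has not been stored yet. S3 provides strong read-after-write consistency for new objects, so the worker's read sees it. Keeping `requestIds` optional also lets workers continue to accept batches sent inline.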