
Conversation

tedzhouhk

No description provided.


itay commented Jul 15, 2025

@tedzhouhk I'm curious, have we considered doing this over HTTP rather than externalizing the request data into external storage?

For reference, we considered this at Octo, but ended up going the HTTP route. Specifically, we used a queue (in our case, Redis) to write the fact that there was a request and metadata about who held the request (which "frontend"), and then the worker who picked it up would call the original holder to get the actual data. This was to handle the exact use case here of large bodies, such as input images/videos/etc.

The issue with S3 is that it can be both slow and have the costs add up, especially at high request rates where you are paying more for the operations than for the storage. The HTTP route also avoids needing another external dependency (S3/MinIO/whatnot) and keeps everything cluster-internal.
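For readers skimming the thread, here is a minimal sketch of the queue-plus-callback pattern described above, assuming redis-py and requests; the queue name, endpoint path, and wiring are illustrative guesses, not the actual Octo implementation:

```python
# Sketch: frontend keeps the large body local and publishes only metadata;
# the worker pops the metadata and pulls the body over HTTP from the holder.
import json
import redis
import requests

r = redis.Redis(host="localhost", port=6379)
REQUEST_QUEUE = "pending_requests"  # illustrative queue name

def frontend_enqueue(request_id: str, frontend_addr: str) -> None:
    """Frontend: write the fact that a request exists plus who holds it."""
    r.lpush(REQUEST_QUEUE, json.dumps({
        "request_id": request_id,
        "holder": frontend_addr,  # which "frontend" holds the actual body
    }))

def worker_dequeue_and_fetch() -> bytes:
    """Worker: pop metadata, then call the original holder for the data."""
    _, raw = r.brpop(REQUEST_QUEUE)
    meta = json.loads(raw)
    # The holder exposes the body at a per-request URL (assumed endpoint).
    resp = requests.get(f"http://{meta['holder']}/requests/{meta['request_id']}")
    resp.raise_for_status()
    return resp.content
```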

tedzhouhk (Author)

@itay good question, I had similar doubts but was convinced by @ryanolson that S3 is fast enough if we decide to store the payloads on disk (the time to process those long requests should be far longer than the time to pull them from S3). If we decide to store them in CPU RAM, then I think we should seek other solutions.


itay commented Jul 15, 2025

I'd mostly question whether it's necessary. The request will likely end up in RAM anyway as we transmit it, and I'm not sure we generally have a huge queue of requests backing up.

ryanolson

We can absolutely store them in CPU buffers, in which case we could use NIXL or HTTP to fetch the data.

HTTP would be pretty simple. NIXL is also doable but probably overkill.

I do think S3 will be the easiest (essentially two lines of code with the S3 SDK), and we will want object storage for multimodal in the future.

It would be interesting to benchmark, but I'm betting the perf delta at 1 MiB+ is relatively small.
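To make the "two lines" claim concrete, a sketch of the S3 path with boto3 (the bucket and key names here are placeholders):

```python
# Sketch: offload the large request body to object storage and pull it back.
import boto3

s3 = boto3.client("s3")
request_id = "req-123"                             # placeholder id
request_body = b"...large multimodal payload..."   # placeholder body

# Frontend: store the body under a per-request key.
s3.put_object(Bucket="dynamo-requests", Key=request_id, Body=request_body)

# Worker: retrieve it when the request is picked up.
body = s3.get_object(Bucket="dynamo-requests", Key=request_id)["Body"].read()
```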

nnshah1 (Contributor) commented Jul 16, 2025

A couple of items here: we should ideally reuse or extend the scheme @whoisj added for transmitting payloads over NIXL.

Would that actually work here? Could we just use NIXL for transport? We have the ability to add that to the payload and recover it today for the E/P/D work.

Another thought: why not split the request into multiple NATS messages?
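A rough sketch of that chunking idea with nats-py (NATS caps payloads at 1 MiB by default; the subject and header scheme are illustrative assumptions):

```python
# Sketch: split one oversized request body across several NATS messages.
import nats

CHUNK_SIZE = 512 * 1024  # stay well under NATS's 1 MiB default max_payload

async def publish_chunked(subject: str, request_id: str, body: bytes) -> None:
    nc = await nats.connect("nats://localhost:4222")
    chunks = [body[i:i + CHUNK_SIZE] for i in range(0, len(body), CHUNK_SIZE)]
    for idx, chunk in enumerate(chunks):
        # Receiver groups by request-id and reassembles chunks in order.
        await nc.publish(subject, chunk, headers={
            "request-id": request_id,
            "chunk-index": str(idx),
            "chunk-total": str(len(chunks)),
        })
    await nc.drain()
```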

vvenkates27

@ryanolson NIXL also already supports S3 through an object-storage plugin backed by the AWS S3 SDK; we can use this today to store and retrieve through an S3 object store.
