
Adds utils.BatchReadChannel() utility #52342

Open · wants to merge 3 commits into master
Conversation

tcsc
Contributor

@tcsc tcsc commented Feb 20, 2025

Adds a utility function that blocks on a channel until it receives a message,
and then reads all the messages in the channel until a read would block.

This is used by the IC and provisioning integrations to handle monitoring
events in batches to minimize re-work.
@tcsc tcsc added no-changelog Indicates that a PR does not require a changelog entry backport/branch/v17 aws-iam-identity-center labels Feb 20, 2025
@tcsc tcsc requested review from kopiczko, flyinghermit and smallinsky and removed request for EdwardDowling and kimlisa February 20, 2025 08:18
@zmb3
Collaborator

zmb3 commented Feb 20, 2025

I don't quite follow why this is necessary.

Adds a utility function that blocks on a channel until it receives a message,
and then reads all the messages in the channel until a read would block

Isn't this exactly how a for range loop would behave?

@kopiczko
Contributor

Adds a utility function that blocks on a channel until it receives a message,
and then reads all the messages in the channel until a read would block

Isn't this exactly how a for range loop would behave?

Range loops are awkward to use with context.

@zmb3
Collaborator

zmb3 commented Feb 20, 2025

Range loops are awkward to use with context.

Sure, then do a for loop with a select covering the channel and the context.

The behavior here looks like it will be racy. We do one spot check to see if anything is sitting in the channel after a read - if we miss it by a millisecond, the batch read operation returns and we have to call it again to pick up the item we just missed.

Why is that behavior preferable to a loop that picks up new items as they become available?

@tcsc tcsc requested a review from kopiczko February 21, 2025 00:39
@tcsc
Contributor Author

tcsc commented Feb 21, 2025

@zmb3:

Why is that behavior preferable to a loop that picks up new items as they become available?

It's the batching that is important here, i.e. draining all of the currently pending messages in the channel.

The motivating examples are the Provisioning and Identity Center services. In those, I specifically want to read a batch of messages so I can filter out duplicate or obsolete events in the batch and do less work. The events generated by the resource monitor for these services tend to be quite bursty (for example, you can get a bunch of notifications for a single Access List change, because you get notified once for the Access List itself and once for each member, because I need to watch for individual member changes as well).

There are also other events that cause those services to recalculate everyone's state, so any outstanding requests to re-evaluate an individual User or Access List's state immediately become obsolete. And multiple "recalculate everything" events can be issued together, and so they should obviously be merged into one if at all possible.

Usage looks something like this:

for someCondition {
  pendingMessages := make(map[string]*Msg)
  for msg, ok := range BatchReadChannel(ctx, ch, maxBatchSize) {
    if !ok {
      return
    }
    // last message received for any given target wins
    pendingMessages[msg.target] = msg
  }
  for _, msg := range pendingMessages {
    // handle single message for each affected target
  }
}

In this case, it's not 100% critical that duplicate events are handled only once, it's just wasteful if they aren't. Handling the events in batches lets me filter out a bunch of duplicate events, while still being responsive.

It's a pretty niche use case, so I can understand if utils is not where it belongs. I just have 2 copies of it at the moment in the IC and Provisioning services and want to only have 1 copy, and I can't see an obvious candidate package for it elsewhere. I'm also open to better ways of implementing it.

@zmb3
Collaborator

zmb3 commented Feb 21, 2025

I'm hesitant about any code that attempts to implement some sort of new concurrency primitive, especially when it's put in a generic package that encourages reuse.

If batching is what you care about then why not limit your batches based on a time interval instead of based on a timing-dependent emptiness check?

// readBatch reads as many items as it can from ch within d.
func readBatch(d time.Duration, ch <-chan foo) []foo {
    deadline := time.After(d)
    var result []foo
    for {
        select {
        case <-deadline:
            return result
        case f := <-ch:
            result = append(result, f)
        }
    }
}

You can even extend this and add a max argument so that you can limit the size of a batch.

This comes from someone who has written a number of flaky tests due to a select with a default clause 😇
