
Conversation

periklis (Collaborator)

What this PR does / why we need it:
This pull request is a medium-sized refactoring of the dataobj index builder to support handling stale partitions, i.e. partitions that have buffered events for an index but fewer than Config.EventsPerIndex. In particular:

  • All index-building code is moved into a separate abstraction in indexer.go, which the builder feeds via a Go channel.
  • The builder now incorporates a secondary async routine that flushes buffered event objects to the indexer according to the configured flush timeout (see the sketch below).
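
For reviewers, a minimal sketch of the new shape (not the actual code: names such as event, buildRequest, indexer, and FlushTimeout are illustrative placeholders):

package index

import (
    "context"
    "time"
)

// Illustrative sketch only; the real types live in builder.go and indexer.go.
type event struct{ partition int32 }

type buildRequest struct {
    partition int32
    events    []event
}

type config struct {
    FlushTimeout   time.Duration // how long a partition may sit idle before a flush
    EventsPerIndex int           // normal threshold for building an index
}

// indexer owns all index-building code and is fed via a channel.
type indexer struct{ requests chan buildRequest }

func (idx *indexer) run(ctx context.Context) {
    for {
        select {
        case <-ctx.Done():
            return
        case req := <-idx.requests:
            _ = req // build and upload the index for req.events
        }
    }
}

// builder buffers events per partition and decides when to hand them over.
type builder struct {
    cfg      config
    incoming chan event
    buffered map[int32][]event
    idx      *indexer
}

func newBuilder(cfg config, idx *indexer) *builder {
    return &builder{
        cfg:      cfg,
        incoming: make(chan event),
        buffered: make(map[int32][]event),
        idx:      idx,
    }
}

func (b *builder) run(ctx context.Context) {
    ticker := time.NewTicker(b.cfg.FlushTimeout)
    defer ticker.Stop()

    for {
        select {
        case <-ctx.Done():
            return
        case ev := <-b.incoming:
            b.buffered[ev.partition] = append(b.buffered[ev.partition], ev)
            if len(b.buffered[ev.partition]) >= b.cfg.EventsPerIndex {
                b.flush(ev.partition) // regular append-triggered build
            }
        case <-ticker.C:
            // Secondary async path: flush stale partitions that buffered events
            // but never reached EventsPerIndex. (Simplified: the real code checks
            // per-partition idle time before flushing.)
            for partition := range b.buffered {
                b.flush(partition)
            }
        }
    }
}

func (b *builder) flush(partition int32) {
    if events := b.buffered[partition]; len(events) > 0 {
        b.idx.requests <- buildRequest{partition: partition, events: events}
        delete(b.buffered, partition)
    }
}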

Which issue(s) this PR fixes:
Fixes grafana/loki-private#1967

Special notes for your reviewer:

Checklist

  • Reviewed the CONTRIBUTING.md guide (required)
  • Documentation added
  • Tests updated
  • Title matches the required conventional commits format, see here
    • Note that Promtail is considered to be feature complete, and future development for logs collection will be in Grafana Alloy. As such, feat PRs are unlikely to be accepted unless a case can be made for the feature actually being a bug fix to existing behavior.
  • Changes that require user attention or interaction to upgrade are documented in docs/sources/setup/upgrade/_index.md
  • If the change is deprecating or removing a configuration option, update the deprecated-config.yaml and deleted-config.yaml files respectively in the tools/deprecated-config-checker directory. Example PR

@periklis periklis self-assigned this Sep 12, 2025

@periklis periklis marked this pull request as ready for review September 15, 2025 13:56
@periklis periklis requested a review from a team as a code owner September 15, 2025 13:56
@periklis periklis force-pushed the index-builder-correctness branch from 2bd1e56 to a5632c9 Compare September 16, 2025 07:46
@benclive benclive (Contributor) left a comment

Thanks for this - it looks good apart from a couple of questions.

I had a thought about the approach: Would it be simpler to use a mutex or semaphore within buildIndex? That way you don't need to coordinate across goroutines and dispatch work, each entrypoint could just call buildIndex directly and wait for it to complete. I may be missing some nuance, however!
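
For illustration, roughly what I have in mind; buildIndex, event, and the field names here are placeholders, not the actual code:

import (
    "context"
    "sync"
)

type event struct{}

type Builder struct {
    buildMu sync.Mutex // serializes index builds across entrypoints
}

// Both the append path and the max-idle flush path call this directly and block
// until the build completes; the mutex replaces the worker goroutine and the
// channel dispatch.
func (b *Builder) buildIndex(ctx context.Context, events []event) error {
    b.buildMu.Lock()
    defer b.buildMu.Unlock()

    _ = events // ... build and upload the index for events ...
    return ctx.Err()
}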

switch tt {
case triggerTypeAppend:
return "append"
case triggerTypeFlush:

Contributor:

nit: maybe this should be "max-age" or something instead of flush?

periklis (Collaborator, author):

Considering that the config option is called max-idle-time, can we say triggerTypeMaxIdle is the winner?
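
Roughly something like this (sketch; the string values are illustrative):

type triggerType int

const (
    triggerTypeAppend  triggerType = iota
    triggerTypeMaxIdle // formerly triggerTypeFlush; fires when max-idle-time elapses
)

func (tt triggerType) String() string {
    switch tt {
    case triggerTypeAppend:
        return "append"
    case triggerTypeMaxIdle:
        return "max-idle"
    default:
        return "unknown"
    }
}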

processingErrors.Add(fmt.Errorf("failed to download object: %w", obj.err))
continue
}
p.wg.Add(1)

Contributor:

Does p.wg.Wait() ever get called?

periklis (Collaborator, author):

Yes, p.wg.Wait() is called in stopping(). AFAICT this is OK:

  1. Add(1) is called for the flush ticker routine and for each async partition flush routine.
  2. Done is called when the flush ticker routine and each async partition flush routine exit.
  3. Wait is called in stopping(), waiting for all routines to end before closing the client.

Did I miss something here?
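
For reference, the lifecycle I am describing, sketched with placeholder names (flushPartitionAsync, client):

import (
    "context"
    "io"
    "sync"
)

type Builder struct {
    wg     sync.WaitGroup
    cancel context.CancelFunc
    client io.Closer // stands in for the Kafka client
}

func (p *Builder) starting(ctx context.Context) error {
    ctx, p.cancel = context.WithCancel(ctx)
    p.wg.Add(1) // (1) the flush ticker routine
    go func() {
        defer p.wg.Done() // (2) Done when the ticker routine exits
        <-ctx.Done()      // ... ticker loop triggering periodic flushes ...
    }()
    return nil
}

func (p *Builder) flushPartitionAsync(ctx context.Context, partition int32) {
    p.wg.Add(1) // (1) one Add per async partition flush routine
    go func() {
        defer p.wg.Done() // (2) Done when the partition flush exits
        _ = partition
        _ = ctx // ... flush buffered events for this partition, honoring ctx ...
    }()
}

func (p *Builder) stopping(_ error) error {
    p.cancel()
    p.wg.Wait() // (3) wait for all routines to end before closing the client
    return p.client.Close()
}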

return nil, nil
}
case triggerTypeFlush:
if len(state.events) < p.cfg.MinFlushEvents {

Contributor:

I think this condition & flag can be removed. If something is older than the MaxIdleTime, we need to flush it anyway even if it means it'll be a small index.
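
Roughly what I mean, as a fragment (field names approximate):

case triggerTypeFlush:
    // No MinFlushEvents gate: anything idle longer than MaxIdleTime gets flushed,
    // even if the resulting index is small.
    if len(state.events) == 0 {
        return nil, nil
    }
    // ... proceed to build the index from state.events ...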

}

// Extract records for committing
records := make([]*kgo.Record, len(req.events))

Contributor:

Very minor performance optimization, but records isn't used unless the build was successful, so you could do this after the later error check.
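
Something like this, as a sketch (buildIndex and the record field stand in for the surrounding code):

obj, err := p.buildIndex(ctx, req.events)
if err != nil {
    return err
}

// Only extract records for committing once the build succeeded.
records := make([]*kgo.Record, len(req.events))
for i, ev := range req.events {
    records[i] = ev.record
}
// ... upload obj and commit records ...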

// Successfully sent event for download
case <-ctx.Done():
return "", ctx.Err()
default:

Contributor:

Is this default case needed?
If the channel is closed, the context should already have been cancelled and would be caught at the start of the loop.
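
i.e. just block on the send and on the context, something like (p.downloadQueue and ev are approximations of the surrounding code):

select {
case p.downloadQueue <- ev:
    // Successfully sent event for download
case <-ctx.Done():
    return "", ctx.Err()
}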

periklis (Collaborator, author):

You are right. For a moment a test was failing and this was a fix, but I never got back to review it properly.

}

func (p *Builder) cleanupPartition(partition int32) {
p.partitionsMutex.Lock()
defer p.partitionsMutex.Unlock()

p.cancelActiveCalculation(nil)
// Cancel active calculation for this partition
p.calculationsMutex.Lock()

Contributor:

I think the calculationsMutex is always acquired under the partitionsMutex; are they both needed?

periklis (Collaborator, author):

You are right! The calculationsMutex was only in play because I also use it in stopping(), which does not take the partitionsMutex. However, now that the context propagation is refactored to pass through the functions, we can rely on the usual Go pattern of letting context cancellation drive the cleanup.
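
A sketch of what that looks like (activeCalculations and the exact fields are placeholders): cleanupPartition only cancels, and the calculation goroutine observes ctx.Done() and tears down its own state.

func (p *Builder) cleanupPartition(partition int32) {
    p.partitionsMutex.Lock()
    defer p.partitionsMutex.Unlock()

    if cancel, ok := p.activeCalculations[partition]; ok {
        cancel() // the calculation goroutine sees ctx.Done() and cleans up after itself
        delete(p.activeCalculations, partition)
    }
    delete(p.partitions, partition)
}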

periklis (Collaborator, author) commented Sep 16, 2025

> I had a thought about the approach: Would it be simpler to use a mutex or semaphore within buildIndex? That way you don't need to coordinate across goroutines and dispatch work, each entrypoint could just call buildIndex directly and wait for it to complete. I may be missing some nuance, however!

Practically, yes, you are right. However, I decided to use channels for the buildWorker to stay consistent with the downloadWorker. For now the reason to keep only one buildWorker is CPU usage, but we may lift that and add more workers later. WDYT?

Edit: one more thing that came to mind while building this is that downloading and processing are two independent queues, so we can better observe later where things go wrong or get slow (see the sketch below).
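
For illustration, the two-stage wiring (names and the buffer size are placeholders):

downloadQueue := make(chan event, queueSize)
buildQueue := make(chan buildRequest, queueSize)

// downloadWorker: fetch the referenced objects and hand them to the build stage.
go func() {
    for ev := range downloadQueue {
        obj, err := download(ctx, ev)
        if err != nil {
            continue // a download error would be recorded here
        }
        buildQueue <- buildRequest{events: []event{ev}, object: obj}
    }
}()

// buildWorker: a single worker for now since builds are CPU-bound; more could be
// added later, and each queue can be observed independently.
go func() {
    for req := range buildQueue {
        buildIndex(ctx, req) // build duration/errors would be recorded here
    }
}()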

@periklis periklis force-pushed the index-builder-correctness branch from e9a7737 to 2a6babd Compare September 16, 2025 13:22