scheduler: fragment queue and querier pick-up coordination #6968

rubywtl · 2025-08-13T22:54:36Z

What this PR does:
This PR introduces a Fragmenter interface that splits logical query plans into fragments when distributed execution is enabled. The Fragmenter appends metadata to each fragment for tracking, which the scheduler then uses to route fragments to appropriate queriers. The scheduler maintains a mapping between fragments and querier addresses to track fragment locations across the distributed system.

Which issue(s) this PR fixes:

Checklist

Tests updated
Documentation added
CHANGELOG.md updated - the order of entries should be [CHANGE], [FEATURE], [ENHANCEMENT], [BUGFIX]

yeya24 · 2025-08-14T03:35:57Z

pkg/cortex/modules.go

@@ -414,7 +416,12 @@ func (t *Cortex) initQuerier() (serv services.Service, err error) {

 	t.Cfg.Worker.MaxConcurrentRequests = t.Cfg.Querier.MaxConcurrent
 	t.Cfg.Worker.TargetHeaders = t.Cfg.API.HTTPRequestHeadersToLog
-	return querier_worker.NewQuerierWorker(t.Cfg.Worker, httpgrpc_server.NewServer(internalQuerierRouter), util_log.Logger, prometheus.DefaultRegisterer)
+	ipAddr, err := ring.GetInstanceAddr(t.Cfg.Alertmanager.ShardingRing.InstanceAddr, t.Cfg.Alertmanager.ShardingRing.InstanceInterfaceNames, util_log.Logger)


Why are we using the Alertmanager config here?

The gRPC params I needed are under the RingConfig struct, which is the ShardingRing field here, but it doesn't exist under the querier config.

[update] I will add a new field (ring configs) for the querier 👍

Umm, I don't think we want to add a ring for the querier. We just need the configuration for the address, interface names, etc.
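A minimal sketch of what "just the configuration" could look like, using only the standard library. `AddrConfig` and `resolveAddr` are hypothetical names for illustration; the PR itself calls `ring.GetInstanceAddr`, and the real flag registration uses Cortex's own flag helpers.

```go
package main

import (
	"errors"
	"flag"
	"fmt"
	"net"
)

// AddrConfig is a hypothetical, ring-free holder for the querier's
// advertised-address settings.
type AddrConfig struct {
	InstanceAddr           string
	InstanceInterfaceNames []string
}

// RegisterFlags registers only the address flag. The interface names keep
// their defaults here because the standard flag package has no slice type.
func (c *AddrConfig) RegisterFlags(f *flag.FlagSet) {
	c.InstanceInterfaceNames = []string{"eth0", "en0"}
	f.StringVar(&c.InstanceAddr, "querier.instance-addr", "", "IP address the querier advertises to the scheduler.")
}

// resolveAddr returns the explicit address if one is set, otherwise the first
// non-loopback IP found on the configured interfaces.
func resolveAddr(cfg AddrConfig) (string, error) {
	if cfg.InstanceAddr != "" {
		return cfg.InstanceAddr, nil
	}
	for _, name := range cfg.InstanceInterfaceNames {
		iface, err := net.InterfaceByName(name)
		if err != nil {
			continue // interface not present on this host
		}
		addrs, err := iface.Addrs()
		if err != nil {
			continue
		}
		for _, a := range addrs {
			if ipNet, ok := a.(*net.IPNet); ok && !ipNet.IP.IsLoopback() {
				return ipNet.IP.String(), nil
			}
		}
	}
	return "", errors.New("no usable instance address found")
}

func main() {
	var cfg AddrConfig
	fs := flag.NewFlagSet("querier", flag.ContinueOnError)
	cfg.RegisterFlags(fs)
	if err := fs.Parse([]string{"-querier.instance-addr=10.0.0.7"}); err != nil {
		fmt.Println("flag error:", err)
		return
	}
	addr, err := resolveAddr(cfg)
	fmt.Println(addr, err)
}
```

The point of the sketch is that the address settings can live on the querier or worker config directly, with no ring lifecycle attached.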

yeya24 · 2025-08-14T03:36:20Z

pkg/distributed_execution/fragment_key.go

@@ -0,0 +1,21 @@
+package distributed_execution
+
+type FragmentKey struct {


Can you add comments for publicly exposed types?

yeya24 · 2025-08-14T03:40:21Z

pkg/distributed_execution/fragment_key.go

+	fragmentID uint64
+}
+
+func MakeFragmentKey(queryID uint64, fragmentID uint64) *FragmentKey {


I don't think this needs to return a pointer.
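A small sketch of the value-returning form, with doc comments on the exported names. The field names are assumed from the diff hunks above; the address string is made up for illustration.

```go
package main

import "fmt"

// FragmentKey uniquely identifies one fragment of a logical query plan
// by the query it belongs to and its ID within that query.
type FragmentKey struct {
	queryID    uint64
	fragmentID uint64
}

// MakeFragmentKey returns the key by value. The struct is two words wide and
// comparable, so it can be copied cheaply and used directly as a map key.
func MakeFragmentKey(queryID, fragmentID uint64) FragmentKey {
	return FragmentKey{queryID: queryID, fragmentID: fragmentID}
}

func main() {
	addrs := map[FragmentKey]string{}
	addrs[MakeFragmentKey(1, 2)] = "10.0.0.7:9095"

	// A freshly built key with the same IDs finds the same entry.
	addr, ok := addrs[MakeFragmentKey(1, 2)]
	fmt.Println(addr, ok)
}
```

Returning a value also removes the `*key` dereference seen at call sites such as `f.mappings[*key] = addr`.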

yeya24 · 2025-08-14T03:43:27Z

pkg/scheduler/plan_fragments/fragmenter.go

+
+type Fragmenter interface {
+	Fragment(node logicalplan.Node) ([]Fragment, error)
+	getNewID() uint64


Does this need to be part of the interface? It is very weird for an interface to have one public method and one private method.
I would remove this from the interface.
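A sketch of the interface with only the exported method. `Node` stands in for `logicalplan.Node` so the example compiles without the promql-engine dependency, and the `Fragment` struct fields are assumptions, not the PR's actual definition.

```go
package main

import "fmt"

// Node is a stand-in for logicalplan.Node (assumption for this sketch).
type Node interface{ String() string }

// Fragment is one piece of a logical plan plus its tracking metadata.
type Fragment struct {
	FragmentID uint64
	Node       Node
}

// Fragmenter splits a logical plan into fragments. ID generation is an
// implementation detail, so it is not part of the interface.
type Fragmenter interface {
	Fragment(node Node) ([]Fragment, error)
}

// DummyFragmenter returns the whole plan as a single fragment.
type DummyFragmenter struct{}

// Compile-time check that DummyFragmenter satisfies Fragmenter.
var _ Fragmenter = (*DummyFragmenter)(nil)

// Fragment wraps the plan in one fragment with a fixed ID, which is enough
// for a dummy implementation.
func (f *DummyFragmenter) Fragment(node Node) ([]Fragment, error) {
	return []Fragment{{FragmentID: 1, Node: node}}, nil
}

// literal is a trivial Node used only to exercise the sketch.
type literal string

func (l literal) String() string { return string(l) }

func main() {
	frags, err := (&DummyFragmenter{}).Fragment(literal("up"))
	fmt.Println(len(frags), frags[0].FragmentID, err)
}
```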

yeya24 · 2025-08-14T03:45:02Z

pkg/scheduler/plan_fragments/fragmenter.go

+}
+
+func (f *DummyFragmenter) getNewID() uint64 {
+	return 1 // for dummy plan_fragments testing


If it is just for testing, you can just hardcode to 1 in the Fragment function

yeya24 · 2025-08-14T03:48:23Z

pkg/scheduler/plan_fragments/fragment_table.go

+	f.mappings[*key] = addr
+}
+
+func (f *FragmentTable) GetMappings(queryID uint64, fragmentIDs []uint64) ([]string, bool) {


Can we find a more descriptive name? It took me a while to understand what mapping this is. If it is getting child querier addresses, we should find a better name.

yeya24 · 2025-08-14T03:51:13Z

pkg/scheduler/plan_fragments/fragment_table.go

+	defer f.mu.Unlock()
+
+	keysToDelete := make([]distributed_execution.FragmentKey, 0)
+	for key := range f.mappings {


Looking at the methods you have, would it be easier to change mappings from map[distributed_execution.FragmentKey]string to map[uint64]map[uint64]string?

Then you can find the per-query map with just a lookup.

True, but I made the FragmentKey struct so that it is easier to maintain (for example, if we ever want to change the types of the IDs or add more fields, we don't have to go through the codebase to fix it), and the code will be easier to understand (more explicit). This fragment key type is also reused for remote nodes and for child-root execution accesses to the result cache in future PRs.
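To make the trade-off concrete, here is a sketch of both layouts side by side. `flatTable`, `nestedTable`, and their method names are hypothetical; only the two map types come from the discussion above.

```go
package main

import (
	"fmt"
	"sync"
)

type FragmentKey struct{ queryID, fragmentID uint64 }

// flatTable mirrors the PR's layout: one map keyed by FragmentKey.
type flatTable struct {
	mu    sync.Mutex
	addrs map[FragmentKey]string
}

// clearQuery has to scan every entry to find the ones for queryID.
func (t *flatTable) clearQuery(queryID uint64) {
	t.mu.Lock()
	defer t.mu.Unlock()
	for key := range t.addrs {
		if key.queryID == queryID {
			delete(t.addrs, key) // deleting while ranging is safe in Go
		}
	}
}

// nestedTable is the suggested layout: queryID -> fragmentID -> address.
type nestedTable struct {
	mu    sync.Mutex
	addrs map[uint64]map[uint64]string
}

func (t *nestedTable) add(queryID, fragmentID uint64, addr string) {
	t.mu.Lock()
	defer t.mu.Unlock()
	if t.addrs[queryID] == nil {
		t.addrs[queryID] = map[uint64]string{}
	}
	t.addrs[queryID][fragmentID] = addr
}

// clearQuery drops all fragments of a query with a single delete.
func (t *nestedTable) clearQuery(queryID uint64) {
	t.mu.Lock()
	defer t.mu.Unlock()
	delete(t.addrs, queryID)
}

func main() {
	flat := &flatTable{addrs: map[FragmentKey]string{
		{queryID: 1, fragmentID: 1}: "a", {queryID: 1, fragmentID: 2}: "b", {queryID: 2, fragmentID: 1}: "c",
	}}
	flat.clearQuery(1)
	fmt.Println(len(flat.addrs))

	nested := &nestedTable{addrs: map[uint64]map[uint64]string{}}
	nested.add(1, 1, "a")
	nested.add(1, 2, "b")
	nested.add(2, 1, "c")
	nested.clearQuery(1)
	fmt.Println(len(nested.addrs))
}
```

The two are not mutually exclusive: the table could keep the nested layout internally while `FragmentKey` remains the type passed around by callers.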

yeya24 · 2025-08-14T03:54:18Z

pkg/scheduler/plan_fragments/fragmenter.go

+
+import "github.com/thanos-io/promql-engine/logicalplan"
+
+type Fragmenter interface {


It is better to move the Fragmenter to distributed_execution as fragmentation is specific to remote distribution.

The fragment table can just be moved to the scheduler folder.

Signed-off-by: rubywtl <[email protected]>

yeya24 · 2025-08-26T23:14:14Z

pkg/querier/worker/worker.go

+
+	cfg.InstanceInterfaceNames = []string{"eth0", "en0"}
+	f.Var((*flagext.StringSlice)(&cfg.InstanceInterfaceNames), "querier.instance-interface-names", "Name of network interface to read address from.")
+	f.StringVar(&cfg.InstanceAddr, "querier.instance-addr", "", "IP address to advertise in the ring.")


There is no ring for Querier

yeya24 · 2025-08-26T23:17:44Z

pkg/scheduler/fragment_table/fragment_table.go

+	}
+}
+
+func (f *FragmentTable) AddMapping(queryID uint64, fragmentID uint64, addr string) {


Can we find a better name other than mapping?

yeya24 · 2025-08-26T23:17:53Z

pkg/scheduler/fragment_table/fragment_table.go

+	return "", false
+}
+
+func (f *FragmentTable) ClearMappings(queryID uint64) {


yeya24 · 2025-08-26T23:18:46Z

pkg/scheduler/scheduler.go

+
+	// queryKey <--> fragment-ids lookup table allows faster cancellation of the whole query
+	// compared to traversing through the pending requests to find matching fragments
+	queryToFragmentsLookUp map[queryKey][]uint64


Let's find a better name for this. LookUp is not the right name.

yeya24 · 2025-08-26T23:19:50Z

pkg/scheduler/scheduler.go

 }

-type requestKey struct {
+// additional layer to improve efficiency of deleting fragments of logical query plans
+// while maintaining previous logics


The comment is confusing. Which previous logic does this struct maintain?

The previous logic cancels a query by its queryID and frontend address. Now there are multiple fragments under one queryID, and traversing the pending request queue to check each queryID is inefficient, so I added an extra layer of mapping to keep track of the fragment IDs under the same queryKey.
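A sketch of the index described here, under simplifying assumptions: pending requests are reduced to strings, locking is omitted, and `fragmentIDsByQuery` is a hypothetical name (the PR's field is `queryToFragmentsLookUp`).

```go
package main

import "fmt"

// queryKey identifies a whole query: frontend address plus query ID.
type queryKey struct {
	frontendAddr string
	queryID      uint64
}

// requestKey identifies a single fragment of a query.
type requestKey struct {
	queryKey   queryKey
	fragmentID uint64
}

type scheduler struct {
	pending            map[requestKey]string // payload type simplified for the sketch
	fragmentIDsByQuery map[queryKey][]uint64 // index used for cancellation
}

func newScheduler() *scheduler {
	return &scheduler{
		pending:            map[requestKey]string{},
		fragmentIDsByQuery: map[queryKey][]uint64{},
	}
}

// enqueue records the fragment and updates the per-query index.
func (s *scheduler) enqueue(q queryKey, fragmentID uint64, payload string) {
	s.pending[requestKey{queryKey: q, fragmentID: fragmentID}] = payload
	s.fragmentIDsByQuery[q] = append(s.fragmentIDsByQuery[q], fragmentID)
}

// cancelQuery removes every fragment of q without scanning all pending
// requests, and reports how many fragments were cancelled.
func (s *scheduler) cancelQuery(q queryKey) int {
	ids := s.fragmentIDsByQuery[q]
	for _, id := range ids {
		delete(s.pending, requestKey{queryKey: q, fragmentID: id})
	}
	delete(s.fragmentIDsByQuery, q)
	return len(ids)
}

func main() {
	s := newScheduler()
	q1 := queryKey{frontendAddr: "frontend-1", queryID: 7}
	q2 := queryKey{frontendAddr: "frontend-1", queryID: 8}
	s.enqueue(q1, 1, "fragment 1 of query 7")
	s.enqueue(q1, 2, "fragment 2 of query 7")
	s.enqueue(q2, 1, "fragment 1 of query 8")

	fmt.Println(s.cancelQuery(q1), len(s.pending))
}
```

Cancellation cost is proportional to the number of fragments in the cancelled query rather than to the size of the pending queue.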

yeya24 · 2025-08-26T23:21:44Z

pkg/scheduler/scheduler.go

+		return nil, err
+	}
+
+	fragmenter := plan_fragments.NewDummyFragmenter()


If this method belongs to Scheduler, why are we creating a new dummy fragmenter every time?
Should it be part of the scheduler itself?
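A sketch of holding the fragmenter as a scheduler field, created once at construction. The plan is reduced to a string and all names other than `Scheduler` and `Fragmenter` are hypothetical.

```go
package main

import "fmt"

type Fragment struct{ FragmentID uint64 }

// Fragmenter is simplified to take a string plan for this sketch.
type Fragmenter interface {
	Fragment(plan string) ([]Fragment, error)
}

// countingFragmenter records how often it is used, to show that a single
// instance serves every request.
type countingFragmenter struct{ calls int }

func (f *countingFragmenter) Fragment(plan string) ([]Fragment, error) {
	f.calls++
	return []Fragment{{FragmentID: 1}}, nil
}

// Scheduler owns its fragmenter instead of building one per request.
type Scheduler struct {
	fragmenter Fragmenter
}

// NewScheduler injects the fragmenter, which also makes it easy to swap in a
// real implementation or a test double later.
func NewScheduler(f Fragmenter) *Scheduler {
	return &Scheduler{fragmenter: f}
}

func (s *Scheduler) fragmentPlan(plan string) ([]Fragment, error) {
	return s.fragmenter.Fragment(plan)
}

func main() {
	f := &countingFragmenter{}
	s := NewScheduler(f)
	for _, plan := range []string{"up", "rate(http_requests_total[5m])"} {
		if _, err := s.fragmentPlan(plan); err != nil {
			fmt.Println("fragment error:", err)
		}
	}
	fmt.Println(f.calls)
}
```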

yeya24 · 2025-08-26T23:22:10Z

pkg/scheduler/scheduler.go

+}
+
+func (s *Scheduler) getPlanFromHTTPRequest(req *httpgrpc.HTTPRequest) ([]byte, error) {
+	if req.Body == nil {


I don't see why this needs to be a method of the scheduler itself.
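A sketch of the same helper as a package-level function. `httpRequest` stands in for `httpgrpc.HTTPRequest`, and what happens with a non-nil body is not shown in the diff, so the sketch simply returns it; returning an error for a missing body is likewise an assumption.

```go
package main

import (
	"errors"
	"fmt"
)

// httpRequest is a stand-in for httpgrpc.HTTPRequest (assumption).
type httpRequest struct {
	Body []byte
}

// planFromHTTPRequest needs no scheduler state, so it takes no receiver.
func planFromHTTPRequest(req *httpRequest) ([]byte, error) {
	if req == nil || req.Body == nil {
		return nil, errors.New("request has no body")
	}
	return req.Body, nil
}

func main() {
	plan, err := planFromHTTPRequest(&httpRequest{Body: []byte("serialized plan")})
	fmt.Println(string(plan), err)

	_, err = planFromHTTPRequest(&httpRequest{})
	fmt.Println(err)
}
```

A free function like this can be unit-tested without constructing a scheduler.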

yeya24 · 2025-08-26T23:26:04Z

pkg/scheduler/scheduler.go

+				}
+
+				return nil
+			}(); err != nil {


Can we try to clean up the code a little bit? I see a lot of duplicated code here.

yeya24 · 2025-08-26T23:59:20Z

pkg/scheduler/scheduler.go

+			for _, childID := range req.fragment.ChildIDs {
+				addr, ok := s.fragmentTable.GetChildAddr(req.queryID, childID)
+				if !ok {
+					return


Do we need to return an error here if a child addr is missing?
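A sketch of surfacing the missing address as an error instead of returning silently. The table is reduced to a plain map and `childAddrs` is a hypothetical helper, not the PR's `GetChildAddr`.

```go
package main

import "fmt"

type fragmentKey struct{ queryID, fragmentID uint64 }

// childAddrs resolves the querier address of every child fragment and fails
// loudly if any of them has not been recorded yet.
func childAddrs(table map[fragmentKey]string, queryID uint64, childIDs []uint64) ([]string, error) {
	addrs := make([]string, 0, len(childIDs))
	for _, childID := range childIDs {
		addr, ok := table[fragmentKey{queryID: queryID, fragmentID: childID}]
		if !ok {
			return nil, fmt.Errorf("no querier address for child fragment %d of query %d", childID, queryID)
		}
		addrs = append(addrs, addr)
	}
	return addrs, nil
}

func main() {
	table := map[fragmentKey]string{
		{queryID: 1, fragmentID: 2}: "10.0.0.7:9095",
	}

	addrs, err := childAddrs(table, 1, []uint64{2})
	fmt.Println(addrs, err)

	_, err = childAddrs(table, 1, []uint64{2, 3})
	fmt.Println(err)
}
```

With an error in hand, the caller can decide whether to fail the query or re-enqueue the fragment rather than dropping it without a trace.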

yeya24 · 2025-08-27T00:25:37Z

pkg/scheduler/schedulerpb/scheduler.proto

@@ -9,6 +9,7 @@ import "github.com/weaveworks/common/httpgrpc/httpgrpc.proto";

 option (gogoproto.marshaler_all) = true;
 option (gogoproto.unmarshaler_all) = true;
+option (gogoproto.sizer_all) = true;


Is this required?

pull-request-size bot added the size/XL label Aug 13, 2025

dosubot bot added the component/querier label Aug 13, 2025

yeya24 reviewed Aug 14, 2025


rubywtl force-pushed the scheduler/logicalplan_fragment_coordination branch 3 times, most recently from 942674c to 89e8021 Compare August 20, 2025 16:44

allow logical plan fragment type for scheduler queue

67eff93

Signed-off-by: rubywtl <[email protected]>

rubywtl force-pushed the scheduler/logicalplan_fragment_coordination branch from 89e8021 to 67eff93 Compare August 21, 2025 21:39

yeya24 reviewed Aug 26, 2025


yeya24 reviewed Aug 27, 2025
